### Advanced Python Training at ZeOmega - Day 3

Feb 07-11, 2022<br>
09:30 AM - 01:00 PM

[Anand Chitipothu](https://pipal.in/trainers/anand)

These notes are available online at https://bit.ly/zeomega-py22

© Pipal Academy LLP


## Topics
* Writing Custom Modules
* Testing Python Programs
* String Formatting
* Working with Files
* Sets & Dictionaries

## Writing Custom Modules

In [3]:
%%file mymodule.py
print("BEGIN mymodule")

x = 2

def add(a, b):
    return a+b

print(add(3, 4))
print("END mymodule")

Overwriting mymodule.py


In [4]:
!python mymodule.py

BEGIN mymodule
7
END mymodule


Let's say we want to reuse the add function written in mymodule.

In [5]:
%%file a.py
import mymodule
print(mymodule.add(10, 20))

Writing a.py


In [6]:
!python a.py

BEGIN mymodule
7
END mymodule
30


We can do the same without creating a new `a.py` file.

In [8]:
!python -c "import mymodule; print(mymodule.add(10, 20))"

BEGIN mymodule
7
END mymodule
30


### The `__name__` magic variable

In [9]:
%%file mymodule2.py
x = 2

def add(a, b):
    return a+b

print(add(3, 4))
print(__name__)

Writing mymodule2.py


In [10]:
!python mymodule2.py

7
__main__


In [12]:
!python -c "import mymodule2"

7
mymodule2


When the file is executed as a script, the value of `__name__` is set to `"__main__"`.

When the file is imported as a module, the value of `__name__` is set to the module name.

In [15]:
%%file mymodule3.py
x = 2

def add(a, b):
    return a+b

if __name__ == "__main__":
    print("you are running this file as a script")
    print(add(3, 4))  
else:
    print("you are importing this file as a module")
    

Overwriting mymodule3.py


In [16]:
!python mymodule3.py

you are running this file as a script
7


In [17]:
!python -c "import mymodule3"

you are importing this file as a module


In [18]:
%%file mymodule4.py
x = 2

def add(a, b):
    return a+b

if __name__ == "__main__":
    print(add(3, 4))  


Writing mymodule4.py


In [19]:
!python mymodule4.py

7


In [21]:
!python -c "import mymodule4; print(mymodule4.add(10, 20))"

30


#### Example: Square Module

Let's write a square program which can also be used as a module.

In [23]:
%%file sq.py

import sys

def square(x):
    return x*x

def main():
    n = int(sys.argv[1])
    print(square(n))
    
if __name__ == "__main__":
    main()

Writing sq.py


In [25]:
!python sq.py 5

25


In [26]:
!python -c "import sq; print(sq.square(10))"

100


In [27]:
%%file sum_of_squares.py
import sq
print(sq.square(3) + sq.square(4))

Writing sum_of_squares.py


In [28]:
!python sum_of_squares.py

25


### Docstrings

In [29]:
!pydoc os.listdir

Help on built-in function listdir in os:

ooss..lliissttddiirr = listdir(...)
    listdir(path) -> list_of_strings
    
    Return a list containing the names of the entries in the directory.
    
        path: path of directory to list
    
    The list is in arbitrary order.  It does not include the special
    entries '.' and '..' even if they are present in the directory.



In [30]:
help("os.listdir")

Help on built-in function listdir in os:

os.listdir = listdir(path=None)
    Return a list containing the names of the files in the directory.
    
    path can be specified as either str, bytes, or a path-like object.  If path is bytes,
      the filenames returned will also be bytes; in all other circumstances
      the filenames returned will be str.
    If path is None, uses the path='.'.
    On some platforms, path may also be specified as an open file descriptor;\
      the file descriptor must refer to a directory.
      If this functionality is unavailable, using it raises NotImplementedError.
    
    The list is in arbitrary order.  It does not include the special
    entries '.' and '..' even if they are present in the directory.



In [32]:
!pydoc sq

Help on module sq:

NNAAMMEE
    sq

FFIILLEE
    /Users/anand/trainings/2022/zeomega-python/sq.py

FFUUNNCCTTIIOONNSS
    mmaaiinn()
    
    ssqquuaarree(x)




In [33]:
!pydoc sq.square

Help on function square in sq:

ssqq..ssqquuaarree = square(x)



In [36]:
%%file sq.py
"""
The sq module.

Provides the square function.

USAGE:

    $ python sq.py 5 
    25
"""
import sys

def square(x):
    """Computes the square of a number.
    
        >>> square(4)
        16
    """
    return x*x

def main():
    n = int(sys.argv[1])
    print(square(n))
    
if __name__ == "__main__":
    main()

Overwriting sq.py


In [38]:
!pydoc sq

Help on module sq:

NNAAMMEE
    sq - The sq module.

FFIILLEE
    /Users/anand/trainings/2022/zeomega-python/sq.py

DDEESSCCRRIIPPTTIIOONN
    Provides the square function.
    
    USAGE:
    
        $ python sq.py 5 
        25

FFUUNNCCTTIIOONNSS
    mmaaiinn()
    
    ssqquuaarree(x)
        Computes the square of a numnber.
        
        >>> square(4)
        16
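
The `>>>` lines in the docstring follow the interactive-session convention, which the standard library's `doctest` module can execute as a check. A minimal sketch, assuming `square` is redefined inline here rather than imported from `sq.py`:

```python
import doctest

def square(x):
    """Computes the square of a number.

        >>> square(4)
        16
    """
    return x * x

# Collect the examples found in square's docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(square, "square", globs={"square": square}):
    runner.run(test)

# summarize() returns a (failed, attempted) result.
results = runner.summarize(verbose=False)
print(results.failed, "failed out of", results.attempted)
```

For a module on disk, running `python -m doctest sq.py` should perform the same check from the command line.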




**Problem:** Implement a `cube` module.

## Testing Python Programs

In [42]:
def square(x):
    return x*x

In [43]:
print(square(4))

16


In [45]:
def square(x):
    return x*x+1

if square(4) == 16:
    print("PASS")
else:
    print("FAIL")

FAIL


In [48]:
def square(x):
    return x*x+1

def test_square():
    assert square(0) == 0
    assert square(4) == 16    

In [49]:
test_square()

AssertionError: 
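
A bare `assert` fails with an empty message, as seen above. An optional second expression after a comma becomes the error message. A small sketch, where `check_square` is a hypothetical name and the bug in `square` is deliberate:

```python
def square(x):
    return x * x + 1  # deliberate bug, as in the cell above

def check_square():
    # The text after the comma is attached to the AssertionError.
    assert square(4) == 16, f"square(4) returned {square(4)}, expected 16"

try:
    check_square()
except AssertionError as e:
    message = str(e)
    print("AssertionError:", message)
```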

In [59]:
%%file sq2.py

def square(x):
    return x*x

def sum_of_squares(x, y):
    return square(x) + square(y)

def test_square():
    assert square(0) == 0
    assert square(4) == 16    
    
def test_sum_of_squares():
    assert sum_of_squares(0, 0) == 0
    assert sum_of_squares(3, 4) == 24 # fail

Overwriting sq2.py


In [60]:
!py.test sq2.py

platform darwin -- Python 3.10.2, pytest-7.0.0, pluggy-1.0.0
rootdir: /Users/anand/trainings/2022/zeomega-python
plugins: anyio-3.5.0
collected 2 items

sq2.py .F                                                                [100%]

_____________________________ test_sum_of_squares ______________________________

    def test_sum_of_squares():
        assert sum_of_squares(0, 0) == 0
>       assert sum_of_squares(3, 4) == 24 # fail
E       assert 25 == 24
E        +  where 25 = sum_of_squares(3, 4)

sq2.py:14: AssertionError
FAILED sq2.py::test_sum_of_squares - assert 25 == 24


In [61]:
!py.test sq2.py -v

platform darwin -- Python 3.10.2, pytest-7.0.0, pluggy-1.0.0 -- /Users/anand/.venv/python310/bin/python
cachedir: .pytest_cache
rootdir: /Users/anand/trainings/2022/zeomega-python
plugins: anyio-3.5.0
collected 2 items

sq2.py::test_square PASSED                                               [ 50%]
sq2.py::test_sum_of_squares FAILED                                       [100%]

_____________________________ test_sum_of_squares ______________________________

    def test_sum_of_squares():
        assert sum_of_squares(0, 0) == 0
>       assert sum_of_squares(3, 4) == 24 # fail
E       assert 25 == 24
E         +25
E         -24

sq2.py:14: Assertion

Let's try another example.

In [66]:
%%file lastword.py
def get_last_word(sentence):
    if not sentence:
        return ""
    return sentence.split()[-1]

def test_get_last_word():
    assert get_last_word("") == ""
    assert get_last_word("one two three") == 'three'

Overwriting lastword.py


In [67]:
!py.test lastword.py

platform darwin -- Python 3.10.2, pytest-7.0.0, pluggy-1.0.0
rootdir: /Users/anand/trainings/2022/zeomega-python
plugins: anyio-3.5.0
collected 1 item

lastword.py .                                                            [100%]



Another common approach people follow is to separate source and test files.

In [68]:
%%file sq2.py

def square(x):
    return x*x

def sum_of_squares(x, y):
    return square(x) + square(y)

Overwriting sq2.py


In [72]:
%%file test_sq2.py
from sq2 import square, sum_of_squares

def test_square():
    assert square(0) == 0
    assert square(4) == 16    
    
def test_sum_of_squares():
    assert sum_of_squares(0, 0) == 0
    assert sum_of_squares(3, 4) == 25

Overwriting test_sq2.py


In [73]:
!pytest test_sq2.py -v

platform darwin -- Python 3.10.2, pytest-7.0.0, pluggy-1.0.0 -- /Users/anand/.venv/python310/bin/python
cachedir: .pytest_cache
rootdir: /Users/anand/trainings/2022/zeomega-python
plugins: anyio-3.5.0
collected 2 items

test_sq2.py::test_square PASSED                                          [ 50%]
test_sq2.py::test_sum_of_squares PASSED                                  [100%]



In [74]:
!pytest -v

platform darwin -- Python 3.10.2, pytest-7.0.0, pluggy-1.0.0 -- /Users/anand/.venv/python310/bin/python
cachedir: .pytest_cache
rootdir: /Users/anand/trainings/2022/zeomega-python
plugins: anyio-3.5.0
collecting ... collected 2 items

test_sq2.py::test_square PASSED                                          [ 50%]
test_sq2.py::test_sum_of_squares PASSED                                  [100%]



**Problem:** Write a test case for your `cube` module

## String Formatting

There are three different ways to do string formatting in Python.

1. f-strings
2. The `format` method
3. The legacy string format using `%`
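
As a quick side-by-side, the three styles listed above can produce the same string. The variable names here are only for illustration:

```python
name = "Python"
i = 1

with_fstring = f"{i}: {name}"            # 1. f-string
with_format = "{}: {}".format(i, name)   # 2. the format method
with_percent = "%d: %s" % (i, name)      # 3. legacy % formatting

print(with_fstring)
print(with_fstring == with_format == with_percent)
```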

### The f-strings

In [75]:
name = "Python"
for i in range(5):
    print(str(i) + ": " + name)

0: Python
1: Python
2: Python
3: Python
4: Python


In [76]:
name = "Python"
for i in range(5):
    print(f"{i}: {name}")

0: Python
1: Python
2: Python
3: Python
4: Python


In [77]:
name = "Python"
for i in range(5):
    print(f"{i+1}: {name}")

1: Python
2: Python
3: Python
4: Python
5: Python


### The `format` method

In [79]:
name = "Python"
for i in range(5):
    print("{i}: {name}".format(i=i+1, name=name))

1: Python
2: Python
3: Python
4: Python
5: Python


In [80]:
i = 0
name = "Python"

In [83]:
# passing arguments by name
print("{i}: {name}".format(i=i+1, name=name))

1: Python


In [82]:
# passing arguments by position
print("{}: {}".format(i, name))

0: Python


In [84]:
print("{0}: {1}".format(i, name))

0: Python


In [85]:
template = """
Hello {name},

Welcome to the Advanced Python Course. 

Your username is {username}.

"""

In [87]:
msg = template.format(name="Anand", username="anand")
print(msg)


Hello Anand,

Welcome to the Advanced Python Course. 

Your username is anand.




### The legacy string format using `%`

In [89]:
i = 0
name = "Python"
msg = "%d: %s" % (i, name)
print(msg)

0: Python


In [91]:
i = 0
name = "Python"
msg = "%(index)d: %(name)s" % {"index": i, "name": name}
print(msg)

0: Python


String formatting is very handy. Let's look at an example.

In [93]:
def make_link(url):
    return f'<a href="{url}">{url}</a>'

In [94]:
link = make_link("https://google.com/")

In [95]:
print(link)

<a href="https://google.com/">https://google.com/</a>


## Working with Files

In [96]:
%%file three.txt
one
two
three

Writing three.txt


In [100]:
f = open("three.txt")

In [101]:
f.read() # all the contents of the file

'one\ntwo\nthree\n'

In [102]:
f.read() 

''

In [103]:
print(open("three.txt").read())

one
two
three



The other common way to read a file is by reading lines.

In [104]:
open("three.txt").readlines()

['one\n', 'two\n', 'three\n']

In [105]:
for line in open("three.txt").readlines():
    print(line)

one

two

three



There is a newline character at the end of each line and `print` adds one more. We need to suppress one of them.

In [106]:
for line in open("three.txt").readlines():
    print(line, end="") # ask print to not add a new line

one
two
three


In [107]:
for line in open("three.txt").readlines():
    print(line.strip("\n")) # remove the newline char from each line

one
two
three


In [175]:
# we can just loop over the file object, which goes over the lines 
for line in open("three.txt"):
    print(line.strip("\n")) # remove the newline char from each line

one
two
three


**Problem:** Write a program `cat.py` that takes a filename as command-line argument and prints all the contents of that file.

```
$ python cat.py three.txt
one
two
three
```

#### Example: word count

Unix has a word count program, wc.

In [108]:
%%file numbers.txt
1 one
2 two
3 three
4 four
5 five

Writing numbers.txt


In [109]:
!wc numbers.txt

       5      10      34 numbers.txt


What does it take to implement this in Python?

In [117]:
%%file wc.py
"""Implements the wc command of unix.

Prints the line count, wordcount, char count and the filename.

USAGE:
    python wc.py filename
"""
import sys

def linecount(f):
    return len(open(f).readlines())

def wordcount(f):
    return len(open(f).read().split())

def charcount(f):
    return len(open(f).read())

def main():
    f = sys.argv[1]
    print(linecount(f), wordcount(f), charcount(f), f)
    
if __name__ == "__main__":
    main()

Overwriting wc.py


In [118]:
!python wc.py numbers.txt

5 10 34 numbers.txt
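
The `wc.py` above opens and reads the file three times, once per count. A possible variant reads it once and derives all three numbers from the same text. This is only a sketch: `counts` is a hypothetical helper, and the contents of `numbers.txt` are recreated in a temporary file so the example is self-contained.

```python
import os
import tempfile

def counts(filename):
    """Return (lines, words, chars) for the file, reading it only once."""
    with open(filename) as f:
        text = f.read()
    return len(text.splitlines()), len(text.split()), len(text)

# Recreate the contents of numbers.txt in a temporary file for the demo.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    with open(path, "w") as f:
        f.write("1 one\n2 two\n3 three\n4 four\n5 five\n")
    result = counts(path)

print(*result)
```

Note that `len(text)` counts characters, while the Unix `wc` shown earlier counts bytes. The two agree for plain ASCII files like this one.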


**Problem:** Write a program `reverse-lines.py` that prints the lines of a file in reverse order, so that the last line is printed first.

```
$ python reverse-lines.py numbers.txt
5 five
4 four
3 three
2 two
1 one
```

**Problem:** Write a program `reverse-words.py` that prints the words in each line in reverse order.

```
$ python reverse-words.py numbers.txt
one 1
two 2
three 3
four 4
five 5
```

In [119]:
%%file five.txt
five
five four
five four three
five four three two
five four three two one
zero

Writing five.txt


```
$ python reverse-words.py five.txt
five
four five
three four five
two three four five
one two three four five
zero
```

#### Working with binary files

In [121]:
open("numbers.txt")

<_io.TextIOWrapper name='numbers.txt' mode='r' encoding='UTF-8'>

In [122]:
open("numbers.txt").read()

'1 one\n2 two\n3 three\n4 four\n5 five\n'

In [123]:
open("numbers.txt", 'rb').read() # gives bytes

b'1 one\n2 two\n3 three\n4 four\n5 five\n'

In [124]:
%%file ka.txt
ಅ ಆ ಇ ಈ

Writing ka.txt


In [125]:
open("ka.txt").read()

'ಅ ಆ ಇ ಈ\n'

In [126]:
open("ka.txt", 'rb').read()

b'\xe0\xb2\x85 \xe0\xb2\x86 \xe0\xb2\x87 \xe0\xb2\x88\n'

In [127]:
!wget https://www.python.org/static/community_logos/python-logo-master-v3-TM-flattened.png -O python.png

--2022-02-09 12:22:51--  https://www.python.org/static/community_logos/python-logo-master-v3-TM-flattened.png
Resolving www.python.org (www.python.org)... 151.101.152.223
Connecting to www.python.org (www.python.org)|151.101.152.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11155 (11K) [image/png]
Saving to: ‘python.png’


2022-02-09 12:22:52 (1.25 MB/s) - ‘python.png’ saved [11155/11155]



In [129]:
!ls -l *.png

-rw-r--r--  1 anand  staff  11155 Feb  5 03:40 python.png


In [131]:
data = open("python.png", 'rb').read()

In [132]:
data[:100]

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x02Y\x00\x00\x00\xcb\x08\x06\x00\x00\x00]\xc9\x86&\x00\x00\x00\x04sBIT\x08\x08\x08\x08|\x08d\x88\x00\x00\x00\tpHYs\x00\x00\n\xf0\x00\x00\n\xf0\x01B\xac4\x98\x00\x00\x00\x16tEXtCreation Time\x0006/05/04'

In [133]:
len(data)

11155

### Writing Files

Files can be opened in write mode by specifying `"w"` or `"wt"` as mode. For writing binary files, it will be `"wb"`.

In [134]:
f = open("a.txt", "w")
f.write("one\n")
f.write("two\n")
f.close()

In [135]:
open("a.txt").read()

'one\ntwo\n'

Whenever we open a file in write mode, any existing contents are overwritten.

It is important to remember to close the file, as the contents may not be written to disk until it is closed.

In [139]:
f = open("b.txt", "w")
f.write("1\n")
f.write("2\n")
f.write("3\n")
# not closed the file yet

2

In [140]:
open("b.txt").read()

''

In [141]:
f.close()

In [142]:
open("b.txt").read()

'1\n2\n3\n'
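
If the file has to stay open for a while, calling `flush()` hands the buffered data over to the operating system so that other readers can see it. A small sketch, using a temporary directory instead of the `b.txt` above:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "b.txt")
    f = open(path, "w")
    f.write("1\n")
    f.flush()  # push buffered data to the operating system
    # A second, independent reader now sees the data.
    with open(path) as reader:
        seen_after_flush = reader.read()
    f.close()

print(repr(seen_after_flush))
```

`flush()` does not by itself force the data onto the physical disk; `os.fsync` exists for that stricter requirement.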

Sometimes we want to append something at the end of a file. In that case, we can open the file in append mode (`"a"`).

In [143]:
f = open("a.txt", "w")
f.write("one\n")
f.write("two\n")
f.close()

In [144]:
f = open("a.txt", "a")
f.write("three\n")
f.write("four\n")
f.close()

In [145]:
print(open("a.txt").read())

one
two
three
four



#### The `with` statement

The `with` statement is handy when writing to files as it takes care of closing the file automatically.

In [146]:
with open("a.txt", "w") as f:
    f.write("one\n")
    f.write("two\n")
# the file f gets closed automatically

In [147]:
open("a.txt").read()

'one\ntwo\n'

In [151]:
with open("a.txt", "w") as f:
    f.write("one\n")
    f.write("two\n")
    print(f.closed)
# the file f gets closed automatically
print(f.closed)

False
True
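
For file objects, the `with` block behaves roughly like a `try`/`finally` that always closes the file. The sketch below spells that out by hand. It is an approximation for illustration, not the exact mechanics of context managers, and it writes to a temporary directory rather than `a.txt`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")

    f = open(path, "w")
    try:
        f.write("one\n")
        f.write("two\n")
    finally:
        f.close()  # runs even if one of the writes raises an exception

    closed = f.closed
    with open(path) as g:
        contents = g.read()

print(closed, repr(contents))
```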


**Problem:** Write a program `copyfile.py` to copy the contents of one file to another. The program should accept a source file and a destination file as arguments and copy the source to the destination.

```
$ python copyfile.py a.txt b.txt
```

Note: Don't call this file `copy.py`, as that would interfere with the standard library module of the same name.


#### Special files: `stdin`, `stdout` and `stderr` 

By default, `print` writes to stdout.

In [152]:
import sys
sys.stdout.write("hello")

hello

5

In [153]:
sys.stderr.write("hello")

hello

5

In [154]:
print("hello", file=sys.stdout) # the default way

hello


In [155]:
print("hello", file=sys.stderr)

hello
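
In the notebook both streams appear in the same output area, so they look identical. They are separate streams, though. One way to see this is to capture each one on its own with `contextlib.redirect_stdout` and `contextlib.redirect_stderr`. A minimal sketch:

```python
import io
import sys
from contextlib import redirect_stderr, redirect_stdout

out = io.StringIO()
err = io.StringIO()

with redirect_stdout(out), redirect_stderr(err):
    print("result")                     # goes to stdout
    print("warning", file=sys.stderr)   # goes to stderr

print("stdout captured:", repr(out.getvalue()))
print("stderr captured:", repr(err.getvalue()))
```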


In [163]:
%%file sort.py
import sys
lines = sys.stdin.readlines()
for line in sorted(lines):
    print(line, end="")

Overwriting sort.py


In [164]:
!ls | python sort.py

Makefile
Untitled.html
Untitled.ipynb
Untitled1.html
Untitled1.ipynb
__pycache__
a.py
a.txt
args.py
args2.py
b.txt
cat.py
date.py
day1.html
day1.ipynb
day2.html
day2.ipynb
day3.html
day3.ipynb
echo.py
echo2.py
files
five.txt
hello.py
index.html
index.ipynb
ka.txt
lastword.py
ls.py
mymodule.py
mymodule2.py
mymodule3.py
mymodule4.py
notes
numbers.txt
push
python.png
readinput.py
sort.py
sq.py
sq.pyc
sq2.py
square.py
sum_of_squares.py
test_sq2.py
three.txt
wc.py
working-with-files.html
working-with-files.ipynb


### Example: Parsing CSV files

In [165]:
%%file a.csv
A1,B1,C1
A2,B2,C2
A3,B3,C3
A4,B4,C4

Writing a.csv


In [166]:
open("a.csv").readlines()

['A1,B1,C1\n', 'A2,B2,C2\n', 'A3,B3,C3\n', 'A4,B4,C4\n']

In [168]:
result = []
for line in open("a.csv").readlines():
    result.append(line.strip("\n").split(","))

In [169]:
result

[['A1', 'B1', 'C1'],
 ['A2', 'B2', 'C2'],
 ['A3', 'B3', 'C3'],
 ['A4', 'B4', 'C4']]

In [170]:
# [expr for var in alist]
[line.strip("\n").split(",") for line in open("a.csv").readlines()]

[['A1', 'B1', 'C1'],
 ['A2', 'B2', 'C2'],
 ['A3', 'B3', 'C3'],
 ['A4', 'B4', 'C4']]

In [171]:
[line.strip("\n") for line in open("a.csv").readlines()]

['A1,B1,C1', 'A2,B2,C2', 'A3,B3,C3', 'A4,B4,C4']

In [172]:
[line.strip("\n").split(",") for line in open("a.csv").readlines()]

[['A1', 'B1', 'C1'],
 ['A2', 'B2', 'C2'],
 ['A3', 'B3', 'C3'],
 ['A4', 'B4', 'C4']]

What if we want to ignore comments and empty lines in the csv file?

In [173]:
%%file b.csv
# this is the first line
A1,B1,C1
# second line
A2,B2,C2

A3,B3,C3
A4,B4,C4
# THE END

Writing b.csv


In [179]:
[line.strip("\n").split(",") 
    for line in open("b.csv")
    if not line.startswith("#") and line.strip() != ""]

[['A1', 'B1', 'C1'],
 ['A2', 'B2', 'C2'],
 ['A3', 'B3', 'C3'],
 ['A4', 'B4', 'C4']]

What if we want to support any delimiter?

In [180]:
def read_csv(filename, delimiter=","):
    return [line.strip("\n").split(delimiter) 
        for line in open(filename)
        if not line.startswith("#") and line.strip() != ""]    

In [181]:
read_csv("b.csv")

[['A1', 'B1', 'C1'],
 ['A2', 'B2', 'C2'],
 ['A3', 'B3', 'C3'],
 ['A4', 'B4', 'C4']]
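
Splitting on the delimiter works for simple files, but it does not handle quoted fields that contain the delimiter. The standard library's `csv` module does. The sketch below swaps `split` for `csv.reader` while keeping the same comment and blank-line filter. `read_csv_rows` is a hypothetical helper, and the sample data is an in-memory stand-in for `b.csv`.

```python
import csv
import io

# Stand-in for b.csv, kept in memory so the sketch is self-contained.
sample = io.StringIO(
    "# this is the first line\n"
    "A1,B1,C1\n"
    "# second line\n"
    "A2,B2,C2\n"
    "\n"
    "A3,B3,C3\n"
)

def read_csv_rows(lines, delimiter=","):
    """Skip comments and blank lines, then let csv.reader split the rest."""
    kept = (line for line in lines
            if not line.startswith("#") and line.strip() != "")
    return list(csv.reader(kept, delimiter=delimiter))

rows = read_csv_rows(sample)
print(rows)
```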

In [184]:
users = read_csv("/etc/passwd", delimiter=":")

for user in users:
    print(user[0], user[4])

nobody Unprivileged User
root System Administrator
daemon System Services
_uucp Unix to Unix Copy Protocol
_taskgated Task Gate Daemon
_networkd Network Services
_installassistant Install Assistant
_lp Printing Services
_postfix Postfix Mail Server
_scsd Service Configuration Service
_ces Certificate Enrollment Service
_appstore Mac App Store Service
_mcxalr MCX AppLaunch
_appleevents AppleEvents Daemon
_geod Geo Services Daemon
_devdocs Developer Documentation
_sandbox Seatbelt
_mdnsresponder mDNSResponder
_ard Apple Remote Desktop
_www World Wide Web Server
_eppc Apple Events User
_cvs CVS Server
_svn SVN Server
_mysql MySQL Server
_sshd sshd Privilege separation
_qtss QuickTime Streaming Server
_cyrus Cyrus Administrator
_mailman Mailman List Server
_appserver Application Server
_clamav ClamAV Daemon
_amavisd AMaViS Daemon
_jabber Jabber XMPP Server
_appowner Application Owner
_windowserver WindowServer
_spotlight Spotlight
_tokend Token Daemon
_securityagent SecurityAgent
_calendar

**Q:** Can you explain how to read/write binary data once again?

In [186]:
text = '\u0c05\u0c06\u0c07\u0c08'

In [188]:
text.encode('utf-8')

b'\xe0\xb0\x85\xe0\xb0\x86\xe0\xb0\x87\xe0\xb0\x88'

In [189]:
data = text.encode('utf-8')

In [190]:
data

b'\xe0\xb0\x85\xe0\xb0\x86\xe0\xb0\x87\xe0\xb0\x88'

In [195]:
data2 = b'\xe0\xe0'

In [196]:
with open("2bytes.data", "wb") as f:
    f.write(data2)

In [197]:
!ls -l 2bytes.data

-rw-r--r--  1 anand  staff  2 Feb  9 16:18 2bytes.data


In [198]:
# opening it in textmode will fail
open("2bytes.data").read()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 0: invalid continuation byte
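
To summarize the relationship: `encode` turns a `str` into `bytes`, `decode` turns `bytes` back into a `str`, and text mode performs the decode step for us. When the bytes are not valid in the chosen encoding, decoding fails unless an error handler is given. A small sketch of both directions, where `errors="replace"` is just one of the available handlers:

```python
text = "\u0c05\u0c06"                  # two Telugu letters
data = text.encode("utf-8")            # str -> bytes
assert data.decode("utf-8") == text    # bytes -> str round trip

data2 = b"\xe0\xe0"                    # not a valid UTF-8 sequence
try:
    data2.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# errors="replace" substitutes U+FFFD for bytes that cannot be decoded.
lenient = data2.decode("utf-8", errors="replace")
print(decoded_ok, repr(lenient))
```

The same `errors` argument is accepted by `open` in text mode, and binary mode (`'rb'`) avoids decoding altogether.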

## Dictionaries

In [199]:
d = {"x": 1, "y": 2, "z": 3}

In [200]:
d["x"]

1

In [201]:
d['x'] = 11

In [202]:
d

{'x': 11, 'y': 2, 'z': 3}

From Python 3.7 onwards, dictionaries are guaranteed to preserve insertion order. Before that, the language made no guarantee about ordering.
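
A short illustration of what insertion order means in practice, assuming Python 3.7 or later:

```python
d = {}
d["z"] = 26
d["a"] = 1
d["m"] = 13
order_after_insert = list(d)

d["z"] = 0    # updating an existing key keeps its position
del d["a"]
d["a"] = 1    # deleting and re-adding moves the key to the end
order_after_changes = list(d)

print(order_after_insert)
print(order_after_changes)
```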

There are two common ways to use dictionaries:

1. as a record
2. as a database/lookup-table

In [203]:
# as a record
person = {
    "name": "Alice",
    "email": "alice@example.com",
    "phone": 1234
}

When using a dictionary as a record, we know all the possible keys.

In [204]:
# as a lookup-table
phone_numbers = {
    "alice": 1234,
    "bob": 2345
}

When using a dictionary as a lookup table, the keys are not known up front.

### Common operations on dictionaries

In [205]:
phone_numbers["alice"]

1234

In [206]:
"alice" in phone_numbers

True

In [207]:
"dave" in phone_numbers

False

In [208]:
"dave" not in phone_numbers

True

There are some handy methods. 

**`get`**

In [210]:
name = "dave"

In [211]:
phone_numbers.get(name, "-")

'-'

In [212]:
phone_numbers.get("alice", "-")

1234

If the key is present in the dictionary, `get` returns its value; if not, it returns the second argument.
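
For contrast, here is a small sketch of how indexing with `[]` and `get` differ when the key is missing. The variable names are only for illustration:

```python
phone_numbers = {"alice": 1234, "bob": 2345}

try:
    phone_numbers["dave"]          # indexing a missing key raises KeyError
    lookup_failed = False
except KeyError:
    lookup_failed = True

with_default = phone_numbers.get("dave", "-")
without_default = phone_numbers.get("dave")   # no default given: None

print(lookup_failed, with_default, without_default)
```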

**`setdefault`**

`setdefault` works like `get`, but also adds an entry when the key is missing.

In [213]:
d = {"x": 1, "y": 2}

In [214]:
d.get("z", 0)

0

In [215]:
d

{'x': 1, 'y': 2}

In [216]:
d.setdefault("z", 0)

0

In [217]:
d

{'x': 1, 'y': 2, 'z': 0}
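
A common use of `setdefault` is grouping items under a key, because it returns the stored value and creates it on first use. A sketch with made-up data:

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

groups = {}
for w in words:
    # Create an empty list the first time a letter is seen, then append to it.
    groups.setdefault(w[0], []).append(w)

print(groups)
```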

**`update`**

In [218]:
d1 = {"x": 1, "y": 2}
d2 = {"x": 11, "z": 33}

In [219]:
d1.update(d2)

In [220]:
d1

{'x': 11, 'y': 2, 'z': 33}

### Iterating over dictionaries

In [221]:
d = {"x": 1, "y": 2, "z": 3}

In [222]:
d.keys()

dict_keys(['x', 'y', 'z'])

In [223]:
d.values()

dict_values([1, 2, 3])

In [224]:
d.items()

dict_items([('x', 1), ('y', 2), ('z', 3)])

Iterating over the keys:

In [225]:
for k in d.keys():
    print(k)

x
y
z


In [226]:
for k in d: # iterating over a dictionary goes over its keys
    print(k)

x
y
z


In [227]:
for k in d:
    print(k, d[k])

x 1
y 2
z 3


Iterate over values:

In [228]:
for v in d.values():
    print(v)

1
2
3


Iterate over the key-value pairs:

In [229]:
for k, v in d.items():
    print(k, v)

x 1
y 2
z 3


#### Example: Marks of a student

In [230]:
marks = {
    "english": 89,
    "maths": 87,
    "science": 45
}

In [231]:
marks

{'english': 89, 'maths': 87, 'science': 45}

In [232]:
for subject, score in marks.items():
    print(subject, score)

english 89
maths 87
science 45


In [233]:
sum(marks.values())

221

In [234]:
for subject, score in marks.items():
    print(subject, score)
print("-----")
print("Total", sum(marks.values()))

english 89
maths 87
science 45
-----
Total 221


A small puzzle now. 

In which subject did the student get the highest marks?

In [235]:
marks

{'english': 89, 'maths': 87, 'science': 45}

In [236]:
max(marks.values())

89

In [237]:
marks.values()

dict_values([89, 87, 45])

In [238]:
max(marks.keys())

'science'

In [239]:
marks.keys()

dict_keys(['english', 'maths', 'science'])

In [243]:
def get_score(subject):
    return marks[subject]

max(marks.keys(), key=get_score)

'english'
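
Since `marks.get` already maps a subject to its score, it can serve as the key function directly, without a separate helper. A compact sketch of the same idea:

```python
marks = {"english": 89, "maths": 87, "science": 45}

# Iterating over a dictionary goes over its keys; key= decides how they are compared.
best = max(marks, key=marks.get)
worst = min(marks, key=marks.get)

print(best, worst)
```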

In [244]:
marks

{'english': 89, 'maths': 87, 'science': 45}

In [242]:
marks['maths']

87

#### Example: Word Frequency

Write a program to compute the frequency of words in a file.

In [245]:
%%file words.txt
five
five four
five four three
five four three two
five four three two one
zero
ten zero

Writing words.txt


In [255]:
%%file wordfreq.py
"""Program to compute frequency of words in the given file.

USAGE: python wordfreq.py words.txt
"""
import sys

def read_words(filename):
    """Reads and returns all words in the given filename as a list.
    """
    return open(filename).read().split()

def wordfreq(words):
    """Takes a list of words as argument and computes the
    frequency of each unique word in those words as a dictionary.
    
        >>> wordfreq([])
        {}
        >>> wordfreq(['a', 'b', 'a'])
        {'a': 2, 'b': 1}
    """
    freq = {}
    for w in words:
#         if w in freq:
#             freq[w] = freq[w] + 1
#         else:
#             freq[w] = 1  
        freq[w] = freq.get(w, 0) + 1
    return freq

def print_freq(freq):
    """Prints frequency of words in a nice readable format.
    """
    # FIXME
    print(freq)

def main():
    filename = sys.argv[1]
    words = read_words(filename)
    freq = wordfreq(words)
    print_freq(freq)
    
if __name__ == "__main__":
    main()

Overwriting wordfreq.py


In [256]:
!python wordfreq.py words.txt

{'five': 5, 'four': 4, 'three': 3, 'two': 2, 'one': 1, 'zero': 2, 'ten': 1}


In [253]:
!touch empty.txt

In [254]:
!python wordfreq.py empty.txt

{}


**Problem:** Improve the above program to print the frequencies one word per line, as shown below. The order in which the words appear doesn't matter.

```
$ python wordfreq.py words.txt
zero 2
five 5
four 4
three 3
two 2
one 1
ten 1
```

**Problem:** Improve the above program further to print the words sorted by their count, with the most frequent word at the top.

```
$ python wordfreq.py words.txt
five 5
four 4
three 3
two 2
zero 2
one 1
ten 1
```