### Advanced Python Training at ZeOmega - Day 4

Feb 07-11, 2022<br>
09:30 AM - 01:00 PM

[Anand Chitipothu](https://pipal.in/trainers/anand)

These notes are available online at https://bit.ly/zeomega-py22

© Pipal Academy LLP

[Home](.) | [Day 1](day1.html) | [Day 2](day2.html) | [Day 3](day3.html) | **Day 4** | [Day 5](day5.html) 

[Download this notebook](day4.ipynb)

## Dictionaries (Contd...)

**Problem:** Improve the above program to print the frequency of each word, one word per line, as shown below. The order in which the words appear doesn't matter.

```
$ python wordfreq.py words.txt
zero 2
five 5
four 4
three 3
two 2
one 1
ten 1
```

**Problem:** Improve the above program further to print the words sorted by their count, with the most frequent word at the top.

```
$ python wordfreq.py words.txt
five 5
four 4
three 3
two 2
zero 2
one 1
ten 1
```

In [1]:
from wordfreq import wordfreq

In [11]:
words_1k = [str(i) for i in range(1000)]
words_10k = [str(i) for i in range(10000)]
words_100k = [str(i) for i in range(100000)]
words_1m = [str(i) for i in range(1000000)]

In [13]:
%timeit x = wordfreq(words_1k)

178 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [14]:
%timeit x = wordfreq(words_10k)

1.9 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [15]:
%timeit x = wordfreq(words_100k)

24.8 ms ± 714 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [16]:
%timeit x = wordfreq(words_1m)

354 ms ± 7.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [18]:
%%file words_count.py
# word count program by pbhuvan
# import sys
# a = sys.argv[1]
# b = open(a,'r').read().split()
def word_count(b):
    # li1 holds the unique words, li2 the matching counts
    li1,li2=[],[]
    for i in b:
        if i not in li1:
            li1.append(i)
            li2.append(b.count(i))
    return li1, li2
# for i in range(len(li1)):
#     print(li1[i],li2[i])


Overwriting words_count.py


In [19]:
from words_count import word_count

In [21]:
%timeit x = wordfreq(words_1k)
%timeit x = word_count(words_1k)

177 µs ± 1.35 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
31.1 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [22]:
%timeit x = wordfreq(words_10k)
%timeit x = word_count(words_10k)

1.84 ms ± 5.09 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
3.16 s ± 35.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [23]:
%timeit x = word_count(words_100k[:20000])

12.5 s ± 680 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [36]:
%%file wordfreq.py
"""Program to compute frequency of words in the given file.

USAGE: python wordfreq.py words.txt
"""
import sys

def read_words(filename):
    """Reads and returns all words in the given filename as a list.
    """
    return open(filename).read().split()

def wordfreq(words):
    """Takes a list of words as argument and computes 
    frequency of each unique word in those words as a dictionary.
    
        >>> wordfreq([])
        {}
        >>> wordfreq(['a', 'b', 'a'])
        {'a': 2, 'b': 1}
    """
    freq = {}
    for w in words:
#         if w in freq:
#             freq[w] = freq[w] + 1
#         else:
#             freq[w] = 1  
        freq[w] = freq.get(w, 0) + 1
    return freq

def print_freq(freq):
    """Prints frequency of words in a nice readable format.
    """    
    items = sorted(
        freq.items(), 
        key=lambda item: item[1], 
        reverse=True)
      
    for w, count in items:
        print(w, count)

def main():
    filename = sys.argv[1]
    words = read_words(filename)
    freq = wordfreq(words)
    print_freq(freq)
    
if __name__ == "__main__":
    main()

Overwriting wordfreq.py


In [37]:
!python wordfreq.py words.txt

five 5
four 4
three 3
two 2
zero 2
one 1
ten 1
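
As an aside, the standard library's `collections.Counter` implements the same counting idea as `wordfreq`. A minimal sketch, assuming the words are already in a list (the sample list below is made up for illustration):

```python
from collections import Counter

# Made-up sample data for this sketch
words = ["five", "two", "five", "one", "two", "five"]

freq = Counter(words)   # counts each unique word, like wordfreq

# most_common() returns (word, count) pairs sorted by count, highest first
for w, count in freq.most_common():
    print(w, count)
```

Writing the loop by hand, as in `wordfreq.py`, is still useful for understanding how dictionaries work.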


In [38]:
sq = lambda x: x*x

In [39]:
sq(4)

16

In [40]:
(lambda x: x*x)(4)

16

### Dictionary Comprehensions

In [42]:
words = ["alice", "bob", "charlie", "dave"]

In [43]:
[len(w) for w in words]

[5, 3, 7, 4]

In [44]:
{w: len(w) for w in words}

{'alice': 5, 'bob': 3, 'charlie': 7, 'dave': 4}

For example, suppose we want to build a lookup table of file sizes.

In [45]:
import os

In [46]:
{f: os.path.getsize(f) for f in os.listdir(".")}

{'hello.py': 15,
 'empty.txt': 0,
 'working-with-files.html': 576494,
 'a.csv': 36,
 'day4.html': 600656,
 'wordfreq.py': 1117,
 'readinput.py': 55,
 'Untitled.html': 575899,
 'sum_of_squares.py': 45,
 'b.csv': 86,
 'wc.py': 445,
 'Untitled1.ipynb': 589,
 'numbers.txt': 34,
 'index.html': 577240,
 'ka.txt': 16,
 'python.png': 11155,
 'test_sq2.py': 223,
 '.pytest_cache': 192,
 'Untitled.ipynb': 1128,
 'Makefile': 495,
 'cat.py': 32,
 'a.py': 44,
 'sort.py': 92,
 'sq.pyc': 643,
 'day2.html': 834826,
 'three.txt': 14,
 'day3.html': 836631,
 'Untitled1.html': 574876,
 'sq2.py': 91,
 'sq.py': 323,
 'Untitled2.ipynb': 72,
 'mymodule4.py': 88,
 '__pycache__': 416,
 'ls.py': 73,
 'echo2.py': 54,
 'b.txt': 6,
 'push': 0,
 'day2.ipynb': 82836,
 '2bytes.data': 2,
 'Untitled2.html': 574343,
 'notes': 416,
 'mymodule3.py': 203,
 'a.txt': 8,
 'mymodule2.py': 71,
 'lastword.py': 215,
 'day1.html': 799312,
 'day4.ipynb': 9375,
 'args.py': 27,
 'files': 128,
 'mymodule.py': 102,
 '.ipynb_checkpoints':

**Problem:** Write a function `invertdict` to interchange the keys and values in a dictionary. For simplicity, assume the values are also unique.

```
>>> invertdict({"x": 1, "y": 2, "z": 3})
{1: "x", 2: "y", 3: "z"}
```

Write this in a file `invertdict.py` and add some test cases to verify it.

In [47]:
%%file invertdict.py

def invertdict(d):
    pass

def test_empty():
    assert invertdict({}) == {}

Writing invertdict.py


In [49]:
!py.test invertdict.py

platform darwin -- Python 3.10.2, pytest-7.0.0, pluggy-1.0.0
rootdir: /Users/anand/trainings/2022/zeomega-python
plugins: anyio-3.5.0
collected 1 item

invertdict.py F                                                          [100%]

__________________________________ test_empty __________________________________

    def test_empty():
>       assert invertdict({}) == {}
E       assert None == {}
E        +  where None = invertdict({})

invertdict.py:6: AssertionError
FAILED invertdict.py::test_empty - assert None == {}


## Sets

In [50]:
x = {1, 2, 3}

In [51]:
{1, 2, 3, 1}

{1, 2, 3}

A set always holds unique values.

How do we find the unique elements in a list?

In [56]:
names = ["a", "b", "c", "a"]

In [60]:
def unique(values):
    d = []
    for v in values:
        if v not in d:
            d.append(v)
    return d

In [61]:
unique(names)

['a', 'b', 'c']

In [62]:
unique(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [64]:
%timeit unique(range(100))

89.5 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [67]:
%timeit unique(range(1000))

7.77 ms ± 52 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [68]:
%timeit unique(range(10000))

764 ms ± 5.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [71]:
def unique(values):
    d = set()
    for v in values:
        if v not in d:
            d.add(v)
    return d

In [72]:
%timeit unique(range(10000))

1.31 ms ± 21.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [73]:
%timeit unique(range(100000))

14.8 ms ± 283 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [76]:
def unique(values):
    return list(set(values))

In [77]:
unique(['a', 'b', 'c', 'a'])

['a', 'b', 'c']
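
Note that a set does not remember insertion order, so `list(set(values))` may return the elements in any order. If the original order matters, one possible variation (not part of the session above) relies on the fact that dictionaries preserve insertion order in Python 3.7 and later. The name `unique_ordered` is an assumption made for this sketch:

```python
def unique_ordered(values):
    """Return the unique elements of values, keeping first-seen order.

    unique_ordered is a hypothetical name used only for this sketch.
    """
    # dict keys are unique and remember insertion order (Python 3.7+)
    return list(dict.fromkeys(values))

print(unique_ordered(["b", "a", "c", "a", "b"]))
```

Like the set-based version, this needs hashable elements and avoids rescanning a list for every value.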

## Understanding Python execution environment

In [78]:
x = 1

In [85]:
%%file variables.py

x = 1
name = "python"

g = globals()
print(g)

g['x'] = 2
g['z'] = 10
print(x)
print(z)

Overwriting variables.py


In [86]:
!python variables.py

{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x1075113c0>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '/Users/anand/trainings/2022/zeomega-python/variables.py', '__cached__': None, 'x': 1, 'name': 'python', 'g': {...}}
2
10


In [87]:
import mymodule

BEGIN mymodule
7
END mymodule


In [88]:
mymodule.x

2

In [89]:
mymodule.add(3, 4)

7

In [91]:
mymodule.__dict__.keys()

dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__file__', '__cached__', '__builtins__', 'x', 'add'])

In [92]:
mymodule.__dict__['x']

2

In [93]:
mymodule.__dict__['x'] = 22

In [94]:
mymodule.x

22

## Writing Beautiful Code

<https://speakerdeck.com/anandology/writing-beautiful-code-europython-2017>

## Classes

In [95]:
name = "Hello"

In [96]:
name.upper()

'HELLO'

In [97]:
type(name)

str

Here, `name` is an object of type `str`, or `name` is an instance of `str`.

Classes allow us to pack the behavior along with the object.

In [99]:
dir(name)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',


In [100]:
import datetime

In [101]:
d = datetime.date(2020, 10, 20)

In [102]:
d

datetime.date(2020, 10, 20)

In [104]:
d.strftime("%Y-%m-%d")

'2020-10-20'

A date is made of three properties: the year, the month and the day.
Instead of keeping track of these three values separately, we get one unified date object.

In [107]:
def print_date(y, m, d):
    print(f"{y}-{m}-{d}")
    
print_date(2020, 10, 20)

2020-10-20


In [108]:
def print_date(date):
    print(f"{date.year}-{date.month}-{date.day}")
    
date = datetime.date(2020, 10, 20)
print_date(date)    

2020-10-20


### Syntax of writing classes

In [127]:
class Point:
    def __init__(self, x, y):
        print("Point.__init__")
        self.x = x
        self.y = y
        
    def getx(self):
        return self.x

In [128]:
p1 = Point(3, 4)

Point.__init__


In [129]:
p1.x

3

In [130]:
p1.y

4

Calling `Point(3, 4)` creates a new instance of `Point`. To do this, Python performs the following steps:

* creates an empty object of type `Point`
* initializes it by calling the `__init__` method
* returns the object
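
The three steps above can be spelled out by hand. This is only a rough sketch of what the call does, not how you would normally write it. `DemoPoint` and `make_point` are hypothetical names used so that the `Point` class from the session is left untouched:

```python
class DemoPoint:
    """Stand-in for Point, defined here so the sketch runs on its own."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

def make_point(x, y):
    # Step 1: create an empty object of type DemoPoint
    p = DemoPoint.__new__(DemoPoint)
    # Step 2: initialize it by calling the __init__ method
    p.__init__(x, y)
    # Step 3: return the object
    return p

dp = make_point(3, 4)
print(type(dp).__name__, dp.x, dp.y)
```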

In [131]:
p1.__dict__

{'x': 3, 'y': 4}

In [132]:
p1.x

3

In [133]:
p1.__dict__['x'] # p1.x

3

In [134]:
p1.__dict__['x'] = 33   # p1.x = 33

In [135]:
p1.x

33

In [136]:
p1.getx()

33

In [137]:
Point

__main__.Point

In [138]:
Point.__dict__

mappingproxy({'__module__': '__main__',
              '__init__': <function __main__.Point.__init__(self, x, y)>,
              'getx': <function __main__.Point.getx(self)>,
              '__dict__': <attribute '__dict__' of 'Point' objects>,
              '__weakref__': <attribute '__weakref__' of 'Point' objects>,
              '__doc__': None})

In [139]:
Point.getx

<function __main__.Point.getx(self)>

In [141]:
Point.getx(p1) # p1.getx()

33

In [142]:
def gety(p):
    return p.y

In [143]:
gety(p1)

4

In [144]:
Point.gety = gety

In [145]:
Point.gety(p1)

4

In [147]:
p1.gety() # Point.gety(p1)

4

In [148]:
class Point2:
    def __init__(p, x, y):
        print("Point.__init__")
        p.x = x
        p.y = y
        
    def getx(p):
        return p.x

In [149]:
p2 = Point2(10, 20)

Point.__init__


In [150]:
p2.getx()

10

The first argument to a method is named `self` by convention. It is possible to call it something else, but that is not recommended.

**Q:** Can we have another init function in a class?

In [152]:
class Point3:
    def init(self, x, y):
        self.x = x
        self.y = y
        
    def getx(self):
        return self.x

In [153]:
p3 = Point3() # no __init__ defined, so can't pass any arguments

In [154]:
p3.init(10, 20)

In [155]:
p3.x

10

In [156]:
p3.y

20

In [157]:
p1

<__main__.Point at 0x113dce350>

#### The `__str__` and `__repr__` methods

The `__str__` method is used to convert an object into a string.

In [162]:
class Point:
    def __init__(self, x, y):
        print("Point.__init__")
        self.x = x
        self.y = y
        
    def __str__(self):
        return f"({self.x}, {self.y})"
        
    def getx(self):
        return self.x

In [163]:
p = Point(4, 5)

Point.__init__


In [164]:
print(p)

(4, 5)


In [165]:
print("I've created a point", p)

I've created a point (4, 5)


The need for `repr`:

In [166]:
print(1, "1")

1 1


In [168]:
[1, "1"] # what is printed now is repr of each element

[1, '1']

The `repr` may be the same as the `str`, or it may differ. Typically, the repr includes hints about the type of the value.

In [170]:
print(p)

(4, 5)


In [171]:
[1, (2, 3), p]

[1, (2, 3), <__main__.Point at 0x113ffdb40>]

In [181]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def __str__(self):
        return f"({self.x}, {self.y})"

    def __repr__(self):
        return f"<Point({self.x}, {self.y})>"
#         return f"Point({self.x}, {self.y})"
        
    def getx(self):
        return self.x

In [182]:
p = Point(4, 5)

In [183]:
[1, (2, 3), p]

[1, (2, 3), <Point(4, 5)>]

If we define only `__repr__` and skip `__str__`, Python uses `__repr__` for `str` as well.
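
A small sketch of that fallback, using a hypothetical `Pair` class (not part of the session) that defines only `__repr__`:

```python
class Pair:
    """Hypothetical class for this sketch; it defines __repr__ but no __str__."""
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        return f"<Pair({self.a}, {self.b})>"

pair = Pair(1, 2)
print(str(pair))    # no __str__ defined, so this falls back to __repr__
print(repr(pair))
print([pair])       # containers show the repr of their elements
```

All three lines show the `<Pair(1, 2)>` form, the last one wrapped in list brackets.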

### More methods

In [191]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def add(self, p):
        """Adds a point to another.
        
            >>> p1 = Point(1, 2)
            >>> p2 = Point(10, 20)
            >>> p1.add(p2)
            <Point(11, 22)>
        """
        x = self.x + p.x
        y = self.y + p.y
        return Point(x, y)
        
    def __str__(self):
        return f"({self.x}, {self.y})"

    def __repr__(self):
        return f"<Point({self.x}, {self.y})>"


In [192]:
p1 = Point(1, 2)
p2 = Point(3, 4)
p3 = Point(5, 6)

How to add all these three points?

In [193]:
p12 = p1.add(p2)

In [194]:
p12

<Point(4, 6)>

In [195]:
p1.add(p2).add(p3)

<Point(9, 12)>

**Problem:** Add a method `double` to the `Point` class. It should return a new point with both x and y coordinates doubled.

```
>>> p = Point(2, 3)
>>> p2 = p.double()
>>> p2
<Point(4, 6)>
>>> p.double().double()
<Point(8, 12)>
```

### Example: Timer

In [196]:
import time

In [197]:
time.time()

1644489361.8273559

In [202]:
def unique(values):
    d = []
    for v in values:
        if v not in d:
            d.append(v)
    return d

In [205]:
t0 = time.time()
unique(range(10000))
t1 = time.time()
dt = t1-t0
print(f"Time taken: {dt:0.3f} seconds")

Time taken: 0.764 seconds


In [206]:
t0 = time.time()
unique(range(20000))
t1 = time.time()
dt = t1-t0
print(f"Time taken: {dt:0.3f} seconds")

Time taken: 3.102 seconds


One way to hide the details of recording the times and computing the difference is to move them into a class.

```
t = Timer()
t.start()
do_something()
t.stop()
t.display()
```

In [213]:
import time

class Timer:
    def start(self):
        self.t0 = time.time()

    def stop(self):
        self.t1 = time.time()
    
    def get_time_taken(self):
        return self.t1-self.t0
    
    def display(self):
        dt = self.get_time_taken()
        print(f"Time taken: {dt:0.3f} seconds")

In [212]:
t = Timer()
t.start()
unique(range(20000))
t.stop()
t.display()

Time taken: 3.113 seconds


## Class Inheritance

In [214]:
%%file three.txt
One
Two
Three

Overwriting three.txt


In [218]:
class Formatter:
    def format_text(self, text):
        """Method to format text.
        
        Subclasses can override this method to specify
        how to format given text.
        """
        return text

    def format_file(self, filename):
        text = open(filename).read()
        return self.format_text(text)

In [219]:
f = Formatter()
f.format_text("Hello")

'Hello'

In [220]:
print(f.format_file("three.txt"))

One
Two
Three



In [221]:
class UpperCaseFormatter(Formatter):
    def format_text(self, text):
        return text.upper()

In [222]:
f = UpperCaseFormatter()

In [224]:
print(f.format_text("Hello"))

HELLO


In [225]:
print(f.format_file("three.txt"))

ONE
TWO
THREE



**Problem:** Implement `LowerCaseFormatter`.
    
```
>>> f = LowerCaseFormatter()
>>> f.format_text("Hello")
'hello'
>>> print(f.format_file("three.txt"))
one
two
three

```

In [227]:
class LineFormatter(Formatter):
    def format_text(self, text):
        lines = text.splitlines()
        lines = [self.format_line(line) for line in lines]
        return "\n".join(lines)
    
    def format_line(self, line):
        """Format a line.
        
        The subclasses can specify how to format a line.
        """
        return line

In [231]:
class PrefixFormatter(LineFormatter):
    def __init__(self, prefix):
        self.prefix = prefix
        
    def format_line(self, line):
        return self.prefix + line

In [232]:
f = PrefixFormatter("[INFO] ")

In [233]:
f.format_line("Hello")

'[INFO] Hello'

In [234]:
f.format_text("Hello")

'[INFO] Hello'

In [236]:
print(f.format_text("1\n2\n3"))

[INFO] 1
[INFO] 2
[INFO] 3


In [237]:
print(f.format_file("three.txt"))

[INFO] One
[INFO] Two
[INFO] Three


**Problem:** Implement a `LineNumberFormatter` class.


```
>>> f = LineNumberFormatter()
>>> print(f.format_line("a"))
1: a

>>> f = LineNumberFormatter()
>>> print(f.format_text("a\nb\nc"))
1: a
2: b
3: c

>>> f = LineNumberFormatter()
>>> print(f.format_file("three.txt"))
1: One
2: Two
3: Three
```

### Exception Handling

In [238]:
no_such_variable

NameError: name 'no_such_variable' is not defined

In [239]:
int("bad-number")

ValueError: invalid literal for int() with base 10: 'bad-number'

In [240]:
open("no-file")

FileNotFoundError: [Errno 2] No such file or directory: 'no-file'

In [244]:
def toint(strvalue):
    try:
        return int(strvalue)
    except ValueError:
        print("Bad number:", strvalue)
        return 0

In [245]:
toint('50')

50

In [246]:
toint('5a')

Bad number: 5a


0

In [247]:
1 + toint('5a')

Bad number: 5a


1

In [248]:
def f():
    # comment 1
    # comment 2    
    g()
    
def g():
    h()
    
def h():
    open("bad-file")

In [250]:
f()

FileNotFoundError: [Errno 2] No such file or directory: 'bad-file'

In [253]:
value = 'a'
try:
    int(value)
except ValueError:
    print("Bad value", value)
else:
    print("in else")
finally:
    print("in finally")    

Bad value a
in finally
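
In the run above, the `else` branch is skipped because `int` raised an exception. To see both paths side by side, here is a small sketch that wraps the same statement in a hypothetical helper, `check`, and records which branches ran instead of printing from each one:

```python
def check(value):
    """Hypothetical helper: returns the list of branches that ran."""
    steps = []
    try:
        int(value)
    except ValueError:
        steps.append("except")   # runs only when int() fails
    else:
        steps.append("else")     # runs only when no exception was raised
    finally:
        steps.append("finally")  # runs in both cases
    return steps

print(check("a"))
print(check("10"))
```

The first call records `except` then `finally`; the second records `else` then `finally`.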


#### Raising Exceptions

In [258]:
class BankError(Exception):
    pass

def deposit(amount):
    if amount % 100 != 0:
        raise BankError("The deposit amount must be in multiples of 100")
    print("deposited!")

In [259]:
deposit(200)

deposited!


In [260]:
deposit(42)

BankError: The deposit amount must be in multiples of 100
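
A caller can handle this error like any built-in exception. The following sketch repeats the definitions so that it runs on its own. `safe_deposit` is a hypothetical wrapper added for illustration and was not part of the session:

```python
class BankError(Exception):
    pass

def deposit(amount):
    if amount % 100 != 0:
        raise BankError("The deposit amount must be in multiples of 100")
    print("deposited!")

def safe_deposit(amount):
    """Hypothetical wrapper: returns True on success, False on BankError."""
    try:
        deposit(amount)
        return True
    except BankError as e:
        # printing e shows the message passed when the exception was raised
        print("Deposit failed:", e)
        return False

safe_deposit(200)
safe_deposit(42)
```

Catching the specific `BankError` class, rather than a bare `except`, lets unrelated errors such as a `TypeError` still propagate to the caller.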