# Some notes about Python


The original notes come from here:  
https://github.com/chiphuyen/python-is-cool

I modified them a bit, added some comments, and added notes about generators, iterators, and related topics.



https://github.com/sahands/python-by-example/blob/master/python-by-example.rst#flattening-lists

# 1) Lambda functions, Map, Reduce, Filter

## Lambda


In [1]:
def square_fn(x):
    return x*x

square_ln = lambda x: x*x

In [3]:
for i in range(0, 10):
    assert square_fn(i) == square_ln(i)

In [4]:
x = 'Helo'
assert x=='Hello'

AssertionError: 

### Note
- The two functions `square_fn` and `square_ln` behave identically. Because a lambda can be declared quickly and inline, it is well suited to callbacks and to cases where a function is passed as an argument to another function. Lambdas are especially useful together with functions like `map`, `filter`, and `reduce`.
- `assert` is used to test and debug code. It checks whether a condition evaluates to true; if it does not, the program raises an `AssertionError`.
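
As a small illustration of the "function passed as an argument" case mentioned in the note, here is a sketch that uses a lambda as the `key` callback of `sorted`. The `records` data and the `score_of` helper are made up for this example.

```python
# Hypothetical sample data: (name, score) pairs.
records = [("ann", 0.72), ("bob", 0.91), ("cy", 0.65)]

# A lambda passed as the `key` callback: sort by the second field, descending.
by_score = sorted(records, key=lambda pair: pair[1], reverse=True)

# The same callback written as a named function gives the same ordering.
def score_of(pair):
    return pair[1]

assert by_score == sorted(records, key=score_of, reverse=True)
print(by_score)
```

The `assert` here plays the same role as in the cells above: it stays silent when the two orderings agree.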

In [12]:
nums = [1/3, 333/7, 2323/2230, 40/34, 2/3]


In [13]:
square_nums_1 = map(square_fn, nums)
##square_nums_2 = map(lambda x: x*x, nums)

In [14]:
list(square_nums_1)

[0.1111111111111111,
 2263.0408163265306,
 1.0851472983570953,
 1.384083044982699,
 0.4444444444444444]

In [15]:
print(list(square_nums_1))

[]


In [17]:
square_nums_2 = list(map(lambda x: x*x, nums))

In [18]:
print(square_nums_2)

[0.1111111111111111, 2263.0408163265306, 1.0851472983570953, 1.384083044982699, 0.4444444444444444]


In [20]:
print(type(square_nums_2))

<class 'list'>


### Note
- Note that the objects returned by `map` and `filter` are iterators, which means that their values aren't stored but generated as needed. After you've called `list(square_nums_1)`, `square_nums_1` is exhausted, which is why the second call in cell 15 prints an empty list. If you want to keep all the elements, convert the iterator to a list once and keep the result, as in cell 17: `square_nums_2` is a list, not an iterator.

### Filter(fn, iterable)
filter(fn, iterable) works the same way as map, except that fn returns a boolean value and filter returns all the elements of the iterable for which the fn returns True.

In [23]:
errors = [0,1, 0.4, 0.05, 0.9, 0.5, 0.7]
bad_preds = filter(lambda x : x > 0.5, errors)
print(list(bad_preds))

[1, 0.9, 0.7]


### Reduce(fn, iterable, initializer)
`reduce(fn, iterable, initializer)` is used when we want to iteratively apply an operator to all elements in a list. For example, if we want to calculate the product of all elements in a list:

In [24]:
product = 1
for num in nums:
    product *= num
print(product)

12.95564683272412


In [25]:
from functools import reduce
product = reduce(lambda x,y : x*y, nums)
print(product)

12.95564683272412
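
The example above leaves out the optional `initializer` argument. The sketch below, using a made-up list of small integers, shows what it changes: the fold starts from the initializer, and an empty iterable no longer raises an error.

```python
from functools import reduce

nums = [2, 3, 4]

# With an initializer, the fold starts from that value: ((10 * 2) * 3) * 4.
product_from_ten = reduce(lambda x, y: x * y, nums, 10)

# On an empty iterable the initializer is returned as-is ...
empty_product = reduce(lambda x, y: x * y, [], 1)

# ... while omitting it raises TypeError, because there is nothing to start from.
try:
    reduce(lambda x, y: x * y, [])
    raised = False
except TypeError:
    raised = True

print(product_from_ten, empty_product, raised)  # 240 1 True
```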


### Note on the performance of lambda functions

- Lambda functions are meant for one-time use. Each time the expression `lambda x: dosomething(x)` is evaluated, a new function object has to be created, which hurts performance if the expression is evaluated many times (e.g. when it sits inside a loop or inside another function that is called repeatedly).

- When you assign a name to the lambda function as in `fn = lambda x: dosomething(x)`, its performance is slightly slower than the same function defined using `def`, but the difference is negligible. See [here](https://stackoverflow.com/questions/26540885/lambda-is-slower-than-function-call-in-python-why).

- Even though I find lambdas cool, I personally recommend using named functions when you can, for the sake of clarity.

# 2) List

## Flatten list

In [26]:
list_of_list = [[3,4], [3,2], [1, 6]]
flatten_list = sum(list_of_list, [])
print(flatten_list)

[3, 4, 3, 2, 1, 6]


In [29]:
list_of_text = [['I', 'you', 'me'], ['here', 'you'], [['test']]]
flatten_list_text = sum(list_of_text, [])
print(flatten_list_text)

['I', 'you', 'me', 'here', 'you', ['test']]


## List slices with step (a[start:end:step])

In [36]:
x = list(range(10))
print(x)
print(x[::-1])

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


In [34]:
print(x[1:5:2])

[1, 3]


The syntax `[x:y:z]` means "take every `z`th element of a list from index `x` up to, but not including, index `y`". When `z` is negative, the list is traversed backwards. When `x` isn't specified, it defaults to the first element of the list in the direction you are traversing it. When `y` isn't specified, the slice runs to the end of the list in that direction. So if we want to take every second element of a list, we use `[::2]`.

In [38]:
print(x[::2])
print(x[-2::-2])

[0, 2, 4, 6, 8]
[8, 6, 4, 2, 0]
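
To see how the omitted `start` and `end` values are filled in depending on the direction, one option is the built-in `slice` object and its `indices()` method. This is only a sketch for inspecting the defaults; the variable names are chosen for this example.

```python
x = list(range(10))

# slice(start, stop, step) is the object form of x[start:stop:step].
forward = slice(None, None, 2)
backward = slice(None, None, -2)

# indices(len) resolves the omitted bounds for a sequence of that length.
print(forward.indices(len(x)))   # (0, 10, 2)
print(backward.indices(len(x)))  # (9, -1, -2)

assert x[forward] == x[::2]
assert x[backward] == x[::-2]
```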


In [40]:
nested_lists = [[1,2], [3,4], [[5,6, [7,8, [9, 10]]]]]
flatten = lambda x: [y for l in x for y in flatten(l)] if type(x) is list else [x]
flatten(nested_lists)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [43]:
import itertools
x = [[1, 2], [3, 4], [5, 6, [7,8]]]
list(itertools.chain.from_iterable(x))

[1, 2, 3, 4, 5, 6, [7, 8]]

## Note on lists in Python
- A list is a mutable object, which means that it can be changed after it is created.
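
A practical consequence of mutability is that two names can refer to the same list, so a change made through one name is visible through the other. A minimal sketch:

```python
a = [1, 2, 3]
b = a            # b is another name for the same list object
c = a.copy()     # c is a new list with the same elements (a shallow copy)

b.append(4)      # mutates the shared object in place

print(a)  # [1, 2, 3, 4]
print(c)  # [1, 2, 3]
```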

# 3) Decorator to time your function


It's often useful to know how long it takes a function to run, e.g. when you need to compare the performance of two algorithms that do the same thing. One naive way is to call `time.time()` at the beginning and end of each function and print out the difference.


For example, let's compare two algorithms that calculate the n-th Fibonacci number: one uses memoization and one doesn't.

In [62]:
def fib_helper(n):
    if n < 2:
        return n
    return fib_helper(n-1) + fib_helper(n-2)
    
def fib(n):
    """ fib is a wrapper function so that later we can change its behavior
    at the top level without affecting the behavior at every recursion step.
    """
    return fib_helper(n)
    

In [66]:
def fib_memo_helper(m, dict_computed):
    if m in dict_computed:
        return dict_computed[m]
    else:
        dict_computed[m] = fib_memo_helper(m-1, dict_computed) + fib_memo_helper(m-2,dict_computed)
        return dict_computed[m]
    
def fib_memo(m):
    return fib_memo_helper(m, {0:0, 1:1})
    

Let's make sure that fib and fib_memo are functionally equivalent.

In [68]:
for i in range(20):
    assert fib(i) == fib_memo(i)

In [74]:
import time

start = time.time()
fib(30)

print(f'Without memoization, it takes {time.time() - start:7f} seconds.')

start = time.time()
fib_memo(30)
print(f"With memoization, it takes {time.time() - start:.7f} seconds.")

Without memoization, it takes 0.340336 seconds.
With memoization, it takes 0.0001211 seconds.


If you want to time multiple functions, it can be a drag having to write the same code over and over again. It'd be nice to have a way to specify how to change any function in the same way. In this case, the change would be to call `time.time()` at the beginning and the end of each function and print out the time difference.


This is exactly what decorators do. They allow programmers to change the behavior of a function or class. Here's an example to create a decorator `timeit`.

### Note on `*args` and `**kwargs` in Python

Ref: https://viblo.asia/p/python-args-va-kwargs-gDVK2pdnlLj

There are two ways to pass arguments to a function in Python.
- Positional (non-keyword) arguments, e.g. `def f(a, b): return a - b`. The values are matched to the parameters by position, so `f(0, 1)` is different from `f(1, 0)`.

- Keyword arguments, e.g. `def f(a=0, b=1): return a - b`. Here the parameters are given default values in the definition, and the call names each argument. This lets us pass the arguments in any order: `f(a=0, b=5)` and `f(b=5, a=0)` give the same result.


`*args` and `**kwargs` are mostly used in function definitions. They allow you to pass an unspecified number of arguments to a function.

- `*args` is used to send a non-keyworded, variable-length argument list to the function. For example, suppose we want to calculate the sum of the first n terms of a series, but n is not fixed. We could pass a list or tuple to the function as a single argument. To simplify the call, we can use `*args` instead.

- `**kwargs` allows you to pass a keyworded, variable-length set of arguments to a function. Use `**kwargs` when you want to handle named arguments in a function. If the number of arguments and their names are unknown, we could pass a dictionary as a parameter, e.g. `def foo(a_dict): for key, value in a_dict.items(): print(key, value)`. In this case, we can replace the dictionary with `**kwargs`: `def foo(**kwargs): for key, value in kwargs.items(): print(key, value)`. When we call the function, we just write `foo(a=1, b=2)`.

Let's take some examples of `*args` and `**kwargs`:

In [78]:
def sum_series(list_numbers):
    result = 0
    for ele in list_numbers:
        result += ele
    return result


def sum_series_2(*args):
    result = 0
    for x in args:
        result += x
    return result

# Note: here args is a tuple, not a list. 

In [82]:
print(sum_series([1,2,3]))
print(sum_series_2(1,2,3,4, 5))

6
15


In [81]:
list_1 = [1,2,3]
list_2 = [3,4,5]
list_3 = [6,7,8]
list_all = [*list_1, *list_2, list_3]
print(list_all)

[1, 2, 3, 3, 4, 5, [6, 7, 8]]
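
The cells above cover `*args` and list unpacking with `*`. For completeness, here is a comparable sketch for `**kwargs` and dictionary unpacking with `**`. The function `describe` and the two settings dictionaries are hypothetical and exist only for this illustration.

```python
def describe(**kwargs):
    # kwargs is a plain dict mapping each argument name to its value.
    return ", ".join(f"{key}={value}" for key, value in kwargs.items())

# Hypothetical settings, used only for this example.
defaults = {"hidden_size": 100, "num_layers": 3}
overrides = {"num_layers": 5}

# ** in a call unpacks a dict into keyword arguments.
print(describe(**defaults))   # hidden_size=100, num_layers=3

# ** inside a dict literal merges dicts; later keys win.
merged = {**defaults, **overrides}
print(merged)                 # {'hidden_size': 100, 'num_layers': 5}
```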


### Back to our topic: decorators

In [75]:
def timeit(fn):
    #*args and **kwargs are to support positional and named arguments of fn
    def get_time(*arg, **kwargs):
        start = time.time()
        output = fn(*arg, **kwargs)
        print(f"Time taken in {fn.__name__}: {time.time() - start:.7f}")
        return output # make sure that the decorator returns the output of fn
    return get_time
    

In [76]:
@timeit
def fib(n):
    return fib_helper(n)


@timeit
def fib_memo(n):
    return fib_memo_helper(n, {0:0, 1:1})

In [77]:
fib(30)
fib_memo(30)

Time taken in fib: 0.3420181
Time taken in fib_memo: 0.0000172


832040
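
One side effect of a wrapper like `get_time` is that the decorated function reports the wrapper's `__name__` and docstring instead of its own. A common remedy is `functools.wraps`. The sketch below is a variant of `timeit`, not the version used in these notes: it adds `functools.wraps`, swaps in `time.perf_counter()` as the clock (my choice here), and uses a made-up `add` function.

```python
import functools
import time

def timeit(fn):
    @functools.wraps(fn)  # copy fn's __name__, __doc__, etc. onto the wrapper
    def get_time(*args, **kwargs):
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        print(f"Time taken in {fn.__name__}: {time.perf_counter() - start:.7f}")
        return output
    return get_time

@timeit
def add(a, b):
    """Return a + b (a hypothetical function used only for this example)."""
    return a + b

result = add(1, 2)
print(add.__name__)  # add
```

Without the `functools.wraps` line, `add.__name__` would be `get_time`.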

# 4) Caching with @functools.lru_cache

Memoization is a form of cache: we cache the previously calculated Fibonacci numbers so that we don't have to calculate them again.

Caching is such an important technique that Python provides a built-in decorator to give your function caching capability. If you want `fib_helper` to reuse the previously calculated Fibonacci numbers, you can just add the decorator `lru_cache` from `functools`. LRU stands for "least recently used". For more information on the cache, see the documentation of `functools.lru_cache`.

In [84]:
import functools

@functools.lru_cache()
def fib_helper(n):
    if n<2:
        return n
    return fib_helper(n-1) + fib_helper(n-2)

@timeit
def fib(n):
    return fib_helper(n)
    

In [87]:
fib(30)
fib_memo(30)

Time taken in fib: 0.0000029
Time taken in fib_memo: 0.0000212


832040
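
To check that the cache is actually being used, functions wrapped with `lru_cache` expose `cache_info()` and `cache_clear()`. A minimal sketch, using a separate function name `fib_cached` (chosen for this example) so it does not clash with the cells above:

```python
import functools

@functools.lru_cache(maxsize=None)  # maxsize=None: unbounded cache, nothing is evicted
def fib_cached(n):
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

print(fib_cached(30))           # 832040
info = fib_cached.cache_info()  # named tuple: hits, misses, maxsize, currsize
print(info.hits, info.misses, info.currsize)

fib_cached.cache_clear()        # drop every cached result
print(fib_cached.cache_info().currsize)  # 0
```

Each distinct argument is computed once (a miss); every repeated request is served from the cache (a hit).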

# 5) Classes and some special methods

In Python, magic methods are prefixed and suffixed with the double underscore `__`, also known as dunder. The most well-known magic method is probably `__init__`.


In [88]:
class Node:
    """ A struct to denote the node of a binary tree.
    It contains a value and pointers to left and right children.
    """
    
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        

When we try to print out a Node object, the result is not very informative.

In [91]:
node = Node(10)
print(node)

<__main__.Node object at 0x7fa17ed088b0>


Ideally, when a user prints out a node, we want to print out the node's value and the values of its children, if it has any. To do so, we use the magic method `__repr__`, which must return a string.

In [46]:
class Node:
    """ A struct to denote the node of a binary tree.
    It contains a value and pointers to left and right children.
    """
    def __init__(self, value, left = None, right = None):
        self.value = value
        self.left = left
        self.right = right
        
        
    def __repr__(self):
        strings = [f'value: {self.value}']
        strings.append(f'left : {self.left.value}' if self.left else 'left: None')
        strings.append(f'right: {self.right.value}' if self.right else 'right: None')
        return ', '.join(strings)
    
left = Node(3)
root = Node(4, left)
print(root)

value: 4, left : 3, right: None


In [47]:
print(left)

value: 3, left: None, right: None


We'd also like to compare two nodes by comparing their values. To do so, we overload the operator `==` with `__eq__`, `<` with `__lt__`, and `>=` with `__ge__`.

In [95]:
class Node:
    """ A struct to denote the node of a binary tree.
    It contains a value and pointers to left and right children.
    """
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
    
    def __eq__(self, other):
        return self.value == other.value
    
    def __lt__(self, other):
        return self.value < other.value
    
    def __ge__(self, other):
        return self.value >= other.value


left = Node(4)
root = Node(5, left)
print(left == root)
print(left < root)
print(left >= root)

False
True
False
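
Writing every comparison method by hand gets repetitive. As an alternative sketch (not what the cell above does), `functools.total_ordering` can derive the remaining comparison operators from `__eq__` and a single ordering method such as `__lt__`:

```python
import functools

@functools.total_ordering
class Node:
    """Same Node idea as above, reduced to the comparison logic."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        return self.value < other.value

left = Node(4)
root = Node(5, left)
print(left <= root)  # True  (derived from __lt__ and __eq__)
print(left > root)   # False
print(left >= root)  # False
```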


For a comprehensive list of supported magic methods, see [here](https://www.tutorialsteacher.com/python/magic-methods-in-python) or the official Python documentation [here](https://docs.python.org/3/reference/datamodel.html#special-method-names) (slightly harder to read).

Some of the methods that I highly recommend:

- `__len__`: to overload the `len()` function.
- `__str__`: to overload the `str()` function.
- `__iter__`: if you want your objects to be iterable, e.g. usable in a `for` loop. If the class also defines `__next__`, you can call `next()` on your object as well.
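
A short sketch of these three methods on one class. `Playlist` is a hypothetical container invented for this example.

```python
class Playlist:
    """Hypothetical container used only to illustrate the three methods."""

    def __init__(self, titles):
        self.titles = list(titles)

    def __len__(self):
        return len(self.titles)

    def __str__(self):
        return f"Playlist with {len(self)} titles"

    def __iter__(self):
        # Delegate to the list's iterator; no __next__ needed on this class.
        return iter(self.titles)

songs = Playlist(["a", "b", "c"])
print(len(songs))   # 3
print(str(songs))   # Playlist with 3 titles
print(list(songs))  # ['a', 'b', 'c']
```
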

For classes like Node where we know for sure all the attributes they can support (in the case of Node, they are `value`, `left`, and `right`), we might want to use `__slots__` to denote those values for both performance boost and memory saving. For a comprehensive understanding of pros and cons of `__slots__`, see this [absolutely amazing answer by Aaron Hall on StackOverflow](https://stackoverflow.com/a/28059785/5029595).


(In other words, for classes whose attributes are fixed, like Node, we can use `__slots__` to declare those attributes. This improves performance and saves memory.)

The special attribute `__slots__` allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:
- faster attribute access.
- space savings in memory.


The space savings come from:
- storing value references in slots instead of `__dict__`;
- denying `__dict__` and `__weakref__` creation if parent classes deny them and you declare `__slots__`.
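
A minimal sketch of what declaring `__slots__` looks like in practice. `SlottedNode` is a hypothetical variant of `Node` written for this example; it shows that instances get no `__dict__` and reject attributes that were not declared.

```python
class SlottedNode:
    """Hypothetical variant of Node that declares its attributes up front."""

    __slots__ = ("value", "left", "right")

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

node = SlottedNode(10)
print(hasattr(node, "__dict__"))  # False

try:
    node.parent = None  # not listed in __slots__
except AttributeError as err:
    print("AttributeError:", err)
```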

# 6) Iterator and Generator
https://medium.com/swe-tieng-viet/hi%E1%BB%83u-s%C3%A2u-h%C6%A1n-v%E1%BB%81-generator-trong-python-611a6dee19b4

https://medium.com/swe-tieng-viet/iterator-vs-generator-trong-python-703ad12c73a7

https://realpython.com/python-while-loop/

https://realpython.com/introduction-to-python-generators/


- Iteration: the process of looping through the objects or items in a collection.

- Iterable: an object that can be iterated over.

- Iterator: the object that produces successive items or values from its associated iterable. In other words, an iterator in Python is an object that returns its data one element at a time. It must implement two special methods, `__next__()` and `__iter__()`, which together are called the iterator protocol.

- `iter()`: a built-in function used to obtain an iterator from an iterable.

- We can create our own iterator class by defining the methods `__iter__()` and `__next__()`. Note that the built-in class `range` is an iterable rather than an iterator: calling `iter()` on a `range` object returns an iterator that generates its values one after another.

In [123]:
x = ['a', 'b', 'c', 'd']
y = iter(x)
print(type(y))
print(next(y))

<class 'list_iterator'>
a
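
The same two built-ins are, roughly, what a `for` loop uses behind the scenes. The sketch below spells out that loop by hand; it is an approximation for illustration, not the interpreter's actual implementation.

```python
x = ['a', 'b', 'c', 'd']

# Roughly what `for item in x: collected.append(item)` does under the hood.
collected = []
iterator = iter(x)              # ask the iterable for an iterator
while True:
    try:
        item = next(iterator)   # fetch the next value
    except StopIteration:       # raised when the iterator is exhausted
        break
    collected.append(item)

print(collected)  # ['a', 'b', 'c', 'd']
```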


In [92]:
class PowerTwo:
    """Class to implement an iterator
    of powers of two: 2^n. We choose max_power for n, and the
    class creates an iterator that generates the numbers 2^0, 2^1, 2^2, ..., 2^max_power."""

    def __init__(self, max_power):
        self.max_power = max_power
        
    def __repr__(self):
        return f'value: {self.max_power}'

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n <= self.max_power:
            result = 2 ** self.n # 2^n
            self.n += 1
            return result
        else:
            raise StopIteration

In [93]:
u = PowerTwo(10)
print(u.__dict__)

{'max_power': 10}


In [89]:
v = iter(u)
print(v.__dict__)

{'max_power': 10, 'n': 0}


In [91]:
print(next(v))
print(next(v))
print(next(v))
print(next(v))

1
2
4
8


The function `dir(u)` lists all attributes and methods of the object `u`.

In [95]:
dir(u)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'max_power']

## Generator
Source
https://www.programiz.com/python-programming/generator
https://realpython.com/introduction-to-python-generators/

There are two types of generators: generator functions and generator expressions.

- Why generators? Building an iterator in Python takes a lot of work. We have to implement a class with `__iter__()` and `__next__()` methods, keep track of internal state, and raise `StopIteration` when there are no more values to return. This is both lengthy and counterintuitive. Generators come to the rescue in such situations.

- Python generators are a simple way of creating iterators. A generator function is a function that returns an object (an iterator) which we can iterate over, one value at a time.

- It's simple to create a generator in Python. It is as easy as defining a function, but with a `yield` statement instead of a `return` statement. If a function contains at least one `yield` statement, it becomes a generator function. Both `yield` and `return` will return some value from a function.

- The difference is that while a `return` statement terminates a function entirely, a `yield` statement pauses the function, saving all its state, and later continues from there on successive calls.



### Differences between a generator function and a normal function
- A generator function contains one or more `yield` statements.

- When called, it returns an object (an iterator) but does not start execution immediately.

- Methods like `__iter__()` and `__next__()` are implemented automatically, so we can iterate through the items using `next()`.

- Once the function yields, it is paused and control is transferred back to the caller.

- Local variables and their states are remembered between successive calls.

- Finally, when the function terminates, `StopIteration` is raised automatically on further calls.
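
To make the points above concrete, here is a sketch that rewrites the `PowerTwo` iterator class from the previous section as a generator function. The name `power_two` is chosen for this example.

```python
def power_two(max_power):
    """Generator version of the powers-of-two iterator: yields 2^0 .. 2^max_power."""
    for n in range(max_power + 1):
        yield 2 ** n   # pause here; resume from this point on the next next() call

gen = power_two(3)
print(next(gen))   # 1
print(list(gen))   # [2, 4, 8]  (the remaining values)

# Once the function body has finished, further calls raise StopIteration.
try:
    next(gen)
except StopIteration:
    print("exhausted")
```

The loop counter `n` takes the place of the `self.n` attribute, and there is no explicit `raise StopIteration`.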



In [104]:
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
        
i_seq = infinite_sequence()
print(type(i_seq))
print(next(i_seq))
print(next(i_seq))
print(next(i_seq))

<class 'generator'>
0
1
2
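
Because this generator never ends, calling `list()` on it would never return. One way to take a bounded number of values is `itertools.islice` or `itertools.takewhile`, sketched here with the same generator:

```python
import itertools

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

# islice stops after 5 items, so the infinite generator is never fully consumed.
first_five = list(itertools.islice(infinite_sequence(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# takewhile stops at the first value that fails the predicate.
below_four = list(itertools.takewhile(lambda n: n < 4, infinite_sequence()))
print(below_four)  # [0, 1, 2, 3]
```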


#### Reading large file

In [136]:
def count_line(file_name):
    # The with block closes the file once the generator is exhausted.
    with open(file_name, "r") as file:
        for row in file:
            yield row


test = count_line("/Users/bnmac/Documents/Travail/python/python-is-cool/data/Questions_Rep.csv")
count = 0
for ele in test:
    count += 1
    
print(count)

16


### Building generators with Generator expressions

In [113]:
import sys
squared_nums_list = [i**2 for i in range(100000)]
print(sys.getsizeof(squared_nums_list))

squared_nums_generator = (i**2 for i in range(100000))
print(sys.getsizeof(squared_nums_generator))

824456
112


In [137]:
import cProfile
cProfile.run('sum([i * 2 for i in range(100000)])')

         5 function calls in 0.014 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.012    0.012    0.012    0.012 <string>:1(<listcomp>)
        1    0.001    0.001    0.014    0.014 <string>:1(<module>)
        1    0.000    0.000    0.014    0.014 {built-in method builtins.exec}
        1    0.001    0.001    0.001    0.001 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




In [138]:
import cProfile
cProfile.run('sum(i * 2 for i in range(100000))')

         100005 function calls in 0.025 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100001    0.014    0.000    0.014    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    0.025    0.025 <string>:1(<module>)
        1    0.000    0.000    0.025    0.025 {built-in method builtins.exec}
        1    0.011    0.011    0.025    0.025 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




# 7) Local namespace, object's attributes

The locals() function returns a dictionary containing the variables defined in the local namespace. 

In [107]:
class Model1:
    def __init__(self, hidden_size = 100, num_layers = 3, learning_rate = 3e-3):
        print(locals())
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.learning_rate = learning_rate
        
model1 = Model1()


{'self': <__main__.Model1 object at 0x7fa17ed3e4f0>, 'hidden_size': 100, 'num_layers': 3, 'learning_rate': 0.003}


All attributes of an object are stored in its `__dict__`.

In [108]:
model1.__dict__

{'hidden_size': 100, 'num_layers': 3, 'learning_rate': 0.003}

Note that manually assigning each of the arguments to an attribute can be quite tiring when the list of arguments is long. To avoid this, we can directly assign the dictionary of arguments to the object's `__dict__`.

In [113]:
class Model2:
    def __init__(self, hidden_size = 50, num_layers = 30, learning_rate = 3e-2):
        params = locals()
        del params['self']
        self.__dict__ = params

In [114]:
model2 = Model2()
model2.__dict__

{'hidden_size': 50, 'num_layers': 30, 'learning_rate': 0.03}

This can be convenient when the object is initiated using the catch-all `**kwargs`.

In [115]:
class Model3:
    def __init__(self, **kwargs):
        self.__dict__ = kwargs

In [116]:
model3 = Model3(hidden_size=100, num_layers=3, learning_rate=3e-4)
model3.__dict__

{'hidden_size': 100, 'num_layers': 3, 'learning_rate': 0.0003}

In [117]:
model3_1 = Model3(hidden_size = 10, num_layers = 4, learning_rate = 3e-2, paras = 5)
model3_1.__dict__

{'hidden_size': 10, 'num_layers': 4, 'learning_rate': 0.03, 'paras': 5}
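
Replacing `__dict__` wholesale discards any attribute set before the assignment. If some defaults should survive, one option is to update the existing `__dict__` instead. `Model4` below is a hypothetical variant written for this sketch, not part of the original notes.

```python
class Model4:
    """Hypothetical variant: known defaults, overridden by **kwargs."""

    def __init__(self, **kwargs):
        self.hidden_size = 100
        self.num_layers = 3
        # update() keeps existing attributes and overwrites only the given keys.
        self.__dict__.update(kwargs)

model4 = Model4(num_layers=5, learning_rate=3e-4)
print(vars(model4))  # vars(obj) returns obj.__dict__
```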

# 8) Wildcard import
Often, you run into this wildcard import `*` that looks something like this:

`file.py`
    
    from parts import *

This is irresponsible because it imports everything in the module, even the names that the module itself imported. For example, if `parts.py` looks like this:

`parts.py`

    import numpy
    import tensorflow
    
    class Encoder:
        ...
    
    class Decoder:
        ...
        
    class Loss:
        ...
    
    def helper(*args, **kwargs):
        ...
    
    def utils(*args, **kwargs):
        ...

Since `parts.py` doesn't have `__all__` specified, `file.py` will import Encoder, Decoder, Loss, utils, helper together with numpy and tensorflow.

If we intend that only Encoder, Decoder, and Loss are ever to be imported and used in another module, we should specify that in `parts.py` using the `__all__` variable.

`parts.py`
    
    __all__ = ['Encoder', 'Decoder', 'Loss']
    import numpy
    import tensorflow
    
    class Encoder:
        ...

Now, if some user irresponsibly does a wildcard import with `parts`, they can only import Encoder, Decoder, Loss. Personally, I also find `__all__` helpful as it gives me an overview of the module.
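
The effect of `__all__` can be checked without creating files. The sketch below builds two small in-memory modules, one with `__all__` and one without, and lists what a wildcard import pulls in from each. The module names `demo_parts_open` and `demo_parts_closed`, the helper functions, and the use of `math` in place of `numpy`/`tensorflow` are all assumptions made so the example runs anywhere.

```python
import sys
import types

def make_module(name, source):
    """Build an in-memory module from source text (a stand-in for a real .py file)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module   # make it importable by name
    return module

def wildcard_names(name):
    """Return the public names that `from <name> import *` brings in."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    return sorted(n for n in namespace if not n.startswith("__"))

body = "import math\nclass Encoder: ...\nclass Decoder: ...\ndef helper(): ...\n"

# demo_parts_open / demo_parts_closed are hypothetical module names.
make_module("demo_parts_open", body)
make_module("demo_parts_closed", "__all__ = ['Encoder', 'Decoder']\n" + body)

print(wildcard_names("demo_parts_open"))    # ['Decoder', 'Encoder', 'helper', 'math']
print(wildcard_names("demo_parts_closed"))  # ['Decoder', 'Encoder']
```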