# Part I Python Fundamentals 1

These are my Python review notes, written for self-study. The original notes mix English and Chinese.
- Part I follows *[Liao's Python tutorial](https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000) (in Chinese)*

In [14]:
# RUN THIS FIRST
# Display the value of every expression in a cell, not only the last one
# No need for the print function
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

### 1. Python Basics (Encoding/formatting/list/tuple/if/loop/dict/set)

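Encoding

- **Python's `str` uses Unicode**
- To convert between `str` and `bytes`, use UTF-8
- Background on encodings:
	- At first, computers encoded only 128 characters (code points 0 to 127): upper- and lowercase English letters, digits, and some symbols. This table is called ASCII.
	- Other countries then added their own encodings, such as GB2312 for Chinese and Shift_JIS for Japanese. Because these encodings conflict with one another, text read with the wrong one shows up garbled.
	- For the sake of world peace, Unicode was created to cover every language in a single character set.
	- The catch is that a fixed-width Unicode encoding typically uses 2 bytes per character (more for rare characters), while ASCII needs only 1, so mostly-English text would double in size.
	- To save storage space, UTF-8 encodes Unicode with a variable number of bytes: 1 to 4 bytes per character (1 for an English letter, usually 3 for a Chinese character, 4 for rare characters such as emoji).
	- In memory, text is handled as Unicode; it is converted to UTF-8 only when it is saved to disk or sent over the network.
	- For example, text typed into Notepad is held as Unicode in memory; saving it as abc.txt converts it to UTF-8, and opening the file converts the UTF-8 bytes back to Unicode for display.
	- Likewise, when you browse the web, the server generates the page as Unicode and converts it to UTF-8 before sending it to the browser.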
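
To see UTF-8's variable width in practice, one can encode a few characters and count the bytes. This is only a small sketch; the four sample characters are my own picks (written as escapes so the code points are unambiguous), not taken from the tutorial.

```python
# Number of bytes UTF-8 uses for characters from different ranges.
# 'a' (ASCII), U+00E9 (accented Latin letter), U+4E2D (Chinese), U+1F600 (emoji)
samples = ['a', '\u00e9', '\u4e2d', '\U0001F600']
utf8_lengths = {ch: len(ch.encode('utf-8')) for ch in samples}

for ch, n in utf8_lengths.items():
    print('U+%04X' % ord(ch), n, 'byte(s)')
```

The higher the code point, the more bytes UTF-8 spends on it, which is why English-heavy text stays compact.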

In [80]:
# If you know a character's integer code point, you can write the str with hexadecimal \u escapes
# Kind of stupid
'\u4e2d\u6587'
'\u6211\u7684\u0020\u0070\u0079\u0074\u0068\u006f\u006e\u0020\u0036\u0036\u0036'

'中文'

'我的 python 666'

In [9]:
len('中'.encode('utf-8')) # a Chinese character takes 3 bytes in UTF-8
len('中'.encode('GB2312')) # a Chinese character takes 2 bytes in GB2312

3

2

To send text over the network or save it to disk, the characters in a `str` (Unicode) must be converted to `bytes`.

In [81]:
'xyz'
b'xyz'

'xyz'

b'xyz'

In a `bytes` literal, any byte outside the printable ASCII range is shown as an escape of the form `'\x##'` (two hexadecimal digits).

In [82]:
# encode
print('xyz'.encode('ascii'))
print('xyz'.encode('utf-8'))
print('哈の哈'.encode('GB2312')) # works because GB2312 also includes the Japanese kana
print('哈哈'.encode('Shift_JIS')) # works because Shift_JIS includes kanji, and 哈 is one of them

b'xyz'
b'xyz'
b'\xb9\xfe\xa4\xce\xb9\xfe'
b'\x99\xfb\x99\xfb'


In [83]:
# Chinese characters cannot be encoded as ASCII
'哈哈'.encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Conversely, to turn bytes read from disk or the network back into a `str`, decode them with the same encoding that produced them (ASCII, UTF-8, ...).

In [115]:
b'xyz'.decode('ascii')
b'\xb9\xfe\xa4\xce\xb9\xfe'.decode('GB2312')

'xyz'

'哈の哈'

In [116]:
# similarly, utf-8 cannot decode GB2312
b'\xb9\xfe\xa4\xce\xb9\xfe'.decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 0: invalid start byte

In [117]:
# errors='ignore' drops the bytes that cannot be decoded and keeps whatever does decode
b'\xb9\xfe\xa4\xce\xb9\xfe'.decode('utf-8', errors='ignore')
'ι'.encode('utf-8') # the surviving bytes \xce\xb9 happen to be valid UTF-8 for 'ι', not the text we wanted

'ι'

b'\xce\xb9'

In [120]:
# len() counts characters for a str and bytes for a bytes object
len('哈哈')
len('哈哈'.encode('utf-8'))
'哈哈'.encode('utf-8') # 2 Chinese characters take 6 bytes in UTF-8

2

6

b'\xe5\x93\x88\xe5\x93\x88'

### To avoid garbled text, always use UTF-8 when converting between str and bytes
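
The same advice applies to files: passing `encoding='utf-8'` to `open()` avoids depending on the platform's default encoding. The sketch below is only an illustration; the file name `demo.txt` and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

text = '我的 python 666'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')   # hypothetical file name
    # Text mode with an explicit encoding: str in, UTF-8 bytes on disk.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # Binary mode shows the raw bytes that were actually stored.
    with open(path, 'rb') as f:
        raw = f.read()
    # Reading with the same encoding gives the original str back.
    with open(path, 'r', encoding='utf-8') as f:
        round_trip = f.read()

print(raw)
print(round_trip == text)
```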

Formatting
Two approaches:

In [10]:
# % operator
'%2d-%02d' % (3,1) # %2d pads to width 2 with spaces, %02d pads to width 2 with zeros
'%.2f' % (1.236) # round to 2 decimal places

' 3-01'

'1.24'

If you are not sure which specifier to use, `%s` works for anything (it converts the value to a str).

In [85]:
'my name is %s, I\'m %s' % ('Michael', 24)

"my name is Michael, I'm 24"

In [86]:
# format()
'{}\'s grade increase from {} to {}'.format('Michael', 80, 100)

"Michael's grade increase from 80 to 100"

List/Tuple
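- A list is a mutable ordered sequence
- A tuple is an immutable ordered sequence
- Prefer a tuple over a list whenever you can; a tuple is safer because it cannot be changed by accident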
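
What "immutable" means in practice can be sketched in a few lines. The variable names below are made up for the illustration.

```python
point = (1, 2)
try:
    point[0] = 10              # tuples do not support item assignment
except TypeError as exc:
    tuple_error = str(exc)

# Immutability only covers the tuple's own slots: a list stored
# inside a tuple can still be modified in place.
mixed = (1, [2, 3])
mixed[1].append(4)

print(tuple_error)
print(mixed)
```

So a tuple is only fully "safe" when everything inside it is immutable as well.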

In [122]:
# defining a tuple with one element needs a trailing comma
t1 = (1) # wrong: this is just the int 1 in parentheses
t2 = (1,) # right: the comma makes it a tuple
t1
t2

1

(1,)

If statement

In [123]:
# simple and pythonic way

x = -1 # non-zero int, truthy
y = 'a' # non-empty str, truthy
z = [1, 2, 3] # non-empty list, truthy

if x and y and z:
    print('This is totally True')

This is totally True


Loop

Nothing worth noting here.

Dict/ Set
- Compared to a list, **a dict lookup is very fast**, because the key is hashed to locate the value directly instead of scanning every element (like using a dictionary's index)
- Compared to a list, **a dict uses much more memory (a space/speed trade-off)**
- Keys must be immutable objects (str, int, ...), not mutable ones (list)

- A set is a mathematical set
- A set stores only keys, without values
- `add()` / `remove()` add and remove elements
- `&` and `|` are the intersection and union operators
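
A short sketch of the key restriction and of `add()` / `remove()`. The sample data is hypothetical.

```python
# Immutable objects (str, int, tuples of immutables) are hashable and work as keys.
scores = {('Michael', 1): 95}

try:
    scores[['Michael', 1]] = 95    # a list is mutable, hence unhashable
except TypeError as exc:
    key_error = str(exc)

# A set stores unique keys only; add() and remove() change it in place.
s = {1, 2, 3}
s.add(3)       # already present, so nothing changes
s.add(4)
s.remove(1)

print(key_error)
print(s)
```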

In [106]:
# one way to create dict
student = ['a', 'b', 'c']
grades = [95, 91, 99]
record = dict(zip(student, grades))
record

{'a': 95, 'b': 91, 'c': 99}

In [107]:
# delete a key and its value
record.pop('a')
record

95

{'b': 91, 'c': 99}

In [11]:
s1 = set([1, 2, 3])
s2 = set([1, 3, 4])
s1 & s2 # intersect
s1 | s2 # union

{1, 3}

{1, 2, 3, 4}

### 2. Function
- `help(function)` prints a function's documentation
- Since functions are objects (**everything in Python is an object**), we can bind a function to another name
- **VERY IMPORTANT FOR DEBUGGING: a function with no `return` statement automatically returns `None`**
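
The implicit `None` is easy to trip over when a function modifies its argument in place. A minimal sketch, where `add_in_place` is a hypothetical helper written for this example:

```python
def add_in_place(numbers, x):
    numbers.append(x)      # modifies the list, but there is no return statement

result = add_in_place([1, 2], 3)
print(result)

# Built-in methods that modify in place behave the same way.
sorted_result = [3, 1, 2].sort()
print(sorted_result)
```

If a variable unexpectedly holds `None`, a missing `return` is a likely cause.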

In [38]:
# nickname a function
michael = abs
michael(-3)

3

In [11]:
# check type of parameters first when writing a function
def foo(x):
    if not isinstance(x, (int, float)):
        raise TypeError('x must be int or float')
    pass

foo('a')

TypeError: x must be int or float

When a function returns multiple values, it actually returns a single **tuple that can be unpacked**.

In [16]:
from math import hypot
def triangle_property(x, y):
    _sum = x + y
    _hypot = hypot(x, y)
    _property = _hypot <= _sum # True when the hypotenuse is no longer than the sum of the two legs
    return _sum, _hypot, _property

type(triangle_property(3, 4))
s, h, p = triangle_property(3, 4)
s, h, p

tuple

(7, 5.0, True)

### \*args

In [20]:
def sum_elements(*args):
    print(type(args)) # the positional arguments arrive packed into a tuple
    print(args)
    return sum(args)

sum_elements(1, 2, 3)

<class 'tuple'>
(1, 2, 3)


6

In [22]:
# what if I want to pass a list?
nbs = [1, 2, 3]
sum_elements(*nbs) # * unpacks the list into separate positional arguments

<class 'tuple'>
(1, 2, 3)


6

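As shown above, `*` in a call unpacks a sequence (a list or a tuple) into positional arguments.

Likewise, `**` unpacks a dict into keyword arguments.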
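
A small sketch of both forms of unpacking at a call site. The function `describe` and its sample data are hypothetical.

```python
def describe(name, age, city='unknown'):
    return '{} ({}) from {}'.format(name, age, city)

args = ('Michael', 24)
kwargs = {'city': 'Paris'}      # hypothetical sample data

# * spreads the tuple over positional parameters,
# ** spreads the dict over keyword parameters.
unpacked = describe(*args, **kwargs)
explicit = describe('Michael', 24, city='Paris')

print(unpacked)
print(unpacked == explicit)
```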

### \**kwargs

In [19]:
def information(name, age, **kwargs):
    print("name:", name, "age:", age, "others:", kwargs)
    
information('Michael', 13, handsome='very', smart='extremely', genius=True)

name: Michael age: 13 others: {'smart': 'extremely', 'handsome': 'very', 'genius': True}


Keyword-only arguments (restricting which keyword arguments are accepted)

In [23]:
# name, age: positional arguments
# handsome, job: keyword-only arguments (only these two keywords are accepted)
def information(name, age, *, handsome='very', job):
    print(name, age, handsome, job)

information('Michael', 13, job='actuary')

Michael 13 very actuary
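
What the bare `*` enforces can be checked by calling the function the wrong way. This sketch redefines `information` so that it returns its arguments instead of printing them, purely to keep the example self-contained.

```python
def information(name, age, *, handsome='very', job):
    return (name, age, handsome, job)

try:
    information('Michael', 13, 'actuary')   # job passed positionally
except TypeError as exc:
    positional_error = str(exc)

try:
    information('Michael', 13)              # job omitted, and it has no default
except TypeError as exc:
    missing_error = str(exc)

print(positional_error)
print(missing_error)
```

Both calls fail with a `TypeError`: everything after `*` must be passed by keyword, and a keyword-only parameter without a default is still required.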


**Parameters must be defined in this order: required positional parameters, default parameters, variadic parameters (`*args`), keyword-only parameters, and finally `**kwargs`.**

In [28]:
def foo(x, y, z=1, *args, **kwargs):
    print(x, y, z, args, kwargs)
def fooo(x, y, *, z, **kwargs):
    print(x, y, z, kwargs)

In [51]:
foo(1, 2, a=2)
fooo(1, 2, z=3)

1 2 1 () {'a': 2}
1 2 3 {}


All these parameter kinds are cool, but try not to combine too many of them in one function. Because if you read some code you don't understand, simply just shoot the coder.       -- Tim Peters

Recursion (a function that calls itself)

In [76]:
def fac(n):
    if n == 1:
        return 1
    return n * fac(n - 1)
%timeit fac(200)

10000 loops, best of 3: 51.5 µs per loop


In [78]:
from functools import lru_cache
@lru_cache(128) # cache the results of the 128 most recently used calls, so repeated calls should be faster
def fac(n):
    if n == 1:
        return 1
    return n * fac(n - 1)
%timeit fac(200)

The slowest run took 1035.38 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 148 ns per loop


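Careful when reading the numbers: the cached version is not slower, it is roughly 350 times faster (148 ns versus 51.5 µs per loop). The warning only says that the first run took about 1000 times longer than the fastest one, which is exactly what caching predicts: the first call computes and stores the result, and every later call in `%timeit` is just a cache lookup.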

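**Recursive functions have simple, clear logic; the downside is that recursing too deeply overflows the call stack** (CPython raises `RecursionError`).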
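
The depth limit can be observed directly. The sketch below uses a slightly modified `fac` (base case `n <= 1`) and a loop-based `fac_iter`, which is a hypothetical helper added only for comparison.

```python
import sys

def fac(n):
    if n <= 1:
        return 1
    return n * fac(n - 1)

def fac_iter(n):
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

depth = sys.getrecursionlimit() + 100   # deliberately deeper than the limit
try:
    fac(depth)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed)
print(fac_iter(depth) > 0)   # the loop version has no depth limit
```

When the recursion depth grows with the input, rewriting the function as a loop is the usual way out.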

### 3. Advanced Features

Slicing: `list[a:b]` includes index `a` but excludes index `b`

In [24]:
ls = [0, 1, 2, 3, 4]
ls[:2] # first two elements
ls[2:] # from the third element to the end
ls[-3: -1]   # include [-3] but not [-1]

[0, 1]

[2, 3, 4]

[2, 3]

In [9]:
ls[::2] # take every 2nd element

[0, 2, 4]

In [25]:
# reverse the list
# understand it!
ls[::-1]

[4, 3, 2, 1, 0]

In [12]:
# copy a list
# understand it!
ls[:]

[0, 1, 2, 3, 4]
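
Why `ls[:]` counts as a copy can be seen by comparing it with plain assignment. A brief sketch; the names `alias`, `copied`, `nested` and `shallow` are made up for the illustration.

```python
ls = [0, 1, 2, 3, 4]
alias = ls         # a second name for the same list
copied = ls[:]     # a new list holding the same elements

ls.append(5)
print(alias)       # follows the change
print(copied)      # unaffected

# The copy is shallow: nested lists are still shared.
nested = [[1, 2], [3, 4]]
shallow = nested[:]
nested[0].append(99)
print(shallow[0])
```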

In [15]:
# tuples support slicing too!
tps = (0, 1, 2, 3, 4)
tps[1:]

(1, 2, 3, 4)

In [26]:
# str supports slicing too!
s = 'abcd'
s[::-1]

'dcba'

Iteration

In [27]:
# check if the object is iterable
from collections.abc import Iterable # the alias `collections.Iterable` was removed in Python 3.10
isinstance('xyz', Iterable)
isinstance([1, 2, 3], Iterable)

True

True

Looping with indexes: enumerate()

In [34]:
x = 'Michael'
indexes = []
chars = []
for index, char in enumerate(x):
    indexes.append(index)
    chars.append(char)

dic = dict(zip(chars, indexes))
dic

{'M': 0, 'a': 4, 'c': 2, 'e': 5, 'h': 3, 'i': 1, 'l': 6}

List Comprehensions

In [25]:
lst = [x**2 for x in range(1,11) if x % 2 == 0]
lst

[4, 16, 36, 64, 100]

In [35]:
[m + n for m in 'xyz'
       for n in '123']

['x1', 'x2', 'x3', 'y1', 'y2', 'y3', 'z1', 'z2', 'z3']

In [39]:
# Dictionary comprehensions
dic = {'x':1, 'y':2, 'z':3}
# dic.keys() # keys
# dic.values() # values
# dic.items() # keys + values

for key, value in dic.items():
    print(key, value)

# generate a dictionary with a dict comprehension
# very cool
dic2 = {k:v for k,v in dic.items()}
dic == dic2

z 3
y 2
x 1


True

In [48]:
# lowercase the str elements of a list that mixes str and int (the int elements are dropped)
L = ['A', 1, 'B', 2, 'C']
L_new = [letter.lower() for letter in L if isinstance(letter, str)]
L_new

['a', 'b', 'c']

### Generator (a sequence computed on the fly, one value at a time, which saves a lot of space)
- A very powerful tool that can express complex logic
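
The space saving can be made visible with `sys.getsizeof`. This is a rough sketch: `getsizeof` measures only the container object itself, not the integers it refers to, and the size `n` is an arbitrary choice.

```python
import sys

n = 100_000
as_list = [x * x for x in range(n)]   # all values exist in memory at once
as_gen = (x * x for x in range(n))    # values are produced one at a time

list_bytes = sys.getsizeof(as_list)   # grows with n
gen_bytes = sys.getsizeof(as_gen)     # small and independent of n

print(list_bytes > gen_bytes)
print(sum(as_gen) == sum(as_list))    # same values either way
```

Note that a generator can be consumed only once: after `sum(as_gen)` it is exhausted.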

In [35]:
# Way 1: create a generator - use () instead of []
lst = [x for x in range(10)]
lst
gen = (x for x in range(5))
gen

# get values from generator using next()
while True:
    try:
        next(gen)
    except StopIteration: # raised once the generator is exhausted
        break

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

<generator object <genexpr> at 0x104f72ba0>

0

1

2

3

4

A generator can also be iterated over (it is both an Iterable and an Iterator)

In [37]:
from collections.abc import Iterable, Iterator # the `collections` aliases were removed in Python 3.10

isinstance(gen, Iterable)
isinstance(gen, Iterator)

gen = (x for x in range(5))
for i in gen:
    print(i, end = ' ')

True

True

0 1 2 3 4 

In [45]:
# Fibonacci numbers with an ordinary function:
# 1, 1, 2, 3, 5, 8, 13 ...
def fib(max):
    a, b = 1, 1 # first two numbers
    n = 0 # nb of iterations
    while n < max:
        print(b, end=' ')
        b, a = a, a + b # the right-hand side is evaluated first, then both names are rebound at once
        n += 1
    return 'done'
fib(10)

1 1 2 3 5 8 13 21 34 55 

'done'

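Now let's implement the Fibonacci numbers with a generator:
- print(x) --> yield x
- The function above prints one value and then computes the next, which is very close to how a generator works, **so we only need to change print to yield**
- If a function contains the keyword **yield**, it is no longer an ordinary function but a generator function: calling it returns a generator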

In [44]:
def fib_gen(max):
    a, b = 1, 1 # first two numbers
    n = 0 # nb of iterations
    while n < max:
        yield b 
        b, a = a, a + b
        n += 1
    return 'done'

for nb in fib_gen(10):
    print(nb, end=' ')

1 1 2 3 5 8 13 21 34 55 

In [42]:
# All natural numbers, produced on demand!
# very cool
def natural_nb():
    x = 1
    while True:
        yield x
        x += 1
        
nnb = natural_nb()
for i in range(50):
    print(next(nnb), end=' ')

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 

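The hardest part to grasp is that a generator's execution flow differs from a function's. A function runs top to bottom and returns when it reaches a return statement or its last line. A generator function __runs each time next() is called, returns at a yield statement, and on the next call resumes from the yield where it last stopped__
- **1. next() only returns the value of the next yield**
- **2. next() never returns the value of return**
- **3. To get the return value, catch the StopIteration exception; the value is stored in its `value` attribute**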
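
The pause-and-resume flow can be traced by recording what runs and when. A minimal sketch; the generator `steps` and the `log` list are hypothetical names used only for this trace.

```python
log = []

def steps():
    log.append('start')
    yield 1
    log.append('resumed after first yield')
    yield 2
    log.append('resumed after second yield')
    return 'done'

gen = steps()
log.append('created')          # the body has not started running yet
first = next(gen)              # runs up to the first yield
second = next(gen)             # resumes and runs up to the second yield
try:
    next(gen)                  # runs on to the return statement
except StopIteration as e:
    returned = e.value         # the return value travels on the exception

print(log)
print(first, second, returned)
```

Calling `steps()` runs none of the body; each `next()` runs exactly one segment between two `yield` statements.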

In [47]:
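gen = fib_gen(10) # defined above; its return value is 'done'
while True:    
    try:
        print(next(gen))
    except StopIteration as e:
        print(e.value) # the generator's return value
        break

# The break matters: without it the loop would keep calling next() and catching StopIteration forever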

1
1
2
3
5
8
13
21
34
55
done


Let's implement Pascal's triangle (Yang Hui's triangle) with a generator


In [16]:
%%time
# implementation with an ordinary function
# expected output:
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]
# [1, 5, 10, 10, 5, 1]
# [1, 6, 15, 20, 15, 6, 1]
# [1, 7, 21, 35, 35, 21, 7, 1]
# [1, 8, 28, 56, 70, 56, 28, 8, 1]
# [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
def triangle(n):
    old_list = [1]
    layer = 1
    while layer <= n:
        print(old_list)
        new_list = [None] * (layer + 1)
        for i in range(layer + 1):
            if i == 0 or i == len(old_list):
                new_list[i] = 1
            else:
                new_list[i] = old_list[i - 1] + old_list[i]
        layer += 1
        old_list = new_list
triangle(10)

[1]
[1, 1]
[1, 2, 1]
[1, 3, 3, 1]
[1, 4, 6, 4, 1]
[1, 5, 10, 10, 5, 1]
[1, 6, 15, 20, 15, 6, 1]
[1, 7, 21, 35, 35, 21, 7, 1]
[1, 8, 28, 56, 70, 56, 28, 8, 1]
[1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
CPU times: user 125 µs, sys: 0 ns, total: 125 µs
Wall time: 130 µs


In [33]:
# implementation with a generator
# the only change is replacing print with yield
def triangle_gen(n):
    old_list = [1]
    layer = 1
    while layer <= n:
        yield old_list
        new_list = [None] * (layer + 1)
        for i in range(layer + 1):
            if i == 0:
                new_list[i] = 1
            elif i == len(old_list):
                new_list[i] = 1
            else:
                new_list[i] = old_list[i - 1] + old_list[i]
        layer += 1
        old_list = new_list
triangle_gen(10)
for i in triangle_gen(10):
    print(i)

# Wow, I actually figured it out

<generator object triangle_gen at 0x104fcd150>

[1]
[1, 1]
[1, 2, 1]
[1, 3, 3, 1]
[1, 4, 6, 4, 1]
[1, 5, 10, 10, 5, 1]
[1, 6, 15, 20, 15, 6, 1]
[1, 7, 21, 35, 35, 21, 7, 1]
[1, 8, 28, 56, 70, 56, 28, 8, 1]
[1, 9, 36, 84, 126, 126, 84, 36, 9, 1]


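#### Two kinds of objects can be iterated with for:
- Iterable: str, list, tuple, dict, set
- Iterator: generator
- The difference: an Iterator can be passed to next(), while a plain Iterable such as a list cannot (every Iterator is also Iterable)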

Iterator
- An object that can be passed to next( ) to keep returning the next value is called an Iterator.
- Use isinstance() to check whether an object is an Iterator (note: not Iterable)

iter( ) turns a str, list, dict, etc. into an Iterator

In [18]:
from collections.abc import Iterable, Iterator # the `collections` aliases were removed in Python 3.10
isinstance([1, 2, 3], Iterable)
isinstance([1, 2, 3], Iterator)

False

In [19]:
# turn a list into an iterator with iter()
isinstance(iter([1, 2, 3]), Iterator)

True

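This is because a Python Iterator represents a stream of data: it can be passed to next() to return the next item again and again, until no data is left and StopIteration is raised. The stream can be seen as an ordered sequence whose length is not known in advance; the only way to get the next item is to ask for it with next(). Iterator computation is therefore lazy: a value is computed only when it has to be returned.

An Iterator can even represent an infinite stream, such as all the natural numbers, which a list could never store.

**Under the hood, Python's for loop works by calling next() repeatedly**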
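
Roughly, a `for` loop obtains an iterator with `iter()` and then calls `next()` until `StopIteration` is raised. The sketch below spells that out by hand; `for_each` is a hypothetical helper written for this illustration, not something from the standard library.

```python
def for_each(iterable, action):
    """Roughly what `for item in iterable: action(item)` does."""
    it = iter(iterable)            # ask the iterable for an iterator
    while True:
        try:
            item = next(it)        # fetch the next value
        except StopIteration:      # the iterator is exhausted
            break
        action(item)

collected = []
for_each([1, 2, 3], collected.append)
for_each('ab', collected.append)
print(collected)
```

This is also why lists and strings work in a `for` loop even though they are not Iterators themselves: the loop calls `iter()` on them first.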