# Python Fundamentals

These are my review notes on Python, written for self-study. They were originally written in a mix of English and Chinese.
- Part I follows [Liao's Python tutorial](https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000) (in Chinese)

In [48]:
# Display the value of every expression in a cell, not only the last one
# No need for the print function
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Part I

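### 1. Python Basics (Encoding/formatting/list/tuple/if/loop/dict/set)
- Indent with 4 spaces
- Python's str uses Unicode
- To convert between str and bytes, use UTF-8
- Encoding:
	- Originally only 128 characters (code points 0-127) were encoded: upper- and lower-case English letters, digits, and some symbols. This table is called ASCII.
	- Other languages then added their own encodings, such as GB2312 for Chinese and Shift_JIS for Japanese. Because these encodings conflict with each other, mixing them produces garbled text.
	- For the sake of world peace, Unicode was created to cover all languages in one character set.
	- The catch is that a fixed-width Unicode encoding typically needs 2 bytes per character (more for rare characters), while ASCII needs only 1 byte.
	- To save storage, UTF-8 was introduced as a variable-length encoding of Unicode. It uses 1 to 4 bytes per character: 1 for English letters, usually 3 for Chinese characters, and 4 for rare characters and emoji.
	- In memory, text is handled as Unicode; it is converted to UTF-8 when it is saved to disk or sent over the network.
	- For example, text typed in Notepad is held as Unicode in memory. Saving it as abc.txt encodes it as UTF-8, and opening the file decodes the UTF-8 back to Unicode for display in Notepad.
	- Similarly, when serving a web page, the server produces the content as Unicode in memory and converts it to UTF-8 before sending it to the browser.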
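
The variable-length behavior of UTF-8 described above is easy to check directly. A quick sketch; the sample characters are my own picks, not from the tutorial:

```python
# Count how many bytes UTF-8 needs for characters from different ranges.
samples = ['A', '\u00e9', '\u4e2d', '\U0001F600']  # ASCII letter, accented letter, Chinese character, emoji
utf8_lengths = {ch: len(ch.encode('utf-8')) for ch in samples}

for ch, n in utf8_lengths.items():
    print(f'U+{ord(ch):04X} {ch!r}: {n} byte(s) in UTF-8')
```

ASCII characters keep their 1-byte form, which is why plain English text is byte-for-byte identical in ASCII and UTF-8.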

In [80]:
# If you know a character's code point, you can write it in a str as a hexadecimal \u escape
# COOL!
'\u4e2d\u6587'
'\u6211\u7684\u0020\u0070\u0079\u0074\u0068\u006f\u006e\u0020\u0036\u0036\u0036'

'中文'

'我的 python 666'
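
To find the code point behind an escape like the ones above, `ord()` and `chr()` convert in both directions. A small sketch using the same character:

```python
# ord() gives a character's code point; chr() goes the other way.
code_point = ord('\u4e2d')
print(code_point, hex(code_point))

same_char = chr(0x4e2d)
print(same_char == '\u4e2d')
```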

To send a str over the network or save it to disk, its (Unicode) characters have to be converted to bytes.

In [81]:
'xyz'
b'xyz'

'xyz'

b'xyz'

In [82]:
# encode
# bytes outside the printable ASCII range are displayed as \x## escapes
print('xyz'.encode('ascii'))
print('xyz'.encode('utf-8'))
print('哈の哈'.encode('GB2312')) # works because GB2312 also includes Japanese kana
print('哈哈'.encode('Shift_JIS')) # works because 哈 is also a kanji covered by Shift_JIS

b'xyz'
b'xyz'
b'\xb9\xfe\xa4\xce\xb9\xfe'
b'\x99\xfb\x99\xfb'


In [83]:
# Chinese characters cannot be encoded as ASCII
'哈哈'.encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Conversely, to get a str back from bytes read from disk or the network, we need to decode them with the codec they were written with (ascii, utf-8, ...).

In [115]:
b'xyz'.decode('ascii')
b'\xb9\xfe\xa4\xce\xb9\xfe'.decode('GB2312')

'xyz'

'哈の哈'

In [116]:
# similarly, bytes encoded as GB2312 cannot be decoded as UTF-8
b'\xb9\xfe\xa4\xce\xb9\xfe'.decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 0: invalid start byte

In [117]:
# errors='ignore' skips the undecodable bytes and keeps only what happens to be valid UTF-8
b'\xb9\xfe\xa4\xce\xb9\xfe'.decode('utf-8', errors='ignore')
'ι'.encode('utf-8') # the surviving part is just the bytes \xce\xb9 read as UTF-8, not what we want

'ι'

b'\xce\xb9'
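
A side-by-side sketch of the error handlers, reusing the bytes from the cells above. `errors='replace'` is not used in the notes; I add it here because it marks the damage instead of hiding it:

```python
raw = b'\xb9\xfe\xa4\xce\xb9\xfe'  # '哈の哈' encoded as GB2312

ignored = raw.decode('utf-8', errors='ignore')    # silently drops undecodable bytes
replaced = raw.decode('utf-8', errors='replace')  # puts U+FFFD where bytes were undecodable
correct = raw.decode('gb2312')                    # the codec the bytes were written with

print(repr(ignored))
print(repr(replaced))
print(repr(correct))
```

Neither handler recovers the original text; only decoding with the right codec does.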

In [120]:
# len() returns the number of characters in a str, or the number of bytes in a bytes object
len('哈哈')
len('哈哈'.encode('utf-8'))
'哈哈'.encode('utf-8') # on disk, 2 Chinese characters take 6 bytes in UTF-8

2

6

b'\xe5\x93\x88\xe5\x93\x88'

To avoid garbled text, always use UTF-8 when converting between str and bytes.
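
In practice this mostly means passing `encoding='utf-8'` explicitly when opening files rather than relying on the platform default. A minimal sketch; the temporary directory and the file name `abc.txt` are only for illustration:

```python
import os
import tempfile

text = '我的 python 666'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'abc.txt')
    # Write with an explicit encoding.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # Binary mode shows the bytes that are actually on disk.
    with open(path, 'rb') as f:
        raw_bytes = f.read()
    # Reading with the same encoding gives the original str back.
    with open(path, 'r', encoding='utf-8') as f:
        round_trip = f.read()

print(raw_bytes)
print(round_trip == text)
```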

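Scripts usually start with the two lines below. The first tells Linux/macOS to run the script with python3 (Windows ignores it). The second tells Python to read the source file as UTF-8. In Python 3 this is already the default, so the second line only matters for Python 2 or for files saved in another encoding, where Chinese text in the source would otherwise come out garbled.
Also make sure the text editor saves the file as UTF-8 without BOM.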

In [121]:
#!/usr/bin/python3
# -*- coding: utf-8 -*-

Formatting
Two ways:
- 1. % operator (%d and %f can specify zero-padding and the number of digits)
    - %d integer
    - %f float
    - %s str
    - %x hexadecimal integer

- 2. format()

In [84]:
'%2d-%02d' % (3,1)
'%.2f' % (1.236)

' 3-01'

'1.24'
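
The same width, zero-padding, and precision controls also exist in `format()`, written after a colon inside the braces. A short sketch mirroring the cell above (the `255` is my own example value):

```python
# str.format() equivalents of the % examples.
padded = '{:2d}-{:02d}'.format(3, 1)   # width 2, then width 2 with zero-padding
rounded = '{:.2f}'.format(1.236)       # 2 digits after the decimal point
as_hex = '{:x}'.format(255)            # hexadecimal integer

print(repr(padded), rounded, as_hex)
```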

If you are not sure which one to use, %s always works: it converts any value to str.

In [85]:
'my name is %s, I\'m %s' % ('Michael', 24)

"my name is Michael, I'm 24"

In [86]:
# format()
'{}\'s grade increase from {} to {}'.format('Michael', 80, 100)

"Michael's grade increase from 80 to 100"

List/Tuple
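- A list is a mutable ordered sequence
- A tuple is an immutable ordered sequence
- Prefer a tuple over a list whenever possible: it cannot be changed by accident, so it is safer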

In [122]:
# a tuple with one element needs a trailing comma
t1 = (1) # just the int 1 in parentheses, not a tuple
t2 = (1,) # right
t1
t2

1

(1,)
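
What "immutable" means for a tuple can be seen in a few lines. A minimal sketch with made-up values; note that only the tuple's own slots are frozen, not the objects stored in them:

```python
point = (1, 2)
try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)
    print('TypeError:', error_message)

# A list stored inside a tuple can still be modified in place.
mixed = ('a', [1, 2])
mixed[1].append(3)
print(mixed)
```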

If statement

In [123]:
# simple and pythonic way

x = -1 # non-zero int, True
y = 'a' # non-empty str, True
z = [1, 2, 3] # non-empty list, True

if x and y and z:
    print('This is totally True')

This is totally True
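
For reference, the other side of the same rule: zero, empty containers, and None all count as False. A quick sketch with example values of my own:

```python
falsy_values = [0, 0.0, '', [], (), {}, set(), None]
truthy_values = [-1, 'a', [1, 2, 3], (0,), {'k': 0}]

print([bool(v) for v in falsy_values])   # every entry is False
print([bool(v) for v in truthy_values])  # every entry is True
```

A container is truthy as soon as it is non-empty, even if the elements themselves are falsy, as `(0,)` shows.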


Loop

Nothing worth taking notes on here.

Dict/ Set
- Compared to a list, a dict finds values very quickly, because it goes straight from the key to the value instead of scanning every element, like looking a word up in a dictionary
- Compared to a list, a dict uses more memory (a space/speed trade-off)
- Keys must be immutable (hashable) objects such as str or int, not list

- A set is like a mathematical set
- A set stores only keys, no values
- add() / remove() add and remove elements
- & (intersection) and | (union) are set operators

In [106]:
# one way to create dict
student = ['a', 'b', 'c']
grades = [95, 91, 99]
record = dict(zip(student, grades))
record

{'a': 95, 'b': 91, 'c': 99}

In [107]:
# delete a key and its value
record.pop('a')
record

95

{'b': 91, 'c': 99}
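
The rule that keys must be immutable shows up as a TypeError when a list is used as a key. A small sketch with hypothetical keys, plus the two usual ways to handle a missing key without an exception:

```python
record = {'a': 95, 'b': 91}

# A tuple of immutable items is hashable, so it works as a key.
record[('c', 'd')] = 99

try:
    record[['e', 'f']] = 100  # a list is mutable, hence unhashable
except TypeError as exc:
    key_error_message = str(exc)
    print('TypeError:', key_error_message)

# `in` and get() avoid a KeyError for keys that are not present.
has_z = 'z' in record
z_or_default = record.get('z', -1)
print(has_z, z_or_default)
```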

In [111]:
s1 = set([1, 2, 3])
s2 = set([1, 3, 4])
s1 & s2
s1 | s2

{1, 3}

{1, 2, 3, 4}
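
The `add()` / `remove()` methods from the list above, in a short sketch. `discard()` is not mentioned in the notes; I include it as the variant that does not raise on a missing element:

```python
s = set([1, 2, 3])
s.add(4)
s.add(4)        # adding an element that is already present changes nothing
s.remove(1)     # raises KeyError if the element is missing
s.discard(100)  # like remove(), but silent when the element is missing
print(s)

unique = set([1, 1, 2, 2, 3])  # duplicates collapse, since only keys are stored
print(unique)
```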

### 2. Function
- help(function)
- since functions are objects (everything in Python is an object), we can give a function another name (an alias)
- if a function has no return statement, it automatically returns None

In [38]:
# nickname a function
michael = abs
michael(-3)

3
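
Both points can be checked directly: the alias is the very same object as the original function, and a function without `return` hands back None. A minimal sketch; `greet` is a hypothetical function of my own:

```python
def greet(name):
    print('hello,', name)  # no return statement

result = greet('Michael')
print(result is None)

michael = abs
print(michael is abs)  # the alias and the original name point to one object
```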

In [2]:
# check type of parameters
def foo(x):
    if not isinstance(x, (int, float)):
        raise TypeError('x must be int or float')
    pass

foo('a')

TypeError: x must be int or float

When a function returns multiple values, it actually returns a single tuple, which can be unpacked.

In [3]:
from math import hypot
def triangle_property(x, y):
    _sum = x + y
    _hypot = hypot(x, y)
    _property = _hypot >= _sum
    return _sum, _hypot, _property

s, h, p = triangle_property(3, 4)
return_values = triangle_property(3, 4)
return_values
if (s, h, p) == return_values:
    print('Equal')

Equal


\*args

In [14]:
def sum_elements(*args):
    print(type(args))
    print(args) # the positional arguments arrive packed into a tuple
    return sum(args)

sum_elements(1, 2, 3)

<class 'tuple'>
(1, 2, 3)


6

In [15]:
# what if I want to pass a list?
nbs = [1, 2, 3]
sum_elements(*nbs) # explicitly unpack the list at the call site

<class 'tuple'>
(1, 2, 3)


6

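As shown above, * at a call site unpacks a list or tuple into separate positional arguments, while * in a function definition collects the extra positional arguments into a tuple.

** does the same for dicts: it unpacks a dict into keyword arguments at a call site, and collects extra keyword arguments into a dict in a definition.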

\**kwargs

In [33]:
def information(name, age, **kwargs):
    if 'smart' in kwargs:   # check whether the caller passed the keyword argument 'smart'
        pass
    print("name:", name, "age:", age, "others:", kwargs)
information('Michael', 13, handsome='very', smart='extremely', genius=True)

name: Michael age: 13 others: {'smart': 'extremely', 'handsome': 'very', 'genius': True}
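
The cell above shows `**kwargs` collecting keyword arguments. The other direction, unpacking a dict at the call site, looks like this. A minimal sketch; `describe` is a hypothetical stand-in for `information()` that returns its arguments instead of printing them:

```python
def describe(name, age, **kwargs):
    # Hypothetical helper: same signature as information(), but returns the values.
    return name, age, kwargs

person = {'name': 'Michael', 'age': 13, 'smart': 'extremely'}

# Equivalent to describe(name='Michael', age=13, smart='extremely')
name, age, others = describe(**person)
print(name, age, others)
```

Keys that match a named parameter fill that parameter; the rest end up in `kwargs`.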


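Restricting keyword arguments (I did not fully understand this part)

Parameters must be defined in this order: required positional parameters, default parameters, variable positional parameters (*args), keyword-only parameters, and variable keyword parameters (**kwargs).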

In [46]:
def foo(x, y, z=1, *args, **kwargs):
    print(x, y, z, args, kwargs)
def fooo(x, y, *, z, **kwargs):
    print(x, y, z, kwargs)

In [51]:
foo(1, 2, a=2)
fooo(1, 2, z=3)

1 2 1 () {'a': 2}
1 2 3 {}
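
What the bare `*` in `fooo` buys: everything after it can only be passed by name. A small sketch that redefines `fooo` to return its arguments so the result can be inspected; the extra keyword `w` is my own example:

```python
def fooo(x, y, *, z, **kwargs):
    return x, y, z, kwargs

try:
    fooo(1, 2, 3)  # z is keyword-only, so a third positional argument is rejected
except TypeError as exc:
    positional_error = str(exc)
    print('TypeError:', positional_error)

by_name = fooo(1, 2, z=3, w=4)  # z by name; w is collected into kwargs
print(by_name)
```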


All these kinds of parameters are cool, but try not to combine too many of them in one function. Because if you read some code you don't understand, simply just shoot the coder.       -- Tim Peters

Recursion (a function that calls itself)

In [76]:
def fac(n):
    if n == 1:
        return 1
    return n * fac(n - 1)
%timeit fac(200)

10000 loops, best of 3: 51.5 µs per loop


In [78]:
from functools import lru_cache
@lru_cache(128) # cache the results of the 128 most recent calls, should be faster
def fac(n):
    if n == 1:
        return 1
    return n * fac(n - 1)
%timeit fac(200)

The slowest run took 1035.38 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 148 ns per loop


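Watch the units: 148 ns versus 51.5 µs. The cached version is roughly 350 times faster, not slower. The warning about the slowest run appears because only the first call actually computes the result; every later call inside %timeit is a cache hit, so the benchmark mostly measures a cache lookup rather than the recursion.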
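
One way to see what the timing above really measures is `cache_info()`, which `lru_cache` attaches to the decorated function. A minimal sketch, using a smaller argument than above to keep the recursion shallow:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fac(n):
    if n == 1:
        return 1
    return n * fac(n - 1)

fac(50)                          # first call: every level is computed and cached
after_first = fac.cache_info()
fac(50)                          # second call: answered straight from the cache
after_second = fac.cache_info()

print(after_first)
print(after_second)
```

After the first call there are only misses; the second call adds one hit and no new misses, which is the case `%timeit` ends up repeating.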

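The advantage of recursive functions is that the logic is simple and clear; the drawback is that calls nested too deeply overflow the stack.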
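
In Python the overflow shows up as a `RecursionError` once the interpreter's recursion limit is exceeded. A sketch of the failure and of a loop-based alternative; `fac_iter` is my own name, not from the tutorial:

```python
import sys

def fac(n):
    if n == 1:
        return 1
    return n * fac(n - 1)

def fac_iter(n):
    # Loop-based version: the stack depth stays constant no matter how large n is.
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

too_deep = sys.getrecursionlimit() + 100
try:
    fac(too_deep)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed)
print(fac_iter(too_deep) > 0)  # the loop handles the same input without trouble
```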

### 3. Advanced Features