This book is about how to process and clean data with Python.

* [anaconda command]()
* [jupyter tutorial]()
* [numpy tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html)
* [matplotlib tutorial]()
* [scipy tutorial]()
* [pandas tutorial]()
* [scikit-learn tutorial](http://scikit-learn.org/stable/documentation.html)

Other
* [tmux tutorial and configuration](https://www.jianshu.com/p/fd3bbdba9dc9)
* [Python character encoding issues](http://liujiacai.net/blog/2015/11/20/strings/)
* [github-cheat-sheet-zh](https://github.com/tiimgreen/github-cheat-sheet/blob/master/README.zh-cn.md)
* [Covariance and the correlation coefficient in statistics](https://www.zhihu.com/question/20852004)

#### Glossary

* pseudocode: "pseudo" is pronounced ['sjuːdəʊ] (UK)
* jargon (US) ['dʒɑrɡən]: the specialized vocabulary of a field
* ndarray: NumPy's multidimensional array object

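### Python basics (common idioms)
```
number = [3, 1, 4]
for i in range(len(number)):  # index-based loop; enumerate(number) is often more idiomatic
    print(i, number[i])

x = 5
answer = 'yes' if x > 0 else 'no'  # conditional expression

template = '{0:.2f} {1:s} are worth US${2:d}'
template.format(4.5560, 'Argentine Pesos', 1)  # '4.56 Argentine Pesos are worth US$1'

# read a file, stripping trailing whitespace (including the newline) from each line
filename = 'data.txt'  # hypothetical file name
with open(filename) as f:
    lines = [line.rstrip() for line in f]
```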

### Numpy
See the official [tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html).

Below are brief notes on this tutorial. For review, see the cheat sheet.

```
import numpy as np

num = np.array([[1, 2, 3], [4, 5, 6]])
# the same array, built more concisely
num = np.arange(1, 7).reshape(2, 3)
print(num)
# The first axis has a length of 2,
# the second axis has a length of 3.
```

```
[[1 2 3]
 [4 5 6]]
```
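The axis lengths mentioned in the comments above can be read directly from the array. Here is a small sketch using the standard ndarray attributes `shape`, `ndim` and `size`:

```python
import numpy as np

num = np.arange(1, 7).reshape(2, 3)

# shape lists the length of each axis: 2 along axis 0, 3 along axis 1
print(num.shape)  # (2, 3)
print(num.ndim)   # 2  (number of axes)
print(num.size)   # 6  (total number of elements)

# element at row 1, column 2 (both 0-based)
print(num[1, 2])  # 6
```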


### Similar NumPy functions compared
* `np.eye` vs. `np.identity`
* `np.arange` vs. `np.linspace`
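The two pairs differ as follows. `np.identity(n)` always builds a square identity matrix. `np.eye` also accepts a column count and a diagonal offset `k`. `np.arange` takes a step and excludes the stop value. `np.linspace` takes a number of points and includes the stop value by default. A minimal sketch:

```python
import numpy as np

# np.identity(n): always a square n x n identity matrix
i3 = np.identity(3)

# np.eye(N, M, k): may be non-square, and k shifts the diagonal
# (k=1 puts the ones one place above the main diagonal)
e = np.eye(3, 4, k=1)  # ones at (0, 1), (1, 2), (2, 3)

# np.arange(start, stop, step): you fix the step; stop is excluded
r = np.arange(0, 1, 0.25)   # values 0, 0.25, 0.5, 0.75

# np.linspace(start, stop, num): you fix the number of points; stop is included
s = np.linspace(0, 1, 5)    # values 0, 0.25, 0.5, 0.75, 1.0

print(e)
print(r, s)
```

For non-integer steps, `linspace` is usually the safer choice, because floating-point rounding can make the length of an `arange` result surprising.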

### Multidimensional ndarrays
These are easy to get wrong, so they get a section of their own.

```
# 3d array and its axes
c = np.arange(24).reshape(2, 3, 4)
print(c)
print('axis=0')
print(c.sum(axis=0))  # axis 0 runs across the two 3x4 blocks, so 0+12=12, 4+16=20
print('axis=1')
print(c.sum(axis=1))
print('axis=2')
print(c.sum(axis=2))
```

```
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
axis=0
[[12 14 16 18]
 [20 22 24 26]
 [28 30 32 34]]
axis=1
[[12 15 18 21]
 [48 51 54 57]]
axis=2
[[ 6 22 38]
 [54 70 86]]
```


```
b = np.arange(1, 13).reshape(3, 4)
print(b)
print(b.sum(axis=0))  # sum along axis 0, i.e. down each column
```

```
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[15 18 21 24]
```
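All the sums above follow one rule: `sum(axis=k)` adds elements along axis k, so axis k disappears from the result's shape. The sketch below checks this on the same 3d array `c`, writing each sum out by hand with ordinary slicing:

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# summing along axis k removes axis k from the shape
for k in range(c.ndim):
    print(k, c.sum(axis=k).shape)
# 0 (3, 4)
# 1 (2, 4)
# 2 (2, 3)

# axis=0 adds the two 3x4 blocks element by element
assert np.array_equal(c.sum(axis=0), c[0] + c[1])
# axis=1 adds the three rows inside each block
assert np.array_equal(c.sum(axis=1), c[:, 0, :] + c[:, 1, :] + c[:, 2, :])
# axis=2 adds the four columns inside each row
assert np.array_equal(c.sum(axis=2), c[:, :, 0] + c[:, :, 1] + c[:, :, 2] + c[:, :, 3])
```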


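### NumPy indexing
All you really need to understand are the following equivalences:
* `a[x,y] ==> a[x][y]` (when x and y are integers; with slices the two forms differ)
* `a[0] ==> a[0, :] ==> a[0, ...]`
* **dots** `...`: if a has 5 axes, then `a[1, ..., 2] ==> a[1, :, :, :, 2]`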
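The equivalences above can be checked with a few assertions. The sketch also shows the case where `a[x][y]` and `a[x, y]` disagree. `a[:][1]` first takes the whole array and then indexes axis 0, whereas `a[:, 1]` indexes axis 1.

```python
import numpy as np

a = np.arange(60).reshape(3, 4, 5)

# with integer indices, a[x, y, z] and a[x][y][z] pick the same element
assert a[1, 2, 3] == a[1][2][3]

# trailing axes left out behave like ':' (or '...')
assert np.array_equal(a[0], a[0, :])
assert np.array_equal(a[0], a[0, ...])

# with slices the forms differ: a[:][1] is just a[1]
print(a[:, 1].shape)  # (3, 5)
print(a[:][1].shape)  # (4, 5)

# '...' expands to as many ':' as needed (here 3, for 5 axes)
b = np.zeros((2, 3, 4, 5, 6))
assert b[1, ..., 2].shape == b[1, :, :, :, 2].shape == (3, 4, 5)
```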

```
a = np.arange(1, 10, 1)
print(a)
print(a[::-1])  # reversed
a[:6:2] = 10    # assigning a scalar to a slice sets every selected element (indices 0, 2, 4)
print(a)
```

```
[1 2 3 4 5 6 7 8 9]
[9 8 7 6 5 4 3 2 1]
[10  2 10  4 10  6  7  8  9]
```


```
def f(x, y):
    return 10*x + y

a = np.fromfunction(f, (5, 4), dtype=int)
a
# array([[ 0,  1,  2,  3],
#        [10, 11, 12, 13],
#        [20, 21, 22, 23],
#        [30, 31, 32, 33],
#        [40, 41, 42, 43]])

a[2, 3]      # 23
a[2][3]      # 23
a[0:5, 0]    # array([ 0, 10, 20, 30, 40])
a[1:3, 1:3]  # array([[11, 12],
             #        [21, 22]])
a[0]         # array([0, 1, 2, 3])
a[0, :]      # array([0, 1, 2, 3])
```

**Iterating over every element**  
`for element in a.flat:` visits every element of a, whatever its number of dimensions.
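Iterating over a multidimensional array directly and iterating over `a.flat` behave differently. The plain loop walks along the first axis and yields rows. `a.flat` yields individual elements in row-major (C) order. A short sketch:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# a plain loop walks along the first axis and yields rows
for row in a:
    print(row)
# [0 1 2]
# [3 4 5]

# a.flat yields every single element, in row-major order
values = [int(element) for element in a.flat]
print(values)  # [0, 1, 2, 3, 4, 5]
```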

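### NumPy shape manipulation
The following operations do not change the shape of a. Each returns a new array instead, and that array is often a view that shares a's data.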

```
a = np.arange(12).reshape(3, 4)
a
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11]])

a.reshape(12)     # 1 dimension, equivalent to a.ravel()
# array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

a.reshape(1, 12)  # 2 dimensions
# array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11]])

a.T
# array([[ 0,  4,  8],
#        [ 1,  5,  9],
#        [ 2,  6, 10],
#        [ 3,  7, 11]])

# resize, by contrast, modifies a itself
a.resize(2, 6)
a
# array([[ 0,  1,  2,  3,  4,  5],
#        [ 6,  7,  8,  9, 10, 11]])
```
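Two details are worth seeing in code. First, `-1` lets `reshape` infer one dimension. Second, a "new array" from `reshape` is usually a view of the same memory, so writing to it changes the original. This sketch uses `np.shares_memory` to confirm the sharing:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# reshape returns a new array object; a keeps its own shape
b = a.reshape(2, 6)
print(a.shape, b.shape)  # (3, 4) (2, 6)

# -1 asks NumPy to infer that dimension from the total size
print(a.reshape(-1).shape)     # (12,)
print(a.reshape(4, -1).shape)  # (4, 3)

# "new array" does not mean "new data": for a contiguous array,
# reshape usually returns a view that shares a's memory
print(np.shares_memory(a, b))  # True
b[0, 0] = 100
print(a[0, 0])                 # 100
```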

### NumPy stacking
* `np.vstack((a, b))`
* `np.hstack((a, b))`
* `a[:, np.newaxis]`: adds a new axis
* `np.concatenate`: look up the details with `help(np.concatenate)`
* `np.r_`
* `np.c_`
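To compare the stacking tools listed above side by side, this sketch applies each one to two small 1-D arrays and prints the resulting shapes:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack((a, b))   # stack as rows -> shape (2, 3)
h = np.hstack((a, b))   # join end to end -> shape (6,)

col = a[:, np.newaxis]  # add a new axis: shape (3,) -> (3, 1)

# concatenate joins along an existing axis (axis=0 by default)
m = np.concatenate((v, v), axis=1)  # shape (2, 6)

# np.r_ and np.c_ are index-style shortcuts for concatenation
r = np.r_[a, b]         # like hstack for 1-D input
c = np.c_[a, b]         # 1-D arrays become columns -> shape (3, 2)

print(v.shape, h.shape, col.shape, m.shape, r, c.shape)
# (2, 3) (6,) (3, 1) (2, 6) [1 2 3 4 5 6] (3, 2)
```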

```
np.full?  # IPython: show the docstring of np.full

e = np.full((3, 2), 7)
e
# array([[7, 7],
#        [7, 7],
#        [7, 7]])
```

```
text = np.random.random((3, 4))  # random values, so the numbers below vary per run
print(text)
print(text.max(axis=0))     # maximum of each column
print(text.argmax(axis=0))  # row index of each column's maximum
```

```
[[0.0564403  0.47493228 0.20812884 0.5467776 ]
 [0.1188882  0.34005564 0.80132848 0.6581348 ]
 [0.77168612 0.58976652 0.9918916  0.24498047]]
[0.77168612 0.58976652 0.9918916  0.6581348 ]
[2 2 2 1]
```


```
text.astype(int)  # every value lies in [0, 1), so truncation gives 0
# array([[0, 0, 0, 0],
#        [0, 0, 0, 0],
#        [0, 0, 0, 0]])
```
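The zeros above come from `astype(int)` truncating toward zero instead of rounding. If nearest-integer rounding is what you want, one option is to round first with `np.rint` and then convert:

```python
import numpy as np

x = np.array([-1.7, -0.2, 0.2, 1.7])

# astype(int) drops the fractional part (truncates toward zero)
print(x.astype(int))           # [-1  0  0  1]

# round to the nearest integer first, then convert
print(np.rint(x).astype(int))  # [-2  0  0  2]
```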

#### Checking whether two arrays are equal
* np.all(a == b)
* np.array_equal(a, b)
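The two checks are not interchangeable. `a == b` broadcasts, so `np.all(a == b)` can return True for arrays with different shapes. `np.array_equal` also requires the shapes to match. A small illustration:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2, 3],
              [1, 2, 3]])

# a == b compares a against each row of b, so every element is True
print(np.all(a == b))           # True

# array_equal also requires identical shapes
print(np.array_equal(a, b))     # False
print(np.array_equal(a, b[0]))  # True
```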
#### Fancy indexing (very important)
See the tutorial for the details.
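As a quick reminder of what the tutorial covers, here is a brief sketch of the two main forms of fancy indexing. The first indexes with an array of integer positions and the second with a boolean mask. The `np.arange(12) ** 2` setup follows the style of the tutorial's examples.

```python
import numpy as np

a = np.arange(12) ** 2    # 0, 1, 4, 9, ..., 121

# an integer index array picks those positions, in that order (repeats allowed)
i = np.array([1, 1, 3, 8])
print(a[i])               # [ 1  1  9 64]

# the shape of the index array decides the shape of the result
j = np.array([[3, 4], [9, 7]])
print(a[j].shape)         # (2, 2)

# a boolean mask keeps the elements where the mask is True
print(a[a > 50])          # [ 64  81 100 121]

# fancy indexing on the left of '=' assigns to the selected elements
a[a > 50] = 0
print(a.max())            # 49
```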

### ndarray split

```
a = np.floor(10*np.random.random((4, 6)))  # random digits, so the values vary per run
print(a)
for piece in np.hsplit(a, 3):  # split into 3 equal-width arrays
    print(piece)
np.hsplit(a, (3, 2))  # cut before column 3 and before column 2; since 2 < 3 the middle piece is empty
```

```
[[8. 6. 8. 9. 7. 8.]
 [7. 4. 1. 3. 0. 2.]
 [2. 5. 9. 9. 1. 6.]
 [9. 3. 2. 5. 3. 8.]]
[[8. 6.]
 [7. 4.]
 [2. 5.]
 [9. 3.]]
[[8. 9.]
 [1. 3.]
 [9. 9.]
 [2. 5.]]
[[7. 8.]
 [0. 2.]
 [1. 6.]
 [3. 8.]]

[array([[8., 6., 8.],
        [7., 4., 1.],
        [2., 5., 9.],
        [9., 3., 2.]]),
 array([], shape=(4, 0), dtype=float64),
 array([[8., 9., 7., 8.],
        [1., 3., 0., 2.],
        [9., 9., 1., 6.],
        [2., 5., 3., 8.]])]
```
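Random data makes the split above hard to check, so here is a deterministic sketch on `np.arange`. It shows that a tuple passed to `np.hsplit` lists the column indices to cut before. With increasing indices every piece is non-empty. With decreasing indices such as `(3, 2)`, the middle piece is `a[:, 3:2]`, which is empty.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# an integer N splits into N equal-width pieces along axis 1
pieces = np.hsplit(a, 3)
print([p.shape for p in pieces])              # [(4, 2), (4, 2), (4, 2)]

# a tuple gives the column indices to cut before; keep them increasing
left, middle, right = np.hsplit(a, (2, 3))
print(left.shape, middle.shape, right.shape)  # (4, 2) (4, 1) (4, 3)

# decreasing indices produce an empty middle slice a[:, 3:2]
print(np.hsplit(a, (3, 2))[1].shape)          # (4, 0)
```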

```
a = np.arange(12).reshape(2, 6)
b = a.view()
print(a)
# b.shape = 3, 4  # would reshape b in place; a keeps its (2, 6) shape
b.reshape(3, 4)   # returns a new array that is discarded here; neither b nor a changes
print(b)
print(a)
```

```
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]
```
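A view has its own shape, but it shares its data with the original array. Writing through `a.view()` therefore changes `a`, whereas `a.copy()` is fully independent. This sketch checks both cases with `np.shares_memory`:

```python
import numpy as np

a = np.arange(12).reshape(2, 6)

b = a.view()                   # a new array object that shares a's data
print(b is a)                  # False: different objects
print(np.shares_memory(a, b))  # True: same underlying data

b[0, 3] = 1234                 # writing through the view changes a
print(a[0, 3])                 # 1234

d = a.copy()                   # a deep copy has its own data
d[0, 0] = 9999
print(a[0, 0])                 # 0: a is unaffected
```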


### Functions and Methods Overview
This section of the tutorial is important. It works as a cheat sheet in its own right.