# NumPy

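## Introduction to NumPy
1. A powerful N-dimensional array structure, `ndarray`
2. A mature library of functions
3. Tools for integrating C/C++ and other code
4. Practical linear algebra, Fourier transform, and random number modules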

NumPy must be imported before it can be used.

In [2]:
import numpy as np
my_array = np.array([1,2,3,4,5])
my_array

array([1, 2, 3, 4, 5])

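In mathematics, a vector (also called a Euclidean vector or geometric vector) is a quantity that has both magnitude and direction.  
In NumPy, a vector is a one-dimensional array, which can be created with the `arange` function.  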

In [6]:
vec = np.arange(10)
vec  # arange uses a half-open interval: start is included, stop is excluded

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [7]:
type(vec)

numpy.ndarray

In [8]:
vec.dtype

dtype('int64')

In [9]:
vec*10

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [11]:
np.linspace(1,19,10)

array([ 1.,  3.,  5.,  7.,  9., 11., 13., 15., 17., 19.])

In [12]:
np.linspace(1,19,10,endpoint = False)

array([ 1. ,  2.8,  4.6,  6.4,  8.2, 10. , 11.8, 13.6, 15.4, 17.2])
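The difference between `arange` and `linspace` is easy to mix up, so here is a minimal sketch that puts them side by side. The variable names are illustrative only.

```python
import numpy as np

# arange takes a step and excludes the stop value (half-open interval).
by_step = np.arange(1, 19, 2)

# linspace takes the number of points and includes the stop value by default.
by_count = np.linspace(1, 19, 10)

# With endpoint=False the stop value is dropped, so the spacing shrinks.
no_endpoint = np.linspace(1, 19, 10, endpoint=False)

print(by_step)                          # odd integers from 1 up to 17
print(by_count[-1])                     # the stop value, 19.0
print(no_endpoint[1] - no_endpoint[0])  # spacing is (19 - 1) / 10
```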

In [13]:
np.zeros(20, dtype=int)  # np.int was removed in NumPy 1.24; use the built-in int

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [14]:
np.ones(20, dtype=int)  # np.int was removed in NumPy 1.24; use the built-in int

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [15]:
np.random.randn(10)  # 10 samples from the standard normal distribution

array([ 0.20368586,  1.02255915,  0.38929251, -1.92392919,  1.10320389,
        0.75780325,  0.47957192, -1.29807242,  0.60620074,  0.9465564 ])

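## Indexing and Slicing

As with sequences in the Python standard library, NumPy arrays are indexed and sliced with square brackets `[]`. Use `:` to separate the start, stop, and step, and use `,` to separate the dimensions.  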

In [16]:
a = np.arange(1,10,1)
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
a[1]

2

In [18]:
a[1:3]  # half-open: index 1 is included, index 3 is excluded

array([2, 3])

In [20]:
a[:3]  # from the start up to, but not including, index 3

array([1, 2, 3])

In [21]:
a[-2]  # negative indices count from the end; -1 is the last element

8

In [24]:
a[::-1]  # start and stop omitted, step -1: the whole array in reverse

array([9, 8, 7, 6, 5, 4, 3, 2, 1])

In [25]:
b = np.arange(24).reshape(2,3,4)
b

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [26]:
b[1,1,3]

19

In [27]:
b[0,1,:]

array([4, 5, 6, 7])

In [28]:
b[0,2]

array([ 8,  9, 10, 11])

In [29]:
b[1]

array([[12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [30]:
b[:,1]

array([[ 4,  5,  6,  7],
       [16, 17, 18, 19]])

In [31]:
b[:,:,1]

array([[ 1,  5,  9],
       [13, 17, 21]])
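The examples above use integer indices and bare `:`. As a small sketch of the remaining pieces of the slicing syntax, the block below adds a step, contrasts an integer index with a slice, and uses `...` (Ellipsis). The variable names are illustrative only.

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

# start:stop:step along the last axis, everything along the first two axes.
every_other_column = b[:, :, ::2]

# An integer index drops that axis; a slice keeps it.
row_as_1d = b[0, 1, :]      # shape (4,)
row_as_3d = b[0:1, 1:2, :]  # shape (1, 1, 4)

# Ellipsis (...) stands for "all remaining axes".
last_column = b[..., -1]

print(every_other_column.shape)
print(row_as_1d.shape, row_as_3d.shape)
print(last_column)
```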

Boolean (logical) indexing selects elements by means of a boolean array or a condition.

In [32]:
c = np.arange(1, 20, 1)
c

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

In [33]:
c[c>=15]

array([15, 16, 17, 18, 19])

In [34]:
c[~(c>=15)]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [35]:
c[(c>=5)&(c<=15)]

array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
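To make the mechanism behind these conditions visible, the following sketch builds the boolean mask explicitly, combines conditions with `|`, and uses a mask on the left-hand side of an assignment. The names `mask`, `outside`, and `clipped` are illustrative only.

```python
import numpy as np

c = np.arange(1, 20, 1)

# A comparison produces a boolean array with the same shape as c.
mask = c >= 15
print(mask.dtype)  # bool
print(mask.sum())  # number of True entries

# Combine conditions with & (and), | (or), ~ (not).
# The parentheses are required: & and | bind tighter than comparisons.
outside = c[(c < 5) | (c > 15)]
print(outside)

# A boolean mask also works on the left-hand side of an assignment.
clipped = c.copy()
clipped[clipped > 15] = 15
print(clipped.max())
```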

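## Working with Matrices
The matrix is another data type provided by NumPy. An array can be converted to a matrix with `mat` (an alias of `asmatrix` that was removed under that name in NumPy 2.0) or with `matrix`. Note that the NumPy documentation no longer recommends the `matrix` class and suggests regular arrays instead.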

In [36]:
mat1 = [[1,2,3],[4,5,6]]
np.asmatrix(mat1)  # np.mat was removed in NumPy 2.0; asmatrix is the same function

matrix([[1, 2, 3],
        [4, 5, 6]])

In [37]:
np.asmatrix(mat1) * 8  # multiply every element by a scalar

matrix([[ 8, 16, 24],
        [32, 40, 48]])

In [39]:
mat2 = np.asmatrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mat2

matrix([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [40]:
np.asmatrix(mat1) * mat2  # matrix multiplication: (2x3) times (3x3) gives (2x3)

matrix([[30, 36, 42],
        [66, 81, 96]])
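The same product can be computed with regular arrays. The following is a minimal sketch that uses the `@` operator on `ndarray` objects and contrasts it with `*`, which is element-wise for arrays. The names `a` and `b` are illustrative and stand in for `mat1` and `mat2`.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])             # shape (2, 3)
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)

# For ndarrays, @ is matrix multiplication ...
product = a @ b
print(product)

# ... while * multiplies element by element and needs compatible shapes.
elementwise = a * a
print(elementwise)
```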

In [42]:
mat2.T  # matrix transpose

matrix([[1, 4, 7],
        [2, 5, 8],
        [3, 6, 9]])

In [43]:
mat3 = np.asmatrix(np.zeros((3, 3)))
mat3

matrix([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

In [45]:
mat4 = np.asmatrix(np.ones((2, 4)))
mat4

matrix([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [46]:
mat5 = np.asmatrix(np.random.rand(2, 2))
mat5

matrix([[0.09332519, 0.14980131],
        [0.52108594, 0.24647676]])

In [47]:
mat6 = np.asmatrix(np.eye(2, 2, dtype=int))
mat6

matrix([[1, 0],
        [0, 1]])

In [51]:
a1=[1,2,3]
a2 = np.asmatrix(np.diag(a1))
a2

matrix([[1, 0, 0],
        [0, 2, 0],
        [0, 0, 3]])

## Basic Project Exercise

Read in the data:
```python
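c, v = np.loadtxt('datalab/28801/data.csv', delimiter=',', usecols=(6, 7), unpack=True)
# delimiter: the field separator; usecols: the columns to read;
# unpack=True: return each column as its own array, assigned here to c and v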
```
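The data file itself is not included here, so the following sketch shows how `usecols` and `unpack` behave on a small in-memory CSV. The rows, the symbol `AAA`, and the column layout are made up for illustration and are an assumption, not the contents of `data.csv`.

```python
import io
import numpy as np

# Hypothetical three-row CSV. Column 6 plays the role of the closing price
# and column 7 the volume; all values are made up for illustration.
csv_text = (
    "AAA,2011-01-28,0,10.0,11.0,9.5,10.5,1000\n"
    "AAA,2011-01-31,0,10.5,11.5,10.0,11.0,2000\n"
    "AAA,2011-02-01,0,11.0,12.0,10.5,11.5,1500\n"
)

# unpack=True transposes the result, so each column becomes its own array.
close, volume = np.loadtxt(
    io.StringIO(csv_text), delimiter=",", usecols=(6, 7), unpack=True
)
print(close)
print(volume)

# Without unpack, the same call returns one 2-D array of shape (rows, columns).
table = np.loadtxt(io.StringIO(csv_text), delimiter=",", usecols=(6, 7))
print(table.shape)
```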

In [64]:
c = np.array([336.1,339.32,345.03,344.32,343.44,346.5,351.88,355.2,358.16,354.54,356.85,359.18,359.9,363.13,358.3,
            350.56,338.61,342.62,342.88,348.16,353.21,349.31,352.12,359.56,360.,355.76,352.47,321.5,360.54,355.5])
v = np.array([21144800.,13473000.,15236800.,9242600.,14064100.,11494200.,17322100.,13608500.,17240800.,33162400.,
             13127500.,11086200.,10149000.,17184100.,18949000.,29144500.,31162200.,23994700.,17853500.,13572000.,
             14395400.,16290300.,21521000.,17885200.,16188000.,19504300.,12718000.,16192700.,18138800.,16824200.])
c,v

(array([336.1 , 339.32, 345.03, 344.32, 343.44, 346.5 , 351.88, 355.2 ,
        358.16, 354.54, 356.85, 359.18, 359.9 , 363.13, 358.3 , 350.56,
        338.61, 342.62, 342.88, 348.16, 353.21, 349.31, 352.12, 359.56,
        360.  , 355.76, 352.47, 321.5 , 360.54, 355.5 ]),
 array([21144800., 13473000., 15236800.,  9242600., 14064100., 11494200.,
        17322100., 13608500., 17240800., 33162400., 13127500., 11086200.,
        10149000., 17184100., 18949000., 29144500., 31162200., 23994700.,
        17853500., 13572000., 14395400., 16290300., 21521000., 17885200.,
        16188000., 19504300., 12718000., 16192700., 18138800., 16824200.]))

In [65]:
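# VWAP (volume-weighted average price): the larger the volume traded at a price,
# the more weight that price carries, so VWAP is the mean of the prices weighted by volume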
vwap = np.average(c,weights=v)
vwap

350.15861758074186
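A weighted average is the sum of each value times its weight, divided by the sum of the weights. The sketch below checks this against `np.average` on a small made-up sample (not the data set above); the names `prices` and `volumes` are illustrative.

```python
import numpy as np

# Small made-up sample, not the data set used above.
prices = np.array([10.0, 11.0, 12.0])
volumes = np.array([100.0, 300.0, 600.0])

# Weighted average written out by hand.
manual = np.sum(prices * volumes) / np.sum(volumes)

# The same quantity computed by NumPy.
vwap = np.average(prices, weights=volumes)

print(manual, vwap)     # both give the volume-weighted price
print(np.mean(prices))  # the unweighted mean ignores volume
```

Because most of the volume sits at the highest price, the weighted value is pulled above the plain mean.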

In [66]:
# Arithmetic mean
mean = np.mean(c)
mean

350.4883333333334

In [67]:
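# TWAP (time-weighted average price), another kind of average price.
# Here the weights grow linearly with position, so recent prices count more
# and the first price gets weight 0.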
t = np.arange(len(c))
twap = np.average(c, weights=t)
twap


351.45917241379306

In [80]:
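# Random placeholder data for the high (h) and low (l) prices.
# Note that with these ranges every "low" value exceeds every "high" value.
h = np.random.randint(320,360,30)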
l = np.random.randint(400,450,30)
h,l

(array([321, 357, 344, 348, 356, 324, 344, 330, 352, 357, 325, 326, 340,
        321, 337, 338, 358, 325, 321, 334, 330, 324, 345, 356, 337, 350,
        337, 355, 353, 351]),
 array([425, 425, 427, 433, 418, 444, 411, 447, 436, 432, 431, 407, 438,
        441, 448, 448, 403, 408, 435, 444, 437, 404, 449, 440, 448, 430,
        404, 420, 431, 405]))

In [81]:
# Maximum and minimum:
# the largest of the high prices and the smallest of the low prices
highest = np.max(h)
lowest = np.min(l)
highest,lowest

(358, 403)

In [83]:
# ptp ("peak to peak") returns the range of an array: its maximum minus its minimum
spread_high_price = np.ptp(h)
spread_low_price = np.ptp(l)
spread_high_price,spread_low_price

(37, 46)
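As a quick check of that definition, this sketch compares `np.ptp` with an explicit maximum-minus-minimum on the first five values of `h` shown above. The name `values` is illustrative.

```python
import numpy as np

values = np.array([321, 357, 344, 348, 356])  # first five entries of h above

spread = np.ptp(values)
by_hand = values.max() - values.min()

print(spread, by_hand)  # the two results agree
```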

In [84]:
# Simple statistics
median = np.median(c)  # median of the closing prices
median

352.295

In [85]:
variance = np.var(c)  # variance of the closing prices (population variance, ddof=0)
variance

81.29190055555559
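By default `np.var` computes the population variance, the mean of the squared deviations from the mean. The sketch below writes that out by hand on a small made-up array and also shows the `ddof` parameter, which switches to the sample variance. The array `x` is illustrative only.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample

# Population variance written out: mean of squared deviations from the mean.
manual_var = np.mean((x - x.mean()) ** 2)
print(manual_var, np.var(x))  # the two values agree

# ddof=1 divides by n - 1 instead of n (sample variance).
sample_var = np.var(x, ddof=1)
print(sample_var)
```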

In [86]:
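# Simple returns: the relative change between each pair of adjacent prices
# (not to be confused with log returns, which are differences of log prices)
returns = np.diff(c)/c[:-1]  # diff subtracts each element from the one after it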
returns

array([ 0.00958048,  0.01682777, -0.00205779, -0.00255576,  0.00890985,
        0.0155267 ,  0.00943503,  0.00833333, -0.01010721,  0.00651548,
        0.00652935,  0.00200457,  0.00897472, -0.01330102, -0.02160201,
       -0.03408832,  0.01184253,  0.00075886,  0.01539897,  0.01450483,
       -0.01104159,  0.00804443,  0.02112916,  0.00122372, -0.01177778,
       -0.00924781, -0.08786563,  0.12143079, -0.01397903])

In [87]:
# Standard deviation of the simple returns, computed with std
std_deviation = np.std(returns)
std_deviation

0.030447566955357278

In [88]:
# Log returns: take the log of every price, then the difference between neighbours
logreturns = np.diff(np.log(c))
logreturns

array([ 0.00953488,  0.01668775, -0.00205991, -0.00255903,  0.00887039,
        0.01540739,  0.0093908 ,  0.0082988 , -0.01015864,  0.00649435,
        0.00650813,  0.00200256,  0.00893468, -0.01339027, -0.02183875,
       -0.03468287,  0.01177296,  0.00075857,  0.01528161,  0.01440064,
       -0.011103  ,  0.00801225,  0.02090904,  0.00122297, -0.01184769,
       -0.00929083, -0.09196797,  0.11460536, -0.01407766])
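Simple and log returns are closely related: a log return equals `log(1 + simple return)`, and log returns add up across periods. The sketch below checks both facts on the first four closing prices from `c`; the variable names are illustrative.

```python
import numpy as np

prices = np.array([336.1, 339.32, 345.03, 344.32])  # first four values of c

simple = np.diff(prices) / prices[:-1]
log_ret = np.diff(np.log(prices))

# Each log return equals log(1 + simple return).
print(np.allclose(log_ret, np.log1p(simple)))

# Log returns are additive: their sum is the log return over the whole span.
total = np.log(prices[-1] / prices[0])
print(np.isclose(log_ret.sum(), total))
```

For small price changes the two kinds of return are nearly equal, which is why the two output arrays above look so similar.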

In [90]:
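# Volatility, annualized on the assumption of 252 trading days per year.
# Note: this version divides the standard deviation by the mean log return
# (a coefficient of variation); the more common definition omits that division.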
annual_volatility = np.std(logreturns)/np.mean(logreturns)
annual_volatility = annual_volatility/np.sqrt(1./252.)
annual_volatility

246.1051999253604
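For comparison, here is a minimal sketch of the more widely used convention, in which the standard deviation of the daily log returns is multiplied by the square root of the number of trading periods per year. This is offered as an alternative under that assumption, not as the calculation used above. The function name `annualized_volatility` and its `periods_per_year` parameter are hypothetical.

```python
import numpy as np

def annualized_volatility(prices, periods_per_year=252):
    """Standard deviation of log returns, scaled to a yearly horizon.

    Hypothetical helper for illustration; assumes evenly spaced prices.
    """
    log_returns = np.diff(np.log(prices))
    return np.std(log_returns) * np.sqrt(periods_per_year)

prices = np.array([336.1, 339.32, 345.03, 344.32, 343.44, 346.5])  # first six values of c
vol = annualized_volatility(prices)
print(vol)

# A constant price series has no volatility at all.
flat = annualized_volatility(np.array([5.0, 5.0, 5.0]))
print(flat)
```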