# Python for Data Analysis
Official documentation: http://pandas-docs.github.io/pandas-docs-travis/  
Reference books:  
[Python for Data Analysis](d:\Google Drive\Code_HELP\Python\Textbook General\Python for Data Analysis.Oreilly.pdf)  
[《利用Python进行数据分析》 (Chinese edition of Python for Data Analysis)](d:\Google Drive\Code_Help\Python\Textbook General\Python for Data Analysis.中文.扫描.pdf)

In [2]:
# Import commonly used libraries
import pandas as pd
from pandas import DataFrame, Series
import numpy as np
import matplotlib.pyplot as plt
pd.options.display.max_rows = 11

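# NumPy Basics
## The NumPy ndarray: A Multidimensional Array Object

An ndarray is a generic multidimensional container for homogeneous data: all of its elements must be of the same type.  
Every array has a `shape` (a tuple giving the size of each dimension) and a `dtype` (an object describing the data type of its elements).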

In [3]:
data = np.array([[0.9526, -0.246, -0.8856], [0.5639, 0.2379, 0.9104]])

In [4]:
data.shape

(2, 3)

In [5]:
data.dtype

dtype('float64')
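
To illustrate the "same type" rule, here is a minimal sketch of how NumPy picks one common dtype for mixed input. The variable names (`mixed`, `as_float32`, `as_int`) are made up for this example.

```python
import numpy as np

# Mixed ints and floats are upcast to a single common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)        # float64

# A dtype can also be requested explicitly at construction time.
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32.dtype)   # float32

# astype returns a new array; float -> int conversion truncates the decimals.
as_int = mixed.astype(np.int64)
print(as_int)             # [1 2 3]
```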

Use `np.array` to build a new array from a sequence; `np.zeros` and `np.ones` create arrays of all 0s or all 1s with the given length or shape.

In [6]:
np.zeros(10)

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [7]:
np.ones((3,6))

array([[ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.]])

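### Basic Indexing and Slicing
The most important difference from lists is that array slices are views on the original array. The data is not copied, so any modification to a view is reflected directly in the source array: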

In [8]:
l = [0,1,2,3,4,5,6,7,8,9,10]
a = np.array(l)

In [9]:
l_slice = l[3:5]
a_slice = a[3:5]

In [10]:
l_slice[0] = 12345
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [11]:
a_slice[0] = 12345
a

array([    0,     1,     2, 12345,     4,     5,     6,     7,     8,
           9,    10])
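
When an independent copy is wanted instead of a view, the slice can be copied explicitly. The sketch below contrasts the two; the names `view` and `independent` are chosen only for this example.

```python
import numpy as np

a = np.arange(11)
view = a[3:5]                  # a view: shares memory with a
independent = a[3:5].copy()    # an explicit copy: owns its own data

independent[0] = 12345         # does not touch a
print(a[3])                    # 3

view[0] = 12345                # writes through to a
print(a[3])                    # 12345

print(np.shares_memory(a, view), np.shares_memory(a, independent))  # True False
```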

Slice indexing works along one or more axes, and slices can be mixed with integer indexes:

In [12]:
array2d = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [13]:
array2d[:2, 1:]

array([[2, 3],
       [5, 6]])

In [14]:
array2d[:2, 1:] = 0
array2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])
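
Mixing an integer index with a slice behaves slightly differently from using two slices: the integer drops its axis. A small sketch, starting from a fresh `array2d` so the result does not depend on the assignment above:

```python
import numpy as np

array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# An integer on axis 0 removes that axis, so the result is 1-D.
row = array2d[1, :2]
print(row, row.shape)        # [4 5] (2,)

# A length-1 slice on axis 0 keeps the axis, so the result stays 2-D.
block = array2d[1:2, :2]
print(block, block.shape)    # [[4 5]] (1, 2)
```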

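### Boolean Indexing
Comparing an array with a scalar is vectorized and yields a boolean array of the same shape: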


In [15]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [16]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], 
      dtype='<U4')

In [17]:
names == 'Bob'

array([ True, False, False,  True, False, False, False], dtype=bool)
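
A boolean array like this can then be used as an index to select rows from another array of matching length. The sketch below assumes a hypothetical `values` array with one row per name; it is not part of the original notes.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
values = np.arange(28).reshape(7, 4)   # hypothetical data, one row per name

mask = names == 'Bob'
bob_rows = values[mask]                # selects rows 0 and 3
print(bob_rows)

# ~ negates a mask; combine masks with | and & (the keywords or/and do not work here).
not_bob = values[~mask]
either = values[(names == 'Bob') | (names == 'Will')]
print(not_bob.shape, either.shape)     # (5, 4) (4, 4)
```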

# Pandas: Getting Started

# Data loading, Storage, and File Formats

# Data Wrangling: Clean, Transform, Merge, Reshape

# Plotting and Visualization

# Data Aggregation and Group Operations

# Time Series

# Financial and Economic Data Applications

# Advanced Numpy

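# Appendix: Python Language Essentials

## Variables and Pass-by-Reference
When you assign a variable in Python, you create a reference to the object on the right-hand side of the equals sign. If `a` is assigned to a new variable `b`, both names refer to the same object, so modifying that object through `a` also changes what `b` sees: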

In [18]:
a = [1, 2, 3]
b = a
a.append(4)
b

[1, 2, 3, 4]
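
A quick way to check whether two names share one object is the `is` operator. The sketch below also shows that `list(a)` makes a new (shallow) copy that no longer follows changes to `a`; the name `c` is introduced only for this example.

```python
a = [1, 2, 3]
b = a          # alias: b and a are the same object
c = list(a)    # shallow copy: a new list with the same elements

a.append(4)

print(b is a, c is a)   # True False
print(b, c)             # [1, 2, 3, 4] [1, 2, 3]
```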

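When an object is passed to a function as an argument, only a reference is passed; nothing is copied. A Python function can therefore mutate the contents of its arguments.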

In [19]:
def append_element(some_list, element):
    some_list.append(element)

In [20]:
data = [1, 2, 3]
append_element(data, 4)
data

[1, 2, 3, 4]
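
Mutating an argument is different from rebinding the parameter name: assignment inside a function only changes the local name. A minimal sketch, where `rebind` is a hypothetical helper added for contrast:

```python
def append_element(some_list, element):
    some_list.append(element)   # mutates the caller's object in place

def rebind(some_list):
    some_list = [0]             # rebinds the local name only
    return some_list

data = [1, 2, 3]
append_element(data, 4)
result = rebind(data)

print(data)     # [1, 2, 3, 4]
print(result)   # [0]
```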

## Dates and Times
The built-in `datetime` module provides the `datetime`, `date`, and `time` types.

In [21]:
from datetime import datetime, date, time

In [22]:
dt = datetime(2011, 10, 29, 20, 30, 21)
dt

datetime.datetime(2011, 10, 29, 20, 30, 21)

In [23]:
dt.day

29

In [24]:
dt.date()

datetime.date(2011, 10, 29)

`strftime` formats a `datetime` as a string:

In [25]:
dt.strftime('%m/%d/%Y %H:%M')

'10/29/2011 20:30'

`strptime` parses a string into a `datetime` object:

In [26]:
datetime.strptime('20091031', '%Y%m%d')

datetime.datetime(2009, 10, 31, 0, 0)

In [27]:
# Replace fields of a datetime (returns a new object); this also works for date
dt.replace(minute=0, second=0)

datetime.datetime(2011, 10, 29, 20, 0)
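
The pieces above fit together as a round trip: formatting with `strftime` and parsing with `strptime` using the same format string recovers the original value. The sketch also shows that `replace` leaves the original object unchanged. The format string `fmt` is just one possible choice.

```python
from datetime import datetime

dt = datetime(2011, 10, 29, 20, 30, 21)
fmt = '%Y-%m-%d %H:%M:%S'

text = dt.strftime(fmt)                 # datetime -> str
parsed = datetime.strptime(text, fmt)   # str -> datetime
print(text)                             # 2011-10-29 20:30:21
print(parsed == dt)                     # True

# datetime objects are immutable: replace builds a new one.
on_the_hour = dt.replace(minute=0, second=0)
print(dt.minute, on_the_hour.minute)    # 30 0
```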

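## Enumerate
`enumerate` returns the position and the value together as `(i, value)` tuples, which is handy for building a mapping from values to their positions: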

In [28]:
some_list = ['foo', 'bar', 'baz']
mapping = dict((v, i) for i, v in enumerate(some_list))
mapping

{'bar': 1, 'baz': 2, 'foo': 0}
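
For completeness, a short sketch of `enumerate` in a plain loop, plus its optional `start` argument for counting from something other than 0. The names `seen` and `numbered` are illustrative.

```python
some_list = ['foo', 'bar', 'baz']

# enumerate replaces a hand-maintained counter variable.
seen = []
for i, value in enumerate(some_list):
    seen.append((i, value))
print(seen)       # [(0, 'foo'), (1, 'bar'), (2, 'baz')]

# start shifts the first index.
numbered = list(enumerate(some_list, start=1))
print(numbered)   # [(1, 'foo'), (2, 'bar'), (3, 'baz')]
```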

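## \*args, \*\*kwargs
When you write `func(a, b, c, d=some, e=value)` in Python, the positional and keyword arguments are packed into a `tuple` and a `dict`, respectively. The function actually receives a `tuple` named `args` and a `dict` named `kwargs`.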

In [29]:
def say_hello_then_call_f(f, *args, **kwargs):
    print('args is', args)
    print('kwargs is ', kwargs)
    print("Hello! Now I'm going to call %s" % f)
    return f(*args, **kwargs)

def g(x, y, z=1):
    return (x + y) / z

In [30]:
say_hello_then_call_f(g, 1, 2, z=5.)

args is (1, 2)
kwargs is  {'z': 5.0}
Hello! Now I'm going to call <function g at 0x00000049AF5D0F28>


0.6
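
The packing also works in reverse: `*` and `**` at the call site unpack an existing tuple and dict into arguments. A small sketch, where `show_call` is a hypothetical helper used only to inspect what a function receives:

```python
def show_call(*args, **kwargs):
    # Hypothetical helper: report the container types and their contents.
    return type(args).__name__, type(kwargs).__name__, args, kwargs

print(show_call(1, 2, z=5.0))
# ('tuple', 'dict', (1, 2), {'z': 5.0})

def g(x, y, z=1):
    return (x + y) / z

# Unpacking at the call site: equivalent to g(1, 2, z=5.0)
packed_args = (1, 2)
packed_kwargs = {'z': 5.0}
print(g(*packed_args, **packed_kwargs))   # 0.6
```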

## Currying: Partial Argument Application
Currying means deriving new functions from existing ones by partial argument application.

In [31]:
def add_numbers(x, y):
    return x + y

add_five = lambda y: add_numbers(5, y)

In [32]:
add_five(3)

8
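
The standard library offers `functools.partial` for the same purpose. The sketch below is an equivalent way to build `add_five` without writing a lambda:

```python
from functools import partial

def add_numbers(x, y):
    return x + y

# Fix the first positional argument (x=5); the result takes only y.
add_five = partial(add_numbers, 5)

print(add_five(3))   # 8
```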

This technique is convenient when the time interval varies, for example when computing 60-day and 30-day moving averages:

In [33]:
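# compute 60-day moving average of time series x
# (pandas.rolling_mean was removed from pandas; use the .rolling() method)
ma60 = lambda x: x.rolling(60).mean()
# take the 60-day moving average of all time series in a DataFrame;
# ts is illustrative random data, because data above is a plain list
ts = DataFrame(np.random.randn(100, 3), columns=['a', 'b', 'c'])
ts.apply(ma60).shape

(100, 3)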
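
One possible way to make the window length a parameter is to write a single helper and derive the 30-day and 60-day versions from it with `functools.partial`. This is a sketch under assumptions: `moving_average` and the `prices` DataFrame are hypothetical, and it uses the `.rolling()` method available in current pandas.

```python
from functools import partial

import numpy as np
import pandas as pd

def moving_average(x, window):
    # Hypothetical helper: rolling mean of a Series over `window` observations.
    return x.rolling(window).mean()

ma30 = partial(moving_average, window=30)
ma60 = partial(moving_average, window=60)

# Illustrative data: two columns of evenly increasing values.
prices = pd.DataFrame(np.arange(200, dtype=float).reshape(100, 2),
                      columns=['a', 'b'])

smoothed = prices.apply(ma30)        # applied column by column
print(smoothed.shape)                # (100, 2)
# The first window - 1 rows have no full window yet, so they are NaN.
print(smoothed['a'].isna().sum())    # 29
```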

## Basic Slice
Python counts from 0, and `data[a:b]` includes the element at index `a` but excludes the element at index `b`.
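
A short sketch of the half-open convention on a plain list. The values are chosen so that each element equals its own index.

```python
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

part = data[2:5]
print(part)         # [2, 3, 4]  (index 5 is excluded)
print(len(part))    # 3, i.e. 5 - 2

# Because the end is excluded, adjacent slices join without overlap.
print(data[:3] + data[3:] == data)   # True
```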