In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [4]:
s = pd.Series([1,3,5,np.nan,6,8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

In [5]:
pd.Series?

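A Series is a one-dimensional array-like object. It consists of an array of data (of any NumPy data type) and an associated array of data labels, called its index.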
```
pd.Series(self, data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
Parameters
----------
data : array-like, dict, or scalar value
    Contains data stored in Series
index : array-like or Index (1d)
    Values must be unique and hashable, same length as data. Index
    object (or other iterable of same length as data) Will default to
    RangeIndex(len(data)) if not provided. If both a dict and index
    sequence are used, the index will override the keys found in the
    dict.
dtype : numpy.dtype or None
    If None, dtype will be inferred
copy : boolean, default False
    Copy input data
```

In [6]:
s.values

array([  1.,   3.,   5.,  nan,   6.,   8.])

In [7]:
s.index

RangeIndex(start=0, stop=6, step=1)

In [8]:
s = pd.Series([1,3,5,8],index = ['v','d','f','g'])
s

v    1
d    3
f    5
g    8
dtype: int64

## Unlike a NumPy array, a Series lets you select one value or a set of values by index label

In [10]:
print(s['d'])
print(s[['g','d']])

3
g    8
d    3
dtype: int64
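
A minimal sketch of the selection forms, reusing the Series from the cell above. The variable names (`single`, `subset`, `by_loc`, `by_pos`) are illustrative only. `loc` and `iloc` are the explicit label-based and position-based indexers.

```python
import pandas as pd

# Same Series as in the cell above.
s = pd.Series([1, 3, 5, 8], index=['v', 'd', 'f', 'g'])

single = s['d']           # a single label returns a scalar
subset = s[['g', 'd']]    # a list of labels returns a Series, in the requested order
by_loc = s.loc['d']       # explicit label-based access
by_pos = s.iloc[1]        # explicit position-based access (second element)

print(single, by_loc, by_pos)
print(subset)
```

Spelling out `loc` or `iloc` avoids ambiguity when the index labels are themselves integers.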


## NumPy array operations work on a Series and preserve the link between index and value

In [11]:
s[s>3]

f    5
g    8
dtype: int64

In [12]:
s*2

v     2
d     6
f    10
g    16
dtype: int64
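
As a further sketch of the same idea, a NumPy universal function such as `np.sqrt` can be applied directly to a Series. The result is again a Series carrying the original labels. The names `roots` and `filtered` are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, 8], index=['v', 'd', 'f', 'g'])

roots = np.sqrt(s)     # element-wise ufunc; the index is carried over unchanged
filtered = s[s > 3]    # a boolean mask keeps the labels of the rows that pass

print(roots)
print(filtered)
```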

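## A Series can be thought of as a fixed-length ordered dict that maps index labels to values, so it can be used in many places where a dict is expected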

In [13]:
'g' in s

True
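
The dict analogy can be sketched a little further. Note that `in` tests the index labels, not the values, just as it tests the keys of a dict. The variable names below are illustrative.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 8], index=['v', 'd', 'f', 'g'])

has_label = 'g' in s             # membership is checked against the index labels
has_value_as_label = 8 in s      # 8 is a value, not a label
has_value = 8 in s.values        # check the underlying values explicitly
fallback = s.get('z', 0)         # like dict.get: returns the default for a missing label
as_dict = s.to_dict()            # convert back to a plain dict

print(has_label, has_value_as_label, has_value, fallback)
print(as_dict)
```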

## A Series can be created from a dict

s = pd.Series(sdata), where sdata is a dict (see the next cell)

## The pandas functions isnull and notnull detect missing data

In [19]:
# dic is a list of index labels; 'e' has no entry in sdata, so it becomes NaN
dic = ['v','d','f','e']
sdata = {'v':1,'d':2,'f':2,"g":3}
obj1 = pd.Series(sdata)
print(obj1)
obj = pd.Series(sdata, index=dic)
print(obj)
pd.isnull(obj)

d    2
f    2
g    3
v    1
dtype: int64
v    1.0
d    2.0
f    2.0
e    NaN
dtype: float64


v    False
d    False
f    False
e     True
dtype: bool
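
The cell above only shows `isnull`. A short sketch of its complement `notnull`, together with the equivalent Series methods, might look like this (the names `present`, `n_missing` and `clean` are illustrative):

```python
import pandas as pd

sdata = {'v': 1, 'd': 2, 'f': 2, 'g': 3}
obj = pd.Series(sdata, index=['v', 'd', 'f', 'e'])

present = pd.notnull(obj)             # element-wise complement of pd.isnull
n_missing = int(obj.isnull().sum())   # method form; True counts as 1
clean = obj[obj.notnull()]            # keep only the rows that have a value

print(present)
print(n_missing)
print(clean)
```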

## An important Series feature: arithmetic operations automatically align data by index label

In [20]:
obj1 + obj

d    4.0
e    NaN
f    4.0
g    NaN
v    2.0
dtype: float64
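
Labels present in only one operand produce NaN, as seen above. If that is not wanted, one option is the `add` method with `fill_value`. The sketch below rebuilds the two Series from the earlier cell. Note that `'e'` stays NaN because it is missing on both sides.

```python
import pandas as pd

sdata = {'v': 1, 'd': 2, 'f': 2, 'g': 3}
obj1 = pd.Series(sdata)
obj = pd.Series(sdata, index=['v', 'd', 'f', 'e'])

total = obj1 + obj                     # 'g' and 'e' have no partner value, so they become NaN
filled = obj1.add(obj, fill_value=0)   # a value missing on one side is treated as 0

print(total)
print(filled)
```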

## Both the Series object and its index have a name attribute, which ties in closely with other pandas functionality

In [21]:
obj1.name = 'dic'
obj1.index.name = 'index'
obj1

index
d    2
f    2
g    3
v    1
Name: dic, dtype: int64
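
One place where the names matter is conversion to a DataFrame. In this sketch, `reset_index` and `to_frame` reuse the two names as column labels. The variable names `frame` and `single_col` are illustrative.

```python
import pandas as pd

obj1 = pd.Series({'v': 1, 'd': 2, 'f': 2, 'g': 3})
obj1.name = 'dic'
obj1.index.name = 'index'

frame = obj1.reset_index()    # the index becomes a column named after index.name
single_col = obj1.to_frame()  # the Series name becomes the column label

print(frame.columns.tolist())
print(single_col.columns.tolist())
```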

## The index of a Series can be modified in place by assignment

In [22]:
obj1.index = ['a','b','c','d']
obj1

a    2
b    2
c    3
d    1
Name: dic, dtype: int64
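
The new index must have the same length as the data. A small sketch of what happens otherwise (the Series is rebuilt here with explicit labels so that the block runs on its own):

```python
import pandas as pd

obj1 = pd.Series([2, 2, 3, 1], index=['d', 'f', 'g', 'v'], name='dic')
obj1.index = ['a', 'b', 'c', 'd']    # same length: labels are replaced in place

try:
    obj1.index = ['x', 'y']          # wrong length: pandas refuses the assignment
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(obj1.index.tolist())
print(type(mismatch_error).__name__)
```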

# DataFrame

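A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type. A DataFrame has both a row index and a column index. Internally, the data is stored as one or more two-dimensional blocks.
To create one, pass a **dict** of **equal-length** lists or NumPy arrays.
**The resulting DataFrame is given a row index automatically.**
**You can also supply your own row index (as a list).**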
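
A minimal sketch of both creation paths. The column names and values in `data` are made up for illustration and do not come from the notebook.

```python
import pandas as pd

# Illustrative data: a dict of equal-length lists, one entry per column.
data = {
    'city': ['A', 'B', 'C'],
    'year': [2000, 2001, 2002],
    'pop': [1.5, 1.7, 3.6],
}

frame = pd.DataFrame(data)                                         # row index defaults to 0, 1, 2
frame_labeled = pd.DataFrame(data, index=['one', 'two', 'three'])  # explicit row index

print(frame)
print(frame_labeled)
print(frame.dtypes)   # each column keeps its own dtype
```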

In [24]:
pd.DataFrame?

```
Init signature: pd.DataFrame(self, data=None, index=None, columns=None, dtype=None, copy=False)
Docstring:     
Two-dimensional size-mutable, potentially heterogeneous tabular data
structure with labeled axes (rows and columns). Arithmetic operations
align on both row and column labels. Can be thought of as a dict-like
container for Series objects. The primary pandas data structure

Parameters
----------
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
    Dict can contain Series, arrays, constants, or list-like objects
index : Index or array-like
    Index to use for resulting frame. Will default to np.arange(n) if
    no indexing information part of input data and no index provided
columns : Index or array-like
    Column labels to use for resulting frame. Will default to
    np.arange(n) if no column labels are provided
dtype : dtype, default None
    Data type to force, otherwise infer
copy : boolean, default False
    Copy data from inputs. Only affects DataFrame / 2d ndarray input
Examples
--------
>>> d = {'col1': ts1, 'col2': ts2}
>>> df = DataFrame(data=d, index=index)
>>> df2 = DataFrame(np.random.randn(10, 5))
>>> df3 = DataFrame(np.random.randn(10, 5),
...                 columns=['a', 'b', 'c', 'd', 'e'])
```

## As with Series, values that cannot be found become NA
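
A sketch of this behaviour: when `columns` names a column that is not in the data, the column is still created and filled with NA. The column name `debt` and the data are illustrative.

```python
import pandas as pd

data = {'year': [2000, 2001], 'pop': [1.5, 1.7]}   # illustrative data

# 'debt' is requested but does not exist in data, so it is created with NaN values.
frame = pd.DataFrame(data, columns=['year', 'pop', 'debt'])

print(frame)
print(frame['debt'].isnull().all())
```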

## A column can be retrieved as a Series using dict-like notation, and it has the same index as the DataFrame

In [27]:
df2 = pd.DataFrame(np.random.randn(10, 5))
df2

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0.044882 | 1.947019 | 0.167263 | -0.645261 | 1.325821 |
| 1 | -1.629131 | 0.762345 | -0.412886 | -0.813852 | 1.878055 |
| 2 | -0.722374 | 0.197613 | 2.042455 | 0.093834 | -0.837068 |
| 3 | -0.938291 | 1.616222 | -2.017747 | 0.353953 | 0.582901 |
| 4 | -0.952369 | -0.777576 | 0.042165 | -1.369082 | 1.76886 |
| 5 | -0.879009 | 0.320341 | -0.761303 | 1.855261 | -0.509548 |
| 6 | 1.17058 | -0.581491 | -0.115374 | -0.321679 | 1.11684 |
| 7 | -1.42855 | -0.948327 | 0.849442 | 0.774132 | -0.388989 |
| 8 | -0.497311 | -0.23118 | -1.56849 | 0.431784 | -0.047164 |
| 9 | 0.457726 | 0.598834 | 1.38831 | 0.333702 | -1.770764 |


In [28]:
df2[0]

0    0.044882
1   -1.629131
2   -0.722374
3   -0.938291
4   -0.952369
5   -0.879009
6    1.170580
7   -1.428550
8   -0.497311
9    0.457726
Name: 0, dtype: float64
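
A short sketch confirming the two points in the heading: the result is a Series, and its index is the DataFrame's row index. Passing a list of column labels returns a DataFrame instead. The names `col` and `two_cols` are illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.randn(10, 5))

col = df2[0]            # one column label returns a Series named after the column
two_cols = df2[[0, 1]]  # a list of labels returns a DataFrame

print(type(col).__name__, col.name)
print(col.index.equals(df2.index))
print(two_cols.shape)
```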

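## Rows can also be retrieved, by position or by label, using the iloc and loc indexers (the ix indexer originally used here was removed in pandas 1.0)

In [29]:
df2.loc[1]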

0   -1.629131
1    0.762345
2   -0.412886
3   -0.813852
4    1.878055
Name: 1, dtype: float64
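
In current pandas, row access is usually written with `loc` (by label) or `iloc` (by position). The difference is easier to see with a non-numeric index, so this sketch uses a small made-up frame. The labels and values are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative frame with string row labels.
df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=['a', 'b', 'c', 'd'],
    columns=['x', 'y', 'z'],
)

row_by_label = df.loc['b']   # select the row whose label is 'b'
row_by_pos = df.iloc[1]      # select the second row, whatever its label is

print(row_by_label)
print(row_by_label.equals(row_by_pos))
```

With the default integer index of `df2`, `df2.loc[1]` and `df2.iloc[1]` happen to return the same row.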

## Columns can be modified by assignment, for example by assigning a scalar or an array to a new column

In [30]:
df2['new'] = 16.5
df2

|   | 0 | 1 | 2 | 3 | 4 | new |
|---|---|---|---|---|---|---|
| 0 | 0.044882 | 1.947019 | 0.167263 | -0.645261 | 1.325821 | 16.5 |
| 1 | -1.629131 | 0.762345 | -0.412886 | -0.813852 | 1.878055 | 16.5 |
| 2 | -0.722374 | 0.197613 | 2.042455 | 0.093834 | -0.837068 | 16.5 |
| 3 | -0.938291 | 1.616222 | -2.017747 | 0.353953 | 0.582901 | 16.5 |
| 4 | -0.952369 | -0.777576 | 0.042165 | -1.369082 | 1.76886 | 16.5 |
| 5 | -0.879009 | 0.320341 | -0.761303 | 1.855261 | -0.509548 | 16.5 |
| 6 | 1.17058 | -0.581491 | -0.115374 | -0.321679 | 1.11684 | 16.5 |
| 7 | -1.42855 | -0.948327 | 0.849442 | 0.774132 | -0.388989 | 16.5 |
| 8 | -0.497311 | -0.23118 | -1.56849 | 0.431784 | -0.047164 | 16.5 |
| 9 | 0.457726 | 0.598834 | 1.38831 | 0.333702 | -1.770764 | 16.5 |


In [32]:
df2['new'] = np.arange(10)
df2

|   | 0 | 1 | 2 | 3 | 4 | new |
|---|---|---|---|---|---|---|
| 0 | 0.044882 | 1.947019 | 0.167263 | -0.645261 | 1.325821 | 0 |
| 1 | -1.629131 | 0.762345 | -0.412886 | -0.813852 | 1.878055 | 1 |
| 2 | -0.722374 | 0.197613 | 2.042455 | 0.093834 | -0.837068 | 2 |
| 3 | -0.938291 | 1.616222 | -2.017747 | 0.353953 | 0.582901 | 3 |
| 4 | -0.952369 | -0.777576 | 0.042165 | -1.369082 | 1.76886 | 4 |
| 5 | -0.879009 | 0.320341 | -0.761303 | 1.855261 | -0.509548 | 5 |
| 6 | 1.17058 | -0.581491 | -0.115374 | -0.321679 | 1.11684 | 6 |
| 7 | -1.42855 | -0.948327 | 0.849442 | 0.774132 | -0.388989 | 7 |
| 8 | -0.497311 | -0.23118 | -1.56849 | 0.431784 | -0.047164 | 8 |
| 9 | 0.457726 | 0.598834 | 1.38831 | 0.333702 | -1.770764 | 9 |
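
A scalar is broadcast to every row, and an array must match the number of rows. A third case, sketched here, is assigning a Series: it is aligned on the row index, and rows without a matching label are filled with NaN. The Series `partial` and its values are illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.randn(10, 5))

# Only rows 2 and 7 have a matching label; every other row gets NaN.
partial = pd.Series([1.5, 2.5], index=[2, 7])
df2['new'] = partial
n_filled = int(df2['new'].notnull().sum())
value_at_2 = df2.loc[2, 'new']

print(df2['new'])

del df2['new']   # a column can be removed again with del
print(df2.columns.tolist())
```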
