In [1]:
import pandas as pd
import numpy as np

In [2]:
from pandas import Series, DataFrame

***The two most commonly used classes***
## Series
--------
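A Series is a one-dimensional array-like object. It holds a sequence of values (with types similar to NumPy's) together with a set of data labels called the index.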

In [3]:
obj = pd.Series([4,7,-5,3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

In [4]:
obj.index

RangeIndex(start=0, stop=4, step=1)

In [5]:
obj.values

array([ 4,  7, -5,  3], dtype=int64)

You can also supply your own index labels.

In [6]:
obj2 = pd.Series([4,7,-5,3] , index=['a','b','c','d'])
obj2

a    4
b    7
c   -5
d    3
dtype: int64

In [7]:
obj2.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [8]:
obj2['a']

4

np.exp() computes e^{x} for each element. Applied to a Series, it returns a new Series with the same index.

In [9]:
np.exp(obj2)

a      54.598150
b    1096.633158
c       0.006738
d      20.085537
dtype: float64
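
As a small sketch of the same idea, other element-wise operations such as boolean filtering and scalar multiplication should also keep each label attached to its value. The variable names `exp_values`, `positive`, and `doubled` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])

# Element-wise NumPy functions return a Series with the same index.
exp_values = np.exp(obj2)

# Boolean filtering keeps only the labels whose values pass the test.
positive = obj2[obj2 > 0]

# Scalar arithmetic is applied to every value; the labels are unchanged.
doubled = obj2 * 2

print(list(exp_values.index))
print(positive)
print(doubled['c'])
```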

A Series can be thought of as a fixed-length, ordered dict that maps index labels to values. You can also build a Series directly from a dict.

In [10]:
'b' in obj2

True

In [11]:
sdata = {'quanzhou': 1000,'fuzhou': 2000}
obj3 = pd.Series(sdata)
obj3

quanzhou    1000
fuzhou      2000
dtype: int64
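
To make the dict analogy concrete, here is a minimal sketch that passes an explicit `index=` when building a Series from a dict. The `cities` list and the name `obj_ordered` are assumptions made for illustration; a label that is not a key of the dict ends up as a missing value.

```python
import pandas as pd

sdata = {'quanzhou': 1000, 'fuzhou': 2000}

# index= picks which labels appear and in what order;
# 'xiamen' is not a key of sdata, so its value is NaN.
cities = ['fuzhou', 'quanzhou', 'xiamen']
obj_ordered = pd.Series(sdata, index=cities)

# Membership tests look at the labels, just like dict keys.
print('fuzhou' in obj_ordered)
print(obj_ordered.isnull())
```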

In [12]:
pd.isnull(obj3)

quanzhou    False
fuzhou      False
dtype: bool

In [13]:
sdata2 = {'quanzhou': 1000,'xiamen': 2000}
obj4 = pd.Series(sdata2)
obj3 + obj4

fuzhou         NaN
quanzhou    2000.0
xiamen         NaN
dtype: float64
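
The NaN values above come from label alignment: a label present in only one operand has nothing to be added to. If treating a missing label as zero is acceptable for your data, one option is `Series.add` with `fill_value`. The sketch below assumes that choice; `total` and `total_filled` are illustrative names.

```python
import pandas as pd

obj3 = pd.Series({'quanzhou': 1000, 'fuzhou': 2000})
obj4 = pd.Series({'quanzhou': 1000, 'xiamen': 2000})

# Plain + aligns on labels; labels found on only one side become NaN.
total = obj3 + obj4

# add() with fill_value substitutes 0 for a label missing on one side.
total_filled = obj3.add(obj4, fill_value=0)

print(total.isnull().sum())
print(total_filled)
```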

Both the Series object itself and its index have a name attribute.

In [14]:
obj4.name = 'statics'
obj4.index.name = 'city'
obj4

city
quanzhou    1000
xiamen      2000
Name: statics, dtype: int64

## DataFrame
---------
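A DataFrame represents a rectangular table of data. It contains an **ordered** collection of columns, and each column can hold a different value type. It has both a row index and a column index, and can be viewed as a dict of Series that all share the same index.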

In [15]:
data = {
    'user_id': [1,2,3],
    'user_name': ['luke','consonnm','zebao'],
    'user_gender': ['boy','?','boy']
}
frame = pd.DataFrame(data)
frame

   user_id user_name user_gender
0        1      luke         boy
1        2  consonnm           ?
2        3     zebao         boy


The column order is whatever you specify.
This looks like it could model a database table. Let's keep going.

In [16]:
frame.loc[2]

user_id            3
user_name      zebao
user_gender      boy
Name: 2, dtype: object
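
The "dict of Series" view can be checked directly. This sketch rebuilds the same `frame` and pulls out one column and one row; `names` and `row` are illustrative names.

```python
import pandas as pd

data = {
    'user_id': [1, 2, 3],
    'user_name': ['luke', 'consonnm', 'zebao'],
    'user_gender': ['boy', '?', 'boy'],
}
frame = pd.DataFrame(data)

# Selecting a column returns a Series that shares the frame's row index.
names = frame['user_name']

# Selecting a row with loc returns a Series indexed by the column labels.
row = frame.loc[2]

print(names.index.equals(frame.index))
print(list(row.index))
```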

In [17]:
frame2 = pd.DataFrame(data,columns=['user_id','user_name','user_gender','phone'],index=['one','two','three'])
frame2

       user_id user_name user_gender phone
one          1      luke         boy   NaN
two          2  consonnm           ?   NaN
three        3     zebao         boy   NaN


In [18]:
frame2['phone'] = '110'
frame2

       user_id user_name user_gender phone
one          1      luke         boy   110
two          2  consonnm           ?   110
three        3     zebao         boy   110


In [19]:
tall = pd.Series([1.8,1.8],index=['one','three'])
frame2['height'] = tall
frame2

       user_id user_name user_gender phone  height
one          1      luke         boy   110     1.8
two          2  consonnm           ?   110     NaN
three        3     zebao         boy   110     1.8
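
The last two cells show two different assignment behaviours: a scalar is broadcast to every row, while a Series is aligned on the row labels. Here is a minimal sketch of both, using a cut-down version of `frame2` with only two of the original columns (an assumption made to keep the example short).

```python
import pandas as pd

frame2 = pd.DataFrame(
    {'user_id': [1, 2, 3], 'user_name': ['luke', 'consonnm', 'zebao']},
    index=['one', 'two', 'three'],
)

# A scalar is broadcast to every row.
frame2['phone'] = '110'

# A Series is aligned on the row labels; 'two' has no match and becomes NaN.
tall = pd.Series([1.8, 1.8], index=['one', 'three'])
frame2['height'] = tall

print(frame2['height'].isnull())
```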


In [20]:
del frame2['phone']
frame2

       user_id user_name user_gender  height
one          1      luke         boy     1.8
two          2  consonnm           ?     NaN
three        3     zebao         boy     1.8


In [21]:
frame2.T

              one       two  three
user_id         1         2      3
user_name    luke  consonnm  zebao
user_gender   boy         ?    boy
height        1.8       NaN    1.8


In [22]:
frame2.index.name = 'id'
frame2.columns.name = 'info'
frame2

info   user_id user_name user_gender  height
id
one          1      luke         boy     1.8
two          2  consonnm           ?     NaN
three        3     zebao         boy     1.8


## Index objects
------
Index objects are immutable: their labels cannot be modified in place.

In [23]:
index = frame2.index
index

Index(['one', 'two', 'three'], dtype='object', name='id')

In [24]:
index[1] = 'a'

TypeError: Index does not support mutable operations
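
Because an Index cannot be edited in place, changing a label means building a new Index. The sketch below uses `Index.map` for that, which is only one possible approach; `mutated` and `new_index` are illustrative names.

```python
import pandas as pd

index = pd.Index(['one', 'two', 'three'], name='id')

# In-place item assignment on an Index raises TypeError.
try:
    index[1] = 'a'
    mutated = True
except TypeError:
    mutated = False

# Build a new Index with the changed label; the original is untouched.
new_index = index.map(lambda label: 'a' if label == 'two' else label)

print(mutated)
print(list(new_index))
print(list(index))
```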

Reindexing

In [26]:
obj = pd.Series([2,3,-1,4],index=['b','c','a','d'])
obj

b    2
c    3
a   -1
d    4
dtype: int64

In [27]:
obj2 = obj.reindex(['a','b','c','d','e'])
obj2

a   -1.0
b    2.0
c    3.0
d    4.0
e    NaN
dtype: float64

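Labels that already existed keep their values. Labels that did not exist before get a missing value (NaN), which is also why the dtype changes from int64 to float64.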

In [28]:
obj3 = pd.Series(['blue','red','yellow'],index = [0,2,4])
obj3

0      blue
2       red
4    yellow
dtype: object

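The method argument fills in values while reindexing. For example, ffill fills forward: each new label takes the value of the nearest preceding label.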

In [29]:
obj3.reindex(range(6),method='ffill')

0      blue
1      blue
2       red
3       red
4    yellow
5    yellow
dtype: object
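
For comparison, here is a short sketch of `ffill` next to its counterpart `bfill` on the same Series. With `bfill`, a new label after the last existing one has nothing to copy from and stays missing. The names `forward` and `backward` are illustrative.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'red', 'yellow'], index=[0, 2, 4])

# ffill: each new label takes the value of the closest earlier label.
forward = obj3.reindex(range(6), method='ffill')

# bfill: each new label takes the value of the closest later label;
# label 5 has no later label, so it stays missing.
backward = obj3.reindex(range(6), method='bfill')

print(forward)
print(backward)
```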

Let's try reindexing a DataFrame.

In [31]:
frame2.reindex(['one','three','four'])

info   user_id user_name user_gender  height
id
one        1.0      luke         boy     1.8
three      3.0     zebao         boy     1.8
four       NaN       NaN         NaN     NaN


The columns keyword reindexes the columns instead of the rows.

In [32]:
col = ['user_name','user_gender','money']
frame2.reindex(columns=col)

info  user_name user_gender  money
id
one        luke         boy    NaN
two    consonnm           ?    NaN
three     zebao         boy    NaN


Use loc to select a subset of a DataFrame's rows and columns by label.

In [35]:
frame2.loc['one',['user_name','user_id']]

info
user_name    luke
user_id         1
Name: one, dtype: object

iloc works similarly, but selects by integer position.

In [36]:
frame2.iloc[2,[1,3]]

info
user_name    zebao
height         1.8
Name: three, dtype: object
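
One difference between the two is how slices behave: a label slice with `loc` includes both endpoints, while a position slice with `iloc` excludes the stop. The sketch below uses a simplified `frame2` (three columns, chosen for brevity) and selects the same block both ways.

```python
import pandas as pd

frame2 = pd.DataFrame(
    {
        'user_id': [1, 2, 3],
        'user_name': ['luke', 'consonnm', 'zebao'],
        'height': [1.8, None, 1.8],
    },
    index=['one', 'two', 'three'],
)

# loc selects by label; a label slice includes both endpoints.
by_label = frame2.loc['one':'two', ['user_name', 'user_id']]

# iloc selects by integer position; a position slice excludes the stop.
by_position = frame2.iloc[0:2, [1, 0]]

print(by_label.equals(by_position))
```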

In [41]:
frame3 = frame2
frame3 + frame2

info   user_id         user_name user_gender  height
id
one          2          lukeluke      boyboy     3.6
two          4  consonnmconsonnm          ??     NaN
three        6        zebaozebao      boyboy     3.6


In [44]:
series = frame2.iloc[1]
series

info
user_id               2
user_name      consonnm
user_gender           ?
height              NaN
Name: two, dtype: object

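Note: the Series returned here is a single row of the DataFrame. It is printed vertically and so looks like a column, but it really is a row: its index holds the DataFrame's column labels and its name is the row label.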
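
A quick way to confirm this is to inspect the row's index and name. If you would rather keep a one-row DataFrame than get a Series, passing a list of positions to `iloc` is one option. This sketch uses a reduced `frame2` with two columns; `row_series` and `row_frame` are illustrative names.

```python
import pandas as pd

frame2 = pd.DataFrame(
    {'user_id': [1, 2, 3], 'user_name': ['luke', 'consonnm', 'zebao']},
    index=['one', 'two', 'three'],
)

# A single position returns a Series whose index is the column labels.
row_series = frame2.iloc[1]

# A list of positions keeps the result as a one-row DataFrame.
row_frame = frame2.iloc[[1]]

print(list(row_series.index))  # the column labels
print(row_series.name)         # the row label
print(row_frame.shape)
```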