# pandas data structures

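Series: a one-dimensional, array-like object made up of a sequence of values (of any NumPy data type) and an associated set of data labels, called its index. A simple Series can be created from just a sequence of values.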

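DataFrame: a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean, and so on). A DataFrame has both a row index and a column index, and can be thought of as a dictionary of Series.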
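The "dictionary of Series" view can be made concrete with a small sketch. The variable names and values below are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative values).
age = pd.Series([12, 13, 14], index=['a', 'b', 'c'])
name = pd.Series(['Tom', 'Mary', 'John'], index=['a', 'b', 'c'])

# Building a DataFrame from a dict of Series: each key becomes a column label.
people = pd.DataFrame({'name': name, 'age': age})

# Selecting a column gives back a Series that keeps the row index.
age_column = people['age']
print(type(age_column).__name__)
print(people.dtypes)  # each column keeps its own dtype
```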

## Creating a Series

### From a one-dimensional array

In [51]:
import numpy as np
import pandas as pd
from pandas import Series,DataFrame

In [52]:
arr=np.array([1,2,3,4,5])
series1=Series(arr)
series1

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [53]:
series1.index

RangeIndex(start=0, stop=5, step=1)

In [54]:
series1.values

array([1, 2, 3, 4, 5])

In [55]:
series1.dtype

dtype('int64')

In [56]:
series1.index=['a','b','c','d','e']
series1

a    1
b    2
c    3
d    4
e    5
dtype: int64
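As an alternative to reassigning `.index` afterwards, the labels can be passed when the Series is constructed. The sketch below also shows that an index of the wrong length is rejected; the names `labeled` and `mismatch_rejected` are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4, 5])

# Pass the labels at construction time instead of reassigning .index later.
labeled = pd.Series(arr, index=['a', 'b', 'c', 'd', 'e'])

# Assigning an index whose length differs from the data raises ValueError.
try:
    labeled.index = ['x', 'y']
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(labeled)
print(mismatch_rejected)
```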

### From a dictionary

In [57]:
dict1={'a':1,'b':2,'c':3,'d':4,'e':5}
series2=Series(dict1)
series2

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [58]:
series2.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [59]:
series2.values

array([1, 2, 3, 4, 5])

## Applying NumPy array operations to a Series

In [60]:
# Get a value by index label
series1['a']

np.int64(1)

In [61]:
series1.iloc[0]

np.int64(1)
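The two lookups above differ in what they match: `series1['a']` uses the label, while `.iloc[0]` uses the position. A minimal sketch of both, plus the NumPy-style boolean masking and element-wise arithmetic that a Series supports, might look like this (the Series `s` is a stand-in for `series1`):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['a']        # label-based lookup
by_position = s.iloc[0]      # position-based lookup
several = s.loc[['a', 'c']]  # a list of labels returns a Series

# Comparisons produce a boolean Series that can be used as a mask.
greater_than_three = s[s > 3]

# Arithmetic is applied element by element and the index is preserved.
doubled = s * 2
print(greater_than_three)
print(doubled)
```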

In [62]:
# Detect missing values
scores=Series({'a':1,'b':2,'c':3})
new_index=['a','b','c','d']
scores=Series(scores,index=new_index)
scores

a    1.0
b    2.0
c    3.0
d    NaN
dtype: float64

In [63]:
pd.isnull(scores)

a    False
b    False
c    False
d     True
dtype: bool

In [64]:
pd.notnull(scores)

a     True
b     True
c     True
d    False
dtype: bool

In [65]:
scores[pd.notnull(scores)]

a    1.0
b    2.0
c    3.0
dtype: float64
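The same checks are also available as Series methods. The sketch below rebuilds `scores` with `reindex` (one alternative to `Series(scores, index=new_index)`) and then uses `isna`, `dropna`, and `fillna`; the fill value `0` is an arbitrary choice for illustration.

```python
import pandas as pd

# reindex introduces the label 'd', which has no value and so becomes NaN.
scores = pd.Series({'a': 1, 'b': 2, 'c': 3}).reindex(['a', 'b', 'c', 'd'])

missing_mask = scores.isna()        # method form of pd.isnull(scores)
without_missing = scores.dropna()   # same rows as scores[pd.notnull(scores)]
filled = scores.fillna(0)           # replace NaN with a chosen default

print(missing_mask)
print(without_missing)
print(filled)
```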

In [66]:
# Automatic alignment
product_num=Series([1,2,3,4,5],index=['a','b','c','d','e'])
price=Series([1,2,3,4,5,6],index=['a','b','c','d','e','f'])
product_num*price

a     1.0
b     4.0
c     9.0
d    16.0
e    25.0
f     NaN
dtype: float64
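In the result above, `f` is `NaN` because the label exists only in `price`, and the integer result is upcast to `float64` to hold it. If that is not the desired behavior, two possible workarounds are sketched below: filling the missing side with a default before multiplying, or restricting both operands to their common labels. Treating a missing quantity as `0` is an assumption made for this example.

```python
import pandas as pd

product_num = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
price = pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])

# Treat a label missing from one side as 0 before multiplying.
total_zero_filled = product_num.mul(price, fill_value=0)

# Alternatively, keep only the labels present in both Series.
common = product_num.index.intersection(price.index)
total_common = product_num.loc[common] * price.loc[common]

print(total_zero_filled)
print(total_common)
```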

In [67]:
# The name attribute of a Series and of its index
series1.name='product_num'
series1.index.name='product_name'
series1

product_name
a    1
b    2
c    3
d    4
e    5
Name: product_num, dtype: int64
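One place where these names matter is when a Series is turned into a DataFrame: the Series name becomes the column label and the index name carries over. A small sketch, using a shortened stand-in for `series1`:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='product_num')
s.index.name = 'product_name'

as_frame = s.to_frame()      # one column, labeled with the Series name
as_table = s.reset_index()   # the index becomes an ordinary column

print(as_frame)
print(as_table)
```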

## Creating a DataFrame from a two-dimensional array

In [68]:
df01=DataFrame([['Tome','Merry','John'],[12,13,14]])
df01

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | Tome | Merry | John |
| 1 | 12 | 13 | 14 |


In [69]:
df2=DataFrame([['Tome',2,3],['Merry',3,4],['John',4,5]])
df2

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | Tome | 2 | 3 |
| 1 | Merry | 3 | 4 |
| 2 | John | 4 | 5 |


In [70]:
arr = np.array([['Tome', 2, 3], ['Merry', 3, 4], ['John', 4, 5]])
df3 = DataFrame(arr, columns=['name', 'age', 'score'])  # added the third column name
df3


|   | name | age | score |
|---|---|---|---|
| 0 | Tome | 2 | 3 |
| 1 | Merry | 3 | 4 |
| 2 | John | 4 | 5 |


In [71]:
df4=DataFrame(arr,columns=['name','age','score'],index=['a','b','c'])
df4

|   | name | age | score |
|---|---|---|---|
| a | Tome | 2 | 3 |
| b | Merry | 3 | 4 |
| c | John | 4 | 5 |
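One thing to watch for with `np.array` input: a NumPy array holds a single dtype, so mixing strings and integers stores every value as a string. The `age` and `score` columns above therefore contain text, even though they print like numbers. A minimal sketch of the issue and one possible fix with `astype` (the names `df_strings` and `df_numeric` are illustrative):

```python
import numpy as np
import pandas as pd

arr = np.array([['Tome', 2, 3], ['Merry', 3, 4], ['John', 4, 5]])
df_strings = pd.DataFrame(arr, columns=['name', 'age', 'score'])

# The numbers were converted to strings when the array was built.
first_age = df_strings['age'].iloc[0]
print(repr(first_age))

# Convert the numeric columns explicitly before doing arithmetic on them.
df_numeric = df_strings.astype({'age': 'int64', 'score': 'int64'})
mean_age = df_numeric['age'].mean()
print(mean_age)
```

Passing a list of lists, or a dictionary of columns, avoids the problem because pandas then infers a dtype per column.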


## Creating a DataFrame from a dictionary

In [73]:
data={'name':['Tome','Merry','John'],'age':[12,13,14],'score':[1,2,3]}
df5=DataFrame(data)
df5

|   | name | age | score |
|---|---|---|---|
| 0 | Tome | 12 | 1 |
| 1 | Merry | 13 | 2 |
| 2 | John | 14 | 3 |
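When building from a dictionary, the `columns` and `index` arguments can be used to control column order and row labels. The sketch below assumes the same `data` dictionary; the extra column name `'city'` is hypothetical and only there to show what happens when a requested column has no matching key.

```python
import pandas as pd

data = {'name': ['Tome', 'Merry', 'John'], 'age': [12, 13, 14], 'score': [1, 2, 3]}

# columns= selects and orders the dict keys; index= sets the row labels.
df_ordered = pd.DataFrame(data, columns=['score', 'name', 'age'], index=['a', 'b', 'c'])

# A requested column with no matching key is created and filled with missing values.
df_extra = pd.DataFrame(data, columns=['name', 'age', 'city'])

print(df_ordered)
print(df_extra)
```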


# Index objects

In [75]:
# Get a value from a Series by index label
series3=Series([1,2,3,4,5],index=['a','b','c','d','e'])
series3['a']

np.int64(1)

In [76]:
# Slicing by label also works
series3['a':'c']

a    1
b    2
c    3
dtype: int64
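Note that the label slice `'a':'c'` includes the end label `'c'`, unlike ordinary Python slicing. A short sketch contrasting it with a position-based slice:

```python
import pandas as pd

series3 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

label_slice = series3.loc['a':'c']   # end label is included: three rows
position_slice = series3.iloc[0:2]   # end position is excluded: two rows

print(label_slice)
print(position_slice)
```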

## Getting values from a DataFrame by index

In [77]:
df=pd.read_csv('../../data/titanic.csv')

In [78]:
df

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |


In [79]:
df['Name']

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
886                                Montvila, Rev. Juozas
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object

In [80]:
df['Name']='哈士奇'
df['Name']

0      哈士奇
1      哈士奇
2      哈士奇
3      哈士奇
4      哈士奇
      ... 
886    哈士奇
887    哈士奇
888    哈士奇
889    哈士奇
890    哈士奇
Name: Name, Length: 891, dtype: object
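Assigning a single value to a column, as above, broadcasts it to every row. The sketch below collects a few other common selections on a small stand-in frame, so it runs without the CSV file. The three rows are copied from the head of the table shown earlier; `df_small` and the other variable names are illustrative.

```python
import pandas as pd

# A small stand-in for the Titanic frame (three rows, three columns).
df_small = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina', 'Allen, Mr. William Henry'],
    'Sex': ['male', 'female', 'male'],
    'Age': [22.0, 26.0, 35.0],
})

one_column = df_small['Name']             # a single label returns a Series
two_columns = df_small[['Name', 'Age']]   # a list of labels returns a DataFrame
first_row = df_small.loc[0]               # a row by index label, as a Series
one_cell = df_small.loc[1, 'Age']         # one value by row label and column label

# A list assigned to a column must match the number of rows.
df_small['Age'] = [23.0, 27.0, 36.0]

# A scalar is broadcast to every row; a new label creates a new column.
df_small['Survived'] = 0
print(df_small)
```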