# DataFrame
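A DataFrame is a tabular data structure, very similar to a spreadsheet (Excel and the like). It was designed to extend the Series from one dimension to two (rows and columns), giving database-style storage.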
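The sketch below illustrates the "Series extended to two dimensions" idea. Each column of a DataFrame behaves like a Series, and all columns share one row index. The Series names `s1`/`s2` and column names `a`/`b` are illustrative only.

```python
import pandas as pd

# Two Series that share the same index labels
s1 = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
s2 = pd.Series([0.5, 1.5, 2.5], index=['x', 'y', 'z'])

# Combining them column-wise gives a two-dimensional table
df = pd.DataFrame({'a': s1, 'b': s2})
print(df.shape)  # (3, 2): 3 rows, 2 columns

# Selecting one column gives back a Series carrying the shared index
col = df['a']
print(type(col).__name__)  # Series
```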
***
## Defining a DataFrame object

```python
import pandas as pd
import numpy as np
```

```python
data = {'color': ['blue', 'green', 'yellow', 'red', 'white'],
        'object': ['ball', 'pen', 'pencil', 'paper', 'mug'],
        'price': [1.2, 1.0, 0.6, 0.9, 1.7]}
```

```python
frame = pd.DataFrame(data)
```

```python
frame
```

|   | color | object | price |
|---|---|---|---|
| 0 | blue | ball | 1.2 |
| 1 | green | pen | 1.0 |
| 2 | yellow | pencil | 0.6 |
| 3 | red | paper | 0.9 |
| 4 | white | mug | 1.7 |


**If `data` contains columns you don't need, pass `columns=` to keep only the ones you want (the equivalent of SELECT in a database):**

```python
frame2 = pd.DataFrame(data, columns=['object', 'price'])
```

```python
frame2
```

|   | object | price |
|---|---|---|
| 0 | ball | 1.2 |
| 1 | pen | 1.0 |
| 2 | pencil | 0.6 |
| 3 | paper | 0.9 |
| 4 | mug | 1.7 |
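The reverse case is also worth knowing. If `columns=` names a key that is not in `data`, pandas still creates that column and fills it with `NaN`. The sketch below reuses the `data` dict from above, and the column name `weight` is a hypothetical example.

```python
import pandas as pd

data = {'color': ['blue', 'green', 'yellow', 'red', 'white'],
        'object': ['ball', 'pen', 'pencil', 'paper', 'mug'],
        'price': [1.2, 1.0, 0.6, 0.9, 1.7]}

# 'weight' is not a key of data, so it becomes an all-NaN column;
# 'color' is not listed, so it is left out
frame_w = pd.DataFrame(data, columns=['object', 'price', 'weight'])
print(frame_w.columns.tolist())      # ['object', 'price', 'weight']
print(frame_w['weight'].isna().all())  # True
```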


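### Default and custom index labels

When no row labels are given, pandas assigns a default integer index starting at 0 (it is an index, not an extra data column). Pass `index=` to supply your own labels: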

```python
frame2 = pd.DataFrame(data, columns=['object', 'price'],
                      index=['one', 'two', 'three', 'four', 'five'])
```

```python
frame2
```

|   | object | price |
|---|---|---|
| one | ball | 1.2 |
| two | pen | 1.0 |
| three | pencil | 0.6 |
| four | paper | 0.9 |
| five | mug | 1.7 |
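Here is a small sketch of both behaviours on a three-row toy dict, assumed for illustration. Without `index=` you get a `RangeIndex`. With `index=`, the number of labels must match the number of rows, or the constructor raises `ValueError`.

```python
import pandas as pd

data = {'object': ['ball', 'pen', 'pencil'], 'price': [1.2, 1.0, 0.6]}

default = pd.DataFrame(data)
print(default.index)  # RangeIndex(start=0, stop=3, step=1)

labeled = pd.DataFrame(data, index=['one', 'two', 'three'])
print(labeled.loc['two', 'price'])  # 1.0

# Two labels for three rows: pandas refuses to build the frame
raised = False
try:
    pd.DataFrame(data, index=['one', 'two'])
except ValueError as err:
    raised = True
    print('ValueError:', err)
```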


```python
frame3 = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=['red', 'blue', 'yellow', 'white'],
                      columns=['ball', 'pen', 'pencil', 'paper'])
```

```python
frame3
```

|   | ball | pen | pencil | paper |
|---|---|---|---|---|
| red | 0 | 1 | 2 | 3 |
| blue | 4 | 5 | 6 | 7 |
| yellow | 8 | 9 | 10 | 11 |
| white | 12 | 13 | 14 | 15 |
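With a 2-D NumPy array, `index=` labels the array's rows and `columns=` labels its columns. A labelled lookup therefore maps straight onto an array position. This sketch rebuilds `frame3` so that it runs on its own.

```python
import numpy as np
import pandas as pd

frame3 = pd.DataFrame(np.arange(16).reshape((4, 4)),
                      index=['red', 'blue', 'yellow', 'white'],
                      columns=['ball', 'pen', 'pencil', 'paper'])

# Row 'yellow' is array row 2, column 'pencil' is array column 2 -> 2*4 + 2
print(frame3.loc['yellow', 'pencil'])  # 10

# The underlying 4x4 array comes back with to_numpy()
print(frame3.to_numpy().shape)  # (4, 4)
```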


## Selecting and accessing elements

Selecting a single column or a single row returns a Series object.

```python
frame.columns
```

```
Index(['color', 'object', 'price'], dtype='object')
```

```python
frame.index
```

```
RangeIndex(start=0, stop=5, step=1)
```

```python
frame.values
```

```
array([['blue', 'ball', 1.2],
       ['green', 'pen', 1.0],
       ['yellow', 'pencil', 0.6],
       ['red', 'paper', 0.9],
       ['white', 'mug', 1.7]], dtype=object)
```
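`frame.values` comes back as `dtype=object` because the columns mix strings and floats, and NumPy needs one common dtype. A purely numeric selection keeps a numeric dtype. The pandas documentation recommends `to_numpy()` over `.values`. Here is a minimal sketch on a two-row copy of the example data:

```python
import pandas as pd

frame = pd.DataFrame({'color': ['blue', 'green'],
                      'object': ['ball', 'pen'],
                      'price': [1.2, 1.0]})

# Mixed column types force the common dtype object
print(frame.to_numpy().dtype)             # object
# Selecting only the numeric column keeps float64
print(frame[['price']].to_numpy().dtype)  # float64
```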

```python
frame['price']  # select one column by its label (column name)
```

```
0    1.2
1    1.0
2    0.6
3    0.9
4    1.7
Name: price, dtype: float64
```

```python
frame.price  # the same column via attribute ('.') access
```

```
0    1.2
1    1.0
2    0.6
3    0.9
4    1.7
Name: price, dtype: float64
```
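Attribute access only works when the column name is a valid Python identifier and does not clash with a DataFrame method or attribute. Bracket indexing always works. The column names `count` and `unit price` below are chosen purely to show the clash.

```python
import pandas as pd

df = pd.DataFrame({'count': [1, 2], 'unit price': [0.5, 0.7]})

# 'count' collides with the DataFrame.count() method, so attribute access
# returns the bound method, not the column
print(callable(df.count))  # True

# Bracket indexing reaches both columns, including a name with a space
print(df['count'].tolist())       # [1, 2]
print(df['unit price'].tolist())  # [0.5, 0.7]
```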

```python
frame.loc[0]  # loc[label] selects a row by its index label
```

```
color     blue
object    ball
price      1.2
Name: 0, dtype: object
```

```python
frame.loc[[0, 4]]  # pass a list of labels to get several rows
```

|   | color | object | price |
|---|---|---|---|
| 0 | blue | ball | 1.2 |
| 4 | white | mug | 1.7 |
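In `frame` the index labels happen to equal the row positions 0 to 4, so `loc` can look position-based. With string labels the difference shows: `loc` uses labels, while `iloc` is the position-based counterpart. Here is a sketch with a small labelled frame, assumed for illustration:

```python
import pandas as pd

frame2 = pd.DataFrame({'object': ['ball', 'pen', 'pencil'],
                       'price': [1.2, 1.0, 0.6]},
                      index=['one', 'two', 'three'])

# loc: select by index label (and column label)
print(frame2.loc['two', 'object'])  # pen

# iloc: select by 0-based integer position, whatever the labels are
print(frame2.iloc[1, 0])                   # pen
print(frame2.iloc[[0, 2]].index.tolist())  # ['one', 'three']
```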


```python
frame
```

|   | color | object | price |
|---|---|---|---|
| 0 | blue | ball | 1.2 |
| 1 | green | pen | 1.0 |
| 2 | yellow | pencil | 0.6 |
| 3 | red | paper | 0.9 |
| 4 | white | mug | 1.7 |


```python
frame[0:2]  # positional slice: rows 0 and 1 (end excluded)
```

|   | color | object | price |
|---|---|---|---|
| 0 | blue | ball | 1.2 |
| 1 | green | pen | 1.0 |


```python
frame.loc[0:2]  # label slice: rows 0, 1 and 2 (end included)
```

|   | color | object | price |
|---|---|---|---|
| 0 | blue | ball | 1.2 |
| 1 | green | pen | 1.0 |
| 2 | yellow | pencil | 0.6 |
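The inclusive/exclusive difference is easier to see with string labels. A label slice includes both endpoints, while a position slice excludes the end, as Python lists do. This is a sketch on an assumed four-row frame:

```python
import pandas as pd

frame2 = pd.DataFrame({'price': [1.2, 1.0, 0.6, 0.9]},
                      index=['one', 'two', 'three', 'four'])

# Label slice: 'three' is included
print(frame2.loc['one':'three'].index.tolist())  # ['one', 'two', 'three']

# Position slice: position 2 is excluded
print(frame2.iloc[0:2].index.tolist())           # ['one', 'two']
```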


```python
frame[['color', 'price']]  # a list of column names selects those columns
```

|   | color | price |
|---|---|---|
| 0 | blue | 1.2 |
| 1 | green | 1.0 |
| 2 | yellow | 0.6 |
| 3 | red | 0.9 |
| 4 | white | 1.7 |
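Row selection and column selection can be combined in one `loc[rows, columns]` call instead of two separate steps. This sketch rebuilds the example `frame` so it runs standalone.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['blue', 'green', 'yellow', 'red', 'white'],
                      'object': ['ball', 'pen', 'pencil', 'paper', 'mug'],
                      'price': [1.2, 1.0, 0.6, 0.9, 1.7]})

# Rows with labels 0 and 2, columns 'color' and 'price'
sub = frame.loc[[0, 2], ['color', 'price']]
print(sub)
```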


```python
frame['color'][2]
```

```
'yellow'
```

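**Question:** why does `frame[2]['color']` raise an error? Because `[]` on a DataFrame selects **columns**: `frame[2]` looks for a column labeled `2`, finds none, and raises a `KeyError`. `frame['color'][2]` works because it first selects the column and then indexes the resulting Series by label. To select by row and column in one step, use `frame.loc[2, 'color']`.

```python
# frame[2]['color']  # KeyError: there is no column labeled 2
```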
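This runnable sketch shows the `KeyError` and two label-based alternatives on a three-row frame. `.at` is pandas' fast accessor for a single scalar.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['blue', 'green', 'yellow'],
                      'price': [1.2, 1.0, 0.6]})

# frame[2] looks for a COLUMN labeled 2, which does not exist
key_error_raised = False
try:
    frame[2]
except KeyError:
    key_error_raised = True
    print('KeyError: no column labeled 2')

# Row label and column label together
print(frame.loc[2, 'color'])  # yellow
print(frame.at[2, 'color'])   # yellow
```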

***
## Assignment

```python
frame.index.name = 'id'
frame.columns.name = 'item'
frame
```

| item | color | object | price |
|---|---|---|---|
| **id** | | | |
| 0 | blue | ball | 1.2 |
| 1 | green | pen | 1.0 |
| 2 | yellow | pencil | 0.6 |
| 3 | red | paper | 0.9 |
| 4 | white | mug | 1.7 |
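Setting `.name` on the axes, as above, modifies `frame` in place. `rename_axis()` sets the same names but returns a new DataFrame, which suits method chains. Here is a sketch on a two-row frame:

```python
import pandas as pd

frame = pd.DataFrame({'color': ['blue', 'green'], 'price': [1.2, 1.0]})

# Returns a new DataFrame with named axes; frame itself is unchanged
named = frame.rename_axis(index='id', columns='item')
print(named.index.name, named.columns.name)  # id item
print(frame.index.name)                      # None
```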


```python
frame['new'] = 12
frame
```

| item | color | object | price | new |
|---|---|---|---|---|
| **id** | | | | |
| 0 | blue | ball | 1.2 | 12 |
| 1 | green | pen | 1.0 | 12 |
| 2 | yellow | pencil | 0.6 | 12 |
| 3 | red | paper | 0.9 | 12 |
| 4 | white | mug | 1.7 | 12 |


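Assigning a scalar fills every row of the column with that value, much like broadcasting. If the column name does not exist yet, the assignment creates a new column, similar to how the `mv` command creates the destination when it does not already exist. If the column already exists, its values are overwritten.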
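Plain `frame['new'] = ...` always appends the new column at the end and changes `frame` in place. pandas also offers `assign()`, which returns a new DataFrame, and `insert()`, which places the column at a chosen position. The column names `new` and `qty` below are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['blue', 'green'], 'price': [1.2, 1.0]})

# assign() returns a NEW DataFrame with the extra column; frame is unchanged
with_new = frame.assign(new=12)
print(with_new.columns.tolist())  # ['color', 'price', 'new']
print(frame.columns.tolist())     # ['color', 'price']

# insert() modifies frame in place, putting 'qty' at position 1
frame.insert(1, 'qty', [5, 7])
print(frame.columns.tolist())     # ['color', 'qty', 'price']
```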

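```python
# Attribute assignment works here only because 'new' already exists;
# to create a column, use frame['new'] = ...
frame.new = [3.0, 1.3, 2.2, 0.8, 1.1]
frame
```

| item | color | object | price | new |
|---|---|---|---|---|
| **id** | | | | |
| 0 | blue | ball | 1.2 | 3.0 |
| 1 | green | pen | 1.0 | 1.3 |
| 2 | yellow | pencil | 0.6 | 2.2 |
| 3 | red | paper | 0.9 | 0.8 |
| 4 | white | mug | 1.7 | 1.1 |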


```python
ser = pd.Series(np.arange(5))
ser
```

```
0    0
1    1
2    2
3    3
4    4
dtype: int64
```

```python
frame['new'] = ser  # values are aligned on the index labels
frame
```

| item | color | object | price | new |
|---|---|---|---|---|
| **id** | | | | |
| 0 | blue | ball | 1.2 | 0 |
| 1 | green | pen | 1.0 | 1 |
| 2 | yellow | pencil | 0.6 | 2 |
| 3 | red | paper | 0.9 | 3 |
| 4 | white | mug | 1.7 | 4 |
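Assigning a Series matches values by index label, not by position. Rows whose label is missing from the Series therefore get `NaN`. The sketch below uses a Series that only covers labels 1 and 3, an assumption chosen to make the gaps visible.

```python
import pandas as pd

frame = pd.DataFrame({'price': [1.2, 1.0, 0.6, 0.9, 1.7]})

# Only labels 1 and 3 are present, so rows 0, 2 and 4 become NaN
frame['new'] = pd.Series([10, 20], index=[1, 3])
print(frame['new'].tolist())  # [nan, 10.0, nan, 20.0, nan]
```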


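Setting one element with chained indexing, `frame['price'][2] = 99`, triggers a warning ("A value is trying to be set on a copy of a slice from a DataFrame"). The cause is that `frame['price']` may return a copy rather than a view, and with Copy-on-Write enabled (the default in pandas 3.0) the chained form no longer modifies `frame` at all. See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy. Use a single `.loc` call with the row label and the column label instead:

```python
frame.loc[2, 'price'] = 99
frame
```

| item | color | object | price | new |
|---|---|---|---|---|
| **id** | | | | |
| 0 | blue | ball | 1.2 | 0 |
| 1 | green | pen | 1.0 | 1 |
| 2 | yellow | pencil | 99.0 | 2 |
| 3 | red | paper | 0.9 | 3 |
| 4 | white | mug | 1.7 | 4 |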
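`.loc` also accepts a boolean mask in the row position. That turns a conditional update into one assignment rather than a chained one. In this sketch the cap value of 10 is arbitrary.

```python
import pandas as pd

frame = pd.DataFrame({'object': ['ball', 'pen', 'pencil'],
                      'price': [1.2, 99.0, 0.6]})

# Rows where price > 10, column 'price': set to 10.0 in a single call
frame.loc[frame['price'] > 10, 'price'] = 10.0
print(frame['price'].tolist())  # [1.2, 10.0, 0.6]
```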


**Membership testing with `isin()`**

```python
frame.isin([1.0, 'pen'])
```

| item | color | object | price | new |
|---|---|---|---|---|
| **id** | | | | |
| 0 | False | False | False | False |
| 1 | False | True | True | True |
| 2 | False | False | False | False |
| 3 | False | False | False | False |
| 4 | False | False | False | False |


```python
frame[frame.isin([1.0, 'pen'])]
```

| item | color | object | price | new |
|---|---|---|---|---|
| **id** | | | | |
| 0 | NaN | NaN | NaN | NaN |
| 1 | NaN | pen | 1.0 | 1.0 |
| 2 | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN |
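A list passed to `isin()` is tested against every column. That is why `1.0` matches in both `price` and `new` above. A dict restricts each list of values to one column, and `isin()` on a single column yields a row mask for filtering whole rows. Here is a sketch on a three-row frame:

```python
import pandas as pd

frame = pd.DataFrame({'color': ['blue', 'green', 'yellow'],
                      'object': ['ball', 'pen', 'pencil'],
                      'price': [1.2, 1.0, 0.6]})

# Per-column membership; columns missing from the dict are all False
mask = frame.isin({'object': ['pen'], 'price': [0.6]})
print(mask)

# Keep whole rows whose 'object' is in the list
rows = frame[frame['object'].isin(['pen', 'pencil'])]
print(rows.index.tolist())  # [1, 2]
```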


## Deleting columns

```python
del frame['new']
frame
```

| item | color | object | price |
|---|---|---|---|
| **id** | | | |
| 0 | blue | ball | 1.2 |
| 1 | green | pen | 1.0 |
| 2 | yellow | pencil | 99.0 |
| 3 | red | paper | 0.9 |
| 4 | white | mug | 1.7 |
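Besides `del`, pandas provides two other ways to remove a column. `drop(columns=...)` returns a new DataFrame. `pop()` removes the column in place and hands it back as a Series. Here is a sketch on a two-row frame:

```python
import pandas as pd

frame = pd.DataFrame({'color': ['blue', 'green'], 'price': [1.2, 1.0],
                      'new': [0, 1]})

# drop() leaves frame untouched and returns a copy without 'new'
smaller = frame.drop(columns=['new'])
print(smaller.columns.tolist())  # ['color', 'price']

# pop() removes 'new' from frame and returns it
removed = frame.pop('new')
print(removed.tolist())          # [0, 1]
print(frame.columns.tolist())    # ['color', 'price']
```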


## Filtering

A boolean Series inside `[]` keeps only the rows where it is `True`:

```python
frame[frame.price < 1.2]
```

| item | color | object | price |
|---|---|---|---|
| **id** | | | |
| 1 | green | pen | 1.0 |
| 3 | red | paper | 0.9 |
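Several conditions can be combined with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `<` or `==`. The prices below mirror the current state of `frame`.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['blue', 'green', 'yellow', 'red', 'white'],
                      'price': [1.2, 1.0, 99.0, 0.9, 1.7]})

# price below 1.0 OR color equal to 'green'
cheap_or_green = frame[(frame['price'] < 1.0) | (frame['color'] == 'green')]
print(cheap_or_green.index.tolist())  # [1, 3]

# 1.0 <= price < 2.0
mid_range = frame[(frame['price'] >= 1.0) & (frame['price'] < 2.0)]
print(mid_range.index.tolist())       # [0, 1, 4]
```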


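Comparing the whole DataFrame instead, e.g. `frame[frame < 2]`, raises a `TypeError`: the `color` and `object` columns hold strings, which cannot be compared with an integer.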
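One way around the `TypeError` is to restrict the comparison to numeric columns with `select_dtypes()`. Another is to filter rows on a single numeric column. This is a sketch on a three-row frame, and the threshold 1.1 is arbitrary.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['blue', 'green', 'red'],
                      'price': [1.2, 1.0, 0.9]})

# The whole-frame comparison fails on the string column
raised = False
try:
    frame < 2
except TypeError:
    raised = True

# Keep only numeric columns, then compare
numeric = frame.select_dtypes(include='number')
print(numeric.columns.tolist())             # ['price']
print((numeric['price'] < 1.1).tolist())    # [False, True, True]

# Or filter rows using one numeric column
print(frame[frame['price'] < 1.1].index.tolist())  # [1, 2]
```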

## Creating a DataFrame from a nested dictionary

```python
nestdict = {'red': {2012: 22, 2013: 33},
            'white': {2011: 12, 2012: 22, 2013: 16},
            'blue': {2011: 17, 2012: 27, 2013: 18}}
nestdict
```

```
{'red': {2012: 22, 2013: 33},
 'white': {2011: 12, 2012: 22, 2013: 16},
 'blue': {2011: 17, 2012: 27, 2013: 18}}
```

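When a nested dictionary is passed to `DataFrame()`, pandas interprets the outer keys as column names and the inner keys as index labels. Missing combinations are filled with `NaN`. This is also why the `red` column is shown as floats: a column containing `NaN` is upcast to `float64`.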

```python
frame2 = pd.DataFrame(nestdict)
frame2
```

|   | red | white | blue |
|---|---|---|---|
| 2012 | 22.0 | 22 | 27 |
| 2013 | 33.0 | 16 | 18 |
| 2011 | NaN | 12 | 17 |
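The row order above follows the order in which the inner keys are first encountered. Passing `index=` fixes the order, and only the listed inner keys are kept. `DataFrame.from_dict(..., orient='index')` flips the interpretation, so the outer keys become rows. This sketch reuses `nestdict`.

```python
import pandas as pd

nestdict = {'red': {2012: 22, 2013: 33},
            'white': {2011: 12, 2012: 22, 2013: 16},
            'blue': {2011: 17, 2012: 27, 2013: 18}}

# Explicit row order for the inner keys
by_year = pd.DataFrame(nestdict, index=[2011, 2012, 2013])
print(by_year.index.tolist())  # [2011, 2012, 2013]

# Outer keys become the row labels instead of the column labels
by_color = pd.DataFrame.from_dict(nestdict, orient='index')
print(by_color.index.tolist())
print(pd.isna(by_color.loc['red', 2011]))  # True
```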


## Transposing

```python
frame2.T
```

|   | 2012 | 2013 | 2011 |
|---|---|---|---|
| red | 22.0 | 33.0 | NaN |
| white | 22.0 | 16.0 | 12.0 |
| blue | 27.0 | 18.0 | 17.0 |
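`.T` (equivalent to `transpose()`) swaps rows and columns. In `frame2` every value is numeric, so the transposed columns become `float64`. When the original columns mix strings and numbers, each transposed column holds mixed values and ends up with `object` dtype. Here is a sketch on a three-row mixed frame:

```python
import pandas as pd

frame = pd.DataFrame({'object': ['ball', 'pen', 'mug'],
                      'price': [1.2, 1.0, 1.7]})

flipped = frame.T  # same as frame.transpose()
print(flipped.shape)  # (2, 3)

# Each new column mixes a string and a float, so its dtype is object
print(all(dt == object for dt in flipped.dtypes))  # True
print(flipped.loc['price', 0])  # 1.2
```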
