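A DataFrame is a tabular data structure that contains an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean, and so on). A DataFrame has both a row index and a column index. It can be viewed as a dict of Series that all share the same index. Compared with other similar data structures, row-oriented and column-oriented operations on a DataFrame are roughly symmetric. Internally, the data of a DataFrame is stored as one or more two-dimensional blocks, not as lists, dicts, or other one-dimensional structures.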
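A minimal sketch of the "dict of Series sharing one index" view. The column names `price` and `in_stock` and their values are made up for illustration. Series with different indexes are aligned on the union of their labels, and each column keeps its own dtype.

```python
import pandas as pd

# Hypothetical data: two Series whose indexes only partly overlap
price = pd.Series([1.5, 2.0, 3.25], index=["x", "y", "z"])
in_stock = pd.Series([True, False], index=["x", "z"])

# Each dict value becomes one column; rows are aligned on the union of the indexes
df = pd.DataFrame({"price": price, "in_stock": in_stock})
print(df)
print(df.dtypes)  # each column has its own dtype

# Selecting a column gives back a Series that shares the DataFrame's index
col = df["price"]
print(type(col).__name__, list(col.index))
```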

In [23]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# Each key becomes a column name; each list holds that column's values
data = {'sno': ['95001', '95002', '95003', '95004'],
        'name': ['Xiaoming', 'Zhangsan', 'Lisi', 'Wangwu'],
        'sex': ['M', 'F', 'F', 'M'],
        'age': [22, 25, 24, 23]}
frame = DataFrame(data)
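The cell above does not display `frame`. A quick sketch for inspecting its two axes: the cell rebuilds `data` so it runs on its own. Without an explicit `index`, the rows get a default integer index, and the columns follow the key order of `data`.

```python
import pandas as pd

data = {'sno': ['95001', '95002', '95003', '95004'],
        'name': ['Xiaoming', 'Zhangsan', 'Lisi', 'Wangwu'],
        'sex': ['M', 'F', 'F', 'M'],
        'age': [22, 25, 24, 23]}
frame = pd.DataFrame(data)

print(list(frame.columns))  # ['sno', 'name', 'sex', 'age']: dict key order
print(list(frame.index))    # [0, 1, 2, 3]: default integer index
print(frame.shape)          # (4, 4): (rows, columns)
print(frame.dtypes)         # one dtype per column
```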


Specifying the columns
When the columns argument names a column that does not exist in the data, that column is filled with NaN:

In [24]:
frame=DataFrame(data,columns=['name','sno','sex','age','grade'])
print(frame)

       name    sno sex  age grade
0  Xiaoming  95001   M   22   NaN
1  Zhangsan  95002   F   25   NaN
2      Lisi  95003   F   24   NaN
3    Wangwu  95004   M   23   NaN
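The `columns` argument both selects and orders the dict keys. Keys that are not listed are left out, and names with no matching key (such as `grade`) become all-NaN columns. A minimal sketch:

```python
import pandas as pd

data = {'sno': ['95001', '95002', '95003', '95004'],
        'name': ['Xiaoming', 'Zhangsan', 'Lisi', 'Wangwu'],
        'sex': ['M', 'F', 'F', 'M'],
        'age': [22, 25, 24, 23]}

# Only 'name' and 'grade' are requested: 'sno', 'sex' and 'age' are dropped,
# and 'grade' has no matching key, so it is filled with NaN
subset = pd.DataFrame(data, columns=['name', 'grade'])
print(subset)
```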


Specifying both the row index and the column index

In [25]:
frame=DataFrame(data,columns=['sno','name','sex','age','grade'],index=['a','b','c','d'])
print(frame)

     sno      name sex  age grade
a  95001  Xiaoming   M   22   NaN
b  95002  Zhangsan   F   25   NaN
c  95003      Lisi   F   24   NaN
d  95004    Wangwu   M   23   NaN
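When `index` is given, it must contain exactly one label per row, that is, as many labels as each column list has values. A minimal sketch of the failure case, in which the three-label index is deliberately too short:

```python
import pandas as pd

data = {'sno': ['95001', '95002', '95003', '95004'],
        'age': [22, 25, 24, 23]}

try:
    # 4 values per column but only 3 row labels
    pd.DataFrame(data, index=['a', 'b', 'c'])
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)

# With one label per row the construction succeeds
ok = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
print(ok)
```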


In [26]:
print(frame['sno'])  # dict-style access returns the column as a Series
print(frame.name)    # attribute-style access to the 'name' column

a    95001
b    95002
c    95003
d    95004
Name: sno, dtype: object
a    Xiaoming
b    Zhangsan
c        Lisi
d      Wangwu
Name: name, dtype: object
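Attribute access such as `frame.name` only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. Bracket indexing always works. A sketch with deliberately awkward, hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({'count': [1, 2], 'first name': ['Li', 'Wang']})

# 'count' collides with the DataFrame.count method, so the attribute is the method
print(callable(df.count))         # True
print(df['count'].tolist())       # [1, 2]: brackets return the column

# 'first name' contains a space, so it can only be reached with brackets
print(df['first name'].tolist())  # ['Li', 'Wang']
```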


In [27]:
frame.loc['b']

sno         95002
name     Zhangsan
sex             F
age            25
grade         NaN
Name: b, dtype: object

In [28]:
frame.iloc[1]

sno         95002
name     Zhangsan
sex             F
age            25
grade         NaN
Name: b, dtype: object
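`loc` selects by label and `iloc` by integer position, so `frame.loc['b']` and `frame.iloc[1]` return the same row here. Both also accept a row and a column at once. A minimal sketch using a reduced version of `frame`:

```python
import pandas as pd

frame = pd.DataFrame(
    {'sno': ['95001', '95002', '95003', '95004'],
     'name': ['Xiaoming', 'Zhangsan', 'Lisi', 'Wangwu']},
    index=['a', 'b', 'c', 'd'])

by_label = frame.loc['b']     # row whose label is 'b'
by_position = frame.iloc[1]   # second row (position 1)
print(by_label.equals(by_position))  # True

# [row, column] selects a single value
print(frame.loc['b', 'name'])  # Zhangsan
print(frame.iloc[1, 1])        # Zhangsan
```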

In [29]:
frame.loc['b':'c']

     sno      name sex  age grade
b  95002  Zhangsan   F   25   NaN
c  95003      Lisi   F   24   NaN


In [30]:
frame.iloc[2:4]

     sno    name sex  age grade
c  95003    Lisi   F   24   NaN
d  95004  Wangwu   M   23   NaN
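The two slices above follow different rules. A label slice with `loc` includes its end label, so `'b':'c'` gives two rows. A position slice with `iloc` excludes its end position, as in ordinary Python slicing, so `2:4` gives positions 2 and 3. A minimal sketch:

```python
import pandas as pd

frame = pd.DataFrame({'age': [22, 25, 24, 23]}, index=['a', 'b', 'c', 'd'])

print(list(frame.loc['b':'c'].index))  # ['b', 'c']: end label included
print(list(frame.iloc[1:2].index))     # ['b']: end position excluded
```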


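Combining the slice : for the rows with a list of column names retrieves a single column. Because the column is passed as a list, the result is a one-column DataFrame rather than a Series: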

In [31]:
frame.loc[:,['sex']]

  sex
a   M
b   F
c   F
d   M
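Whether the result is a Series or a one-column DataFrame depends on how the column is specified. A bare label gives a Series, and a list of labels gives a DataFrame. A minimal sketch:

```python
import pandas as pd

frame = pd.DataFrame({'sex': ['M', 'F', 'F', 'M'], 'age': [22, 25, 24, 23]},
                     index=['a', 'b', 'c', 'd'])

as_series = frame.loc[:, 'sex']    # single label -> Series
as_frame = frame.loc[:, ['sex']]   # list of labels -> DataFrame
print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame
print(as_frame.shape)              # (4, 1)
```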


In [32]:
frame.loc[:,'sex':]

  sex  age grade
a   M   22   NaN
b   F   25   NaN
c   F   24   NaN
d   M   23   NaN


In [33]:
frame.iloc[:,1:4]

       name sex  age
a  Xiaoming   M   22
b  Zhangsan   F   25
c      Lisi   F   24
d    Wangwu   M   23


In [34]:
frame['grade']=[93,89,72,84]
frame

     sno      name sex  age  grade
a  95001  Xiaoming   M   22     93
b  95002  Zhangsan   F   25     89
c  95003      Lisi   F   24     72
d  95004    Wangwu   M   23     84
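Assigning a list replaces the column value by value, so the list must have exactly one element per row. A scalar is broadcast to every row instead. A minimal sketch, in which the `bonus` column is hypothetical:

```python
import pandas as pd

frame = pd.DataFrame({'age': [22, 25, 24, 23]}, index=['a', 'b', 'c', 'd'])

frame['bonus'] = 5  # scalar: broadcast to all rows
print(frame['bonus'].tolist())  # [5, 5, 5, 5]

try:
    frame['grade'] = [93, 89]  # 2 values for 4 rows
    length_error = False
except ValueError as exc:
    length_error = True
    print("ValueError:", exc)
```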


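A column can also be assigned from a Series. The values are aligned to the DataFrame's index by label, and rows whose labels are missing from the Series are filled with NaN (which also turns grade into a float column):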

In [35]:
frame['grade']=Series([67,89],index=['a','c'])
print(frame)

     sno      name sex  age  grade
a  95001  Xiaoming   M   22   67.0
b  95002  Zhangsan   F   25    NaN
c  95003      Lisi   F   24   89.0
d  95004    Wangwu   M   23    NaN
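Alignment is by label, not by position. Labels that appear in the Series but not in the DataFrame are discarded, and rows with no matching label get NaN. In the sketch below, `'z'` is a deliberately unmatched, hypothetical label, and the Series labels are given out of order:

```python
import pandas as pd

frame = pd.DataFrame({'age': [22, 25, 24, 23]}, index=['a', 'b', 'c', 'd'])

# 'c' and 'a' are given out of order; 'z' does not exist in frame.index
frame['grade'] = pd.Series([89, 67, 100], index=['c', 'a', 'z'])
print(frame)
```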


Adding a column

In [36]:
frame['province']=['ZheJiang','FuJian','Beijing','ShangHai']
print(frame)

     sno      name sex  age  grade  province
a  95001  Xiaoming   M   22   67.0  ZheJiang
b  95002  Zhangsan   F   25    NaN    FuJian
c  95003      Lisi   F   24   89.0   Beijing
d  95004    Wangwu   M   23    NaN  ShangHai


In [37]:
del frame['province']  # remove the column from frame in place
print(frame)

     sno      name sex  age  grade
a  95001  Xiaoming   M   22   67.0
b  95002  Zhangsan   F   25    NaN
c  95003      Lisi   F   24   89.0
d  95004    Wangwu   M   23    NaN
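`del` removes a column in place. pandas offers two other standard ways. `DataFrame.pop` removes the column and returns it as a Series. `DataFrame.drop(columns=...)` returns a new DataFrame and leaves the original unchanged. A minimal sketch:

```python
import pandas as pd

frame = pd.DataFrame({'name': ['Xiaoming', 'Zhangsan'],
                      'province': ['ZheJiang', 'FuJian'],
                      'age': [22, 25]})

# drop returns a new DataFrame; frame itself still has 'age'
without_age = frame.drop(columns=['age'])
print(list(without_age.columns))  # ['name', 'province']
print(list(frame.columns))        # ['name', 'province', 'age']

# pop removes the column from frame in place and returns it
province = frame.pop('province')
print(province.tolist())          # ['ZheJiang', 'FuJian']
print(list(frame.columns))        # ['name', 'age']
```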


In [38]:
dic={'computer':{2020:78,2021:82},'math':{2019:76,2020:78,2021:81}}
frame1=DataFrame(dic)
print(frame1)

      computer  math
2020      78.0    78
2021      82.0    81
2019       NaN    76
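With a dict of dicts, the outer keys become the columns and the inner keys become the row index. Inner keys missing from one column produce NaN there, which is why `computer` is shown as float. For the opposite orientation, `DataFrame.from_dict` with `orient='index'` turns the outer keys into rows. A minimal sketch:

```python
import pandas as pd

dic = {'computer': {2020: 78, 2021: 82},
       'math': {2019: 76, 2020: 78, 2021: 81}}

by_column = pd.DataFrame(dic)                         # outer keys -> columns
by_row = pd.DataFrame.from_dict(dic, orient='index')  # outer keys -> rows
print(by_row)
print(sorted(by_row.columns))  # [2019, 2020, 2021]
```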


In [39]:
frame1.T

          2020  2021  2019
computer  78.0  82.0   NaN
math      78.0  81.0  76.0
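`frame1.T` (shorthand for `frame1.transpose()`) swaps rows and columns. After transposing, each new column mixes values from the float `computer` column and the integer `math` column. pandas therefore stores them under one common dtype, and the integer `math` values are upcast to float, which is why they show as `78.0` above. A minimal sketch:

```python
import pandas as pd

dic = {'computer': {2020: 78, 2021: 82},
       'math': {2019: 76, 2020: 78, 2021: 81}}
frame1 = pd.DataFrame(dic)

print(frame1.dtypes.tolist())  # computer float64, math int64
t = frame1.T
print(t.dtypes.tolist())       # all float64 after transposing
print(t.loc['math', 2019])     # 76.0
```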


In [40]:
frame2=DataFrame(dic,index=[2020,2021,2022])
print(frame2)

      computer  math
2020      78.0  78.0
2021      82.0  81.0
2022       NaN   NaN
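Passing `index` together with a dict of dicts keeps only the listed row labels. 2019 is dropped because it is not in the list, and 2022 becomes an all-NaN row because no inner dict has that key. A sketch of a comparable result obtained by calling `reindex` on an already-built `frame1`:

```python
import pandas as pd

dic = {'computer': {2020: 78, 2021: 82},
       'math': {2019: 76, 2020: 78, 2021: 81}}
frame1 = pd.DataFrame(dic)
frame2 = pd.DataFrame(dic, index=[2020, 2021, 2022])

# reindex conforms frame1 to the same row labels
reindexed = frame1.reindex([2020, 2021, 2022])
print(reindexed)
print(frame2)
```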


# Using sequential numbers: np.arange(12).reshape(3,4)

In [42]:
df1 = pd.DataFrame(np.arange(12).reshape(3,4),columns=['a','b','c','d'])
print(df1)

   a  b   c   d
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
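Any 2-D NumPy array can be used as input. Its rows become DataFrame rows, and its columns are labelled by `columns`, so the list must contain one name per array column. `to_numpy()` returns the values as an array. A minimal sketch:

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape(3, 4)
df1 = pd.DataFrame(arr, columns=['a', 'b', 'c', 'd'])

print(df1.loc[2, 'd'])                          # 11
print(np.array_equal(df1.to_numpy(), arr))      # True

try:
    pd.DataFrame(arr, columns=['a', 'b', 'c'])  # 3 names for 4 columns
    shape_error = False
except ValueError as exc:
    shape_error = True
    print("ValueError:", exc)
```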


# Using random integers in [0, 20): np.random.randint(20,size=(2,3))

In [43]:
df2=pd.DataFrame(np.random.randint(20,size=(2,3)),columns=['b','d','a'])
print(df2)

    b  d   a
0  16  1  13
1   9  6   5
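The values in `df2` are random and differ on every run. For reproducible output, one option is NumPy's newer `Generator` API with a fixed seed. This is a swapped-in alternative to `np.random.randint`, and the seed value 42 and the helper name `make_df` are arbitrary. `integers(20, size=(2, 3))` draws integers from 0 to 19, like `randint(20, ...)` above. A minimal sketch:

```python
import numpy as np
import pandas as pd

def make_df(seed):
    # Hypothetical helper: a seeded generator yields the same numbers for the same seed
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.integers(20, size=(2, 3)), columns=['b', 'd', 'a'])

df_first = make_df(42)
df_second = make_df(42)
print(df_first)
print(df_first.equals(df_second))  # True: identical values for the same seed
```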


# Using standard-normal random numbers: np.random.randn(5,3)

In [44]:
df3=pd.DataFrame(np.random.randn(5,3),index=list('abcde'),columns=['one','two','three'])
print(df3)

        one       two     three
a  0.412199 -1.181851 -1.222096
b  1.347675 -0.321430 -0.860737
c  1.189142 -0.434712 -0.080009
d  0.668352  1.551579 -0.051688
e  0.832748 -0.445921  0.636485
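Reductions on a DataFrame show the row/column symmetry mentioned at the start. By default `mean()` aggregates down each column (`axis=0`), while `axis=1` aggregates across each row. A minimal sketch: it uses a seeded generator (seed 0 is arbitrary), because the point is the shape of the results, not the random values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df3 = pd.DataFrame(rng.standard_normal((5, 3)),
                   index=list('abcde'), columns=['one', 'two', 'three'])

col_means = df3.mean()        # one value per column, indexed by column name
row_means = df3.mean(axis=1)  # one value per row, indexed by row label
print(col_means)
print(row_means)
```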


In [45]:
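import pandas as pd
from pandas import DataFrame
# Reading data from files: replace these example paths with files that exist on your machine.
# Raw strings (r'...') keep the Windows backslashes literal.
# csv_df = pd.read_csv(r'C:\Python38\my_file.csv')
# table_df = pd.read_table(r'C:\Python38\my_table.txt')

# If the file does not exist, pandas raises:
# FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Python38\\my_file.csv'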
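Both readers also accept any file-like object, so they can be tried without a file on disk. The sketch below uses in-memory text via `io.StringIO`, and the CSV contents are made up. `read_csv` splits on commas by default and `read_table` on tabs. `dtype={'sno': str}` keeps the student numbers as strings instead of parsing them as integers.

```python
import io
import pandas as pd

csv_text = "sno,name,age\n95001,Xiaoming,22\n95002,Zhangsan,25\n"
csv_df = pd.read_csv(io.StringIO(csv_text), dtype={'sno': str})
print(csv_df)

tsv_text = "sno\tname\tage\n95003\tLisi\t24\n"
table_df = pd.read_table(io.StringIO(tsv_text), dtype={'sno': str})
print(table_df)
```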