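# 1. Detecting missing values
pandas uses the floating-point value NaN (Not a Number) to represent missing values in both floating-point and non-floating-point arrays, and Python's built-in None is also treated as a missing value.
## 1.1 Detecting missing values in a Series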
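
The statement above can be checked on scalars. The following is a minimal sketch; the variable names are illustrative and not part of the original notebook. It shows why NaN cannot be detected with `==`, and that `pd.isna` reports both NaN and None as missing.

```python
import numpy as np
import pandas as pd

# NaN is a float that is not equal to itself, so == cannot detect it
nan_equals_itself = (np.nan == np.nan)

# pd.isna treats both NaN and None as missing
nan_is_missing = pd.isna(np.nan)
none_is_missing = pd.isna(None)

print(nan_equals_itself, nan_is_missing, none_is_missing)
```
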

In [5]:
import pandas as pd
import numpy as np
s = pd.Series(["a","b",np.nan,"c",None])
print(s)
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
print(s.isnull())
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# Print the index and value of each missing entry
print(s[s.isnull()])

0       a
1       b
2     NaN
3       c
4    None
dtype: object
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0    False
1    False
2     True
3    False
4     True
dtype: bool
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2     NaN
4    None
dtype: object
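
A boolean mask like the one above is often summarized rather than printed in full. The sketch below assumes the same Series as in the cell above and uses `isna`, which is an alias of `isnull`; the names `n_missing` and `missing_labels` are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", np.nan, "c", None])

# True counts as 1, so summing the mask gives the number of missing values
n_missing = int(s.isna().sum())

# Index labels of the missing entries
missing_labels = s[s.isna()].index.tolist()

print(n_missing)
print(missing_labels)
```
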


## 1.2 Detecting missing values in a DataFrame

In [6]:
a = [[1,np.nan,2],[3,4,None]]
data = pd.DataFrame(a)
# None is converted to NaN in the DataFrame
print(data)
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
print(data.isnull())
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
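# Indexing with a boolean DataFrame keeps only the cells where the mask is True and sets the rest to NaN;
# the True cells are themselves NaN, so every cell prints as NaN
print(data[data.isnull()])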

   0    1    2
0  1  NaN  2.0
1  3  4.0  NaN
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
       0      1      2
0  False   True  False
1  False  False   True
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    0   1   2
0 NaN NaN NaN
1 NaN NaN NaN
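
Since indexing with the full boolean mask yields only NaN, a per-column count or a row filter is usually more informative. The following sketch assumes the same DataFrame as in the cell above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, np.nan, 2], [3, 4, None]])

# Number of missing values in each column
missing_per_column = data.isna().sum()

# Rows that contain at least one missing value
rows_with_missing = data[data.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

In this small example both rows contain a missing value, so the row filter returns the whole DataFrame.
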


# 2. Filtering out missing data
## 2.1 Filtering missing values from a Series

In [8]:
s = pd.Series(["a","b",np.nan,"c",None])
# Use notnull to select the non-missing values
print(s[s.notnull()])
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# dropna removes the missing values and returns a new Series
print(s.dropna())
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# The original Series is left unchanged
print(s)
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# With inplace=True, dropna modifies the original Series and returns None
print(s.dropna(inplace=True))
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# The call above printed None; s itself no longer contains missing values
print(s)

0    a
1    b
3    c
dtype: object
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0    a
1    b
3    c
dtype: object
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0       a
1       b
2     NaN
3       c
4    None
dtype: object
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
None
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
0    a
1    b
3    c
dtype: object


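## 2.2 Filtering missing values from a DataFrame
Dropping missing values from a DataFrame is more involved than for a Series. Sometimes you want to drop every row or column that contains any missing value; other times you want to drop a row or column only when all of its values are missing. pandas handles both cases.
### 2.2.1 Dropping rows and columns that contain missing values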


In [9]:
a = [[1, np.nan, 2],[9,None,np.nan],[3, 4, None],[5,6,7]]
data = pd.DataFrame(a)
print(data)
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# dropna drops every row that contains a missing value (rows are the default axis)
print(data.dropna())
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# Drop every column that contains a missing value
print(data.dropna(axis=1))


   0    1    2
0  1  NaN  2.0
1  9  NaN  NaN
2  3  4.0  NaN
3  5  6.0  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
   0    1    2
3  5  6.0  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
   0
0  1
1  9
2  3
3  5
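
When only some columns matter, `dropna` also accepts a `subset` argument listing the columns to check. The sketch below reuses the data from the cell above; choosing column 1 is purely illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, np.nan, 2], [9, None, np.nan], [3, 4, None], [5, 6, 7]])

# Drop a row only if column 1 is missing; gaps in other columns are ignored
filtered = data.dropna(subset=[1])

print(filtered)
```

Row 2 is kept even though its column 2 is NaN, because only column 1 is checked.
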


### 2.2.2 Dropping rows and columns that are entirely NaN

In [10]:
a = [[1, np.nan, 2],[np.nan,None,np.nan],[3, None, None],[5,None,7]]
data = pd.DataFrame(a)
print(data)
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
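# how="all" drops a row only when every value in it is NaN; the default how="any" drops a row that contains any missing value
print(data.dropna(how="all"))
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# Drop a column only when every value in it is NaN
print(data.dropna(how="all",axis=1))
# The inplace parameter of dropna works the same way as it does for a Series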

     0   1    2
0  1.0 NaN  2.0
1  NaN NaN  NaN
2  3.0 NaN  NaN
3  5.0 NaN  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
     0   1    2
0  1.0 NaN  2.0
2  3.0 NaN  NaN
3  5.0 NaN  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
     0    2
0  1.0  2.0
1  NaN  NaN
2  3.0  NaN
3  5.0  7.0


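### 2.2.3 Keeping rows with a minimum number of non-missing values

In [12]:
a = [[1, np.nan, 2],[np.nan,None,np.nan],[3, None, None],[5,None,7]]
data = pd.DataFrame(a)
print(data)
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# how="all" drops a row only when every value in it is NaN; the default how="any" drops a row that contains any missing value
print(data.dropna(how="all"))
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# thresh=n keeps only the rows that have at least n non-missing values; with axis=1 it applies to columns in the same way.
# how and thresh cannot be combined: recent pandas versions raise a TypeError when both are passed.
print(data.dropna(thresh=2))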

     0   1    2
0  1.0 NaN  2.0
1  NaN NaN  NaN
2  3.0 NaN  NaN
3  5.0 NaN  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
     0   1    2
0  1.0 NaN  2.0
2  3.0 NaN  NaN
3  5.0 NaN  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
     0   1    2
0  1.0 NaN  2.0
3  5.0 NaN  7.0
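
To make the meaning of `thresh` concrete, the sketch below applies it along both axes to the same data as above. The threshold values 2 and 3 are chosen only for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, np.nan, 2], [np.nan, None, np.nan], [3, None, None], [5, None, 7]])

# Keep rows that have at least 2 non-missing values
rows_kept = data.dropna(thresh=2)

# Keep columns that have at least 3 non-missing values
cols_kept = data.dropna(axis=1, thresh=3)

print(rows_kept)
print(cols_kept)
```

Column 0 has three non-missing values, column 1 has none, and column 2 has two, so only column 0 survives `thresh=3`.
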


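# 3. Filling missing values
Data is valuable. When a dataset is small, dropping rows discards information that model training and analysis could use, so it is often preferable to keep the rows and fill the missing values with substitute values instead. This section shows how to do that with pandas's fillna method.
## 3.1 Filling with a fixed value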

In [13]:
a = [[1, 2, 2],[3,None,6],[3, 7, None],[5,None,7]]
data = pd.DataFrame(a)
print(data)
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# Fill every missing value with 0
print(data.fillna(0))

   0    1    2
0  1  2.0  2.0
1  3  NaN  6.0
2  3  7.0  NaN
3  5  NaN  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
   0    1    2
0  1  2.0  2.0
1  3  0.0  6.0
2  3  7.0  0.0
3  5  0.0  7.0


## 3.2 Using a different fill value for each column

In [14]:
a = [[1, 2, 2],[3,None,6],[3, 7, None],[5,None,7]]
data = pd.DataFrame(a)
print(data)
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
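# Pass a dict that maps each column label to its fill value: column 1 is filled with 1, column 2 with 2
print(data.fillna({1:1,2:2}))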

   0    1    2
0  1  2.0  2.0
1  3  NaN  6.0
2  3  7.0  NaN
3  5  NaN  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
   0    1    2
0  1  2.0  2.0
1  3  1.0  6.0
2  3  7.0  2.0
3  5  1.0  7.0


## 3.3 Forward fill and backward fill

In [15]:
a = [[1, 2, 2],[3,None,6],[3, 7, None],[5,None,7]]
data = pd.DataFrame(a)
print(data)
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
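# Forward fill: by default each missing value takes the value from the previous row; set axis=1 to fill from the previous column instead.
# fillna(method="ffill") is deprecated in recent pandas versions; ffill() gives the same result.
print(data.ffill())
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
# Backward fill: each missing value takes the value from the next row, and stays NaN when there is no next value
print(data.bfill())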

   0    1    2
0  1  2.0  2.0
1  3  NaN  6.0
2  3  7.0  NaN
3  5  NaN  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
   0    1    2
0  1  2.0  2.0
1  3  2.0  6.0
2  3  7.0  6.0
3  5  7.0  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
   0    1    2
0  1  2.0  2.0
1  3  7.0  6.0
2  3  7.0  7.0
3  5  NaN  7.0
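
Two variations on forward fill are sketched below: filling along rows with `axis=1`, and capping the number of consecutive fills with `limit`. The Series `s` is hypothetical data added here because the DataFrame above has no run of consecutive gaps.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, 2, 2], [3, None, 6], [3, 7, None], [5, None, 7]])

# axis=1 fills each missing value from the column to its left in the same row
row_filled = data.ffill(axis=1)

# Hypothetical Series with a run of three consecutive gaps
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# limit=1 fills at most one consecutive missing value per gap
limited = s.ffill(limit=1)

print(row_filled)
print(limited)
```
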


## 3.4 Filling with the column mean

In [16]:
a = [[1, 2, 2],[3,None,6],[3, 7, None],[5,None,7]]
data = pd.DataFrame(a)
print(data)
print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")
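# data.mean() returns a Series of per-column means (NaN is skipped); fillna matches it to the columns by label
print(data.fillna(data.mean()))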

   0    1    2
0  1  2.0  2.0
1  3  NaN  6.0
2  3  7.0  NaN
3  5  NaN  7.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
   0    1    2
0  1  2.0  2.0
1  3  4.5  6.0
2  3  7.0  5.0
3  5  4.5  7.0
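
The same pattern works with other per-column statistics. As a sketch, the example below swaps the mean for the median, which is less sensitive to outliers. This substitution is an illustration and not part of the original notebook.

```python
import pandas as pd

data = pd.DataFrame([[1, 2, 2], [3, None, 6], [3, 7, None], [5, None, 7]])

# data.median() returns one median per column, ignoring NaN
medians = data.median()

# fillna aligns the Series of medians with the columns by label
filled = data.fillna(medians)

print(filled)
```
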
