## Missing values
By the end of this lecture you will be able to:
- identify missing values in a `DataFrame`
- count the number of missing values in a column
- find and drop `null` or non-`null` values



In [4]:
import polars as pl
import polars.selectors as cs

In [5]:
csv_file = "../../Files/Sample_Superstore.csv"

df = pl.read_csv(csv_file)

> In Pandas a missing value can be represented with a `NaN`, `None` or `pd.NA` value depending on the dtype of the column. In Polars a missing value is always represented as `null`. Polars also allows `NaN` values in floating point columns to represent non-numeric values (e.g. the result of dividing zero by zero). This use of `NaN` is distinct from missing values.

### Metadata on `null` values
Polars stores metadata about `null` values for each column in a `DataFrame`.

#### Null count
Polars stores a count of how many `null` values there are in each column. We can access this with the `null_count` method, either on a single column or on all the columns of a `DataFrame`.



In [8]:
df.null_count()

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32,u32
0,2,1,2,1,0,0,0,2,1,0,0,0,0,0,0,0,0,0,0,0


Polars keeps track of the `null_count` at all times so this is a cheap operation regardless of the size of the column.


### Finding `null` values

We use the `is_null` expression to find out whether each value is `null`, and `is_not_null` for the converse.


In [10]:
(
    df.select(
        [
            pl.col("Customer_Name"),
            pl.col("Category").is_null().alias("Category_is_null"),
            pl.col("Region").is_null().alias("Region_is_null"),
        ]
    ).head()
)

Customer_Name,Category_is_null,Region_is_null
str,bool,bool
"""Claire Gute""",False,False
"""Claire Gute""",False,False
"""Darrin Van Huff""",False,False
"""Sean O'Donnell""",False,False
"""Sean O'Donnell""",False,False


### Filtering by `null` values

#### Filtering on a single column
We can use these expressions to filter by `null` values on a single column.

In this example we want all rows where the value in `Category` is `null`.


In [25]:
(
    df.filter(
        pl.col("Category").is_null(),
    ).select("Customer_Name", "Category", "Profit")
)

Customer_Name,Category,Profit
str,str,f64
"""Brosina Hoffman""",,5.4432
"""Zuschuss Donatelli""",,2.4824
"""Emily Burns""",,240.2649
"""Gene Hale""",,123.4737
"""Katrina Willman""",,9.936
"""Dean Katz""",,0.777
"""Mark Packer""",,3.36
"""Bradley Drucker""",,206.316


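#### Filtering by `null` values in multiple columns

To keep the rows that have a `null` in at least one column, we apply `is_null` to every column with `pl.all()` and combine the results row-wise with `pl.any_horizontal`.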



In [9]:
(df.filter(pl.any_horizontal(pl.all().is_null())).head())

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,,,"""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
3,"""CA-2016-138688""","""12-06-2016""",,,"""DV-13045""","""Darrin Van Huff""","""Corporate""",,"""Los Angeles""","""California""",90036,"""West""","""OFF-LA-10000240""","""Office Supplies""","""Labels""","""Self-Adhesive Address Labels f…",14.62,2,0.0,6.8714
4,,"""11-10-2015""",,"""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""","""Fort Lauderdale""","""Florida""",33311,"""South""","""FUR-TA-10000577""","""Furniture""","""Tables""","""Bretford CR4500 Series Slim Re…",957.5775,5,0.45,-383.031
5,"""US-2015-108966""","""11-10-2015""","""18-10-2015""","""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""",,"""Florida""",33311,"""South""","""OFF-ST-10000760""","""Office Supplies""","""Storage""","""Eldon Fold 'N Roll Cart System""",22.368,2,0.2,2.5164
8,"""CA-2014-115812""","""09-06-2014""","""14-06-2014""","""Standard Class""","""BH-11710""","""Brosina Hoffman""","""Consumer""",,"""Los Angeles""","""California""",90032,"""West""","""TEC-PH-10002275""","""Technology""","""Phones""","""Mitel 5320 IP Phone VoIP phone""",907.152,6,0.2,90.7152


### Using the `drop_nulls` method

Polars has a convenience `drop_nulls` method for dropping rows that contain a `null` value in any of the checked columns.


In [10]:
(df.drop_nulls(subset=["Ship_Date"]))

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
1,,,"""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-BO-10001798""","""Furniture""","""Bookcases""","""Bush Somerset Collection Bookc…",261.96,2,0.0,41.9136
2,"""CA-2016-152156""","""08-11-2016""","""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
5,"""US-2015-108966""","""11-10-2015""","""18-10-2015""","""Standard Class""","""SO-20335""","""Sean O'Donnell""","""Consumer""","""United States""",,"""Florida""",33311,"""South""","""OFF-ST-10000760""","""Office Supplies""","""Storage""","""Eldon Fold 'N Roll Cart System""",22.368,2,0.2,2.5164
6,"""CA-2014-115812""","""09-06-2014""","""14-06-2014""","""Standard Class""","""BH-11710""","""Brosina Hoffman""","""Consumer""","""United States""","""Los Angeles""","""California""",90032,"""West""","""FUR-FU-10001487""","""Furniture""","""Furnishings""","""Eldon Expressions Wood and Pla…",48.86,7,0.0,14.1694
7,"""CA-2014-115812""","""09-06-2014""","""14-06-2014""","""Standard Class""","""BH-11710""","""Brosina Hoffman""","""Consumer""","""United States""","""Los Angeles""","""California""",90032,"""West""","""OFF-AR-10002833""","""Office Supplies""","""Art""","""Newell 322""",7.28,4,0.0,1.9656
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
9990,"""CA-2014-110422""","""21-01-2014""","""23-01-2014""","""Second Class""","""TB-21400""","""Tom Boeckenhauer""","""Consumer""","""United States""","""Miami""","""Florida""",33180,"""South""","""FUR-FU-10001889""","""Furniture""","""Furnishings""","""Ultra Door Pull Handle""",25.248,3,0.2,4.1028
9991,"""CA-2017-121258""","""26-02-2017""","""03-03-2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""FUR-FU-10000747""","""Furniture""","""Furnishings""","""Tenex B1-RE Series Chair Mats …",91.96,2,0.0,15.6332
9992,"""CA-2017-121258""","""26-02-2017""","""03-03-2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""TEC-PH-10003645""","""Technology""","""Phones""","""Aastra 57i VoIP phone""",258.576,2,0.2,19.3932
9993,"""CA-2017-121258""","""26-02-2017""","""03-03-2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""OFF-PA-10004041""","""Office Supplies""","""Paper""","""It's Hot Message Books with St…",29.6,4,0.0,13.32


In [11]:
(df.drop_nulls())  # drop every row that contains a null in any column

Row_ID,Order_ID,Order_Date,Ship_Date,Ship_Mode,Customer_ID,Customer_Name,Segment,Country,City,State,Postal_Code,Region,Product_ID,Category,Sub_Category,Product_Name,Sales,Quantity,Discount,Profit
i64,str,str,str,str,str,str,str,str,str,str,i64,str,str,str,str,str,f64,i64,f64,f64
2,"""CA-2016-152156""","""08-11-2016""","""11-11-2016""","""Second Class""","""CG-12520""","""Claire Gute""","""Consumer""","""United States""","""Henderson""","""Kentucky""",42420,"""South""","""FUR-CH-10000454""","""Furniture""","""Chairs""","""Hon Deluxe Fabric Upholstered …",731.94,3,0.0,219.582
6,"""CA-2014-115812""","""09-06-2014""","""14-06-2014""","""Standard Class""","""BH-11710""","""Brosina Hoffman""","""Consumer""","""United States""","""Los Angeles""","""California""",90032,"""West""","""FUR-FU-10001487""","""Furniture""","""Furnishings""","""Eldon Expressions Wood and Pla…",48.86,7,0.0,14.1694
7,"""CA-2014-115812""","""09-06-2014""","""14-06-2014""","""Standard Class""","""BH-11710""","""Brosina Hoffman""","""Consumer""","""United States""","""Los Angeles""","""California""",90032,"""West""","""OFF-AR-10002833""","""Office Supplies""","""Art""","""Newell 322""",7.28,4,0.0,1.9656
9,"""CA-2014-115812""","""09-06-2014""","""14-06-2014""","""Standard Class""","""BH-11710""","""Brosina Hoffman""","""Consumer""","""United States""","""Los Angeles""","""California""",90032,"""West""","""OFF-BI-10003910""","""Office Supplies""","""Binders""","""DXL Angle-View Binders with Lo…",18.504,3,0.2,5.7825
10,"""CA-2014-115812""","""09-06-2014""","""14-06-2014""","""Standard Class""","""BH-11710""","""Brosina Hoffman""","""Consumer""","""United States""","""Los Angeles""","""California""",90032,"""West""","""OFF-AP-10002892""","""Office Supplies""","""Appliances""","""Belkin F5C206VTEL 6 Outlet Sur…",114.9,5,0.0,34.47
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
9990,"""CA-2014-110422""","""21-01-2014""","""23-01-2014""","""Second Class""","""TB-21400""","""Tom Boeckenhauer""","""Consumer""","""United States""","""Miami""","""Florida""",33180,"""South""","""FUR-FU-10001889""","""Furniture""","""Furnishings""","""Ultra Door Pull Handle""",25.248,3,0.2,4.1028
9991,"""CA-2017-121258""","""26-02-2017""","""03-03-2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""FUR-FU-10000747""","""Furniture""","""Furnishings""","""Tenex B1-RE Series Chair Mats …",91.96,2,0.0,15.6332
9992,"""CA-2017-121258""","""26-02-2017""","""03-03-2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""TEC-PH-10003645""","""Technology""","""Phones""","""Aastra 57i VoIP phone""",258.576,2,0.2,19.3932
9993,"""CA-2017-121258""","""26-02-2017""","""03-03-2017""","""Standard Class""","""DB-13060""","""Dave Brooks""","""Consumer""","""United States""","""Costa Mesa""","""California""",92627,"""West""","""OFF-PA-10004041""","""Office Supplies""","""Paper""","""It's Hot Message Books with St…",29.6,4,0.0,13.32


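We can also specify a subset of columns to apply the condition on with the `subset` argument, as in the `Ship_Date` example above. Without it, every column is checked.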
