# [Learning Objectives]
- The code below demonstrates how to use pandas.cut and pandas.qcut in Python to compute discretization (binning) labels for data

# [Key Points]
- Equal-width binning with pandas.cut (In [19], In [20])
- Equal-frequency binning with pandas.qcut (In [21], In [22])

In [17]:
# Load packages
import os
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

In [18]:
# Initialize the ages data
ages = pd.DataFrame({"age": [18,22,25,27,7,21,23,37,30,61,45,41,9,18,80,100]})

#### Equal-width binning

In [19]:
# Add column "equal_width_age": split age into 4 equal-width bins
ages["equal_width_age"] = pd.cut(ages["age"], 4)

In [20]:
# Count how many values fall into each equal-width bin
ages["equal_width_age"].value_counts() # each bin covers a value range of (nearly) the same width

(6.907, 30.25]    10
(30.25, 53.5]      3
(76.75, 100.0]     2
(53.5, 76.75]      1
Name: equal_width_age, dtype: int64
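The first label starts at 6.907 rather than at the minimum age, 7. A minimal sketch, assuming a recent pandas, that asks `pd.cut` for its edges with `retbins=True` and rebuilds them by hand: split [min, max] into 4 equal widths, then widen the leftmost edge by 0.1% of the range so the minimum is included in the right-closed first bin.

```python
import numpy as np
import pandas as pd

ages = pd.Series([18, 22, 25, 27, 7, 21, 23, 37, 30, 61, 45, 41, 9, 18, 80, 100], name="age")

# retbins=True also returns the bin edges pandas chose
binned, edges = pd.cut(ages, 4, retbins=True)

# Rebuild the edges by hand: 4 equal-width bins between min (7) and max (100)
lo, hi = ages.min(), ages.max()
manual = np.linspace(lo, hi, 4 + 1)
# Widen the leftmost edge by 0.1% of the range (0.093) so that 7 lands inside (a, b]
manual[0] -= (hi - lo) * 0.001

print(edges)
print(manual)
# Bin widths: the first bin is 0.093 wider than the other three (23.25 each)
print(np.diff(edges))
```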

#### Equal-frequency binning

In [21]:
# Add column "equal_freq_age": split age into 4 equal-frequency bins (quartiles)
ages["equal_freq_age"] = pd.qcut(ages["age"], 4)

In [22]:
# Count how many values fall into each equal-frequency bin
ages["equal_freq_age"].value_counts() # each bin holds (roughly) the same number of records

(6.999, 20.25]    4
(20.25, 26.0]     4
(26.0, 42.0]      4
(42.0, 100.0]     4
Name: equal_freq_age, dtype: int64
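The qcut edges are the sample quartiles of `age`. A short sketch, assuming a recent pandas, comparing the edges returned by `pd.qcut(..., retbins=True)` with `Series.quantile`; both use linear interpolation between sorted values. The 6.999 shown in the first label appears to be a display adjustment that lowers the left edge slightly so the minimum, 7, is included; the returned edges themselves start at 7.

```python
import numpy as np
import pandas as pd

ages = pd.Series([18, 22, 25, 27, 7, 21, 23, 37, 30, 61, 45, 41, 9, 18, 80, 100], name="age")

# qcut places edges at sample quantiles, so each bin gets about the same count
binned, edges = pd.qcut(ages, 4, retbins=True)

# The same edges via Series.quantile at 0%, 25%, 50%, 75%, 100%
quantiles = ages.quantile([0, 0.25, 0.5, 0.75, 1.0]).to_numpy()

print(edges)      # quartile edges 7, 20.25, 26, 42, 100
print(quantiles)
print(binned.value_counts())
```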

### Exercise
Add a column `customized_age_grp` that splits `age` into five groups: (0, 10], (10, 20], (20, 30], (30, 50], (50, 100]. '(' means the endpoint is excluded and ']' means it is included.

Hint: run `??pd.cut` to see how its `bins` parameter is used.
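To make the interval notation concrete, here is a small sketch using made-up values that sit exactly on bin edges (the values are illustrative, not from the data set). With a list of edges, `pd.cut` builds right-closed intervals (a, b] by default, so 10 goes into (0, 10] and 0 falls outside every bin (NaN); `include_lowest=True` and `right=False` are existing `pd.cut` parameters that change this.

```python
import pandas as pd

# Illustrative values that land exactly on bin edges
values = pd.Series([0, 10, 20, 50, 100])
edges = [0, 10, 20, 30, 50, 100]

# Default right=True: intervals are (a, b], so 0 is excluded and becomes NaN
right_closed = pd.cut(values, bins=edges)

# include_lowest=True makes the first interval also include its left edge (0)
with_lowest = pd.cut(values, bins=edges, include_lowest=True)

# right=False flips to [a, b): 10 now goes to [10, 20) and 100 becomes NaN
left_closed = pd.cut(values, bins=edges, right=False)

print(pd.DataFrame({"value": values,
                    "right_closed": right_closed,
                    "include_lowest": with_lowest,
                    "left_closed": left_closed}))
```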

In [23]:
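agess = pd.DataFrame({"age": [18,22,25,27,7,21,23,37,30,61,45,41,9,18,80,100]})
# 11 evenly spaced edges from 0 to 100 define ten bins of width 10
bin_cut = np.linspace(0, 100, num=11)
# Assign each age in agess to one of the bins defined by bin_cut
agess["equal_width_age"] = pd.cut(agess["age"], bins=bin_cut)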

In [24]:
agess["equal_width_age"].value_counts()

(20.0, 30.0]     6
(0.0, 10.0]      2
(10.0, 20.0]     2
(40.0, 50.0]     2
(30.0, 40.0]     1
(60.0, 70.0]     1
(70.0, 80.0]     1
(90.0, 100.0]    1
(50.0, 60.0]     0
(80.0, 90.0]     0
Name: equal_width_age, dtype: int64
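The output above is sorted by count, and the empty bins still appear. A minimal sketch, assuming a recent pandas: because `pd.cut` returns a categorical, every bin is a category and unused ones are counted as 0; `value_counts(sort=False)` lists bins in interval order instead. Integer edges from `range` are shown only as an equivalent alternative to `np.linspace`.

```python
import numpy as np
import pandas as pd

ages = pd.Series([18, 22, 25, 27, 7, 21, 23, 37, 30, 61, 45, 41, 9, 18, 80, 100], name="age")

# np.linspace gives the float edges 0.0, 10.0, ..., 100.0 (ten bins of width 10)
bin_cut = np.linspace(0, 100, num=11)
from_linspace = pd.cut(ages, bins=bin_cut)

# Integer edges 0, 10, ..., 100 produce the same grouping (only the labels print as ints)
from_range = pd.cut(ages, bins=list(range(0, 101, 10)))

# sort=False keeps the bins in interval order; (50, 60] and (80, 90] show 0
counts = from_linspace.value_counts(sort=False)
print(counts)
```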

In [26]:
# Custom edges; intervals are right-closed by default: (0, 10], (10, 20], ...
ages["customized_age_grp"] = pd.cut(ages["age"], [0, 10, 20, 30, 50, 100])

In [27]:
ages

|   | age | equal_width_age | equal_freq_age | customized_age_grp |
|---|-----|-----------------|----------------|--------------------|
| 0 | 18 | (6.907, 30.25] | (6.999, 20.25] | (10, 20] |
| 1 | 22 | (6.907, 30.25] | (20.25, 26.0] | (20, 30] |
| 2 | 25 | (6.907, 30.25] | (20.25, 26.0] | (20, 30] |
| 3 | 27 | (6.907, 30.25] | (26.0, 42.0] | (20, 30] |
| 4 | 7 | (6.907, 30.25] | (6.999, 20.25] | (0, 10] |
| 5 | 21 | (6.907, 30.25] | (20.25, 26.0] | (20, 30] |
| 6 | 23 | (6.907, 30.25] | (20.25, 26.0] | (20, 30] |
| 7 | 37 | (30.25, 53.5] | (26.0, 42.0] | (30, 50] |
| 8 | 30 | (6.907, 30.25] | (26.0, 42.0] | (20, 30] |
| 9 | 61 | (53.5, 76.75] | (42.0, 100.0] | (50, 100] |
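If readable names are preferred over interval labels, `pd.cut` also accepts a `labels` list with one name per bin. A sketch under that assumption; the group names below are hypothetical (chosen for integer ages), and the counts cover all 16 rows rather than only the first ten shown in the table.

```python
import pandas as pd

ages = pd.DataFrame({"age": [18, 22, 25, 27, 7, 21, 23, 37, 30, 61, 45, 41, 9, 18, 80, 100]})

edges = [0, 10, 20, 30, 50, 100]
# Hypothetical display names, one per bin: len(names) must equal len(edges) - 1
names = ["1-10", "11-20", "21-30", "31-50", "51-100"]

ages["customized_age_grp"] = pd.cut(ages["age"], bins=edges, labels=names)

# Records per group in bin order; every age lies in (0, 100], so none are NaN
group_counts = ages["customized_age_grp"].value_counts(sort=False)
print(group_counts)
```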
