# Data Structures (3)

## 郭耀仁

## Your First pandas Data Frame

- Remember the **dict** data structure?
- Converting a **dict** into a data frame is the most convenient way to create one

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]

straw_hat_dict = {"name": name,
                  "age": age
}

straw_hat_df = pd.DataFrame(straw_hat_dict)
print(type(straw_hat_df))
straw_hat_df
```

## Data Frame Characteristics

- A data frame can hold columns of different data types, unlike an ndarray, which holds a single data type

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

straw_hat_df = pd.DataFrame(straw_hat_dict)
print(straw_hat_df.dtypes)
```
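To make the contrast with ndarray concrete, here is a minimal sketch that builds both structures from a two-row subset of the crew data. It assumes NumPy is installed alongside pandas; the variable names `mixed_arr` and `mixed_df` are illustrative only.

```python
import numpy as np
import pandas as pd

names = ["Nami", "Usopp"]
ages = [20, 19]

# An ndarray holds a single dtype, so the integers are coerced to strings
mixed_arr = np.array([names, ages])
print(mixed_arr.dtype)

# A data frame keeps a separate dtype for each column
mixed_df = pd.DataFrame({"name": names, "age": ages})
print(mixed_df.dtypes)
```

The exact dtype names printed can vary with the installed NumPy and pandas versions, but the array ends up with one string dtype while the data frame keeps `age` as an integer column.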

## Data Frame Characteristics (2)

- Square brackets `[]` select elements here as well
- Use the `.loc` (label-based) and `.iloc` (position-based) indexers; the `.ix` indexer was deprecated in pandas 0.20 and removed in pandas 1.0

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

straw_hat_df = pd.DataFrame(straw_hat_dict)

print(straw_hat_df.iloc[0, :]) # select observation 0 (the first row) by position
print("---")
print(straw_hat_df.loc[:, "name"]) # select the name column by label
print("---")
print(straw_hat_df.loc[0, "name"]) # select the name of observation 0 (index label 0)
```
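The square-bracket selection mentioned in the first bullet can be sketched as follows. This is a small illustration on a three-row subset of the crew data; the names `name_col`, `subset`, and `first_two` are made up for the example.

```python
import pandas as pd

straw_hat_df = pd.DataFrame({
    "name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami"],
    "age": [19, 21, 20],
})

name_col = straw_hat_df["name"]         # a single column label returns a Series
subset = straw_hat_df[["name", "age"]]  # a list of labels returns a DataFrame
first_two = straw_hat_df[0:2]           # a slice selects rows by position

print(type(name_col))
print(subset.shape)
print(first_two)
```

Plain brackets behave differently depending on what is passed in, which is one reason the explicit `.loc` and `.iloc` indexers are often easier to read.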

## Data Frame Characteristics (3)

- Rows can be filtered with Boolean values

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

straw_hat_df = pd.DataFrame(straw_hat_dict)

# filter crew members aged 30 or younger
is_30_or_younger = straw_hat_df.loc[:, "age"] <= 30
print(straw_hat_df[is_30_or_younger])

# filter female crew members
is_female = ~straw_hat_df.loc[:, "is_male"]
print(straw_hat_df[is_female])
```
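Boolean filters can also be combined. The sketch below, on a four-row subset of the crew data, uses `&` for an element-wise AND; the name `young_female` is illustrative.

```python
import pandas as pd

straw_hat_df = pd.DataFrame({
    "name": ["Monkey D. Luffy", "Nami", "Nico Robin", "Brook"],
    "age": [19, 20, 30, 90],
    "is_male": [True, False, False, True],
})

# & is element-wise AND; each condition needs its own parentheses
young_female = (straw_hat_df["age"] < 30) & (~straw_hat_df["is_male"])
print(straw_hat_df[young_female])
```

Python's `and` and `or` keywords do not work element-wise on a Series, so `&`, `|`, and `~` are used instead.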

## Getting an Overview of a Data Frame

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

straw_hat_df = pd.DataFrame(straw_hat_dict)

print(straw_hat_df.shape) # (number of rows, number of columns)
print("---")
print(straw_hat_df.describe()) # descriptive statistics (numeric columns by default)
print("---")
print(straw_hat_df.head(3)) # first three observations
print("---")
print(straw_hat_df.tail(3)) # last three observations
print("---")
print(straw_hat_df.columns) # column names
print("---")
print(straw_hat_df.index) # index values
```

## Reading External Data

- Use the `pd.read_csv()` function from the `pandas` package to read a CSV file

```python
import pandas as pd

url = "https://storage.googleapis.com/py_ds_basic/iris.csv" # a CSV file stored in the cloud
iris_df = pd.read_csv(url)
iris_df.head()
```
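`pd.read_csv()` also accepts a file-like object, which makes it easy to experiment without network access. In the sketch below the CSV text is made up for illustration and is not the content of the hosted `iris.csv`.

```python
import io
import pandas as pd

# Hypothetical CSV content, for illustration only
csv_text = "name,age\nNami,20\nUsopp,19\n"

# The first line is used as the header by default
crew_df = pd.read_csv(io.StringIO(csv_text))
print(crew_df.shape)
print(crew_df.dtypes)
```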

## Reading External Data (2)

- Use the `pd.read_table()` function from the `pandas` package to read a TSV file

```python
import pandas as pd

url = "https://storage.googleapis.com/py_ds_basic/iris.tsv" # a TSV file stored in the cloud
iris_df = pd.read_table(url, sep="\t") # tab is already the default separator for read_table
iris_df.head()
```
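As a small aside, `pd.read_csv()` with `sep="\t"` should give the same result as `pd.read_table()`. The sketch below checks this on made-up TSV text, not on the hosted `iris.tsv`.

```python
import io
import pandas as pd

# Hypothetical TSV content, for illustration only
tsv_text = "name\tage\nNami\t20\nUsopp\t19\n"

from_table = pd.read_table(io.StringIO(tsv_text))
from_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Both readers parse the same text into equal data frames
print(from_table.equals(from_csv))
```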

## Reading External Data (3)

- Use the `pd.read_excel()` function from the `pandas` package to read an Excel file

```python
import pandas as pd

url = "https://storage.googleapis.com/py_ds_basic/iris.xlsx" # an Excel spreadsheet stored in the cloud
iris_df = pd.read_excel(url)
iris_df.head()
```

## Reading External Data (4)

- Use the `pd.read_json()` function from the `pandas` package to read a JSON file

```python
import pandas as pd

url = "https://storage.googleapis.com/py_ds_basic/iris.json" # a JSON file stored in the cloud
iris_df = pd.read_json(url)
iris_df.head()
```
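The layout of the JSON matters to `pd.read_json()`. As a sketch, the example below parses a made-up list of records and passes `orient="records"` explicitly; the layout of the hosted `iris.json` is not shown in these slides, so this is an assumption about one common shape, not a description of that file.

```python
import io
import pandas as pd

# Hypothetical JSON content: a list of records, one object per row
json_text = '[{"name": "Nami", "age": 20}, {"name": "Usopp", "age": 19}]'

crew_df = pd.read_json(io.StringIO(json_text), orient="records")
print(crew_df)
```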

## 10 Minutes to pandas

<http://pandas.pydata.org/pandas-docs/stable/10min.html>