# Pandas DataFrame - A Handy Companion for Data Processing and Transformation

#### Author: 徐子皓 (guest author, 臺灣行銷研究)
#### Full article: https://medium.com/p/de4bde2d4463/

## I. Basic Operations

### 1. Loading the Dataset

In [3]:
import pandas as pd
transaction = pd.read_csv('transaction.csv')
data = {'uid':[1,2,3,4,5],
        'name':['Howard','Lily','Kai',
                'Jojo','Ivan'],
        'age':[25,21,35,18,15]}

In [4]:
transaction

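| | tid | uid | product | quantity | price |
|---|---|---|---|---|---|
| 0 | T0001 | 1 | lemon | 5 | 12.4 |
| 1 | T0002 | 5 | banana | 6 | 42.16 |
| 2 | T0003 | 4 | orange | 4 | 6.2 |
| 3 | T0003 | 3 | cherry | 3 | 74.4 |
| 4 | T0003 | 3 | guava | 2 | 24.8 |
| 5 | T0004 | 2 | banana | 10 | 42.16 |
| 6 | T0004 | 2 | orange | 5 | 6.2 |
| 7 | T0004 | 2 | guava | 2 | 24.8 |
| 8 | T0005 | 5 | orange | 4 | 6.2 |
| 9 | T0005 | 5 | lemon | 5 | 12.4 |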


In [5]:
data

{'uid': [1, 2, 3, 4, 5],
 'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
 'age': [25, 21, 35, 18, 15]}
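
Since `transaction.csv` itself is not included here, the following is a minimal sketch that reads the same rows from an in-memory string instead of a file. The `csv_text` variable is an assumption for illustration: it supposes the file holds exactly the five columns shown in the output above, with no stored index column.

```python
import io

import pandas as pd

# Assumed file contents (hypothetical): the rows printed above, without an index column.
csv_text = """tid,uid,product,quantity,price
T0001,1,lemon,5,12.4
T0002,5,banana,6,42.16
T0003,4,orange,4,6.2
T0003,3,cherry,3,74.4
T0003,3,guava,2,24.8
T0004,2,banana,10,42.16
T0004,2,orange,5,6.2
T0004,2,guava,2,24.8
T0005,5,orange,4,6.2
T0005,5,lemon,5,12.4
"""

# read_csv accepts any file-like object, so StringIO stands in for the real file.
transaction = pd.read_csv(io.StringIO(csv_text))

print(transaction.shape)   # (number of rows, number of columns)
print(transaction.dtypes)  # column types inferred by pandas
```

If a CSV was saved together with its index, passing `index_col=0` to `pd.read_csv` would turn that first column back into the index instead of an extra `Unnamed: 0` column.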

### 2. Converting a Dictionary into a DataFrame

In [6]:
member = pd.DataFrame(data)
member 

| | uid | name | age |
|---|---|---|---|
| 0 | 1 | Howard | 25 |
| 1 | 2 | Lily | 21 |
| 2 | 3 | Kai | 35 |
| 3 | 4 | Jojo | 18 |
| 4 | 5 | Ivan | 15 |
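
The dictionary above is column-oriented (one list per column). As a rough sketch of an alternative, `pd.DataFrame` also accepts row-oriented data, that is, a list with one dictionary per row. The `records` list below is a hypothetical restatement of the same member data.

```python
import pandas as pd

# Column-oriented input, as used in this section.
data = {'uid': [1, 2, 3, 4, 5],
        'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
        'age': [25, 21, 35, 18, 15]}
member = pd.DataFrame(data)

# Row-oriented input (hypothetical): one dict per member.
records = [
    {'uid': 1, 'name': 'Howard', 'age': 25},
    {'uid': 2, 'name': 'Lily', 'age': 21},
    {'uid': 3, 'name': 'Kai', 'age': 35},
    {'uid': 4, 'name': 'Jojo', 'age': 18},
    {'uid': 5, 'name': 'Ivan', 'age': 15},
]
member_from_records = pd.DataFrame(records)

# Both constructions should yield the same table.
print(member_from_records.equals(member))
```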


### 3. Viewing the First Few Rows

In [7]:
member.head()

| | uid | name | age |
|---|---|---|---|
| 0 | 1 | Howard | 25 |
| 1 | 2 | Lily | 21 |
| 2 | 3 | Kai | 35 |
| 3 | 4 | Jojo | 18 |
| 4 | 5 | Ivan | 15 |
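
`head()` returns the first five rows by default, which is why all of `member` appears above. A small sketch of passing an explicit row count, together with `tail()` for the last rows:

```python
import pandas as pd

member = pd.DataFrame({'uid': [1, 2, 3, 4, 5],
                       'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
                       'age': [25, 21, 35, 18, 15]})

first_two = member.head(2)  # rows with index 0 and 1
last_two = member.tail(2)   # rows with index 3 and 4

print(first_two)
print(last_two)
```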


### 4. Inspecting the Dataset's Column Types

In [8]:
member.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
uid     5 non-null int64
name    5 non-null object
age     5 non-null int64
dtypes: int64(2), object(1)
memory usage: 200.0+ bytes
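
When only the column types are needed, the `dtypes` attribute gives them without the rest of the `info()` summary, and `astype` returns a copy with converted types. A minimal sketch; converting `age` to `float64` is only an illustrative choice.

```python
import pandas as pd

member = pd.DataFrame({'uid': [1, 2, 3, 4, 5],
                       'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
                       'age': [25, 21, 35, 18, 15]})

# One dtype per column.
print(member.dtypes)

# astype returns a new DataFrame; the original keeps its integer column.
member_float = member.astype({'age': 'float64'})
print(member_float.dtypes)
```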


### 5. Checking the Number of Rows and Columns

In [9]:
member.shape 

(5, 3)
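
`shape` is a plain tuple of `(rows, columns)`, so it can be unpacked directly. A short sketch:

```python
import pandas as pd

member = pd.DataFrame({'uid': [1, 2, 3, 4, 5],
                       'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
                       'age': [25, 21, 35, 18, 15]})

n_rows, n_cols = member.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)          # 5 3

# len() on a DataFrame counts rows only.
print(len(member))             # 5
```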

### 6. Selecting a Single Column

In [10]:
member['name']

0    Howard
1      Lily
2       Kai
3      Jojo
4      Ivan
Name: name, dtype: object
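
Selecting with a single label returns a `Series`, as in the output above. Wrapping the label in a list returns a one-column `DataFrame` instead. A minimal sketch of the difference:

```python
import pandas as pd

member = pd.DataFrame({'uid': [1, 2, 3, 4, 5],
                       'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
                       'age': [25, 21, 35, 18, 15]})

name_series = member['name']   # single label -> Series
name_frame = member[['name']]  # list of labels -> DataFrame

print(type(name_series).__name__)  # Series
print(type(name_frame).__name__)   # DataFrame
print(name_frame.shape)            # (5, 1)
```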

### 7. Selecting Multiple Columns

In [11]:
member[['name', 'age']]

| | name | age |
|---|---|---|
| 0 | Howard | 25 |
| 1 | Lily | 21 |
| 2 | Kai | 35 |
| 3 | Jojo | 18 |
| 4 | Ivan | 15 |
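
Column selection can be combined with row selection in one step through `.loc`. The sketch below keeps only `name` and `age` for members aged 21 or older; the age threshold is an arbitrary value chosen for illustration.

```python
import pandas as pd

member = pd.DataFrame({'uid': [1, 2, 3, 4, 5],
                       'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
                       'age': [25, 21, 35, 18, 15]})

# .loc[row_selector, column_selector]
adults = member.loc[member['age'] >= 21, ['name', 'age']]
print(adults)
```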


### 8. Basic Math on the Dataset

In [15]:
# Mean member age
member['age'].mean()

22.8

In [16]:
# Maximum member age
member['age'].max()

35

In [17]:
# Minimum member age
member['age'].min()

15
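
The three calls above can also be requested together with `agg`, and `idxmax` / `idxmin` give the row label of the extreme value, which helps when the question is who is oldest rather than how old. A sketch under the same `member` data:

```python
import pandas as pd

member = pd.DataFrame({'uid': [1, 2, 3, 4, 5],
                       'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
                       'age': [25, 21, 35, 18, 15]})

# Several statistics in one call; the result is a Series indexed by statistic name.
summary = member['age'].agg(['mean', 'max', 'min'])
print(summary)

# Row labels of the extreme ages, used to look up the matching names.
oldest = member.loc[member['age'].idxmax(), 'name']
youngest = member.loc[member['age'].idxmin(), 'name']
print(oldest, youngest)  # Kai Ivan
```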

### 9. Other Common Summary Statistics

In [18]:
member['age'].describe()

count     5.000000
mean     22.800000
std       7.758866
min      15.000000
25%      18.000000
50%      21.000000
75%      25.000000
max      35.000000
Name: age, dtype: float64
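
The `25%`, `50%` and `75%` rows in the `describe()` output are quantiles. As a small sketch, the same values can be computed directly with `quantile`:

```python
import pandas as pd

age = pd.Series([25, 21, 35, 18, 15], name='age')

# Quantiles at 25%, 50% (the median) and 75%.
quartiles = age.quantile([0.25, 0.5, 0.75])
print(quartiles)
```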

## II. Sorting Data

### 1. Ascending Sort

In [19]:
member['age'].sort_values()

4    15
3    18
1    21
0    25
2    35
Name: age, dtype: int64

### 2. Descending Sort

In [20]:
member['age'].sort_values(ascending=False)

2    35
0    25
1    21
3    18
4    15
Name: age, dtype: int64
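
Note that the sorted output keeps the original row labels (2, 0, 1, 3, 4) and that `sort_values` returns a new object rather than changing `member`. A minimal sketch showing both points, with `reset_index(drop=True)` used to renumber the result:

```python
import pandas as pd

member = pd.DataFrame({'uid': [1, 2, 3, 4, 5],
                       'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
                       'age': [25, 21, 35, 18, 15]})

sorted_age = member['age'].sort_values(ascending=False)
print(sorted_age.index.tolist())  # original labels travel with the values

# The source column is left untouched.
print(member['age'].tolist())

# Renumber from 0 and discard the old labels.
ranked = sorted_age.reset_index(drop=True)
print(ranked.tolist())
```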

### 3. Sorting the DataFrame by a Single Column

In [24]:
member.sort_values(['age'])

| | uid | name | age |
|---|---|---|---|
| 4 | 5 | Ivan | 15 |
| 3 | 4 | Jojo | 18 |
| 1 | 2 | Lily | 21 |
| 0 | 1 | Howard | 25 |
| 2 | 3 | Kai | 35 |
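
Because `sort_values` takes a list of column names, it extends naturally to several sort keys. The sketch below rebuilds the `transaction` rows shown earlier and sorts them by `uid` ascending, then by `price` descending within each `uid`. The choice of keys is only an illustration.

```python
import pandas as pd

# The transaction rows from section I, rebuilt inline so the example is self-contained.
transaction = pd.DataFrame({
    'tid': ['T0001', 'T0002', 'T0003', 'T0003', 'T0003',
            'T0004', 'T0004', 'T0004', 'T0005', 'T0005'],
    'uid': [1, 5, 4, 3, 3, 2, 2, 2, 5, 5],
    'product': ['lemon', 'banana', 'orange', 'cherry', 'guava',
                'banana', 'orange', 'guava', 'orange', 'lemon'],
    'quantity': [5, 6, 4, 3, 2, 10, 5, 2, 4, 5],
    'price': [12.4, 42.16, 6.2, 74.4, 24.8, 42.16, 6.2, 24.8, 6.2, 12.4],
})

# One ascending flag per sort key.
by_uid_price = transaction.sort_values(['uid', 'price'], ascending=[True, False])
print(by_uid_price)
```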


## III. Advanced Usage

### 1. Dropping a Column

In [23]:
member2 = member.drop(columns=['uid'])
member2

| | name | age |
|---|---|---|
| 0 | Howard | 25 |
| 1 | Lily | 21 |
| 2 | Kai | 35 |
| 3 | Jojo | 18 |
| 4 | Ivan | 15 |
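
`drop` returns a new DataFrame, which is why the result is assigned to `member2`; `member` itself still has its `uid` column. A short sketch of that behaviour, plus `errors='ignore'` for the case where a listed column may not exist. The `email` column is hypothetical and deliberately absent.

```python
import pandas as pd

member = pd.DataFrame({'uid': [1, 2, 3, 4, 5],
                       'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
                       'age': [25, 21, 35, 18, 15]})

member2 = member.drop(columns=['uid'])
print(list(member2.columns))    # ['name', 'age']
print('uid' in member.columns)  # True: the original is unchanged

# 'email' does not exist; errors='ignore' skips it instead of raising KeyError.
member3 = member.drop(columns=['uid', 'email'], errors='ignore')
print(list(member3.columns))    # ['name', 'age']
```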


### 2. Checking Whether Each Row Equals a Given String

In [25]:
transaction['product'] == 'lemon'

0     True
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
9     True
Name: product, dtype: bool
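
The `==` comparison above is an exact match on the whole value. For a substring test or a match against several values, `str.contains` and `isin` are the usual tools. A minimal sketch on the same `product` values; the substring `'an'` and the value list are arbitrary examples.

```python
import pandas as pd

product = pd.Series(['lemon', 'banana', 'orange', 'cherry', 'guava',
                     'banana', 'orange', 'guava', 'orange', 'lemon'],
                    name='product')

exact = product == 'lemon'                           # whole value equals 'lemon'
contains_an = product.str.contains('an', regex=False)  # 'an' appears anywhere in the value
in_set = product.isin(['lemon', 'guava'])            # value is one of the listed items

print(exact.sum(), contains_an.sum(), in_set.sum())  # 2 5 4
```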

### 3. Selecting the Rows That Equal a Given String

In [26]:
transaction[transaction['product'] == 'lemon']

| | tid | uid | product | quantity | price |
|---|---|---|---|---|---|
| 0 | T0001 | 1 | lemon | 5 | 12.4 |
| 9 | T0005 | 5 | lemon | 5 | 12.4 |
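
Boolean masks can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses. The sketch below narrows the lemon rows to those bought by `uid` 5 and shows the equivalent `query` call. The extra `uid` condition is an illustrative assumption.

```python
import pandas as pd

# The transaction rows from section I, rebuilt inline so the example is self-contained.
transaction = pd.DataFrame({
    'tid': ['T0001', 'T0002', 'T0003', 'T0003', 'T0003',
            'T0004', 'T0004', 'T0004', 'T0005', 'T0005'],
    'uid': [1, 5, 4, 3, 3, 2, 2, 2, 5, 5],
    'product': ['lemon', 'banana', 'orange', 'cherry', 'guava',
                'banana', 'orange', 'guava', 'orange', 'lemon'],
    'quantity': [5, 6, 4, 3, 2, 10, 5, 2, 4, 5],
    'price': [12.4, 42.16, 6.2, 74.4, 24.8, 42.16, 6.2, 24.8, 6.2, 12.4],
})

# Parentheses are required because & binds more tightly than ==.
by_mask = transaction[(transaction['product'] == 'lemon') & (transaction['uid'] == 5)]

# The same filter written as a query string.
by_query = transaction.query("product == 'lemon' and uid == 5")

print(by_mask)
```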


### 4. Converting the Data to a List

In [27]:
member.values.tolist()

[[1, 'Howard', 25],
 [2, 'Lily', 21],
 [3, 'Kai', 35],
 [4, 'Jojo', 18],
 [5, 'Ivan', 15]]
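
The pandas documentation recommends `to_numpy()` over the `.values` attribute, and `to_dict('records')` is a common alternative when the column names should travel with each row. A sketch of both on the same `member` data:

```python
import pandas as pd

member = pd.DataFrame({'uid': [1, 2, 3, 4, 5],
                       'name': ['Howard', 'Lily', 'Kai', 'Jojo', 'Ivan'],
                       'age': [25, 21, 35, 18, 15]})

# List of rows, each row a list of values.
rows = member.to_numpy().tolist()
print(rows[0])

# List of rows, each row a dict keyed by column name.
records = member.to_dict('records')
print(records[0])
```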