## **17. Pandas data transformation functions: map, apply, applymap**

### Comparison of the transformation functions:

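- map: Series only; maps each value to a new value (through a dict or a function)
- apply: on a Series, processes each value; on a DataFrame, processes each Series along the chosen axis (each column by default, each row with axis=1)
- applymap: DataFrame only; processes every individual value of the DataFrame (element-wise). Since pandas 2.1 it is deprecated in favor of the equivalent DataFrame.map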
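A minimal sketch on a made-up two-column DataFrame (the column names `a` and `b` and all values are illustrative, not from the stock files) shows what each function receives. The `hasattr` check is an assumption about your pandas version: it uses `DataFrame.map` where it exists (pandas 2.1+) and falls back to `applymap` otherwise.

```python
import pandas as pd

# Toy DataFrame with made-up values
toy = pd.DataFrame({"a": [1.5, 2.5], "b": [10.25, 20.75]})

# map: Series only, each value -> new value (a dict here; a function also works)
s_map = toy["a"].map({1.5: "low", 2.5: "high"})

# apply on a Series: the function receives one element at a time
s_apply = toy["a"].apply(lambda v: v * 2)

# apply on a DataFrame: the function receives a whole column (axis=0) or row (axis=1)
col_sums = toy.apply(lambda col: col.sum())
row_sums = toy.apply(lambda row: row.sum(), axis=1)

# applymap / DataFrame.map: the function receives every single element
elementwise = toy.map if hasattr(pd.DataFrame, "map") else toy.applymap
as_int = elementwise(int)

print(s_map.tolist())          # ['low', 'high']
print(s_apply.tolist())        # [3.0, 5.0]
print(col_sums.tolist())       # [4.0, 31.0]
print(row_sums.tolist())       # [11.75, 23.25]
print(as_int.values.tolist())  # [[1, 10], [2, 20]]
```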

### **17.0 Prepare the data**

In [1]:
```python
import pandas as pd
import numpy as np
%matplotlib inline
```

In [2]:
```python
# Read each stock's tick data, keep the order IDs as strings, and tag every row with its stock code
df000651=pd.read_csv('./stocks/000651.csv',dtype={'SaleOrderID':'object','BuyOrderID':'object'})
df000651['Code']='000651'

df601012=pd.read_csv('./stocks/601012.csv',dtype={'SaleOrderID':'object','BuyOrderID':'object'})
df601012['Code']='601012'

df601288=pd.read_csv('./stocks/601288.csv',dtype={'SaleOrderID':'object','BuyOrderID':'object'})
df601288['Code']='601288'

df601318=pd.read_csv('./stocks/601318.csv',dtype={'SaleOrderID':'object','BuyOrderID':'object'})
df601318['Code']='601318'

# Stack the four tables; each keeps its own index, so index labels repeat across stocks
df=pd.concat([df000651,df601012,df601288,df601318])
```
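The four nearly identical reads above can be folded into a loop. The helper `load_stocks` below is hypothetical (not part of pandas); it assumes the same `./stocks/<code>.csv` layout and keeps each file's own index, like the original cells, unless `ignore_index=True` is passed.

```python
from pathlib import Path

import pandas as pd


def load_stocks(codes, folder="./stocks", ignore_index=False):
    """Hypothetical helper: read <folder>/<code>.csv for each code and stack the results."""
    frames = []
    for code in codes:
        part = pd.read_csv(
            Path(folder) / f"{code}.csv",
            # read the order IDs as strings, as in the cells above
            dtype={"SaleOrderID": "object", "BuyOrderID": "object"},
        )
        part["Code"] = code  # a string, so leading zeros such as "000651" are kept
        frames.append(part)
    # ignore_index=False keeps each file's 0..n-1 index, matching the original cells
    return pd.concat(frames, ignore_index=ignore_index)


# Usage with the files from this section:
# df = load_stocks(["000651", "601012", "601288", "601318"])
```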

In [3]:
```python
df
```

| | TranID | Time | Price | Volume | SaleOrderVolume | BuyOrderVolume | Type | SaleOrderID | SaleOrderPrice | BuyOrderID | BuyOrderPrice | Code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 09:25:00 | 59.10 | 400 | 400 | 1400 | S | 1 | 53.87 | 1 | 65.84 | 000651 |
| 1 | 2 | 09:25:00 | 59.10 | 300 | 300 | 1400 | S | 2 | 53.87 | 1 | 65.84 | 000651 |
| 2 | 3 | 09:25:00 | 59.10 | 200 | 200 | 1400 | S | 3 | 53.87 | 1 | 65.84 | 000651 |
| 3 | 4 | 09:25:00 | 59.10 | 200 | 200 | 1400 | S | 4 | 53.87 | 1 | 65.84 | 000651 |
| 4 | 5 | 09:25:00 | 59.10 | 300 | 5900 | 1400 | S | 5 | 53.87 | 1 | 65.84 | 000651 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 196762 | 196763 | 15:00:00 | 85.18 | 100 | 243258 | 100 | S | 8493946 | 85.18 | 8461234 | 85.18 | 601318 |
| 196763 | 196764 | 15:00:00 | 85.18 | 100 | 243258 | 100 | S | 8493946 | 85.18 | 8474391 | 85.18 | 601318 |
| 196764 | 196765 | 15:00:00 | 85.18 | 25500 | 243258 | 25500 | S | 8493946 | 85.18 | 8492769 | 85.18 | 601318 |
| 196765 | 196766 | 15:00:00 | 85.18 | 1000 | 262500 | 1000 | B | 8493946 | 85.18 | 8498955 | 85.18 | 601318 |


In [4]:
```python
df['Code'].unique()
```

```
array(['000651', '601012', '601288', '601318'], dtype=object)
```

### **17.1 Using map to transform Series values**

#### Either Series.map(dict) or Series.map(fun) works

In [5]:
```python
# Map each stock code to its Chinese company name
dict_name={'000651':'格力电气', '601012':'隆基绿能', '601288':'农业银行', '601318':'中国平安'}
```

#### **Method 1: Series.map(dict)**

In [11]:
```python
# Look up each Code value as a key in the dict and return the corresponding value
df['Name1']=df['Code'].map(dict_name)
```

In [12]:
```python
df.head(1)
```

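| | TranID | Time | Price | Volume | SaleOrderVolume | BuyOrderVolume | Type | SaleOrderID | SaleOrderPrice | BuyOrderID | BuyOrderPrice | Code | Name1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 09:25:00 | 59.1 | 400 | 400 | 1400 | S | 1 | 53.87 | 1 | 65.84 | 000651 | 格力电气 |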
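`map(dict)` does not fail on codes that are missing from the dict; they become `NaN`. The sketch below uses a made-up code `600000` (assumed absent from `dict_name`) to show this, and also that a dict subclass defining `__missing__`, such as `collections.defaultdict`, supplies its default instead, as the pandas `Series.map` documentation describes.

```python
from collections import defaultdict

import pandas as pd

dict_name = {"000651": "格力电气", "601012": "隆基绿能"}  # subset of the mapping above
codes = pd.Series(["000651", "601012", "600000"])    # "600000": hypothetical code, not in the dict

# Plain dict: keys that are not found become NaN
plain = codes.map(dict_name)

# dict subclass with __missing__ (defaultdict): its default replaces NaN
with_default = codes.map(defaultdict(lambda: "unknown", dict_name))

print(plain.tolist())         # ['格力电气', '隆基绿能', nan]
print(with_default.tolist())  # ['格力电气', '隆基绿能', 'unknown']
```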


#### **Method 2: Series.map(fun)**
**Here, fun receives each element value of the Series as its argument**

In [13]:
```python
# Using a lambda function
df['name2']=df['Code'].map(lambda x:dict_name[x])
```

In [14]:
```python
df.head(1)
```

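| | TranID | Time | Price | Volume | SaleOrderVolume | BuyOrderVolume | Type | SaleOrderID | SaleOrderPrice | BuyOrderID | BuyOrderPrice | Code | Name1 | name2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 09:25:00 | 59.1 | 400 | 400 | 1400 | S | 1 | 53.87 | 1 | 65.84 | 000651 | 格力电气 | 格力电气 |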
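Unlike `map(dict)`, the lambda form `dict_name[x]` raises `KeyError` as soon as it meets an unknown code. The sketch below (again assuming a hypothetical code `600000`) shows that failure, a `dict.get` fallback, and `na_action="ignore"`, which passes missing values through instead of handing them to the function.

```python
import numpy as np
import pandas as pd

dict_name = {"000651": "格力电气", "601012": "隆基绿能"}
codes = pd.Series(["000651", "600000"])  # "600000": hypothetical code, not in the dict

# dict_name[x] raises KeyError on the unknown code, so the whole call fails
try:
    codes.map(lambda x: dict_name[x])
except KeyError as err:
    print("KeyError:", err)

# dict.get with a default keeps going
safe = codes.map(lambda x: dict_name.get(x, "unknown"))

# na_action="ignore": missing values are passed through without calling the function
with_nan = pd.Series(["000651", np.nan])
names = with_nan.map(lambda x: dict_name[x], na_action="ignore")
```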


In [17]:
```python
# Using a regular named function. df['Code'] is a Series, and fun1(x) receives each element value of that Series
def fun1(x):
    return dict_name[x]+'**'
df['name_fun1']=df['Code'].map(fun1)
```

In [18]:
```python
df.head(1)
```

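| | TranID | Time | Price | Volume | SaleOrderVolume | BuyOrderVolume | Type | SaleOrderID | SaleOrderPrice | BuyOrderID | BuyOrderPrice | Code | Name1 | name2 | name_fun1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 09:25:00 | 59.1 | 400 | 400 | 1400 | S | 1 | 53.87 | 1 | 65.84 | 000651 | 格力电气 | 格力电气 | 格力电气\*\* |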


### **17.2 Using apply to transform Series and DataFrames**

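#### **1. apply(function) takes a function as its argument; unlike map, it cannot take a dict for value lookup**
#### **2. Series.apply(function): function receives each value of the Series**
#### **3. DataFrame.apply(function): function receives a Series (a column by default, or a row with axis=1)**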
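A small sketch on toy data (column names borrowed from the stock files, values made up) illustrates rules 2 and 3: what `fun` receives depends on whether `apply` is called on a Series or a DataFrame, and on `axis`.

```python
import pandas as pd

prices = pd.DataFrame({"Price": [59.10, 85.18], "Volume": [400, 100]})  # made-up rows

# Series.apply: fun receives one scalar value at a time
doubled = prices["Volume"].apply(lambda v: v * 2)

# DataFrame.apply, default axis=0: fun receives each column as a Series
col_max = prices.apply(lambda col: col.max())

# DataFrame.apply, axis=1: fun receives each row as a Series indexed by column name
row_price = prices.apply(lambda row: row["Price"], axis=1)

# Check that fun really gets a Series in the DataFrame case
got_series = prices.apply(lambda col: isinstance(col, pd.Series))
```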

### **17.2.1 Series.apply(fun)**
**fun receives each value of the Series**

In [21]:
```python
# Similar to Series.map(fun)
df['name3']=df['Code'].apply(lambda x:dict_name[x])
```

In [22]:
```python
df.head(1)
```

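| | TranID | Time | Price | Volume | SaleOrderVolume | BuyOrderVolume | Type | SaleOrderID | SaleOrderPrice | BuyOrderID | BuyOrderPrice | Code | Name1 | name2 | name_fun1 | name3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 09:25:00 | 59.1 | 400 | 400 | 1400 | S | 1 | 53.87 | 1 | 65.84 | 000651 | 格力电气 | 格力电气 | 格力电气\*\* | 格力电气 |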
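Series.apply can also forward extra arguments to `fun`: positional ones through `args=(...)`, keyword ones as ordinary keyword arguments. The function `label` below is a hypothetical example, not from the original notebook.

```python
import pandas as pd

dict_name = {"000651": "格力电气", "601012": "隆基绿能"}
codes = pd.Series(["000651", "601012"])


def label(code, mapping, suffix=""):
    # hypothetical helper: code comes from the Series, mapping/suffix come from apply
    return mapping[code] + suffix


# args=(...) supplies extra positional arguments; other keywords are forwarded to label
named = codes.apply(label, args=(dict_name,), suffix="**")
print(named.tolist())  # ['格力电气**', '隆基绿能**']
```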


### **17.2.2 DataFrame.apply(fun)**
**fun receives a Series taken along the specified axis of the DataFrame**

In [25]:
```python
# Slow: the lambda is called once for every row
df['name4']=df.apply(lambda x:dict_name[x['Code']],axis=1)
```

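Notes:
- apply is called on the DataFrame df itself
- The x in lambda x is a Series. Because axis=1 is specified, x is one row whose index labels are df's column names, so x['Code'] gets that row's value in the Code column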
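Because `axis=1` calls the Python function once per row, a column-wise expression is usually much faster on a table of this size. A sketch on toy rows (values made up) checks that the row-wise and column-wise versions give the same result; no timings are claimed here.

```python
import pandas as pd

dict_name = {"000651": "格力电气", "601318": "中国平安"}
trades = pd.DataFrame({
    "Code": ["000651", "601318"],
    "Price": [59.10, 85.18],
    "Volume": [400, 100],
})  # made-up rows

# Row-wise: the lambda runs once per row
name_rowwise = trades.apply(lambda x: dict_name[x["Code"]], axis=1)
# Column-wise: one map call on the Code column gives the same names
name_colwise = trades["Code"].map(dict_name)

# The same idea for arithmetic across columns
amount_rowwise = trades.apply(lambda x: x["Price"] * x["Volume"], axis=1)
amount_vectorized = trades["Price"] * trades["Volume"]
```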

In [24]:
```python
df.head(1)
```

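| | TranID | Time | Price | Volume | SaleOrderVolume | BuyOrderVolume | Type | SaleOrderID | SaleOrderPrice | BuyOrderID | BuyOrderPrice | Code | Name1 | name2 | name_fun1 | name3 | name4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 09:25:00 | 59.1 | 400 | 400 | 1400 | S | 1 | 53.87 | 1 | 65.84 | 000651 | 格力电气 | 格力电气 | 格力电气\*\* | 格力电气 | 格力电气 |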


### **17.3 Using applymap to transform every value in a DataFrame**

In [36]:
```python
df_sub=df[['Price','SaleOrderPrice','BuyOrderPrice','SaleOrderVolume','BuyOrderVolume']]
```

In [38]:
```python
df_sub.head(5)
```

| | Price | SaleOrderPrice | BuyOrderPrice | SaleOrderVolume | BuyOrderVolume |
|---|---|---|---|---|---|
| 0 | 59.1 | 53.87 | 65.84 | 400 | 1400 |
| 1 | 59.1 | 53.87 | 65.84 | 300 | 1400 |
| 2 | 59.1 | 53.87 | 65.84 | 200 | 1400 |
| 3 | 59.1 | 53.87 | 65.84 | 200 | 1400 |
| 4 | 59.1 | 53.87 | 65.84 | 5900 | 1400 |


In [39]:
```python
# Convert every element to an integer; int() truncates the fractional part rather than rounding
df_sub.applymap(lambda x:int(x))
```

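| | Price | SaleOrderPrice | BuyOrderPrice | SaleOrderVolume | BuyOrderVolume |
|---|---|---|---|---|---|
| 0 | 59 | 53 | 65 | 400 | 1400 |
| 1 | 59 | 53 | 65 | 300 | 1400 |
| 2 | 59 | 53 | 65 | 200 | 1400 |
| 3 | 59 | 53 | 65 | 200 | 1400 |
| 4 | 59 | 53 | 65 | 5900 | 1400 |
| ... | ... | ... | ... | ... | ... |
| 196762 | 85 | 85 | 85 | 243258 | 100 |
| 196763 | 85 | 85 | 85 | 243258 | 100 |
| 196764 | 85 | 85 | 85 | 243258 | 25500 |
| 196765 | 85 | 85 | 85 | 262500 | 1000 |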
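`int(x)` truncates toward zero, so 53.87 becomes 53, not 54. The sketch below (toy values) compares the element-wise call with vectorized alternatives; as an assumption about your pandas version, it uses `DataFrame.map` when available (pandas 2.1+) and `applymap` otherwise.

```python
import pandas as pd

prices = pd.DataFrame({"Price": [59.1, 85.18], "SaleOrderPrice": [53.87, 65.84]})  # made-up rows

# Element-wise int(): DataFrame.map in pandas 2.1+, applymap in older versions
elementwise = prices.map if hasattr(pd.DataFrame, "map") else prices.applymap
truncated = elementwise(int)                  # 53.87 -> 53, 65.84 -> 65

# Vectorized alternatives, no Python call per element
truncated_fast = prices.astype(int)           # also truncates toward zero
rounded = prices.round().astype(int)          # round to the nearest integer first: 53.87 -> 54
```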


In [55]:
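```python
# Renumber the rows 0..n-1; the concatenated data repeats index labels across the four stocks
df_sub.reset_index(drop=True,inplace=True)
```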
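Why the reset helps: `pd.concat` keeps each input's index, so labels repeat and a label lookup such as `.loc[0]` returns one row per input frame. A toy sketch with two made-up frames:

```python
import pandas as pd

first = pd.DataFrame({"Price": [59.1, 59.1]})    # made-up rows
second = pd.DataFrame({"Price": [85.18, 85.18]})

stacked = pd.concat([first, second])              # index labels: 0, 1, 0, 1
print(len(stacked.loc[0]))                        # 2: label 0 matches one row from each frame

renumbered = stacked.reset_index(drop=True)       # index labels: 0, 1, 2, 3
renumbered_at_concat = pd.concat([first, second], ignore_index=True)  # same result in one step
```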

----

## **18. How to apply a function to each group of a pandas groupby**

### **18.1 Background**

#### How pandas groupby works

**pandas groupby follows the split-apply-combine pattern, as shown in the figure:**

![GroupBy pattern](groupby.png)
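A minimal sketch of the three steps on a few toy trades (codes and volumes borrowed from the tables above, otherwise made up): `groupby(...).apply` performs split, apply and combine in one call, and the hand-written version spells each step out.

```python
import pandas as pd

trades = pd.DataFrame({
    "Code": ["000651", "000651", "601318", "601318", "601318"],
    "Volume": [400, 300, 100, 100, 25500],
})  # made-up rows

# One call: split by Code, apply the lambda to each group's Volume, combine into a Series
total_groupby = trades.groupby("Code")["Volume"].apply(lambda s: s.sum())

# The same three steps written out by hand
groups = dict(list(trades.groupby("Code")))                         # split
sums = {code: g["Volume"].sum() for code, g in groups.items()}      # apply
total_by_hand = pd.Series(sums, name="Volume").rename_axis("Code")  # combine
```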