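#### How do you do an Excel VLOOKUP in Pandas and place the new columns after a specific column?
Background:
1. There are two Excel files that share one column;
2. Merge them into one larger table on that column (the VLOOKUP operation), with these requirements:
    + Only a few columns of the second file are needed, for example 2 out of 40;
    + The columns taken from the second file must be placed right after a specified column of the first file;
3. Write the result to a new Excel file.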

#### Step 1: Read the two tables

In [16]:
import pandas as pd

In [30]:
# Student scores table
df_grade = pd.read_excel('./files/students/student_scores.xlsx')
df_grade

Unnamed: 0,class,studentId,unit1_score,unit2_score,unit3_score
0,c01,1,56,21,95
1,c01,2,58,22,92
2,c01,3,60,23,89
3,c01,4,62,24,86
4,c01,5,64,25,83
5,c01,6,66,26,80
6,c01,7,68,27,77
7,c02,8,70,28,74
8,c02,9,72,29,71
9,c02,10,74,30,68


In [31]:
df_grade.dtypes

class          object
studentId       int64
unit1_score     int64
unit2_score     int64
unit3_score     int64
dtype: object

In [32]:
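# Make sure studentId is an integer so the join key has the same dtype in both tables
# (the dtypes above show it is already int64 here, so this cast is a no-op)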
df_grade.studentId=df_grade.studentId.astype('int')
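
The cast matters mostly when the key column contains missing values: pandas then reads it as float, and a plain `astype('int')` raises an error. A minimal sketch of one way to handle that case, using the nullable `Int64` dtype; `df_demo` and its values are made up for illustration and are not part of the files above:

```python
import pandas as pd

# Hypothetical data: one missing studentId forces the column to float64
df_demo = pd.DataFrame({"studentId": [1.0, 2.0, None], "unit1_score": [56, 58, 60]})

# The nullable "Int64" dtype (capital I) stores integers and keeps the gap as <NA>
df_demo["studentId"] = df_demo["studentId"].astype("Int64")
```

Whether a row with a missing key should be kept at all depends on the data; this only shows that the cast itself can be made to succeed.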

In [33]:
# Student information table
df_info = pd.read_excel('./files/students/student_info.xlsx')
df_info.head()

Unnamed: 0,studentId,sname,sphone,sgender,sstate
0,1,AAA,11111,f,nsw
1,2,BBB,22222,m,vic
2,3,CCC,33333,f,tas
3,4,DDD,44444,m,nsw
4,5,EEE,55555,m,qld


In [34]:
df_info.dtypes

studentId     int64
sname        object
sphone        int64
sgender      object
sstate       object
dtype: object

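#### Goal: how do we add the name and gender columns of the second table (student information) to the first table, right after the studentId column?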

#### Step 2: Join the two tables
This is the equivalent of Excel's VLOOKUP.

In [35]:
# Keep only the few columns needed from the second table
df_info=df_info[['studentId','sname','sgender']]
df_info.head()

Unnamed: 0,studentId,sname,sgender
0,1,AAA,f
1,2,BBB,m
2,3,CCC,f
3,4,DDD,m
4,5,EEE,m


In [36]:
df_merge = pd.merge(left=df_grade, right=df_info, on='studentId')
df_merge.head()

Unnamed: 0,class,studentId,unit1_score,unit2_score,unit3_score,sname,sgender
0,c01,1,56,21,95,AAA,f
1,c01,2,58,22,92,BBB,m
2,c01,3,60,23,89,CCC,f
3,c01,4,62,24,86,DDD,m
4,c01,5,64,25,83,EEE,m
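
One caveat: `pd.merge` defaults to an inner join, so rows of the left table with no match in the right table are dropped, whereas VLOOKUP keeps them and leaves the looked-up cells empty. A minimal sketch of a left-join variant that is closer to VLOOKUP; `left_demo` and `right_demo` are small made-up frames, not the files above:

```python
import pandas as pd

# Hypothetical tables: student 3 has no entry in the lookup table
left_demo = pd.DataFrame({"studentId": [1, 2, 3], "unit1_score": [56, 58, 60]})
right_demo = pd.DataFrame({"studentId": [1, 2], "sname": ["AAA", "BBB"]})

# how="left" keeps every row of the left table (unmatched rows get NaN);
# validate="many_to_one" raises MergeError if the lookup table has duplicate keys
merged_demo = pd.merge(
    left_demo, right_demo, on="studentId", how="left", validate="many_to_one"
)
```

The `validate` argument is optional; it is included here because duplicate keys in the lookup table would otherwise silently multiply rows.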


#### Step 3: Reorder the columns

In [8]:
df_merge.columns

Index(['class', 'studentId', 'unit1_score', 'unit2_score', 'unit3_score',
       'sname', 'sgender'],
      dtype='object')

#### Question: how do we move the name and gender columns to right after studentId?
Plain Python list operations are enough.

In [37]:
# Convert the columns to a plain Python list
new_columns = df_merge.columns.to_list()
new_columns

['class',
 'studentId',
 'unit1_score',
 'unit2_score',
 'unit3_score',
 'sname',
 'sgender']

In [42]:
# Inserting in reverse order leaves sname, then sgender, right after studentId
# [::-1] reverses the list
for name in ['sname','sgender'][::-1]:
    new_columns.remove(name)
    new_columns.insert(new_columns.index('studentId')+1,name)

new_columns

['class',
 'studentId',
 'sname',
 'sgender',
 'unit1_score',
 'unit2_score',
 'unit3_score']

In [43]:
# Apply the new column order
df_merge=df_merge.reindex(columns=new_columns)
df_merge.head()

Unnamed: 0,class,studentId,sname,sgender,unit1_score,unit2_score,unit3_score
0,c01,1,AAA,f,56,21,95
1,c01,2,BBB,m,58,22,92
2,c01,3,CCC,f,60,23,89
3,c01,4,DDD,m,62,24,86
4,c01,5,EEE,m,64,25,83


#### Step 4: Write the final Excel file

In [44]:
df_merge.to_excel('./files/students/student_merged.xlsx',index=False)