# Merging Datasets
***
## 1. Rename 2008 columns to distinguish from 2018 columns after the merge
To do this, use pandas' `rename()` with a lambda function (the pandas `DataFrame.rename` documentation includes examples).

In the lambda function, take the first 10 characters of the column label and concatenate them with `_2008`. (Taking only the first 10 characters keeps the column names from getting too long.)

The lambda function should look something like this: `lambda x: x[:10] + "_2008"`

When you call `rename()`, don't forget to pass the lambda through the `columns=` parameter! Otherwise it is applied to the row index labels instead of the column labels.
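A minimal sketch of this rename on a one-row frame. It uses three column labels and values from the 2008 sample row shown further down; `df` is only a stand-in for `df_08`.

```python
import pandas as pd

# One-row stand-in for df_08 (labels and values taken from the 2008 sample row)
df = pd.DataFrame({
    "model": ["ACURA MDX"],
    "air_pollution_score": [7.0],
    "cmb_mpg": [17.0],
})

# Keep the first 10 characters of each column label and tag it with the year
df = df.rename(columns=lambda x: x[:10] + "_2008")

print(list(df.columns))
# ['model_2008', 'air_pollut_2008', 'cmb_mpg_2008']
```

Note how `air_pollution_score` is cut to `air_pollut` before the suffix is added, while shorter labels such as `model` are kept whole.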
***
## 2. Perform inner merge
To answer the last question, we are only interested in how the same model of car has been updated and how the new model's mpg compares to the old model's mpg.

Perform an inner merge of the two datasets, using `model_2008` as the key on the left (2008) and `model` as the key on the right (2018). See the pandas `merge()` documentation for details.
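A small sketch of such an inner merge. The model names (`MODEL A`, etc.) and mpg values are made up for illustration; only the column names follow the datasets in this notebook.

```python
import pandas as pd

# Hypothetical mini versions of the renamed 2008 data and the 2018 data
df_08 = pd.DataFrame({
    "model_2008": ["MODEL A", "MODEL B", "MODEL C"],
    "cmb_mpg_2008": [17.0, 21.0, 23.0],
})
df_18 = pd.DataFrame({
    "model": ["MODEL A", "MODEL C", "MODEL D"],
    "cmb_mpg": [22.0, 27.0, 23.0],
})

# Inner merge: keep only models that appear in both years
df_combined = df_08.merge(df_18, left_on="model_2008", right_on="model", how="inner")
print(df_combined)
```

`MODEL B` (2008 only) and `MODEL D` (2018 only) are dropped. Because the key columns have different names, both `model_2008` and `model` are kept in the result.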
***

# Merging Datasets
Use pandas merges to create a combined dataset from `clean_08.csv` and `clean_18.csv`. You should've created these data files in the previous section: *Fixing Data Types Pt 3*.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
```

```python
# load datasets
df_08 = pd.read_csv("clean_08.csv")
df_18 = pd.read_csv("clean_18.csv")
```

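```python
df_08.head(1)
```

|   | model | displ | cyl | trans | drive | fuel | veh_class | air_pollution_score | city_mpg | hwy_mpg | cmb_mpg | greenhouse_gas_score | smartway |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACURA MDX | 3.7 | 6 | Auto-S5 | 4WD | Gasoline | SUV | 7.0 | 15.0 | 20.0 | 17.0 | 4 | no |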


```python
df_18.head(1)
```

|   | model | displ | cyl | trans | drive | fuel | veh_class | air_pollution_score | city_mpg | hwy_mpg | cmb_mpg | greenhouse_gas_score | smartway |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACURA RDX | 3.5 | 6 | SemiAuto-6 | 2WD | Gasoline | small SUV | 3 | 20.0 | 28.0 | 23.0 | 5 | No |


```python
# rename 2008 columns: first 10 characters of each label + "_2008"
df_08.rename(columns=lambda x: x[:10] + "_2008", inplace=True)
```

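```python
# view to check names
df_08.head(1)
```

|   | model_2008 | displ_2008 | cyl_2008 | trans_2008 | drive_2008 | fuel_2008 | veh_class_2008 | air_pollut_2008 | city_mpg_2008 | hwy_mpg_2008 | cmb_mpg_2008 | greenhouse_2008 | smartway_2008 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACURA MDX | 3.7 | 6 | Auto-S5 | 4WD | Gasoline | SUV | 7.0 | 15.0 | 20.0 | 17.0 | 4 | no |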
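Truncating labels to 10 characters is only safe while no two labels share the same first 10 characters. A quick sketch of a uniqueness check, run on an empty frame that carries the original 2008 column labels; the renamed labels should match the header above.

```python
import pandas as pd

# The 2008 column labels as they appear before the rename
cols_2008 = ["model", "displ", "cyl", "trans", "drive", "fuel", "veh_class",
             "air_pollution_score", "city_mpg", "hwy_mpg", "cmb_mpg",
             "greenhouse_gas_score", "smartway"]
df = pd.DataFrame(columns=cols_2008)

renamed = df.rename(columns=lambda x: x[:10] + "_2008")

# Two labels could collapse to the same truncated name; make sure none did
assert renamed.columns.is_unique
print(list(renamed.columns))
```

If the assertion ever fails on a different dataset, a longer cut-off (or an explicit mapping dict) avoids the collision.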


### Create combined dataset

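```python
# merge datasets: inner merge that keeps only models present in both years
df_combined = df_08.merge(df_18, left_on='model_2008', right_on='model', how='inner')
```

```python
# view to check merge
df_combined.head()
```

For contrast, stacking the two frames with `df_18.append(df_08, sort=False)` (deprecated, and removed in pandas 2.0) places the 2008 rows below the 2018 rows without matching them by model, so the first rows have no 2008 values at all:

|   | model | displ | cyl | trans | drive | fuel | veh_class | air_pollution_score | city_mpg | hwy_mpg | ... | trans_2008 | drive_2008 | fuel_2008 | veh_class_2008 | air_pollut_2008 | city_mpg_2008 | hwy_mpg_2008 | cmb_mpg_2008 | greenhouse_2008 | smartway_2008 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACURA RDX | 3.5 | 6.0 | SemiAuto-6 | 2WD | Gasoline | small SUV | 3.0 | 20.0 | 28.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | ACURA RDX | 3.5 | 6.0 | SemiAuto-6 | 4WD | Gasoline | small SUV | 3.0 | 19.0 | 27.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | ACURA TLX | 2.4 | 4.0 | AMS-8 | 2WD | Gasoline | small car | 3.0 | 23.0 | 33.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | ACURA TLX | 3.5 | 6.0 | SemiAuto-9 | 2WD | Gasoline | small car | 3.0 | 20.0 | 32.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | ACURA TLX | 3.5 | 6.0 | SemiAuto-9 | 4WD | Gasoline | small car | 3.0 | 21.0 | 30.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |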
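The difference between stacking and merging is easiest to see on tiny frames. This sketch uses made-up model names and mpg values, and `pd.concat` as the stacking call, since `DataFrame.append` no longer exists in pandas 2.x.

```python
import pandas as pd

# Hypothetical two-row frames shaped like the renamed 2008 data and the 2018 data
old = pd.DataFrame({"model_2008": ["MODEL A", "MODEL B"], "cmb_mpg_2008": [17.0, 20.0]})
new = pd.DataFrame({"model": ["MODEL A", "MODEL C"], "cmb_mpg": [22.0, 25.0]})

# Stacking: rows are placed one under another and never matched, so each row
# is NaN in the other year's columns
stacked = pd.concat([new, old], sort=False)
print(stacked.shape)  # (4, 4)

# Merging: rows are matched on the model name; only MODEL A is in both years
merged = old.merge(new, left_on="model_2008", right_on="model", how="inner")
print(merged.shape)  # (1, 4)
```

Only the merged frame puts a model's 2008 and 2018 mpg on the same row, which is what the mpg comparison needs.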


### Save the combined dataset

```python
df_combined.to_csv('combined_dataset.csv', index=False)
```
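Passing `index=False` matters when the file is read back later. A small round-trip sketch through an in-memory buffer (`io.StringIO`) with a made-up two-row frame shows what happens with and without it.

```python
import io
import pandas as pd

df = pd.DataFrame({"model": ["MODEL A", "MODEL B"], "cmb_mpg": [22.0, 25.0]})

# Default index=True: the row index is written as an unnamed first column,
# which read_csv then loads as a column called 'Unnamed: 0'
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)
print(list(with_index.columns))  # ['Unnamed: 0', 'model', 'cmb_mpg']

# index=False: only the data columns are written
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
without_index = pd.read_csv(buf)
print(list(without_index.columns))  # ['model', 'cmb_mpg']
```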