# Combining Data

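There are many ways to combine data, and the right one depends on the data at hand. Here we practice combining the following two data sets from the World Bank Indicators data:
* [rural_population_percent.csv](https://data.worldbank.org/indicator/SP.RUR.TOTL.ZS) - rural population (% of total population)
* [electricity_access_percent.csv](https://data.worldbank.org/indicator/EG.ELC.ACCS.ZS) - access to electricity (% of population)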
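The reading code in step 1 passes `skiprows=4` and then drops an `'Unnamed: 62'` column. Below is a minimal sketch of why both are needed. It assumes, as those two arguments imply, that each file has four lines before the header row and that every line ends with a trailing comma. The file contents are invented for illustration and are not copied from the real downloads.

```python
import io

import pandas as pd

# Invented file contents, assuming the layout implied by skiprows=4 and the
# 'Unnamed: ...' column: 4 lines before the header, and a trailing comma per line
raw = io.StringIO(
    "metadata line 1\n"
    "metadata line 2\n"
    "metadata line 3\n"
    "metadata line 4\n"
    "Country Name,Country Code,1960,1961,\n"
    "Country A,AAA,50.0,49.0,\n"
    "Country B,BBB,80.0,79.0,\n"
)

# skiprows=4 skips the first 4 lines, so line 5 becomes the header row
df = pd.read_csv(raw, skiprows=4)

# The trailing comma creates an extra column with an empty name, which pandas
# labels 'Unnamed: <position>' (here position 4)
raw_columns = df.columns.tolist()
print(raw_columns)  # ['Country Name', 'Country Code', '1960', '1961', 'Unnamed: 4']

# Drop every 'Unnamed' column rather than hard-coding its position
df = df.drop(columns=[c for c in df.columns if c.startswith("Unnamed")])
print(df.shape)  # (2, 4)
```

In the real files there are 58 year columns (1960 to 2017) after the four identifier columns, which puts the empty column at position 62.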

## Main tools 
- [Merge, join, and concatenate](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html)
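The operations in the guide linked above line frames up differently. Here is a minimal sketch on two toy frames, with made-up country codes and values that are not from the World Bank files. `pd.concat` stacks frames along an axis, while `merge` aligns rows on shared key columns.

```python
import pandas as pd

# Toy frames with hypothetical values; note the rows are in different orders
rural = pd.DataFrame({"Country Code": ["AAA", "BBB"], "Rural Value": [50.0, 80.0]})
electricity = pd.DataFrame({"Country Code": ["BBB", "AAA"], "Electricity Value": [40.0, 99.0]})

# concat with axis=0 stacks the rows; a column missing from one frame is filled with NaN
stacked = pd.concat([rural, electricity], axis=0)
print(stacked.shape)  # (4, 3)

# merge matches rows on the key column, regardless of their order in each frame
joined = rural.merge(electricity, on="Country Code", how="inner")
print(joined)
```

Stacking is what step 1 does. Step 2 needs the aligned, side-by-side result that `merge` gives.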

### 1. Combine the two data sets using the [pandas concat method](https://pandas.pydata.org/pandas-docs/stable/merging.html)

```python
# import the pandas library
import pandas as pd

# read each csv file into a separate variable
# (skiprows=4 skips the 4 lines that precede the header row)
df_rural = pd.read_csv('../data/rural_population_percent.csv', skiprows=4)
df_electricity = pd.read_csv('../data/electricity_access_percent.csv', skiprows=4)

# remove the extra 'Unnamed: 62' column from each data set
df_rural = df_rural.drop('Unnamed: 62', axis=1)
df_electricity = df_electricity.drop('Unnamed: 62', axis=1)

# stack the two data sets vertically with the concat method:
# all rows of df_rural come first, followed by all rows of df_electricity
df_concat = pd.concat([df_rural, df_electricity], axis=0)

# df_rural.shape == (264, 62), so positions 262-265 straddle the boundary
df_concat.iloc[262:266]
```

| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 262 | Zambia | ZMB | Rural population (% of total population) | SP.RUR.TOTL.ZS | 81.855 | 81.049 | 80.215 | 79.288 | 77.985 | 76.628 | ... | 62.125 | 61.701 | 61.275 | 60.847 | 60.413 | 59.973 | 59.528 | 59.078 | 58.621 | 58.16 |
| 263 | Zimbabwe | ZWE | Rural population (% of total population) | SP.RUR.TOTL.ZS | 87.392 | 87.179 | 86.918 | 86.422 | 85.908 | 85.38 | ... | 66.44 | 66.622 | 66.804 | 66.985 | 67.166 | 67.346 | 67.499 | 67.624 | 67.723 | 67.793 |
| 0 | Aruba | ABW | Access to electricity (% of population) | EG.ELC.ACCS.ZS | NaN | NaN | NaN | NaN | NaN | NaN | ... | 93.086166 | 93.354546 | 93.356292 | 93.942375 | 94.255814 | 94.578262 | 94.906723 | 95.238182 | 95.570145 | NaN |
| 1 | Afghanistan | AFG | Access to electricity (% of population) | EG.ELC.ACCS.ZS | NaN | NaN | NaN | NaN | NaN | NaN | ... | 42.4 | 44.854885 | 42.7 | 43.222019 | 69.1 | 67.259552 | 89.5 | 71.5 | 84.137138 | NaN |
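`pd.concat` keeps each input's original index, so the labels 0 and 1 reappear after row 263 in the output above. Below is a minimal sketch of two standard `pd.concat` options for handling this, using made-up toy frames. `ignore_index=True` renumbers the rows, and `keys=` adds an outer index level that records which frame each row came from. The key names `"rural"` and `"electricity"` are chosen only for illustration.

```python
import pandas as pd

# Two hypothetical frames, each with its own 0-based index
a = pd.DataFrame({"Country Code": ["AAA", "BBB"], "Value": [1.0, 2.0]})
b = pd.DataFrame({"Country Code": ["AAA", "BBB"], "Value": [3.0, 4.0]})

# Default behaviour: index labels are kept, so 0 and 1 each appear twice
default = pd.concat([a, b], axis=0)
print(default.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True: a fresh 0..n-1 index
renumbered = pd.concat([a, b], axis=0, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# keys=: a MultiIndex whose outer level names the source frame
labelled = pd.concat([a, b], axis=0, keys=["rural", "electricity"])
print(labelled.loc["electricity"])
```

Duplicate labels are harmless for display. However, label-based lookups such as `df_concat.loc[0]` would return two rows.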


### 2. Combine the two data sets so that the output looks like the following:

|Country Name|Country Code|Year|Rural Value|Electricity Value|
|------|------|------|------|------|
|Aruba|ABW|1960|49.224|49.239|
... etc.

Sort the rows of the resulting DataFrame by country and then by year.

Here are a few pandas methods that should be helpful:
* [melt](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html)
* [drop](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html)
* [merge](https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.merge.html)
* [sort_values](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html)
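The reshaping in this step hinges on `melt`. Here is a minimal sketch on a hypothetical two-country, two-year frame laid out like the World Bank files, with invented names and values. The `id_vars` columns are repeated on every output row, and each remaining column label becomes a `Year` value paired with its cell.

```python
import pandas as pd

# Hypothetical wide frame: one row per country, one column per year
wide = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "Country Code": ["AAA", "BBB"],
    "1960": [50.0, 80.0],
    "1961": [49.0, 79.0],
})

# id_vars stay as identifier columns; every other column becomes a (Year, value) pair
long = pd.melt(wide,
               id_vars=["Country Name", "Country Code"],
               var_name="Year",
               value_name="Rural Value")
print(long)
```

The `Year` column holds the original column labels, so it contains strings such as `'1960'` rather than integers. The output is also ordered year by year rather than country by country, which is why the final `sort_values` is needed.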

```python
# use the melt method to change the formatting of each data frame so that it looks like this:
# Country Name, Country Code, Year, Rural Value
# Country Name, Country Code, Year, Electricity Value
df_rural_melt = pd.melt(df_rural,
                        id_vars=['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code'],
                        var_name='Year',
                        value_name='Rural Value')
df_electricity_melt = pd.melt(df_electricity,
                              id_vars=['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code'],
                              var_name='Year',
                              value_name='Electricity Value')

# drop any columns from the data frames that aren't needed
df_rural_melt.drop(['Indicator Name', 'Indicator Code'], axis=1, inplace=True)
df_electricity_melt.drop(['Indicator Name', 'Indicator Code'], axis=1, inplace=True)

# merge the data frames together based on their common columns
df_merge = df_rural_melt.merge(df_electricity_melt,
                               how='inner',
                               on=['Country Name', 'Country Code', 'Year'])

# sort the results by country and then by year
df_combined = df_merge.sort_values(['Country Name', 'Year'])
df_combined.head()
```

| | Country Name | Country Code | Year | Rural Value | Electricity Value |
|---|---|---|---|---|---|
| 1 | Afghanistan | AFG | 1960 | 91.779 | NaN |
| 265 | Afghanistan | AFG | 1961 | 91.492 | NaN |
| 529 | Afghanistan | AFG | 1962 | 91.195 | NaN |
| 793 | Afghanistan | AFG | 1963 | 90.89 | NaN |
| 1057 | Afghanistan | AFG | 1964 | 90.574 | NaN |
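Here are two optional refinements, sketched on hypothetical toy frames rather than the real files. The first converts `Year` from the string column labels to integers. With four-digit years the string sort happens to match numeric order, but integers are safer for later filtering and arithmetic. The second merges with `how='outer'` and `indicator=True`, which shows which (country, year) keys appear in only one data set before you settle on an inner join. A row whose value is missing (NaN) still counts as `both` when its key exists in both frames. That is why Afghanistan's early years appear above with an empty electricity value.

```python
import pandas as pd

# Hypothetical long-format frames, as melt would produce them (Year is a string)
rural = pd.DataFrame({
    "Country Name": ["Country A", "Country A", "Country B"],
    "Country Code": ["AAA", "AAA", "BBB"],
    "Year": ["1961", "1960", "1960"],
    "Rural Value": [49.0, 50.0, 80.0],
})
electricity = pd.DataFrame({
    "Country Name": ["Country A", "Country A"],
    "Country Code": ["AAA", "AAA"],
    "Year": ["1960", "1961"],
    "Electricity Value": [90.0, None],  # None becomes NaN in a float column
})

keys = ["Country Name", "Country Code", "Year"]

# Convert year labels to integers so sorting and comparisons are numeric
rural["Year"] = rural["Year"].astype(int)
electricity["Year"] = electricity["Year"].astype(int)

# Outer merge with indicator=True adds a _merge column: both / left_only / right_only
check = rural.merge(electricity, how="outer", on=keys, indicator=True)
print(check["_merge"].value_counts())

# Keep only keys present in both frames, then sort by country and year
combined = (check[check["_merge"] == "both"]
            .drop(columns="_merge")
            .sort_values(["Country Name", "Year"])
            .reset_index(drop=True))
print(combined)
```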
