# merge_ordered() caution, multiple columns
When using `merge_ordered()` to merge on multiple columns, the order of those columns matters once you combine it with the forward fill feature, because the function sorts the result by the `on` columns in the order provided. In this exercise, we will merge GDP and population data from the World Bank for Australia and Sweden, then repeat the merge with the order of the `on` columns reversed. The two series have different frequencies: the GDP values are quarterly and the population values are yearly. Use the forward fill feature to fill in the missing data. Depending on the column order, forward fill can pull values from unintended rows to fill in the missing values.
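
To see the effect on a small scale, here is a minimal sketch using made-up data rather than the World Bank tables. The frames `gdp_toy` and `pop_toy`, the country labels `A` and `B`, and all values are hypothetical; only `pd.merge_ordered()` with its `on` and `fill_method` parameters comes from the exercise.

```python
import pandas as pd

# Hypothetical quarterly GDP for two made-up countries, A and B.
gdp_toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "date": pd.to_datetime(["2015-01-01", "2015-04-01",
                            "2015-01-01", "2015-04-01"]),
    "gdp": [1.0, 2.0, 10.0, 20.0],
})

# Hypothetical yearly population: a single January row per country.
pop_toy = pd.DataFrame({
    "country": ["A", "B"],
    "date": pd.to_datetime(["2015-01-01", "2015-01-01"]),
    "pop": [100, 200],
})

# Sorted by date first: rows for A and B are interleaved within each date.
by_date = pd.merge_ordered(gdp_toy, pop_toy, on=["date", "country"],
                           fill_method="ffill")

# Sorted by country first: each country's rows stay together.
by_country = pd.merge_ordered(gdp_toy, pop_toy, on=["country", "date"],
                              fill_method="ffill")

print(by_date)
print(by_country)
```

With dates first, the April row for A sits directly below the January row for B, so it is filled with B's population. With country first, A's April row follows A's January row and is filled from it.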

The tables `gdp` and `pop` have been loaded.

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
path=r'/media/documentos/Cursos/Data Science/Python/Data_Science_Python/data_sets/'

gdp=pd.read_csv(path+'WorldBank_GDP.csv',usecols = ['Year','Country Name','GDP'])
gdp.rename(columns={'Year': 'date', 'Country Name': 'country', 'GDP': 'gdp'}, inplace=True)
print('gdp \n',gdp.head(),'\n')

pop=pd.read_csv(path+'WorldBank_POP.csv',usecols = ['Year','Country Name','Pop'])
pop.rename(columns = {'Year':'date', 'Country Name':'country','Pop':'pop'}, inplace = True)
print('pop \n',pop.head(),'\n')

gdp 
          country  date           gdp
0          China  2010  6.087160e+12
1        Germany  2010  3.417090e+12
2          Japan  2010  5.700100e+12
3  United States  2010  1.499210e+13
4          China  2011  7.551500e+12 

pop 
        country  date         pop
0        Aruba  2010    101669.0
1  Afghanistan  2010  29185507.0
2       Angola  2010  23356246.0
3      Albania  2010   2913021.0
4      Andorra  2010     84449.0 



- Use `merge_ordered()` on `gdp` and `pop`, merging on columns `date` and `country` with the fill feature, and save the result to `ctry_date`.

In [7]:
# Merge gdp and pop on date and country with fill and notice rows 2 and 3
ctry_date = pd.merge_ordered(gdp,pop, on=['date','country'],
                             fill_method='ffill')

# Print ctry_date
print(ctry_date)

                 country  date           gdp           pop
0            Afghanistan  2010           NaN  2.918551e+07
1                Albania  2010           NaN  2.913021e+06
2                Algeria  2010           NaN  3.597746e+07
3         American Samoa  2010           NaN  5.607900e+04
4                Andorra  2010           NaN  8.444900e+04
...                  ...   ...           ...           ...
2643  West Bank and Gaza  2018  2.049410e+13  4.569087e+06
2644               World  2018  2.049410e+13  7.594270e+09
2645         Yemen, Rep.  2018  2.049410e+13  2.849869e+07
2646              Zambia  2018  2.049410e+13  1.735182e+07
2647            Zimbabwe  2018  2.049410e+13  1.443902e+07

[2648 rows x 4 columns]


- Perform the same merge of `gdp` and `pop`, but join on `country` and `date` (reverse of step 1) with the fill feature, saving this as `date_ctry`.

In [8]:
# Merge gdp and pop on country and date with fill
date_ctry = pd.merge_ordered(gdp,pop, on=['country','date'],
                             fill_method='ffill')

# Print date_ctry
print(date_ctry)

          country  date           gdp         pop
0     Afghanistan  2010           NaN  29185507.0
1     Afghanistan  2011           NaN  30117413.0
2     Afghanistan  2012           NaN  31161376.0
3     Afghanistan  2012           NaN  31161376.0
4     Afghanistan  2013           NaN  32269589.0
...           ...   ...           ...         ...
2643     Zimbabwe  2014  2.049410e+13  13586681.0
2644     Zimbabwe  2015  2.049410e+13  13814629.0
2645     Zimbabwe  2016  2.049410e+13  14030390.0
2646     Zimbabwe  2017  2.049410e+13  14236745.0
2647     Zimbabwe  2018  2.049410e+13  14439018.0

[2648 rows x 4 columns]
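
In the output above, every Zimbabwe row shows the same `gdp` value, which suggests it was carried forward from an earlier country rather than coming from Zimbabwe's own data. Putting `country` first keeps each country's rows together, but forward fill can still cross a country boundary when a country has no rows in one of the tables. The sketch below reproduces this with made-up data (`gdp_toy`, `pop_toy`, and the labels `A` and `B` are hypothetical) and shows one possible workaround, assuming you want fills to stay within a country: merge without `fill_method`, then forward fill per group.

```python
import pandas as pd

# Hypothetical data: country B has population rows but no GDP rows.
gdp_toy = pd.DataFrame({
    "country": ["A", "A"],
    "date": [2010, 2011],
    "gdp": [1.0, 2.0],
})
pop_toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "date": [2010, 2011, 2010, 2011],
    "pop": [100, 110, 200, 210],
})

# Country-first merge with forward fill: B's gdp is filled from A's last row.
leaky = pd.merge_ordered(gdp_toy, pop_toy, on=["country", "date"],
                         fill_method="ffill")

# Alternative: merge without filling, then forward fill within each country.
grouped = pd.merge_ordered(gdp_toy, pop_toy, on=["country", "date"])
grouped["gdp"] = grouped.groupby("country")["gdp"].ffill()

print(leaky)
print(grouped)
```

In `grouped`, B's `gdp` stays missing instead of inheriting A's value, which makes the gap in the source data visible.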
