# Merging Data: Types of Merges

In this notebook I'll showcase three types of merges with Pandas:

- One-to-one
- Many-to-one
- Many-to-many

All of these merges can be performed with the same pandas method, [DataFrame.merge](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html), which is also available as the top-level function `pd.merge`:

    DataFrame.merge(self, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None) → 'DataFrame'
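
As a minimal sketch of how the main parameters in this signature interact, the cell below merges two small hand-made frames. The frames `left` and `right`, and the population numbers in them, are invented for illustration and are not taken from the World Bank or IMF data used later; only `on`, `how` and `indicator` come from the signature above.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration
left = pd.DataFrame({"iso3c": ["ABW", "AFG", "AGO"], "population": [1, 2, 3]})
right = pd.DataFrame({"iso3c": ["AFG", "AGO", "ALB"], "cpi": [113.8, 353.8, 107.5]})

# how='inner' (the default) keeps only the keys present in both frames
inner = left.merge(right, on="iso3c")

# how='left' keeps every row of `left`; indicator=True adds a `_merge`
# column that records where each row came from
left_join = left.merge(right, on="iso3c", how="left", indicator=True)

print(inner)
print(left_join)
```

The `_merge` column is a quick way to check which keys failed to match before deciding which `how` to use.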

In [54]:
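# import required libraries: pandas and NumPy
import pandas as pd
import numpy as np
# I'll access World Bank data using pandas_datareader
import pandas_datareader as pdr
from pandas_datareader import wb
# `web` is the alias used for DataReader in the Quandl cells below
from pandas_datareader import data as web
import requests
from dbnomics import fetch_series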

In [4]:
# create a list of all country codes, excluding aggregates

country_codes = wb.get_countries()
country_codes = country_codes[country_codes['region'] != 'Aggregates']['iso3c'].values #exclude aggregates
country_codes

array(['ABW', 'AFG', 'AGO', 'ALB', 'AND', 'ARE', 'ARG', 'ARM', 'ASM',
       'ATG', 'AUS', 'AUT', 'AZE', 'BDI', 'BEL', 'BEN', 'BFA', 'BGD',
       'BGR', 'BHR', 'BHS', 'BIH', 'BLR', 'BLZ', 'BMU', 'BOL', 'BRA',
       'BRB', 'BRN', 'BTN', 'BWA', 'CAF', 'CAN', 'CHE', 'CHI', 'CHL',
       'CHN', 'CIV', 'CMR', 'COD', 'COG', 'COL', 'COM', 'CPV', 'CRI',
       'CUB', 'CUW', 'CYM', 'CYP', 'CZE', 'DEU', 'DJI', 'DMA', 'DNK',
       'DOM', 'DZA', 'ECU', 'EGY', 'ERI', 'ESP', 'EST', 'ETH', 'FIN',
       'FJI', 'FRA', 'FRO', 'FSM', 'GAB', 'GBR', 'GEO', 'GHA', 'GIB',
       'GIN', 'GMB', 'GNB', 'GNQ', 'GRC', 'GRD', 'GRL', 'GTM', 'GUM',
       'GUY', 'HKG', 'HND', 'HRV', 'HTI', 'HUN', 'IDN', 'IMN', 'IND',
       'IRL', 'IRN', 'IRQ', 'ISL', 'ISR', 'ITA', 'JAM', 'JOR', 'JPN',
       'KAZ', 'KEN', 'KGZ', 'KHM', 'KIR', 'KNA', 'KOR', 'KWT', 'LAO',
       'LBN', 'LBR', 'LBY', 'LCA', 'LIE', 'LKA', 'LSO', 'LTU', 'LUX',
       'LVA', 'MAC', 'MAF', 'MAR', 'MCO', 'MDA', 'MDG', 'MDV', 'MEX',
       'MHL', 'MKD',

In [5]:
# I'll get a list of country names with populations, from the World Bank
# Using pandas datareader
# see https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#world-bank

pop = (
        wb.download(indicator='SP.POP.TOTL',country=country_codes, start=2015, end=2015)
       .reset_index()
       .drop(columns='year')
       .rename(columns={'SP.POP.TOTL':'population'})
       .sort_values(by='population', ascending=False) 
      )
pop.head()



Unnamed: 0,country,population
36,China,1371220000.0
89,India,1310152000.0
203,United States,320742700.0
87,Indonesia,258383300.0
26,Brazil,204471800.0


In [25]:
url = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/IFS/Q.AU.PXP_IX.?startPeriod=1957&endPeriod=2016'

# Get data from the above URL using the requests package
data = requests.get(url).json()

# Load data into a pandas dataframe
auxp = pd.DataFrame(data['CompactData']['DataSet']['Series']['Obs'])

# Show the last five observations
auxp.tail()

Unnamed: 0,@TIME_PERIOD,@OBS_VALUE
223,2015-Q4,85.5465473860777
224,2016-Q1,81.5208275090858
225,2016-Q2,82.6390830304725
226,2016-Q3,85.9938495946324
227,2016-Q4,98.6301369863014
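
The column names and the full-precision values in this output suggest that the observations arrive as strings rather than numbers. That is an assumption, not something verified against the IMF service. A minimal sketch of tidying such a frame, using two rows copied by hand from the output above (the names `sample`, `tidy`, `period` and `value` are my own):

```python
import pandas as pd

# Two rows copied from the output above, stored as strings (assumed dtype)
sample = pd.DataFrame({
    "@TIME_PERIOD": ["2016-Q3", "2016-Q4"],
    "@OBS_VALUE": ["85.9938495946324", "98.6301369863014"],
})

# Give the columns friendlier names
tidy = sample.rename(columns={"@TIME_PERIOD": "period", "@OBS_VALUE": "value"})

# Convert the observation strings to floats
tidy["value"] = pd.to_numeric(tidy["value"])

# Turn "2016-Q3" into "2016Q3", then into quarterly periods
quarters = tidy["period"].str.replace("-", "", regex=False)
tidy["period"] = pd.PeriodIndex(quarters, freq="Q")

tidy = tidy.set_index("period")
print(tidy)
```

With a quarterly period index, the series can be sliced by year or merged with other quarterly data on the index.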


symbols (string) – Possible formats:

1. DB/SYM: the Quandl ‘codes’. DB is the database name, SYM is a ticker-symbol-like Quandl abbreviation for a particular security.
2. SYM.CC: SYM is the same symbol and CC is an ISO country code; the reader will try to map it to the best single Quandl database for that country. Beware of ambiguous symbols (different securities per country)!

Note: Cannot use more than a single string because of the inflexible way the URL is composed of url and _get_params in the superclass.


from: https://www.quandl.com/data/ODA-IMF-Cross-Country-Macroeconomic-Statistics/documentation

Data Organization

The quickest way to find a dataset within the IMF database is via search. Click the Data tab on the left of this page and then type your query (including both indicator name and country name) into the search box marked "search this database".

All IMF datasets can also be accessed directly via their unique Quandl code. The codes for these datasets follow the format ODA/{COUNTRY}_{INDICATOR}.

For example, the Quandl code for population of Albania is ODA/ALB_LP, where ALB is the ISO code for Albania and LP is the indicator code for population. The table below lists all available indicators; note that not all indicators are available for all countries. You can see a list of all country ISO codes here.
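
The code format described above can be sketched as a small helper. The function name `quandl_code` and its `database` parameter are hypothetical, introduced here only for illustration; they are not part of Quandl or pandas-datareader.

```python
def quandl_code(country_iso3, indicator, database="ODA"):
    """Build a code of the form {DATABASE}/{COUNTRY}_{INDICATOR}."""
    return f"{database}/{country_iso3.upper()}_{indicator.upper()}"


print(quandl_code("ALB", "LP"))    # ODA/ALB_LP
print(quandl_code("ABW", "PCPI"))  # ODA/ABW_PCPI
```

Building the code in one place avoids repeating the f-string when looping over many countries or indicators.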

In [49]:
# symbol = 'WIKI/AAPL'  # or 'AAPL.US'
symbol = 'ODA/ABW_PCPI'
temp = web.DataReader(symbol, 'quandl', '2019-01-01', '2020-01-01', api_key='FBwSgrKW14w3TMa6a8un')
temp

Unnamed: 0_level_0,Value
Date,Unnamed: 1_level_1
2019-12-31,122.46


In [52]:
temp.iloc[0,0]

122.46

In [55]:
d = {}  # maps each country code to its CPI value (NaN if unavailable)
print(len(country_codes))
for i, country_code in enumerate(country_codes):
    # show progress: the code and its position in the array
    print(country_code)
    print(i)
    try:
        d[country_code] = web.DataReader(f'ODA/{country_code}_PCPI', 'quandl', '2019-01-01', '2020-01-01', api_key='FBwSgrKW14w3TMa6a8un').iloc[0,0]
    except Exception:
        # no data for this country, or the request failed
        d[country_code] = np.nan

In [56]:
d

{'ABW': 122.46,
 'AFG': 113.815,
 'AGO': 353.82099999999997,
 'ALB': 107.49799999999999,
 'AND': nan,
 'ARE': 289.798,
 'ARG': 408.024,
 'ARM': 292.605,
 'ASM': nan,
 'ATG': 141.821,
 'AUS': 115.662,
 'AUT': 130.347,
 'AZE': 255.5,
 'BDI': 145.089,
 'BEL': 108.491,
 'BEN': 115.397,
 'BFA': 112.771,
 'BGD': 261.41900000000004,
 'BGR': 104.95200000000001,
 'BHR': 136.17,
 'BHS': 108.016,
 'BIH': 138.356,
 'BLR': 818.8430000000001,
 'BLZ': 137.662,
 'BMU': nan,
 'BOL': 103.97200000000001,
 'BRA': 34297640468509.0,
 'BRB': 186.44400000000002,
 'BRN': 99.537,
 'BTN': 267.839,
 'BWA': 342.99199999999996,
 'CAF': 211.986,
 'CAN': 135.615,
 'CHE': 102.509,
 'CHI': nan,
 'CHL': 102.339,
 'CHN': 108.176,
 'CIV': 153.28799999999998,
 'CMR': 251.669,
 'COD': 6415.59,
 'COG': 149.04399999999998,
 'COL': 102.397,
 'COM': 183.00900000000001,
 'CPV': 140.986,
 'CRI': 106.176,
 'CUB': nan,
 'CUW': nan,
 'CYM': nan,
 'CYP': 100.76799999999999,
 'CZE': 107.79,
 'DEU': 105.429,
 'DJI': 105.26799999999999,

## One-to-one Merge

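In a one-to-one merge, each key value appears at most once in the key column of each DataFrame, so every row matches at most one row on the other side. If a key value is duplicated in either DataFrame, the matching rows are repeated in the result, once for each combination.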
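
A minimal sketch of this behaviour, using two small hand-made frames (`names`, `cpi` and `cpi_dup` are illustrative names; the CPI values are copied from the dictionary output above). It also uses the `validate` parameter from the signature at the top of the notebook, which makes pandas raise an error when the keys are not unique.

```python
import pandas as pd

names = pd.DataFrame({"iso3c": ["ABW", "AFG"],
                      "country": ["Aruba", "Afghanistan"]})
cpi = pd.DataFrame({"iso3c": ["ABW", "AFG"],
                    "cpi": [122.46, 113.815]})

# Keys are unique on both sides, so the one-to-one check passes
one_to_one = names.merge(cpi, on="iso3c", validate="one_to_one")
print(one_to_one)

# Duplicate the ABW row to break the one-to-one assumption
cpi_dup = pd.concat([cpi, cpi.iloc[[0]]], ignore_index=True)

# Without validation, the ABW row is silently repeated in the result
repeated = names.merge(cpi_dup, on="iso3c")
print(repeated)

# With validation, pandas refuses to merge
try:
    names.merge(cpi_dup, on="iso3c", validate="one_to_one")
    raised = False
except pd.errors.MergeError as err:
    raised = True
    print("MergeError:", err)
```

Passing `validate` is optional, but it turns a silent row duplication into an explicit error.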

## Many-to-one Merge

In this type of merge, 