## Brief Description

<a id='Introduction'></a>

The raw datasets were obtained from the [Global Knowledge Partnership on Migration and Development (KNOMAD)](https://www.knomad.org/data/remittances) website, which publishes them as part of its broader effort to fill knowledge gaps in monitoring and analyzing migration and remittances. They record annual remittance flows (inbound and outbound) for individual countries.

Summary Content:
The datasets were cleaned and saved as CSV files.

* Time period: 2000 to 2022 (the inflow file also carries a 2023 estimate in the `2023e` column)
* This dataset contains two files:
    * `remittance_inflows_clean.csv` - Historical remittance inflows to countries worldwide since 2000.
    * `remittance_outflows_clean.csv` - Historical remittance outflows from countries worldwide since 2000.

All monetary values are in terms of millions of US dollars.

Each dataset lists countries in the rows and years in the columns, with one remittance value per cell. The final column gives remittances as a percentage of Gross Domestic Product (GDP): for 2023 in the inflow file and for 2022 in the outflow file.
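
The wide layout described above (one row per country, one column per year) is often easier to analyze in long form. The following is a minimal sketch, not part of the original cleaning steps: the toy frame, the `country` index name, and the `usd_million` and `usd` column names are assumptions, and the values are rounded from the inflow table shown later in this notebook.

```python
import pandas as pd

# Toy frame mirroring the cleaned layout: countries as the index, one column
# per year, plus a final share-of-GDP column (values rounded for illustration).
wide = pd.DataFrame(
    {
        "2021": [1718.36, 1792.16],
        "2022": [1745.25, 1658.98],
        "% of GDP in 2023": [8.55, 0.79],
    },
    index=pd.Index(["Albania", "Algeria"], name="country"),
)

# Keep only columns whose label is a plain year; this skips the GDP share
# column and would also skip an estimate column such as "2023e".
year_cols = [c for c in wide.columns if str(c).isdigit()]

# Melt to one row per (country, year) pair.
long = (
    wide[year_cols]
    .reset_index()
    .melt(id_vars="country", var_name="year", value_name="usd_million")
)
long["year"] = long["year"].astype(int)

# Values are stored in millions of US dollars; convert to US dollars.
long["usd"] = long["usd_million"] * 1_000_000
print(long)
```

Filtering on `str(c).isdigit()` is a design choice that keeps the sketch working whether the year labels are strings (as after reading a CSV) or integers.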

## Data Cleaning for Remittance Inflow


In [None]:
# Importing libraries
import numpy as np
import pandas as pd
import warnings

#### Preparing datasets

In [None]:
# Load the inward remittance flows dataset
inward_filepath = '/content/inward_remittance_flows_december_2023_1.xlsx'
#'https://knomad.org/sites/default/files/2023-12/inward_remittance_flows_december_2023_1.xlsx'
df = pd.read_excel(inward_filepath, index_col=0)
print(df.shape)
# Display the first few rows of the dataset
df.head(10)

(226, 25)


Remittance inflows (US$ million),2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,...,2015,2016,2017,2018,2019,2020,2021,2022,2023e,% of GDP in 2023
Afghanistan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,89.5,140.695166,...,348.624717,627.710802,822.73163,803.546454,828.571904,788.917115,320.0,370.0,300.0,2.007898
Albania,597.8,699.3,733.57,888.748582,1160.672105,1289.704316,1359.467325,1468.02,1865.6,1717.698012,...,1290.863508,1306.009167,1311.822432,1458.210056,1472.812242,1465.987212,1718.355918,1745.245136,1970.0,8.553317
Algeria,0.0,0.0,0.0,0.0,0.0,170.0,189.0,99.004563,103.631887,150.336961,...,1997.393458,1989.023597,1791.887073,1984.998399,1785.838683,1699.608935,1792.158957,1658.97581,1770.0,0.790179
American Samoa,,,,,,,,,,,...,,,,,,,,,,
Andorra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,21.1,47.416324,53.001418,0.0,0.0,0.0
Angola,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,82.084,0.2,...,11.114712,3.988048,1.418196,1.579247,3.445473,8.053051,12.631149,14.005491,14.688758,0.01566
Antigua and Barbuda,17.342963,23.676667,14.242167,16.372237,17.601626,18.305489,19.32572,20.783287,21.828403,20.658719,...,31.244412,26.705676,24.020044,32.768265,36.875282,36.220491,45.26585,47.183398,49.448219,2.537107
Argentina,86.343659,189.6,206.63,273.42,311.78,432.09,540.7,606.498927,705.144403,628.542376,...,494.433532,391.579048,479.93746,522.441503,561.398445,648.09005,901.123183,1267.449071,1600.0,0.257235
Armenia,182.16,209.24,263.89,335.86,787.51951,915.231162,1169.173063,1644.375091,1904.067123,1439.809232,...,1491.475903,1382.330887,1538.655752,1487.814908,1527.937457,1327.006085,1556.822544,2034.814437,1850.0,7.538712
Aruba,1.083799,1.335196,1.329609,0.273743,1.458101,0.819846,1.036166,5.195531,6.759777,9.050279,...,53.640101,52.7,56.140485,36.920295,34.264018,34.98404,35.845637,38.279493,34.449858,0.900179


In [None]:
# Display the dataset information
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 226 entries, Afghanistan to When using the data, please cite: World Bank-KNOMAD, December 2023
Data columns (total 25 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   2000              199 non-null    float64
 1   2001              199 non-null    float64
 2   2002              199 non-null    float64
 3   2003              199 non-null    float64
 4   2004              199 non-null    float64
 5   2005              199 non-null    float64
 6   2006              199 non-null    float64
 7   2007              198 non-null    float64
 8   2008              198 non-null    float64
 9   2009              198 non-null    float64
 10  2010              198 non-null    float64
 11  2011              197 non-null    float64
 12  2012              197 non-null    float64
 13  2013              197 non-null    float64
 14  2014              197 non-null    float64
 15  2015              197 n

In [None]:
# Replace empty strings with NaN values
df = df.replace('', np.nan)

In [None]:
# Check for missing values
df.isna().sum()

2000                27
2001                27
2002                27
2003                27
2004                27
2005                27
2006                27
2007                28
2008                28
2009                28
2010                28
2011                29
2012                29
2013                29
2014                29
2015                29
2016                29
2017                30
2018                30
2019                30
2020                30
2021                30
2022                30
2023e               30
% of GDP in 2023    36
dtype: int64
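
The per-column counts above do not show whether the gaps come from rows that are empty throughout (as American Samoa is in the preview) or from rows missing only a few cells. The following is a minimal sketch of how the two cases could be separated; the toy frame and the row labelled `Hypothetical partial row` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: one complete row, one fully empty row, one partially empty row.
toy = pd.DataFrame(
    {
        "2021": [320.0, np.nan, 53.0],
        "2022": [370.0, np.nan, 0.0],
        "% of GDP in 2023": [2.0, np.nan, np.nan],
    },
    index=["Afghanistan", "American Samoa", "Hypothetical partial row"],
)

missing = toy.isna()

# Rows where every cell is missing.
all_missing = toy.index[missing.all(axis=1)].tolist()

# Rows where some, but not all, cells are missing.
partly_missing = toy.index[missing.any(axis=1) & ~missing.all(axis=1)].tolist()

print(all_missing)
print(partly_missing)
```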

### Preprocessing the data

In [None]:
# Keep only the first 214 rows (the country rows); the remaining rows
# hold non-country entries such as source notes
df = df.iloc[:214, :]

# Remove all rows that have missing values; reassigning avoids the
# SettingWithCopyWarning that dropna(inplace=True) raises on a slice
df = df.dropna()
df


Remittance inflows (US$ million),2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,...,2015,2016,2017,2018,2019,2020,2021,2022,2023e,% of GDP in 2023
Afghanistan,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,89.500000,140.695166,...,348.624717,627.710802,822.731630,803.546454,828.571904,788.917115,320.000000,370.000000,300.000000,2.007898
Albania,597.8,699.300000,733.570000,888.748582,1160.672105,1289.704316,1359.467325,1468.020000,1865.600000,1717.698012,...,1290.863508,1306.009167,1311.822432,1458.210056,1472.812242,1465.987212,1718.355918,1745.245136,1970.000000,8.553317
Algeria,0.0,0.000000,0.000000,0.000000,0.000000,170.000000,189.000000,99.004563,103.631887,150.336961,...,1997.393458,1989.023597,1791.887073,1984.998399,1785.838683,1699.608935,1792.158957,1658.975810,1770.000000,0.790179
Andorra,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,21.100000,47.416324,53.001418,0.000000,0.000000,0.000000
Angola,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,82.084000,0.200000,...,11.114712,3.988048,1.418196,1.579247,3.445473,8.053051,12.631149,14.005491,14.688758,0.015660
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Vietnam,1585.0,1100.000000,1767.000000,2100.000000,2919.000000,3150.000000,3800.000000,6180.000000,6804.000000,6018.000000,...,8050.696859,8556.092341,9405.606825,10191.360839,10885.224766,10714.571746,12722.088864,13200.000000,14000.000000,3.233256
West Bank and Gaza,863.8,805.742463,771.775118,310.077968,398.558393,378.337239,464.056318,598.543423,740.724366,755.235908,...,1817.412109,2086.576176,2378.923437,2833.912788,3152.859814,2559.660846,3760.462368,4049.078713,3800.000000,19.882900
"Yemen, Rep.",0.0,0.000000,0.000000,0.000000,0.000000,1282.599000,1282.600000,1321.520000,1410.520000,1160.000000,...,3350.500000,3770.584000,3770.584000,3770.584000,3770.584000,3770.584000,3770.584000,3770.584000,3770.584000,17.916769
Zambia,0.0,0.000000,0.000000,36.300000,48.400000,52.900000,57.680000,59.300000,68.195000,41.264000,...,47.046538,38.464441,93.644095,106.965626,98.259121,134.864832,239.709361,243.486023,250.000000,0.846425


In [None]:
# Save cleaned remittance inflow data to CSV file
df.to_csv('remittance_inflows_clean.csv', index=True)
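
Because the country names are written out as the index, reading the file back needs `index_col=0` to restore them. The following is a minimal round-trip sketch; it writes to an in-memory buffer instead of the real file, and the toy values and the `country` index name are assumptions.

```python
import io

import pandas as pd

# Toy stand-in for the cleaned frame (values rounded for illustration).
cleaned = pd.DataFrame(
    {"2022": [1745.25, 1658.98], "% of GDP in 2023": [8.55, 0.79]},
    index=pd.Index(["Albania", "Algeria"], name="country"),
)

# Write with the index, as the notebook does, then read it back.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=True)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(restored)
```

After a CSV round trip the column labels are strings, so year columns are selected as `"2022"` even if they were integers when read from the Excel source.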

## Data Cleaning for Remittance Outflow

In [None]:
# Filepath
outflow_filepath = '/content/outward-remittance-flows-brief-39-december-2023-revised-as-of-mar.8-2024_1.xlsx'
outflow = pd.read_excel(outflow_filepath, index_col=0)
print(outflow.shape)
# Display the dataset; the trailing rows are source notes rather than countries
outflow

(223, 24)


Remittance outflows (US$ million),2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,% of GDP in 2022
Afghanistan,0.0,0.0,0.0,0.000000,0.00000,0.000000,0.000000,0.000000,216.600000,624.996677,...,524.163479,228.991809,167.894464,143.979106,234.618701,217.292318,225.420606,0.000000,0.000000,
Albania,0.0,0.0,0.0,4.135728,4.86374,6.511798,26.532132,9.940000,279.700000,229.819782,...,178.722842,153.312400,147.156757,106.330983,114.879337,119.679641,124.630376,140.344310,149.489653,0.783366
Algeria,0.0,0.0,0.0,0.000000,0.00000,27.000000,35.000000,48.863733,26.782545,45.702406,...,295.922112,72.128991,76.633496,214.378057,87.006383,81.545365,149.253465,83.259319,60.221598,0.030873
American Samoa,,,,,,,,,,,...,,,,,,,,,,
Andorra,0.0,0.0,0.0,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,91.500000,83.804275,70.514091,0.000000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"For additional information, please also see ""International Transactions in Remittances: Guide for Compilers and Users"", International Monetary Fund, 2009.",,,,,,,,,,,...,,,,,,,,,,
GDP data from IMF World Economic Outlook,,,,,,,,,,,...,,,,,,,,,,
"For latest data and analysis on migration and remittances, please visit https://www.knomad.org/",,,,,,,,,,,...,,,,,,,,,,
Date: December 2023,,,,,,,,,,,...,,,,,,,,,,


In [None]:
# Keep only the first 214 rows (the country rows); the remaining rows
# hold non-country entries such as source notes
outflow = outflow.iloc[:214, :]
outflow

Remittance outflows (US$ million),2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,% of GDP in 2022
Afghanistan,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,216.600000,624.996677,...,524.163479,228.991809,167.894464,143.979106,234.618701,217.292318,225.420606,0.000000,0.000000,
Albania,0.0,0.000000,0.000000,4.135728,4.863740,6.511798,26.532132,9.940000,279.700000,229.819782,...,178.722842,153.312400,147.156757,106.330983,114.879337,119.679641,124.630376,140.344310,149.489653,0.783366
Algeria,0.0,0.000000,0.000000,0.000000,0.000000,27.000000,35.000000,48.863733,26.782545,45.702406,...,295.922112,72.128991,76.633496,214.378057,87.006383,81.545365,149.253465,83.259319,60.221598,0.030873
American Samoa,,,,,,,,,,,...,,,,,,,,,,
Andorra,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,91.500000,83.804275,70.514091,0.000000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Virgin Islands (U.S.),,,,,,,,,,,...,,,,,,,,,,
West Bank and Gaza,5.6,4.900869,5.095333,17.883733,10.760551,6.475643,6.788488,7.929239,8.299895,7.912044,...,36.128914,30.398350,30.393004,27.462198,22.347335,29.079904,21.881177,32.247854,9.330391,0.051524
"Yemen, Rep.",0.0,0.000000,0.000000,0.000000,0.000000,109.495910,120.440000,318.720000,336.840000,336.840000,...,335.395533,333.389518,332.700000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,
Zambia,24.1,28.400000,37.500000,71.500000,76.100000,93.700000,115.457000,123.930000,138.885000,65.617000,...,81.199881,72.104699,63.405836,121.370621,132.001364,103.806724,88.510886,122.331952,131.254123,0.441309
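
Slicing by position with `iloc[:214]` depends on the source file keeping exactly 214 country rows. The following is a sketch of a label-based alternative that cuts at the last country name instead. It assumes the index labels are unique and that `Zimbabwe` is the last country row (as the `info()` output for the sliced frame reports). The `World` row and the Zimbabwe value are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Toy frame: two country rows followed by non-country rows.
raw = pd.DataFrame(
    {"2022": [131.25, 1.0, np.nan, np.nan]},
    index=["Zambia", "Zimbabwe", "World", "Date: December 2023"],
)

# Label-based slicing with .loc includes the end label, so everything
# after "Zimbabwe" is dropped.
countries = raw.loc[:"Zimbabwe"]

print(countries)
```

If a future release adds or removes a country, the label-based cut still ends at the same row, whereas a fixed row count could silently include a note row or drop a country.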


In [None]:
# Display the dataset information
outflow.info()

<class 'pandas.core.frame.DataFrame'>
Index: 214 entries, Afghanistan to Zimbabwe
Data columns (total 24 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   2000              201 non-null    float64
 1   2001              201 non-null    float64
 2   2002              201 non-null    float64
 3   2003              201 non-null    float64
 4   2004              201 non-null    float64
 5   2005              201 non-null    float64
 6   2006              201 non-null    float64
 7   2007              201 non-null    float64
 8   2008              201 non-null    float64
 9   2009              201 non-null    float64
 10  2010              201 non-null    float64
 11  2011              201 non-null    float64
 12  2012              201 non-null    float64
 13  2013              201 non-null    float64
 14  2014              201 non-null    float64
 15  2015              201 non-null    float64
 16  2016              201 non-null    

In [None]:
# Replace empty strings with NaN values
outflow = outflow.replace('', np.nan)


In [None]:
# Check for missing values
outflow.isnull().sum()

2000                13
2001                13
2002                13
2003                13
2004                13
2005                13
2006                13
2007                13
2008                13
2009                13
2010                13
2011                13
2012                13
2013                13
2014                13
2015                13
2016                13
2017                13
2018                13
2019                13
2020                13
2021                13
2022                13
% of GDP in 2022    56
dtype: int64

In [None]:
# Remove all rows that have missing values; reassigning avoids the
# SettingWithCopyWarning that dropna(inplace=True) raises on a slice
outflow = outflow.dropna()
outflow.head()


Remittance outflows (US$ million),2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,% of GDP in 2022
Albania,0.0,0.0,0.0,4.135728,4.86374,6.511798,26.532132,9.94,279.7,229.819782,...,178.722842,153.3124,147.156757,106.330983,114.879337,119.679641,124.630376,140.34431,149.489653,0.783366
Algeria,0.0,0.0,0.0,0.0,0.0,27.0,35.0,48.863733,26.782545,45.702406,...,295.922112,72.128991,76.633496,214.378057,87.006383,81.545365,149.253465,83.259319,60.221598,0.030873
Angola,266.29,216.06,223.507959,229.84997,296.009591,214.905216,412.666785,602.671451,669.453676,716.0,...,2746.615873,1252.909012,1176.110314,961.415276,681.627015,549.082043,576.471437,445.400144,517.668187,0.421619
Antigua and Barbuda,1.558148,1.564444,1.411481,1.489293,1.566737,1.670141,1.837155,2.1311,2.367652,2.122363,...,52.1,52.822644,54.742927,55.44539,54.174268,47.772733,44.608057,52.574952,59.787266,3.400868
Argentina,267.7,256.1,119.55,180.41,234.47,314.01,356.5,463.192097,631.436069,766.571548,...,732.414557,685.001366,769.242058,1060.48453,1010.375536,669.932291,521.040572,596.899083,590.412343,0.093626


In [None]:
outflow.shape

(158, 24)
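
The row count falls from 214 to 158, a difference of 56, which matches the number of missing values in `% of GDP in 2022`. This suggests that every dropped row lacks the GDP share, including rows such as Afghanistan that have complete yearly values. The notebook drops those rows. The following sketch shows an alternative that is not used here: restricting `dropna` to the year columns through its `subset` parameter. The toy frame uses rounded values from the outflow table.

```python
import numpy as np
import pandas as pd

# Toy frame: Afghanistan has yearly values but no GDP share,
# Albania is complete, American Samoa is empty throughout.
toy = pd.DataFrame(
    {
        "2021": [0.0, 140.34, np.nan],
        "2022": [0.0, 149.49, np.nan],
        "% of GDP in 2022": [np.nan, 0.78, np.nan],
    },
    index=["Afghanistan", "Albania", "American Samoa"],
)

# What the notebook does: drop any row with at least one missing value.
strict = toy.dropna()

# Alternative: only require the year columns to be present.
year_cols = [c for c in toy.columns if str(c).isdigit()]
lenient = toy.dropna(subset=year_cols)

print(strict.index.tolist())
print(lenient.index.tolist())
```

Which variant is appropriate depends on whether later analysis needs the GDP share for every country.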

In [None]:
outflow.dtypes

2000                float64
2001                float64
2002                float64
2003                float64
2004                float64
2005                float64
2006                float64
2007                float64
2008                float64
2009                float64
2010                float64
2011                float64
2012                float64
2013                float64
2014                float64
2015                float64
2016                float64
2017                float64
2018                float64
2019                float64
2020                float64
2021                float64
2022                float64
% of GDP in 2022    float64
dtype: object

In [None]:
# Save cleaned remittance outflow data to CSV file
outflow.to_csv('remittance_outflows_clean.csv', index=True)