# Remittance Data Enhancement with GDP Per Capita

This notebook enhances the remittance data by adding GDP per capita for both the sending and the receiving country of each record. It follows the same approach as notebook 21, but looks up GDP per capita (constant 2015 US$) instead of absolute GDP. The absolute GDP columns already present in `22.csv` are kept.

## Objectives:
1. Load remittance data from `22.csv`
2. Load World Bank GDP per capita data (series `NY.GDP.PCAP.KD`) from `GDP_Capita.csv`
3. Map country names to ISO3 (ISO 3166-1 alpha-3) codes
4. Look up GDP per capita for the sending and receiving country of each record by code and year
5. Export the enhanced dataset as `29.csv`

## 1. Import Required Libraries

In [1]:
import pandas as pd
import numpy as np
import country_converter as coco
import os

# Display settings for better output
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

## 2. Load Input Datasets

In [2]:
# Load remittance data from 22.csv
df = pd.read_csv(r'C:\Users\clint\Desktop\RER\Code\22.csv')
print("Remittance dataset loaded successfully!")
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
df.head()

Remittance dataset loaded successfully!
Shape: (3980, 11)
Columns: ['Sending_Country', 'Receiving_Country', 'Year', 'Value', 'Unit', 'Source', 'Region', 'Sending_Country_Code', 'Receiving_Country_Code', 'Sending_Country_GDP', 'Receiving_Country_GDP']


,Sending_Country,Receiving_Country,Year,Value,Unit,Source,Region,Sending_Country_Code,Receiving_Country_Code,Sending_Country_GDP,Receiving_Country_GDP
0,Algeria,Senegal,2021,0.183414825,USD millions,BCEAO,Africa,DZA,SEN,199488.9,24359.596784
1,Australia,Ethiopia,2020,13.59617511,USD millions,National Bank of Ethiopia,Africa,AUS,ETH,1491063.0,95071.775812
2,Australia,Kenya,2024,184497.099695719,USD millions,Central Bank of Kenya,Africa,AUS,KEN,1665258.0,104575.203136
3,Australia,Uganda,2022,22.0,USD millions,Bank of Uganda,Africa,AUS,UGA,1587133.0,44147.21689
4,Austria,Kenya,2024,13169.065145833,USD millions,Central Bank of Kenya,Africa,AUT,KEN,418190.4,104575.203136


In [3]:
# Load GDP per capita data
gdp_per_capita_df = pd.read_csv(r'C:\Users\clint\Desktop\RER\data\Remittance_4\GDP_Capita.csv')
print("GDP per capita dataset loaded successfully!")
print(f"Shape: {gdp_per_capita_df.shape}")
print(f"Columns: {gdp_per_capita_df.columns.tolist()}")
gdp_per_capita_df.head()

GDP per capita dataset loaded successfully!
Shape: (266, 54)
Columns: ['Series Name', 'Series Code', 'Country Name', 'Country Code', '1975 [YR1975]', '1976 [YR1976]', '1977 [YR1977]', '1978 [YR1978]', '1979 [YR1979]', '1980 [YR1980]', '1981 [YR1981]', '1982 [YR1982]', '1983 [YR1983]', '1984 [YR1984]', '1985 [YR1985]', '1986 [YR1986]', '1987 [YR1987]', '1988 [YR1988]', '1989 [YR1989]', '1990 [YR1990]', '1991 [YR1991]', '1992 [YR1992]', '1993 [YR1993]', '1994 [YR1994]', '1995 [YR1995]', '1996 [YR1996]', '1997 [YR1997]', '1998 [YR1998]', '1999 [YR1999]', '2000 [YR2000]', '2001 [YR2001]', '2002 [YR2002]', '2003 [YR2003]', '2004 [YR2004]', '2005 [YR2005]', '2006 [YR2006]', '2007 [YR2007]', '2008 [YR2008]', '2009 [YR2009]', '2010 [YR2010]', '2011 [YR2011]', '2012 [YR2012]', '2013 [YR2013]', '2014 [YR2014]', '2015 [YR2015]', '2016 [YR2016]', '2017 [YR2017]', '2018 [YR2018]', '2019 [YR2019]', '2020 [YR2020]', '2021 [YR2021]', '2022 [YR2022]', '2023 [YR2023]', '2024 [YR2024]']


,Series Name,Series Code,Country Name,Country Code,1975 [YR1975],1976 [YR1976],1977 [YR1977],1978 [YR1978],1979 [YR1979],1980 [YR1980],1981 [YR1981],1982 [YR1982],1983 [YR1983],1984 [YR1984],1985 [YR1985],1986 [YR1986],1987 [YR1987],1988 [YR1988],1989 [YR1989],1990 [YR1990],1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],2001 [YR2001],2002 [YR2002],2003 [YR2003],2004 [YR2004],2005 [YR2005],2006 [YR2006],2007 [YR2007],2008 [YR2008],2009 [YR2009],2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019],2020 [YR2020],2021 [YR2021],2022 [YR2022],2023 [YR2023],2024 [YR2024]
0,GDP per capita (constant 2015 US$),NY.GDP.PCAP.KD,Afghanistan,AFG,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,308.3182697,277.1180514,338.1399736,346.0716271,338.6372739,363.6401414,367.7583117,410.7577289,417.6472826,488.8306525,542.8710305,525.4269828,568.9290215,580.6038333,575.1462458,565.5697304,563.8723367,562.7695741,553.1251517,557.8615332,527.8345545,408.6258552,377.6656271,378.0663031,..
1,GDP per capita (constant 2015 US$),NY.GDP.PCAP.KD,Albania,ALB,..,..,..,..,..,1823.48926,1890.021975,1905.063527,1885.6928,1823.323563,1818.02507,1883.740414,1831.947039,1772.179358,1894.878274,1682.881087,1218.964273,1138.23774,1254.678884,1367.22059,1558.966386,1683.769655,1495.900353,1630.438741,1841.807628,1991.914359,2188.918545,2297.108535,2428.689347,2567.297637,2712.869838,2894.360326,3105.879676,3345.981618,3459.246956,3579.824132,3677.894579,3720.228765,3790.680163,3883.632628,3981.726623,4143.989883,4283.982627,4452.237147,4563.467363,4437.653469,4880.723462,5178.884315,5444.930001,5726.025705
2,GDP per capita (constant 2015 US$),NY.GDP.PCAP.KD,Algeria,DZA,3290.827601,3408.018208,3459.303996,3672.11778,3830.809592,3742.710247,3731.939054,3840.575386,3912.936591,3993.395405,4002.556194,3888.364759,3746.165667,3606.268388,3666.853079,3605.687012,3478.509763,3455.929363,3302.919506,3201.569144,3255.187269,3322.955381,3297.466034,3410.882544,3471.491963,3553.324205,3610.006056,3754.660815,3945.898161,4066.282048,4223.602521,4279.615471,4339.250558,4367.56538,4336.100541,4456.610274,4501.354301,4518.43979,4543.23444,4634.101492,4685.059027,4768.731401,4742.900755,4717.003589,4672.664087,4363.685338,4456.746876,4544.466881,4660.405457,4747.346248
3,GDP per capita (constant 2015 US$),NY.GDP.PCAP.KD,American Samoa,ASM,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,..,13054.63643,13201.73475,13291.53566,13284.27911,12794.7565,13092.31114,12809.46331,12342.05754,12446.30863,12521.80951,12068.26674,11871.79372,12203.08947,12727.41027,12665.28299,11930.77712,12412.60241,12524.01603,13194.27474,13233.57848,13709.09749,..,..
4,GDP per capita (constant 2015 US$),NY.GDP.PCAP.KD,Andorra,AND,40073.41037,39058.58866,38021.81983,36647.70204,34983.99909,34281.74395,32958.76504,31996.5442,31117.19623,30392.84207,29997.08627,29680.13294,30064.99797,30469.34424,31180.41233,31143.14687,29642.30598,28161.94249,26518.23363,26587.1383,27619.50606,28871.21607,31140.68968,31800.36407,32938.64374,34113.81738,36790.10371,38084.48431,39620.32016,40054.32115,40528.17486,41322.05292,40789.15897,37775.12614,35604.38401,36277.26444,37637.48683,36206.97843,35684.16943,37300.62442,38654.93472,40085.0206,39361.06226,39242.1449,39346.27503,34536.64992,36929.01915,39780.39661,40227.23353,41034.5074


## 3. Explore Dataset Structure

In [4]:
# Examine remittance dataset structure
print("=== REMITTANCE DATASET ANALYSIS ===")
print(f"Shape: {df.shape}")
print(f"Data types:\n{df.dtypes}")
print(f"\nUnique years: {sorted(df['Year'].unique())}")
print(f"Year range: {df['Year'].min()} - {df['Year'].max()}")

# Check if country columns exist and their names
country_columns = [col for col in df.columns if 'Country' in col]
print(f"\nCountry-related columns: {country_columns}")

# Check for missing values
print(f"\nMissing values per column:")
print(df.isnull().sum())

=== REMITTANCE DATASET ANALYSIS ===
Shape: (3980, 11)
Data types:
Sending_Country            object
Receiving_Country          object
Year                        int64
Value                      object
Unit                       object
Source                     object
Region                     object
Sending_Country_Code       object
Receiving_Country_Code     object
Sending_Country_GDP       float64
Receiving_Country_GDP     float64
dtype: object

Unique years: [np.int64(2018), np.int64(2019), np.int64(2020), np.int64(2021), np.int64(2022), np.int64(2024)]
Year range: 2018 - 2024

Country-related columns: ['Sending_Country', 'Receiving_Country', 'Sending_Country_Code', 'Receiving_Country_Code', 'Sending_Country_GDP', 'Receiving_Country_GDP']

Missing values per column:
Sending_Country            0
Receiving_Country          0
Year                       0
Value                      0
Unit                       0
Source                     0
Region                     0
Sending_Countr
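The dtype listing shows `Value` stored as `object`, and the sample printed in cell 10 shows thousands separators (for example `184,497.099695719`). A minimal sketch of converting such strings to floats before any arithmetic, assuming commas are the only non-numeric characters; the sample strings mirror the displayed values and are illustrative only:

```python
import pandas as pd

# Illustrative strings in the format shown for the Value column
values = pd.Series(["0.183414825", "184,497.099695719", "22", "13,169.065145833"])

# Strip thousands separators, then parse; anything still unparseable becomes NaN
numeric = pd.to_numeric(values.str.replace(",", "", regex=False), errors="coerce")

print(numeric.dtype)         # float64
print(numeric.isna().sum())  # 0
```

Applied to the notebook, this would amount to something like `df['Value'] = pd.to_numeric(df['Value'].astype(str).str.replace(',', '', regex=False), errors='coerce')`; checking `isna().sum()` afterwards shows whether any other formats are present.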

In [5]:
# Examine GDP per capita dataset structure
print("=== GDP PER CAPITA DATASET ANALYSIS ===")
print(f"Shape: {gdp_per_capita_df.shape}")
print(f"Data types:\n{gdp_per_capita_df.dtypes}")

# Check for year columns
year_columns = [col for col in gdp_per_capita_df.columns if '[YR' in col]
available_years = [int(col.split('[YR')[1].split(']')[0]) for col in year_columns]
print(f"\nAvailable GDP per capita years: {sorted(available_years)}")
print(f"Year range: {min(available_years)} - {max(available_years)}")

# Check key identifier columns
print(f"\nKey columns:")
key_cols = ['Country Name', 'Country Code', 'Series Name']
for col in key_cols:
    if col in gdp_per_capita_df.columns:
        print(f"  {col}: {gdp_per_capita_df[col].nunique()} unique values")

# Sample of GDP per capita data
print(f"\nSample of GDP per capita data:")
sample_cols = ['Country Name', 'Country Code', 'Series Name']
if len(year_columns) > 0:
    sample_cols.extend(year_columns[-3:])  # Show last 3 years
print(gdp_per_capita_df[sample_cols].head())

=== GDP PER CAPITA DATASET ANALYSIS ===
Shape: (266, 54)
Data types:
Series Name      object
Series Code      object
Country Name     object
Country Code     object
1975 [YR1975]    object
1976 [YR1976]    object
1977 [YR1977]    object
1978 [YR1978]    object
1979 [YR1979]    object
1980 [YR1980]    object
1981 [YR1981]    object
1982 [YR1982]    object
1983 [YR1983]    object
1984 [YR1984]    object
1985 [YR1985]    object
1986 [YR1986]    object
1987 [YR1987]    object
1988 [YR1988]    object
1989 [YR1989]    object
1990 [YR1990]    object
1991 [YR1991]    object
1992 [YR1992]    object
1993 [YR1993]    object
1994 [YR1994]    object
1995 [YR1995]    object
1996 [YR1996]    object
1997 [YR1997]    object
1998 [YR1998]    object
1999 [YR1999]    object
2000 [YR2000]    object
2001 [YR2001]    object
2002 [YR2002]    object
2003 [YR2003]    object
2004 [YR2004]    object
2005 [YR2005]    object
2006 [YR2006]    object
2007 [YR2007]    object
2008 [YR2008]    object
2009 [YR2009]    ob
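The year columns load as `object` because the World Bank file uses `..` for missing values. A sketch of coercing them to floats in one step, so `..` becomes `NaN`; the two rows below reuse the Afghanistan and Algeria values for 2023 and 2024 from cell 3 and are illustrative only:

```python
import pandas as pd

# Two illustrative rows in the World Bank wide layout ('..' means no data)
wb = pd.DataFrame({
    "Country Name": ["Afghanistan", "Algeria"],
    "Country Code": ["AFG", "DZA"],
    "2023 [YR2023]": ["378.0663031", "4660.405457"],
    "2024 [YR2024]": ["..", "4747.346248"],
})

# Select the year columns by their "[YR" marker, as the notebook does
year_cols = [c for c in wb.columns if "[YR" in c]

# errors="coerce" turns the '..' placeholder into NaN instead of raising
wb[year_cols] = wb[year_cols].apply(pd.to_numeric, errors="coerce")

print(wb.dtypes)
print(wb)
```

With the columns already numeric, a lookup no longer needs to special-case `'..'` or parse strings per cell.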

## 4. Data Preprocessing and Cleaning

In [6]:
# Get all unique countries from remittance data
if 'Sending_Country' in df.columns and 'Receiving_Country' in df.columns:
    sending_countries = set(df['Sending_Country'].unique())
    receiving_countries = set(df['Receiving_Country'].unique())
else:
    # Check for alternative column names
    sending_col = None
    receiving_col = None
    for col in df.columns:
        if 'send' in col.lower() and 'country' in col.lower():
            sending_col = col
        elif 'receiv' in col.lower() and 'country' in col.lower():
            receiving_col = col
    
    if sending_col and receiving_col:
        sending_countries = set(df[sending_col].unique())
        receiving_countries = set(df[receiving_col].unique())
        print(f"Using columns: {sending_col}, {receiving_col}")
    else:
        # Fail early: every later cell needs both country columns
        raise KeyError(f"Could not identify country columns. Available columns: {df.columns.tolist()}")

# Combine all unique countries
all_countries = sending_countries.union(receiving_countries)
all_countries = sorted(list(all_countries))

print(f"Total unique countries in remittance data: {len(all_countries)}")
print("\nFirst 20 countries:")
for country in all_countries[:20]:
    print(f"- {country}")

Total unique countries in remittance data: 257

First 20 countries:
- Afghanistan
- Albania
- Algeria
- American Samoa
- Andorra
- Angola
- Anguilla
- Antigua and Barbuda
- Argentina
- Armenia
- Aruba
- Australia
- Austria
- Azerbaijan
- Bahamas
- Bahamas. The
- Bahrain
- Bangladesh
- Barbados
- Belarus


In [7]:
# Create comprehensive country code mapping (same as notebook 21)
def create_country_code_mapping(countries):
    """Create a mapping from country names to ISO3 codes"""
    mapping = {}
    unmapped_countries = []
    
    for country in countries:
        # Try to convert to ISO3 code
        iso3_code = coco.convert(country, to='ISO3')
        
        # Check if conversion was successful and is a string (not a list)
        if (iso3_code != 'not found' and 
            iso3_code != country and 
            isinstance(iso3_code, str) and 
            len(iso3_code) == 3):
            mapping[country] = iso3_code
        else:
            unmapped_countries.append(country)
    
    return mapping, unmapped_countries

# Create automatic mapping
country_mapping, unmapped = create_country_code_mapping(all_countries)

print(f"Successfully mapped {len(country_mapping)} countries automatically")
print(f"Could not map {len(unmapped)} countries")

if unmapped:
    print(f"\nUnmapped countries:")
    for country in sorted(unmapped):
        print(f"- {country}")

Channel Islands not found in regex
More than one regular expression match for China, Taiwan Province of

Successfully mapped 250 countries automatically
Could not map 7 countries

Unmapped countries:
- Channel Islands
- China, Taiwan Province of
- Congo. Rep.
- Dem. People's Republic of ..
- Democratic Republic of the
- Grenadines
- Republic
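The names above are the ones for which `coco.convert` returned the `'not found'` marker or, per the comment in `create_country_code_mapping`, a non-string result for an ambiguous match. Pulling that acceptance rule into a small pure function makes it testable without calling `country_converter`. `accept_conversion` is a hypothetical name, and the list passed in the last example is an assumed shape for an ambiguous result:

```python
def accept_conversion(name, result):
    """Return `result` if it looks like a single ISO3 code for `name`, else None.

    Mirrors the checks in create_country_code_mapping: reject non-strings
    (ambiguous matches), the 'not found' marker, an echo of the input name,
    and anything that is not exactly three characters long.
    """
    if (isinstance(result, str)
            and result != "not found"
            and result != name
            and len(result) == 3):
        return result
    return None


print(accept_conversion("Kenya", "KEN"))                  # KEN
print(accept_conversion("Channel Islands", "not found"))  # None
# Hypothetical ambiguous result returned as a list of candidates
print(accept_conversion("China, Taiwan Province of", ["TWN", "CHN"]))  # None
```

Checking `isinstance` first also avoids comparing a list against strings.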


In [8]:
# Add manual mappings for problematic country names (from notebook 21)
manual_mappings = {
    'Bahamas. The': 'BHS',  # The Bahamas
    'Congo. Dem. Rep.': 'COD',  # Democratic Republic of the Congo
    'Congo. Rep.': 'COG',  # Republic of the Congo
    'Cote d\'Ivoire': 'CIV',  # Côte d'Ivoire
    'Egypt. Arab Rep.': 'EGY',  # Egypt
    'Gambia. The': 'GMB',  # The Gambia
    'Iran. Islamic Rep.': 'IRN',  # Iran
    'Korea. Dem. People\'s Rep.': 'PRK',  # North Korea
    'Korea. Rep.': 'KOR',  # South Korea
    'Kyrgyz Republic': 'KGZ',  # Kyrgyzstan
    'Lao PDR': 'LAO',  # Laos
    'Slovak Republic': 'SVK',  # Slovakia
    'Venezuela. RB': 'VEN',  # Venezuela
    'Yemen. Rep.': 'YEM',  # Yemen
    'Micronesia. Fed. Sts.': 'FSM',  # Federated States of Micronesia
    'St. Kitts and Nevis': 'KNA',  # Saint Kitts and Nevis
    'St. Lucia': 'LCA',  # Saint Lucia
    'St. Vincent and the Grenadines': 'VCT',  # Saint Vincent and the Grenadines
    
    # Additional specific mappings
    'Republic': 'LAO',  # Map "Republic" to Laos (inherited from notebook 21; assumed to be a truncated name, verify against the raw rows)
    'Central African Republic': 'CAF',
    'Czech Republic': 'CZE', 
    'Dominican Republic': 'DOM',
    'Republic of Korea': 'KOR',  # South Korea
    'Republic of Moldova': 'MDA',
    'Syrian Arab Republic': 'SYR',
    'United Republic of Tanzania': 'TZA',
    
    # China and related territories
    'China': 'CHN',
    'China, Hong Kong SAR': 'HKG',
    'China, Macao SAR': 'MAC',
    'China, Taiwan Province of': 'TWN',
    'Hong Kong SAR. China': 'HKG',
    'Macao SAR. China': 'MAC',
    
    # Truncated or problematic names
    'Dem. People\'s Republic of ..': 'PRK',  # North Korea
    'Democratic Republic of the': 'COD',     # DR Congo
    'Grenadines': 'VCT',                     # St. Vincent and the Grenadines
    'Channel Islands': 'GGY',                # Guernsey as representative; World Bank files use the non-ISO code 'CHI' for Channel Islands, so GGY may match no GDP row
}

# Merge the automatic and manual mappings
final_country_mapping = {**country_mapping, **manual_mappings}

# Check coverage over the names that actually occur in the data
# (len(final_country_mapping) would also count manual keys absent from the data)
total_countries = len(all_countries)
total_mapped = sum(country in final_country_mapping for country in all_countries)

print(f"\nFinal mapping statistics:")
print(f"Total countries: {total_countries}")
print(f"Successfully mapped: {total_mapped}")
print(f"Remaining unmapped: {total_countries - total_mapped}")
print(f"Coverage: {(total_mapped/total_countries)*100:.1f}%")

# Show remaining unmapped countries
still_unmapped = [country for country in all_countries if country not in final_country_mapping]
if still_unmapped:
    print(f"\nStill unmapped countries:")
    for country in sorted(still_unmapped):
        print(f"- {country}")


Final mapping statistics:
Total countries: 257
Successfully mapped: 257
Remaining unmapped: 0
Coverage: 100.0%

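`len(final_country_mapping)` counts every dictionary key, including manual keys that may not occur among the names in the remittance data, so it can report more mapped names than exist. A sketch for listing such unused keys and measuring coverage over the data's own names. The inputs are illustrative, and the `sample_` prefixes avoid overwriting the notebook's variables:

```python
# Illustrative inputs: names found in the data and the combined mapping
sample_countries = ["Bahamas. The", "Kenya", "Republic"]
sample_mapping = {
    "Bahamas. The": "BHS",
    "Kenya": "KEN",
    "Republic": "LAO",
    "Korea. Rep.": "KOR",  # a manual key that does not appear in this sample
}

# Keys that never match a name in the data inflate len(sample_mapping)
unused_keys = sorted(set(sample_mapping) - set(sample_countries))

# Coverage is measured over the data's names, not over the dict's keys
mapped = [c for c in sample_countries if c in sample_mapping]
coverage = len(mapped) / len(sample_countries) * 100

print(unused_keys)         # ['Korea. Rep.']
print(f"{coverage:.1f}%")  # 100.0%
```

Unused keys are harmless to the lookup itself, but listing them helps keep the manual table in step with the data.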

In [9]:
# Function to get GDP per capita value for a specific country code and year
def get_gdp_per_capita_value(country_code, year, gdp_per_capita_dataframe):
    """
    Get GDP per capita value for a specific country code and year
    """
    try:
        # Create the year column name in the GDP dataframe format
        year_col = f"{year} [YR{year}]"
        
        # Check if the year column exists
        if year_col not in gdp_per_capita_dataframe.columns:
            return None
            
        # Find the row for the country code
        country_row = gdp_per_capita_dataframe[gdp_per_capita_dataframe['Country Code'] == country_code]
        
        if country_row.empty:
            return None
            
        # Get the GDP per capita value
        gdp_per_capita_value = country_row[year_col].iloc[0]
        
        # Handle missing or empty values
        if pd.isna(gdp_per_capita_value) or gdp_per_capita_value == '' or gdp_per_capita_value == '..':
            return None
            
        # Convert to float
        gdp_per_capita_float = float(str(gdp_per_capita_value).replace(',', ''))
        
        return gdp_per_capita_float
        
    except (TypeError, ValueError):
        # Cell value could not be parsed as a number
        return None

# Test the function with a few examples
print("Testing GDP per capita lookup function:")
print("="*50)
test_cases = [
    ('USA', 2021),  # United States
    ('DEU', 2021),  # Germany
    ('CHN', 2021)   # China
]

for country_code, year in test_cases:
    gdp_per_capita_value = get_gdp_per_capita_value(country_code, year, gdp_per_capita_df)
    if gdp_per_capita_value is not None:
        print(f"{country_code} {year}: ${gdp_per_capita_value:,.2f}")
    else:
        print(f"{country_code} {year}: Not found")

Testing GDP per capita lookup function:
USA 2021: $62,986.66
DEU 2021: $43,898.77
CHN 2021: $11,469.57
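`get_gdp_per_capita_value` filters the whole World Bank table once per lookup, and `df_enhanced.apply(..., axis=1)` calls it once per row for each side. A swapped-in alternative (not the notebook's method) is to reshape the wide `YYYY [YRYYYY]` columns to long format once with `melt`, then attach values with left merges on code and year. The tiny tables below are illustrative, built from the Algeria and Afghanistan values in cell 3, and `XXX` is a deliberately unknown code:

```python
import pandas as pd

# Illustrative World Bank-style wide table ('..' marks missing data)
wb = pd.DataFrame({
    "Country Code": ["DZA", "AFG"],
    "2021 [YR2021]": ["4456.746876", "408.6258552"],
    "2024 [YR2024]": ["4747.346248", ".."],
})

# Illustrative remittance rows
remit = pd.DataFrame({
    "Sending_Country_Code": ["DZA", "AFG", "DZA"],
    "Receiving_Country_Code": ["AFG", "DZA", "XXX"],
    "Year": [2021, 2024, 2024],
})

# Wide -> long: one row per (Country Code, Year)
long_gdp = wb.melt(id_vars="Country Code", var_name="year_col", value_name="gdp_pc")
long_gdp["Year"] = (long_gdp["year_col"]
                    .str.extract(r"\[YR(\d{4})\]", expand=False)
                    .astype("int64"))
long_gdp["gdp_pc"] = pd.to_numeric(long_gdp["gdp_pc"], errors="coerce")
long_gdp = long_gdp[["Country Code", "Year", "gdp_pc"]]

# Left merges keep every remittance row; unmatched lookups become NaN.
# validate="many_to_one" raises if a code/year pair repeats in the GDP table.
for side in ("Sending", "Receiving"):
    remit = remit.merge(
        long_gdp.rename(columns={"Country Code": f"{side}_Country_Code",
                                 "gdp_pc": f"{side}_Country_GDP_Per_Capita"}),
        on=[f"{side}_Country_Code", "Year"],
        how="left",
        validate="many_to_one",
    )

print(remit)
```

In the notebook, `wb` would correspond to `gdp_per_capita_df[['Country Code'] + year_columns]` and `remit` to `df_enhanced`. The reshape happens once, and each merge is a single join rather than a per-row filter.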


## 5. Merge Datasets on Common Key

In [10]:
# Apply country code mapping to the remittance dataset
print("Adding country codes to remittance dataset...")
print("="*50)

# Determine the correct column names
if 'Sending_Country' in df.columns:
    sending_col = 'Sending_Country'
    receiving_col = 'Receiving_Country'
else:
    # Find country columns
    for col in df.columns:
        if 'send' in col.lower() and 'country' in col.lower():
            sending_col = col
        elif 'receiv' in col.lower() and 'country' in col.lower():
            receiving_col = col

print(f"Using columns: {sending_col}, {receiving_col}")

# Create enhanced dataframe with country codes
df_enhanced = df.copy()

# Add country codes
df_enhanced['Sending_Country_Code'] = df_enhanced[sending_col].map(final_country_mapping)
df_enhanced['Receiving_Country_Code'] = df_enhanced[receiving_col].map(final_country_mapping)

# Check mapping success
sending_missing = df_enhanced['Sending_Country_Code'].isnull().sum()
receiving_missing = df_enhanced['Receiving_Country_Code'].isnull().sum()

print(f"Country Code Mapping Results:")
print(f"Total rows: {len(df_enhanced):,}")
print(f"Rows with missing Sending_Country_Code: {sending_missing}")
print(f"Rows with missing Receiving_Country_Code: {receiving_missing}")
rows_missing_any = df_enhanced[['Sending_Country_Code', 'Receiving_Country_Code']].isnull().any(axis=1).sum()
print(f"Mapping success rate: {((len(df_enhanced) - rows_missing_any)/len(df_enhanced))*100:.2f}%")

# Show sample
print(f"\nSample with country codes:")
sample_cols = [sending_col, 'Sending_Country_Code', receiving_col, 'Receiving_Country_Code', 'Year']
if 'Value' in df_enhanced.columns:
    sample_cols.append('Value')
print(df_enhanced[sample_cols].head())

Adding country codes to remittance dataset...
Using columns: Sending_Country, Receiving_Country
Country Code Mapping Results:
Total rows: 3,980
Rows with missing Sending_Country_Code: 0
Rows with missing Receiving_Country_Code: 0
Mapping success rate: 100.00%

Sample with country codes:
  Sending_Country Sending_Country_Code Receiving_Country  \
0         Algeria                  DZA           Senegal   
1       Australia                  AUS          Ethiopia   
2       Australia                  AUS             Kenya   
3       Australia                  AUS            Uganda   
4         Austria                  AUT             Kenya   

  Receiving_Country_Code  Year              Value  
0                    SEN  2021        0.183414825  
1                    ETH  2020        13.59617511  
2                    KEN  2024  184,497.099695719  
3                    UGA  2022                 22  
4                    KEN  2024   13,169.065145833  

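`22.csv` already contains `Sending_Country_Code` and `Receiving_Country_Code` (see the column list in cell 2), so this cell overwrites them. A quick consistency check counts rows where a recomputed code differs from the loaded one. It is sketched here on illustrative frames with placeholder names and codes:

```python
import pandas as pd

# Illustrative frames: codes as loaded from the CSV, and a copy with recomputed codes
original = pd.DataFrame({
    "Sending_Country": ["Country A", "Country B"],
    "Sending_Country_Code": ["AAA", "BBB"],
})
recomputed = original.copy()
recomputed["Sending_Country_Code"] = ["AAA", "BBX"]  # hypothetical disagreement

# Rows align because recomputed is a copy of original (same index and order).
# Note: NaN != NaN, so rows missing a code on both sides also count as changed.
changed = recomputed["Sending_Country_Code"].ne(original["Sending_Country_Code"])

report = recomputed.loc[changed, ["Sending_Country", "Sending_Country_Code"]].assign(
    Original_Code=original.loc[changed, "Sending_Country_Code"]
)
print(report)
print(int(changed.sum()))  # 1
```

In the notebook the same comparison would use `df` and `df_enhanced`, once per code column.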

In [11]:
# Add GDP per capita data to the enhanced dataset
print("Adding GDP per capita data...")
print("="*50)

# Add GDP per capita columns for sending countries
print("Processing Sending Country GDP per capita...")
df_enhanced['Sending_Country_GDP_Per_Capita'] = df_enhanced.apply(
    lambda row: get_gdp_per_capita_value(row['Sending_Country_Code'], row['Year'], gdp_per_capita_df), 
    axis=1
)

# Add GDP per capita columns for receiving countries
print("Processing Receiving Country GDP per capita...")
df_enhanced['Receiving_Country_GDP_Per_Capita'] = df_enhanced.apply(
    lambda row: get_gdp_per_capita_value(row['Receiving_Country_Code'], row['Year'], gdp_per_capita_df), 
    axis=1
)

print("✅ GDP per capita data integration completed!")

# Check the results
print(f"\nGDP per capita integration results:")
print(f"Total rows: {len(df_enhanced):,}")
print(f"Rows with Sending Country GDP per capita: {df_enhanced['Sending_Country_GDP_Per_Capita'].notna().sum():,}")
print(f"Rows with Receiving Country GDP per capita: {df_enhanced['Receiving_Country_GDP_Per_Capita'].notna().sum():,}")

# Calculate success rates
sending_gdp_rate = (df_enhanced['Sending_Country_GDP_Per_Capita'].notna().sum() / len(df_enhanced)) * 100
receiving_gdp_rate = (df_enhanced['Receiving_Country_GDP_Per_Capita'].notna().sum() / len(df_enhanced)) * 100

print(f"Sending Country GDP per capita match rate: {sending_gdp_rate:.1f}%")
print(f"Receiving Country GDP per capita match rate: {receiving_gdp_rate:.1f}%")

# Show the enhanced dataframe structure
print(f"\nEnhanced DataFrame Columns:")
for i, col in enumerate(df_enhanced.columns, 1):
    print(f"{i:2d}. {col}")

Adding GDP per capita data...
Processing Sending Country GDP per capita...
Processing Receiving Country GDP per capita...
✅ GDP per capita data integration completed!

GDP per capita integration results:
Total rows: 3,980
Rows with Sending Country GDP per capita: 3,894
Rows with Receiving Country GDP per capita: 3,926
Sending Country GDP per capita match rate: 97.8%
Receiving Country GDP per capita match rate: 98.6%

Enhanced DataFrame Columns:
 1. Sending_Country
 2. Receiving_Country
 3. Year
 4. Value
 5. Unit
 6. Source
 7. Region
 8. Sending_Country_Code
 9. Receiving_Country_Code
10. Sending_Country_GDP
11. Receiving_Country_GDP
12. Sending_Country_GDP_Per_Capita
13. Receiving_Country_GDP_Per_Capita

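Per the match rates above, about 2.2% of rows lack a sending GDP per capita value and 1.4% lack a receiving one. Listing the distinct country/year pairs behind those gaps shows whether they come from codes with no row or from `..` years in the World Bank file (Afghanistan 2024, for example, is `..` in cell 3). A sketch on an illustrative slice of `df_enhanced`:

```python
import numpy as np
import pandas as pd

# Illustrative slice of df_enhanced after the GDP per capita lookup
sample = pd.DataFrame({
    "Sending_Country": ["Afghanistan", "Afghanistan", "Algeria"],
    "Sending_Country_Code": ["AFG", "AFG", "DZA"],
    "Year": [2024, 2024, 2024],
    "Sending_Country_GDP_Per_Capita": [np.nan, np.nan, 4747.346248],
})

is_missing = sample["Sending_Country_GDP_Per_Capita"].isna()

# Distinct country/year combinations whose lookup returned nothing
missing_pairs = (
    sample.loc[is_missing, ["Sending_Country", "Sending_Country_Code", "Year"]]
    .drop_duplicates()
    .sort_values(["Sending_Country_Code", "Year"])
)
print(missing_pairs)

# Number of affected rows per pair, to see which gaps matter most
print(sample[is_missing].groupby(["Sending_Country_Code", "Year"]).size())
```

The same pattern applies to the receiving columns.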
## 6. Validate Merged Data

In [12]:
# Comprehensive validation of the merged dataset
print("VALIDATION OF MERGED DATASET")
print("="*60)

print(f"✅ Dataset Shape: {df_enhanced.shape}")
print(f"✅ Original rows preserved: {len(df_enhanced) == len(df)}")

# Check for data integrity
print(f"\n📊 Data Completeness:")
print(f"   - Total records: {len(df_enhanced):,}")
print(f"   - Years covered: {df_enhanced['Year'].min()} - {df_enhanced['Year'].max()}")
print(f"   - Unique sending countries: {df_enhanced[sending_col].nunique()}")
print(f"   - Unique receiving countries: {df_enhanced[receiving_col].nunique()}")

# Validate GDP per capita data
print(f"\n💰 GDP Per Capita Data Quality:")
gdp_cols = ['Sending_Country_GDP_Per_Capita', 'Receiving_Country_GDP_Per_Capita']
for col in gdp_cols:
    non_null_count = df_enhanced[col].notna().sum()
    coverage = (non_null_count / len(df_enhanced)) * 100
    
    if non_null_count > 0:
        mean_val = df_enhanced[col].mean()
        median_val = df_enhanced[col].median()
        min_val = df_enhanced[col].min()
        max_val = df_enhanced[col].max()
        
        print(f"   {col}:")
        print(f"     - Coverage: {non_null_count:,}/{len(df_enhanced):,} ({coverage:.1f}%)")
        print(f"     - Range: ${min_val:,.2f} - ${max_val:,.2f}")
        print(f"     - Mean: ${mean_val:,.2f}, Median: ${median_val:,.2f}")

# Check for any problematic values
print(f"\n🔍 Data Quality Checks:")
negative_gdp_sending = (df_enhanced['Sending_Country_GDP_Per_Capita'] < 0).sum()
negative_gdp_receiving = (df_enhanced['Receiving_Country_GDP_Per_Capita'] < 0).sum()

print(f"   - Negative GDP per capita values (sending): {negative_gdp_sending}")
print(f"   - Negative GDP per capita values (receiving): {negative_gdp_receiving}")

# Show sample of merged data
print(f"\n📋 Sample of Enhanced Dataset:")
sample_cols = [sending_col, 'Sending_Country_Code', 'Sending_Country_GDP_Per_Capita',
               receiving_col, 'Receiving_Country_Code', 'Receiving_Country_GDP_Per_Capita',
               'Year']
if 'Value' in df_enhanced.columns:
    sample_cols.append('Value')

print(df_enhanced[sample_cols].head(10))

VALIDATION OF MERGED DATASET
✅ Dataset Shape: (3980, 13)
✅ Original rows preserved: True

📊 Data Completeness:
   - Total records: 3,980
   - Years covered: 2018 - 2024
   - Unique sending countries: 257
   - Unique receiving countries: 214

💰 GDP Per Capita Data Quality:
   Sending_Country_GDP_Per_Capita:
     - Coverage: 3,894/3,980 (97.8%)
     - Range: $253.69 - $185,582.76
     - Mean: $14,561.80, Median: $4,723.19
   Receiving_Country_GDP_Per_Capita:
     - Coverage: 3,926/3,980 (98.6%)
     - Range: $255.08 - $185,582.76
     - Mean: $11,701.47, Median: $4,975.81

🔍 Data Quality Checks:
   - Negative GDP per capita values (sending): 0
   - Negative GDP per capita values (receiving): 0

📋 Sample of Enhanced Dataset:
  Sending_Country Sending_Country_Code  Sending_Country_GDP_Per_Capita  \
0         Algeria                  DZA                     4456.746876   
1       Australia                  AUS                    58132.798520   
2       Australia                  AUS        

In [13]:
# Show examples of successful GDP per capita matches
print("EXAMPLES OF SUCCESSFUL GDP PER CAPITA MATCHES")
print("="*60)

# Find records with both sending and receiving GDP per capita data
complete_records = df_enhanced[
    (df_enhanced['Sending_Country_GDP_Per_Capita'].notna()) & 
    (df_enhanced['Receiving_Country_GDP_Per_Capita'].notna())
]

print(f"Records with complete GDP per capita data: {len(complete_records):,}/{len(df_enhanced):,} ({len(complete_records)/len(df_enhanced)*100:.1f}%)")

if len(complete_records) > 0:
    print(f"\nSample of complete records:")
    sample = complete_records.head()
    for _, row in sample.iterrows():
        sending_gdp = row['Sending_Country_GDP_Per_Capita']
        receiving_gdp = row['Receiving_Country_GDP_Per_Capita']
        print(f"{row[sending_col]} (${sending_gdp:,.0f}) → {row[receiving_col]} (${receiving_gdp:,.0f}) | {row['Year']}")
        
# Show the sending-country rows with the highest and lowest GDP per capita
# (row level, so a country repeats once per corridor and year)
print(f"\n📈 Countries with highest GDP per capita (in dataset):")
high_gdp_sending = df_enhanced.dropna(subset=['Sending_Country_GDP_Per_Capita']).nlargest(5, 'Sending_Country_GDP_Per_Capita')
for _, row in high_gdp_sending.iterrows():
    print(f"   {row[sending_col]}: ${row['Sending_Country_GDP_Per_Capita']:,.0f} ({row['Year']})")

print(f"\n📉 Countries with lowest GDP per capita (in dataset):")
low_gdp_sending = df_enhanced.dropna(subset=['Sending_Country_GDP_Per_Capita']).nsmallest(5, 'Sending_Country_GDP_Per_Capita')
for _, row in low_gdp_sending.iterrows():
    print(f"   {row[sending_col]}: ${row['Sending_Country_GDP_Per_Capita']:,.0f} ({row['Year']})")

EXAMPLES OF SUCCESSFUL GDP PER CAPITA MATCHES
Records with complete GDP per capita data: 3,840/3,980 (96.5%)

Sample of complete records:
Algeria ($4,457) → Senegal ($1,415) | 2021
Australia ($58,133) → Ethiopia ($799) | 2020
Australia ($61,212) → Kenya ($1,853) | 2024
Australia ($61,010) → Uganda ($933) | 2022
Austria ($45,562) → Kenya ($1,853) | 2024

📈 Countries with highest GDP per capita (in dataset):
   Monaco: $185,583 (2019)
   Monaco: $185,583 (2019)
   Monaco: $173,042 (2018)
   Monaco: $173,042 (2018)
   Monaco: $161,263 (2020)

📉 Countries with lowest GDP per capita (in dataset):
   Burundi: $254 (2022)
   Burundi: $255 (2020)
   Burundi: $255 (2020)
   Burundi: $256 (2021)
   Burundi: $262 (2019)
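Because each sending country appears once per corridor, `nlargest` and `nsmallest` at row level repeat Monaco and Burundi. Dropping duplicate country/year pairs first gives distinct entries. A sketch with illustrative rows whose values are rounded from the output above:

```python
import pandas as pd

# Illustrative rows: the same country/year appears once per corridor
rows = pd.DataFrame({
    "Sending_Country": ["Monaco", "Monaco", "Monaco", "Burundi"],
    "Year": [2019, 2019, 2018, 2022],
    "Sending_Country_GDP_Per_Capita": [185583.0, 185583.0, 173042.0, 254.0],
})

# Keep one row per country/year before ranking
top = (rows.drop_duplicates(subset=["Sending_Country", "Year"])
           .nlargest(2, "Sending_Country_GDP_Per_Capita"))
print(top)
```

To rank countries rather than country/years, deduplicate on `Sending_Country` alone, for example after sorting by the value column.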


## 7. Save Final Dataset

In [14]:
# Save the enhanced dataset with GDP per capita to CSV
output_path = r'C:\Users\clint\Desktop\RER\Code\29.csv'

print("SAVING ENHANCED DATASET")
print("="*60)
print(f"Output file: {output_path}")
print(f"Dataset shape: {df_enhanced.shape}")
print(f"Total columns: {len(df_enhanced.columns)}")

# Show final column list
print(f"\nFinal columns in the enhanced dataset:")
for i, col in enumerate(df_enhanced.columns, 1):
    print(f"{i:2d}. {col}")

# Save to CSV
print(f"\nSaving to CSV...")
df_enhanced.to_csv(output_path, index=False)
print(f"✅ Successfully saved!")

# Verify the file was created and show file info
if os.path.exists(output_path):
    file_size = os.path.getsize(output_path)
    print(f"\n📁 File Information:")
    print(f"   - Location: {output_path}")
    print(f"   - Size: {file_size:,} bytes ({file_size/(1024*1024):.2f} MB)")
    print(f"   - Records: {len(df_enhanced):,}")
    print(f"   - Columns: {len(df_enhanced.columns)}")
else:
    print("❌ Error: File was not created!")

print(f"\n🎉 Dataset 29.csv with GDP per capita is ready for analysis!")

SAVING ENHANCED DATASET
Output file: C:\Users\clint\Desktop\RER\Code\29.csv
Dataset shape: (3980, 13)
Total columns: 13

Final columns in the enhanced dataset:
 1. Sending_Country
 2. Receiving_Country
 3. Year
 4. Value
 5. Unit
 6. Source
 7. Region
 8. Sending_Country_Code
 9. Receiving_Country_Code
10. Sending_Country_GDP
11. Receiving_Country_GDP
12. Sending_Country_GDP_Per_Capita
13. Receiving_Country_GDP_Per_Capita

Saving to CSV...
✅ Successfully saved!

📁 File Information:
   - Location: C:\Users\clint\Desktop\RER\Code\29.csv
   - Size: 557,615 bytes (0.53 MB)
   - Records: 3,980
   - Columns: 13

🎉 Dataset 29.csv with GDP per capita is ready for analysis!
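`os.path.exists` confirms that the file was written, but not that it reads back as expected. A sketch of a round-trip check that writes an illustrative frame to a temporary directory (so it does not touch `29.csv`) and compares it after `read_csv`:

```python
import os
import tempfile

import pandas as pd

# Illustrative frame standing in for df_enhanced
frame = pd.DataFrame({
    "Sending_Country_Code": ["DZA", "AUS"],
    "Year": [2021, 2020],
    "Sending_Country_GDP_Per_Capita": [4456.746876, 58132.79852],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "29.csv")
    frame.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Shape and column order should survive; floats are compared with a tolerance
same_shape = reloaded.shape == frame.shape
same_columns = list(reloaded.columns) == list(frame.columns)
pd.testing.assert_frame_equal(reloaded, frame, check_exact=False)
print(same_shape, same_columns)  # True True
```

On the real output, `reloaded = pd.read_csv(output_path)` followed by the same comparison against `df_enhanced` would also catch dtype drift, for example in the `Value` column.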


In [15]:
# Final summary of the enhanced dataset
print("FINAL SUMMARY - REMITTANCE DATA WITH GDP PER CAPITA")
print("="*70)

print(f"✅ Enhancement Complete!")
print(f"   📊 Original dataset: 22.csv")
print(f"   📊 GDP per capita source: GDP_Capita.csv")
print(f"   📊 Output dataset: 29.csv")

print(f"\n✅ Dataset Statistics:")
print(f"   📈 Total records: {len(df_enhanced):,}")
print(f"   📅 Years covered: {df_enhanced['Year'].min()} - {df_enhanced['Year'].max()}")
print(f"   🌍 Sending countries: {df_enhanced[sending_col].nunique()}")
print(f"   🌍 Receiving countries: {df_enhanced[receiving_col].nunique()}")

print(f"\n✅ GDP Per Capita Integration:")
sending_coverage = df_enhanced['Sending_Country_GDP_Per_Capita'].notna().sum()
receiving_coverage = df_enhanced['Receiving_Country_GDP_Per_Capita'].notna().sum()
print(f"   💰 Sending Country GDP per capita: {sending_coverage:,}/{len(df_enhanced):,} ({sending_coverage/len(df_enhanced)*100:.1f}%)")
print(f"   💰 Receiving Country GDP per capita: {receiving_coverage:,}/{len(df_enhanced):,} ({receiving_coverage/len(df_enhanced)*100:.1f}%)")

print(f"\n✅ New Columns Added:")
new_columns = ['Sending_Country_GDP_Per_Capita', 'Receiving_Country_GDP_Per_Capita']
for i, col in enumerate(new_columns, 1):
    print(f"   {i}. {col}")
print(f"   (Sending_Country_Code and Receiving_Country_Code were already in 22.csv; they were recomputed)")

print(f"\n✅ Data Notes:")
print(f"   - GDP per capita values are in constant 2015 US$ (World Bank series NY.GDP.PCAP.KD)")
print(f"   - Country codes follow the ISO3 (ISO 3166-1 alpha-3) standard")
print(f"   - Missing values indicate country/year pairs with no row or a '..' value in the World Bank file")
print(f"   - All original data columns preserved")

print(f"\n💾 Enhanced dataset ready for economic analysis!")
print(f"   Variable: df_enhanced")
print(f"   File: {output_path}")
print(f"   Shape: {df_enhanced.shape}")

# Show the final enhanced dataframe
print(f"\n📋 Final Enhanced Dataset:")
df_enhanced.head()

FINAL SUMMARY - REMITTANCE DATA WITH GDP PER CAPITA
✅ Enhancement Complete!
   📊 Original dataset: 22.csv
   📊 GDP per capita source: GDP_Capita.csv
   📊 Output dataset: 29.csv

✅ Dataset Statistics:
   📈 Total records: 3,980
   📅 Years covered: 2018 - 2024
   🌍 Sending countries: 257
   🌍 Receiving countries: 214

✅ GDP Per Capita Integration:
   💰 Sending Country GDP per capita: 3,894/3,980 (97.8%)
   💰 Receiving Country GDP per capita: 3,926/3,980 (98.6%)

✅ New Columns Added:
   1. Sending_Country_GDP_Per_Capita
   2. Receiving_Country_GDP_Per_Capita
   (Sending_Country_Code and Receiving_Country_Code were already in 22.csv; they were recomputed)

✅ Data Notes:
   - GDP per capita values are in constant 2015 US$ (World Bank series NY.GDP.PCAP.KD)
   - Country codes follow the ISO3 (ISO 3166-1 alpha-3) standard
   - Missing values indicate country/year pairs with no row or a '..' value in the World Bank file
   - All original data columns preserved

💾 Enhanced dataset ready for economic analysis!
   Variable: df_enhanced
   File: C:\Users\clint\Desktop\RER\Code\29.csv
   Shape: (3980, 13)

📋 Final Enhanced Dataset:


,Sending_Country,Receiving_Country,Year,Value,Unit,Source,Region,Sending_Country_Code,Receiving_Country_Code,Sending_Country_GDP,Receiving_Country_GDP,Sending_Country_GDP_Per_Capita,Receiving_Country_GDP_Per_Capita
0,Algeria,Senegal,2021,0.183414825,USD millions,BCEAO,Africa,DZA,SEN,199488.9,24359.596784,4456.746876,1414.539511
1,Australia,Ethiopia,2020,13.59617511,USD millions,National Bank of Ethiopia,Africa,AUS,ETH,1491063.0,95071.775812,58132.79852,799.475595
2,Australia,Kenya,2024,184497.099695719,USD millions,Central Bank of Kenya,Africa,AUS,KEN,1665258.0,104575.203136,61211.89676,1853.087855
3,Australia,Uganda,2022,22.0,USD millions,Bank of Uganda,Africa,AUS,UGA,1587133.0,44147.21689,61009.80771,933.094056
4,Austria,Kenya,2024,13169.065145833,USD millions,Central Bank of Kenya,Africa,AUT,KEN,418190.4,104575.203136,45562.04069,1853.087855
