## **Project: Study on Panel Data Methodologies with Application to Macroeconometrics (Inflation Forecasting)**

> ### **Title**: Merge of Dataset.


#### **Table of Contents:**
<ul>
<li><a href="#1">1. Merge of Countries Dataset.</a></li>

</ul>

**Import Libraries**

In [1]:
import os
import numpy as np
import pandas as pd
import seaborn as sns 
import glob
import re


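# Default figure size and font scale for any seaborn plots
sns.set(rc={'figure.figsize': [15, 5]}, font_scale=1.2)
# Opt in to the future pandas behavior that disables silent downcasting
pd.set_option('future.no_silent_downcasting', True)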


<a id='1'></a>

### **Merge of Dataset:**

#### 1.1.  Meta Data Tables:
- > A. Table for `Countries` info: {'WEO_Country_Code', 'ISO', 'Country'} P_Key> 'WEO_Country_Code'.
- > B. Table for `Subject` info: {'WEO_Subject_Code', 'Subject_Descriptor', 'Subject_Notes', 'Units', 'Scale'} P_Key>'WEO_Subject_Code'.

- > C. Table for `Country_Subject_Notes` info: {'WEO_Country_Code', 'WEO_Subject_Code','Country_Series-specific_Notes','Estimates_Start_After'} P_Key>('WEO_Country_Code','WEO_Subject_Code').
- > D. Table for `Year` info: {'Years'} P_Key> 'Years'.

#### 1.2. Dataset Tables:

- > E. Table for data values `WEO_Data_Countries`, with columns {'WEO_Country_Code', 'Years', and one column per 'WEO_Subject_Code' variable: {'BCA_NGDPD', 'GGR_NGDP', 'GGSB_NPGDP', 'GGX_NGDP', 'GGXWDG_NGDP', 'LUR', 'NGDP_RPCH', 'NGSD_NGDP', 'NID_NGDP', 'PCPIPCH', 'PPPEX', 'PPPPC', 'PPPSH', 'TM_RPCH', 'TRWMA', 'TX_RPCH'}}.

----------------------------------------------

#### 2.1.  Meta Data Tables:
- > A. Table for `Country_Groups` info: {'WEO_Country_Group_Code', 'Country_Group_ID', 'Country_Group_Name'} P_Key> 'WEO_Country_Group_Code'.
- > B. Table for `Groups_Subject_Notes` info: {'Weo_Country_Group_Code', 'WEO_Subject_Code', 'Country_Series-Specific_Notes', 'Estimates_Start_After'} P_Key> ('Weo_Country_Group_Code', 'Weo_Subject_Code').

- > C. Table for `Country_Classes` info: {'Weo_Country_Code', 'Country', 'Weo_Country_Group_Code', and one column per 'Country_Group_ID' variable: {'All_Advanced_41', 'All_Advanced_Euro_20', 'All_Advanced_G7', 'All_Developing_155', 'All_Developing_Asia_30', 'All_Developing_Europe_15', 'All_Developing_Latina_Caribbean_33', 'All_Developing_Meast_Casia_32', 'All_Developing_Ssafrica_45', 'All_Asean_5', 'All_Brics_20', 'All_Eur_27'}} P_Key> 'Weo_Country_Code'.

#### 2.2. Dataset Tables:

- > D. Table for data values `Country_Data_Group`, with columns {'WEO_Country_Group_Code', 'Years', and one column per 'WEO_Subject_Code' variable: {'BCA_NGDPD', 'GGR_NGDP', 'GGSB_NPGDP', 'GGX_NGDP', 'GGXWDG_NGDP', 'LUR', 'NGDP_RPCH', 'NGSD_NGDP', 'NID_NGDP', 'PCPIPCH', 'PPPEX', 'PPPPC', 'PPPSH', 'TM_RPCH', 'TRWMA', 'TX_RPCH'}}.
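The primary keys listed above can be checked mechanically before any merge. The sketch below is a minimal illustration, not part of the original notebook: the helper `is_primary_key` and the toy frame are hypothetical, and it assumes that a (country code, year) pair is meant to identify one row of the data table.

```python
import pandas as pd


def is_primary_key(df: pd.DataFrame, key_cols: list[str]) -> bool:
    """Return True if key_cols contain no missing values and no duplicate combinations."""
    has_nulls = df[key_cols].isna().any().any()
    has_duplicates = df.duplicated(subset=key_cols).any()
    return not (has_nulls or has_duplicates)


# Toy rows shaped like WEO_Data_Countries (values are illustrative only).
toy = pd.DataFrame({
    "WEO_Country_Code": [111, 111, 112],
    "Year": [1980, 1981, 1980],
    "PCPIPCH": [13.5, 10.4, 18.0],
})

# The composite key is unique; the country code alone repeats across years.
pk_composite = is_primary_key(toy, ["WEO_Country_Code", "Year"])
pk_country_only = is_primary_key(toy, ["WEO_Country_Code"])
print(pk_composite, pk_country_only)
```

Running the same check on each loaded table would flag a duplicated or missing key before it silently multiplies rows in a join.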


**Path**

In [2]:
# Use glob to get a list of all files with the .csv extension
folder_path = "../03-Dataset/02-DataBase"
files_list = glob.glob(folder_path + "/*.csv")
files_list

['../03-Dataset/02-DataBase\\01-Countries.csv',
 '../03-Dataset/02-DataBase\\02-Subject.csv',
 '../03-Dataset/02-DataBase\\03-Country_Subject_Notes.csv',
 '../03-Dataset/02-DataBase\\04-Years.csv',
 '../03-Dataset/02-DataBase\\05.1-WEO_Data_Countries.csv',
 '../03-Dataset/02-DataBase\\05.2-WEO_Data_Countries.csv',
 '../03-Dataset/02-DataBase\\06-Country_Groups.csv',
 '../03-Dataset/02-DataBase\\07-Group_Subject_Notes.csv',
 '../03-Dataset/02-DataBase\\08.1-Country_Classes.csv',
 '../03-Dataset/02-DataBase\\08.2-Country_Classes.csv',
 '../03-Dataset/02-DataBase\\08.3-Country_Classes.csv',
 '../03-Dataset/02-DataBase\\09.1-Country_Data_Group.csv',
 '../03-Dataset/02-DataBase\\09.2-Country_Data_Group.csv',
 '../03-Dataset/02-DataBase\\10-WEO_Data.csv',
 '../03-Dataset/02-DataBase\\10-WEO_Data_DB.csv']
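The listing above mixes `/` and `\\` separators, which is what string concatenation combined with `glob` tends to produce on Windows. A possible alternative, sketched here under the assumption that consistent, sorted paths are wanted, uses `pathlib`; the helper name `list_csv_files` and the temporary folder are hypothetical and stand in for the real data directory.

```python
import tempfile
from pathlib import Path


def list_csv_files(folder) -> list[str]:
    """Return the .csv files in `folder`, sorted, with forward-slash separators."""
    return sorted(p.as_posix() for p in Path(folder).glob("*.csv"))


# Demonstrate on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["02-Subject.csv", "01-Countries.csv", "notes.txt"]:
        (Path(tmp) / name).write_text("x\n")
    found = [Path(p).name for p in list_csv_files(tmp)]

print(found)
```

Sorting explicitly matters because `glob` does not guarantee any particular order across platforms.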

**Load Dataset**

In [3]:
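# Country-level WEO panel: one row per country and year
df_country = pd.read_csv(folder_path + "/05.2-WEO_Data_Countries.csv")
display(df_country.head())

# Country classifications in long format: one row per country and group
df_long_classes = pd.read_csv(folder_path + "/08.2-Country_Classes.csv")
display(df_long_classes.head())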


Unnamed: 0,WEO_Country_Code,Year,BCA_NGDPD,GGR_NGDP,GGSB_NPGDP,GGXWDG_NGDP,GGX_NGDP,LUR,NGDP_RPCH,NGSD_NGDP,NID_NGDP,PCPIPCH,PPPEX,PPPPC,PPPSH,TM_RPCH,TRWMA,TX_RPCH
0,111,1980,0.081,,,,,7.175,-0.257,22.059,23.31,13.502,1.0,12552.943,21.579,-6.664,,10.778
1,111,1981,0.157,,,,,7.617,2.537,23.206,24.277,10.378,1.0,13948.701,21.683,2.616,,1.213
2,111,1982,-0.165,,,,,9.708,-1.803,21.713,22.071,6.158,1.0,14404.994,21.185,-1.264,,-7.662
3,111,1983,-1.065,,,,,9.6,4.584,19.725,22.253,3.16,1.0,15513.679,21.607,12.609,,-2.589
4,111,1984,-2.337,,,,,7.508,7.236,21.839,25.096,4.368,1.0,17086.441,22.177,24.344,,8.15


Unnamed: 0,WEO_Country_Code,Country,WEO_Country_Group_Code,Country_Group_Id,Value
0,512,Afghanistan,110,all_advanced_41,0
1,914,Albania,110,all_advanced_41,0
2,612,Algeria,110,all_advanced_41,0
3,171,Andorra,110,all_advanced_41,1
4,614,Angola,110,all_advanced_41,0


In [4]:
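# info() prints its summary and returns None, so wrapping it in display()
# also renders a literal "None" after each summary.
display(df_country.info())
display(df_long_classes.info())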


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8775 entries, 0 to 8774
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   WEO_Country_Code  8775 non-null   int64  
 1   Year              8775 non-null   int64  
 2   BCA_NGDPD         7593 non-null   float64
 3   GGR_NGDP          6448 non-null   float64
 4   GGSB_NPGDP        2471 non-null   float64
 5   GGXWDG_NGDP       5726 non-null   float64
 6   GGX_NGDP          6395 non-null   float64
 7   LUR               4094 non-null   float64
 8   NGDP_RPCH         7916 non-null   float64
 9   NGSD_NGDP         6916 non-null   float64
 10  NID_NGDP          6923 non-null   float64
 11  PCPIPCH           7869 non-null   float64
 12  PPPEX             7829 non-null   float64
 13  PPPPC             7874 non-null   float64
 14  PPPSH             7808 non-null   float64
 15  TM_RPCH           6868 non-null   float64
 16  TRWMA             3643 non-null   float64


None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2340 entries, 0 to 2339
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   WEO_Country_Code        2340 non-null   int64 
 1   Country                 2340 non-null   object
 2   WEO_Country_Group_Code  2340 non-null   int64 
 3   Country_Group_Id        2340 non-null   object
 4   Value                   2340 non-null   int64 
dtypes: int64(3), object(2)
memory usage: 91.5+ KB


None
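The non-null counts reported by `info()` can also be expressed as missing-value shares, which are easier to compare across variables. The following is a minimal sketch on a toy frame; the values are illustrative and the name `missing_share` is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy panel with gaps (illustrative values only).
toy = pd.DataFrame({
    "WEO_Country_Code": [111, 111, 112, 112],
    "Year": [1980, 1981, 1980, 1981],
    "LUR": [7.175, np.nan, np.nan, np.nan],
    "PCPIPCH": [13.502, 10.378, np.nan, 6.0],
})

# Fraction of missing entries per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to `df_country`, the same expression would rank the variables by how sparse they are.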

In [5]:
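# Keep only the rows for the "all_advanced_41" group (one row per country)
df_adv = df_long_classes[df_long_classes["Country_Group_Id"] == "all_advanced_41"]
# The 0/1 membership flag becomes the Advanced_Country indicator
df_adv = df_adv.rename(columns={"Value": "Advanced_Country"})
df_adv.info()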

<class 'pandas.core.frame.DataFrame'>
Index: 195 entries, 0 to 194
Data columns (total 5 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   WEO_Country_Code        195 non-null    int64 
 1   Country                 195 non-null    object
 2   WEO_Country_Group_Code  195 non-null    int64 
 3   Country_Group_Id        195 non-null    object
 4   Advanced_Country        195 non-null    int64 
dtypes: int64(3), object(2)
memory usage: 9.1+ KB


In [6]:
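# Attach the country name and advanced-economy flag to every country-year row.
# how='right' keeps all rows of df_country, even if a code has no match in df_adv.
df_end = pd.merge(
    df_adv[["WEO_Country_Code", "Country", "Advanced_Country"]],
    df_country,
    on='WEO_Country_Code',
    how='right'
)

df_end.head()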

Unnamed: 0,WEO_Country_Code,Country,Advanced_Country,Year,BCA_NGDPD,GGR_NGDP,GGSB_NPGDP,GGXWDG_NGDP,GGX_NGDP,LUR,NGDP_RPCH,NGSD_NGDP,NID_NGDP,PCPIPCH,PPPEX,PPPPC,PPPSH,TM_RPCH,TRWMA,TX_RPCH
0,111,United States,1,1980,0.081,,,,,7.175,-0.257,22.059,23.31,13.502,1.0,12552.943,21.579,-6.664,,10.778
1,111,United States,1,1981,0.157,,,,,7.617,2.537,23.206,24.277,10.378,1.0,13948.701,21.683,2.616,,1.213
2,111,United States,1,1982,-0.165,,,,,9.708,-1.803,21.713,22.071,6.158,1.0,14404.994,21.185,-1.264,,-7.662
3,111,United States,1,1983,-1.065,,,,,9.6,4.584,19.725,22.253,3.16,1.0,15513.679,21.607,12.609,,-2.589
4,111,United States,1,1984,-2.337,,,,,7.508,7.236,21.839,25.096,4.368,1.0,17086.441,22.177,24.344,,8.15
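The right join above gives the intended result only if the flag table has exactly one row per `WEO_Country_Code` and every code in the panel has a match. A defensive variant, sketched here on toy frames, uses the `validate` and `indicator` parameters of `pd.merge`; the frames `flags` and `panel` and the code `999` are hypothetical.

```python
import pandas as pd

# One row per country: name and advanced-economy flag.
flags = pd.DataFrame({
    "WEO_Country_Code": [111, 512],
    "Country": ["United States", "Afghanistan"],
    "Advanced_Country": [1, 0],
})

# Country-year panel; code 999 is a made-up code with no match in `flags`.
panel = pd.DataFrame({
    "WEO_Country_Code": [111, 111, 512, 999],
    "Year": [1980, 1981, 1980, 1980],
    "PCPIPCH": [13.502, 10.378, None, None],
})

merged = pd.merge(
    flags,
    panel,
    on="WEO_Country_Code",
    how="right",
    validate="one_to_many",  # raises MergeError if a code repeats in `flags`
    indicator=True,          # adds a `_merge` column recording each row's origin
)

# Panel rows whose country code was not found in the flag table.
unmatched = merged.loc[merged["_merge"] == "right_only", "WEO_Country_Code"].tolist()
print(len(merged), unmatched)
```

An empty `unmatched` list together with a row count equal to the panel's confirms that the merge neither dropped nor duplicated observations.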


In [7]:
df_end.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8775 entries, 0 to 8774
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   WEO_Country_Code  8775 non-null   int64  
 1   Country           8775 non-null   object 
 2   Advanced_Country  8775 non-null   int64  
 3   Year              8775 non-null   int64  
 4   BCA_NGDPD         7593 non-null   float64
 5   GGR_NGDP          6448 non-null   float64
 6   GGSB_NPGDP        2471 non-null   float64
 7   GGXWDG_NGDP       5726 non-null   float64
 8   GGX_NGDP          6395 non-null   float64
 9   LUR               4094 non-null   float64
 10  NGDP_RPCH         7916 non-null   float64
 11  NGSD_NGDP         6916 non-null   float64
 12  NID_NGDP          6923 non-null   float64
 13  PCPIPCH           7869 non-null   float64
 14  PPPEX             7829 non-null   float64
 15  PPPPC             7874 non-null   float64
 16  PPPSH             7808 non-null   float64


In [8]:
df_end.describe()

Unnamed: 0,WEO_Country_Code,Advanced_Country,Year,BCA_NGDPD,GGR_NGDP,GGSB_NPGDP,GGXWDG_NGDP,GGX_NGDP,LUR,NGDP_RPCH,NGSD_NGDP,NID_NGDP,PCPIPCH,PPPEX,PPPPC,PPPSH,TM_RPCH,TRWMA,TX_RPCH
count,8775.0,8775.0,8775.0,7593.0,6448.0,2471.0,5726.0,6395.0,4094.0,7916.0,6916.0,6923.0,7869.0,7829.0,7874.0,7808.0,6868.0,3643.0,6912.0
mean,551.969231,0.205128,2002.0,-2.106005,28.688837,-2.556829,55.729301,31.464771,8.970407,3.349049,20.31754,24.005978,41.779044,200.68549,14750.062408,0.574731,5.516364,7.112105,5.67013
std,262.036401,0.403818,12.987913,13.404964,15.510034,4.453511,44.625304,19.371367,6.095195,6.198782,12.132634,10.557857,865.305767,2111.677586,18540.758254,1.876059,17.409113,10.098748,20.49601
min,111.0,0.0,1980.0,-242.188,0.036,-25.115,0.0,1.822,0.025,-54.336,-98.14,-40.199,-72.729,0.001,175.003,0.001,-82.872,0.0,-90.597
25%,313.0,0.0,1991.0,-6.546,17.8935,-4.412,28.9285,20.1775,4.9425,1.1665,13.36075,18.071,2.036,0.812,2666.70625,0.019,-1.9325,2.24,-1.50525
50%,566.0,0.0,2002.0,-2.514,25.39,-2.342,46.391,28.495,7.483,3.5235,20.079,22.865,4.632,2.649,7550.7715,0.071,4.9685,5.2,4.4945
75%,734.0,0.0,2013.0,1.496,37.1765,-0.409,69.84875,39.287,11.217,5.891,26.5255,28.185,10.094,46.154,19416.4515,0.383,11.8605,10.05,11.00025
max,968.0,1.0,2024.0,314.906,164.054,125.135,600.117,594.77,70.0,147.973,120.028,144.45,65374.082,116640.525,151145.794,22.343,431.978,421.5,649.151


In [9]:
# Save the merged panel; index=False keeps the row index out of the CSV
df_end.to_csv(folder_path + "/10-WEO_Data.csv", index=False)
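A quick way to confirm that the export is lossless is to read the file back and compare it with the frame in memory. The sketch below does this on a toy frame written to a temporary folder; the file name `example.csv` is hypothetical and the values are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

frame = pd.DataFrame({
    "WEO_Country_Code": [111, 111],
    "Country": ["United States", "United States"],
    "Advanced_Country": [1, 1],
    "Year": [1980, 1981],
    "PCPIPCH": [13.502, 10.378],
})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "example.csv"  # hypothetical file name
    frame.to_csv(out_path, index=False)   # no index column is written
    reloaded = pd.read_csv(out_path)

# Raises AssertionError if columns, dtypes, or values differ.
pd.testing.assert_frame_equal(reloaded, frame)
columns_match = list(reloaded.columns) == list(frame.columns)
print(columns_match)
```

Writing with `index=False` is what prevents an extra `Unnamed: 0` column from appearing when the file is loaded again.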


# **END**