## **Project: Study On Panel Data Methodologies With Application To Macroeconometrics (Inflation Forecasting)**.

> ### **Title**: Merging the Datasets.


#### **Table of Contents:**
<ul>
<li><a href="#1">1. Merge of Countries Dataset.</a></li>

</ul>

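**Import Libraries**

```python
import os
import glob
import re

import numpy as np
import pandas as pd
import seaborn as sns

# Default figure size and font scale for all seaborn/matplotlib plots
sns.set(rc={'figure.figsize': [15, 5]}, font_scale=1.2)
# Opt in to pandas' future behaviour: no silent dtype downcasting (e.g. in replace/fillna/where)
pd.set_option('future.no_silent_downcasting', True)
```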


<a id='1'></a>

### **Merging the Datasets:**

#### 1.1. Country-Level Metadata Tables:
- > A. `Countries` table: {'WEO_Country_Code', 'ISO', 'Country'}; primary key: 'WEO_Country_Code'.
- > B. `Subject` table: {'WEO_Subject_Code', 'Subject_Descriptor', 'Subject_Notes', 'Units', 'Scale'}; primary key: 'WEO_Subject_Code'.
- > C. `Country_Subject_Notes` table: {'WEO_Country_Code', 'WEO_Subject_Code', 'Country_Series-specific_Notes', 'Estimates_Start_After'}; composite primary key: ('WEO_Country_Code', 'WEO_Subject_Code').
- > D. `Year` table: {'Years'}; primary key: 'Years'.

#### 1.2. Country-Level Data Tables:

- > E. `WEO_Data_Countries` data-values table, with columns {'WEO_Country_Code', 'Years', plus one column per variable ('WEO_Subject_Code'): {'BCA_NGDPD', 'GGR_NGDP', 'GGSB_NPGDP', 'GGX_NGDP', 'GGXWDG_NGDP', 'LUR', 'NGDP_RPCH', 'NGSD_NGDP', 'NID_NGDP', 'PCPIPCH', 'PPPEX', 'PPPPC', 'PPPSH', 'TM_RPCH', 'TRWMA', 'TX_RPCH'}}.
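The metadata tables are linked to the data table through their primary keys. A minimal sketch of one such join, assuming the tables are loaded as DataFrames named `countries` and `weo_data_countries` (hypothetical names) and using tiny illustrative rows (code 555 is a made-up unmatched code): `validate="many_to_one"` makes pandas raise a `MergeError` if `WEO_Country_Code` is not unique in the metadata table, and `indicator=True` exposes data rows whose code has no metadata entry.

```python
import pandas as pd

# Toy stand-ins for the `Countries` and `WEO_Data_Countries` tables (illustrative values)
countries = pd.DataFrame({
    "WEO_Country_Code": [111, 999],
    "ISO": ["USA", "XXX"],
    "Country": ["United States", "Hypothetical Country"],
})
weo_data_countries = pd.DataFrame({
    "WEO_Country_Code": [111, 111, 555],  # 555: hypothetical code with no metadata row
    "Years": [1980, 1981, 1980],
    "PCPIPCH": [13.502, 10.378, None],
})

# Many data rows per country -> one metadata row; raises MergeError if the key is duplicated
merged = weo_data_countries.merge(
    countries,
    on="WEO_Country_Code",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Data rows whose country code was not found in the metadata table
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["WEO_Country_Code", "Years"]])
```

A left join keeps every data row, so the row count cannot grow or shrink silently; checking `unmatched` before dropping `_merge` is a cheap integrity test.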

----------------------------------------------

#### 2.1. Country-Group Metadata Tables:
- > A. `Country_Groups` table: {'WEO_Country_Group_Code', 'Country_Group_ID', 'Country_Group_Name'}; primary key: 'WEO_Country_Group_Code'.
- > B. `Groups_Subject_Notes` table: {'Weo_Country_Group_Code', 'WEO_Subject_Code', 'Country_Series-Specific_Notes', 'Estimates_Start_After'}; composite primary key: ('Weo_Country_Group_Code', 'WEO_Subject_Code').
- > C. `Country_Classes` table: {'Weo_Country_Code', 'Country', 'Weo_Country_Group_Code', plus one column per group ('Country_Group_ID'): {'All_Advanced_41', 'All_Advanced_Euro_20', 'All_Advanced_G7', 'All_Developing_155', 'All_Developing_Asia_30', 'All_Developing_Europe_15', 'All_Developing_Latina_Caribbean_33', 'All_Developing_Meast_Casia_32', 'All_Developing_Ssafrica_45', 'All_Asean_5', 'All_Brics_20', 'All_Eur_27'}}; primary key: 'Weo_Country_Code'.

#### 2.2. Country-Group Data Tables:

- > D. `Country_Data_Group` data-values table, with columns {'WEO_Country_Group_Code', 'Years', plus one column per variable ('WEO_Subject_Code'): {'BCA_NGDPD', 'GGR_NGDP', 'GGSB_NPGDP', 'GGX_NGDP', 'GGXWDG_NGDP', 'LUR', 'NGDP_RPCH', 'NGSD_NGDP', 'NID_NGDP', 'PCPIPCH', 'PPPEX', 'PPPPC', 'PPPSH', 'TM_RPCH', 'TRWMA', 'TX_RPCH'}}.
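To join country-level rows with group-level data, the wide group columns of `Country_Classes` can be turned into a long list of (country, group) pairs. The sketch below ASSUMES each group column holds a 0/1 membership flag; the source does not state the encoding, and the rows shown are illustrative only.

```python
import pandas as pd

# Hypothetical excerpt of Country_Classes; ASSUMES group columns are 0/1 membership flags
country_classes = pd.DataFrame({
    "Weo_Country_Code": [111, 999],
    "Country": ["United States", "Hypothetical Country"],
    "All_Advanced_G7": [1, 0],
    "All_Brics_20": [0, 1],
})
group_cols = ["All_Advanced_G7", "All_Brics_20"]

# Wide 0/1 flags -> long (country, group) membership pairs
membership = (
    country_classes.melt(
        id_vars=["Weo_Country_Code", "Country"],
        value_vars=group_cols,
        var_name="Country_Group_ID",
        value_name="member",
    )
    .query("member == 1")       # keep only actual memberships
    .drop(columns="member")
    .reset_index(drop=True)
)
print(membership)
```

The long `membership` table can then be merged with `Country_Groups` on `Country_Group_ID`, which is easier to validate than a dozen flag columns.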


**Path**

```python
# Use glob to list every file with the .csv extension in the database folder
folder_path = "../03-Dataset/02-DataBase"
files_list = glob.glob(folder_path + "/*.csv")
files_list
```

```text
['../03-Dataset/02-DataBase\\01-Countries.csv',
 '../03-Dataset/02-DataBase\\02-Subject.csv',
 '../03-Dataset/02-DataBase\\03-Country_Subject_Notes.csv',
 '../03-Dataset/02-DataBase\\04-Years.csv',
 '../03-Dataset/02-DataBase\\05.1-WEO_Data_Countries.csv',
 '../03-Dataset/02-DataBase\\05.2-WEO_Data_Countries.csv',
 '../03-Dataset/02-DataBase\\06-Country_Groups.csv',
 '../03-Dataset/02-DataBase\\07-Group_Subject_Notes.csv',
 '../03-Dataset/02-DataBase\\08.1-Country_Classes.csv',
 '../03-Dataset/02-DataBase\\08.2-Country_Classes.csv',
 '../03-Dataset/02-DataBase\\08.3-Country_Classes.csv',
 '../03-Dataset/02-DataBase\\09.1-Country_Data_Group.csv',
 '../03-Dataset/02-DataBase\\09.2-Country_Data_Group.csv',
 '../03-Dataset/02-DataBase\\10-WEO_Data.csv',
 '../03-Dataset/02-DataBase\\10-WEO_Data_DB.csv']
```
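Several tables are split across numbered files (05.1/05.2, 08.1 to 08.3, 09.1/09.2). A minimal sketch for loading the folder into one DataFrame per table, ASSUMING the numbered parts are row-wise splits that share the same columns (if they are column-wise splits, a `merge` on the key would be needed instead). `load_tables` is a hypothetical helper, and the demo uses throwaway files rather than the real data.

```python
import glob
import os
import re
import tempfile
from collections import defaultdict

import pandas as pd

# "05.1-WEO_Data_Countries.csv" -> table "WEO_Data_Countries"; "01-Countries.csv" -> "Countries"
PART_RE = re.compile(r"^\d+(?:\.\d+)?-(?P<table>.+)\.csv$")


def load_tables(folder):
    """Read every CSV in `folder`, stacking numbered parts of the same table row-wise.

    ASSUMPTION: parts such as 05.1/05.2 are row-wise splits with identical columns.
    """
    parts = defaultdict(list)
    # sorted() keeps part 05.1 before 05.2, 08.1 before 08.2, and so on
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        match = PART_RE.match(os.path.basename(path))
        if match:
            parts[match.group("table")].append(pd.read_csv(path))
    return {name: pd.concat(frames, ignore_index=True) for name, frames in parts.items()}


# Demo on throwaway files (toy columns, not the real schema)
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1]}).to_csv(os.path.join(tmp, "05.1-WEO_Data_Countries.csv"), index=False)
    pd.DataFrame({"a": [2]}).to_csv(os.path.join(tmp, "05.2-WEO_Data_Countries.csv"), index=False)
    pd.DataFrame({"b": [3]}).to_csv(os.path.join(tmp, "01-Countries.csv"), index=False)
    tables = load_tables(tmp)

print(tables["WEO_Data_Countries"])
```

On the real folder this would be called as `load_tables(folder_path)`; `10-WEO_Data.csv` and `10-WEO_Data_DB.csv` stay separate because their table names differ. Writing with `index=False` also avoids the stray `Unnamed: 0` column that a saved index produces on re-reading.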

**Load Dataset**

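```python
df = pd.read_csv(os.path.join(folder_path, "10-WEO_Data.csv"))
display(df.head())
```

|   | WEO_Country_Code | Country | Advanced_Country | Year | BCA_NGDPD | GGR_NGDP | GGSB_NPGDP | GGXWDG_NGDP | GGX_NGDP | LUR | NGDP_RPCH | NGSD_NGDP | NID_NGDP | PCPIPCH | PPPEX | PPPPC | PPPSH | TM_RPCH | TRWMA | TX_RPCH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 111 | United States | 1 | 1980 | 0.081 | NaN | NaN | NaN | NaN | 7.175 | -0.257 | 22.059 | 23.31 | 13.502 | 1.0 | 12552.943 | 21.579 | -6.664 | NaN | 10.778 |
| 1 | 111 | United States | 1 | 1981 | 0.157 | NaN | NaN | NaN | NaN | 7.617 | 2.537 | 23.206 | 24.277 | 10.378 | 1.0 | 13948.701 | 21.683 | 2.616 | NaN | 1.213 |
| 2 | 111 | United States | 1 | 1982 | -0.165 | NaN | NaN | NaN | NaN | 9.708 | -1.803 | 21.713 | 22.071 | 6.158 | 1.0 | 14404.994 | 21.185 | -1.264 | NaN | -7.662 |
| 3 | 111 | United States | 1 | 1983 | -1.065 | NaN | NaN | NaN | NaN | 9.6 | 4.584 | 19.725 | 22.253 | 3.16 | 1.0 | 15513.679 | 21.607 | 12.609 | NaN | -2.589 |
| 4 | 111 | United States | 1 | 1984 | -2.337 | NaN | NaN | NaN | NaN | 7.508 | 7.236 | 21.839 | 25.096 | 4.368 | 1.0 | 17086.441 | 22.177 | 24.344 | NaN | 8.15 |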

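Since the data are a long-format panel (one row per country and year), it helps to index them by (country, year) and check that the panel is unique and balanced before any panel estimation. A minimal sketch; `panel_summary` is a hypothetical helper and the toy rows are illustrative.

```python
import pandas as pd


def panel_summary(frame, entity="WEO_Country_Code", time="Year"):
    """Index a long-format frame as an (entity, time) panel and report its balance."""
    # Each (entity, time) pair must appear once, otherwise the panel index is ambiguous
    if frame.duplicated([entity, time]).any():
        raise ValueError("duplicate (entity, time) pairs found")
    panel = frame.set_index([entity, time]).sort_index()
    # Number of time periods observed per entity
    periods = panel.groupby(level=entity).size()
    # Balanced panel: every entity has the same number of periods
    balanced = periods.nunique() == 1
    return panel, periods, balanced


toy = pd.DataFrame({
    "WEO_Country_Code": [111, 111, 999],
    "Year": [1980, 1981, 1980],
    "PCPIPCH": [13.502, 10.378, 5.0],
})
panel, periods, balanced = panel_summary(toy)
print(periods, balanced)
```

Applied to the loaded data as `panel_summary(df)`, this shows whether every country spans the same 1980 to 2024 window; note that it counts rows, not non-missing values.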

```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8820 entries, 0 to 8819
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   WEO_Country_Code  8820 non-null   int64  
 1   Country           8820 non-null   object 
 2   Advanced_Country  8820 non-null   int64  
 3   Year              8820 non-null   int64  
 4   BCA_NGDPD         7638 non-null   float64
 5   GGR_NGDP          6473 non-null   float64
 6   GGSB_NPGDP        2496 non-null   float64
 7   GGXWDG_NGDP       5751 non-null   float64
 8   GGX_NGDP          6420 non-null   float64
 9   LUR               4139 non-null   float64
 10  NGDP_RPCH         7961 non-null   float64
 11  NGSD_NGDP         6961 non-null   float64
 12  NID_NGDP          6968 non-null   float64
 13  PCPIPCH           7914 non-null   float64
 14  PPPEX             7874 non-null   float64
 15  PPPPC             7919 non-null   float64
 16  PPPSH             7853 non-null   float64
```
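The non-null counts above vary widely between variables. A small sketch that ranks columns by their share of missing values over a chosen window (2000 onward here, matching the filter used later) makes the gaps explicit; `missing_share` is a hypothetical helper and the toy frame is illustrative.

```python
import numpy as np
import pandas as pd


def missing_share(frame, start_year=2000, year_col="Year"):
    """Share of missing values per column over years >= start_year, largest first."""
    recent = frame[frame[year_col] >= start_year]
    # isna() gives booleans; their column mean is the fraction missing
    return recent.isna().mean().sort_values(ascending=False)


toy = pd.DataFrame({
    "Year": [1999, 2000, 2001, 2002],
    "PCPIPCH": [1.0, 2.0, np.nan, 3.0],
    "GGSB_NPGDP": [np.nan, np.nan, np.nan, 1.0],
})
shares = missing_share(toy)
print(shares)
```

On the full data, `missing_share(df)` would show which variables must be dropped or imputed before requiring complete records.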


```python
df.describe()
```

|   | WEO_Country_Code | Advanced_Country | Year | BCA_NGDPD | GGR_NGDP | GGSB_NPGDP | GGXWDG_NGDP | GGX_NGDP | LUR | NGDP_RPCH | NGSD_NGDP | NID_NGDP | PCPIPCH | PPPEX | PPPPC | PPPSH | TM_RPCH | TRWMA | TX_RPCH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 8820.0 | 8820.0 | 8820.0 | 7638.0 | 6473.0 | 2496.0 | 5751.0 | 6420.0 | 4139.0 | 7961.0 | 6961.0 | 6968.0 | 7914.0 | 7874.0 | 7919.0 | 7853.0 | 6913.0 | 3664.0 | 6957.0 |
| mean | 551.377551 | 0.209184 | 2002.0 | -2.091833 | 28.722145 | -2.569058 | 55.793672 | 31.501981 | 8.955438 | 3.353171 | 20.333998 | 24.006941 | 41.748441 | 199.555243 | 14810.694572 | 0.572925 | 5.513936 | 7.086408 | 5.667853 |
| std | 261.497587 | 0.406749 | 12.987909 | 13.369427 | 15.490119 | 4.438902 | 44.543057 | 19.343728 | 6.070979 | 6.184105 | 12.102355 | 10.525728 | 862.865903 | 2105.686824 | 18530.743423 | 1.870827 | 17.363133 | 10.075807 | 20.437718 |
| min | 111.0 | 0.0 | 1980.0 | -242.188 | 0.036 | -25.115 | 0.0 | 1.822 | 0.025 | -54.336 | -98.14 | -40.199 | -72.729 | 0.001 | 175.003 | 0.001 | -82.872 | 0.0 | -90.597 |
| 25% | 313.75 | 0.0 | 1991.0 | -6.519 | 17.906 | -4.41575 | 29.0785 | 20.20275 | 4.9445 | 1.176 | 13.374 | 18.1125 | 2.03275 | 0.81325 | 2679.1405 | 0.019 | -1.917 | 2.24 | -1.49 |
| 50% | 565.5 | 0.0 | 2002.0 | -2.491 | 25.466 | -2.3665 | 46.585 | 28.547 | 7.475 | 3.535 | 20.125 | 22.899 | 4.631 | 2.6855 | 7620.835 | 0.072 | 4.964 | 5.155 | 4.526 |
| 75% | 733.25 | 0.0 | 2013.0 | 1.5415 | 37.203 | -0.435 | 69.9975 | 39.34825 | 11.2 | 5.886 | 26.537 | 28.158 | 10.10775 | 44.58625 | 19590.4905 | 0.378 | 11.838 | 10.04 | 10.971 |
| max | 968.0 | 1.0 | 2024.0 | 314.906 | 164.054 | 125.135 | 600.117 | 594.77 | 70.0 | 147.973 | 120.028 | 144.45 | 65374.082 | 116640.525 | 151145.794 | 22.343 | 431.978 | 421.5 | 649.151 |


```python
# `df` is the DataFrame loaded from 10-WEO_Data.csv above

# Keep observations from 2000 onward
df_2000_plus = df[df["Year"] >= 2000]

# Countries with no missing value in any column for every year from 2000 onward
countries_complete = (
    df_2000_plus.groupby("WEO_Country_Code")
    .filter(lambda x: x.notnull().all(axis=None))["WEO_Country_Code"]
    .unique()
)

# Keep all rows (every year, including pre-2000) for those countries only;
# adjust the completeness criterion as needed
df_cleaned = df[df["WEO_Country_Code"].isin(countries_complete)].copy()
```


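```python
df_cleaned
```

|   | WEO_Country_Code | Country | Advanced_Country | Year | BCA_NGDPD | GGR_NGDP | GGSB_NPGDP | GGXWDG_NGDP | GGX_NGDP | LUR | NGDP_RPCH | NGSD_NGDP | NID_NGDP | PCPIPCH | PPPEX | PPPPC | PPPSH | TM_RPCH | TRWMA | TX_RPCH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

`df_cleaned` is empty: no country has a value in every column for every year from 2000 onward. The sparsest columns, `GGSB_NPGDP` (2,496 non-null of 8,820 rows) and `TRWMA` (3,664 non-null), are the most likely reason.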
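A less strict alternative is to require completeness only on the variables the forecasting model will actually use. A minimal sketch; `complete_countries` is a hypothetical helper, and the column choice below (only the inflation target `PCPIPCH`) is an illustrative assumption, not the project's selection.

```python
import numpy as np
import pandas as pd


def complete_countries(frame, columns, start_year=2000,
                       entity="WEO_Country_Code", year_col="Year"):
    """Entity codes with no missing value in `columns` for any year >= start_year."""
    recent = frame[frame[year_col] >= start_year]
    # True per row if all selected columns are present, then True per entity if all rows are
    ok = recent[columns].notna().all(axis=1).groupby(recent[entity]).all()
    return ok[ok].index.tolist()


toy = pd.DataFrame({
    "WEO_Country_Code": [111, 111, 999, 999],
    "Year": [2000, 2001, 2000, 2001],
    "PCPIPCH": [1.0, 2.0, 5.0, 4.0],
    "GGSB_NPGDP": [np.nan, np.nan, -1.0, -1.2],
})
# Hypothetical choice: require only the inflation target to be complete
keep = complete_countries(toy, ["PCPIPCH"])
print(keep)
```

On the real data, something like `df[df["WEO_Country_Code"].isin(complete_countries(df, cols))].copy()` with a chosen list `cols` replaces the all-columns filter; adding a sparse column such as `GGSB_NPGDP` to `cols` shrinks the kept set sharply.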


# **END**