# **WORLD ECONOMY DATASET**

**Project Selection**

---

## Description
The dataset is compiled from the National Accounts Main Aggregates Database, which presents a series of analytical national accounts tables from 1970 onwards for more than 200 countries and areas of the world. It is the product of a global cooperation effort between the Economic Statistics Branch of the United Nations Statistics Division, international statistical agencies, and the national statistical services of these countries, and is developed in accordance with the recommendation of the Statistical Commission at its first session in 1947.

## Why Use This Dataset?
This dataset is valuable for its scope: it covers more than 200 countries and areas from 1970 onwards, which makes it well suited to long-term economic analysis. It is developed through collaboration between the United Nations, international agencies, and national statistical services, which supports the reliability and consistency of the data across regions. Because it follows international recommendations, the data is standardized and can be used to compare economic performance across countries and time periods. Researchers, policymakers, and analysts can use it to study global economic trends, conduct comparative studies, and inform decisions in areas such as economic policy, development, and international finance.


**Code Section**

---

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np 
import matplotlib.pyplot as plt

ModuleNotFoundError: No module named 'seaborn'

In [7]:
url = "https://docs.google.com/spreadsheets/d/1PfnTNbKRkPTeK9PYoIzxpIqzNSFp4Bh0DhAzAYOyjIk/export?format=csv"

In [8]:
df_PaisesEco = pd.read_csv(url)
df_PaisesEco.columns

Index([' CountryID ', ' Country ', ' Year ', ' IMF based exchange rate ',
       ' Population ', ' Currency ', ' Per capita GNI ',
       ' Agriculture, hunting, forestry, fishing (ISIC A-B) ',
       ' Changes in inventories ', ' Construction (ISIC F) ',
       ' Exports of goods and services ', ' Final consumption expenditure ',
       ' General government final consumption expenditure ',
       ' Gross capital formation ',
       ' Gross fixed capital formation (including Acquisitions less disposals of valuables) ',
       ' Household consumption expenditure (including Non-profit institutions serving households) ',
       ' Imports of goods and services ', ' Manufacturing (ISIC D) ',
       ' Mining, Manufacturing, Utilities (ISIC C-E) ',
       ' Other Activities (ISIC J-P) ', ' Total Value Added ',
       ' Transport, storage and communication (ISIC I) ',
       ' Wholesale, retail trade, restaurants and hotels (ISIC G-H) ',
       ' Gross National Income(GNI) in USD ',
       ' G
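The column labels printed above appear to carry leading and trailing spaces (for example `' Country '`), so a lookup such as `df_PaisesEco['Country']` would likely raise a `KeyError`. A minimal sketch of one way to normalize the labels, shown on a small stand-in frame with invented values rather than on the real data:

```python
import pandas as pd

# Stand-in frame with padded labels like those printed above;
# the values are made up for illustration only.
sample = pd.DataFrame({" Country ": ["A", "B"], " Year ": [1970, 1971]})

# Strip surrounding whitespace from every column label.
sample.columns = sample.columns.str.strip()

print(list(sample.columns))
```

The same one-line assignment could be applied to `df_PaisesEco.columns` right after loading, assuming the padding is present in the source CSV.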

In [9]:
# Summarize the dataset: shape, data types, and descriptive statistics
def summarize_dataset(df):
    num_observations = df.shape[0]
    num_variables = df.shape[1]

    print(f"Number of observations: {num_observations}")
    print(f"Number of variables: {num_variables}")

    # Data Types
    print("\nData Types:")
    print(df.dtypes)

    # Descriptive statistics for the numeric columns
    print("\nSummary Statistics:")
    print(df.describe())

summarize_dataset(df_PaisesEco)

Number of observations: 10512
Number of variables: 25

Data Types:
CountryID                                                                                      int64
Country                                                                                       object
Year                                                                                           int64
IMF based exchange rate                                                                        int64
Population                                                                                     int64
Currency                                                                                      object
Per capita GNI                                                                                 int64
Agriculture, hunting, forestry, fishing (ISIC A-B)                                           float64
Changes in inventories                                                                       float64
Construction (ISIC F)   

In [None]:
#Variables to be worked on
cols = ()

Before analyzing the data, or appending or matching it with external sources, we clean it and handle missing values (NAs) appropriately.
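As a starting point for that cleaning step, the sketch below counts missing values per column and then drops rows that lack a key field. The toy frame, the `required` list, and the choice to drop rather than impute are all assumptions for illustration; they are not necessarily the procedure used to produce `clean_df`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; all values are invented.
toy = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Year": [1970, 1971, 1970],
    "Population": [100.0, np.nan, 250.0],
    "Changes in inventories": [np.nan, np.nan, 5.0],
})

# Count missing values per column to see where the gaps are.
na_counts = toy.isna().sum()
print(na_counts)

# Hypothetical rule: keep only rows where the key fields are present.
required = ["Country", "Year", "Population"]
toy_clean = toy.dropna(subset=required)
print(toy_clean)
```

Columns that are mostly empty, such as `Changes in inventories` in the early Afghanistan rows, may be better left as NaN than filled, since an imputed value could distort later aggregates.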

In [12]:
clean_df = pd.read_csv('https://docs.google.com/spreadsheets/d/1hE-XvGz0-AChRUazHiAu1ilheRvMEh9xBG3mCaWMXEo/export?format=csv#gid=186576549')

In [13]:
clean_df

Unnamed: 0,CountryID,Country,Year,AMA exchange rate,IMF based exchange rate,Population,Currency,Per capita GNI,"Agriculture, hunting, forestry, fishing (ISIC A-B)",Changes in inventories,...,Household consumption expenditure (including Non-profit institutions serving households),Imports of goods and services,Manufacturing (ISIC D),"Mining, Manufacturing, Utilities (ISIC C-E)",Other Activities (ISIC J-P),Total Value Added,"Transport, storage and communication (ISIC I)","Wholesale, retail trade, restaurants and hotels (ISIC G-H)",Gross National Income(GNI) in USD,Gross Domestic Product (GDP)
0,4,Afghanistan,1970,44.998.427,44.998.427,10752971,Afghani,164,8.699174e+08,,...,1.551094e+09,1.952772e+08,3.701468e+08,3.766908e+08,1.277478e+08,1.731454e+09,8.391720e+07,2.263871e+08,1.766528e+09,1.731436e+09
1,4,Afghanistan,1971,44.998.427,44.998.427,11015857,Afghani,168,9.108281e+08,,...,1.675426e+09,2.762965e+08,3.875495e+08,3.944012e+08,1.337541e+08,1.812857e+09,8.786038e+07,2.370192e+08,1.850122e+09,1.812838e+09
2,4,Afghanistan,1972,44.998.427,44.998.427,11286753,Afghani,149,8.279453e+08,,...,1.498812e+09,2.903704e+08,3.522847e+08,3.585129e+08,1.215827e+08,1.647918e+09,7.986452e+07,2.154773e+08,1.683948e+09,1.647900e+09
3,4,Afghanistan,1973,44.998.427,44.998.427,11575305,Afghani,150,8.554869e+08,,...,1.508024e+09,2.629629e+08,3.640103e+08,3.704458e+08,1.256302e+08,1.702735e+09,8.252888e+07,2.226243e+08,1.739998e+09,1.702716e+09
4,4,Afghanistan,1974,44.998.427,44.998.427,11869879,Afghani,177,1.035913e+09,,...,1.778819e+09,3.056792e+08,4.407604e+08,4.485528e+08,1.521192e+08,2.061752e+09,9.991860e+07,2.695259e+08,2.106420e+09,2.061729e+09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10507,894,Zambia,2017,951.950.142,951.950.142,17298054,Kwacha,1448,1.041005e+09,567729313.0,...,1.215309e+10,9.465895e+09,2.102369e+09,7.194153e+09,6.187124e+09,2.416662e+10,1.965026e+09,5.323590e+09,2.505604e+10,2.586816e+10
10508,894,Zambia,2018,1.045.814.322,1.045.814.322,17835893,Kwacha,1451,8.791160e+08,929063411.0,...,1.183200e+10,9.717269e+09,1.801783e+09,6.556884e+09,5.791375e+09,2.422570e+10,2.490720e+09,5.990720e+09,2.588455e+10,2.631198e+10
10509,894,Zambia,2019,1.288.941.789,1.288.941.789,18380477,Kwacha,1246,6.668385e+08,817104395.0,...,9.848112e+09,7.961415e+09,1.582484e+09,5.584425e+09,5.231062e+09,2.153971e+10,2.537646e+09,4.959277e+09,2.291011e+10,2.331036e+10
10510,894,Zambia,2020,1.834.409.265,1.834.409.265,18927715,Kwacha,928,5.391694e+08,439295787.0,...,6.969267e+09,5.892353e+09,1.397303e+09,4.269405e+09,4.217431e+09,1.718020e+10,2.258715e+09,3.234123e+09,1.756233e+10,1.811064e+10


In [24]:
# Select the 'Country' and 'Population' columns
historical_population_country = clean_df[['Country', 'Population']]

# Drop duplicate rows to keep each distinct (Country, Population) pair
historical_population_country_unique = historical_population_country.drop_duplicates()

# Show the distinct (Country, Population) pairs
print(historical_population_country_unique)

           Country  Population
0      Afghanistan    10752971
1      Afghanistan    11015857
2      Afghanistan    11286753
3      Afghanistan    11575305
4      Afghanistan    11869879
...            ...         ...
10507       Zambia    17298054
10508       Zambia    17835893
10509       Zambia    18380477
10510       Zambia    18927715
10511       Zambia    19473125

[10505 rows x 2 columns]
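
Because population changes from year to year, dropping duplicate (Country, Population) pairs leaves almost every row in place (10505 rows above). If the goal is a single population figure per country, one option is to keep each country's most recent year. A sketch on invented values, assuming the `Country`, `Year`, and `Population` columns seen in `clean_df`:

```python
import pandas as pd

# Toy data; the real frame would be clean_df.
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2019, 2020, 2019, 2020],
    "Population": [10, 11, 20, 22],
})

# Sort by year, then keep the last (most recent) row for each country.
latest_population = (
    toy.sort_values("Year")
       .drop_duplicates(subset="Country", keep="last")
       .loc[:, ["Country", "Year", "Population"]]
       .reset_index(drop=True)
)
print(latest_population)
```

Keeping `Year` in the result makes it clear which observation each population figure comes from.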
