## STEP 1: Data Loading
The first step of this project is to load the data on which the project is based. Combining this Jupyter notebook with the **data.py** script, we will be able to:

        1. Import data from files into independent dataframes.
        2. Reshape the dataframes into a data structure appropriate for ML methods.
        3. Combine the features that could be used as predictors, by country and year, into a new dataframe.

In [3]:
import pandas as pd
import numpy as np
import data 

In [2]:
pop_den = data.file2df(file_ = "population_density.csv")
urb_gro = data.file2df(file_ = "urban_growth.csv")
pop = data.file2df(file_ = "population.csv")
pop_gro = data.file2df(file_ = "population_growth.csv")
life_exp = data.file2df(file_ = "life_expectancy.csv")
co2_emi = data.file2df(file_ = "co2_emissions.csv")

target = data.file2df(file_ = "energy_person_ratio.csv")

In [3]:
pop_den.head() #Raw structure of dataframes, directly imported from GapMinder files.

Unnamed: 0,country,1950,1951,1952,1953,1954,1955,1956,1957,1958,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
0,Afghanistan,11.9,12.0,12.2,12.3,12.5,12.7,12.9,13.1,13.3,...,117.0,117.0,117.0,117.0,116.0,116.0,116.0,115.0,115.0,115.0
1,Albania,46.1,47.0,48.0,49.2,50.5,51.8,53.3,54.8,56.3,...,48.5,47.5,46.5,45.5,44.5,43.5,42.5,41.6,40.6,39.7
2,Algeria,3.73,3.79,3.86,3.93,4.01,4.1,4.2,4.31,4.41,...,29.6,29.6,29.6,29.6,29.7,29.7,29.7,29.7,29.7,29.7
3,Andorra,13.2,14.2,15.4,16.7,18.2,19.6,21.2,22.9,24.7,...,134.0,134.0,134.0,134.0,133.0,133.0,133.0,133.0,133.0,133.0
4,Angola,3.65,3.7,3.78,3.87,3.96,4.05,4.12,4.19,4.26,...,135.0,136.0,138.0,140.0,142.0,144.0,146.0,147.0,149.0,151.0


In [6]:
pop_den_melt = data.melt_df(pop_den, "population density")
urb_gro_melt = data.melt_df(urb_gro, "urban growth")
pop_melt = data.melt_df(pop, "population")
pop_gro_melt = data.melt_df(pop_gro, "population_growth")
co2_emi_melt = data.melt_df(co2_emi, "co2_emissions")
life_exp_melt = data.melt_df(life_exp, "life_expectancy")

target_melt = data.melt_df(target, "target")
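
The reshaping step can be illustrated with a minimal sketch. The source of `data.melt_df` is not shown in this notebook, so the function `melt_wide` below is a hypothetical stand-in that assumes the reshape is a plain wide-to-long melt with `country` as the identifier column; the toy values are the first entries of `pop_den.head()`.

```python
import pandas as pd


def melt_wide(df: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Hypothetical stand-in for data.melt_df (assumption, not the real code).

    Turns a wide frame (one column per year) into a long frame with one
    row per (country, year) pair.
    """
    long_df = df.melt(id_vars=["country"], var_name="year", value_name=value_name)
    # Drop pairs with no measurement so later merges only see real values.
    return long_df.dropna(subset=[value_name]).reset_index(drop=True)


# Toy input: two countries and two years taken from pop_den.head().
wide = pd.DataFrame(
    {
        "country": ["Afghanistan", "Albania"],
        "1950": [11.9, 46.1],
        "1951": [12.0, 47.0],
    }
)

long_df = melt_wide(wide, "population density")
print(long_df)
```

Note that the year labels stay as strings after the melt, which would be consistent with `year` being reported as `object` by `merged_data.info()`.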

In [7]:
feature_dict = {"pop_den_melt":pop_den_melt, "urb_gro_melt": urb_gro_melt, "pop_melt": pop_melt, "pop_gro_melt": pop_gro_melt, "co2_emi_melt": co2_emi_melt, "life_exp_melt": life_exp_melt}
merged_data = data.merge_all(feature_dict = feature_dict, keys = ["country", "year"])
merged_data.head() # Final structure of the data that will be used to predict the target.

Unnamed: 0,country,year,population density,urban growth,population,population_growth,co2_emissions,life_expectancy
1950,Afghanistan,1960,13.8,0.0516,9000000.0,0.0183,0.046,39.3
1951,Albania,1960,59.7,0.0539,1640000.0,0.0302,1.24,62.2
1952,Algeria,1960,4.64,0.0553,11100000.0,0.0252,0.557,52.5
1954,Angola,1960,4.38,0.0453,5450000.0,0.0137,0.101,40.6
1955,Antigua and Barbuda,1960,123.0,0.0338,54100.0,0.0169,0.677,63.3
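
In the output above the index jumps from 1952 to 1954, which suggests that rows lacking a value for some feature are left out of the merged frame. One way to get that behaviour is an inner join on the key columns. The sketch below is an assumption about how `data.merge_all` might work, not its actual code; `merge_on_keys` is a hypothetical name, and the toy values come from the table above.

```python
from functools import reduce

import pandas as pd


def merge_on_keys(frames: dict, keys: list) -> pd.DataFrame:
    """Hypothetical stand-in for data.merge_all (assumption).

    Inner-joins every frame in `frames` on `keys`, so only the
    (country, year) pairs present in all frames survive.
    """
    return reduce(
        lambda left, right: pd.merge(left, right, on=keys, how="inner"),
        frames.values(),
    )


# Toy long-format frames; Algeria is missing from the second one on purpose.
pop_toy = pd.DataFrame(
    {
        "country": ["Afghanistan", "Albania", "Algeria"],
        "year": ["1960", "1960", "1960"],
        "population": [9000000.0, 1640000.0, 11100000.0],
    }
)
life_toy = pd.DataFrame(
    {
        "country": ["Afghanistan", "Albania"],
        "year": ["1960", "1960"],
        "life_expectancy": [39.3, 62.2],
    }
)

merged_toy = merge_on_keys({"pop": pop_toy, "life": life_toy}, ["country", "year"])
print(merged_toy)
```

An outer join followed by `dropna()` would give the same rows here; the inner join simply avoids building the incomplete rows in the first place.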


In [10]:
data.df2file(df = merged_data) # Store the data as a CSV file in the Processed_data folder
data.df2file(df = target)

## STEP 2: Database generation and connection with SQL
The second stage of the project is the creation of a MySQL database where the data from the dataframes can be stored and organized in their corresponding tables, in order to ease querying and data accessibility. This notebook, combined with the *sql.py* script, will allow us to:

        1. Automatically obtain the information needed to create a MySQL table for each of the dataframes.
        2. 
        3. 

In [8]:
merged_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10555 entries, 1950 to 13454
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   country             10555 non-null  object 
 1   year                10555 non-null  object 
 2   population density  10555 non-null  float64
 3   urban growth        10555 non-null  float64
 4   population          10555 non-null  float64
 5   population_growth   10555 non-null  float64
 6   co2_emissions       10555 non-null  float64
 7   life_expectancy     10555 non-null  float64
dtypes: float64(6), object(2)
memory usage: 742.1+ KB
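
The column names and dtypes listed by `info()` are the kind of information needed to build a table definition. As a rough sketch only: the helpers `mysql_type` and `create_table_sql` below are hypothetical and are not taken from *sql.py*, and the dtype-to-MySQL mapping (`BIGINT`, `DOUBLE`, `VARCHAR(255)`) is an assumed choice. Casting `year` to an integer is likewise an assumption, since it is stored as `object` above.

```python
import pandas as pd


def mysql_type(dtype) -> str:
    """Map a pandas dtype to a MySQL column type (assumed mapping)."""
    if pd.api.types.is_integer_dtype(dtype):
        return "BIGINT"
    if pd.api.types.is_float_dtype(dtype):
        return "DOUBLE"
    # Fallback for object/string columns such as country names.
    return "VARCHAR(255)"


def create_table_sql(df: pd.DataFrame, table_name: str) -> str:
    """Build a CREATE TABLE statement from a dataframe's columns and dtypes."""
    cols = ",\n  ".join(
        f"`{name}` {mysql_type(dtype)}" for name, dtype in df.dtypes.items()
    )
    return f"CREATE TABLE `{table_name}` (\n  {cols}\n);"


# One-row toy frame mirroring three of the merged_data columns.
sample = pd.DataFrame(
    {"country": ["Afghanistan"], "year": ["1960"], "population density": [13.8]}
)
sample["year"] = sample["year"].astype(int)  # assumption: store years as integers

ddl = create_table_sql(sample, "merged_data")
print(ddl)
```

Backticks around the identifiers keep column names that contain spaces, such as `population density`, valid in MySQL.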
