## Project Title

### Project Description

Our team will study the relationship, if any, between average land temperature (proposed dataset #2) and the evolution of each country's population and GDP. The population and GDP datasets will be pulled from the World Bank, where each is available in `.csv`, `.xml` and `.xlsx` formats at the following links:

[Population](https://data.worldbank.org/indicator/SP.POP.TOTL)

[GDP (2015 USD)](https://data.worldbank.org/indicator/NY.GDP.MKTP.KD)

Behavior will be analysed mainly by visual means (such as scatterplots and population pyramids) and reported in Markdown in the corresponding `.ipynb` file.

> Imported libraries

In [32]:
import pandas as pd # For data manipulation
import geopandas # For map visualization
import folium # For map visualization
import pycountry # For standardizing inconsistent country names

> Previous calculations

Create a dictionary that maps each country's name and its ISO alpha-2 and alpha-3 codes to a single standardized name:

In [33]:
country_dict = {}
for country in pycountry.countries:
    country_dict[country.name] = country.name
    country_dict[country.alpha_2] = country.name
    country_dict[country.alpha_3] = country.name
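
As a rough illustration of how a lookup like `country_dict` could be applied, the sketch below standardizes a few labels against a small hand-written dictionary. Both `toy_lookup` and the `standardize_country` helper are hypothetical stand-ins and not part of the notebook; in practice the lookup would be the `pycountry`-based dictionary.

```python
# Hypothetical miniature of country_dict: names and ISO codes map to one standard name
toy_lookup = {
    "Afghanistan": "Afghanistan",
    "AF": "Afghanistan",
    "AFG": "Afghanistan",
    "Angola": "Angola",
    "AO": "Angola",
    "AGO": "Angola",
}


def standardize_country(label, lookup):
    """Return the standardized name for a label, or None if it is unknown."""
    return lookup.get(label.strip())


labels = ["AFG", " Angola ", "AO", "Africa Eastern and Southern"]
standardized = [standardize_country(label, toy_lookup) for label in labels]
print(standardized)
```

Labels that are not in the lookup, such as regional aggregates, come back as `None`, so they can be reviewed or dropped before any later step.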

---

## Programming Component


### Reading and cleaning the main dataset


In [21]:
df = pd.read_csv('../data/land-data/GlobalLandTemperaturesByCountry.csv') # Read the GlobalLandTemperaturesByCountry csv file; forward slashes avoid invalid escape sequences in the path
df.head()

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,Country
0,1743-11-01,4.384,2.294,Åland
1,1743-12-01,,,Åland
2,1744-01-01,,,Åland
3,1744-02-01,,,Åland
4,1744-03-01,,,Åland


The next step is to convert the `dt` column to a datetime type; we will keep only the year.

In [22]:
df['dt'] = pd.to_datetime(df['dt']) # Convert to datetime
df['dt'] = df['dt'].dt.year # Extract year

In [23]:
df.rename(columns={"dt": "Year"}, inplace=True) # Rename column
df_year_temp = df.groupby(['Country', 'Year']).mean() # Group by country and year
df_year_temp.drop(columns=['AverageTemperatureUncertainty'], inplace=True) # Drop unnecessary column
df_year_temp.head() 

Country,Year,AverageTemperature
Afghanistan,1838,18.379571
Afghanistan,1839,
Afghanistan,1840,13.413455
Afghanistan,1841,13.9976
Afghanistan,1842,15.154667


In [24]:
df_year_temp.reset_index(inplace=True) # Let's reset the index so that we can use the Country and Year columns as a key to merge with the other datasets
df_year_temp.head()

Unnamed: 0,Country,Year,AverageTemperature
0,Afghanistan,1838,18.379571
1,Afghanistan,1839,
2,Afghanistan,1840,13.413455
3,Afghanistan,1841,13.9976
4,Afghanistan,1842,15.154667
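
The yearly aggregation can be checked on a small example. The sketch below uses a made-up frame (`toy`, with the invented country `Xland`) that has the same columns as the temperature file. It shows that `mean()` skips missing months, and that a year with no readings at all stays `NaN`, as in the 1839 row for Afghanistan.

```python
import pandas as pd

# Made-up monthly readings for one hypothetical country
toy = pd.DataFrame({
    "dt": ["2000-01-01", "2000-02-01", "2001-01-01", "2001-02-01"],
    "AverageTemperature": [10.0, 14.0, None, None],
    "Country": ["Xland"] * 4,
})

# Parse the dates and keep only the year
toy["Year"] = pd.to_datetime(toy["dt"]).dt.year

# Average the monthly values per country and year, then flatten the index
yearly = (
    toy.groupby(["Country", "Year"])["AverageTemperature"]
    .mean()
    .reset_index()
)
print(yearly)
```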


---

### Reading and cleaning the two other datasets

#### GDP Data

In [27]:
df_gdp = pd.read_csv('../data/gdp-data/GDP_Country.csv', skiprows=4) # Read the GDP_Country csv file; the first 4 rows are skipped
df_gdp.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,Unnamed: 66
0,Aruba,ABW,GDP (constant 2015 US$),NY.GDP.MKTP.KD,,,,,,,...,2862306000.0,2861720000.0,2963128000.0,3025850000.0,3191738000.0,3359555000.0,3380889000.0,2752412000.0,3225070000.0,
1,Africa Eastern and Southern,AFE,GDP (constant 2015 US$),NY.GDP.MKTP.KD,153696400000.0,154061100000.0,166362100000.0,174952800000.0,182972100000.0,192720900000.0,...,862334100000.0,897164500000.0,923143900000.0,946092800000.0,971065300000.0,996417800000.0,1016728000000.0,985792300000.0,1029191000000.0,
2,Afghanistan,AFG,GDP (constant 2015 US$),NY.GDP.MKTP.KD,,,,,,,...,19189250000.0,19712070000.0,19998160000.0,20450180000.0,20991490000.0,21241130000.0,22072000000.0,21553060000.0,17083570000.0,
3,Africa Western and Central,AFW,GDP (constant 2015 US$),NY.GDP.MKTP.KD,105675500000.0,107614700000.0,111674900000.0,119808200000.0,126269100000.0,131391300000.0,...,704676000000.0,746466400000.0,766958000000.0,767829900000.0,785533200000.0,808676300000.0,834480200000.0,826966700000.0,859759200000.0,
4,Angola,AGO,GDP (constant 2015 US$),NY.GDP.MKTP.KD,,,,,,,...,82433770000.0,86407070000.0,87219300000.0,84969040000.0,84841590000.0,83724810000.0,83138740000.0,78482970000.0,79346280000.0,


Drop unnecessary columns: 

In [28]:
df_gdp.drop(columns=['Country Code', 'Indicator Name', 'Indicator Code','Unnamed: 66'], inplace=True)
df_gdp.rename(columns={"Country Name": "Country"}, inplace=True)
df_gdp.head()

Unnamed: 0,Country,1960,1961,1962,1963,1964,1965,1966,1967,1968,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Aruba,,,,,,,,,,...,2689383000.0,2862306000.0,2861720000.0,2963128000.0,3025850000.0,3191738000.0,3359555000.0,3380889000.0,2752412000.0,3225070000.0
1,Africa Eastern and Southern,153696400000.0,154061100000.0,166362100000.0,174952800000.0,182972100000.0,192720900000.0,200263800000.0,210788300000.0,219275300000.0,...,827342400000.0,862334100000.0,897164500000.0,923143900000.0,946092800000.0,971065300000.0,996417800000.0,1016728000000.0,985792300000.0,1029191000000.0
2,Afghanistan,,,,,,,,,,...,18171510000.0,19189250000.0,19712070000.0,19998160000.0,20450180000.0,20991490000.0,21241130000.0,22072000000.0,21553060000.0,17083570000.0
3,Africa Western and Central,105675500000.0,107614700000.0,111674900000.0,119808200000.0,126269100000.0,131391300000.0,129016700000.0,116636300000.0,118322800000.0,...,664107300000.0,704676000000.0,746466400000.0,766958000000.0,767829900000.0,785533200000.0,808676300000.0,834480200000.0,826966700000.0,859759200000.0
4,Angola,,,,,,,,,,...,78545750000.0,82433770000.0,86407070000.0,87219300000.0,84969040000.0,84841590000.0,83724810000.0,83138740000.0,78482970000.0,79346280000.0


In [29]:
df_gdp = df_gdp.melt(id_vars=['Country'], var_name='Year', value_name='GDP') # Melt the dataset to have a single column for the years
df_gdp.head()

Unnamed: 0,Country,Year,GDP
0,Aruba,1960,
1,Africa Eastern and Southern,1960,153696400000.0
2,Afghanistan,1960,
3,Africa Western and Central,1960,105675500000.0
4,Angola,1960,
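
One detail of the wide-to-long reshape matters for later steps: the year labels come from the CSV header, so after `melt` the `Year` column holds strings, whereas the temperature data stores years as integers. The sketch below uses a made-up two-country frame (`wide` and `tidy` are illustrative names) and casts `Year` to integers. Whether the notebook needs this cast depends on how the datasets are combined later, so treat it as an assumption.

```python
import pandas as pd

# Made-up wide table shaped like the World Bank export: one column per year
wide = pd.DataFrame({
    "Country": ["Aruba", "Angola"],
    "1960": [None, 1.5],
    "1961": [2.0, 2.5],
})

# One row per (Country, Year) pair
tidy = wide.melt(id_vars=["Country"], var_name="Year", value_name="GDP")

# The year labels were column names (strings), so convert them to integers
tidy["Year"] = tidy["Year"].astype(int)
print(tidy)
```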


In [30]:
df_gdp.sort_values(by=['Country'], inplace=True) # Order by country name
df_gdp.head()

Unnamed: 0,Country,Year,GDP
2396,Afghanistan,1969,
4790,Afghanistan,1978,
14632,Afghanistan,2015,19998160000.0
15696,Afghanistan,2019,22072000000.0
14100,Afghanistan,2013,19189250000.0


#### Population data

In [35]:
df_pop = pd.read_csv('../data/population-data/population_data.csv', skiprows=4) # Read the population_data csv file; the first 4 rows are skipped

Drop unnecessary columns:

In [36]:
df_pop.drop(columns=['Country Code', 'Indicator Name', 'Indicator Code','Unnamed: 66'], inplace=True)
df_pop.rename(columns={"Country Name": "Country"}, inplace=True) # Rename Country column
df_pop.head()

Unnamed: 0,Country,1960,1961,1962,1963,1964,1965,1966,1967,1968,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Aruba,54608.0,55811.0,56682.0,57475.0,58178.0,58782.0,59291.0,59522.0,59471.0,...,102112.0,102880.0,103594.0,104257.0,104874.0,105439.0,105962.0,106442.0,106585.0,106537.0
1,Africa Eastern and Southern,130692579.0,134169237.0,137835590.0,141630546.0,145605995.0,149742351.0,153955516.0,158313235.0,162875171.0,...,552530654.0,567891875.0,583650827.0,600008150.0,616377331.0,632746296.0,649756874.0,667242712.0,685112705.0,702976832.0
2,Afghanistan,8622466.0,8790140.0,8969047.0,9157465.0,9355514.0,9565147.0,9783147.0,10010030.0,10247780.0,...,30466479.0,31541209.0,32716210.0,33753499.0,34636207.0,35643418.0,36686784.0,37769499.0,38972230.0,40099462.0
3,Africa Western and Central,97256290.0,99314028.0,101445032.0,103667517.0,105959979.0,108336203.0,110798486.0,113319950.0,115921723.0,...,376797999.0,387204553.0,397855507.0,408690375.0,419778384.0,431138704.0,442646825.0,454306063.0,466189102.0,478185907.0
4,Angola,5357195.0,5441333.0,5521400.0,5599827.0,5673199.0,5736582.0,5787044.0,5827503.0,5868203.0,...,25188292.0,26147002.0,27128337.0,28127721.0,29154746.0,30208628.0,31273533.0,32353588.0,33428486.0,34503774.0


In [38]:
df_pop = df_pop.melt(id_vars=['Country'], var_name='Year', value_name='Population') # Melt the dataset to have a single column for the years
df_pop.head()

Unnamed: 0,Country,Year,Population
0,Aruba,1960,54608.0
1,Africa Eastern and Southern,1960,130692579.0
2,Afghanistan,1960,8622466.0
3,Africa Western and Central,1960,97256290.0
4,Angola,1960,5357195.0


In [39]:
df_pop.sort_values(by=['Country'], inplace=True) # Order by country name
df_pop.head()

Unnamed: 0,Country,Year,Population
2396,Afghanistan,1969,10494489.0
4790,Afghanistan,1978,12938862.0
14632,Afghanistan,2015,33753499.0
15696,Afghanistan,2019,37769499.0
14100,Afghanistan,2013,31541209.0


#### Geopandas Dataset

In [40]:
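# Note: geopandas.datasets was deprecated in GeoPandas 0.14 and removed in 1.0, so this call needs an older release
world = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))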
print(world.head())

       pop_est      continent                      name iso_a3  gdp_md_est  \
0     889953.0        Oceania                      Fiji    FJI        5496   
1   58005463.0         Africa                  Tanzania    TZA       63177   
2     603253.0         Africa                 W. Sahara    ESH         907   
3   37589262.0  North America                    Canada    CAN     1736425   
4  328239523.0  North America  United States of America    USA    21433226   

                                            geometry  
0  MULTIPOLYGON (((180.00000 -16.06713, 180.00000...  
1  POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...  
2  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...  
3  MULTIPOLYGON (((-122.84000 49.00000, -122.9742...  
4  MULTIPOLYGON (((-122.84000 49.00000, -120.0000...  



### Combining and cleaning data
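
A minimal sketch of how the three tables might be combined, using `Country` and `Year` as the merge key. The frames below are made-up stand-ins for `df_year_temp`, `df_gdp` and `df_pop`. The inner join, which keeps only country-year pairs present in all three tables, is an assumption and not a decision taken in this notebook.

```python
import pandas as pd

# Made-up stand-ins for the cleaned temperature, GDP and population tables
temp = pd.DataFrame({
    "Country": ["Angola", "Angola"],
    "Year": [2000, 2001],
    "AverageTemperature": [21.5, 21.7],
})
gdp = pd.DataFrame({
    "Country": ["Angola", "Angola"],
    "Year": [2000, 2001],
    "GDP": [1.0e10, 1.1e10],
})
pop = pd.DataFrame({
    "Country": ["Angola", "Angola"],
    "Year": [2000, 2002],
    "Population": [16.0e6, 17.0e6],
})

# Inner joins keep only the (Country, Year) pairs found in every table
keys = ["Country", "Year"]
combined = temp.merge(gdp, on=keys, how="inner").merge(pop, on=keys, how="inner")
print(combined)
```

The key columns must share a dtype across tables: merging an integer `Year` with a string `Year` raises an error in pandas.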



### Calculations