# Life Expectancy and GDP Merge

The goal of this project is to investigate the relationship between a country's economic output (GDP) and the life expectancy of its citizens.
To do that, we need to join two datasets from the *World Bank Group*.


**Data sources**

- GDP data source: [World Bank Group](https://data.worldbank.org/indicator/NY.GDP.MKTP.CD)

- Life expectancy data source: [World Bank Group](https://data.worldbank.org/indicator/SP.DYN.LE00.IN)


### Choose your own countries

If you would like to analyze other countries, put their names in the `countries` list in the "Select countries" section. The names must match the values in the `Country Name` column exactly.

## Import libraries

In [15]:
import pandas as pd

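# Silence pandas' SettingWithCopyWarning, which the column assignment in the filtering step would otherwise trigger.
pd.options.mode.chained_assignment = None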

## Load csv

In [16]:
all_gdp = pd.read_csv("world_gdp.csv")
all_life_expectancy = pd.read_csv("world_life_expectancy.csv")
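
A note on loading, as a hedged sketch: raw CSV downloads from the World Bank typically begin with a few metadata lines before the header row, and in that case `read_csv` needs `skiprows`. Whether `world_gdp.csv` and `world_life_expectancy.csv` still contain that preamble is an assumption here; if they were cleaned beforehand, the plain `read_csv` calls above are enough. The file content, date, and values below are made-up placeholders that only imitate the layout.

```python
import io

import pandas as pd

# Simulated file: four preamble lines (two of them blank) before the real header.
raw = "\n".join([
    '"Data Source","World Development Indicators"',
    "",
    '"Last Updated Date","2024-01-01"',
    "",
    '"Country Name","Country Code","Indicator Name","Indicator Code","2000","2001"',
    '"Chile","CHL","GDP (current US$)","NY.GDP.MKTP.CD",1.0,2.0',
])

# skiprows=4 starts parsing at the header row; with a real file, pass its path instead.
sample = pd.read_csv(io.StringIO(raw), skiprows=4)
print(sample.columns.tolist())
```

If the first column names printed for your own files are not `Country Name` and `Country Code`, the preamble is probably still there.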

## Prepare the data

Both `world_gdp` and `world_life_expectancy` come in a "wide" format, with each year as a separate column, which many graphing tools handle poorly. Let us convert them to a "long" format with the structure **Country Name | Country Code | Indicator Name | Year | Value**. The **Indicator Code** column is not needed, so we leave it out of `id_vars` and the melt drops it.

In [17]:
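def melt_dataframe(df, id_vars, value_name):
    """Reshape a wide World Bank table into long format, one row per country and year."""
    # Year columns are the ones whose names consist only of digits, e.g. "1960".
    year_columns = [column for column in df.columns if column.isdigit()]
    return df.melt(id_vars=id_vars, value_vars=year_columns, var_name="Year", value_name=value_name)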

In [18]:
id_vars = ["Country Name", "Country Code", "Indicator Name"]
all_gdp_melted = melt_dataframe(df=all_gdp, id_vars=id_vars, value_name="GDP")
all_life_expectancy_melted = melt_dataframe(df=all_life_expectancy, id_vars=id_vars, value_name="Life Expectancy")
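
To make the wide-to-long conversion concrete, here is a minimal sketch on a toy table. The two countries, two years, and all values are invented for illustration; only the column layout mirrors the World Bank files.

```python
import pandas as pd

# Toy table in the same wide layout: one column per year (values are made up).
wide = pd.DataFrame({
    "Country Name": ["Chile", "Peru"],
    "Country Code": ["CHL", "PER"],
    "Indicator Name": ["GDP (current US$)"] * 2,
    "Indicator Code": ["NY.GDP.MKTP.CD"] * 2,
    "2000": [1.0, 2.0],
    "2001": [3.0, 4.0],
})

# Same logic as melt_dataframe: every all-digit column name is treated as a year.
year_columns = [column for column in wide.columns if column.isdigit()]
long = wide.melt(
    id_vars=["Country Name", "Country Code", "Indicator Name"],
    value_vars=year_columns,
    var_name="Year",
    value_name="GDP",
)
print(long)
```

Each (country, year) pair now has its own row, `Indicator Code` is gone because it is in neither `id_vars` nor `value_vars`, and the `Year` values are still strings because they came from column names.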

## Merge dataframes

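Merge the two melted dataframes on country and year. Both contain an `Indicator Name` column, so the suffixes turn them into `Indicator Name GDP` and `Indicator Name LifeExp`. The default inner join keeps only the country-year pairs present in both tables.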

In [19]:
merged_df = pd.merge(
    all_gdp_melted,
    all_life_expectancy_melted,
    on=["Country Name", "Country Code", "Year"],
    suffixes=(" GDP", " LifeExp"),
)
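
Since an inner join silently drops rows that have no partner, it can be useful to check how many rows fail to match. The sketch below does that on toy data (all values are made up) with two optional `pd.merge` parameters that the notebook itself does not use: `indicator=True` and `validate="one_to_one"`.

```python
import pandas as pd

# Toy long-format tables; Chile 2001 has GDP but no life expectancy entry.
gdp = pd.DataFrame({
    "Country Name": ["Chile", "Chile", "Peru"],
    "Country Code": ["CHL", "CHL", "PER"],
    "Indicator Name": ["GDP (current US$)"] * 3,
    "Year": ["2000", "2001", "2000"],
    "GDP": [1.0, 2.0, 3.0],
})
life = pd.DataFrame({
    "Country Name": ["Chile", "Peru"],
    "Country Code": ["CHL", "PER"],
    "Indicator Name": ["Life expectancy at birth, total (years)"] * 2,
    "Year": ["2000", "2000"],
    "Life Expectancy": [76.0, 70.0],
})

keys = ["Country Name", "Country Code", "Year"]

# Same call shape as in the notebook: inner join, suffixes on the shared column.
inner = pd.merge(gdp, life, on=keys, suffixes=(" GDP", " LifeExp"))

# Outer join with a "_merge" column that records where each row came from;
# validate raises an error if a key appears more than once on either side.
checked = pd.merge(
    gdp, life, on=keys, how="outer",
    suffixes=(" GDP", " LifeExp"),
    indicator=True, validate="one_to_one",
)
unmatched = checked[checked["_merge"] != "both"]
print(unmatched[keys + ["_merge"]])
```

An empty `unmatched` frame would mean the inner join loses nothing; here it lists the one GDP row without a life expectancy counterpart.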

## Select countries

In [20]:
countries = ["Peru", "Chile", "Mexico", "United Kingdom", "China", "United States"]
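# Keep only the selected countries; .copy() gives an independent dataframe that is safe to modify.
filtered_df = merged_df[merged_df["Country Name"].isin(countries)].copy()
# Year values come from column names, so they are strings; convert them to numbers.
filtered_df["Year"] = pd.to_numeric(filtered_df["Year"], errors="coerce")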
filtered_df.to_csv("filtered_gdp_lifeExp.csv", index=False)
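
One caveat when choosing your own countries: `isin` ignores names it cannot find, so a misspelled or differently named country simply vanishes from the output without an error. A small check like the following sketch can catch that. The `available` series stands in for `merged_df["Country Name"]`, and the requested names are hypothetical; the World Bank lists South Korea as "Korea, Rep.", for example.

```python
import pandas as pd

# Stand-in for merged_df["Country Name"].
available = pd.Series(["Chile", "Peru", "Korea, Rep."], name="Country Name")

# Hypothetical selection containing one name the data does not use.
countries = ["Chile", "South Korea"]

# Requested names that never occur in the data.
missing = sorted(set(countries) - set(available))
if missing:
    print("Not found in the data:", missing)
```

Running this before filtering makes it clear which entries in `countries` need to be renamed to the World Bank spelling.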