# Exploring the Relationship Between GDP and Life Expectancy in Six Countries

## Overview:
This project involves:

1. Using the provided data set on GDP and life expectancy for six countries.
2. Cleaning and preparing the data for analysis.
3. Statistical analysis and visualization to explore the relationship between GDP and life expectancy.
4. Preparation of a blog post summarizing findings and insights.

### Hypothesis:
Life expectancy is directly (positively) related to GDP.
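
One way to put a number on this hypothesis is a per-country Pearson correlation between GDP and life expectancy. The sketch below is only an illustration: the helper name `correlation_by_country` is hypothetical, the default column names assume the cleaned names produced later in this notebook, and the "Placeholder" rows are made-up values used to show the mechanics (the Chile rows are the five shown in the data preview).

```python
import pandas as pd

def correlation_by_country(df, group_col="country", x_col="gdp",
                           y_col="life_expectancy_at_birth_years"):
    """Pearson correlation between x_col and y_col, computed separately per group."""
    results = {
        name: group[x_col].corr(group[y_col])  # Series.corr defaults to Pearson
        for name, group in df.groupby(group_col)
    }
    return pd.Series(results, name="pearson_r")

# Chile rows come from the data preview; "Placeholder" rows are illustrative only.
sample = pd.DataFrame({
    "country": ["Chile"] * 5 + ["Placeholder"] * 3,
    "gdp": [7.786093e10, 7.097992e10, 6.973681e10, 7.564346e10, 9.921039e10,
            1e9, 2e9, 3e9],
    "life_expectancy_at_birth_years": [77.3, 77.3, 77.8, 77.9, 78.0,
                                       50.0, 51.0, 52.0],
})

r_by_country = correlation_by_country(sample)
print(r_by_country)
```

Computing the correlation within each country, rather than across the pooled data, avoids mixing countries whose GDP differs by orders of magnitude. A strong correlation would be consistent with the hypothesis but would not by itself show that GDP causes the change.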

### Objective:

* To analyze data from the World Health Organization to identify the relationship between GDP and Life Expectancy.
* Additionally, use the data to answer follow-up questions:
  * What is the average life expectancy in each country?
  * At what GDP does a country's life expectancy exceed the average?
  * If GDP increases or decreases in a country with low life expectancy, how quickly does life expectancy change?
  * If GDP increases or decreases in a country with high life expectancy, how quickly does life expectancy change?
  * Can we draw conclusions about future life expectancies based on historical trends in GDP?
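
The first two questions above could be approached with a `groupby`, as in the sketch below. It rests on several assumptions: the helper names are hypothetical, the column names are the cleaned ones used later in this notebook, "the average" is read as the overall mean across all rows, and the threshold is taken to be the lowest GDP among a country's above-average rows. The "Placeholder" rows are made-up values; the Chile rows come from the data preview.

```python
import pandas as pd

LE_COL = "life_expectancy_at_birth_years"  # assumed cleaned column name

def mean_life_expectancy_by_country(df):
    """Average life expectancy for each country."""
    return df.groupby("country")[LE_COL].mean()

def lowest_gdp_above_average(df):
    """Lowest GDP, per country, among rows whose life expectancy exceeds the overall mean.

    Countries that never exceed the overall mean are absent from the result.
    """
    overall_mean = df[LE_COL].mean()
    above = df[df[LE_COL] > overall_mean]
    return above.groupby("country")["gdp"].min()

# Chile rows come from the data preview; "Placeholder" rows are illustrative only.
sample = pd.DataFrame({
    "country": ["Chile"] * 3 + ["Placeholder"] * 3,
    "gdp": [7.786093e10, 7.097992e10, 6.973681e10, 1e9, 2e9, 3e9],
    LE_COL: [77.3, 77.3, 77.8, 50.0, 52.0, 54.0],
})

means = mean_life_expectancy_by_country(sample)
thresholds = lowest_gdp_above_average(sample)
print(means)
print(thresholds)
```

Other readings of "exceed the average" are possible, for example comparing each country against its own mean; the filter in `lowest_gdp_above_average` would change accordingly.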

### Findings:

### Summary:
    

In [3]:
# import necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
# Load the CSV file for analysis and print the first rows
gdp_le_data = pd.read_csv('~/Documents/Coding/Life-Expectancy-and-GDP-Starter/Life-Expectancy-and-GDP/all_data.csv')
print(gdp_le_data.head())

  Country  Year  Life expectancy at birth (years)           GDP
0   Chile  2000                              77.3  7.786093e+10
1   Chile  2001                              77.3  7.097992e+10
2   Chile  2002                              77.8  6.973681e+10
3   Chile  2003                              77.9  7.564346e+10
4   Chile  2004                              78.0  9.921039e+10


In [7]:
# Clean the data set
# Normalize column names: trim whitespace, lowercase, use underscores, drop parentheses
gdp_le_data.columns = (
    gdp_le_data.columns
    .str.strip()
    .str.lower()
    .str.replace(' ', '_')
    .str.replace('(', '', regex=False)
    .str.replace(')', '', regex=False)
)

# Ensure numeric columns are properly typed; values that cannot be parsed become NaN
gdp_le_data['gdp'] = pd.to_numeric(gdp_le_data['gdp'], errors='coerce')
gdp_le_data['life_expectancy_at_birth_years'] = pd.to_numeric(gdp_le_data['life_expectancy_at_birth_years'], errors='coerce')

# Verify Cleaning
print(gdp_le_data.info())
print(gdp_le_data.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96 entries, 0 to 95
Data columns (total 4 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   country                         96 non-null     object 
 1   year                            96 non-null     int64  
 2   life_expectancy_at_birth_years  96 non-null     float64
 3   gdp                             96 non-null     float64
dtypes: float64(2), int64(1), object(1)
memory usage: 3.1+ KB
None
  country  year  life_expectancy_at_birth_years           gdp
0   Chile  2000                            77.3  7.786093e+10
1   Chile  2001                            77.3  7.097992e+10
2   Chile  2002                            77.8  6.973681e+10
3   Chile  2003                            77.9  7.564346e+10
4   Chile  2004                            78.0  9.921039e+10


In [9]:
# Find Summary Statistics
print("The Average Life expectancy across all countries is: " + str(np.mean(gdp_le_data['life_expectancy_at_birth_years'])))
print("The median Life expectancy across all countries is: " + str(np.median(gdp_le_data['life_expectancy_at_birth_years'])))
print("The Max Life Expectancy across all countries is: " + str(np.max(gdp_le_data['life_expectancy_at_birth_years'])))
print("The Minimum Life Expectancy across all countries is: " + str(np.min(gdp_le_data['life_expectancy_at_birth_years'])))
print("The Average GDP across all countries is: " + str(np.mean(gdp_le_data['gdp'])))
print("The median GDP across all countries is: " + str(np.median(gdp_le_data['gdp'])))
print("The Max GDP across all countries is: " + str(np.max(gdp_le_data['gdp'])))
print("The Minimum GDP across all countries is: " + str(np.min(gdp_le_data['gdp'])))


The Average Life expectancy across all countries is: 72.78958333333334
The median Life expectancy across all countries is: 76.75
The Max Life Expectancy across all countries is: 81.0
The Minimum Life Expectancy across all countries is: 44.3
The Average GDP across all countries is: 3880498570768.396
The median GDP across all countries is: 1280220000000.0
The Max GDP across all countries is: 18100000000000.0
The Minimum GDP across all countries is: 4415702800.0


Key Insights:
1. Life Expectancy:
    * Range: 44.3 to 81.0 years
    * Median (50th percentile): 76.75 years
    * Potential outliers: Values near the minimum (44.3) may require further exploration. The mean (72.79) falls well below the median, which suggests that a group of low values is pulling it down.
2. GDP:
    * Range: Approximately $4.42 billion to $18.1 trillion
    * Median (50th percentile): Approximately $1.28 trillion
    * Potential outliers: GDP values near the lower or upper extremes should be visualized
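
Before plotting, a quick numeric check can flag candidate outliers. The sketch below uses the common 1.5 × IQR rule, which is a swapped-in convention rather than anything prescribed by this project; the helper name `iqr_outliers` is hypothetical. In the sample series only 44.3 comes from the summary statistics above, and the remaining values are placeholders.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return the values lying more than k * IQR outside the interquartile range."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]

# 44.3 is the observed minimum life expectancy; the other values are illustrative only.
sample = pd.Series([76.0, 77.0, 77.5, 78.0, 79.0, 44.3])
flagged = iqr_outliers(sample)
print(flagged)
```

On the real data this could be called as `iqr_outliers(gdp_le_data['life_expectancy_at_birth_years'])`. Because the six countries differ widely, applying the rule within each country may be more informative than applying it to the pooled column.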

In [None]:
# Beginning Visualization and Analysis
