# Life Expectancy and GDP Analysis
## Project Goals
- Analyze the relationship between GDP and life expectancy for six countries.
- Answer: Does higher GDP correlate with higher life expectancy?
- Summarize life expectancy and GDP trends over time per country.
- Visualize trends using line plots, scatter plots, and other charts.
- Identify differences in GDP-life expectancy relationships across countries.
- Document findings in this notebook and track progress on GitHub.
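The central question in the goals can be expressed as a correlation coefficient. The sketch below shows one way to compute it. It assumes the renamed `Life_Expectancy` column from the Data Cleaning section, and `gdp_life_correlation` is a hypothetical helper, not part of pandas. It reports a pooled coefficient over all rows alongside one coefficient per country. The pooled figure largely reflects differences between countries, while the per-country figures describe how GDP and life expectancy move together over time within each country. Spearman's rho is computed as the Pearson correlation of the ranked values (ties receive average ranks), so the sketch depends on pandas alone.

```python
import pandas as pd


def _pearson_spearman(gdp: pd.Series, life: pd.Series) -> dict:
    """Pearson r on raw values and Spearman rho as Pearson r on ranks."""
    return {
        "pearson": gdp.corr(life),
        # Ranking first (ties get average ranks) turns Pearson into Spearman
        "spearman": gdp.rank().corr(life.rank()),
        "n": len(gdp),
    }


def gdp_life_correlation(df: pd.DataFrame,
                         gdp_col: str = "GDP",
                         life_col: str = "Life_Expectancy") -> pd.DataFrame:
    """Correlation between GDP and life expectancy, pooled and per country."""
    # Pooled: largely reflects differences between countries
    rows = {"All countries": _pearson_spearman(df[gdp_col], df[life_col])}
    # Per country: how the two series move together over time within a country
    for country, group in df.groupby("Country"):
        rows[country] = _pearson_spearman(group[gdp_col], group[life_col])
    return pd.DataFrame.from_dict(rows, orient="index")
```

After the rename, `print(gdp_life_correlation(data))` would show one row per country plus a pooled row. Each country contributes only 16 yearly observations, so these coefficients are noisy. A strong correlation also does not, by itself, show that GDP drives life expectancy.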

In [4]:
# Load the dataset
import pandas as pd
data = pd.read_csv('all_data.csv')

# Display the first few rows
print(data.head())

  Country  Year  Life expectancy at birth (years)           GDP
0   Chile  2000                              77.3  7.786093e+10
1   Chile  2001                              77.3  7.097992e+10
2   Chile  2002                              77.8  6.973681e+10
3   Chile  2003                              77.9  7.564346e+10
4   Chile  2004                              78.0  9.921039e+10


In [5]:
# Check column names and data types
# info() prints its report itself and returns None, so wrapping it in print() adds a stray "None"
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96 entries, 0 to 95
Data columns (total 4 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Country                           96 non-null     object 
 1   Year                              96 non-null     int64  
 2   Life expectancy at birth (years)  96 non-null     float64
 3   GDP                               96 non-null     float64
dtypes: float64(2), int64(1), object(1)
memory usage: 3.1+ KB


In [6]:
# Check for missing values
print(data.isnull().sum())

Country                             0
Year                                0
Life expectancy at birth (years)    0
GDP                                 0
dtype: int64
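Missing values are not the only integrity issue in panel data like this. The 96 rows match 6 countries × 16 years (2000 to 2015, as the year check in the Data Cleaning section reports). A row count alone, however, cannot rule out a repeated Country/Year pair that offsets a missing one. Here is a minimal sketch of a duplicate-key check, using a hypothetical helper `check_unique_keys`:

```python
import pandas as pd


def check_unique_keys(df: pd.DataFrame, keys=("Country", "Year")) -> pd.DataFrame:
    """Return all rows whose key combination appears more than once.

    An empty result means each (Country, Year) pair occurs exactly once.
    """
    # keep=False marks every member of a duplicated group, not just the repeats
    mask = df.duplicated(subset=list(keys), keep=False)
    return df[mask]


# Usage in this notebook (hypothetical helper):
# dupes = check_unique_keys(data)
# print(f"{len(dupes)} rows share a Country/Year pair")
```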


## Analytical Steps
1. **Data Cleaning**: Confirm there are no missing values, check for outliers, and rename columns if needed.
2. **Summary Statistics**: Calculate the mean, min, max, and standard deviation of life expectancy and GDP for each country.
3. **Visualizations**:
   - Line plots: Life expectancy and GDP over time.
   - Scatter plots: GDP vs. life expectancy.
   - Bar plots: Compare averages across countries.
4. **Correlation Analysis**: Compute correlation between GDP and life expectancy.
5. **Insights**: Answer whether GDP correlates with life expectancy; note country differences.
6. **Finalize**: Organize notebook and push to GitHub.
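Step 2 maps directly onto a pandas `groupby` aggregation. The helpers below are a sketch with hypothetical names, and they assume the `Life_Expectancy` column created by the rename in the Data Cleaning section. By default, pandas' `std` is the sample standard deviation (`ddof=1`). The second helper produces the per-country averages that a bar plot in step 3 would compare.

```python
import pandas as pd


def per_country_summary(df: pd.DataFrame,
                        cols=("Life_Expectancy", "GDP")) -> pd.DataFrame:
    """Mean, min, max, and sample standard deviation of each column, per country."""
    # A list of function names yields two-level columns: (column, statistic)
    return df.groupby("Country")[list(cols)].agg(["mean", "min", "max", "std"])


def rank_countries_by_mean(df: pd.DataFrame, col: str) -> pd.Series:
    """Country averages of one column, largest first (input for a bar plot)."""
    return df.groupby("Country")[col].mean().sort_values(ascending=False)
```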

## Data Cleaning
Check for missing values and outliers, and rename columns for convenience.

In [9]:
# Import libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [11]:
# Summary statistics (count, mean, std, quartiles, min, max) to spot unrealistic values
print("\nLife expectancy range:\n", data['Life expectancy at birth (years)'].describe())
print("\nGDP range:\n", data['GDP'].describe())


Life expectancy range:
 count    96.000000
mean     72.789583
std      10.672882
min      44.300000
25%      74.475000
50%      76.750000
75%      78.900000
max      81.000000
Name: Life expectancy at birth (years), dtype: float64

GDP range:
 count    9.600000e+01
mean     3.880499e+12
std      5.197561e+12
min      4.415703e+09
25%      1.733018e+11
50%      1.280220e+12
75%      4.067510e+12
max      1.810000e+13
Name: GDP, dtype: float64
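`describe()` summarizes all six countries at once. It can hint at unrealistic values, but it cannot separate them from genuine differences between countries. For example, the minimum life expectancy of 44.3 years lies far below the 25th percentile of 74.475, which may reflect one country's level rather than a data error. A per-country interquartile-range (IQR) rule is one way to flag values that are unusual relative to their own country. The sketch below assumes Tukey's conventional fence of 1.5 × IQR, and `flag_iqr_outliers` is a hypothetical helper.

```python
import pandas as pd


def flag_iqr_outliers(df: pd.DataFrame, col: str, k: float = 1.5,
                      by: str = "Country") -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR] of its group."""
    grouped = df.groupby(by)[col]
    # transform broadcasts each group's quantile back onto that group's rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    return (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
```

For example, `data[flag_iqr_outliers(data, "GDP")]` lists the flagged GDP rows. For strongly trending series such as GDP, the rule may flag the earliest or latest years simply because of growth, so flagged values deserve a manual look before anything is dropped.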


In [12]:
# Check unique countries and years
print("\nUnique countries:", data['Country'].unique())
print("Year range:", data['Year'].min(), "to", data['Year'].max())


Unique countries: ['Chile' 'China' 'Germany' 'Mexico' 'United States of America' 'Zimbabwe']
Year range: 2000 to 2015
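The overall year range does not guarantee that every country is observed in every year. Below is a sketch of a per-country coverage check, with hypothetical helpers `year_coverage` and `is_balanced_panel`. For this dataset, a balanced panel would mean 16 distinct years, 2000 through 2015, for each country.

```python
import pandas as pd


def year_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Number of distinct years and first/last year recorded for each country."""
    return df.groupby("Country")["Year"].agg(
        n_years="nunique", first_year="min", last_year="max"
    )


def is_balanced_panel(df: pd.DataFrame) -> bool:
    """True if every country has a row for every year in the overall range."""
    expected = df["Year"].max() - df["Year"].min() + 1
    cov = year_coverage(df)
    return bool((cov["n_years"] == expected).all())
```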


In [13]:
# Rename columns for convenience
data = data.rename(columns={'Life expectancy at birth (years)': 'Life_Expectancy'})
print("\nUpdated column names:", data.columns)


Updated column names: Index(['Country', 'Year', 'Life_Expectancy', 'GDP'], dtype='object')
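The GDP summary above ranges from about 4.4e9 to 1.8e13, more than three orders of magnitude. On a shared linear axis, the smaller economies would be compressed near zero. One option, sketched here with a hypothetical helper `add_gdp_scales`, is to add a rescaled column for readable tables and a log10 column for plots. The rescaled column keeps the dataset's own GDP units (which the notebook does not state) and only divides by 1e12. `np.log10` requires strictly positive values, which holds for the reported minimum.

```python
import numpy as np
import pandas as pd


def add_gdp_scales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with GDP_trillions (GDP / 1e12) and log10_GDP columns added."""
    out = df.copy()  # leave the caller's DataFrame unchanged
    # Same units as the source column, divided by 1e12 for readable tables and axes
    out["GDP_trillions"] = out["GDP"] / 1e12
    # Log scale keeps small and large economies visible on the same axis
    out["log10_GDP"] = np.log10(out["GDP"])
    return out
```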
