`Global Insights`
- What is the average global life expectancy for each year?

- What is the average global GDP per capita for each year?

- What is the total global population for each year?

- In which year was the highest global life expectancy recorded?

- In which year was the lowest global GDP per capita recorded?

`Continent-Level Insights`
- What is the average life expectancy per continent over the years?

- What is the average GDP per capita per continent over the years?

- Which continent had the highest population in 2007?

`Country-Level Insights`
- What is the average life expectancy for each country across all years?

- What is the average GDP per capita for each country?

- Which country had the highest life expectancy in 2007?

- Which country had the lowest GDP per capita in 1992?

- What is the total population for each country in 2007?

`Growth and Trend Analysis`
- Which countries saw a significant increase in life expectancy from 1952 to 2007?

`Comparative Insights`

- Compare countries in Asia vs. Europe in terms of average life expectancy.

- How many countries never had a GDP per capita > $1000 in any year?

- Are there any countries with life expectancy below 40 years in 2007?


Import Necessary Libraries

In [1]:
import pandas as pd
import numpy as np

Read the Raw Data

In [2]:
data = pd.read_csv("mckinsey.csv")  # load the raw data
data.head()  # show the first 5 rows

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106


Data Inspection

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   country     1704 non-null   object 
 1   year        1704 non-null   int64  
 2   population  1704 non-null   int64  
 3   continent   1704 non-null   object 
 4   life_exp    1703 non-null   float64
 5   gdp_cap     1704 non-null   float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB


In [4]:
data.describe()

Unnamed: 0,year,population,life_exp,gdp_cap
count,1704.0,1704.0,1703.0,1704.0
mean,1979.5,29601210.0,59.483627,7215.327081
std,17.26533,106157900.0,12.915331,9857.454543
min,1952.0,60011.0,23.599,241.165876
25%,1965.75,2793664.0,48.228,1202.060309
50%,1979.5,7023596.0,60.765,3531.846988
75%,1993.25,19585220.0,70.846,9325.462346
max,2007.0,1318683000.0,82.603,113523.1329


Data Cleaning

In [5]:
# Check Missing Values in data
data.isnull().sum()

country       0
year          0
population    0
continent     0
life_exp      1
gdp_cap       0
dtype: int64

In [6]:
# As shown above, life_exp has 1 null value.
# This can be fixed with dropna() or fillna(); here the gap is filled with the column mean.
# The result is assigned back because inplace=True on a column selection acts on a copy.

data["life_exp"] = data["life_exp"].fillna(data["life_exp"].mean())


In [7]:
# Verify that no null/missing values remain
data.isnull().sum()

country       0
year          0
population    0
continent     0
life_exp      0
gdp_cap       0
dtype: int64
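
Filling the gap with the global mean gives the affected row a value unrelated to its own country. A minimal sketch of one alternative, assuming a country-level fill is acceptable for this analysis: replace each missing value with the mean of that country's observed values. The `toy` frame and its numbers are hypothetical and are not taken from mckinsey.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from mckinsey.csv): country "A" has one missing value
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [1997, 2002, 2007, 1997, 2002, 2007],
    "life_exp": [40.0, 42.0, np.nan, 78.0, 79.0, 80.0],
})

# Mean of each country's observed values, aligned back to the original rows
country_mean = toy.groupby("country")["life_exp"].transform("mean")

# Fill only the gaps and assign the result back to the column
toy["life_exp"] = toy["life_exp"].fillna(country_mean)
print(toy)
```

Here the missing value for "A" becomes 41.0, the mean of its own two observations, rather than a figure pulled toward country "B".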

In [8]:
# Check for duplicate rows
data.duplicated().sum()

np.int64(0)

Exploratory Data Analysis (EDA) for `Global Insights`

- What is the average global life expectancy for each year?

- What is the average global GDP per capita for each year?

- What is the total global population for each year?

- In which year was the highest global life expectancy recorded?

- In which year was the lowest global GDP per capita recorded?

In [9]:
# What is the average global life expectancy for each year?
avg_global_life_exp = data.groupby("year")["life_exp"].mean()
print("Global Life Expectancy for each year: \n",avg_global_life_exp)

Global Life Expectancy for each year: 
 year
1952    49.057620
1957    51.507401
1962    53.609249
1967    55.678290
1972    57.647386
1977    59.570157
1982    61.533197
1987    63.212613
1992    64.160338
1997    65.014676
2002    65.694923
2007    67.117673
Name: life_exp, dtype: float64


In [10]:
# What is the average global GDP per capita for each year?
avg_global_gdp_cap = data.groupby("year")["gdp_cap"].mean()
print("Global GDP per capita for each year: \n",avg_global_gdp_cap)


In [11]:
# What is the total global population for each year?
total_global_population = data.groupby("year")["population"].sum()
print("Total Global Population each year: \n",total_global_population)

Total Global Population each year: 
 year
1952    2406957150
1957    2664404580
1962    2899782974
1967    3217478384
1972    3576977158
1977    3930045807
1982    4289436840
1987    4691477418
1992    5110710260
1997    5515204472
2002    5886977579
2007    6251013179
Name: population, dtype: int64
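
The per-year means above give every country the same weight regardless of its size. A minimal sketch of a population-weighted average, assuming that weighting is what "global" should mean here; the `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two countries of different size in two years
toy = pd.DataFrame({
    "year": [2002, 2002, 2007, 2007],
    "population": [100, 300, 100, 300],
    "life_exp": [50.0, 70.0, 60.0, 80.0],
})

# Weighted mean per year: sum(life_exp * population) / sum(population)
weighted_sum = (toy["life_exp"] * toy["population"]).groupby(toy["year"]).sum()
total_pop = toy.groupby("year")["population"].sum()
weighted_life_exp = weighted_sum / total_pop
print(weighted_life_exp)
```

On this toy data the weighted value for 2002 is 65.0, while the unweighted mean of 50.0 and 70.0 would be 60.0.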


In [12]:
# In which year was the highest global life expectancy recorded?
highest_life_exp = data.groupby("year")["life_exp"].max()
print("The highest global life expectancy :",highest_life_exp.max(),"in year",highest_life_exp.idxmax())

The highest global life expectancy : 82.603 in year 2007


In [13]:
# In which year was the lowest global GDP per capita recorded?
lowest_global_gdp_cap = data.groupby("year")["gdp_cap"].min()
print("The lowest global GDP per capita :",lowest_global_gdp_cap.min(),"in year",lowest_global_gdp_cap.idxmin())

The lowest global GDP per capita : 241.1658765 in year 2002
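
When a grouped Series is indexed by year, `argmax()` and `idxmax()` return different things: the first gives a 0-based position, the second gives the index label. A small sketch on hypothetical values shows the difference.

```python
import pandas as pd

# Hypothetical per-year maxima, indexed by year like a groupby("year") result
per_year_max = pd.Series([70.1, 82.6, 75.3], index=[1997, 2002, 2007], name="life_exp")

position = per_year_max.argmax()  # integer position of the maximum (0-based)
label = per_year_max.idxmax()     # index label of the maximum (the year)

print("position:", position, "label:", label)
```

Using the position as a lookup key into a differently indexed object only gives the right year by coincidence, so the label from `idxmax()` / `idxmin()` is the safer choice.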


`Continent-Level Insights`
- What is the average life expectancy per continent over the years?

- What is the average GDP per capita per continent over the years?

- Which continent had the highest population in 2007?

- How does life expectancy vary by continent in a specific year (e.g., 2002)?

- What is the trend of GDP per capita for each continent over time?

In [14]:
# What is the average life expectancy per continent over the years?
avg_life_exp_per_continent = data.groupby("continent")["life_exp"].mean()
avg_life_exp_per_continent

continent
Africa      48.865330
Americas    64.658737
Asia        60.104438
Europe      71.903686
Oceania     74.326208
Name: life_exp, dtype: float64

In [15]:
# What is the average GDP per capita per continent over the years?
avg_gdp_cap_per_continent = data.groupby("continent")["gdp_cap"].mean()
avg_gdp_cap_per_continent

continent
Africa       2193.754578
Americas     7136.110356
Asia         7902.150428
Europe      14469.475533
Oceania     18621.609223
Name: gdp_cap, dtype: float64

In [16]:
# Which continent had the highest population in 2007?
# Sum the country populations within each continent, then take the label of the largest total.
data[data["year"]==2007].groupby("continent")["population"].sum().idxmax()
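
The last two questions in the list above (life expectancy by continent in a single year, and the GDP per capita trend per continent) can be approached with a filter plus `groupby`, and with a pivot table. This is a minimal sketch on hypothetical data; the `toy` frame and its values are not from mckinsey.csv.

```python
import pandas as pd

# Hypothetical toy data: two countries per continent, two years
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe", "Asia", "Asia", "Europe", "Europe"],
    "year": [2002, 2002, 2002, 2002, 2007, 2007, 2007, 2007],
    "life_exp": [60.0, 70.0, 76.0, 78.0, 62.0, 72.0, 78.0, 80.0],
    "gdp_cap": [1000.0, 3000.0, 20000.0, 30000.0, 2000.0, 4000.0, 22000.0, 32000.0],
})

# Life expectancy by continent in one specific year
life_exp_2002 = toy[toy["year"] == 2002].groupby("continent")["life_exp"].mean()

# GDP per capita trend: one row per year, one column per continent
gdp_trend = toy.pivot_table(index="year", columns="continent", values="gdp_cap", aggfunc="mean")

print(life_exp_2002)
print(gdp_trend)
```

The wide layout of `gdp_trend` makes each continent's series readable down a column.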

`Country-Level Insights`
- What is the average life expectancy for each country across all years?

- What is the average GDP per capita for each country?

- Which country had the highest life expectancy in 2007?

- Which country had the lowest GDP per capita in 1992?

- What is the total population for each country in 2007?

In [17]:
# What is the average life expectancy for each country across all years?
data.groupby("country")["life_exp"].mean()

country
Afghanistan           38.783469
Albania               68.432917
Algeria               59.030167
Angola                37.883500
Argentina             69.060417
                        ...    
Vietnam               57.479500
West Bank and Gaza    60.328667
Yemen, Rep.           46.780417
Zambia                45.996333
Zimbabwe              52.663167
Name: life_exp, Length: 142, dtype: float64

In [18]:
# What is the average GDP per capita for each country?
data.groupby("country")["gdp_cap"].mean()

country
Afghanistan            802.674598
Albania               3255.366633
Algeria               4426.025973
Angola                3607.100529
Argentina             8955.553783
                         ...     
Vietnam               1017.712615
West Bank and Gaza    3759.996781
Yemen, Rep.           1569.274672
Zambia                1358.199409
Zimbabwe               635.858042
Name: gdp_cap, Length: 142, dtype: float64

In [19]:
# Which country had the highest life expectancy in 2007?
# idxmax() returns the row label of the maximum; .loc then fetches that row.
data_2007 = data[data["year"]==2007]
data_2007.loc[data_2007["life_exp"].idxmax(), ["country","life_exp"]]

In [20]:
# Which country had the lowest GDP per capita in 1992?
# idxmin() returns the row label; argmin() would return a position within the filtered subset.
data_1992 = data[data["year"]==1992]
lowest_gdp_cap = data_1992["gdp_cap"].min()
lowest_gdp_cap_country = data_1992.loc[data_1992["gdp_cap"].idxmin(),"country"]
print(lowest_gdp_cap_country,"has",lowest_gdp_cap,"GDP per capita in 1992")
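
To see more than the single lowest value, `nsmallest` returns the bottom rows together with their country names, which sidesteps any position-versus-label lookup. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data (not from mckinsey.csv)
toy = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "year": [1992, 1992, 1992, 1997, 1997, 1997],
    "gdp_cap": [900.0, 350.0, 5000.0, 950.0, 400.0, 5500.0],
})

# Two lowest GDP per capita rows in 1992, lowest first
lowest_two = toy[toy["year"] == 1992].nsmallest(2, "gdp_cap")[["country", "gdp_cap"]]
print(lowest_two)
```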


In [21]:
# What is the total population for each country in 2007?
data[data["year"]==2007][["country","population"]]

Unnamed: 0,country,population
11,Afghanistan,31889923
23,Albania,3600523
35,Algeria,33333216
47,Angola,12420476
59,Argentina,40301927
...,...,...
1655,Vietnam,85262356
1667,West Bank and Gaza,4018332
1679,"Yemen, Rep.",22211743
1691,Zambia,11746035


`Growth and Trend Analysis`
- Which countries saw a significant increase in life expectancy from 1952 to 2007?

In [22]:

# Per-country extremes: highest/lowest life expectancy and last/first year on record
data_agg = data.groupby("country")[["life_exp","year"]].agg({"life_exp":["max","min"],"year":["max","min"]})
data_agg

Unnamed: 0_level_0,life_exp,life_exp,year,year
Unnamed: 0_level_1,max,min,max,min
country,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Afghanistan,59.483627,28.801,2007,1952
Albania,76.423000,55.230,2007,1952
Algeria,72.301000,43.077,2007,1952
Angola,42.731000,30.015,2007,1952
Argentina,75.320000,62.485,2007,1952
...,...,...,...,...
Vietnam,74.249000,40.412,2007,1952
West Bank and Gaza,73.422000,43.160,2007,1952
"Yemen, Rep.",62.698000,32.548,2007,1952
Zambia,51.821000,39.193,2007,1952


In [23]:
data_agg.columns

MultiIndex([('life_exp', 'max'),
            ('life_exp', 'min'),
            (    'year', 'max'),
            (    'year', 'min')],
           )

In [24]:
data_agg.columns = ["_".join(col) for col in data_agg.columns]
data_agg

Unnamed: 0_level_0,life_exp_max,life_exp_min,year_max,year_min
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Afghanistan,59.483627,28.801,2007,1952
Albania,76.423000,55.230,2007,1952
Algeria,72.301000,43.077,2007,1952
Angola,42.731000,30.015,2007,1952
Argentina,75.320000,62.485,2007,1952
...,...,...,...,...
Vietnam,74.249000,40.412,2007,1952
West Bank and Gaza,73.422000,43.160,2007,1952
"Yemen, Rep.",62.698000,32.548,2007,1952
Zambia,51.821000,39.193,2007,1952


In [25]:
# Gap between each country's highest and lowest recorded life expectancy
data_agg["Total_Life_Exp"]= data_agg["life_exp_max"] - data_agg["life_exp_min"]

In [26]:
data_agg.sort_values(by="Total_Life_Exp",ascending=False)

Unnamed: 0_level_0,life_exp_max,life_exp_min,year_max,year_min,Total_Life_Exp
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Oman,75.640,37.578,2007,1952,38.062
Vietnam,74.249,40.412,2007,1952,33.837
Indonesia,70.650,37.468,2007,1952,33.182
Saudi Arabia,72.777,39.875,2007,1952,32.902
Libya,73.952,42.723,2007,1952,31.229
...,...,...,...,...,...
"Congo, Dem. Rep.",47.804,39.143,2007,1952,8.661
Netherlands,79.762,72.130,2007,1952,7.632
Denmark,78.332,70.780,2007,1952,7.552
Liberia,46.027,38.480,2007,1952,7.547


In [27]:
# Which countries saw a significant increase in life expectancy from 1952 to 2007
data_agg.sort_values(by="Total_Life_Exp",ascending=False).head(1)

Unnamed: 0_level_0,life_exp_max,life_exp_min,year_max,year_min,Total_Life_Exp
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Oman,75.64,37.578,2007,1952,38.062
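
The max-minus-min gap used above equals the 1952-to-2007 change only when a country's life expectancy rises steadily. A minimal sketch of a direct end-minus-start comparison, assuming every country has a row for both years; the `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: country "A" peaks in 1977 and then declines
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [1952, 1977, 2007, 1952, 1977, 2007],
    "life_exp": [40.0, 55.0, 50.0, 65.0, 70.0, 78.0],
})

# One row per country, one column per year
wide = toy.pivot(index="country", columns="year", values="life_exp")

# Change from the first to the last survey year, largest increase first
change = (wide[2007] - wide[1952]).sort_values(ascending=False)
print(change)
```

For country "A" the direct change is 10.0, whereas max minus min would report 15.0 because of the 1977 peak.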


`Comparative Insights`

- Compare countries in Asia vs. Europe in terms of average life expectancy.

- How many countries never had a GDP per capita > $1000 in any year?

- Are there any countries with life expectancy below 40 years in 2007?

In [28]:
# Compare countries in Asia vs. Europe in terms of average life expectancy.

# For Asia
print(data[data["continent"]=="Asia"].groupby("country")["life_exp"].mean())


country
Afghanistan           38.783469
Bahrain               65.605667
Bangladesh            49.834083
Cambodia              47.902750
China                 61.785140
Hong Kong, China      73.492833
India                 53.166083
Indonesia             54.335750
Iran                  58.636583
Iraq                  56.581750
Israel                73.645833
Japan                 74.826917
Jordan                59.786417
Korea, Dem. Rep.      63.607333
Korea, Rep.           65.001000
Kuwait                68.922333
Lebanon               65.865667
Malaysia              64.279583
Mongolia              55.890333
Myanmar               53.321667
Nepal                 48.986333
Oman                  58.442667
Pakistan              54.882250
Philippines           60.967250
Saudi Arabia          58.678750
Singapore             71.220250
Sri Lanka             66.526083
Syria                 61.346167
Taiwan                70.336667
Thailand              62.200250
Vietnam               57.479500


In [29]:
# For Europe
print(data[data["continent"]=="Europe"].groupby("country")["life_exp"].mean())

country
Albania                   68.432917
Austria                   73.103250
Belgium                   73.641750
Bosnia and Herzegovina    67.707833
Bulgaria                  69.743750
Croatia                   70.055917
Czech Republic            71.510500
Denmark                   74.370167
Finland                   72.991917
France                    74.348917
Germany                   73.444417
Greece                    73.733167
Hungary                   69.393167
Iceland                   76.511417
Ireland                   73.017250
Italy                     74.013833
Montenegro                70.299167
Netherlands               75.648500
Norway                    75.843000
Poland                    70.176917
Portugal                  70.419833
Romania                   68.290667
Serbia                    68.551000
Slovak Republic           70.696083
Slovenia                  71.600750
Spain                     74.203417
Sweden                    76.177000
Switzerland         

In [30]:
df_cont = data.groupby("continent")["life_exp"].mean()
df_cont

continent
Africa      48.865330
Americas    64.658737
Asia        60.104438
Europe      71.903686
Oceania     74.326208
Name: life_exp, dtype: float64

In [31]:
if (df_cont["Asia"]>df_cont["Europe"]):
    print("Comparison between Asia and Europe in terms of Avg Life Exp:", df_cont["Asia"]-df_cont["Europe"], "and Asia is higher.")
else:
    print("Comparison between Europe and Asia in terms of Avg Life Exp:", df_cont["Europe"]-df_cont["Asia"],"and Europe is higher.")

Comparison between Europe and Asia in terms of Avg Life Exp: 11.79924846732149 and Europe is higher.
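
A single difference of means hides the spread within each continent. A minimal sketch that puts mean, minimum, and maximum side by side for the two continents, on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data (not from mckinsey.csv)
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe", "Africa"],
    "country": ["A", "B", "C", "D", "E"],
    "life_exp": [55.0, 65.0, 70.0, 80.0, 50.0],
})

# Keep only the two continents being compared
pair = toy[toy["continent"].isin(["Asia", "Europe"])]

# Summary statistics per continent
summary = pair.groupby("continent")["life_exp"].agg(["mean", "min", "max"])
gap = summary.loc["Europe", "mean"] - summary.loc["Asia", "mean"]

print(summary)
print("Europe minus Asia (mean):", gap)
```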


In [32]:
# How many countries never had a GDP per capita > $1000 in any year?
# A country qualifies only if its highest recorded GDP per capita is at most $1000.
never_above_1000 = data.groupby("country")["gdp_cap"].max() <= 1000
print(never_above_1000.sum(), "countries never had a GDP per capita > $1000 in any year.")
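
The question can be read two ways, and they give different counts: countries that were below $1000 in at least one year, and countries that never rose above $1000. A minimal sketch contrasting the two on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data: "A" stays below 1000, "B" crosses it, "C" is always above
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "year": [2002, 2007, 2002, 2007, 2002, 2007],
    "gdp_cap": [500.0, 800.0, 900.0, 1500.0, 2000.0, 3000.0],
})

# Reading 1: below the threshold in at least one year
ever_below = toy.loc[toy["gdp_cap"] < 1000, "country"].nunique()

# Reading 2: never above the threshold, i.e. the per-country maximum is at most 1000
max_gdp = toy.groupby("country")["gdp_cap"].max()
never_above = max_gdp[max_gdp <= 1000].index.tolist()

print("ever below:", ever_below)
print("never above:", never_above)
```

Country "B" is counted under the first reading but not the second, which is the one the question asks for.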


In [33]:
# Are there any countries with life expectancy below 40 years in 2007?

print((data[data["year"]==2007]["life_exp"]<40).sum(),"country with life expectancy below 40 years in 2007")

1 country with life expectancy below 40 years in 2007
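
The count alone does not say which country is affected. A minimal sketch that returns the matching rows, using a boolean mask with `.loc`; the `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data (not from mckinsey.csv)
toy = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "year": [2002, 2002, 2002, 2007, 2007, 2007],
    "life_exp": [38.0, 45.0, 70.0, 41.0, 39.5, 71.0],
})

# Rows from 2007 with life expectancy under 40 years
mask = (toy["year"] == 2007) & (toy["life_exp"] < 40)
below_40 = toy.loc[mask, ["country", "life_exp"]]
print(below_40)
```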
