# Task 2: Mind the gap
---

In this exercise, we'll practice some basic Pandas operations on the [Gapminder dataset](https://www.gapminder.org/about/). Gapminder is an educational foundation that uses data to describe global trends in health and socioeconomics without bias. It is a great source of geographical, socioeconomic, and health data, a subset of which we'll explore here. Specifically, this exercise works with a dataframe containing the following features:

|     | country     | continent | year | lifeExp | pop      | gdpPercap  |
|:---:|:-----------:|:---------:|:----:|:-------:|:--------:|:----------:|
|  0  | Afghanistan | Asia      | 1952 | 28.801  | 8425333  | 779.445314 |
|  1  | Afghanistan | Asia      | 1957 | 30.332  | 9240934  | 820.853030 |
|  2  | Afghanistan | Asia      | 1962 | 31.997  | 10267083 | 853.100710 |
|  3  | Afghanistan | Asia      | 1967 | 34.020  | 11537966 | 836.197138 |
|  4  | Afghanistan | Asia      | 1972 | 36.088  | 13079460 | 739.981106 |
| ... |     ...     |    ...    | ...  |   ...   |   ...    |    ...     |
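
As a quick check on what each column holds, the five rows shown in the table can be typed into a small dataframe by hand and inspected with `.dtypes`. This is a minimal sketch: `sample` is a hypothetical name, and the values are copied from the table above rather than loaded from the real file.

```python
import pandas as pd

# The first five rows of the table above, typed in by hand (not read from the real file)
sample = pd.DataFrame(
    {
        "country": ["Afghanistan"] * 5,
        "continent": ["Asia"] * 5,
        "year": [1952, 1957, 1962, 1967, 1972],
        "lifeExp": [28.801, 30.332, 31.997, 34.020, 36.088],
        "pop": [8425333, 9240934, 10267083, 11537966, 13079460],
        "gdpPercap": [779.445314, 820.853030, 853.100710, 836.197138, 739.981106],
    }
)

# Rows x columns, then the dtype pandas infers for each column
print(sample.shape)  # (5, 6)
print(sample.dtypes)
```

`year` and `pop` come out as integer columns and `lifeExp` and `gdpPercap` as floats, while `country` and `continent` hold strings.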

In [2]:
```python
import pandas as pd
```

## 2.1

Read the Gapminder dataset into a dataframe called `df` from this URL: <https://raw.githubusercontent.com/jstaf/gapminder/master/gapminder/gapminder.csv>
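
`pd.read_csv` accepts a file-like object as well as a path or URL, so the parsing step can be tried offline on a few rows before downloading the full file. The sketch below assumes the file's header matches the columns listed in the table above; `csv_text` and `df_sample` are hypothetical names.

```python
import io

import pandas as pd

# A tiny in-memory CSV; the header is assumed to match the columns in the table above
csv_text = """country,continent,year,lifeExp,pop,gdpPercap
Afghanistan,Asia,1952,28.801,8425333,779.445314
Afghanistan,Asia,1957,30.332,9240934,820.853030
"""

# io.StringIO wraps the string so read_csv can treat it like an open file
df_sample = pd.read_csv(io.StringIO(csv_text))
print(df_sample.shape)  # (2, 6)
print(df_sample.columns.tolist())
```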

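In [4]:
```python
# Your solution here
# Load the CSV straight from GitHub into a dataframe named `df`, as the task asks
df = pd.read_csv('https://raw.githubusercontent.com/jstaf/gapminder/master/gapminder/gapminder.csv')
df
```

|      | country     | continent | year | lifeExp | pop      | gdpPercap  |
|:----:|:-----------:|:---------:|:----:|:-------:|:--------:|:----------:|
|  0   | Afghanistan | Asia      | 1952 | 28.801  | 8425333  | 779.445314 |
|  1   | Afghanistan | Asia      | 1957 | 30.332  | 9240934  | 820.853030 |
|  2   | Afghanistan | Asia      | 1962 | 31.997  | 10267083 | 853.100710 |
|  3   | Afghanistan | Asia      | 1967 | 34.020  | 11537966 | 836.197138 |
|  4   | Afghanistan | Asia      | 1972 | 36.088  | 13079460 | 739.981106 |
| ...  |     ...     |    ...    | ...  |   ...   |   ...    |    ...     |
| 1699 | Zimbabwe    | Africa    | 1987 | 62.351  | 9216418  | 706.157306 |
| 1700 | Zimbabwe    | Africa    | 1992 | 60.377  | 10704340 | 693.420786 |
| 1701 | Zimbabwe    | Africa    | 1997 | 46.809  | 11404948 | 792.449960 |
| 1702 | Zimbabwe    | Africa    | 2002 | 39.989  | 11926563 | 672.038623 |


## 2.2

Which continent has the most observations in the Gapminder dataset? You can leave your answer as the output of a dataframe operation (*hint*: `.value_counts()`).

In [5]:
```python
# Your solution here
# Count rows per continent; value_counts() sorts the counts in descending order
counts = df["continent"].value_counts()
print(counts)
# idxmax() returns the label (here, the continent) with the largest count
print("The continent with the most observations is", counts.idxmax())
```

```
Africa      624
Asia        396
Europe      360
Americas    300
Oceania      24
Name: continent, dtype: int64
The continent with the most observations is Africa
```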
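
To see what `.value_counts()` and `.idxmax()` do in isolation, here is a sketch on a hypothetical toy Series standing in for `df["continent"]`. Passing `normalize=True` returns each label's share of the rows instead of raw counts.

```python
import pandas as pd

# Hypothetical toy column standing in for df["continent"]
continents = pd.Series(["Africa", "Asia", "Africa", "Europe", "Africa", "Asia"])

counts = continents.value_counts()                # rows per label, largest first
shares = continents.value_counts(normalize=True)  # the same, as fractions of all rows

print(counts.idxmax())             # Africa
print(round(shares["Africa"], 2))  # 0.5
```

Because `value_counts()` sorts in descending order, `counts.index[0]` names the same label as `counts.idxmax()` here.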


## 2.3

What are the minimum and maximum life expectancies in the dataset, and in which countries and years did they occur? (*hint*: `.argmin()`/`.argmax()`, or `.idxmin()`/`.idxmax()`)

In [6]:
```python
# Your solution here
# idxmin()/idxmax() return the index *label* of the extreme value
min_lifeExp_index = df['lifeExp'].idxmin()
max_lifeExp_index = df['lifeExp'].idxmax()

# Retrieve the rows with the minimum and maximum life expectancies (label-based, so .loc)
min_lifeExp_row = df.loc[min_lifeExp_index]
max_lifeExp_row = df.loc[max_lifeExp_index]

# Display the results
print(f"Minimum Life Expectancy:\n{min_lifeExp_row}\n")
print(f"Maximum Life Expectancy:\n{max_lifeExp_row}")
```

```
Minimum Life Expectancy:
country          Rwanda
continent        Africa
year               1992
lifeExp          23.599
pop             7290203
gdpPercap    737.068595
Name: 1292, dtype: object

Maximum Life Expectancy:
country            Japan
continent           Asia
year                2007
lifeExp           82.603
pop            127467972
gdpPercap    31656.06806
Name: 803, dtype: object
```


In [7]:
```python
# argmin()/argmax() return the integer *position* of the extreme value,
# so they pair with position-based .iloc rather than label-based .loc
min_lifeExp_pos = df['lifeExp'].argmin()
max_lifeExp_pos = df['lifeExp'].argmax()

min_lifeExp_row = df.iloc[min_lifeExp_pos]
max_lifeExp_row = df.iloc[max_lifeExp_pos]

# Display the results
print(f"Minimum Life Expectancy:\n{min_lifeExp_row}\n")
print(f"Maximum Life Expectancy:\n{max_lifeExp_row}")
```

```
Minimum Life Expectancy:
country          Rwanda
continent        Africa
year               1992
lifeExp          23.599
pop             7290203
gdpPercap    737.068595
Name: 1292, dtype: object

Maximum Life Expectancy:
country            Japan
continent           Asia
year                2007
lifeExp           82.603
pop            127467972
gdpPercap    31656.06806
Name: 803, dtype: object
```
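
The two approaches find the same rows here only because `df` has the default `RangeIndex`, where each row's label equals its position. The hypothetical toy frame below uses non-default index labels to show why `argmin()` pairs with `.iloc` and `idxmin()` pairs with `.loc`.

```python
import pandas as pd

# Hypothetical frame whose index labels do NOT match row positions
toy = pd.DataFrame(
    {"country": ["A", "B", "C"], "lifeExp": [50.0, 30.0, 70.0]},
    index=[10, 20, 30],
)

pos = toy["lifeExp"].argmin()    # position of the minimum: 1
label = toy["lifeExp"].idxmin()  # index label of the minimum: 20

print(toy.iloc[pos]["country"])   # B  (position -> .iloc)
print(toy.loc[label]["country"])  # B  (label -> .loc)
# toy.loc[pos] would raise KeyError, because 1 is not one of the index labels
```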


## 2.4

How much larger is the total population in this dataset in 2007 than in 1952? You can give your answer as a float, e.g., "the population is 1.8 times larger in 2007 than in 1952." (*hint*: you can use `.query()` to subset the dataframe for 1952 and then calculate the `.sum()` of the population, then repeat for 2007.)

In [8]:
```python
# Your solution here
# Total population across all countries in each year
population_1952 = df.query('year == 1952')['pop'].sum()
population_2007 = df.query('year == 2007')['pop'].sum()

# Ratio of the two yearly totals
population_ratio = population_2007 / population_1952

print(f"The population is {population_ratio:.1f} times larger in 2007 than in 1952.")
```

```
The population is 2.6 times larger in 2007 than in 1952.
```
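
The same ratio can also be computed with a single `groupby`, which totals the population for every year at once instead of querying each year separately. This is a sketch on hypothetical toy numbers, not the Gapminder data.

```python
import pandas as pd

# Hypothetical toy data: two countries observed in 1952 and 2007
toy = pd.DataFrame(
    {
        "country": ["X", "Y", "X", "Y"],
        "year": [1952, 1952, 2007, 2007],
        "pop": [100, 300, 260, 740],
    }
)

# One groupby yields the total population for every year
pop_by_year = toy.groupby("year")["pop"].sum()
ratio = pop_by_year.loc[2007] / pop_by_year.loc[1952]
print(f"{ratio:.1f}")  # 2.5
```

With the real `df`, `pop_by_year` would hold a total for every survey year, so other pairs of years can be compared without re-querying.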


## 2.5

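What is the mean life expectancy of countries with the highest 50% of `gdpPercap` and of countries with the lowest 50% of `gdpPercap`? Note that each row is a country-year observation, so the split is over rows rather than over distinct countries. (*hint*: try combining `.query()` and `.median()`)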

In [9]:
```python
# Your solution here
# The median splits the rows into an upper and a lower half by GDP per capita
median_gdpPercap = df['gdpPercap'].median()

# `@name` lets query() reference a local Python variable;
# rows exactly equal to the median fall into the lower half here
mean_lifeExp_high_gdp = df.query('gdpPercap > @median_gdpPercap')['lifeExp'].mean()
mean_lifeExp_low_gdp = df.query('gdpPercap <= @median_gdpPercap')['lifeExp'].mean()

print(f"Mean life expectancy for countries with the highest 50% of GDP per capita: {mean_lifeExp_high_gdp:.2f}")
print(f"Mean life expectancy for countries with the lowest 50% of GDP per capita: {mean_lifeExp_low_gdp:.2f}")
```

```
Mean life expectancy for countries with the highest 50% of GDP per capita: 68.93
Mean life expectancy for countries with the lowest 50% of GDP per capita: 50.02
```
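
An alternative to two `.query()` calls is to group by the boolean "above the median" mask itself, which produces both means in one step. This sketch uses hypothetical toy values; as in the cell above, rows exactly at the median land in the lower half.

```python
import pandas as pd

# Hypothetical toy rows (country-year observations)
toy = pd.DataFrame(
    {
        "gdpPercap": [500.0, 1000.0, 2000.0, 8000.0],
        "lifeExp": [45.0, 55.0, 65.0, 75.0],
    }
)

median_gdp = toy["gdpPercap"].median()  # 1500.0 for these four values
above = toy["gdpPercap"] > median_gdp   # True for the upper half

# Group by the mask (False = at or below the median, True = above), then relabel
mean_by_half = (
    toy.groupby(above)["lifeExp"].mean().rename({False: "low", True: "high"})
)
print(mean_by_half)
```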
