# Literacy Around the World

We want you to create a notebook and:  
- Load the data from the CSV, list the columns, and get the number of rows and columns
- Describe the data: what are the best and worst country/year combinations?
- Filter the data for the year 1980 and order the result by decreasing value
- Group the data by year: what's the mean value for 1980? What's the difference from 2000?

In [115]:
%pip install pandas



In [116]:
import pandas as pd

## Import CSV

In [117]:
literacy_df = pd.read_csv('data/literacy.csv', encoding='ISO-8859-1')

# List the columns and get the number of rows and columns

In [118]:
print(literacy_df.columns)
print(f'Rows:{literacy_df.shape[0]}, Columns:{literacy_df.shape[1]}')

Index(['Entity', 'Code', 'Year',
       'Youth literacy rate, population 15-24 years, male (%)',
       'Youth literacy rate, population 15-24 years, female (%)',
       'Total population (Gapminder, HYDE & UN)', 'Continent'],
      dtype='object')
Rows:56255, Columns:7
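The task also asks to describe the data, which no cell above does directly. A minimal sketch using `DataFrame.describe()` follows. On the real data you would simply call `literacy_df.describe()`; here, so the snippet runs on its own, `df` is a hypothetical four-row toy frame with made-up values, and `MALE`/`FEMALE` are illustrative shorthands for the two long column names.

```python
import pandas as pd

# Shorthands for the notebook's long column names (illustrative only).
MALE = "Youth literacy rate, population 15-24 years, male (%)"
FEMALE = "Youth literacy rate, population 15-24 years, female (%)"

# Hypothetical toy rows; the values are invented, NOT taken from data/literacy.csv.
df = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Year": [1980, 2000, 1980, 2000],
    MALE: [60.0, 80.0, None, 90.0],
    FEMALE: [40.0, 70.0, None, 85.0],
})

# describe() reports count (non-missing values), mean, std, min, quartiles and max.
# Missing values are excluded from every statistic.
summary = df[[MALE, FEMALE]].describe()
print(summary)
```

The `count` row is worth checking on the real data: it shows how many of the 56255 rows actually carry a literacy rate.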


# Best female literacy rate per country/year combo

In [119]:
best_f_rate = (
    literacy_df.groupby(["Entity", "Year"])[
        "Youth literacy rate, population 15-24 years, female (%)"
    ]
    .max()
    .sort_values(ascending=False)
)
print(f'Best female countries/year combo is:\n{best_f_rate.head(7)}')

Best female countries/year combo is:
Entity      Year
Uzbekistan  2014    100.0
Andorra     2015    100.0
Uzbekistan  2015    100.0
Azerbaijan  2007    100.0
Uzbekistan  2013    100.0
Greenland   2015    100.0
Andorra     2016    100.0
Name: Youth literacy rate, population 15-24 years, female (%), dtype: float64
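All seven rows printed above sit at 100.0, so at least seven combinations tie for first place and `head(7)` may cut the tie short. A sketch that returns every tied combination uses `Series.nlargest(1, keep="all")`; on the real data that would be `best_f_rate.nlargest(1, keep="all")`. The `rates` Series below is a hypothetical stand-in with invented values, indexed by `Entity`/`Year` like the grouped result.

```python
import pandas as pd

FEMALE = "Youth literacy rate, population 15-24 years, female (%)"

# Hypothetical toy Series (invented values), shaped like best_f_rate.
rates = pd.Series(
    [100.0, 100.0, 99.5, 100.0, 87.0],
    index=pd.MultiIndex.from_tuples(
        [("X", 2014), ("Y", 2015), ("X", 2010), ("Z", 2007), ("Y", 1990)],
        names=["Entity", "Year"],
    ),
    name=FEMALE,
)

# keep="all" returns every entry tied with the top value,
# even if that makes the result longer than n.
best = rates.nlargest(1, keep="all")
print(best)
print(f"{len(best)} entity/year combinations share the maximum of {rates.max()}")
```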


# Best male literacy rate per country/year combo

In [120]:
best_m_rate = (
    literacy_df.groupby(["Entity", "Year"])[
        "Youth literacy rate, population 15-24 years, male (%)"
    ]
    .max()
    .sort_values(ascending=False)
)
print(f'Best male countries/year combo is:\n{best_m_rate.head(7)}')

Best male countries/year combo is:
Entity      Year
Andorra     2015    100.0
Greenland   2015    100.0
Uzbekistan  2015    100.0
            2014    100.0
            2013    100.0
Andorra     2016    100.0
Azerbaijan  2007    100.0
Name: Youth literacy rate, population 15-24 years, male (%), dtype: float64


# Worst female literacy rate per country/year combo

In [121]:
worst_f_rate = (
    literacy_df.groupby(["Entity", "Year"])[
        "Youth literacy rate, population 15-24 years, female (%)"
    ]
    .min()
    .sort_values(ascending=True)
)
print(f'Worst female countries/year combo is:\n{worst_f_rate.head(1)}')

Worst female countries/year combo is:
Entity        Year
Burkina Faso  1975    6.66406
Name: Youth literacy rate, population 15-24 years, female (%), dtype: float64


# Worst male literacy rate per country/year combo

In [122]:
worst_m_rate = (
    literacy_df.groupby(["Entity", "Year"])[
        "Youth literacy rate, population 15-24 years, male (%)"
    ]
    .min()
    .sort_values(ascending=True)
)
print(f'Worst male countries/year combo is:\n{worst_m_rate.head(1)}')

Worst male countries/year combo is:
Entity        Year
Burkina Faso  1975    22.377131
Name: Youth literacy rate, population 15-24 years, male (%), dtype: float64
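The four cells above differ only in the column and in `max` versus `min`. One possible way to collapse them is a small helper built on `idxmax`/`idxmin`, sketched below. The helper name `best_and_worst` and the toy `df` (invented values) are hypothetical. Note that `idxmax`/`idxmin` skip NaN and return only the first label when several rows tie, so a tie at 100.0 collapses to a single combination.

```python
import pandas as pd

MALE = "Youth literacy rate, population 15-24 years, male (%)"
FEMALE = "Youth literacy rate, population 15-24 years, female (%)"

# Hypothetical toy data; values are invented for illustration only.
df = pd.DataFrame({
    "Entity": ["P", "P", "Q", "Q"],
    "Year": [1975, 2015, 1975, 2015],
    MALE: [22.0, 99.0, 35.0, 100.0],
    FEMALE: [7.0, 98.0, 20.0, 100.0],
})


def best_and_worst(frame: pd.DataFrame, column: str):
    """Return ((entity, year), value) for the highest and lowest rate in `column`.

    NaN rates are ignored; on ties the first matching row wins.
    """
    rates = frame.set_index(["Entity", "Year"])[column]
    best_key, worst_key = rates.idxmax(), rates.idxmin()
    return (best_key, rates.loc[best_key]), (worst_key, rates.loc[worst_key])


for label, col in [("male", MALE), ("female", FEMALE)]:
    (b_key, b_val), (w_key, w_val) = best_and_worst(df, col)
    print(f"{label}: best {b_key} = {b_val}, worst {w_key} = {w_val}")
```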


# Filter for 1980 and order by decreasing value

In [123]:
filter_by_year = literacy_df[literacy_df["Year"]==1980]
filter_by_year

|  | Entity | Code | Year | Youth literacy rate, population 15-24 years, male (%) | Youth literacy rate, population 15-24 years, female (%) | Total population (Gapminder, HYDE & UN) | Continent |
|---|---|---|---|---|---|---|---|
| 219 | Afghanistan | AFG | 1980 |  |  | 1.335650e+07 |  |
| 477 | Africa |  | 1980 |  |  | 4.763862e+08 |  |
| 741 | Albania | ALB | 1980 |  |  | 2.682697e+06 |  |
| 1000 | Algeria | DZA | 1980 |  |  | 1.922166e+07 |  |
| 1038 | American Samoa | ASM | 1980 | 97.888618 | 97.443649 | 3.264800e+04 |  |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 55171 | Western Sahara | ESH | 1980 |  |  | 1.508600e+05 |  |
| 55460 | World | OWID_WRL | 1980 |  |  | 4.458275e+09 |  |
| 55691 | Yemen | YEM | 1980 |  |  | 7.941903e+06 |  |
| 55957 | Zambia | ZMB | 1980 |  |  | 5.851818e+06 |  |
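The cell above filters for 1980 but does not yet order the result. A sketch of the missing step chains `sort_values(..., ascending=False)` onto the filter. Which column to sort by is an assumption here (the female rate); the male column works the same way, and passing a list sorts by both. The toy `df` is hypothetical, with invented values.

```python
import pandas as pd

FEMALE = "Youth literacy rate, population 15-24 years, female (%)"

# Hypothetical toy rows; values are invented for illustration only.
df = pd.DataFrame({
    "Entity": ["A", "B", "C", "D"],
    "Year": [1980, 1980, 1980, 2000],
    FEMALE: [97.4, None, 55.0, 99.0],
})

# Keep only 1980 rows, then order by the female rate from highest to lowest.
# na_position="last" (also the default) pushes entities with no rate to the end.
year_1980 = (
    df[df["Year"] == 1980]
    .sort_values(FEMALE, ascending=False, na_position="last")
)
print(year_1980)
```

Since most 1980 rows shown above have no literacy rate, adding `.dropna(subset=[FEMALE])` before sorting may give a more readable result.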


# Group the female data by year 

In [124]:
group_by_year_f = literacy_df.groupby("Year")["Youth literacy rate, population 15-24 years, female (%)"].mean().sort_values(ascending=True)
group_by_year_f

Year
1979    45.026871
1975    59.208149
1976    62.695742
1984    68.524131
1988    69.231857
          ...    
2017          NaN
2018          NaN
2019          NaN
2020          NaN
2021          NaN
Name: Youth literacy rate, population 15-24 years, female (%), Length: 259, dtype: float64

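# What's the mean value for year 1980? What's the difference from 2000?
- mean 1980 = 85.979322
- mean 2000 = 82.30936
- difference (1980 - 2000) = 3.669962

Because `mean()` skips missing values, each year's figure averages only the entities that reported a female rate in that year.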
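The 1980 and 2000 values above can be read from the grouped Series by label with `.loc` instead of scanning the sorted output, and the difference computed directly. A minimal sketch on a hypothetical toy frame (invented values) follows; on the real data, `group_by_year_f.loc[1980]` works the same way. It also counts how many entities contribute to each year's mean, since that count can change between years.

```python
import pandas as pd

FEMALE = "Youth literacy rate, population 15-24 years, female (%)"

# Hypothetical toy data; values are invented for illustration only.
df = pd.DataFrame({
    "Entity": ["A", "B", "C", "A", "B", "C"],
    "Year": [1980, 1980, 1980, 2000, 2000, 2000],
    FEMALE: [90.0, 80.0, None, 70.0, 60.0, 95.0],
})

grouped = df.groupby("Year")[FEMALE]
yearly_mean = grouped.mean()   # NaN rates are skipped
reporting = grouped.count()    # number of non-missing rates per year

mean_1980 = yearly_mean.loc[1980]
mean_2000 = yearly_mean.loc[2000]
print(f"1980: {mean_1980:.2f} ({reporting.loc[1980]} entities)")
print(f"2000: {mean_2000:.2f} ({reporting.loc[2000]} entities)")
print(f"difference (1980 - 2000): {mean_1980 - mean_2000:.2f}")
```

In the toy data, entity C only enters the 2000 average, which illustrates why a year-over-year difference in means can partly reflect which entities reported rather than a real change.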

# Group the male data by year

In [126]:
group_by_year_m = literacy_df.groupby("Year")["Youth literacy rate, population 15-24 years, male (%)"].mean().sort_values(ascending=True)
group_by_year_m

Year
1979    65.682600
1975    70.815209
1976    75.574111
1978    79.717470
1988    81.270839
          ...    
2017          NaN
2018          NaN
2019          NaN
2020          NaN
2021          NaN
Name: Youth literacy rate, population 15-24 years, male (%), Length: 259, dtype: float64

# What's the mean value for year 1980? What's the difference from 2000?
- mean 1980 = 90.240170
- mean 2000 = 88.327283
- difference (1980 - 2000) = 1.912887
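The female and male cells repeat the same groupby. A sketch that averages both rate columns in one pass and compares 1980 with 2000 side by side follows. The toy `df` is hypothetical with invented values; on the real data you would replace it with `literacy_df`.

```python
import pandas as pd

MALE = "Youth literacy rate, population 15-24 years, male (%)"
FEMALE = "Youth literacy rate, population 15-24 years, female (%)"

# Hypothetical toy data; values are invented for illustration only.
df = pd.DataFrame({
    "Entity": ["A", "B", "C", "A", "B", "C"],
    "Year": [1980, 1980, 1980, 2000, 2000, 2000],
    MALE: [95.0, 85.0, None, 80.0, 70.0, 99.0],
    FEMALE: [90.0, 80.0, None, 70.0, 60.0, 95.0],
})

# One row per year, one column per rate; NaN values are skipped in each mean.
means = df.groupby("Year")[[MALE, FEMALE]].mean()
comparison = means.loc[[1980, 2000]]

# Column-wise difference, 1980 minus 2000.
change = comparison.loc[1980] - comparison.loc[2000]
print(comparison)
print(change)
```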