![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-viz-of-the-week&branch=master&subPath=world-children's-day/world-children's-day.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Callysto's Weekly Data Visualization


## World Children's Day

### Recommended Grade levels: 

### Instructions

Click "Cell" and select "Run All".

This will import the data and run all the code, so you can see this week's data visualization. Scroll back to the top after you’ve run the cells.

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don't need to do any coding to view the visualizations**.

The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

## Question

What percentage of the global population is aged 14 and under, and how many children worldwide are out of primary school? What can we interpret from this information?

### Goal

Our goal in this notebook is to uncover trends in the global population, focusing on the percentage of people aged 14 and under and the number of children worldwide who are out of primary school. Specifically, we want to see whether these figures are rising or falling, and find out whether educational efforts are being made to help children attend primary school.

### Background

The well-being and education of younger generations play a pivotal role in shaping the future of our global community.  Children represent our future leaders, innovators, and caretakers, and their success contributes to the ongoing cycle of progress.

This background sets the stage for our exploration, emphasizing the significance of understanding and addressing trends in the global population, particularly those related to the percentage of individuals aged 14 and under, and the educational status of children worldwide.

## Gather

Our data comes from [the World Bank](https://data.worldbank.org/). The indicator "Population ages 0-14 (% of total population)" is sourced from the United Nations Population Division and can be found [here](https://data.worldbank.org/indicator/SP.POP.0014.TO.ZS). The indicator "Children out of school, primary" is sourced from the UNESCO Institute for Statistics and can be found [here](https://data.worldbank.org/indicator/SE.PRM.UNER).

### Code: 

Run the code cells below to import the libraries we need for this project. Libraries are pre-made code that make it easier to analyze our data.

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
try:
    import pycountry_convert as pc
except ImportError:
    # Install the library only if it is not already available
    !pip install pycountry_convert
    import pycountry_convert as pc
import ipywidgets
from ipywidgets import interact
import geopandas as gpd
import folium 

print("Libraries imported")

In [None]:
percentage_pop = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/world-children's-day/percentageofpopchildren.csv", skiprows=4)
primary_school = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/world-children's-day/outofprimaryschool.csv", skiprows=4)
del percentage_pop['Unnamed: 67']
del primary_school['Unnamed: 67']
display(percentage_pop, primary_school)
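
The `skiprows=4` argument and the deleted `Unnamed: 67` column both follow from how these CSV files are laid out: a few metadata lines sit above the header, and each row ends with a trailing comma that pandas reads as one extra, empty column. The sketch below reproduces that on a tiny made-up file. The indicator name, the values, and the shortened year range are invented for illustration, so the empty column is called `Unnamed: 6` here rather than `Unnamed: 67`.

```python
import io
import pandas as pd

# Made-up file in the same layout: four lines of metadata,
# then the header, then one row per country or region.
raw = (
    '"Data Source","World Development Indicators",\n'
    '\n'
    '"Last Updated Date","2023-01-01",\n'
    '\n'
    '"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961",\n'
    '"Aruba","ABW","Example indicator","EX.IND","10.0","11.0",\n'
    '"World","WLD","Example indicator","EX.IND","20.0","21.0",\n'
)

# Skip the four metadata lines so the fifth line becomes the header
sample = pd.read_csv(io.StringIO(raw), skiprows=4)
columns_before = sample.columns.tolist()
print(columns_before)

# The trailing comma creates a column with no values; drop every all-empty column
sample = sample.dropna(axis=1, how='all')
print(sample)
```

Dropping all-empty columns with `dropna(axis=1, how='all')` is one alternative to deleting the column by name. It does not depend on how many year columns the file has, but it would also remove a year column that contains no data at all.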

In [None]:
def country_code_to_continent(country_code):
    # Convert a three-letter country code (e.g. 'ABW') into a continent name
    mapping = pc.map_country_alpha3_to_country_name()
    try:
        country_name = mapping[country_code]
        country_alpha2 = pc.country_name_to_country_alpha2(country_name)
        country_continent_code = pc.country_alpha2_to_continent_code(country_alpha2)
        country_continent_name = pc.convert_continent_code_to_continent_name(country_continent_code)
        return country_continent_name
    except Exception:
        # Codes that cannot be placed on a continent (e.g. 'WLD' for World) return None
        return None

print(country_code_to_continent('ABW'))

In [None]:
country_codes = list(percentage_pop['Country Code'])
print(country_codes)

In [None]:
# Look up the continent for every row; codes without a continent get None
percentage_pop['Continent'] = percentage_pop['Country Code'].map(country_code_to_continent)
percentage_pop

In [None]:
pd.set_option('display.max_rows', None)
countries_without_label = percentage_pop[percentage_pop['Continent'].isna()].reset_index(drop=True)
temp = list(countries_without_label['Country Name'])
display(temp)
pd.reset_option('display.max_rows')

In [None]:
world_df = percentage_pop[percentage_pop['Country Name'] == 'World']
world_df

In [None]:
percentage_pop = percentage_pop.dropna(subset=['Continent'])
percentage_pop
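
The order of the last few cells matters: the "World" row has no continent, so it has to be saved into `world_df` before the rows without a continent are dropped. Here is a minimal sketch of that sequence on a made-up table. The four rows are invented for illustration, and "High income" stands in for the kind of aggregate row that typically ends up without a continent.

```python
import pandas as pd

# Made-up table: two countries and two aggregate rows without a continent
toy = pd.DataFrame({
    'Country Name': ['Aruba', 'World', 'Canada', 'High income'],
    'Continent': ['North America', None, 'North America', None],
})

# 1. List the rows that did not get a continent
unlabelled = toy[toy['Continent'].isna()]
print(unlabelled['Country Name'].tolist())

# 2. Save the World row first, because step 3 would remove it
world_row = toy[toy['Country Name'] == 'World']

# 3. Keep only the rows that have a continent
countries_only = toy.dropna(subset=['Continent'])
print(countries_only['Country Name'].tolist())
```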

In [None]:
cols_to_check = percentage_pop.columns[4:-1]
hashmap_countries_max = {}
for column in cols_to_check:
    max_val = percentage_pop[column].max()
    results = percentage_pop.loc[percentage_pop[column] == max_val]
    
    country_name = results['Country Name'].values[0]
    hashmap_countries_max[country_name] = 1+hashmap_countries_max.get(country_name, 0)
    
    print(f"{country_name} - {column}: {max_val}%")

In [None]:
cols_to_check = percentage_pop.columns[4:-1]
hashmap_countries_min = {}
for column in cols_to_check:
    min_val = percentage_pop[column].min()
    results = percentage_pop.loc[percentage_pop[column] == min_val]
    
    country_name = results['Country Name'].values[0]
    hashmap_countries_min[country_name] = 1+hashmap_countries_min.get(country_name, 0)
    
    print(f"{country_name} - {column}: {min_val}%")

In [None]:
hashmap_countries_max = {k: v for k, v in sorted(hashmap_countries_max.items(), key=lambda item: item[1])}

print(f"Total number of unique countries for max value: {len(hashmap_countries_max)}")
for country in hashmap_countries_max:
    print(f"{country}, Total count: {hashmap_countries_max[country]}")

print('\n')
hashmap_countries_min = {k: v for k, v in sorted(hashmap_countries_min.items(), key=lambda item: item[1])}

print(f"Total number of unique countries for min val: {len(hashmap_countries_min)}")
for country in hashmap_countries_min:
    print(f"{country}, Total count: {hashmap_countries_min[country]}")
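
The loops above count how often each country holds the highest or lowest value in a year. As a cross-check, the same tally can be sketched with pandas built-ins. This is only an illustrative alternative, not the notebook's method, and the three countries `A`, `B`, `C` and their values are made up.

```python
import pandas as pd

# Made-up percentages for three hypothetical countries
toy = pd.DataFrame({
    'Country Name': ['A', 'B', 'C'],
    '1960': [40.0, 45.0, 30.0],
    '1961': [41.0, 44.0, 29.0],
    '1962': [46.0, 43.0, 28.0],
}).set_index('Country Name')

# idxmax / idxmin return, for each year column, the country with the highest / lowest value
top_per_year = toy.idxmax()
bottom_per_year = toy.idxmin()

# value_counts tallies how many years each country appears in
max_counts = top_per_year.value_counts()
min_counts = bottom_per_year.value_counts()

print(max_counts)
print(min_counts)
```

When two countries tie, `idxmax` and `idxmin` return the first one in the table, which matches the loops above taking the first matching row.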

In [None]:
years = []
percentages = []

for column in cols_to_check:
    years.append(int(column))
    percentages.append(world_df[column].values[0])

px.line(x=years, y=percentages, labels={'x': 'Year', 'y': 'Percentage'}, title='World Population Percentage Aged 0-14, 1960-2022').show()

In [None]:
individual_continents = percentage_pop[['Continent'] + list(cols_to_check)]

continents_avg = individual_continents.groupby('Continent').mean().reset_index()
continents_avg

In [None]:
continental_melted = pd.melt(continents_avg, id_vars=['Continent'], value_vars=cols_to_check, var_name='Year', value_name='Percentage')

px.line(continental_melted, x='Year', y='Percentage', color='Continent', labels={'x': 'Year', 'y': 'Percentage'}, title='Average Continental Population Percentage Aged 0-14, 1960-2022').show()
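
The `pd.melt` call above reshapes the table from "wide" (one column per year) to "long" (one row per continent and year), which is the shape Plotly Express expects when drawing one line per continent. A small sketch of that reshaping, using made-up numbers for two continents and two years:

```python
import pandas as pd

# Wide format: one row per continent, one column per year (values are made up)
wide = pd.DataFrame({
    'Continent': ['Africa', 'Europe'],
    '1960': [43.0, 27.0],
    '1961': [43.5, 26.5],
})

# Long format: one row for each continent/year pair
long = pd.melt(
    wide,
    id_vars=['Continent'],        # columns to keep as they are
    value_vars=['1960', '1961'],  # columns to stack into rows
    var_name='Year',              # new column holding the old column names
    value_name='Percentage',      # new column holding the values
)
print(long)
```

The `Year` column holds the old column names, so its values are text such as `'1960'`. If numeric years are needed, `long['Year'].astype(int)` converts them.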

In [None]:
columns_to_plot = percentage_pop.columns[4:-1]

continents = ['North America', 'Asia', 'Europe', 'Oceania', 'Africa', 'South America']

continent_dropdown = ipywidgets.Dropdown(options=continents, description='Continent')
def update_plot(continent):
    continent_filtered = percentage_pop[percentage_pop['Continent'] == continent]

    per_year_df = pd.melt(continent_filtered, id_vars=['Country Name'], value_vars=columns_to_plot, var_name='Year', value_name='Percentage')

    world_avg_df = pd.melt(world_df, id_vars='Country Name', value_vars=columns_to_plot, var_name='Year', value_name='Percentage')
    # Set the country name for the world average
    world_avg_df['Country Name'] = 'World'  

    final_df = pd.concat([per_year_df, world_avg_df], ignore_index=True)

    continental_fig = px.line(final_df, x='Year', y='Percentage', color='Country Name', line_group='Country Name', hover_name='Country Name')
    continental_fig.update_layout(title=f'Countries in {continent}: Population Percentage Aged 0-14', xaxis_title='Year', yaxis_title='Percentage', legend_title='Country', height=800).show()

interact(update_plot, continent=continent_dropdown)

In [None]:
countries_geojson = gpd.read_file('https://raw.githubusercontent.com/callysto/data-files/main/SocialStudies/UnitedNations/countries.geojson')
countries_geojson

In [None]:
merged_df = pd.merge(countries_geojson, percentage_pop, left_on='ISO_A3', right_on='Country Code', how='left')
merged_df
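
Because the merge uses `how='left'`, every shape from the map file is kept, and any shape whose code has no match in the data gets empty values and will show up without data on the map. One way to spot those rows is the `indicator=True` option of `pd.merge`. The sketch below uses plain DataFrames with made-up rows in place of the real GeoDataFrame, and "Atlantis" with code `XXX` is a hypothetical entry.

```python
import pandas as pd

# Stand-in for the map file: one row per shape (made-up rows)
shapes = pd.DataFrame({
    'ADMIN': ['Canada', 'France', 'Atlantis'],
    'ISO_A3': ['CAN', 'FRA', 'XXX'],
})

# Stand-in for the population table (made-up values)
data = pd.DataFrame({
    'Country Code': ['CAN', 'FRA', 'WLD'],
    '2022': [15.0, 17.0, 25.0],
})

# indicator=True adds a '_merge' column saying where each row was found
checked = pd.merge(shapes, data, left_on='ISO_A3', right_on='Country Code',
                   how='left', indicator=True)

# 'left_only' marks shapes that found no matching country code
unmatched = checked.loc[checked['_merge'] == 'left_only', 'ADMIN'].tolist()
print(unmatched)
```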

In [None]:
percentage_country_map = ipywidgets.Output(layout={'border': '1px solid black'})

column_names = merged_df.columns[7:-1].tolist()
dropdown_options = ipywidgets.Dropdown(
    options=column_names,
    value=column_names[0],
    description='Column:',
    disabled=False
)

def update_choropleth(change):
    percentage_country_map.clear_output()
    with percentage_country_map:
        m = folium.Map()
        folium.Choropleth(
            geo_data=countries_geojson,
            data=merged_df,
            columns=['ADMIN', dropdown_options.value],  
            key_on='feature.properties.ADMIN',  
            fill_color='YlGn',
            fill_opacity=0.7,
            line_opacity=0.2,
            legend_name=f'{dropdown_options.value}',
        ).add_to(m)
        display(m)

dropdown_options.observe(update_choropleth, names='value')
display(dropdown_options)
update_choropleth({'new': column_names[0]})

percentage_country_map

In [None]:
primary_school

In [None]:
total_none = primary_school.isnull().sum().sum()
print(f"Total number of missing values: {total_none}")

In [None]:
# primary_school has no 'Continent' column, so the year columns run to the end of the table
cols_to_check = primary_school.columns[4:]
cols_to_check

In [None]:
# Count the year columns that actually contain a value for each row
non_null_counts = primary_school[cols_to_check].count(axis=1)

# Keep only the rows where at least half of the year columns have data
filtered_primary_school = primary_school[non_null_counts >= len(cols_to_check) / 2]
filtered_primary_school = filtered_primary_school.reset_index(drop=True)
display(filtered_primary_school)
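
The filter above relies on `count(axis=1)`, which counts the values that are present in each row rather than the missing ones. A minimal sketch on a made-up table with three hypothetical countries shows the counts and the resulting filter, along with `dropna(thresh=...)` as an equivalent built-in:

```python
import math

import numpy as np
import pandas as pd

# Made-up table: A has 3 of 4 years, B has 2 of 4, C has 1 of 4
toy = pd.DataFrame({
    'Country Name': ['A', 'B', 'C'],
    '2019': [1.0, np.nan, np.nan],
    '2020': [2.0, 5.0, np.nan],
    '2021': [np.nan, np.nan, np.nan],
    '2022': [4.0, 6.0, 7.0],
})
year_cols = ['2019', '2020', '2021', '2022']

# count() ignores missing values, so this is the number of years with data
years_with_data = toy[year_cols].count(axis=1)
print(years_with_data.tolist())

# Keep rows where at least half of the years have data
kept = toy[years_with_data >= len(year_cols) / 2]

# Equivalent built-in: thresh is the minimum number of non-missing values required
min_years = math.ceil(len(year_cols) / 2)
kept_with_dropna = toy.dropna(subset=year_cols, thresh=min_years)

print(kept['Country Name'].tolist())
```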

In [None]:
country_dropdown = ipywidgets.Dropdown(options=filtered_primary_school['Country Name'].unique(), description='Country')

def update_plot(country):
    country_data = filtered_primary_school[filtered_primary_school['Country Name'] == country]
    # These values are counts of children, not percentages
    melted_country_data = pd.melt(country_data, id_vars=['Country Name'], value_vars=cols_to_check, var_name='Year', value_name='Number of Children')
    px.line(melted_country_data, x='Year', y='Number of Children', title=f'Number of Children out of Primary School in {country}, 1960-2022').show()
    
interact(update_plot, country=country_dropdown)

In [None]:
# Use the last 23 year columns (2000-2022) for both the most recent value and the average
years_to_check = filtered_primary_school.columns[-23:]

country_names = []
recent_years = []
recent_values = []
average_values = []

for index, row in filtered_primary_school.iterrows():
    # Keep only the years that have data, then take the latest one
    available = row[years_to_check].dropna()
    if available.empty:
        recent_year, recent_value = None, None
    else:
        recent_year, recent_value = int(available.index[-1]), available.iloc[-1]

    country_names.append(row['Country Name'])
    recent_years.append(recent_year)
    recent_values.append(recent_value)
    # mean() skips missing years
    average_values.append(row[years_to_check].astype(float).mean())

comparison_fig = go.Figure()

comparison_fig.add_trace(go.Bar(x=country_names, y=recent_values, text=recent_years, hovertemplate='Year: %{text}<br>Number of Children out of Primary School: %{y}', name='Most Recent Year', marker_color='blue'))
comparison_fig.add_trace(go.Bar(x=country_names, y=average_values, name='2000-2022 Average', marker_color='orange'))

comparison_fig.update_layout(title='Children out of Primary School: Most Recent Year vs. 2000-2022 Average', xaxis_title='Country', yaxis_title='Number of Children out of Primary School', barmode='group', height=800)

comparison_fig.show()
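
The loop above searches backwards through the years to find each country's most recent value. The same idea can be sketched without a loop using pandas built-ins. This is an illustrative alternative on made-up numbers for two hypothetical countries, not the notebook's own method.

```python
import numpy as np
import pandas as pd

# Made-up counts: A last reported in 2021, B last reported in 2020
toy = pd.DataFrame({
    'Country Name': ['A', 'B'],
    '2020': [100.0, 50.0],
    '2021': [90.0, np.nan],
    '2022': [np.nan, np.nan],
}).set_index('Country Name')

# Label of the last column that holds a value, row by row
recent_year = toy.apply(lambda row: row.last_valid_index(), axis=1)

# Fill gaps from left to right, so the final column carries the latest known value
recent_value = toy.ffill(axis=1).iloc[:, -1]

# mean() skips missing years by default
average_value = toy.mean(axis=1)

print(recent_year.tolist())
print(recent_value.tolist())
print(average_value.tolist())
```

Because missing years are skipped, an average for a country with few reported years rests on only those years, which is worth keeping in mind when comparing bars between countries.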