![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-viz-of-the-week&branch=main&subPath=world-teachers-day/world-teachers-day.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Callysto’s Weekly Data Visualization

## World Teachers' Day

### Recommended Grade levels: 1-6
<br>

### Instructions
#### “Run” the cells to see the graphs
Click “Cell” and select “Run All”.<br> This will import the data and run all the code, so you can see this week's data visualization. Scroll to the top after you’ve run the cells.<br> 

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don’t need to do any coding to view the visualizations**.
The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer? 
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

# Question

<center><img src="img/love_to_learn.jpg" width=500 height=300><br>
<i style="font-size:9px;">Photo from <a href="https://unsplash.com/photos/WE_Kv_ZB1l0">Unsplash</a> by <a href="https://unsplash.com/@timmossholder">Tim Mossholder</a></i></center>

**World Teachers' Day** is a celebration of teachers around the globe, held annually on **October 5th**. It celebrates the positive impact teachers have made in transforming education and reflects on the support they provide to their students. More than 100 countries celebrate World Teachers' Day, though the date varies slightly. For example, India celebrates World Teachers' Day on September 5th, a month earlier than Canada. 

Since 1996, [UNESCO](https://www.unesco.org/en/days/teachers) has designed a campaign for teachers with a specialized theme each year. 

This year, the theme is **"The teachers we need for the education we want: The global imperative to reverse the teacher shortage"**, which aims to reverse an international decline in the number of teachers. 

### Goal
In this notebook, our objective is to examine changes in the number of teachers in Canada over the past decade by gathering and analyzing pertinent data.  

Following that, we will present students' survey responses regarding their experiences and sentiments related to their current or previous teachers. These survey findings will be visualized, possibly accompanied by selected quotes or testimonials, to highlight the extent of student appreciation.

# Gather

### Code:
The code below will import the Python programming libraries we need to gather and organize the data to answer our question.

In [None]:
# import basic data wrangling libraries
%pip install -q pyodide_http plotly nbformat nltk pyspellchecker openpyxl
import pyodide_http
pyodide_http.patch_all()
import pandas as pd
import numpy as np

# import NLP data wrangling libraries
import nltk
nltk.download('averaged_perceptron_tagger')
from spellchecker import SpellChecker

# import visualization libraries
from wordcloud import WordCloud
from wordcloud import STOPWORDS
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as io
import plotly.graph_objects as go
print('libraries imported.')

### Data:
We will focus on two datasets in this notebook. 

- ##### Statistics Canada
The dataset on the number of educators across Canadian provinces comes from the [Statistics Canada website](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3710001001&pickMembers%5B0%5D=2.1&pickMembers%5B1%5D=3.1&pickMembers%5B2%5D=4.1&cubeTimeFrame.startYear=2002+%2F+2003&cubeTimeFrame.endYear=2016+%2F+2017&referencePeriods=20020101%2C20160101) (2003-2017). It includes the number of full-time and part-time educators in each province, broken down by gender and age group. In our exploration, we divided the educators into three categories: *all genders*, *females*, and *males*. 

- ##### Callysto Survey on World Teachers' Day
The Callysto Team distributed a brief survey to students, offering a Google Form link for them to respond to questions about their favorite teachers. We emphasized the importance of maintaining anonymity when referring to a particular teacher and encouraged them to highlight the specific characteristics that made these teachers their favorites. 

### Import the data

##### Statistics Canada

In [None]:
male = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/world-teachers-day/data/male_teacher_count.csv")
female = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/world-teachers-day/data/female_teacher_count.csv")
all_genders = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/world-teachers-day/data/teacher_count.csv")

# Each message is followed by the dataframe it describes
print("This dataset is derived from Statistics Canada, on the number of full-time & part-time all-gender educators.")
display(all_genders.head())
print("This dataset is derived from Statistics Canada, on the number of full-time & part-time female educators.")
display(female.head())
print("This dataset is derived from Statistics Canada, on the number of full-time & part-time male educators.")
display(male.head())

##### Callysto Student Survey
This dataset includes responses to three short-answer questions: 

- What makes your teacher your *favourite*?

- What impact did they have on you?

- What makes them unique?

It also includes one multiple-choice question, "What would you describe your favourite teacher as?", for which students could choose multiple responses from a dropdown list.

In [None]:
student_df = pd.read_excel("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/world-teachers-day/data/Favourite_Teacher_Student_Feedback_(Responses).xlsx")
student_df.head()

### Comment on the data
##### Statistics Canada

In [None]:
print(f"Dataframe size for number of male educators in Canadian provinces: {male.shape[0]} rows and {male.shape[1]} columns.")
print(f"Dataframe size for number of female educators in Canadian provinces: {female.shape[0]} rows and {female.shape[1]} columns.")
print(f"Dataframe size for number of all educators in Canadian provinces: {all_genders.shape[0]} rows and {all_genders.shape[1]} columns.")

#### Callysto Student Survey

In [None]:
print(f"The Callysto Student Survey data consists of {student_df.shape[0]} student responses, answering {student_df.shape[1]-1} questions each.")

# Organize

The code below will arrange the data cleanly so that we can do analysis on it. This is a quality control step for our data and involves examining the data to detect anything odd with the data (e.g. structure, missing values), fixing the oddities, and checking if the fixes worked. 
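
As a rough illustration of the detect, fix, and check cycle described above, here is a minimal sketch. The sample rows and the choice to drop rows with missing values are assumptions made for illustration; they are not taken from the real datasets.

```python
import pandas as pd

# Hypothetical sample rows; the real tables come from the CSV files loaded above.
sample = pd.DataFrame({
    "REF_DATE": ["2002/2003", "2003/2004", "2004/2005"],
    "GEO": ["Alberta", "Alberta", "Alberta"],
    "VALUE": [100.0, None, 120.0],
})

# Structure: column names and data types.
print(sample.dtypes)

# Missing values: number of empty cells in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# One possible fix: drop rows without a value, then check that the fix worked.
cleaned = sample.dropna(subset=["VALUE"])
remaining_missing = int(cleaned["VALUE"].isna().sum())
print(remaining_missing)
```

Dropping rows is only one option; depending on the question, filling in or flagging missing values may be more appropriate.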

#### Statistics Canada

In [None]:
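def organize_df(df):
    # Work on a copy so the dataframe passed in is left unchanged
    df = df.copy()
    # Keep the part of REF_DATE after the last "/" and convert it to a calendar year
    df["Year"] = df["REF_DATE"].str.split("/").str[-1]
    df["Year"] = pd.DatetimeIndex(df["Year"]).year
    # Remove the national total so only the individual regions remain
    df = df[df["GEO"] != "Canada"]
    return df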

all_df = pd.concat([organize_df(all_genders), organize_df(female), organize_df(male)])
all_df.rename(columns={"VALUE": "Count"}, inplace=True)
all_df.head()
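
To make the `Year` step concrete, here is a small sketch of the same idea in plain Python. It assumes `REF_DATE` holds school-year labels shaped like `2002/2003`; the helper name `school_year_to_year` is hypothetical and is not used elsewhere in this notebook.

```python
def school_year_to_year(ref_date: str) -> int:
    """Return the ending calendar year of a 'YYYY/YYYY' school-year label."""
    # Take the text after the last "/" and turn it into a whole number.
    return int(ref_date.split("/")[-1])


# Hypothetical labels for illustration.
labels = ["2002/2003", "2016/2017"]
years = [school_year_to_year(label) for label in labels]
print(years)
```

Using the ending year means the 2002/2003 school year is plotted as 2003.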

#### Callysto Student Survey

In [None]:
columns = ["Timestamp", "Favourite", "Impact", "Unique", "Characteristics"]
student_df.columns = columns

def spell_check(df, col):
    spell_checker = SpellChecker()
    # Convert the column to text once, before looping over the rows
    df[col] = df[col].astype(str)

    for ind, text in df[col].items():
        final_list = []
        for word in text.split(" "):
            # correction() can return None when it finds no candidate; keep the original word then
            final_list.append(spell_checker.correction(word) or word)
        # Write the corrected sentence back once per row
        df.loc[ind, col] = " ".join(final_list)
    return df
    
col_list = student_df.columns[1:-1]

for column in col_list:
    spell_check(student_df, column)
    
student_df.head()

Using the `spellchecker` library, we replaced potential spelling mistakes with proper words.
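
To show the general idea behind word-by-word correction, here is a minimal sketch that uses Python's built-in `difflib` as a stand-in rather than the `spellchecker` library itself. The tiny vocabulary, the `0.8` similarity cutoff, and the helper names are all assumptions for illustration; a real spell checker relies on a full dictionary and word frequencies.

```python
import difflib

# Hypothetical mini-vocabulary; a real spell checker ships a full word list.
VOCABULARY = ["teacher", "patient", "funny", "kind", "helpful"]


def correct_word(word, vocabulary=VOCABULARY):
    """Return the closest vocabulary word, or the word unchanged if none is close."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else word


def correct_sentence(sentence):
    """Correct a sentence one word at a time and join the words back together."""
    return " ".join(correct_word(word) for word in sentence.split())


corrected = correct_sentence("my teachr is very pateint and kind")
print(corrected)
```

Words that are not close to anything in the vocabulary are left as they are, which mirrors keeping the original word when no correction is found.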

# Explore

The code below will be used to help us look for evidence to answer our question. This can involve looking at data in table format, applying math and statistics, and creating different types of visualizations to represent our data.

### Insight into the number of educators across Canadian provinces

In [None]:
colors=['#1f77b4',  # muted blue
        '#ff7f0e',  # safety orange
        '#2ca02c',  # cooked asparagus green
        '#d62728',  # brick red
        '#9467bd',  # muted purple
        '#8c564b',  # chestnut brown
        '#e377c2',  # raspberry yogurt pink
        '#17becf',  # blue-teal
        '#210240',  # dark purple
        '#21DC49',  # bright green
        '#3F5063',  # dark navy
        '#6C7075',  # dark grey
        '#F4BC1A']  # mustard

# Set a unique color for each Canadian province.
color_dict = dict(zip(all_df["GEO"].unique(), colors))
all_df["Color"] = all_df["GEO"].map(color_dict)

In [None]:
all_df_filtered = all_df[all_df["Sex"] == "Both sexes"]
teacher_fig = px.line(all_df_filtered, x="Year", y="Count", markers=True, color="GEO", height=500)
teacher_fig.update_layout(title={"text":"Historical Number of Full-Time & Part-Time Educators in Canada", "x":0.5})
teacher_fig.show()

The graph above is difficult to read, and it is hard to conclude from it whether the total number of educators is declining. Instead, we will compare the number of educators with the [student enrollment data (Statistics Canada)](https://www150.statcan.gc.ca/n1/daily-quotidien/221122/dq221122e-eng.htm) from 2003 to 2017. This enrollment data was manually compiled from provincial data retrieved from Statistics Canada. 

Note that for the comparison below, we focus only on **all genders (males and females combined)** to keep the analysis simple.

In [None]:
student_enrol_df = pd.read_csv("https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/world-teachers-day/data/all_enrollment.csv")
count_df = all_df[["GEO", "Count", "Year", "Sex"]]
count_df = count_df[count_df["Sex"] == "Both sexes"]
enrol_df = student_enrol_df[["GEO", "VALUE", "Year"]].rename(columns={"VALUE": "Enrollment"})
comb_df = count_df.merge(enrol_df, on=["GEO", "Year"])
comb_df.drop_duplicates(inplace=True)
comb_df["Ratio"] = comb_df["Count"] / comb_df["Enrollment"]
comb_df.head()
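
The merge-then-divide step can be sketched on a tiny example. The province names and numbers below are made up for illustration and are not Statistics Canada figures.

```python
import pandas as pd

# Hypothetical counts for illustration only.
teachers = pd.DataFrame({
    "GEO": ["Province A", "Province A", "Province B"],
    "Year": [2016, 2017, 2017],
    "Count": [50, 60, 30],
})
students = pd.DataFrame({
    "GEO": ["Province A", "Province A", "Province B"],
    "Year": [2016, 2017, 2017],
    "Enrollment": [1000, 1000, 600],
})

# The default inner merge keeps only province-year pairs present in both tables.
combined = teachers.merge(students, on=["GEO", "Year"])

# Teachers per enrolled student; a larger ratio means more teachers per student.
combined["Ratio"] = combined["Count"] / combined["Enrollment"]
ratios = combined["Ratio"].tolist()
print(combined)
```

Matching on both `GEO` and `Year` ensures each teacher count is divided by the enrollment for the same province and the same year.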

In [None]:
enrollment_fig = px.line(comb_df, x="Year", y="Ratio", markers=True, color="GEO")
enrollment_fig.update_layout(title={"text":"Teacher-Student Enrollment Ratio", "x":0.5},
                            height=500)
enrollment_fig.show()

Overall, we notice that the teacher-student enrollment ratio varies from province to province. From the graph, we can conclude that **British Columbia** experienced the greatest reduction in the teacher-student enrollment ratio, while **Newfoundland and Labrador** showed the greatest increase. 
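
One way to check a reading like this numerically is to compare each province's ratio in the first and last year. The sketch below uses made-up ratios and hypothetical province names; it illustrates the approach and does not reproduce the notebook's results.

```python
import pandas as pd

# Hypothetical ratios for illustration only.
ratio_df = pd.DataFrame({
    "GEO": ["Province A", "Province A", "Province B", "Province B"],
    "Year": [2003, 2017, 2003, 2017],
    "Ratio": [0.060, 0.050, 0.055, 0.070],
})

# Sort by year so "first" and "last" mean the earliest and latest year.
ordered = ratio_df.sort_values("Year")
first_last = ordered.groupby("GEO")["Ratio"].agg(["first", "last"])
first_last["Change"] = first_last["last"] - first_last["first"]

# The most negative change is the largest drop; the most positive is the largest rise.
largest_drop = first_last["Change"].idxmin()
largest_rise = first_last["Change"].idxmax()
print(first_last)
print(largest_drop, largest_rise)
```

Comparing only the first and last years ignores what happens in between, so it complements the line graph rather than replacing it.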

### Callysto Student Survey Analysis
#### Bar Graph

In [None]:
def count_values(df, col):
    count = {}
    # Each response is a comma-separated list of the characteristics a student selected
    for item in df[col].dropna().astype(str):
        for ind_item in item.split(", "):
            # Start at 0 for a new characteristic, then add 1
            count[ind_item] = count.get(ind_item, 0) + 1
    return count

count_dict = count_values(student_df, "Characteristics")
count_df = pd.DataFrame.from_dict(count_dict, orient="index").reset_index()
count_df.columns = ["Characteristic", "Count"]
count_df = count_df.sort_values(by="Count")
count_df.head()
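
The same tally can also be sketched with `collections.Counter` from Python's standard library. The responses below are made up for illustration; each string mimics one student's multi-select answer.

```python
from collections import Counter

# Hypothetical responses; each string mimics one multi-select answer.
responses = ["Kind, Funny", "Funny, Patient", "Funny"]

# Split every response on commas and count each choice once per mention.
counts = Counter(
    choice.strip()
    for response in responses
    for choice in response.split(",")
)
print(counts.most_common())
```

Because one student can pick several characteristics, the counts can add up to more than the number of students.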

In [None]:
student_bar = px.bar(count_df, x="Count", y="Characteristic", 
                     title="What makes your teacher a 'Favourite Teacher'?", text_auto='.2s')
student_bar.update_layout(width=800, height=800, title=dict(x=0.5))
student_bar.show()

#### WordCloud
Word cloud images are generated from student answers to the following questions:
- What makes your favourite teacher *favourite*?
- What *impact* did they have on you?
- What makes them *unique*?

To generate a meaningful wordcloud image, we extracted all **adjectives** from student comments and created our images based on that.

In [None]:
def create_wordcloud(df, col):
    print(f"Displaying word cloud for {col}.")
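    # Tag every word in each response with its part of speech and keep only the adjectives ("JJ")
    all_text = " ".join(
        word
        for text in df[col].astype(str)
        for word, tag in nltk.pos_tag(text.split())
        if tag == "JJ"
    )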
    stopwords = list(STOPWORDS) + ["teacher", "teachers", "students", "make", "class"]
    word_cloud = WordCloud(stopwords=stopwords, background_color="white").generate(all_text)

    plt.figure(figsize=(7,7))
    plt.imshow(word_cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
    
for column in col_list:
    create_wordcloud(student_df, column)

# Communicate
Below are some writing prompts to help you reflect on the new information that is presented from the data. When we look at the evidence, think about what you perceive about the information. Is this perception based on what the evidence shows? If others were to view it, what perceptions might they have?

- I used to think ____________________ but now I think ____________________. 
- I wish I knew more about ____________________. 
- This visualization reminds me of ____________________. 
- I really like ____________________.

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)