![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcallysto-sample-notebooks&branch=master&subPath=notebooks/Digital_Citizenship/Exam_data_provincial_results.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

In [1]:
%%html

<script>
  function code_toggle() {
    if (code_shown){
      $('div.input').hide('500');
      $('#toggleButton').val('Show Code')
    } else {
      $('div.input').show('500');
      $('#toggleButton').val('Hide Code')
    }
    code_shown = !code_shown
  }
  
  $( document ).ready(function(){
    code_shown=false;
    $('div.input').hide()
  });
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from numpy import nan as Nan
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

## Analysing Diploma Exam Data - Provincial Results

 <img src="images/Alberta_education.jpg" width="200px" align="right"/>
 Provincial diploma exam results are located here: https://education.alberta.ca/diploma-exam-administration/diploma-results/?searchMode=3 .

We will download the provincial results spreadsheet, diploma-multiyear-province-annual.xlsx:

In [3]:
provincial_results = pd.read_excel('https://education.alberta.ca/media/3680581/diploma-multiyear-province-annual.xlsx')  

Examine the contents of the dataset (the `head` function returns the first 5 rows).

In [4]:
provincial_results.head()

Unnamed: 0,Diploma Course,2013 Prov Students Writing,2013 Prov School Mark % Exc,2013 Prov School Mark % Acc,2013 Prov School Average %,2013 Prov School Standard Deviation %,2013 Prov Exam Mark % Exc,2013 Prov Exam Mark Exc Sig,2013 Prov Exam Mark % Acc,2013 Prov Exam Mark Acc Sig,...,2017 Prov School Mark % Acc,2017 Prov School Average %,2017 Prov School Standard Deviation %,2017 Prov Exam Mark % Exc,2017 Prov Exam Mark Exc Sig,2017 Prov Exam Mark % Acc,2017 Prov Exam Mark Acc Sig,2017 Prov Exam Average %,2017 Prov Exam Standard Deviation %,Unnamed: 56
0,Biology 30,22429.0,42.9,96.0,74.4,13.9,32.2,+,84.4,+,...,97.0,76.3,13.7,32.3,=,84.2,-,68.7,16.9,
1,Chemistry 30,16159.0,41.9,95.4,74.0,13.9,31.8,+,78.8,+,...,97.5,77.3,13.5,38.6,+,83.1,+,70.1,18.9,
2,English Lang Arts 30-1,29034.0,30.7,97.1,72.1,11.8,10.4,=,85.9,=,...,97.9,73.0,11.6,11.7,=,86.5,=,64.0,12.8,
3,English Lang Arts 30-2,15383.0,11.8,93.8,65.4,11.7,10.9,+,89.4,=,...,95.6,66.8,11.5,11.4,-,89.5,=,65.4,12.2,
4,Français 30-1,154.0,44.8,100.0,76.5,10.1,18.2,,96.8,,...,97.4,75.9,10.9,18.6,,98.1,,70.9,10.2,


The dataset needs to be reshaped to make the analysis easier. We will create a separate column for the year and one column for each result. Let's examine the column names:

In [5]:
provincial_results.columns.values[1:-1]

array(['2013 Prov Students Writing', '2013 Prov School Mark % Exc',
       '2013 Prov School Mark % Acc', '2013 Prov School Average %',
       '2013 Prov School Standard Deviation %',
       '2013 Prov Exam Mark % Exc', '2013 Prov Exam Mark Exc Sig',
       '2013 Prov Exam Mark % Acc', '2013 Prov Exam Mark Acc Sig',
       '2013 Prov Exam Average %', '2013 Prov Exam Standard Deviation %',
       '2014 Prov Students Writing', '2014 Prov School Mark % Exc',
       '2014 Prov School Mark % Acc', '2014 Prov School Average %',
       '2014 Prov School Standard Deviation %',
       '2014 Prov Exam Mark % Exc', '2014 Prov Exam Mark Exc Sig',
       '2014 Prov Exam Mark % Acc', '2014 Prov Exam Mark Acc Sig',
       '2014 Prov Exam Average %', '2014 Prov Exam Standard Deviation %',
       '2015 Prov Students Writing', '2015 Prov School Mark % Exc',
       '2015 Prov School Mark % Acc', '2015 Prov School Average %',
       '2015 Prov School Standard Deviation %',
       '2015 Prov Exam Mark % Ex

From the column names we can extract all the years and result types:

In [6]:
years = []
stats = []
for value in provincial_results.columns.values[1:-1]:
    year = value[0:4]
    stat = value[5:]
    int(year)  # sanity check: raises ValueError if the prefix is not a numeric year
    if year not in years:
        years.append(year)
    if stat not in stats:
        stats.append(stat)
print(years)
print(stats)

['2013', '2014', '2015', '2016', '2017']
['Prov Students Writing', 'Prov School Mark % Exc', 'Prov School Mark % Acc', 'Prov School Average %', 'Prov School Standard Deviation %', 'Prov Exam Mark % Exc', 'Prov Exam Mark Exc Sig', 'Prov Exam Mark % Acc', 'Prov Exam Mark Acc Sig', 'Prov Exam Average %', 'Prov Exam Standard Deviation %']
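
The slicing in the cell above assumes that every column name starts with a four-digit year followed by a single space. A slightly more defensive variant is sketched below on a hand-copied sample of column names. The helper name `split_year_stat` is hypothetical and not part of the notebook; it splits at the first space and skips names that do not start with a year:

```python
def split_year_stat(column_name):
    """Split '2013 Prov Exam Average %' into ('2013', 'Prov Exam Average %').

    Returns None when the name does not start with a four-digit year.
    """
    year, _, stat = column_name.partition(" ")
    if len(year) == 4 and year.isdigit() and stat:
        return year, stat
    return None


# A small sample of the column names printed above (not the full list)
sample_columns = [
    "Diploma Course",
    "2013 Prov Students Writing",
    "2013 Prov Exam Average %",
    "2014 Prov Students Writing",
    "Unnamed: 56",
]

years_found, stats_found = [], []
for name in sample_columns:
    parsed = split_year_stat(name)
    if parsed is None:
        continue  # not a "<year> <statistic>" column
    year, stat = parsed
    if year not in years_found:
        years_found.append(year)
    if stat not in stats_found:
        stats_found.append(stat)

print(years_found)
print(stats_found)
```

With this approach the `[1:-1]` slice is no longer needed, because columns such as `Diploma Course` and `Unnamed: 56` are skipped by the check itself.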


Reshaping the dataset and checking the first 5 rows:

In [7]:
# Build one row per (course, year) pair, skipping the last spreadsheet row.
# DataFrame.append was removed in pandas 2.0, so the rows are collected in a
# list of dicts and turned into a DataFrame in a single step.
rows = []
for ind, row in provincial_results.iloc[:-1].iterrows():
    for year in years:
        new_row = {'Diploma Course': row['Diploma Course'], 'Year': year}
        for stat in stats:
            new_row[stat] = row[year + " " + stat]
        rows.append(new_row)
provincial_results_reshaped = pd.DataFrame(rows, columns=['Diploma Course', 'Year'] + stats)
provincial_results_reshaped.head()

Unnamed: 0,Diploma Course,Year,Prov Students Writing,Prov School Mark % Exc,Prov School Mark % Acc,Prov School Average %,Prov School Standard Deviation %,Prov Exam Mark % Exc,Prov Exam Mark Exc Sig,Prov Exam Mark % Acc,Prov Exam Mark Acc Sig,Prov Exam Average %,Prov Exam Standard Deviation %
0,Biology 30,2013,22429,42.9,96.0,74.4,13.9,32.2,+,84.4,+,68.8,16.5
1,Biology 30,2014,21733,43.6,96.2,74.7,13.8,31.8,+,85.2,+,68.9,16.6
2,Biology 30,2015,21257,45.3,96.4,75.2,13.8,33.0,+,85.8,+,69.4,16.5
3,Biology 30,2016,22550,47.0,97.1,75.9,13.7,32.4,=,85.1,=,69.1,16.8
4,Biology 30,2017,22993,48.4,97.0,76.3,13.7,32.3,=,84.2,-,68.7,16.9
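
The same wide-to-long reshape can also be expressed with pandas' built-in `melt` and `pivot` instead of an explicit loop. The sketch below is only an illustration on a tiny excerpt of the values shown above (two courses, two years, two statistics); the variable names `wide`, `long` and `tidy` are made up for this example:

```python
import pandas as pd

# Tiny excerpt of the wide table, copied from the head() output above
wide = pd.DataFrame({
    "Diploma Course": ["Biology 30", "Chemistry 30"],
    "2013 Prov School Mark % Acc": [96.0, 95.4],
    "2013 Prov School Average %": [74.4, 74.0],
    "2017 Prov School Mark % Acc": [97.0, 97.5],
    "2017 Prov School Average %": [76.3, 77.3],
})

# Wide -> long: one row per (course, original column name)
long = wide.melt(id_vars="Diploma Course", var_name="column", value_name="value")

# Split '2013 Prov School Average %' at the first space into year and statistic
long[["Year", "Stat"]] = long["column"].str.split(" ", n=1, expand=True)

# Long -> tidy: one row per (course, year), one column per statistic
tidy = (
    long.pivot(index=["Diploma Course", "Year"], columns="Stat", values="value")
        .reset_index()
)
tidy.columns.name = None  # drop the leftover 'Stat' label on the column axis

print(tidy)
```

On the full spreadsheet the `Sig` columns contain text, so the melted `value` column would hold mixed types, and the numeric columns would likely need a `pd.to_numeric` conversion after pivoting.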


Displaying diploma courses:

In [8]:
provincial_results_reshaped['Diploma Course'].drop_duplicates()

0                 Biology 30
5               Chemistry 30
10    English Lang Arts 30-1
15    English Lang Arts 30-2
20             Français 30-1
25     French Lang Arts 30-1
30          Mathematics 30-1
35          Mathematics 30-2
40                Physics 30
45                Science 30
50       Social Studies 30-1
55       Social Studies 30-2
Name: Diploma Course, dtype: object

Choosing five courses and making a 3D plot of the number of students writing each exam over the five years.
(Try changing the `subjects` list to another set of courses.)

In [9]:
init_notebook_mode(connected=True)         # initiate notebook for offline plot
subjects = ['Biology 30', 'Chemistry 30', 'English Lang Arts 30-1', 'Social Studies 30-1', 'Mathematics 30-1']
fill_colors = ['#66c2a5', '#fc8d62', '#8da0cb', '#e78ac3', '#a6d854']
gf = provincial_results_reshaped.groupby('Diploma Course')

data = []

for subject, fill_color in zip(subjects[::-1], fill_colors):
    group = gf.get_group(subject)
    years = group['Year'].tolist()
    length = len(years)
    subject_coords = [subject] * length
    pop = group['Prov Students Writing'].tolist()
    zeros = [0] * length
    
    data.append(dict(
        type='scatter3d',
        mode='lines',
        x=years + years[::-1] + [years[0]],  # year loop: in incr. order then in decr. order then years[0]
        y=subject_coords * 2 + [subject_coords[0]],
        z=pop + zeros + [pop[0]],
        name='',
        surfaceaxis=1, # add a surface axis ('1' refers to axes[1] i.e. the y-axis)
        surfacecolor=fill_color,
        #surfacecolor="white",
        #opacity=0.8,
        line=dict(
            color='black',
            #color=fill_color,
            width=4
        ),
    ))

layout = dict(
    title='Number of students taking course by year',
    showlegend=False,
    scene=dict(
        xaxis=dict(title=''),
        yaxis=dict(title=''),
        zaxis=dict(title=''),
        camera=dict(
            eye=dict(x=-1.7, y=-1.7, z=0.5)
        )
    )
)

fig = dict(data=data, layout=layout)

iplot(fig)
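
Each filled "ribbon" in the plot above is a closed outline: it runs along the data from the first year to the last, returns along the baseline (z = 0) in reverse order, and ends back at the starting point. As a small sketch, here is how the three coordinate lists are built for one course, using the Biology 30 values from the reshaped table:

```python
# Biology 30, 'Prov Students Writing' for 2013-2017 (from the table above)
years = ['2013', '2014', '2015', '2016', '2017']
pop = [22429, 21733, 21257, 22550, 22993]
subject = 'Biology 30'

n = len(years)

# x: years forward (top edge), years backward (baseline), then the start again
x = years + years[::-1] + [years[0]]
# y: the course name, repeated once per point
y = [subject] * (2 * n + 1)
# z: student counts along the top edge, zeros along the baseline, then the start
z = pop + [0] * n + [pop[0]]

for point in zip(x, y, z):
    print(point)
```

The first and last points are identical, which closes the outline so that `surfaceaxis` can fill the area it encloses.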

Let's choose one course and print the results for this course:

In [10]:
diploma_course='Biology 30'

In [11]:
result = provincial_results_reshaped[provincial_results_reshaped['Diploma Course'] == diploma_course]
result

Unnamed: 0,Diploma Course,Year,Prov Students Writing,Prov School Mark % Exc,Prov School Mark % Acc,Prov School Average %,Prov School Standard Deviation %,Prov Exam Mark % Exc,Prov Exam Mark Exc Sig,Prov Exam Mark % Acc,Prov Exam Mark Acc Sig,Prov Exam Average %,Prov Exam Standard Deviation %
0,Biology 30,2013,22429,42.9,96.0,74.4,13.9,32.2,+,84.4,+,68.8,16.5
1,Biology 30,2014,21733,43.6,96.2,74.7,13.8,31.8,+,85.2,+,68.9,16.6
2,Biology 30,2015,21257,45.3,96.4,75.2,13.8,33.0,+,85.8,+,69.4,16.5
3,Biology 30,2016,22550,47.0,97.1,75.9,13.7,32.4,=,85.1,=,69.1,16.8
4,Biology 30,2017,22993,48.4,97.0,76.3,13.7,32.3,=,84.2,-,68.7,16.9


Let's check how many students achieved a "standard of excellence" relative to the number of students taking this course:

In [12]:
trace2 = go.Bar(x=result['Year'], y=result["Prov Students Writing"]/100*(100-result["Prov Exam Mark % Exc"]),
                name='Other students who wrote the exam')
trace1 = go.Bar(x=result['Year'], y=result["Prov Students Writing"]/100*result["Prov Exam Mark % Exc"],
                name='Students who achieved the standard of excellence')

data = [trace1, trace2]
layout = go.Layout(
    barmode='stack',
    title=diploma_course,
)

fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='stacked-bar')     
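
The bar heights above come from converting a percentage back into a head count: students writing × percentage ÷ 100. A short worked example with the Biology 30 figures from the table follows. Rounding to whole students is an addition made here for readability; the notebook plots the unrounded values:

```python
# Biology 30 figures from the table above
years = ['2013', '2014', '2015', '2016', '2017']
students_writing = [22429, 21733, 21257, 22550, 22993]
pct_excellence = [32.2, 31.8, 33.0, 32.4, 32.3]   # 'Prov Exam Mark % Exc'

counts = []
for year, n, pct in zip(years, students_writing, pct_excellence):
    achieved = round(n * pct / 100)   # students at the standard of excellence
    rest = n - achieved               # everyone else who wrote the exam
    counts.append((achieved, rest))
    print(year, achieved, rest)
```

For 2013, for example, 22429 × 32.2 / 100 ≈ 7222 students reached the standard of excellence, leaving 15207 who did not.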

Let's create a bar chart comparing the number of students who took the course, the number who achieved the acceptable standard, and the number who achieved the standard of excellence.

In [13]:
trace1 = go.Bar(x=result['Year'], y=result["Prov Students Writing"],
                name='Students who took the course', marker=dict(color='#59606D'))

trace2 = go.Bar(x=result['Year'], y=result["Prov Students Writing"]/100*result["Prov Exam Mark % Acc"],
                name='Students who achieved the acceptable standard', marker=dict(color='#ffcdd2'))

trace3 = go.Bar(x=result['Year'], y=result["Prov Students Writing"]/100*result["Prov Exam Mark % Exc"],
                name='Students who achieved the standard of excellence', marker=dict(color='#A2D5F2'))


data = [trace1, trace2, trace3]
layout = go.Layout(title=diploma_course,
                xaxis=dict(title='Year'),
                yaxis=dict(title='Number of students'))
fig = go.Figure(data=data, layout=layout)

iplot(fig)

Try generating the same plots for the other subjects by changing `diploma_course` and re-running the cells above.

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)