# Example Notebook

(Last updated: May 22, 2023)

You can also create content with Jupyter Notebooks.
This means that you can include code blocks and their outputs in your book.
In this notebook, we show some examples of loading and plotting data.
See [this documentation](https://jupyterbook.org/en/stable/content/executable/index.html) for how to write executable content.

In [2]:
# Import packages
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Load data

You can put your data in the same directory as the notebook file, or in a nearby folder such as `../data` as this notebook does, and then use `pandas` to load it.

In [6]:
# Both CSV files are semicolon-separated
arbeidsmarkt = pd.read_csv("../data/arbeidsmarkt.csv", sep=';')
uurloon = pd.read_csv("../data/uurloon.csv", sep=';')

# Convert the value columns to numbers; entries that cannot be parsed become NaN
uurloon['Uurloon werknemers na verlaten ho (euro)'] = pd.to_numeric(uurloon['Uurloon werknemers na verlaten ho (euro)'], errors='coerce')
arbeidsmarkt['Uitstromers ho (aantal)'] = pd.to_numeric(arbeidsmarkt['Uitstromers ho (aantal)'], errors='coerce')
# Replace missing values with 0 (this applies to every column of arbeidsmarkt)
arbeidsmarkt.fillna(0, inplace=True)

# Keep only the measurement taken directly after leaving education
peilmoment1 = arbeidsmarkt.loc[arbeidsmarkt['Peilmoment'] == 'Direct na verlaten onderwijs']

# Total number of leavers per sex and field of study
verdeling = peilmoment1.groupby(['Geslacht', 'Studierichting'])['Uitstromers ho (aantal)'].sum().reset_index()
verdeling.rename(columns={'Uitstromers ho (aantal)': 'Total studierichting'}, inplace=True)
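
The loading cell above relies on `errors='coerce'` to turn unparseable entries into `NaN` before filling them with 0. The sketch below shows that behaviour on a tiny in-memory table. The column name `Aantal` and all values are invented for illustration and are not taken from the real files.

```python
import io

import pandas as pd

# Hypothetical stand-in for a semicolon-separated file such as arbeidsmarkt.csv;
# the column names and values are invented for illustration.
raw = io.StringIO(
    "Geslacht;Studierichting;Aantal\n"
    "Vrouwen;Recht;120\n"
    "Mannen;Recht;.\n"
    "Mannen;Techniek;300\n"
)

df = pd.read_csv(raw, sep=";")

# Entries that are not numbers (here ".") become NaN instead of raising an error.
df["Aantal"] = pd.to_numeric(df["Aantal"], errors="coerce")
n_missing = int(df["Aantal"].isna().sum())

# Replace the missing counts with 0, but only in the count column.
df["Aantal"] = df["Aantal"].fillna(0)

totals = df.groupby("Geslacht")["Aantal"].sum().to_dict()
print(n_missing, totals)
```

Filling only the count column, rather than the whole data frame, keeps missing values in text columns from silently turning into 0. Whether that matters for the real data is an assumption worth checking.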

In [7]:
labels = []
parents = []
values = []

unique_geslacht = set()

label_map = {
    'Recht': 'Law',
    'Gedrag en Maatschappij': 'Behaviour and society',
    'Gezondheidszorg': 'Healthcare',
    'Onderwijs': 'Education',
    'Economie': 'Economics',
    'Natuur': 'Nature',
    'Techniek': 'Engineering',
    'Sectoroverstijgend': 'Other',
    'Landbouw en natuurlijke omgeving': 'Agriculture',
    'Taal en cultuur': 'Language & culture',
    'Vrouwen': 'Women',
    'Mannen': 'Men'
}

for _, row in verdeling.iterrows():
    geslacht = row['Geslacht']
    studierichting = row['Studierichting']
    aantal = row['Total studierichting']

    # Add each sex once as a root sector (empty parent)
    if label_map[geslacht] not in unique_geslacht:
        labels.append(label_map[geslacht])
        parents.append("")
        # Hard-coded totals per sex; with branchvalues='total' each root value
        # has to cover the sum of its child sectors
        if geslacht == 'Vrouwen':
            values.append(44000.0)
        elif geslacht == 'Mannen':
            values.append(35170.0)
        unique_geslacht.add(label_map[geslacht])

    # Add the field of study as a child sector; the symbol keeps labels unique across sexes
    if geslacht == 'Mannen':
        labels.append(label_map[studierichting] + ' ♂')
    elif geslacht == 'Vrouwen':
        labels.append(label_map[studierichting] + ' ♀')
    parents.append(label_map[geslacht])
    values.append(aantal)

fig = go.Figure(go.Sunburst(
    labels=labels,
    parents=parents,
    values=values,
    branchvalues='total',
    insidetextfont=dict(size=20),
))

fig.update_layout(
    title='Study subject by sex',
    margin=dict(t=50, l=25, r=25, b=25),
    height=600
)

fig.show()

The sunburst chart shows which degree subjects women and men study most, so we can compare how these choices differ between the sexes. To follow this up, we will look at the difference in wage between sexes per study subject.
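
The sunburst cell above hard-codes the totals for the two root sectors. As a minimal sketch, assuming the root value should simply be the sum of its children, the totals could instead be derived from the rows themselves. The helper `build_sunburst_inputs` and the numbers below are hypothetical and only illustrate the idea.

```python
def build_sunburst_inputs(rows, symbols):
    """Return (labels, parents, values) with root totals computed from the rows.

    rows: list of (group, subject, count) tuples.
    symbols: dict mapping each group to a suffix that keeps leaf labels unique.
    """
    # First pass: sum the children so each root matches branchvalues='total'.
    totals = {}
    for group, _, count in rows:
        totals[group] = totals.get(group, 0) + count

    labels, parents, values = [], [], []
    # Root sectors, one per group, with an empty parent.
    for group, total in totals.items():
        labels.append(group)
        parents.append("")
        values.append(total)
    # Leaf sectors, one per (group, subject) pair.
    for group, subject, count in rows:
        labels.append(f"{subject} {symbols[group]}")
        parents.append(group)
        values.append(count)
    return labels, parents, values


# Invented numbers, for illustration only.
rows = [("Women", "Law", 10), ("Women", "Healthcare", 30), ("Men", "Law", 5)]
labels, parents, values = build_sunburst_inputs(rows, {"Women": "♀", "Men": "♂"})
print(labels)
print(parents)
print(values)
```

The three lists can be passed to `go.Sunburst` in the same way as in the cell above. Computing the totals avoids a mismatch between root and child values if the underlying data changes.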

In [9]:
uurloongem = pd.read_csv("../data/uurloongem.csv", sep=';')
hbo_bachelor = uurloongem.loc[uurloongem['Uitstromers ho met en zonder diploma'] == 'Hbo-bachelor']
hbo_master = uurloongem.loc[uurloongem['Uitstromers ho met en zonder diploma'] == 'Hbo-master']
wo_bachelor = uurloongem.loc[uurloongem['Uitstromers ho met en zonder diploma'] == 'Wo-bachelor']
wo_master = uurloongem.loc[uurloongem['Uitstromers ho met en zonder diploma'] == 'Wo-master']

In [10]:
# One bar trace per degree type; x and y come from the same subset so they stay aligned
trace = [
    go.Bar(
        x=hbo_bachelor['Peilmoment'],
        y=hbo_bachelor['Uurloon werknemers na verlaten ho (euro)'],
        name='Hbo-bachelor',
        marker_color='rgb(102,194,165)',
        hoverinfo='y+name'
    ),
    go.Bar(
        x=hbo_master['Peilmoment'],
        y=hbo_master['Uurloon werknemers na verlaten ho (euro)'],
        name='Hbo-master',
        marker_color='rgb(252,141,98)',
        hoverinfo='y+name'
    ),
    go.Bar(
        x=wo_bachelor['Peilmoment'],
        y=wo_bachelor['Uurloon werknemers na verlaten ho (euro)'],
        name='Wo-bachelor',
        marker_color='rgb(141,160,203)',
        hoverinfo='y+name'
    ),
    go.Bar(
        x=wo_master['Peilmoment'],
        y=wo_master['Uurloon werknemers na verlaten ho (euro)'],
        name='Wo-master',
        marker_color='rgb(231,138,195)',
        hoverinfo='y+name'
    ),
]

layout = go.Layout(
    title='Hourly wage after leaving university per degree',
    height=400,
    xaxis=go.layout.XAxis(
        title='Benchmark',
        type='category',
        tickvals = uurloongem['Peilmoment'].unique(),
        ticktext=['Directly after', '1 year after', '2 years after', '3 years after', '4 years after', '5 years after']
    ),
    yaxis=go.layout.YAxis(
        title='Hourly wage',
        tickprefix='€',
        tickformat=',.2f'
    ),
    legend=dict(
        x=1.0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1,
)

fig = go.Figure(data=trace, layout=layout)
fig.show()

This bar plot compares the hourly wage per degree. It is interesting to see that, even though WO is considered a "higher" degree, WO graduates initially earn less than graduates with the so-called "lower" degrees.
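
The four `go.Bar` blocks in the plotting cell above differ only in the subset, the name and the colour, so they could also be generated in a loop. The sketch below builds the traces as plain dictionaries from a small invented table. The column names `degree`, `moment` and `wage` are shortened stand-ins, not the real column names.

```python
import pandas as pd

# Invented wages, for illustration only; column names are shortened stand-ins.
df = pd.DataFrame({
    "degree": ["Hbo-bachelor", "Hbo-bachelor", "Wo-master", "Wo-master"],
    "moment": ["Direct", "1 year", "Direct", "1 year"],
    "wage": [15.0, 16.5, 14.0, 18.0],
})

colors = {"Hbo-bachelor": "rgb(102,194,165)", "Wo-master": "rgb(231,138,195)"}

# One trace per degree; x and y are taken from the same group so they stay aligned.
traces = []
for degree, group in df.groupby("degree", sort=False):
    traces.append(dict(
        type="bar",
        x=group["moment"].tolist(),
        y=group["wage"].tolist(),
        name=degree,
        marker=dict(color=colors[degree]),
        hoverinfo="y+name",
    ))

print([t["name"] for t in traces])
```

Assuming Plotly is imported as in the first cell, a list like this should be usable as `go.Figure(data=traces, layout=layout)`, since Plotly accepts trace dictionaries that carry a `type` key.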