# Practice notebook
Please enter the necessary code in the cells below. Follow the instructions in the comments for each cell: they tell you what the code should _do_ but not _how_ to do it. This is called **pseudocode**: step-by-step instructions for what the code should do, leaving it up to you to write the executable code.

In [None]:
# Import pandas and seaborn with the abbreviations pd and sns
import pandas as pd
import seaborn as sns
# Seaborn display options
sns.set_theme()
sns.set_context('talk')

Try loading a single sheet from the workbook `students.xlsx`.

In [None]:
# Load the students.xlsx spreadsheet from the data-subfolder (look on GitHub)
# When loading the dataframe:
#  Specify the header-row
#  Specify which sheet you're loading (there are three)

In [None]:
# display the first few rows of your dataframe

Now, load all of the sheets and use pandas' `concat()` function in combination with the `read_excel()` function to join them together so that you get all three years of test scores in one table. 

For reference: Here's a great, simple guide for how to [combine multiple Excel worksheets in a Pandas dataframe](https://pbpython.com/pandas-excel-tabs.html).
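
To see what `concat()` does with several sheets, here is a minimal sketch that does not need the Excel file. With `sheet_name=None`, `read_excel()` returns a dictionary with one dataframe per sheet; the dictionary below is a hypothetical stand-in for that result, and the student names and scores are made up.

```python
import pandas as pd

# Hypothetical stand-in for what read_excel(..., sheet_name=None) returns:
# one dataframe per sheet, keyed by sheet name. All values are made up.
sheets = {
    '2020': pd.DataFrame({'Student': ['Ann', 'Bob'], 'Score': ['A', 'C']}),
    '2021': pd.DataFrame({'Student': ['Ann', 'Bob'], 'Score': ['B', 'B']}),
}

# Passing the dictionary to concat() stacks the dataframes in key order.
# ignore_index=True drops the sheet names and renumbers the rows from 0.
combined = pd.concat(sheets, ignore_index=True)
print(combined)

# Without ignore_index, the sheet names are kept as an outer index level.
keyed = pd.concat(sheets)
print(keyed.index.get_level_values(0).unique().tolist())
```

If you need to know which sheet a row came from, the second form is the one to keep.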

In [None]:
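# Load all the sheets into one dataframe, using concat()
# sheet_name=None makes read_excel() return a dictionary with one dataframe per sheet
# A forward slash in the path works on Windows, macOS and Linux
df = pd.concat(
    pd.read_excel('data/students.xlsx', header=1, sheet_name=None),
    ignore_index=True)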

In [None]:
df[:5]

In [None]:
df['Score'].unique()

In [None]:
# The Score-column in the dataset contains letter-values.
# Use replace() to convert the letter-values into numeric values.
# Hint: replace() allows you to use a dictionary to replace systematically (e.g., {'A':4, 'B':3, etc.})
# Assign the result back to the column: calling replace(..., inplace=True) on
# df['Score'] may act on a temporary copy and leave df unchanged
df['Score'] = df['Score'].replace({'A':4, 'B':3, 'C':2, 'D':1, 'F':0})
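
Before converting, it can help to check that the dictionary covers every letter in the column. The sketch below uses made-up grades and `Series.map()`, a swapped-in alternative to `replace()`: `map()` turns any value missing from the dictionary into `NaN`, whereas `replace()` would leave it untouched.

```python
import pandas as pd

# Made-up letter grades; the 4-to-0 scale mirrors the dictionary used above.
grades = pd.Series(['A', 'C', 'F', 'B', 'A'], name='Score')
scale = {'A': 4, 'B': 3, 'C': 2, 'D': 1, 'F': 0}

# List any letters the dictionary does not cover before converting.
unmapped = sorted(set(grades) - set(scale))
print(unmapped)

# map() looks each value up in the dictionary; uncovered values would become NaN.
numeric = grades.map(scale)
print(numeric.tolist())
```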

In [None]:
df.head()

# Reshape long to wide
Use `pd.pivot()` to reshape a dataframe from long to wide.
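
As a minimal sketch of what "long to wide" means, the toy table below (made-up students and scores, not the workbook data) has one row per student and subject. After pivoting, it has one row per student and one column per subject.

```python
import pandas as pd

# Made-up long table: one row per student and subject.
long_df = pd.DataFrame({
    'Student': ['Ann', 'Ann', 'Bob', 'Bob'],
    'Subject': ['English', 'Maths', 'English', 'Maths'],
    'Score': [4, 3, 2, 4],
})

# Each distinct Subject value becomes its own column; Score fills the cells.
wide_df = pd.pivot(long_df, index='Student', columns='Subject', values='Score')
print(wide_df)

# pivot() does not aggregate: it raises a ValueError if an index/column pair
# occurs more than once. pivot_table() is the variant that aggregates
# duplicates (using the mean by default).
```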

## Example 1: Separate columns by subject


In [None]:
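# Declare a separate dataframe. Use pivot()
# to spread the Subject-column into separate columns
# (one row per student and year, one column per subject).
# pivot() only reshapes; the averages per subject are calculated in the bonus cell below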
df_wide = pd.pivot(
    df,
    index=['Student', 'Year '],
    columns=['Subject'],
    values='Score').reset_index()

In [None]:
df_wide

Bonus: Show average for selected subjects (by column):

In [None]:
subject_list = ['English', 'Maths', 'Skateboarding']
for s in subject_list:
    mean_s = df_wide[s].mean()
    print(f'Average for {s} is: {mean_s}')

## Example 2: Columns by year

In [None]:
# Declare a new dataframe. Spread the Year-column into separate columns.
# Sort the dataframe by student name and subject, using sort_values()
df_wide2 = pd.pivot(
    df,
    index=['Student', 'Subject'],
    columns='Year ',
    values='Score'
    ).reset_index().sort_values(by=['Student', 'Subject'])

In [None]:
df_wide2

# Reshape wide to long
Use `pd.melt()` to bring multiple columns (the year-columns in this case) into one, transforming the table from wide to long.
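
A minimal sketch of the same idea on a toy table (made-up names and scores). The column names `Year` and `Score` passed to `var_name` and `value_name` are illustrative choices, not the names used in the workbook.

```python
import pandas as pd

# Made-up wide table with one column per year (integer column labels).
wide_df = pd.DataFrame({
    'Student': ['Ann', 'Bob'],
    2020: [4, 2],
    2022: [3, 4],
})

# id_vars are kept as they are; the value_vars columns are stacked into two
# new columns: one holding the old column label, one holding the cell value.
long_df = pd.melt(
    wide_df,
    id_vars=['Student'],
    value_vars=[2020, 2022],
    var_name='Year',
    value_name='Score',
)
print(long_df)
```

When `var_name` is omitted, `melt()` falls back to the name of the columns axis if it has one, and to `variable` otherwise; the value column defaults to `value`.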

In [None]:
df_wide2.keys()

In [None]:
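# var_name is set explicitly so the year-column keeps its original name (including the trailing space)
df2_long = pd.melt(df_wide2, id_vars=['Student', 'Subject'], value_vars=[2020, 2021, 2022], var_name='Year ')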

In [None]:
df2_long

In [None]:
# Sort df2_long by student, subject, year
df2_long.sort_values(
    by=['Student', 'Subject', 'Year '],
    ascending=True,
    inplace=True
    )

In [None]:
df2_long

# Visualisations

## Categorical bar graph
For visualising categorical data (e.g., by year, subject, student), see this [Seaborn tutorial](https://seaborn.pydata.org/tutorial/categorical.html).
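
When several rows share the same category, a Seaborn bar plot shows their mean by default. The sketch below reproduces that number with pandas alone on a made-up table, which is a handy way to sanity-check the bar heights; the column names are illustrative.

```python
import pandas as pd

# Made-up scores: two subjects per year for a single student.
scores = pd.DataFrame({
    'Year': [2020, 2020, 2021, 2021],
    'Student': ['Ann', 'Ann', 'Ann', 'Ann'],
    'Score': [4, 2, 3, 4],
})

# One mean per (Year, Student) group: the value a bar for that group would show.
bar_heights = scores.groupby(['Year', 'Student'])['Score'].mean()
print(bar_heights)
```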

In [None]:
# Use sns.catplot() to visualise scores by student and/or by year
sns.catplot(data=df, x='Year ', y='Score', hue='Student', kind='bar', aspect=2)

## Line graph

In [None]:
df2_long.keys()

In [None]:
df2_long.head()

In [None]:
# Line graph from the df2_long
sns.relplot(data=df2_long, kind='line', x='Year ', y='value', hue='Student', row='Subject')

## Slopegraph
Turn this into a **slopegraph** (visualising change between two points) by filtering out the middle year:
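
As a sketch of the filtering step on a toy table (made-up values), `isin()` is a shorter equivalent of combining several `==` comparisons with `|`. The last two lines compute the change between the two endpoints, which is the quantity a slopegraph draws as a line.

```python
import pandas as pd

# Made-up long table with three years per student.
results = pd.DataFrame({
    'Year': [2020, 2021, 2022, 2020, 2021, 2022],
    'Student': ['Ann'] * 3 + ['Bob'] * 3,
    'value': [4, 3, 3, 2, 3, 4],
})

# Keep only the first and last year; the middle year is dropped.
endpoints = results[results['Year'].isin([2020, 2022])]
print(endpoints)

# Change from 2020 to 2022 per student.
change = endpoints.pivot(index='Student', columns='Year', values='value')
change['change'] = change[2022] - change[2020]
print(change)
```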

In [None]:
# Select only 2020 and 2022 results, keeping df2_long itself unchanged
# so the line graph cell above still gives the same result when re-run
df2_slope = df2_long[(df2_long['Year ']==2020) | (df2_long['Year ']==2022)] # include 2020 or 2022
# Repeat the line graph code above
sns.relplot(data=df2_slope, kind='line', x='Year ', y='value', hue='Student', row='Subject')