# Example Use of ASAG2024

This is an example of how to use the ASAG2024 dataset. It also shows how we calculated the baseline results in our paper.

The grades in the dataset are unevenly distributed, with higher grades appearing more often. Any model trained on it will therefore tend to favor higher grades.
To counter this, we included a weight for each entry. This example shows how to calculate the RMSE with these weights.
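
To illustrate why the weights matter, here is a minimal sketch on made-up numbers (not taken from ASAG2024). It assumes the weights sum to 1 and that rarer grades receive larger weights; the specific weighting scheme shown is hypothetical and is not necessarily how the dataset's weights were derived.

```python
import numpy as np

# Toy data (hypothetical): three high grades and one low grade.
true_grades = np.array([1.0, 1.0, 1.0, 0.0])
# A model that always predicts the most frequent grade.
predicted = np.array([1.0, 1.0, 1.0, 1.0])

# Unweighted RMSE: every entry counts equally, so the rare grade barely matters.
plain_rmse = np.sqrt(np.mean((predicted - true_grades) ** 2))

# Hypothetical weights: each grade value gets the same total weight (0.5 each),
# so the weights sum to 1 and the single low grade counts as much as the three high ones.
weights = np.array([1 / 6, 1 / 6, 1 / 6, 1 / 2])
weighted_rmse = np.sqrt(np.sum(weights * (predicted - true_grades) ** 2))

print(plain_rmse, weighted_rmse)
```

With these toy numbers the weighted score is higher than the unweighted one, so a model that ignores the rare grade is penalized more strongly.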


First load the dataset:

In [1]:
import pandas as pd
import numpy as np

In [2]:
asag_data = pd.read_parquet("hf://datasets/Meyerger/ASAG2024/combined_asag2024.parquet")
asag_data.head()


| index | question | provided_answer | reference_answer | grade | data_source | normalized_grade | weight |
|---|---|---|---|---|---|---|---|
| 0 | Explain why you got a voltage reading of 0 for... | terminal 1 is connected to terminal 2 | Terminals 1 and 2 are connected | 3.0 | Beetle | 1.0 | 0.00015 |
| 1 | Explain why you got a voltage reading of 0 for... | they are connected | Terminals 1 and 2 are connected | 3.0 | Beetle | 1.0 | 0.00015 |
| 2 | Explain why you got a voltage reading of 0 for... | Because terminal 1 and 2 are connected to the ... | Terminals 1 and 2 are connected | 2.0 | Beetle | 0.666667 | 0.000272 |
| 3 | Explain why you got a voltage reading of 0 for... | terminal 1 and 2 have the same difference in e... | Terminals 1 and 2 are connected | 1.0 | Beetle | 0.333333 | 0.000238 |
| 4 | Explain why you got a voltage reading of 0 for... | the voltage was 0 because they were on the sam... | Terminals 1 and 2 are connected | 2.0 | Beetle | 0.666667 | 0.000272 |


## Predicting a grade

You will probably do this part differently: use a model to generate a predicted grade for each answer.
In this example, we calculate the average grade per `data_source` and use that as the prediction. (In effect, we give every student the average grade of their data source.)

The score we calculate from this serves as a baseline that you should beat. Because the score is an error measure, your score should be LOWER.

In [3]:
relevant_columns = asag_data[['data_source', 'normalized_grade']]
average_grade_per_data_source = relevant_columns.groupby("data_source").mean()
average_grade_per_data_source

| data_source | normalized_grade |
|---|---|
| Beetle | 0.666667 |
| CU-NLP | 0.279532 |
| DigiKlausur | 0.684985 |
| Mohler | 0.811746 |
| SAF | 0.761612 |
| SciEntsBank | 0.605054 |
| Stita | 0.666075 |


In [4]:
# We make a copy because we do not want to change the original asag_data dataframe 
predictions = asag_data.copy()
# Look up each row's data_source in the per-source averages computed above
predictions["predicted_grade"] = predictions["data_source"].map(average_grade_per_data_source["normalized_grade"])

predictions.head()

| index | question | provided_answer | reference_answer | grade | data_source | normalized_grade | weight | predicted_grade |
|---|---|---|---|---|---|---|---|---|
| 0 | Explain why you got a voltage reading of 0 for... | terminal 1 is connected to terminal 2 | Terminals 1 and 2 are connected | 3.0 | Beetle | 1.0 | 0.00015 | 0.666667 |
| 1 | Explain why you got a voltage reading of 0 for... | they are connected | Terminals 1 and 2 are connected | 3.0 | Beetle | 1.0 | 0.00015 | 0.666667 |
| 2 | Explain why you got a voltage reading of 0 for... | Because terminal 1 and 2 are connected to the ... | Terminals 1 and 2 are connected | 2.0 | Beetle | 0.666667 | 0.000272 | 0.666667 |
| 3 | Explain why you got a voltage reading of 0 for... | terminal 1 and 2 have the same difference in e... | Terminals 1 and 2 are connected | 1.0 | Beetle | 0.333333 | 0.000238 | 0.666667 |
| 4 | Explain why you got a voltage reading of 0 for... | the voltage was 0 because they were on the sam... | Terminals 1 and 2 are connected | 2.0 | Beetle | 0.666667 | 0.000272 | 0.666667 |
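
As an alternative sketch, the same per-source average can be attached to every row in one step with pandas' `groupby(...).transform("mean")`. The `toy` frame below is a made-up stand-in for `asag_data`, used only so the snippet runs on its own.

```python
import pandas as pd

# Toy stand-in for asag_data (values are made up for illustration).
toy = pd.DataFrame({
    "data_source": ["A", "A", "B", "B"],
    "normalized_grade": [1.0, 0.5, 0.0, 0.5],
})

# Copy first so the original frame stays unchanged.
toy_predictions = toy.copy()

# transform("mean") returns one value per row: the mean of that row's group.
toy_predictions["predicted_grade"] = (
    toy_predictions.groupby("data_source")["normalized_grade"].transform("mean")
)

print(toy_predictions)
```

This avoids building the intermediate table of averages, although keeping that table can be useful if you want to inspect the per-source means.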


## Calculating the weighted RMSE

Now that we have predictions, we can calculate the weighted RMSE as our score.
Again, the score calculated in this example serves as the baseline to beat, and a real approach should achieve a LOWER score.
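
Written out, the per-source score computed below corresponds to the following formula. The notation is introduced here for illustration: $S$ is the set of entries of one data source, $w_i$ is the `weight`, $\hat{y}_i$ the `predicted_grade`, and $y_i$ the `normalized_grade` of entry $i$. It assumes that the weights within each data source sum to 1, which is why there is no division by the sum of weights; that is an assumption inferred from how the score is computed here, not something stated explicitly.

```latex
\mathrm{wRMSE}_S = \sqrt{\sum_{i \in S} w_i \,(\hat{y}_i - y_i)^2},
\qquad \text{assuming } \sum_{i \in S} w_i = 1
```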

In [5]:
def weighted_squared_error(row):
    error = row["predicted_grade"] - row["normalized_grade"]
    squared_error = error ** 2
    return row["weight"] * squared_error

In [6]:
predictions["squared_error_portion"] = predictions.apply(weighted_squared_error, axis=1)
predictions.head()

| index | question | provided_answer | reference_answer | grade | data_source | normalized_grade | weight | predicted_grade | squared_error_portion |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Explain why you got a voltage reading of 0 for... | terminal 1 is connected to terminal 2 | Terminals 1 and 2 are connected | 3.0 | Beetle | 1.0 | 0.00015 | 0.666667 | 1.668335e-05 |
| 1 | Explain why you got a voltage reading of 0 for... | they are connected | Terminals 1 and 2 are connected | 3.0 | Beetle | 1.0 | 0.00015 | 0.666667 | 1.668335e-05 |
| 2 | Explain why you got a voltage reading of 0 for... | Because terminal 1 and 2 are connected to the ... | Terminals 1 and 2 are connected | 2.0 | Beetle | 0.666667 | 0.000272 | 0.666667 | 3.353088e-36 |
| 3 | Explain why you got a voltage reading of 0 for... | terminal 1 and 2 have the same difference in e... | Terminals 1 and 2 are connected | 1.0 | Beetle | 0.333333 | 0.000238 | 0.666667 | 2.648025e-05 |
| 4 | Explain why you got a voltage reading of 0 for... | the voltage was 0 because they were on the sam... | Terminals 1 and 2 are connected | 2.0 | Beetle | 0.666667 | 0.000272 | 0.666667 | 3.353088e-36 |
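
On larger frames, a row-wise `apply` can be slow. The sketch below shows a vectorized column expression that should give the same values, checked here against the row-wise function on a small made-up frame (`toy` is hypothetical and not part of the dataset).

```python
import pandas as pd

# Toy frame with the three columns the calculation needs (made-up values).
toy = pd.DataFrame({
    "normalized_grade": [1.0, 0.5, 0.0],
    "predicted_grade": [0.5, 0.5, 0.5],
    "weight": [0.25, 0.5, 0.25],
})

def weighted_squared_error(row):
    # Same logic as the row-wise function used in this example.
    error = row["predicted_grade"] - row["normalized_grade"]
    return row["weight"] * error ** 2

# Row-wise version: calls the function once per row.
row_wise = toy.apply(weighted_squared_error, axis=1)

# Vectorized version: operates on whole columns at once.
vectorized = toy["weight"] * (toy["predicted_grade"] - toy["normalized_grade"]) ** 2

print((row_wise - vectorized).abs().max())
```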


In [7]:
# wMSE stands for "weighted Mean Squared Error".
# Summing the weighted squared errors per data source yields the wMSE.
wMSE_by_data_source = predictions[["data_source", "squared_error_portion"]].groupby("data_source").sum()

# wRMSE stands for "weighted Root Mean Squared Error": the square root of the wMSE.
wRMSE_by_data_source = wMSE_by_data_source.apply(np.sqrt)
wRMSE_by_data_source.rename(columns={"squared_error_portion": "wRMSE"}, inplace=True)

wRMSE_by_data_source.round(2)

| data_source | wRMSE |
|---|---|
| Beetle | 0.41 |
| CU-NLP | 0.4 |
| DigiKlausur | 0.45 |
| Mohler | 0.43 |
| SAF | 0.41 |
| SciEntsBank | 0.39 |
| Stita | 0.34 |


If you would like to compare multiple approaches without checking each `data_source` individually, you can use the average across all data sources.

In [8]:
wRMSE_by_data_source.mean().round(2)

wRMSE    0.4
dtype: float64
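
When comparing several approaches, it may help to wrap the steps above in one function. The following is a minimal sketch: the function name `weighted_rmse_by_source`, the toy frame, and the column names `baseline` and `my_model` are hypothetical. It assumes the frame has `data_source`, `normalized_grade`, and `weight` columns as in the dataset, and that weights sum to 1 within each data source.

```python
import numpy as np
import pandas as pd

def weighted_rmse_by_source(frame, prediction_column="predicted_grade"):
    """Return a Series of weighted RMSE values indexed by data_source."""
    squared_error = frame["weight"] * (
        frame[prediction_column] - frame["normalized_grade"]
    ) ** 2
    # Sum the weighted squared errors per data source, then take the square root.
    return np.sqrt(squared_error.groupby(frame["data_source"]).sum()).rename("wRMSE")

# Toy frame (made-up values) holding the predictions of two approaches.
toy = pd.DataFrame({
    "data_source": ["A", "A", "B", "B"],
    "normalized_grade": [1.0, 0.0, 1.0, 0.0],
    "weight": [0.5, 0.5, 0.5, 0.5],
    "baseline": [0.5, 0.5, 0.5, 0.5],
    "my_model": [1.0, 0.0, 0.5, 0.5],
})

for column in ["baseline", "my_model"]:
    per_source = weighted_rmse_by_source(toy, column)
    print(column, round(per_source.mean(), 2))
```

The approach with the lower average is the better one under this metric.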