# Comparison
This notebook takes the calculated SHAP values and the dataset, and selects the most important features based on the SHAP values. For each important feature, we split the students into two groups with a filter (e.g. students with <2 absences and those with >=2 absences), calculate the average G3 grade of each group, and compare the two averages using a t-test to see if the difference is statistically significant.

By doing this, we can come up with advice for students on how to improve their grades.
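
As a minimal sketch of the comparison described above, the snippet below splits a small made-up table on an absences threshold and runs a two-sample t-test. The toy values and the helper name `compare_groups` are illustrative assumptions and are not part of the dataset or of the cells that follow.

```python
import pandas as pd
from scipy import stats


def compare_groups(df, feature, threshold, target="G3"):
    """Split df on feature < threshold vs. >= threshold and t-test the target."""
    below = df.loc[df[feature] < threshold, target]
    above = df.loc[df[feature] >= threshold, target]
    # Default two-sample t-test (pooled variance)
    t_stat, p_value = stats.ttest_ind(above, below)
    return {
        "below_mean": below.mean(),
        "above_mean": above.mean(),
        "t": t_stat,
        "p": p_value,
    }


# Made-up toy data, for illustration only
toy = pd.DataFrame({
    "absences": [0, 1, 0, 1, 4, 6, 8, 10],
    "G3": [14, 15, 13, 16, 9, 10, 8, 11],
})

result = compare_groups(toy, "absences", 2)
print(result)
```

A small p-value only indicates that the two group means differ; it does not by itself show that changing the feature would change the grade.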

In [34]:
import numpy as np
import pandas as pd

In [35]:
# Read the data
maths = pd.read_csv("data/Maths.csv")
portuguese = pd.read_csv("data/Portuguese.csv")

# Combine the two datasets
data = pd.concat([maths, portuguese])

# Read the SHAP values
shap_values = pd.read_csv("results.csv")

# Normalize the SHAP values so that they sum to 1 per row. Also use absolute values.
names = shap_values.iloc[:, 0]
numeric_data = shap_values.iloc[:, 1:]
numeric_data = numeric_data.abs()
normalized_data = numeric_data.div(numeric_data.sum(axis=1), axis=0)
shap_values = pd.concat([names, normalized_data], axis=1)

# Split setup into three columns, split by "/"
shap_values[["Course", "Prediction type", "Experiment setup"]] = shap_values["Setup"].str.split("/", expand=True)

shap_values.head()


Unnamed: 0,Setup,age,Medu,Fedu,traveltime,studytime,failures,famrel,freetime,goout,...,activities_yes,nursery_yes,higher_yes,internet_yes,romantic_yes,G1,G2,Course,Prediction type,Experiment setup
0,Math/Binary/A,0.069024,0.00386,0.02078,0.021368,0.002704,0.131075,0.018134,0.042126,0.026291,...,0.003136,0.006176,0.026839,0.003016,0.001008,0.093396,0.278427,Math,Binary,A
1,Portugese/Binary/A,0.088529,0.000828,0.033998,0.020814,0.001016,0.033581,0.066963,0.012228,0.02878,...,0.001782,0.002745,0.02709,0.014752,0.006986,0.110664,0.049311,Portugese,Binary,A
2,Math/Binary/B,0.093496,0.017426,0.01683,0.012027,0.004601,0.12219,0.001367,0.052013,0.026773,...,0.007571,0.005298,0.000616,0.00768,0.015893,0.242301,0.0,Math,Binary,B
3,Portugese/Binary/B,0.049389,0.034969,0.030801,0.041139,0.050732,0.14756,0.053099,0.021522,0.00394,...,0.018369,0.020341,0.085143,0.009483,0.010487,0.084904,0.0,Portugese,Binary,B
4,Portugese/Binary/C,0.044217,0.053245,0.022359,0.057568,0.049941,0.120938,0.039288,0.022166,0.018022,...,0.017709,0.02275,0.062574,0.011689,0.016126,0.0,0.0,Portugese,Binary,C
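
The row normalization in the cell above can be checked on a toy frame. This is a minimal sketch with made-up SHAP values; the column names are borrowed from the table above purely for readability.

```python
import pandas as pd

# Made-up raw SHAP values for two hypothetical setups
toy_shap = pd.DataFrame({
    "Setup": ["Math/Binary/A", "Math/Binary/B"],
    "age": [-0.2, 0.1],
    "failures": [0.6, -0.3],
    "G2": [0.2, 0.0],
})

# Take absolute values, then divide each row by its own sum
abs_values = toy_shap.drop(columns="Setup").abs()
shares = abs_values.div(abs_values.sum(axis=1), axis=0)

# Re-attach the setup name next to the normalized shares
normalized = pd.concat([toy_shap["Setup"], shares], axis=1)
print(normalized)
```

After this step each row expresses the share of total absolute SHAP mass per feature, so rows from different models become comparable. A row whose values are all zero would produce NaN here, since it divides zero by zero.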


# Average standard deviation
To see whether different models find the same important features, we calculate the standard deviation of each feature's SHAP values across all models, and then average these standard deviations over all features. A low standard deviation for a feature means the models agree on its importance; a high standard deviation means they disagree.

Note that the SHAP values for G1 and G2 are 0 for models that do not use these features. For these two features, we exclude the rows where the value is 0 from the calculation of the mean and the standard deviation.
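
One way to express the zero exclusion is to turn the zeros into missing values, since pandas skips NaN in `mean` and `std` by default. The following is a sketch on made-up numbers, not the notebook's own implementation; the `optional` list is an assumption naming the columns that some models do not use.

```python
import numpy as np
import pandas as pd

# Made-up normalized SHAP values for three hypothetical models
toy = pd.DataFrame({
    "failures": [0.10, 0.20, 0.30],
    "G2": [0.40, 0.00, 0.20],  # 0.00 means the model did not use G2
})

# Columns where a zero means "feature not used" (assumed list)
optional = [c for c in ["G1", "G2"] if c in toy.columns]

# Replace zeros by NaN so they are ignored by mean() and std()
masked = toy.copy()
masked[optional] = masked[optional].replace(0, np.nan)

# One row per feature, sorted by mean importance
summary = masked.agg(["mean", "std"]).T.sort_values("mean", ascending=False)
print(summary)

# Average of the per-feature standard deviations
print(summary["std"].mean())
```

Compared with filtering each column separately, masking keeps the calculation vectorized and makes the exclusion rule visible in one place.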

In [36]:
# For each column (except the setup name columns), calculate the mean SHAP value and the standard deviation and print it
shap_values_avg = {}
for column in shap_values.columns[1:-3]:
    shap_values_avg[column] = {}

    # Exclude 0-values for G1 and G2 (because they are not used in the model)
    if column in ["G1", "G2"]:
        shap_values_avg[column]["avg"] = shap_values[shap_values[column] != 0][column].mean()
        shap_values_avg[column]["std"] = shap_values[shap_values[column] != 0][column].std()
    else:
        shap_values_avg[column]["avg"] = shap_values[column].mean()
        shap_values_avg[column]["std"] = shap_values[column].std()

# Order by average SHAP value before printing
shap_values_avg = {key: value for key, value in sorted(shap_values_avg.items(), key=lambda item: item[1]["avg"], reverse=True)}
for column in shap_values_avg:
    print(f"{column}: {shap_values_avg[column]['avg']} +/- {shap_values_avg[column]['std']}")

# Print average standard deviation for all setups combined
avg_std = np.mean([shap_values_avg[column]["std"] for column in shap_values.columns[1:-3]])
print(f"The average standard deviation is {avg_std}")


G2: 0.20163069066890035 +/- 0.12019813179400625
G1: 0.13975408501128553 +/- 0.12291084520818002
failures: 0.1045826435496951 +/- 0.10495680611006138
absences: 0.07886957989668228 +/- 0.08000729903821362
age: 0.05509473450643008 +/- 0.041050822330502834
freetime: 0.03314338357204017 +/- 0.029060825314933977
Medu: 0.02842204473413712 +/- 0.022975034010421776
higher_yes: 0.02817949215873684 +/- 0.032150927966379535
famrel: 0.026236680730043677 +/- 0.026208058410680087
Dalc: 0.02586998336146125 +/- 0.04043864633659654
traveltime: 0.025272500127587638 +/- 0.02348245545949017
goout: 0.025147231663798657 +/- 0.021465961863205166
school_MS: 0.023687548005078305 +/- 0.02547941651872839
Fedu: 0.023425602891929746 +/- 0.01718657981253667
Walc: 0.023363921210357713 +/- 0.020462217962178095
health: 0.020645961828504266 +/- 0.0134004582324558
studytime: 0.019893469537853814 +/- 0.01988935476305075
paid_yes: 0.019044839298371877 +/- 0.017474804899100686
romantic_yes: 0.01841039799868147 +/- 0.0260700

# Important features and their impact on the G3 grade
We will calculate the average G3 grade based on all students and then filter the students based on the important features. We consider a feature to be important if the average SHAP-value is higher than 0.05 (5%). 

For each of these features, we use every value that occurs in the dataset as a threshold and calculate the average G3 grade of the students at or above the threshold and of those below it. We then compare the two groups using a t-test to see if the difference is statistically significant.

By doing this, we can draw conclusions about which thresholds a student should meet to improve their grades.
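
The threshold sweep described above can be sketched as a function that returns one row per threshold. This version uses Welch's t-test (`equal_var=False`), which is an assumption swapped in here because the two groups can differ a lot in size and variance; the notebook's own cell uses the default pooled-variance test. The function name `sweep_thresholds`, the minimum group size, and the toy data are all hypothetical.

```python
import pandas as pd
from scipy import stats


def sweep_thresholds(df, feature, target="G3", alpha=0.05, min_size=2):
    """Compare target means for feature >= v vs. feature < v, for each value v."""
    rows = []
    values = sorted(df[feature].unique())
    # Skip the smallest value: the 'below' group would be empty
    for value in values[1:]:
        above = df.loc[df[feature] >= value, target]
        below = df.loc[df[feature] < value, target]
        # A variance estimate needs at least two observations per group
        if len(above) < min_size or len(below) < min_size:
            continue
        # Welch's t-test: does not assume equal variances
        t_stat, p_value = stats.ttest_ind(above, below, equal_var=False)
        rows.append({
            "threshold": value,
            "above_mean": above.mean(),
            "below_mean": below.mean(),
            "p_value": p_value,
            "significant": p_value < alpha,
        })
    return pd.DataFrame(rows)


# Made-up toy data, for illustration only
toy = pd.DataFrame({
    "failures": [0, 0, 0, 0, 1, 1, 2, 2, 3, 3],
    "G3": [14, 13, 15, 12, 11, 10, 9, 8, 6, 7],
})

table = sweep_thresholds(toy, "failures")
print(table)
```

Because one test is run per threshold and per feature, the p-values are not adjusted for multiple comparisons; a correction such as Bonferroni could be applied if the number of tests is a concern.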

In [43]:
import scipy.stats as stats

# Take the features where the average SHAP value is above the SHAP value threshold (0.05)
shap_values_avg_above_threshold = {key: value for key, value in shap_values_avg.items() if value["avg"] > 0.05}

# We will loop over all of the features above the threshold and see the impact
for feature in shap_values_avg_above_threshold.keys():
    # Find all values for the feature
    feature_values = sorted(data[feature].unique().tolist())

    # Loop over all these feature values, and calculate average G3 for above and below this feature
    # We skip the first value, because the filter would cause all values to belong to the same group
    for value in feature_values[1:]:
        above = data[data[feature] >= value]
        below = data[data[feature] < value]

        above_avg = above["G3"].mean()
        below_avg = below["G3"].mean()

        # Calculate if the difference is significant using scipy
        t, p = stats.ttest_ind(above["G3"], below["G3"])

        print(f"{feature} >= {value}: {above_avg} vs {feature} < {value}: {below_avg} (p-value: {p}, significant: {p < 0.05})")


G2 >= 4: 11.54669260700389 vs G2 < 4: 0.0 (p-value: 3.0959446446392704e-43, significant: True)
G2 >= 5: 11.557935735150926 vs G2 < 5: 0.0 (p-value: 1.4387794667421668e-45, significant: True)
G2 >= 6: 11.712301587301587 vs G2 < 6: 1.6 (p-value: 2.2204865717363116e-67, significant: True)
G2 >= 7: 11.854969574036511 vs G2 < 7: 2.9193548387096775 (p-value: 8.808823436893096e-82, significant: True)
G2 >= 8: 12.08113804004215 vs G2 < 8: 4.090909090909091 (p-value: 8.378110260660868e-105, significant: True)
G2 >= 9: 12.436716077537058 vs G2 < 9: 5.631578947368421 (p-value: 2.5954124096869497e-126, significant: True)
G2 >= 10: 12.986754966887418 vs G2 < 10: 7.047781569965871 (p-value: 1.022267057939195e-147, significant: True)
G2 >= 11: 13.575079872204473 vs G2 < 11: 7.990521327014218 (p-value: 1.9863724558668515e-159, significant: True)
G2 >= 12: 14.268993839835728 vs G2 < 12: 8.771836007130124 (p-value: 6.67993594410918e-160, significant: True)
G2 >= 13: 14.919220055710307 vs G2 < 13: 9.4542