# Statistical comparison of models

This notebook performs a statistical comparison of models.

First, add the parent directory to the module search path so that all project modules can be imported.

In [1]:
import sys
sys.path.insert(0, '..')

The metric used for comparison is the absolute value of Spearman's rank correlation.

In [2]:
from scipy.stats import spearmanr

def metric(predA, predB):
    # absolute Spearman rank correlation between the two series
    return abs(spearmanr(predA, predB)[0])
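
As a small illustration of why the absolute value is taken, the sketch below applies the same metric to toy data. The names `abs_spearman`, `levels`, `rising_score` and `falling_score` are hypothetical and are not part of this project. A score that falls as the level rises is just as informative as one that rises, and the absolute value treats both the same way.

```python
from scipy.stats import spearmanr


def abs_spearman(a, b):
    # same computation as `metric` above, under a hypothetical name
    return abs(spearmanr(a, b)[0])


# toy data, invented for illustration only
levels = [1, 2, 3, 4, 5]
rising_score = [10, 20, 30, 40, 50]   # increases with the level
falling_score = [90, 70, 60, 40, 20]  # decreases with the level

rho_rising = abs_spearman(levels, rising_score)
rho_falling = abs_spearman(levels, falling_score)

# both values are 1.0 up to floating-point rounding,
# because both scores are perfectly monotonic in the level
print(rho_rising, rho_falling)
```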

The statistical comparison is performed with bootstrap significance testing.

In [3]:
from comparison.bootstrap import bootstrap_significance_testing
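
The implementation in `comparison.bootstrap` is not shown in this notebook. For orientation only, here is a minimal sketch of one common form of a paired bootstrap test. It assumes the arguments are the true labels, two sets of predictions, a metric and a number of resamples. The function name `paired_bootstrap_p_value`, the `seed` parameter and the decision rule (the fraction of resamples in which the first model does not beat the second) are assumptions for illustration and may differ from what the project actually does.

```python
import numpy as np
from scipy.stats import spearmanr


def abs_spearman(y_true, y_pred):
    # absolute Spearman rank correlation, as in the metric defined earlier
    return abs(spearmanr(y_true, y_pred)[0])


def paired_bootstrap_p_value(y_true, pred_a, pred_b, metric, n=1000, seed=0):
    # Hypothetical sketch, not the project's implementation.
    y_true = np.asarray(y_true)
    pred_a = np.asarray(pred_a)
    pred_b = np.asarray(pred_b)
    rng = np.random.default_rng(seed)
    size = len(y_true)
    not_better = 0
    for _ in range(n):
        # resample the same rows for both models (paired resampling)
        idx = rng.integers(0, size, size=size)
        delta = metric(y_true[idx], pred_a[idx]) - metric(y_true[idx], pred_b[idx])
        if delta <= 0:
            not_better += 1
    # share of resamples in which model A fails to outperform model B
    return not_better / n


# toy data, invented for illustration only
toy_rng = np.random.default_rng(42)
y_toy = toy_rng.integers(0, 5, size=200)
strong = y_toy + toy_rng.normal(0, 0.5, size=200)  # tracks the labels closely
weak = y_toy + toy_rng.normal(0, 5.0, size=200)    # mostly noise

p_value = paired_bootstrap_p_value(y_toy, strong, weak, abs_spearman, n=200)
print(p_value)
```

Under this decision rule, a small value means the first model beats the second in almost every resample. Whether the values returned by `bootstrap_significance_testing` follow the same convention depends on the project's own code.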

## 1. Comparison of formulas

In [4]:
import pandas as pd

X_train = pd.read_csv("../features/weebit_train_with_features.csv", index_col=0)
X_test = pd.read_csv("../features/weebit_test_with_features.csv", index_col=0)

# get the target (Level)
y_train = X_train["Level"]
y_test = X_test["Level"]

# remove the target (Level) and Text columns
X_train.drop(columns=['Text', 'Level'], inplace=True)
X_test.drop(columns=['Text', 'Level'], inplace=True)

# combine train and test into the whole set
X = pd.concat([X_train, X_test]).reset_index(drop=True)
y = pd.concat([y_train, y_test]).reset_index(drop=True)

In [5]:
from formulas.readability_formulas import flesch, dale_chall, gunning_fog

X = flesch(X)
X = dale_chall(X)
X = gunning_fog(X)

### 1.1 Compare Flesch vs Dale-Chall

In [6]:
bootstrap_significance_testing(y, X['Flesch'], X['Dale_Chall'], metric, n=1000)

0.384

### 1.2 Compare Gunning fog vs Flesch

In [10]:
bootstrap_significance_testing(y, X['Gunning_fog'], X['Flesch'], metric, n=1000)

0.0