# CommonLit Readability
Currently, most educational texts are matched to readers using traditional readability methods or commercially available formulas. However, each has its issues. Tools like Flesch-Kincaid Grade Level are based on weak proxies of text decoding (i.e., characters or syllables per word) and syntactic complexity (i.e., number of words per sentence). As a result, they lack construct and theoretical validity. At the same time, commercially available formulas, such as Lexile, can be cost-prohibitive, lack suitable validation studies, and suffer from transparency issues when the formula's features aren't publicly available.
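
To make the "weak proxies" concrete, here is a minimal sketch of the Flesch-Kincaid Grade Level computation. The coefficients are the standard published ones; the vowel-group syllable counter, the regex-based tokenization, and the function names are simplifying assumptions for illustration and not what any particular tool uses.

```python
import re

def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59

def grade_of_text(text):
    """Apply the formula to raw text using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), len(sentences), syllables)

# 100 words, 5 sentences, 150 syllables
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
print(round(grade_of_text("The cat sat on the mat. It was happy."), 2))
```

Nothing in this score depends on meaning or cohesion: shuffling the words within each sentence leaves it unchanged.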

CommonLit, Inc., is a nonprofit education technology organization serving over 20 million teachers and students with free digital reading and writing lessons for grades 3-12. Together with Georgia State University, an R1 public research university in Atlanta, they are challenging Kagglers to improve readability rating methods.

In this competition, you’ll build algorithms to rate the complexity of reading passages for grade 3-12 classroom use. To accomplish this, you'll pair your machine learning skills with a dataset that includes readers from a wide variety of age groups and a large collection of texts taken from various domains. Winning models will be sure to incorporate text cohesion and semantics.

In [None]:
import numpy as np
import pandas as pd
import spacy as sp
import tensorflow as tf

import seaborn as sns
import matplotlib.pyplot as plt

## [Deep Evidential Regression](https://www.mit.edu/~amini/pubs/pdf/deep-evidential-regression.pdf)
Deterministic neural networks (NNs) are increasingly being deployed in safety-critical domains, where calibrated, robust, and efficient measures of uncertainty are crucial. This notebook uses the method from the paper linked above: a non-Bayesian NN is trained to estimate a continuous target together with its associated evidence, so that it learns both aleatoric and epistemic uncertainty. This is accomplished by placing an evidential prior over the parameters of the original Gaussian likelihood and training the NN to infer the hyperparameters of that evidential distribution.
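
For reference, the quantities used later in this notebook can be written out as follows. The notation follows the linked paper, where the network outputs the four parameters $(\gamma, \nu, \alpha, \beta)$ of a Normal-Inverse-Gamma (NIG) distribution; in the code these appear as `mu` (or `gamma`), `v`, `alpha`, and `beta`.

$$
y \sim \mathcal{N}(\mu, \sigma^2), \qquad \mu \sim \mathcal{N}\!\left(\gamma, \frac{\sigma^2}{\nu}\right), \qquad \sigma^2 \sim \Gamma^{-1}(\alpha, \beta)
$$

$$
\underbrace{\mathbb{E}[\mu] = \gamma}_{\text{prediction}}, \qquad
\underbrace{\mathbb{E}[\sigma^2] = \frac{\beta}{\alpha - 1}}_{\text{aleatoric}}, \qquad
\underbrace{\mathrm{Var}[\mu] = \frac{\beta}{\nu(\alpha - 1)}}_{\text{epistemic}}
$$

These expressions require $\nu > 0$, $\alpha > 1$, and $\beta > 0$.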

In [None]:
class DenseNormalGamma(tf.keras.layers.Layer):
    """Implements dense layer for Deep Evidential Regression
    
    Reference: https://www.mit.edu/~amini/pubs/pdf/deep-evidential-regression.pdf
    Source: https://github.com/aamini/evidential-deep-learning
    """
    
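    def __init__(self, units, **kwargs):
        # forward name, dtype, etc. so the layer can be rebuilt from get_config()
        super().__init__(**kwargs)
        self.units = int(units)
        # a single dense projection produces all four NIG parameters per unit
        self.dense = tf.keras.layers.Dense(4 * self.units, activation=None)

    def evidence(self, x):
        # softplus maps any real number to a positive value
        return tf.nn.softplus(x)

    def call(self, x):
        output = self.dense(x)
        mu, logv, logalpha, logbeta = tf.split(output, 4, axis=-1)
        # enforce v > 0, alpha > 1, and beta > 0
        v = self.evidence(logv)
        alpha = self.evidence(logalpha) + 1
        beta = self.evidence(logbeta)
        return tf.concat([mu, v, alpha, beta], axis=-1)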

    def compute_output_shape(self, input_shape):
        return (input_shape[0], 4 * self.units)

    def get_config(self):
        base_config = super(DenseNormalGamma, self).get_config()
        base_config['units'] = self.units
        return base_config
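
The reason for the `softplus` and the `+ 1` in the layer can be seen in isolation. The following is a plain-Python sketch; `to_nig_params` is a hypothetical helper name used only here, not part of the layer or the source library.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def to_nig_params(raw_mu, raw_v, raw_alpha, raw_beta):
    """Map four unconstrained numbers to (mu, v, alpha, beta)
    with v > 0, alpha > 1, and beta > 0 (hypothetical helper)."""
    return raw_mu, softplus(raw_v), softplus(raw_alpha) + 1.0, softplus(raw_beta)

for raw in (-5.0, 0.0, 5.0):
    mu, v, alpha, beta = to_nig_params(raw, raw, raw, raw)
    print(f"raw={raw:+.1f} -> v={v:.4f}, alpha={alpha:.4f}, beta={beta:.4f}")
```

Keeping `alpha` above 1 matters because the variance estimates used later divide by `alpha - 1`.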

In [None]:
def NIG_NLL(y, gamma, v, alpha, beta, reduce=True):
    """Negative log-likelihood of y given the Normal-Inverse-Gamma parameters."""
    twoBlambda = 2*beta*(1+v)

    nll = 0.5*tf.math.log(np.pi/v)  \
        - alpha*tf.math.log(twoBlambda)  \
        + (alpha+0.5) * tf.math.log(v*(y-gamma)**2 + twoBlambda)  \
        + tf.math.lgamma(alpha)  \
        - tf.math.lgamma(alpha+0.5)

    return tf.reduce_mean(nll) if reduce else nll

def NIG_Reg(y, gamma, v, alpha, beta, reduce=True):
    """Evidence regularizer: penalizes high evidence on samples with large error."""
    error = tf.abs(y-gamma)

    # total evidence assigned to the prediction
    evi = 2*v + alpha
    reg = error*evi

    return tf.reduce_mean(reg) if reduce else reg

def EvidentialRegression(y_true, evidential_output, coeff=1.0):
    """Implements loss for Deep Evidential Regression
    
    Reference: https://www.mit.edu/~amini/pubs/pdf/deep-evidential-regression.pdf
    Source: https://github.com/aamini/evidential-deep-learning
    """
    
    gamma, v, alpha, beta = tf.split(evidential_output, 4, axis=-1)
    loss_nll = NIG_NLL(y_true, gamma, v, alpha, beta)
    loss_reg = NIG_Reg(y_true, gamma, v, alpha, beta)
    return loss_nll + coeff * loss_reg
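
One way to sanity-check the likelihood term: marginalizing the NIG prior gives a Student-t distribution with location `gamma`, squared scale `beta * (1 + v) / (v * alpha)`, and `2 * alpha` degrees of freedom, so the expression in `NIG_NLL` should equal the negative log-density of that Student-t. The sketch below verifies this for one scalar sample in plain Python; the function names and the sample values are assumptions made for illustration.

```python
import math

def nig_nll(y, gamma, v, alpha, beta):
    """Same expression as NIG_NLL, written for a single scalar sample."""
    omega = 2.0 * beta * (1.0 + v)
    return (0.5 * math.log(math.pi / v)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(v * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))

def nig_reg(y, gamma, v, alpha):
    """Same expression as NIG_Reg: absolute error times total evidence."""
    return abs(y - gamma) * (2.0 * v + alpha)

def student_t_nll(y, loc, scale, df):
    """Negative log-density of a location-scale Student-t distribution."""
    z = (y - loc) / scale
    return (math.lgamma(df / 2.0) - math.lgamma((df + 1.0) / 2.0)
            + 0.5 * math.log(df * math.pi) + math.log(scale)
            + 0.5 * (df + 1.0) * math.log1p(z * z / df))

# arbitrary illustrative values
y, gamma, v, alpha, beta = 0.3, -0.5, 2.0, 3.0, 1.5
scale = math.sqrt(beta * (1.0 + v) / (v * alpha))

print(nig_nll(y, gamma, v, alpha, beta))
print(student_t_nll(y, gamma, scale, 2.0 * alpha))  # should match the line above
print(nig_reg(y, gamma, v, alpha))
```

The regularizer is zero when the prediction is exact and grows with both the error and the evidence, which discourages confident predictions that are wrong.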

## Data Preparation
First, we load the training and test data and view five random samples from the training set.

In [None]:
# load the training and test data
train = pd.read_csv('../input/commonlitreadabilityprize/train.csv').drop(columns = ['url_legal', 'license'])
test = pd.read_csv('../input/commonlitreadabilityprize/test.csv').drop(columns = ['url_legal', 'license'])

# display a sample of the training data
train.sample(5)

Next, we convert the text excerpts to feature vectors using a pre-trained spaCy model. Any other embedding model or feature engineering technique can be used here, as long as its output is properly preprocessed and compatible with TensorFlow.

In [None]:
# load large English spacy model
nlp = sp.load('en_core_web_lg')

# get spacy embeddings for training data; the document vector is an average of
# the static word vectors, so every pipeline component can be switched off
with nlp.disable_pipes(*nlp.pipe_names):
    train_vectors = pd.DataFrame(
        np.array([nlp(text).vector for text in train['excerpt']])
    )

# get spacy embeddings for test data
with nlp.disable_pipes(*nlp.pipe_names):
    test_vectors = pd.DataFrame(
        np.array([nlp(text).vector for text in test['excerpt']])
    )
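
By default, spaCy's `doc.vector` is the average of the token vectors, so each excerpt is reduced to one fixed-length vector. The sketch below mimics that pooling step in plain Python with made-up 3-dimensional vectors (the real vectors are much longer, and `toy_vectors` and `mean_pool` are hypothetical names).

```python
# Hypothetical 3-dimensional word vectors, for illustration only.
toy_vectors = {
    "the": [0.0, 1.0, 0.0],
    "cat": [1.0, 0.0, 2.0],
    "sat": [2.0, 2.0, 1.0],
}

def mean_pool(tokens, vectors):
    """Average the vectors of the given tokens, dimension by dimension."""
    rows = [vectors[t] for t in tokens]
    return [sum(col) / len(rows) for col in zip(*rows)]

doc_vector = mean_pool(["the", "cat", "sat"], toy_vectors)
print(doc_vector)  # [1.0, 1.0, 1.0]

# Word order is lost: a shuffled sentence gives the same vector.
print(mean_pool(["sat", "the", "cat"], toy_vectors) == doc_vector)  # True
```

Because averaging discards word order, these features carry little information about syntax or cohesion, which is one reason a different embedding model may be worth trying.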

Now we train a simple dense neural network with the deep evidential loss described earlier.

In [None]:
# build model
model = tf.keras.Sequential(
    [
        tf.keras.layers.Dense(units = 20, activation = 'relu'),
        tf.keras.layers.Dense(units = 10, activation = 'relu'),
        DenseNormalGamma(1)
    ]
)

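# 'mse' on the raw output would average over all four columns (mu, v, alpha,
# beta), so track the squared error of the predicted mean only
def mu_mse(y_true, evidential_output):
    mu = evidential_output[..., :1]
    return tf.reduce_mean(tf.square(tf.cast(y_true, mu.dtype) - mu))

# compile model
model.compile(
    optimizer = 'adam', 
    loss = EvidentialRegression,
    metrics = [mu_mse]
)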

# create early stopping callback
callback1 = tf.keras.callbacks.EarlyStopping(
    monitor = 'val_loss', 
    mode = 'min',
    patience = 25,
    restore_best_weights = True
)

# create reduce LR callback
callback2 = tf.keras.callbacks.ReduceLROnPlateau(
    monitor = 'val_loss', 
    factor = 0.25, 
    patience = 5, 
    verbose = 0,
    mode = 'min'
)

# fit model to training data
history = model.fit(
    x = train_vectors, 
    y = train['target'], 
    validation_split = 0.2, 
    batch_size = 4,
    epochs = 500,
    callbacks = [callback1, callback2],
    verbose = 1
)
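
To get a feel for how the two callbacks interact, here is a deliberately simplified simulation in plain Python. It assumes Adam's default learning rate of 0.001, treats any decrease in validation loss as an improvement (ignoring Keras details such as `min_delta` and `cooldown`), and runs on a made-up loss trace, so it is a rough model rather than a reproduction of the Keras callbacks.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.25,
                       lr_patience=5, stop_patience=25):
    """Simplified model of ReduceLROnPlateau plus EarlyStopping."""
    best = float("inf")
    lr_wait = stop_wait = 0
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # improvement: reset both patience counters
            best, lr_wait, stop_wait = loss, 0, 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:
                lr *= factor  # shrink the learning rate on a plateau
                lr_wait = 0
        history.append((epoch, lr))
        if stop_wait >= stop_patience:
            break  # early stopping
    return history

# Hypothetical trace: ten improving epochs followed by a flat plateau.
trace = [1.0 - 0.05 * i for i in range(10)] + [0.6] * 40
history = simulate_callbacks(trace)
print("stopped after epoch", history[-1][0])
print("final learning rate", history[-1][1])
```

Under these assumptions the learning rate is cut every 5 stagnant epochs, so it shrinks by several orders of magnitude before the 25-epoch early stopping patience runs out.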

From the trained model, we can compute the predicted target as well as measures of aleatoric and epistemic uncertainty.

> [**Aleatoric uncertainty** is also known as statistical uncertainty, and is representative of unknowns that differ each time we run the same experiment. ... **Epistemic uncertainty** is also known as systematic uncertainty, and is due to things one could in principle know but do not in practice.](https://en.wikipedia.org/wiki/Uncertainty_quantification)

In [None]:
# compute predictions on training data
y_pred = model.predict(train_vectors)

# split the four NIG parameters predicted for each sample
mu, v, alpha, beta = (y_pred[:, i] for i in range(y_pred.shape[1]))

# aleatoric variance E[sigma^2] and epistemic variance Var[mu]
var_a = beta / (alpha - 1)
var_e = beta / (v * (alpha - 1))
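
A small worked example of these two formulas, in plain Python with made-up parameter values (`nig_uncertainties` is a hypothetical helper name):

```python
import math

def nig_uncertainties(v, alpha, beta):
    """Return (aleatoric variance, epistemic variance) for NIG parameters."""
    var_a = beta / (alpha - 1.0)
    var_e = beta / (v * (alpha - 1.0))
    return var_a, var_e

var_a, var_e = nig_uncertainties(v=4.0, alpha=3.0, beta=2.0)
print(var_a, var_e)  # 1.0 0.25

# The ratio of the two variances is simply v, so a small v means the
# epistemic part dominates.
var_a, var_e = nig_uncertainties(v=0.01, alpha=3.0, beta=2.0)
print(math.sqrt(var_e / var_a))  # ratio of standard deviations, about 10
```

Since `var_e = var_a / v`, the relative size of the two uncertainties is controlled entirely by `v`: an epistemic standard deviation about 10 times the aleatoric one corresponds to `v` near 0.01.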

The model predicts the expected target reasonably well. However, there is clearly room for improvement, as the distribution of the predicted target resembles a truncated Gaussian.

In [None]:
sns.jointplot(x = train['target'], y = mu, kind = 'hex')
plt.xlabel('Target')
plt.ylabel('Predicted Target')
plt.show()

The aleatoric uncertainty is roughly 10 times smaller than the epistemic uncertainty, which suggests that the model can likely be improved significantly without overfitting to this particular dataset. However, the aleatoric uncertainty is still much larger than the scale of the target variable, which is very worrying. This could indicate data quality issues, several examples of which have been posted.

In [None]:
sns.jointplot(x = np.sqrt(var_e), y = np.sqrt(var_a), kind = 'hex')
plt.xlabel('Epistemic Uncertainty')
plt.ylabel('Aleatoric Uncertainty')
plt.show()

Aleatoric uncertainty seems to increase with the target variable (i.e., texts labeled as 'easier to read' appear to be labeled more noisily).

In [None]:
sns.jointplot(x = train['target'], y = np.sqrt(var_a), kind = 'hex')
plt.xlabel('Target')
plt.ylabel('Aleatoric Uncertainty')
plt.show()

This trend is more pronounced when looking at the predicted target variable.

In [None]:
sns.jointplot(x = mu, y = np.sqrt(var_a), kind = 'hex')
plt.xlabel('Predicted Target')
plt.ylabel('Aleatoric Uncertainty')
plt.show()

There is no clear relationship between the target variable and epistemic uncertainty. Again, this is good news, since it indicates room to improve the model. The bad news is that the uncertainty is extremely high.

In [None]:
sns.jointplot(x = train['target'], y = np.sqrt(var_e), kind = 'hex')
plt.xlabel('Target')
plt.ylabel('Epistemic Uncertainty')
plt.show()

Nothing particularly noteworthy changes for the predicted target here.

In [None]:
sns.jointplot(x = mu, y = np.sqrt(var_e), kind = 'hex')
plt.xlabel('Predicted Target')
plt.ylabel('Epistemic Uncertainty')
plt.show()

There does not seem to be any dependence between the per-sample absolute error and either type of uncertainty.

In [None]:
sns.lmplot(
    data = pd.DataFrame({
        'ABS_ERR' : np.abs(train['target'] - mu),
        'STD_A' : np.sqrt(var_a)
    }),
    x = 'ABS_ERR',
    y = 'STD_A'
)
plt.xlabel('Absolute Error')
plt.ylabel('Aleatoric Uncertainty')
plt.show()

In [None]:
sns.lmplot(
    data = pd.DataFrame({
        'ABS_ERR' : np.abs(train['target'] - mu),
        'STD_E' : np.sqrt(var_e)
    }),
    x = 'ABS_ERR',
    y = 'STD_E'
)
plt.xlabel('Absolute Error')
plt.ylabel('Epistemic Uncertainty')
plt.show()
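
To put a number on the visual impression from the two regression plots, one option is a correlation coefficient between the per-sample error and the predicted standard deviation. The sketch below implements the Pearson correlation in plain Python and applies it to made-up values standing in for the real arrays.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical values, standing in for the per-sample error and predicted std.
abs_err = [0.1, 0.4, 0.2, 0.9, 0.5]
std_a = [1.2, 1.1, 1.3, 1.2, 1.0]
print(round(pearson(abs_err, std_a), 3))
```

In the notebook itself, `np.corrcoef(np.abs(train['target'] - mu), np.sqrt(var_a))[0, 1]` should give the same quantity; a value near zero would support the "no dependence" reading.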

In [None]:
# compute predictions on the test data and keep the predicted mean
y_pred = model.predict(test_vectors)
mu, v, alpha, beta = (y_pred[:, i] for i in range(y_pred.shape[1]))
test['prediction'] = mu

In [None]:
# initialize dataframe to hold predictions
predictions = pd.DataFrame()

# add ID and final predicted target column
predictions['id'] = test['id']
predictions['target'] = test['prediction']

# save predictions to CSV for submission
predictions.to_csv('/kaggle/working/submission.csv', index = False)

# display first five predictions
predictions.head(5)