# ✏️ Using Convolutional Neural Networks to Grade Essays 📈

#### Application of image processing methods in natural language processing

This project uses one-dimensional convolutions to predict scores for IELTS essay responses with a regression model.

### Dependencies

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv("/kaggle/input/ielts-writing-scored-essays-dataset/ielts_writing_dataset.csv")

# Data Preprocessing 

In [None]:
df.head(5)

### Removing Outliers

In [None]:
lengths = [len(i) for i in df["Essay"]]

sns.boxplot(lengths)

In [None]:
def lengthFilter(arr, maxLength):
    returnArr = []
    for i in range(len(arr)):
        if len(arr[i]) < maxLength:
            returnArr.append(i)    
    
    return np.array(returnArr)

In [None]:
filteredLength = lengthFilter(df["Essay"], 3000)

X = np.array(df["Essay"])[filteredLength]
y = np.array(df["Overall"])[filteredLength]


In [None]:
lengths = [len(i) for i in X]

sns.boxplot(lengths)

### Encoding and Zero Padding

- Character-level encoding of each essay into a one-dimensional vector using a vocabulary of size 166
- Zero padding the vectors to create arrays of uniform length
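
To make the two steps above concrete, here is a minimal sketch on a toy corpus. The helper names (`build_vocab`, `encode_and_pad`) and the choice to reserve index 0 for padding are assumptions made for illustration and are not the notebook's own code. In the notebook the vocabulary starts at index 0, so the padding value coincides with the index of the first character.

```python
def build_vocab(texts):
    """Map each distinct character to an integer index, starting at 1.

    Index 0 is reserved for padding (an assumption of this sketch).
    """
    chars = sorted(set("".join(texts)))
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode_and_pad(text, vocab, max_length):
    """Encode a string character by character and right-pad with zeros."""
    ids = [vocab[ch] for ch in text][:max_length]
    return ids + [0] * (max_length - len(ids))


toy_texts = ["abc", "cab ba"]
toy_vocab = build_vocab(toy_texts)
encoded = [encode_and_pad(t, toy_vocab, max_length=8) for t in toy_texts]
print(toy_vocab)
print(encoded)
```

If index 0 is reserved like this, an embedding layer would need `input_dim = len(vocab) + 1` to cover the padding index as well.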

In [None]:
# Collect every distinct character across all essays and map it to an integer index
vocab = sorted(set("".join(X)))
vocab = {ch: i for i, ch in enumerate(vocab)}

In [None]:
def tokenizeZeroPadding(text):
    averageLength = []
    textNums = []
    for i in range(len(text)):
        nums = [vocab[j] for j in list(text[i])]
        averageLength.append(len(nums))
        numsLength = len(nums)

        # Right-pad with zeros so every sequence has exactly 3000 elements
        missingElements = 3000 - numsLength
        nums = nums + [0] * missingElements
        textNums.append(nums)
        
    return averageLength, textNums

In [None]:
averageLength, textNums = tokenizeZeroPadding(X)

In [None]:
X = np.array(textNums)

### Normalization of Input Data

In [None]:
from sklearn.preprocessing import MinMaxScaler

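# Rescale every sequence position (column) to the range [0, 1].
# Caveat: the Embedding layer used below expects integer character indices,
# so feeding it rescaled values may discard most of the encoding.
scalerX = MinMaxScaler()

X = scalerX.fit_transform(X)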

### Splitting Dataset

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=42, shuffle=True)

# Modelling using Neural Network

1D convolutions learn filters that detect patterns in the text sequences. Depending on the filter size, this allows the extraction of higher-level features such as relationships between letters, words and sentences.
- In this case, filter size was chosen to be 3 which allows for finding patterns between letters. 

The embedding layer allows for further feature extraction by encoding the similarity and difference between characters as vector embeddings.
- Not a necessity in this case, but it allows the model to capture similarity between characters
- The embedding layer retains the sequence length but expresses every element as a higher-dimensional vector, so that similarity is represented through spatial proximity

Feature maps of the convolutions are used as input to a deep neural network to perform regression.
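
The shapes involved can be traced with a small NumPy sketch. This is an illustration under assumed toy sizes and random weights, not the Keras implementation: it performs an embedding lookup followed by a 1D convolution with filter size 3, zero ("same") padding and a ReLU.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions); the notebook uses 3000, len(vocab), 32 and 32.
seq_len, vocab_size, embed_dim, n_filters, kernel_size = 10, 6, 4, 3, 3

# A toy sequence of integer character indices.
tokens = rng.integers(0, vocab_size, size=seq_len)

# Embedding: a lookup table with one embed_dim-sized row per vocabulary entry.
embedding_table = rng.normal(size=(vocab_size, embed_dim))
embedded = embedding_table[tokens]            # shape (seq_len, embed_dim)

# Conv1D kernel: (kernel_size, input channels, number of filters).
kernel = rng.normal(size=(kernel_size, embed_dim, n_filters))
bias = np.zeros(n_filters)

# 'same' padding: pad one step on each side so the output keeps seq_len steps.
pad = kernel_size // 2
padded = np.pad(embedded, ((pad, pad), (0, 0)))

feature_maps = np.empty((seq_len, n_filters))
for t in range(seq_len):
    window = padded[t:t + kernel_size]        # (kernel_size, embed_dim)
    # Each filter sums over all positions and channels in the window.
    feature_maps[t] = np.tensordot(window, kernel, axes=([0, 1], [0, 1])) + bias

feature_maps = np.maximum(feature_maps, 0.0)  # ReLU
print(embedded.shape, feature_maps.shape)
```

Each column of `feature_maps` is the response of one filter along the sequence, which is what the dense layers receive after pooling and flattening.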

In [None]:
import tensorflow as tf

inputLength = 3000
vocabSize = len(vocab)

in1 = tf.keras.Input(shape=(inputLength,))
m = tf.keras.layers.Embedding(input_dim = vocabSize, output_dim=32, input_length=inputLength)(in1) #embedding will retain the sequence length but increase the dimensions of each element

m = tf.keras.layers.Conv1D(filters=32, kernel_size=3, padding='same', activation='relu')(m)
m = tf.keras.layers.MaxPooling1D(pool_size=2)(m)
m = tf.keras.layers.Flatten()(m)


m = tf.keras.layers.Dense(16, activation='relu')(m)
m = tf.keras.layers.Dropout(0.3)(m)
m = tf.keras.layers.Dense(8, activation='relu')(m)
m = tf.keras.layers.Dropout(0.3)(m)
out1 = tf.keras.layers.Dense(1, activation='linear')(m) #single linear output for regression

model = tf.keras.Model(inputs=in1, outputs=out1)


model.summary()

In [None]:
model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(), 
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
early = tf.keras.callbacks.EarlyStopping(monitor='loss', mode='min', verbose=1)

In [None]:
# plot_model is exposed under tf.keras.utils; it needs pydot and graphviz installed
tf.keras.utils.plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

In [None]:
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_test, y_test), batch_size=5, callbacks=[early], verbose=1)

In [None]:
sns.lineplot(history.history['loss'], label="loss")
sns.lineplot(history.history['val_loss'], label="val loss")
plt.legend()

# Evaluation, Visualization and Prediction

In [None]:
predictions = model.predict(X_test)

In [None]:
np.sum((y_test - predictions.reshape(-1,))**2) / y_test.shape[0] #Validation Mean Squared Error
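
The expression above is the mean squared error. Since the metric compiled into the model is the root mean squared error, taking the square root puts the number back on the same scale as the essay scores. A small sketch with made-up values (the scores below are hypothetical, not results from this model):

```python
import numpy as np

y_true = np.array([6.0, 7.5, 5.0, 8.0])   # hypothetical overall scores
y_pred = np.array([6.5, 7.0, 6.0, 8.0])   # hypothetical predictions

mse = np.mean((y_true - y_pred) ** 2)     # same quantity as the sum divided by n
rmse = np.sqrt(mse)                       # error expressed in score units
print(mse, rmse)
```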

### Filter Visualization

In [None]:
for layer in model.layers:
    if "conv" in layer.name:
        filters, bias = layer.get_weights()

In [None]:
filters.shape #(filter size, input channels i.e. embedding dimension, total number of filters)

In [None]:
from matplotlib import pyplot as plt

In [None]:
fig, axs = plt.subplots(1, 5)

for i in range(5):
    axs[i].imshow(filters[:,:,i].T, interpolation='nearest')

plt.show()
    

The visualization above shows the first 5 of the 32 filters in the convolution layer (each transposed for plotting). Each filter has shape (3, 32): it spans 3 elements of the text sequence, and each element is a 32-dimensional embedding vector.

Further extensions could include visualizing the feature maps by building a model from only the input, embedding and convolution layers, so that it outputs the feature maps for an input text sequence. Extracting these feature maps for example inputs would allow analysis of the types of features the model has learned to extract.

Input (3000) -> Embedding (3000,32) -> Convolution (3000,32) -> Output (3000,32)

The output will be 32 (3000,1) feature maps corresponding to the output of each of the filters (is this the right interpretation?)
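
The extension described above could be sketched as follows. This is a minimal, self-contained example on a tiny stand-in model rather than the trained notebook model: the sizes, the layer name `char_conv` and the variable `feature_model` are assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf

seq_len, vocab_size = 20, 10   # toy sizes; the notebook uses 3000 and len(vocab)

inputs = tf.keras.Input(shape=(seq_len,))
x = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=4)(inputs)
conv_out = tf.keras.layers.Conv1D(filters=5, kernel_size=3, padding="same",
                                  activation="relu", name="char_conv")(x)
x = tf.keras.layers.MaxPooling1D(pool_size=2)(conv_out)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(1, activation="linear")(x)
full_model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Second model sharing the same layers (and weights), cut off after the convolution.
feature_model = tf.keras.Model(inputs=inputs, outputs=conv_out)

batch = np.random.randint(0, vocab_size, size=(2, seq_len))
feature_maps = feature_model.predict(batch, verbose=0)
print(feature_maps.shape)   # (batch, sequence length, number of filters)
```

Each slice `feature_maps[n, :, k]` is the response of filter `k` along sequence `n`, which is consistent with reading the output as one feature map per filter. With the notebook's `model`, the same idea would be to connect its input to the output of the layer whose name contains "conv".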

# Further Work


- Engineer features from the raw text, such as the kinds of words used, word length, number of sentences etc., to find which of them correlate with higher scores.
- Tune hyperparameters, including batch size and learning rate.
- Change the filter size to extract word- or sentence-level information using additional convolution layers. This could include building a stack of convolutions that first finds character-level abstractions and then uses them to find higher-level word- and sentence-level abstractions.
- Change the model architecture to decrease the loss.
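
To get a feel for how stacking convolutions widens the context each output element sees, the receptive field can be computed by hand. The sketch below uses the standard receptive-field recurrence; the helper `receptive_field` and the particular stack of layers are hypothetical and not part of the model above.

```python
def receptive_field(layers):
    """Return the receptive field (in input characters) after a stack of layers.

    Each layer is a (kernel_size, stride) pair, listed from input to output.
    """
    field, jump = 1, 1
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump   # each layer widens the window
        jump *= stride                      # strides spread later kernels further apart
    return field


# Hypothetical stack: three Conv1D(kernel_size=3) layers, with
# MaxPooling1D(pool_size=2) after the first and second.
stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
print(receptive_field(stack[:1]))
print(receptive_field(stack))
```

Under these assumptions a single convolution sees 3 characters, while the deeper stack sees 18, which is closer to a few words of context.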