# Artificial Neural Network for Automatic Short Answer Grading
Automatic Short Answer Grading (ASAG) is the task of building a system that automatically assigns a class label reflecting the quality of a short answer to a question (correct or incorrect). The aim of this project is to take a question from the SciEntsBank dataset and train an Artificial Neural Network (ANN) that classifies the students' answers. The architecture is based on the model of [Alikaniotis et al. (2016)](https://arxiv.org/abs/1606.04289).

# Work structure
## Project statement
Neural networks for automatic short answer grading
## Experimental design with different parameters (independent variable, factor)
Values of the independent variable by factor: each experimental setup is one level of the independent variable

- Train word embeddings
- Start with a perceptron and finish with an LSTM network (look into other possible architectures)
- Combine with other methods (SVM...)

## Define the performance measure (dependent variable)
AUC (ROC), inter-rater agreement, correlation coefficients.
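
The three candidate measures can be sketched in plain Python to make their definitions concrete. This is a minimal illustration, not the evaluation code of the project: the function names and the toy grades below are hypothetical, and a library such as scikit-learn would normally be used instead.

```python
import math


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same non-empty set of answers")
    n = float(len(rater_a))
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A tie between a positive and a negative score counts as half a win
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = float(len(x))
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical toy data, invented for illustration only
human = [1, 0, 1, 1, 0, 0]                # grades from a human judge
system = [1, 0, 0, 1, 0, 1]               # hard predictions from a classifier
scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6]   # predicted probability of "correct"

print("kappa  :", cohen_kappa(human, system))
print("AUC    :", roc_auc(human, scores))
print("pearson:", pearson(human, system))
```

Kappa and the correlation compare hard labels from two graders, while AUC needs a continuous score from the classifier, so the choice of measure also constrains what the model has to output.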


In [83]:
# Libraries
from __future__ import print_function  # __future__ imports must come first in the cell

import random
import re
import sys
import xml.etree.ElementTree as ET
from collections import Counter

import sswe
import pandas as pd
import numpy as np
import keras
import theano
import nltk
import matplotlib.pyplot as plt
%matplotlib inline


# Embeddings
from nltk.tokenize import word_tokenize
from nltk.util import ngrams
# NOTE: sklearn.cross_validation was removed in scikit-learn 0.20;
# newer releases provide StratifiedKFold in sklearn.model_selection
from sklearn.cross_validation import StratifiedKFold
from sklearn.metrics import f1_score as f1
from sklearn import cross_validation
from sklearn.preprocessing import StandardScaler
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Classifiers
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC


# LSTM
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.layers import LSTM
from keras.optimizers import RMSprop
from keras.utils.data_utils import get_file
from keras.layers.wrappers import TimeDistributed


## Load Data


In [75]:
tree = ET.parse('PS-inv1-2a_ScientisTrain.xml')
root = tree.getroot()
# The children of the root are accessed by position:
# root[0] = question text, root[1] = reference answers, root[2] = student answers
question = root[0].text
grade = [branch.attrib["accuracy"] for branch in root[2]]
answers_st = [branch.text for branch in root[2]]
# If there is more than one reference answer, this is a list with several entries;
# make sure they are combined before processing
answers_ref = [branch.text for branch in root[1]]
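
Positional indexing works as long as the file layout never changes. As a hedged alternative, the sketch below looks elements up by tag name. The tag names (`questionText`, `referenceAnswer`, `studentAnswer`) and the miniature XML are assumptions made for illustration and should be checked against the real file; only the `accuracy` attribute is taken from the cell above.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the question file; the tag names are assumed
SAMPLE_XML = """
<question id="PS-inv1-2a">
  <questionText>Why does a rubber band make a sound when you pluck it?</questionText>
  <referenceAnswers>
    <referenceAnswer id="ref1">The rubber band vibrates.</referenceAnswer>
  </referenceAnswers>
  <studentAnswers>
    <studentAnswer id="s1" accuracy="incorrect">Because it is sticky!</studentAnswer>
    <studentAnswer id="s2" accuracy="correct">Because it vibrates.</studentAnswer>
  </studentAnswers>
</question>
"""


def load_question(xml_text):
    """Return (question text, reference answers, [(student answer, accuracy)])."""
    root = ET.fromstring(xml_text.strip())
    question_text = root.findtext("questionText")
    references = [ref.text for ref in root.iter("referenceAnswer")]
    students = [(ans.text, ans.attrib["accuracy"]) for ans in root.iter("studentAnswer")]
    return question_text, references, students


sample_question, sample_refs, sample_students = load_question(SAMPLE_XML)
# Same binary encoding as in the notebook: 1 for "correct", 0 otherwise
sample_labels = [1 if accuracy == "correct" else 0 for _, accuracy in sample_students]
print(sample_question)
print(sample_refs, sample_labels)
```

Looking up by name fails loudly (empty lists, `None`) if the layout differs, instead of silently reading the wrong branch.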

In [39]:
califs=[1 if resp=="correct" else 0 for resp in grade]

In [40]:
question

'Why does a rubber band make a sound when you pluck it (pull and let go quickly)?'

In [41]:
answers_ref

['The rubber band vibrates.']

In [42]:
len(answers_st)
# Repeat the single reference answer once per student answer so both lists line up
anwsr_refArry=[answers_ref[0] for answer in answers_st]

In [43]:
pd.DataFrame(np.column_stack([answers_st,grade]),columns=["Respuesta","Calif"])[:15]
#answers_st[:25]

| | Respuesta | Calif |
|---|---|---|
| 0 | Because it is sticky! | incorrect |
| 1 | Because it hits the other side. | incorrect |
| 2 | Because vibration. | correct |
| 3 | It makes the plucking sound from stretching an... | correct |
| 4 | Because if you stretch it and let it go it mak... | incorrect |
| 5 | Plucking is pull. | incorrect |
| 6 | Because it hits the other part of the rubber b... | incorrect |
| 7 | Because it vibrates. | correct |
| 8 | Because your making vibrations. | correct |
| 9 | Because it vibrates. | correct |


## Preprocessing
- stemming
- lowercasing
- tf-idf (optional)

- http://www.nltk.org/book/ch03.html
- https://de.dariah.eu/tatom/preprocessing.html

- Query expansion (WordNet)
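
The first two steps can be illustrated with a small standard-library sketch. The suffix stripper below is a deliberately naive stand-in for a real stemmer (for example NLTK's Porter stemmer), and the function names are hypothetical; it only shows where lowercasing and stemming would sit in the pipeline.

```python
import re

# Deliberately tiny suffix list: a toy stand-in for a real stemmer
SUFFIXES = ("ing", "es", "s")


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def preprocess(text):
    """Lowercase, keep runs of letters/apostrophes as tokens, then stem each one."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [naive_stem(token) for token in tokens]


print(preprocess("Because it vibrates."))       # ['because', 'it', 'vibrat']
print(preprocess("The rubber band vibrates."))  # ['the', 'rubber', 'band', 'vibrat']
```

After this step the student answer and the reference answer share the token `vibrat`, which is the kind of overlap the feature extractors further down rely on.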


## Baseline methods
- word count with word2vec (pretrained on Google News)
- ASOBEK
- c&w (???)

### word2vec Embeddings

In [44]:
import gensim, logging

In [45]:
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
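# Word2Vec expects an iterable of token lists; a raw string would be read as a sequence of characters
sentences = [word_tokenize(answer.lower()) for answer in answers_st]
# NOTE: the default min_count=5 discards words seen fewer than five times
model = gensim.models.Word2Vec(sentences)

In [61]:
model = gensim.models.Word2Vec(iter=1)  # an empty model, no training yet
model.build_vocab(sentences)  # can be a non-repeatable, 1-pass generator
# train() updates the model in place; its return value is not the embedding matrix
w2v_emb=model.train(sentences)  # can be a non-repeatable, 1-pass generator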
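
Once word vectors exist, one common way to obtain an answer-level feature is to average the vectors of the words in an answer and compare the result with the reference answer. The sketch below assumes that approach; the 3-dimensional vectors are invented for illustration and the function names are hypothetical. Real vectors would come from the trained or pretrained word2vec model.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration only
TOY_VECTORS = {
    "rubber": [0.9, 0.1, 0.0],
    "band": [0.8, 0.2, 0.1],
    "vibrates": [0.1, 0.9, 0.2],
    "sticky": [0.2, 0.0, 0.9],
}


def average_vector(tokens, vectors):
    """Mean of the vectors of the known tokens; None if no token is known."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / float(len(known)) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


reference = average_vector(["the", "rubber", "band", "vibrates"], TOY_VECTORS)
good = average_vector(["because", "it", "vibrates"], TOY_VECTORS)
bad = average_vector(["because", "it", "is", "sticky"], TOY_VECTORS)

print("similarity to reference, correct answer  :", cosine(reference, good))
print("similarity to reference, incorrect answer:", cosine(reference, bad))
```

Words missing from the vocabulary are simply skipped here, which is one of several possible choices; an answer with no known word yields `None` and has to be handled by the caller.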

### ASOBEK Embedding


ASOBEK feature extractor

In [51]:
def encode_asobek(dataA, dataB):
    '''
    Takes paired sentences and returns the ASOBEK features (one entry per answer pair)
    
    Arguments
    --
    dataA: List containing the source sentences of paraphrasing
    dataB: List containing the candidate sentences of paraphrasing
    
    Returns
    --
    List with one item per pair:
    [Unigram word features, Bigram word features, Unigram character features, Bigram character features]
    Each feature group holds four counts:
    [union, intersection, distinct n-grams of A, distinct n-grams of B]
    '''
    if len(dataA) != len(dataB):
        print ('Check length of your data')
        return
    features = []
    def get_cardinalities(ngramA, ngramB):
        vector = []
        vector.append(union(ngramA, ngramB))
        vector.append(intersect(ngramA, ngramB))
        vector.append(set(ngramA))
        vector.append(set(ngramB))
        return vector
    
    for i in range(len(dataA)):
        unigram_1 = get_wordngram(dataA[i],1)
        unigram_2 = get_wordngram(dataB[i],1)
        bigram_1 = get_wordngram(dataA[i],2)
        bigram_2 = get_wordngram(dataB[i],2)
        unigram_c_1 = get_characterngram(dataA[i],1)
        unigram_c_2 = get_characterngram(dataB[i],1)
        bigram_c_1 = get_characterngram(dataA[i],2)
        bigram_c_2 = get_characterngram(dataB[i],2)
        # The comprehensions use their own variable so the loop index is never shadowed
        w1 = [len(s) for s in get_cardinalities(unigram_1, unigram_2)]
        w2 = [len(s) for s in get_cardinalities(bigram_1, bigram_2)]
        c1 = [len(s) for s in get_cardinalities(unigram_c_1, unigram_c_2)]
        c2 = [len(s) for s in get_cardinalities(bigram_c_1, bigram_c_2)]
        features.append([w1, w2, c1, c2])
    return features

And these are some helpers for encode_asobek

In [52]:
def union(list1, list2):
    '''Distinct items that occur in either list'''
    merged = Counter(list1) | Counter(list2)
    return set(merged.elements())

def intersect(list1, list2):
    '''Items common to both lists, keeping the smaller multiplicity of each'''
    common = Counter(list1) & Counter(list2)
    return list(common.elements())

def get_characterngram(string, n):
    '''Returns n-grams of characters'''
    char1 = [c for c in string]
    return list(ngrams(char1, n))

def get_wordngram(string, n):
    '''Returns n-grams of words'''
    words = word_tokenize(string)
    return list(ngrams(words, n))
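
To make the four counts per feature group concrete, here is a small worked example for one answer pair. It is a self-contained sketch that mirrors `get_cardinalities` with standard-library code only; whitespace splitting stands in for `word_tokenize`, and the helper names `ngram_list` and `cardinalities` are hypothetical.

```python
from collections import Counter


def ngram_list(items, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def cardinalities(grams_a, grams_b):
    """[|union|, |intersection|, |distinct A|, |distinct B|] for two n-gram lists."""
    union = set(grams_a) | set(grams_b)
    # Multiset intersection: each shared n-gram counted with its smaller multiplicity
    common = list((Counter(grams_a) & Counter(grams_b)).elements())
    return [len(union), len(common), len(set(grams_a)), len(set(grams_b))]


# Pre-tokenised, lowercased pair (whitespace split is an assumption of this sketch)
reference = "the rubber band vibrates .".split()
student = "because it vibrates .".split()

word_unigrams = cardinalities(ngram_list(reference, 1), ngram_list(student, 1))
word_bigrams = cardinalities(ngram_list(reference, 2), ngram_list(student, 2))

print(word_unigrams)  # [7, 2, 5, 4]
print(word_bigrams)   # [6, 1, 4, 3]
```

The two answers share the unigrams `vibrates` and `.` and the single bigram `(vibrates, .)`, which is why the intersection counts are 2 and 1.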

#### Elaborate ASOBEK features

In [53]:
# decode("utf-8") is really important
# Preprocessing: only lowercase all the words
# Reference answer, repeated once per student answer
dataA = [' '.join(word_tokenize(x.decode("utf-8").lower())) for x in anwsr_refArry]
# Student answers
dataB = [' '.join(word_tokenize(x.decode("utf-8").lower())) for x in answers_st]
X_train = encode_asobek(dataA, dataB)

In [54]:
X_train[0]

# Each item of X_train is [w1, w2, c1, c2], so index 0 = w1, 1 = w2, 2 = c1, 3 = c2
c1w2train = np.array([x[2]+x[1] for x in X_train])
w1w2train = np.array([x[0]+x[1] for x in X_train])
c1c2train = np.array([x[2]+x[3] for x in X_train])
w2c2train = np.array([x[1]+x[3] for x in X_train])

training_combinations=[c1w2train, w1w2train, c1c2train, w2c2train]

In [58]:
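print ('Evaluating...')
scaling = True
# NOTE: no held-out split is made in this cell; each model is scored on the same
# answers it was fitted on, so the "Test" figures below are training-set scores
for i in range(len(training_combinations)):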
    clf = LogisticRegression(C=1)
    if scaling:
        scaler = StandardScaler()
        X = scaler.fit_transform(training_combinations[i])
        clf.fit(X, califs)
        X_test = scaler.transform(training_combinations[i])
    else:
        clf.fit(training_combinations[i],califs)
        X_test = training_combinations[i]
    yhat = clf.predict(X_test)
    print ('Test accuracy: ' + str(clf.score(X_test,califs)))
    print ('Test F1: ' + str(f1(califs, yhat)))

Evaluating...
Test accuracy: 0.769230769231
Test F1: 0.75
Test accuracy: 0.644230769231
Test F1: 0.564705882353
Test accuracy: 0.836538461538
Test F1: 0.828282828283
Test accuracy: 0.903846153846
Test F1: 0.901960784314


In [88]:
X_train = w2c2train
X_test = w2c2train
y_train = califs
y_test = califs

# Create classifiers
lr = LogisticRegression()
gnb = GaussianNB()
svc = LinearSVC(C=1.0)
rfc = RandomForestClassifier(n_estimators=100)

for clf, name in [(lr, 'Logistic'),
                  (gnb, 'Naive Bayes'),
                  (svc, 'Support Vector Lineal Classificator'),
                  (rfc, 'Random Forest')]:
    clf.fit(X_train, y_train)
    print (name,clf.score(X_test,califs))
    

Logistic 0.903846153846
Naive Bayes 0.730769230769
Support Vector Lineal Classificator 0.865384615385
Random Forest 1.0


In [92]:
print ('Evaluating...')
scaling = True
for i in range(len(training_combinations)):
    # Create classifiers
    lr = LogisticRegression()
    gnb = GaussianNB()
    svc = LinearSVC(C=1.0)
    rfc = RandomForestClassifier(n_estimators=100)

    scaler = StandardScaler()
    X = scaler.fit_transform(training_combinations[i])

    for clf, name in [(lr, 'Logistic'),
                      (gnb, 'Naive Bayes'),
                      (svc, 'Support Vector Lineal Classificator'),
                      (rfc, 'Random Forest')]:
        clf.fit(X, y_train)
        yhat = clf.predict(X)
        print ("training_combination", i)
        print (name,clf.score(X,califs))
        print ('Test F1: ' + str(f1(califs, yhat)))

Evaluating...
training_combination 0
Logistic 0.769230769231
Test F1: 0.75
training_combination 0
Naive Bayes 0.798076923077
Test F1: 0.774193548387
training_combination 0
Support Vector Lineal Classificator 0.759615384615
Test F1: 0.747474747475
training_combination 0
Random Forest 0.990384615385
Test F1: 0.989898989899
training_combination 1
Logistic 0.644230769231
Test F1: 0.564705882353
training_combination 1
Naive Bayes 0.625
Test F1: 0.541176470588
training_combination 1
Support Vector Lineal Classificator 0.653846153846
Test F1: 0.581395348837
training_combination 1
Random Forest 0.807692307692
Test F1: 0.811320754717
training_combination 2
Logistic 0.836538461538
Test F1: 0.828282828283
training_combination 2
Naive Bayes 0.798076923077
Test F1: 0.792079207921
training_combination 2
Support Vector Lineal Classificator 0.846153846154
Test F1: 0.84
training_combination 2
Random Forest 1.0
Test F1: 1.0
training_combination 3
Logistic 0.903846153846
Test F1: 0.901960784314
training_

#### tf-idf for training a neural network

http://scikit-learn.org/stable/modules/feature_extraction.html

http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html


In [67]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

In [70]:
vectorizer = CountVectorizer(min_df=1)
tf_st = vectorizer.fit_transform(answers_st)
transformer = TfidfTransformer(smooth_idf=False)
tfidf_st = transformer.fit_transform(tf_st)
tfidf_st.toarray().shape
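
For intuition about what the two scikit-learn steps compute, the following standard-library sketch reproduces the weighting by hand. It assumes the unsmoothed formula idf(t) = ln(n / df(t)) + 1 that `smooth_idf=False` selects, followed by L2 normalisation of each row, and it only loosely mirrors the default tokenisation of `CountVectorizer` (lowercased tokens of two or more word characters). The function name and the toy documents are hypothetical.

```python
import math
import re


def tfidf_rows(documents):
    """Return (vocabulary, rows) with tf * (ln(n / df) + 1), rows scaled to unit length."""
    tokenised = [re.findall(r"\w\w+", doc.lower()) for doc in documents]
    vocabulary = sorted(set(token for doc in tokenised for token in doc))
    n_docs = float(len(documents))
    # Document frequency: number of documents that contain the term at least once
    doc_freq = dict((term, sum(term in doc for doc in tokenised)) for term in vocabulary)
    rows = []
    for doc in tokenised:
        weights = [doc.count(term) * (math.log(n_docs / doc_freq[term]) + 1.0)
                   for term in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights))
        rows.append([w / norm if norm else 0.0 for w in weights])
    return vocabulary, rows


toy_docs = ["Because it vibrates.", "Because it is sticky!"]  # toy stand-in for answers_st
vocab, rows = tfidf_rows(toy_docs)
print(vocab)
for row in rows:
    print([round(w, 3) for w in row])
```

Terms that appear in every answer (here `because` and `it`) get the minimum idf of 1, so the rarer, more informative words such as `vibrates` carry more weight in each row.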

In [98]:
# X_test is the same matrix as X_train, so the scores below are measured on the training data
X_train = tfidf_st.toarray()
X_test = tfidf_st.toarray()

In [103]:
# Create classifiers
lr = LogisticRegression()
gnb = GaussianNB()
svc = LinearSVC(C=1.0)
rfc = RandomForestClassifier(n_estimators=100)

for clf, name in [(lr, 'Logistic'),
                  (gnb, 'Naive Bayes'),
                  (svc, 'Support Vector Lineal Classificator'),
                  (rfc, 'Random Forest')]:
    clf.fit(X_train, y_train)
    yhat = clf.predict(X_train)
    print (name, "= Score:",clf.score(X_train,califs),"F1 score:",f1(califs, yhat))

Logistic = Score: 0.980769230769 F1 score: 0.979166666667
Naive Bayes = Score: 0.951923076923 F1 score: 0.95145631068
Support Vector Lineal Classificator = Score: 1.0 F1 score: 1.0
Random Forest = Score: 1.0 F1 score: 1.0


# ANN for classification
### Long short-term memory network


## Validation

In [59]:
np.array(training_combinations).shape


(4, 104, 8)

In [60]:
np.array(w2c2train).shape

(104, 8)
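
The scores reported earlier in this notebook are computed on the same 104 answers used for fitting, so they do not yet say how the classifiers behave on unseen answers. One possible validation scheme is stratified k-fold cross-validation, which the notebook imports from scikit-learn but does not use in the cells shown. The sketch below shows the idea with the standard library only; the function names and the toy labels are hypothetical, and in practice scikit-learn's `StratifiedKFold` would produce the splits.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Deal the indices of each class round-robin into n_folds lists."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        for index in indices:
            folds[position % n_folds].append(index)
            position += 1
    return folds


def train_test_splits(labels, n_folds, seed=0):
    """Yield (train indices, test indices), holding out one fold at a time."""
    folds = stratified_folds(labels, n_folds, seed)
    for held_out in range(n_folds):
        test = sorted(folds[held_out])
        train = sorted(i for k, fold in enumerate(folds) if k != held_out for i in fold)
        yield train, test


toy_labels = [1, 0] * 10  # hypothetical labels standing in for califs
splits = list(train_test_splits(toy_labels, n_folds=5))
for train_idx, test_idx in splits:
    positives = sum(toy_labels[i] for i in test_idx)
    print(len(train_idx), "train /", len(test_idx), "test, positives in test:", positives)
```

For each split, a classifier would be fitted on the rows in `train_idx` and scored on the rows in `test_idx`; averaging the per-fold scores gives an estimate that is not inflated by memorised training answers.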