## Project Description

In this project we explore how to use basic natural languate processing and machine learning techniques to automatically grade essays (AES) in Brazilian Portuguese. In general, AES is a very difficult problem even in English. The available tools that can be used for Portuguese are even less. 

### Load Data

In [1]:
%qtconsole

In [2]:
# -*- coding: utf-8 -*-

In [241]:
import numpy as np
import pandas as pd
import re
import nltk

### Essay Question

In [244]:
question = pd.read_excel(io="C:/Users/alin/Documents/ORAnalytics/AES/data/Brazil_essay.xlsx", sheetname="Essay Question" )

In [245]:
question = question.Original[0]


In [246]:
print question

Conforme apresentado no vídeo da Revista Exame, o comportamento é muito importante no trabalho, nas organizações. Em sua opinião, o comportamento é responsabilidade da própria pessoa ou da empresa em que trabalha? O que as organizações podem fazer para ajudar seus colaboradores a desenvolverem comportamentos melhores e mais adequados às necessidades do trabalho?


###  Essays

In [247]:
essay_df = pd.read_excel(io="C:/Users/alin/Documents/ORAnalytics/AES/data/Brazil_essay.xlsx", sheetname="result" )

In [249]:
essay_df.columns = essay_df.columns.str.lower()

In [250]:
essay_df.head()

Unnamed: 0,user_id,pk1,title,average_score,resposta
0,1.071078,2500,ATIVIDADE 1,2.5,<p> Em minha opinião que quando se trata de co...
1,1.081038,4817,ATIVIDADE 1,2.5,"<p>Olá Patricia,</p> \n<p>Concordo com sua col..."
2,1.081924,7253,ATIVIDADE 1,2.5,<p>O comportamento do profissional dentro da e...
3,1.091905,11815,ATIVIDADE 1,2.5,"<p><span style=""font-family: georgia , palatin..."
4,1.101049,704153,ATIVIDADE 1,1.5,"<p><span style=""font-size: 10.0pt;font-family:..."


### Count # of paragraphs and remove html tags

In [251]:
essay_df['paragraphs'] = essay_df.apply(lambda r: len(re.findall(r'<p', r['resposta'])), axis = 1)

In [252]:
essay_df['text'] = essay_df.apply(lambda r: re.sub(r'<[^<>]*>', ' ', r['resposta']), axis = 1)



In [253]:
essay_df.head()

Unnamed: 0,user_id,pk1,title,average_score,resposta,paragraphs,text
0,1.071078,2500,ATIVIDADE 1,2.5,<p> Em minha opinião que quando se trata de co...,6,Em minha opinião que quando se trata de comp...
1,1.081038,4817,ATIVIDADE 1,2.5,"<p>Olá Patricia,</p> \n<p>Concordo com sua col...",4,"Olá Patricia, \n Concordo com sua colocação ..."
2,1.081924,7253,ATIVIDADE 1,2.5,<p>O comportamento do profissional dentro da e...,2,O comportamento do profissional dentro da emp...
3,1.091905,11815,ATIVIDADE 1,2.5,"<p><span style=""font-family: georgia , palatin...",4,\n Em um primeiro momento o comportamen...
4,1.101049,704153,ATIVIDADE 1,1.5,"<p><span style=""font-size: 10.0pt;font-family:...",2,Questões comportamentais estão relacionadas ...


### Create some basic features

In [254]:
essay_df['tokens'] = essay_df.apply(lambda r: nltk.wordpunct_tokenize(r['text']), axis = 1)
essay_df['nlp_text'] = essay_df.apply(lambda r: nltk.Text(r['tokens']), axis = 1) 


In [255]:
essay_df.head()

Unnamed: 0,user_id,pk1,title,average_score,resposta,paragraphs,text,tokens,nlp_text
0,1.071078,2500,ATIVIDADE 1,2.5,<p> Em minha opinião que quando se trata de co...,6,Em minha opinião que quando se trata de comp...,"[Em, minha, opinião, que, quando, se, trata, d...","(Em, minha, opinião, que, quando, se, trata, d..."
1,1.081038,4817,ATIVIDADE 1,2.5,"<p>Olá Patricia,</p> \n<p>Concordo com sua col...",4,"Olá Patricia, \n Concordo com sua colocação ...","[Olá, Patricia, ,, Concordo, com, sua, colocaç...","(Olá, Patricia, ,, Concordo, com, sua, colocaç..."
2,1.081924,7253,ATIVIDADE 1,2.5,<p>O comportamento do profissional dentro da e...,2,O comportamento do profissional dentro da emp...,"[O, comportamento, do, profissional, dentro, d...","(O, comportamento, do, profissional, dentro, d..."
3,1.091905,11815,ATIVIDADE 1,2.5,"<p><span style=""font-family: georgia , palatin...",4,\n Em um primeiro momento o comportamen...,"[Em, um, primeiro, momento, o, comportamento, ...","(Em, um, primeiro, momento, o, comportamento, ..."
4,1.101049,704153,ATIVIDADE 1,1.5,"<p><span style=""font-size: 10.0pt;font-family:...",2,Questões comportamentais estão relacionadas ...,"[Questões, comportamentais, estão, relacionada...","(Questões, comportamentais, estão, relacionada..."


#### Character counts

In [256]:
essay_df['chr_cnt'] = essay_df.apply(lambda r: len(r['text']), axis = 1)

#### Token counts (including stopwords)

In [257]:
essay_df['token_cnt'] = essay_df.apply(lambda r: len(r['tokens']), axis = 1)

#### Token counts (excluding stopwords)

In [258]:
stopwords = nltk.corpus.stopwords.words('portuguese')

In [259]:
essay_df['tokens_fld'] = essay_df.apply(lambda r: [w.lower() for w in r['tokens'] 
                                                   if w not in stopwords and not w.isnumeric() and len(w) > 1], axis = 1)

In [260]:
essay_df['token_cnt_fld'] = essay_df.apply(lambda r: len(r['tokens_fld']), axis = 1) 

#### Number of sentences and number of sentences longer than 250 characters

In [261]:
essay_df['sentences'] = essay_df.apply(lambda r: nltk.sent_tokenize(r['text']), axis = 1)

In [262]:
essay_df['sent_cnt'] = essay_df.apply(lambda r: len(r['sentences']), axis = 1)

In [263]:
essay_df['long_sent_cnt'] = essay_df.apply(lambda r: len([s for s in r['sentences'] if len(s) > 250]), axis = 1)

#### Average length (# of tokens) of sentence

In [264]:
essay_df['avg_sent_len'] = essay_df.apply(lambda r: float(r['token_cnt'] / r['sent_cnt']), axis = 1)

#### Number of words  that appear both in the question and the essay

In [265]:
question_token = set([w.lower() for w in nltk.wordpunct_tokenize(question) 
                      if w not in stopwords and not w.isnumeric() and len(w) > 1])

In [266]:
essay_df['question_tokens'] = essay_df.apply(lambda r: len(set(r['tokens_fld']).intersection(question_token)), axis = 1)

### Binary outcome

In [267]:
essay_df['pass'] = np.where(essay_df['average_score'] >= 2, 1,0)

### Train/test split


In [268]:
from sklearn.model_selection import train_test_split


In [269]:
train_df, test_df, _, _ = train_test_split(essay_df, essay_df['pass'], test_size = 0.3, random_state = 0)

In [270]:
y_train = train_df['pass']
z_train = train_df['average_score']
y_test = test_df['pass']
z_test = test_df['average_score']

### Features from above

In [271]:
features = ['chr_cnt', 'token_cnt', 'token_cnt_fld', 'sent_cnt', 'long_sent_cnt', 'avg_sent_len', 'question_tokens']
X_train_1 = train_df[features].values
X_test_1 = test_df[features].values

In [272]:
X_train_1lst = [X_train_1[:,i] for i in range(X_train_1.shape[1])]
X_test_1lst = [X_test_1[:,i] for  i in range(X_test_1.shape[1])]

### Features from POS Tagging

#### Create a tagger 

In [273]:
from nltk.corpus import floresta
Tagger0 = nltk.DefaultTagger('n')
def simplify_tag(t):
    if "+" in t:
        return t[t.index("+")+1:]
    else:
        return t
tsents = [[(w.lower(),simplify_tag(t)) for (w,t) in sent] for sent in floresta.tagged_sents() if sent]
Tagger1 = nltk.UnigramTagger(tsents, backoff=Tagger0)
Tagger2 = nltk.BigramTagger(tsents, backoff=Tagger1)

#### Add unigram and bigram pos tag features

In [274]:
all_unigram = set()
all_bigram = set()
train_unigrams = []
train_bigrams = []
for sents in train_df.sentences:
    tui = {}
    tbi = {}
    for sent in sents:
        tsent = Tagger2.tag(nltk.word_tokenize(sent))
        for i in range(len(tsent) - 1):
            t0 = tsent[i][1]
            t1 = tsent[i+1][1]
            all_unigram.add(t0)
            all_bigram.add((t0, t1))
            tui[t0] = tui[t0] + 1 if t0 in tui else 1
            tbi[(t0, t1)] = tbi[(t0, t1)] + 1 if (t0, t1) in tbi else 1
        t0 = tsent[len(tsent) - 1][1]
        all_unigram.add(t0)
        tui[t0] = tui[t0] + 1 if t0 in tui else 1
    train_unigrams.append(tui)
    train_bigrams.append(tbi)

Only keep those tag grams appear in at least 5 documents

In [275]:
L = 5
use_cnt = [(x, sum(x in tu for tu in train_unigrams)) for x in all_unigram]
use_unigram = set([x for (x,y) in use_cnt if y >L])
use_cnt = [(x, sum(x in tb for tb in train_bigrams)) for x in all_bigram]
use_bigram = set([x for (x,y) in use_cnt if y >L])


#### pos tag features of test data

In [276]:
test_unigrams = []
test_bigrams = []
for sents in test_df.sentences:
    tui = {}
    tbi = {}
    for sent in sents:
        tsent = Tagger2.tag(nltk.word_tokenize(sent))
        for i in range(len(tsent) - 1):
            t0 = tsent[i][1]
            t1 = tsent[i+1][1]
            if t0 in use_unigram:
                tui[t0] = tui[t0] + 1 if t0 in tui else 1
            if (t0, t1) in use_bigram:
                tbi[(t0, t1)] = tbi[(t0, t1)] + 1 if (t0, t1) in tbi else 1
        t0 = tsent[len(tsent) - 1][1]
        if t0 in use_unigram:
            tui[t0] = tui[t0] + 1 if t0 in tui else 1
    test_unigrams.append(tui)
    test_bigrams.append(tbi)

#### unigram pos tag features

In [277]:
train_unigram0 = [{u: tu[u] if u in tu else 0 for u in use_unigram} for tu in train_unigrams]

train_uni_mat = pd.DataFrame(train_unigram0).values

test_unigram0 = [{u: tu[u] if u in tu else 0 for u in use_unigram} for tu in test_unigrams]

test_uni_mat = pd.DataFrame(test_unigram0).values

In [278]:
X_train_2 = train_uni_mat
X_test_2 = test_uni_mat

#### bigram pos tag features

In [279]:
train_bigram0 = [{u: tu[u] if u in tu else 0 for u in use_bigram} for tu in train_bigrams]

train_bi_mat = pd.DataFrame(train_bigram0).values

test_bigram0 = [{u: tu[u] if u in tu else 0 for u in use_bigram} for tu in test_bigrams]

test_bi_mat = pd.DataFrame(test_bigram0).values

In [280]:
X_train_3 = train_bi_mat
X_test_3 = test_bi_mat

### Create term-document matrix

In [41]:
train_text = np.array(train_df['text'])
test_text = np.array(test_df['text'])

In [42]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

#### Count vector  ignoring tokens with doc-frequency < 5 and excluding stopwords

In [43]:
vect = CountVectorizer(min_df=5, stop_words = stopwords).fit(train_text)

In [44]:
X_train_4 = vect.transform(train_text)
X_test_4 = vect.transform(test_text)

#### Tfidf ignoring tokens with doc-frequency < 5 and excluding stopwords

In [45]:
vect = TfidfVectorizer(min_df=5, stop_words=stopwords).fit(train_text)

In [46]:
X_train_5 = vect.transform(train_text)
X_test_5 = vect.transform(test_text)

#### Feature sets

In [283]:
X_train_123 = np.concatenate((X_train_1, X_train_2, X_train_3), axis = 1)

In [284]:
X_test_123 = np.concatenate((X_test_1, X_test_2, X_test_3), axis = 1)

#### normalize

In [49]:
from sklearn.preprocessing import StandardScaler


In [285]:
scaler = StandardScaler().fit(X_train_1)
X_train_1n = scaler.transform(X_train_1)
X_test_1n = scaler.transform(X_test_1)

In [286]:
scaler = StandardScaler().fit(X_train_123)
X_train_123n = scaler.transform(X_train_123)
X_test_123n = scaler.transform(X_test_123)

In [102]:
def add_feature(X, feature_to_add):
    """
    Returns sparse feature matrix with added feature.
    feature_to_add can also be a list of features.
    """
    from scipy.sparse import csr_matrix, hstack
    return hstack([X, csr_matrix(feature_to_add).T], 'csr')

In [130]:
X_train_1lst = [X_train_1[:,i] for i in range(X_train_1n.shape[1])]
X_test_1lst = [X_test_1[:,i] for  i in range(X_test_1n.shape[1])]

In [131]:
X_train_14 = add_feature(X_train_4, X_train_1lst)
    

In [132]:
X_test_14 = add_feature(X_test_4, X_test_1lst)
X_train_15 = add_feature(X_train_5, X_train_1lst)
X_test_15 = add_feature(X_test_5, X_test_1lst)

### Try some basic models

#### constant model

In [57]:
from sklearn`.metrics import accuracy_score, mean_squared_error, f1_score, confusion_matrix

In [290]:
z_predict_b = np.ones(y_test.shape[0])*2.5
y_predict_b = np.ones(y_test.shape[0])
print accuracy_score(y_test, y_predict_b)
print np.sqrt(mean_squared_error(y_test, y_predict_b))

0.916666666667
0.288675134595


#### Logistic Regression

In [54]:
from sklearn.linear_model import LogisticRegression


In [287]:
model = LogisticRegression(class_weight={1:1.0, 0:3.8}, C=100.0)

#### use feature set 1

In [288]:
def check_result(y_true, y_predict):
    print accuracy_score(y_true, y_predict)
    print f1_score(y_true, y_predict)
    print confusion_matrix(y_true, y_predict)

In [289]:
model.fit(X_train_1n, y_train)
predictions1 = model.predict(X_test_1n)
check_result(y_test, predictions1)


0.906666666667
0.951048951049
[[  0  25]
 [  3 272]]


### use feature set 123

In [291]:
model = LogisticRegression(class_weight={1:1.0, 0:2}, C=1000.0)

In [292]:
model.fit(X_train_123n, y_train)
predictions2 = model.predict(X_test_123n)
check_result(y_test, predictions2)


0.75
0.854932301741
[[  4  21]
 [ 54 221]]


#### use feature set X4

In [293]:
model = LogisticRegression(class_weight={1:1.0, 0:5}, C=1000.0)
model.fit(X_train_4, y_train)
predictions3 = model.predict(X_test_4)
check_result(y_test, predictions3)


ValueError: Found input variables with inconsistent numbers of samples: [1009, 700]

#### use feature set X5

In [100]:
model = LogisticRegression(class_weight={1:1.0, 0:1}, C=1000.0)
model.fit(X_train_5, y_train)
predictions3 = model.predict(X_test_5)
check_result(y_test, predictions3)

0.833718244804
0.908163265306
[[  5  47]
 [ 25 356]]


#### use feature set X14

In [133]:
model = LogisticRegression(class_weight={1:1.0, 0:1}, C=1000.0)
model.fit(X_train_14, y_train)
predictions4 = model.predict(X_test_14)
check_result(y_test, predictions4)

0.817551963048
0.897001303781
[[ 10  42]
 [ 37 344]]


#### use feature set X15

In [134]:
model = LogisticRegression(class_weight={1:1.0, 0:2.5}, C=1000.0)
model.fit(X_train_15, y_train)
predictions5 = model.predict(X_test_15)
check_result(y_test, predictions5)

0.826789838337
0.903969270166
[[  5  47]
 [ 28 353]]


### Random Forest

In [128]:
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

#### use feature set 1

In [294]:
model = RandomForestClassifier(n_estimators=500, class_weight = {1:1, 0:20})
model.fit(X_train_1n, y_train)
predictions6 = model.predict(X_test_1n)
check_result(y_test, predictions6)

0.913333333333
0.954703832753
[[  0  25]
 [  1 274]]


In [295]:
model = RandomForestRegressor(n_estimators=500)
model.fit(X_train_1n, z_train)
pred = model.predict(X_test_1n)
print np.sqrt(mean_squared_error(z_test, pred))

#check_result(y_test, predictions6)

0.336921536452


In [153]:
predictions7 = np.array([1 if x >= 2 else 0 for x in pred])
check_result(y_test, predictions7)

0.849884526559
0.917825537295
[[  5  47]
 [ 18 363]]


#### Use feature set 123

In [158]:
model = RandomForestClassifier(n_estimators=500, class_weight = {1:1, 0:10})
model.fit(X_train_123n, y_train)
predictions8 = model.predict(X_test_123n)
check_result(y_test, predictions8)


0.879907621247
0.936117936118
[[  0  52]
 [  0 381]]


In [159]:
model = RandomForestRegressor(n_estimators=500)
model.fit(X_train_123n, z_train)
pred = model.predict(X_test_123n)
print np.sqrt(mean_squared_error(z_test, pred))
predictions8 = np.array([1 if x >= 2 else 0 for x in pred])
check_result(y_test, predictions8)

0.356999391385
0.879907621247
0.936117936118
[[  0  52]
 [  0 381]]


### use feature 14

In [161]:
model = RandomForestClassifier(n_estimators=500, class_weight = {1:1, 0:10})
model.fit(X_train_14, y_train)
predictions9 = model.predict(X_test_14)
check_result(y_test, predictions9)

0.877598152425
0.934648581998
[[  1  51]
 [  2 379]]


In [162]:
model = RandomForestRegressor(n_estimators=500)
model.fit(X_train_14, z_train)
pred = model.predict(X_test_14)
print np.sqrt(mean_squared_error(z_test, pred))
predictions10 = np.array([1 if x >= 2 else 0 for x in pred])
check_result(y_test, predictions10)

0.365751545531
0.877598152425
0.934809348093
[[  0  52]
 [  1 380]]


#### use feature 15

In [163]:
model = RandomForestClassifier(n_estimators=500, class_weight = {1:1, 0:10})
model.fit(X_train_15, y_train)
predictions11 = model.predict(X_test_15)
check_result(y_test, predictions11)

0.875288683603
0.933497536946
[[  0  52]
 [  2 379]]


In [164]:
model = RandomForestRegressor(n_estimators=500)
model.fit(X_train_15, z_train)
pred = model.predict(X_test_15)
print np.sqrt(mean_squared_error(z_test, pred))
predictions12 = np.array([1 if x >= 2 else 0 for x in pred])
check_result(y_test, predictions12)

0.357219999833
0.863741339492
0.926889714994
[[  0  52]
 [  7 374]]


### GBM

In [165]:
import h2o

In [166]:
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321. connected.


0,1
H2O cluster uptime:,2 mins 33 secs
H2O cluster version:,3.10.5.4
H2O cluster version age:,1 month and 1 day
H2O cluster name:,alin
H2O cluster total nodes:,1
H2O cluster free memory:,3.538 Gb
H2O cluster total cores:,4
H2O cluster allowed cores:,4
H2O cluster status:,"accepting new members, healthy"
H2O connection url:,http://localhost:54321


### Use feature set 1

In [None]:
from h2o.estimators.gbm import H2OGradientBoostingEstimator

In [169]:
y_train0 = y_train.reshape(y_train.shape[0],1)
z_train0 = z_train.reshape(z_train.shape[0],1)

In [218]:
train = np.concatenate((X_train_1n, y_train0), axis = 1)
train_hex = h2o.H2OFrame(train)
gbm = H2OGradientBoostingEstimator()
gbm.train(x = range(train.shape[1]-1), y = train.shape[1]-1, training_frame=train_hex)

test_hex = h2o.H2OFrame(X_test_1n)

pred = gbm.predict(test_hex)

pred1 = np.array([pred[i, 0] for i in range(y_test.shape[0])])

predictions13 = [1 if x > 0.5 else 0 for x in pred1]
check_result(y_test, predictions13)

Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm Model Build progress: |███████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
0.875288683603
0.933497536946
[[  0  52]
 [  2 379]]


In [219]:
train = np.concatenate((X_train_1n, z_train0), axis = 1)
train_hex = h2o.H2OFrame(train)
gbm = H2OGradientBoostingEstimator()
gbm.train(x = range(train.shape[1]-1), y = train.shape[1]-1, training_frame=train_hex)

test_hex = h2o.H2OFrame(X_test_1n)

pred = gbm.predict(test_hex)

pred1 = np.array([pred[i, 0] for i in range(y_test.shape[0])])
print np.sqrt(mean_squared_error(z_test, pred1))
predictions14 = [1 if x >= 2 else 0 for x in pred1]
check_result(y_test, predictions14)

Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm Model Build progress: |███████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
0.363471146243
0.866050808314
0.928217821782
[[  0  52]
 [  6 375]]


#### Use feature 123

In [220]:
train = np.concatenate((X_train_123n, y_train0), axis = 1)
train_hex = h2o.H2OFrame(train)
gbm = H2OGradientBoostingEstimator()
gbm.train(x = range(train.shape[1]-1), y = train.shape[1]-1, training_frame=train_hex)

test_hex = h2o.H2OFrame(X_test_123n)

pred = gbm.predict(test_hex)

pred1 = np.array([pred[i, 0] for i in range(y_test.shape[0])])

predictions15 = [1 if x > 0.5 else 0 for x in pred1]
check_result(y_test, predictions15)

Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm Model Build progress: |███████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
0.879907621247
0.936117936118
[[  0  52]
 [  0 381]]


In [221]:
train = np.concatenate((X_train_123n, z_train0), axis = 1)
train_hex = h2o.H2OFrame(train)
gbm = H2OGradientBoostingEstimator()
gbm.train(x = range(train.shape[1]-1), y = train.shape[1]-1, training_frame=train_hex)

test_hex = h2o.H2OFrame(X_test_123n)

pred = gbm.predict(test_hex)

pred1 = np.array([pred[i, 0] for i in range(y_test.shape[0])])
print np.sqrt(mean_squared_error(z_test, pred1))
predictions16 = [1 if x >= 2 else 0 for x in pred1]
check_result(y_test, predictions16)

Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm Model Build progress: |███████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
0.376590871913
0.854503464203
0.92095357591
[[  3  49]
 [ 14 367]]


### Use feature 4

In [226]:
X_train_4a = X_train_4.toarray()

In [228]:
X_test_4a = X_test_4.toarray()

In [229]:
train = np.concatenate((X_train_4a, y_train0), axis = 1)
train_hex = h2o.H2OFrame(train)
gbm = H2OGradientBoostingEstimator()
gbm.train(x = range(train.shape[1]-1), y = train.shape[1]-1, training_frame=train_hex)

test_hex = h2o.H2OFrame(X_test_4a)

pred = gbm.predict(test_hex)

pred1 = np.array([pred[i, 0] for i in range(y_test.shape[0])])

predictions16 = [1 if x > 0.5 else 0 for x in pred1]
check_result(y_test, predictions16)

Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm Model Build progress: |███████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
0.877598152425
0.934809348093
[[  0  52]
 [  1 380]]
