# Stance detection

After reviewing the latest literature on SemEval-2016 Task 6, I think a good starting point is to formulate the problem as text classification with sentence-pair inputs (keeping it simple!). However, I suggest using pre-trained language models to generate meaningful sentence embeddings, rather than training a model from scratch on the available data.
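
To make the sentence-pair framing concrete, here is a minimal sketch of how a single example could be represented. The `StanceExample` class, the fixed label order and the sample tweet are illustrative assumptions; the `target ||| tweet` layout mirrors the text format this notebook writes for BERT further down.

```python
from typing import NamedTuple

# The three stance labels of SemEval-2016 Task 6 (this ordering is an assumption).
LABELS = ("AGAINST", "FAVOR", "NONE")


class StanceExample(NamedTuple):
    """One (target, tweet) pair with its gold stance label."""
    target: str
    tweet: str
    stance: str

    def as_pair_text(self) -> str:
        # Sentence A is the target, sentence B is the tweet.
        return f"{self.target} ||| {self.tweet}"

    def label_id(self) -> int:
        return LABELS.index(self.stance)


# A made-up example, not taken from the dataset.
example = StanceExample("Atheism", "Faith keeps me going every day.", "AGAINST")
print(example.as_pair_text(), "->", example.label_id())
```

The classifier then only ever sees the pair, so the same tweet can carry different stances towards different targets.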

The language models used are:<br>
1- Google's BERT model [1]. BERT, introduced in "Pre-training of Deep Bidirectional Transformers for Language Understanding" [2], is arguably the best pre-trained language model available, capable of achieving state-of-the-art results in various NLP tasks.

2- Flair embeddings [3,4]. Contextual String Embeddings for Sequence Labeling is currently the state-of-the-art system [5] for the Named Entity Recognition task, and the only system outperforming Google's BERT model in this application.

Both models are expensive to use (especially on my potato laptop); however, using them improves my chances of achieving better results. I also wanted an excuse to play with them :)

The suggested architecture looks like this..



I experiment with three types of embeddings:<br>
1- Flair's Document Pool Embeddings<br>
2- Flair's Document LSTM Embeddings<br>
3- Google's BERT Embeddings<br>

At the end of this code, I suggest further improvements that can help improve the obtained results.

Requirements to run this code:
- python 3.6
- bert
- flair
- pytorch
- tensorflow

[1] https://github.com/google-research/bert<br>
[2] https://arxiv.org/abs/1810.04805<br>
[3] https://github.com/zalandoresearch/flair<br>
[4] https://drive.google.com/file/d/17yVpFA7MmXaQFTe-HDpZuqw9fJlmzg56/view<br>
[5] https://github.com/zalandoresearch/flair#comparison-with-state-of-the-art<br>


## Initialization

In [1]:
import pandas as pd
import csv
import random
from pathlib import Path
import torch
import torch.nn as nn
import numpy as np
import pickle
import json
import time
import gc
import os
import joblib

## Reading/Inspecting the dataset

In [2]:
def _check_dir(_dir):
    # create the directory (and any missing parents) if it does not exist yet
    Path(_dir).mkdir(parents=True, exist_ok=True)

# path to SemEval dataset
dataset_path = 'Dataset/'

# creating a dir for data and embeddings
_check_dir('data')
_check_dir('embeddings')

#=------------------------------------------------=#
## Training data
Training_data = []

with open(dataset_path + 'SemEval2016-Task6-subtaskA-traindata-gold.csv', 'r',  encoding="iso-8859-1") as fin:
    reader = csv.reader(fin, quotechar='"')
    columns = next(reader)
    for line in reader:
        Training_data.append(line)
        
train_df = pd.DataFrame(Training_data, columns=columns)
classes = list( set(train_df['Stance']) )

print('Training data has %d instances' %(len(Training_data,)))
print(train_df['Target'].value_counts(), '\n')

#=------------------------------------------------=#
## Test data
Test_data = []

with open(dataset_path + 'SemEval2016-Task6-subtaskA-testdata-gold.txt', 'r',  encoding="iso-8859-1") as fin:
    reader = csv.reader(fin, delimiter='\t')
    columns = next(reader)
    for line in reader:
        Test_data.append(line)

test_df = pd.DataFrame(Test_data, columns=columns)

print('Test data has %d instances' %(len(Test_data,)))
print(test_df['Target'].value_counts())

Training data has 2914 instances
Hillary Clinton                     689
Feminist Movement                   664
Legalization of Abortion            653
Atheism                             513
Climate Change is a Real Concern    395
Name: Target, dtype: int64 

Test data has 1249 instances
Hillary Clinton                     295
Feminist Movement                   285
Legalization of Abortion            280
Atheism                             220
Climate Change is a Real Concern    169
Name: Target, dtype: int64


In [3]:
##== prepare data for Flair ==##

# creating a dir for data
path_save_data = 'data/Flair'
_check_dir(path_save_data)

# reading training data
Targets = train_df['Target'].values
Tweets = train_df['Tweet'].values
Stances = train_df['Stance'].values
data = [[stance, target, tweet] for stance, target, tweet in zip(Stances, Targets, Tweets)]

random.shuffle(data)    # shuffle before splitting so the validation set is a random sample

# dividing the data into training (90%) and validation (10%)
split_ = int(0.1 * len(data))
TRAIN_DATA, VAL_DATA = data[:9*split_], data[9*split_:]

# reading testing data
Targets = test_df['Target'].values
Tweets = test_df['Tweet'].values
Stances = test_df['Stance'].values
TEST_DATA = [[stance, target, tweet] for stance, target, tweet in zip(Stances, Targets, Tweets)]

# print the amount of data in each
print('  *Training has (',len(TRAIN_DATA),') instances.')
print('  *Validation has (',len(VAL_DATA),') instances.')
print('  *Test has (',len(TEST_DATA),') instances.')

# dump all data for future use
for name, split in zip(['train','val','test'],[TRAIN_DATA, VAL_DATA, TEST_DATA]):
    with open(path_save_data+'/'+name+'.p','wb') as fout:
        pickle.dump(split, fout)


  *Training has ( 2619 ) instances.
  *Validation has ( 295 ) instances.
  *Test has ( 1249 ) instances.
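
Because `random.shuffle` uses the global random state, the split above changes from run to run. Below is a minimal sketch of a repeatable alternative; the helper name `split_train_val` and the seed value are assumptions, and the resulting sizes can differ by a few examples from the `9*split_` arithmetic used in the cell.

```python
import random


def split_train_val(items, val_fraction=0.1, seed=13):
    """Shuffle a copy of `items` with a fixed seed and split off a validation set."""
    shuffled = list(items)                 # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # private RNG, global state is unaffected
    n_val = int(val_fraction * len(shuffled))
    return shuffled[n_val:], shuffled[:n_val]


# Toy rows in the same [stance, target, tweet] order used by the notebook.
toy = [("NONE", "Atheism", f"tweet {i}") for i in range(50)]
train_part, val_part = split_train_val(toy)
print(len(train_part), len(val_part))
```

Calling the function twice with the same seed returns the same split, which makes validation scores comparable across runs.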


In [25]:
##== prepare data for BERT ==##

# creating a dir for data
path_save_data = 'data/BERT'
_check_dir(path_save_data)

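# write one "target ||| tweet" line per example
for name, split in zip(['train','val','test'],[TRAIN_DATA, VAL_DATA, TEST_DATA]):
    # the with-block closes each file, so every line is flushed to disk
    # before extract_features.py reads it
    with open(path_save_data+'/'+name+'.txt','w') as File_:
        for stance, target, tweet in split:
            File_.write(target+' ||| '+tweet+'\n')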
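
`extract_features.py` reads one example per line, so each text file needs exactly as many complete, flushed lines as there are examples. A short file is one plausible reason why the recorded output further down shows 1238 test embeddings for 1249 test examples. The sketch below writes the pairs defensively and checks the line count; `write_pairs` and `count_lines` are hypothetical helpers, the rows are made up, and the UTF-8 encoding is an assumption.

```python
import os
import tempfile


def write_pairs(path, rows):
    """Write one 'target ||| tweet' line per (stance, target, tweet) row."""
    with open(path, "w", encoding="utf-8") as handle:
        for _stance, target, tweet in rows:
            # Collapse whitespace so an embedded newline cannot split an example.
            handle.write(target + " ||| " + " ".join(tweet.split()) + "\n")
    # Leaving the with-block closes the file and flushes any buffered lines.


def count_lines(path):
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


rows = [("FAVOR", "Atheism", "first line\nsecond line"),
        ("NONE", "Hillary Clinton", "a plain tweet")]
out_path = os.path.join(tempfile.mkdtemp(), "toy.txt")
write_pairs(out_path, rows)
print(count_lines(out_path), "lines for", len(rows), "examples")
```

Running the same count on `train.txt`, `val.txt` and `test.txt` before extracting features would catch a mismatch early.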


## Extracting sentence embeddings (Flair)

In [5]:
from flair.embeddings import WordEmbeddings, CharLMEmbeddings, DocumentPoolEmbeddings, DocumentLSTMEmbeddings
from flair.data import Sentence, TaggedCorpus, Token

# initialize the word embeddings
# the -fast embeddings are CPU friendly
glove_embedding = WordEmbeddings('glove')
charlm_embedding_forward = CharLMEmbeddings('news-forward')
charlm_embedding_backward = CharLMEmbeddings('news-backward')

# initialize the document embeddings

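# Embedding(1): pooled token embeddings, 4196 values per sentence
# (sizes implied by the 8392-wide buffer used below: 2 x (100 + 2048 + 2048))
# glove = 100
# charlm_embedding_backward = 2048
# charlm_embedding_forward = 2048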
document_embeddings1 = DocumentPoolEmbeddings([glove_embedding,
                                              charlm_embedding_backward,
                                              charlm_embedding_forward])


# Embedding(2)
# a 128-dimensional vector per sentence, generated by an LSTM
document_embeddings2 = DocumentLSTMEmbeddings([glove_embedding,
                                              charlm_embedding_backward,
                                              charlm_embedding_forward])
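
The feature sizes allocated in the next cell (8392 and 256) follow from these embeddings once the target vector and the tweet vector are concatenated. The arithmetic below is a small check rather than anything reported by Flair: the 2048 values per character language model are inferred by working backwards from the 8392 total, and the constant names are made up for this sketch.

```python
# Per-token sizes; CHARLM_DIM is inferred from the 8392 total, not read from Flair.
GLOVE_DIM = 100
CHARLM_DIM = 2048
LSTM_DOC_DIM = 128   # per-sentence size noted for DocumentLSTMEmbeddings

pool_doc_dim = GLOVE_DIM + 2 * CHARLM_DIM   # pooled vector for one sentence
pool_pair_dim = 2 * pool_doc_dim            # target vector followed by tweet vector
lstm_pair_dim = 2 * LSTM_DOC_DIM            # same concatenation for the LSTM variant

print(pool_doc_dim, pool_pair_dim, lstm_pair_dim)  # 4196 8392 256
```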

In [7]:
path_save_embd = 'embeddings/Flair'
path_save_data = 'data/Flair'

_check_dir(path_save_embd)

try:
    TRAIN_DATA = pickle.load(open(path_save_data+'/train.p','rb'))
    VAL_DATA = pickle.load(open(path_save_data+'/val.p','rb'))
    TEST_DATA = pickle.load(open(path_save_data+'/test.p','rb'))
except (OSError, pickle.UnpicklingError):
    TRAIN_DATA, VAL_DATA, TEST_DATA = [],[],[]
    print('Please check your directories..')
    
def _get_embeddings(length, data):
    Y = torch.zeros([length,3])
    X1 = torch.zeros([length,8392]) # Pool Embeddings
    X2 = torch.zeros([length,256])  # LSTM Embeddings
    
    X_target = {} # cache the target embeddings to avoid recalculating them each time

    for counter, example in enumerate(data[:length]):
        stance, target, tweet = example
        if np.mod(counter,100)==0:
            print('  -processed:%d examples' %(counter))

        Y[counter,classes.index(stance)] = 1

        # embed each target once with both document embeddings and cache the result
        if target not in X_target:
            sentence1_1 = Sentence(target)
            sentence1_2 = Sentence(target)
            
            document_embeddings1.embed(sentence1_1)
            document_embeddings2.embed(sentence1_2)
            
            embd_T1 = sentence1_1.get_embedding()[0]
            embd_T2 = sentence1_2.get_embedding()[0]
            
            X_target[target] = [embd_T1, embd_T2]
        else:
            embd_T1, embd_T2 = X_target[target]
        
        
        # embed the tweet with both document embeddings
        sentence2_1 = Sentence(tweet)
        sentence2_2 = Sentence(tweet)
        
        document_embeddings1.embed(sentence2_1)
        document_embeddings2.embed(sentence2_2)
        
        embd1 = sentence2_1.get_embedding()[0]
        embd2 = sentence2_2.get_embedding()[0]
        
        X1[counter,:] = torch.cat((embd_T1, embd1), 0).data
        X2[counter,:] = torch.cat((embd_T2, embd2), 0).data
    return [X1,X2,Y]

TRAIN_EMBD = _get_embeddings(len(TRAIN_DATA), TRAIN_DATA)
VAL_EMBD = _get_embeddings(len(VAL_DATA), VAL_DATA)
TEST_EMBD = _get_embeddings(len(TEST_DATA), TEST_DATA)

pickle.dump(TRAIN_EMBD, open(path_save_embd+'/train_embd.p', 'wb'))
pickle.dump(VAL_EMBD, open(path_save_embd+'/val_embd.p', 'wb'))
pickle.dump(TEST_EMBD, open(path_save_embd+'/test_embd.p', 'wb'))


  -processed:0 examples
  -processed:100 examples
  -processed:200 examples
  -processed:300 examples
  -processed:400 examples
  -processed:500 examples
  -processed:600 examples
  -processed:700 examples
  -processed:800 examples
  -processed:900 examples
  -processed:1000 examples
  -processed:1100 examples
  -processed:1200 examples
  -processed:1300 examples
  -processed:1400 examples
  -processed:1500 examples
  -processed:1600 examples
  -processed:1700 examples
  -processed:1800 examples
  -processed:1900 examples
  -processed:2000 examples
  -processed:2100 examples
  -processed:2200 examples
  -processed:2300 examples
  -processed:2400 examples
  -processed:2500 examples
  -processed:2600 examples
  -processed:0 examples
  -processed:100 examples
  -processed:200 examples
  -processed:0 examples
  -processed:100 examples
  -processed:200 examples
  -processed:300 examples
  -processed:400 examples
  -processed:500 examples
  -processed:600 examples
  -processed:700 examples
 

## Extracting Sentence Embeddings (BERT)

In [30]:
# here you need to change your two directories for this code to work
# path_bert_repo = the location of the bert repository, clone it from https://github.com/google-research/bert
# path_bert_model = the location of the bert model, download it from https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
path_save_embd = 'embeddings/BERT'
path_save_data = 'data/BERT'
path_bert_model = '/home/mo/NLP/BERT/uncased_L-12_H-768_A-12'
path_bert_repo = '/home/mo/Python/bert'

# creating the directory to store BERT's embeddings
_check_dir(path_save_embd)

# file_ = open(path_save_data+'/test.txt','r')
# for i,line in enumerate(file_):
#     if np.mod(i,80)==0 and i != 0:
#         f.close()
#         f = open(path_save_data+'/test_'+str(i)+'.txt','w')
        
#         out_file = path_save_embd+'/test_'+str(i-1)+'.jsonl'
#         in_file = path_save_data+'/test_'+str(i-1)+'.txt'
#         !python $path_bert_repo/extract_features.py --input_file=$in_file --output_file=$out_file -vocab_file=$path_bert_model/vocab.txt --bert_config_file=$path_bert_model/bert_config.json --init_checkpoint=$path_bert_model/bert_model.ckpt --layers=-1 --max_seq_length=128 --batch_size=8
#     if i == 0:
#         f = open(path_save_data+'/test_'+str(i)+'.txt','w')
    
#     f.write(line)
# These commands will create the embeddings for train, validation and test
!python $path_bert_repo/extract_features.py --input_file=$path_save_data/train.txt --output_file=$path_save_embd/train.jsonl -vocab_file=$path_bert_model/vocab.txt --bert_config_file=$path_bert_model/bert_config.json --init_checkpoint=$path_bert_model/bert_model.ckpt --layers=-1 --max_seq_length=200 --batch_size=8
!python $path_bert_repo/extract_features.py --input_file=$path_save_data/val.txt --output_file=$path_save_embd/val.jsonl -vocab_file=$path_bert_model/vocab.txt --bert_config_file=$path_bert_model/bert_config.json --init_checkpoint=$path_bert_model/bert_model.ckpt --layers=-1 --max_seq_length=200 --batch_size=8
!python $path_bert_repo/extract_features.py --input_file=$path_save_data/test.txt --output_file=$path_save_embd/test.jsonl -vocab_file=$path_bert_model/vocab.txt --bert_config_file=$path_bert_model/bert_config.json --init_checkpoint=$path_bert_model/bert_model.ckpt --layers=-1 --max_seq_length=200 --batch_size=8


  from ._conv import register_converters as _register_converters
INFO:tensorflow:*** Example ***
INFO:tensorflow:unique_id: 0
INFO:tensorflow:tokens: [CLS] hillary clinton [SEP] another # hillary supporter committed to caucus tonight ! one more step on the way to winning the iowa caucus . # se ##ms ##t [SEP]
INFO:tensorflow:input_ids: 101 18520 7207 102 2178 1001 18520 10129 5462 2000 13965 3892 999 2028 2062 3357 2006 1996 2126 2000 3045 1996 5947 13965 1012 1001 7367 5244 2102 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpre7ivr2d', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fe02b353080>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_

INFO:tensorflow:Graph was finalized.
2018-12-03 16:30:29.322579: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:prediction_loop marked as finished
INFO:tensorflow:prediction_loop marked as finished
  from ._conv import register_converters as _register_converters
INFO:tensorflow:*** Example ***
INFO:tensorflow:unique_id: 0
INFO:tensorflow:tokens: [CLS] hillary clinton [SEP] @ lb ##ush ##34 @ hillary ##cl ##inton just defending my girl hillary ! # se ##ms ##t [SEP]
INFO:tensorflow:input_ids: 101 18520 7207 102 1030 6053 20668 22022 1030 18520 20464 27028 2074 6984 2026 2611 18520 999 1001 7367 5244 2102 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp2_63s091', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f110e833470>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_

INFO:tensorflow:Graph was finalized.
2018-12-03 16:34:24.171078: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:prediction_loop marked as finished
INFO:tensorflow:prediction_loop marked as finished
  from ._conv import register_converters as _register_converters
INFO:tensorflow:*** Example ***
INFO:tensorflow:unique_id: 0
INFO:tensorflow:tokens: [CLS] at ##hei ##sm [SEP] he who ex ##al ##ts himself shall be humble ##d ; and he who humble ##s himself shall be ex ##al ##ted . matt 23 : 12 . # se ##ms ##t [SEP]
INFO:tensorflow:input_ids: 101 2012 26036 6491 102 2002 2040 4654 2389 3215 2370 4618 2022 15716 2094 1025 1998 2002 2040 15716 2015 2370 4618 2022 4654 2389 3064 1012 4717 2603 1024 2260 1012 1001 7367 5244 2102 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpo_9v6exu', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3f15657978>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_

INFO:tensorflow:  name = bert/encoder/layer_5/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_5/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_6/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorf

INFO:tensorflow:Graph was finalized.
2018-12-03 16:35:00.073445: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:prediction_loop marked as finished
INFO:tensorflow:prediction_loop marked as finished
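
The three shell commands above differ only in the split name, so they can also be assembled programmatically. The sketch below only builds the argument lists, using the flags that appear in the commands above; `build_extract_command` is a hypothetical helper, and the repository and model paths are placeholders to be replaced with your own.

```python
from pathlib import Path


def build_extract_command(split, repo, model, data_dir, embd_dir, max_seq_length=200):
    """Return the argument list for one run of BERT's extract_features.py."""
    repo, model = Path(repo), Path(model)
    return [
        "python", str(repo / "extract_features.py"),
        f"--input_file={Path(data_dir) / (split + '.txt')}",
        f"--output_file={Path(embd_dir) / (split + '.jsonl')}",
        f"--vocab_file={model / 'vocab.txt'}",
        f"--bert_config_file={model / 'bert_config.json'}",
        f"--init_checkpoint={model / 'bert_model.ckpt'}",
        "--layers=-1",
        f"--max_seq_length={max_seq_length}",
        "--batch_size=8",
    ]


for split in ("train", "val", "test"):
    # "bert" and "uncased_L-12_H-768_A-12" are placeholder locations.
    cmd = build_extract_command(split, "bert", "uncased_L-12_H-768_A-12",
                                "data/BERT", "embeddings/BERT")
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run(cmd, check=True)` would raise an error if a run fails, instead of silently leaving a partial `.jsonl` file.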


In [32]:
path_save_embd = 'embeddings/BERT'

def sent_vectorizer(sent):
    # sum the token vectors, then scale the sum to unit L2 norm
    sent_vec = np.zeros(768)
    for w in sent:
        sent_vec = np.add(sent_vec, w)
    return sent_vec / np.sqrt(sent_vec.dot(sent_vec))

def _create_embedding(name):
    obj_ = []

    with open(path_save_embd+'/'+name+'.jsonl','r') as f_handle:
        for line in f_handle:
            A = json.loads(line)

            # extracting the embeddings token by token
            token_vectors = [feature['layers'][0]['values'] for feature in A['features']]

            # converting token embeddings into a sentence embedding
            obj_.append(sent_vectorizer(token_vectors))

    # convert embeddings to a tensor
    X1 = torch.zeros([len(obj_),768]) # BERT sentence embeddings
    for counter, val in enumerate(obj_):
        X1[counter,:] = torch.tensor(val)
    return X1
    
BERT_TRAIN_embd  = _create_embedding('train')
BERT_TEST_embd  = _create_embedding('test')
BERT_VAL_embd  = _create_embedding('val')

print(len(BERT_TEST_embd))

1238
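
The pooling step can be illustrated without NumPy or the real output files. The sketch below applies the same idea (sum the token vectors of one record, then scale to unit length) to a made-up JSONL record with 3-dimensional vectors instead of 768; `pool_record` is a hypothetical name, and only the `features`, `layers` and `values` keys are taken from the code above.

```python
import json
import math


def pool_record(json_line):
    """Sum the per-token vectors of one JSONL record and L2-normalise the result."""
    record = json.loads(json_line)
    token_vectors = [feat["layers"][0]["values"] for feat in record["features"]]
    summed = [sum(column) for column in zip(*token_vectors)]
    norm = math.sqrt(sum(v * v for v in summed))
    return [v / norm for v in summed] if norm > 0 else summed


# A made-up record with two tokens; real records hold 768 values per token.
toy_line = json.dumps({"features": [
    {"token": "[CLS]", "layers": [{"index": -1, "values": [1.0, 0.0, 2.0]}]},
    {"token": "[SEP]", "layers": [{"index": -1, "values": [2.0, 0.0, 2.0]}]},
]})
vector = pool_record(toy_line)
print(vector)
```

Since the number of pooled rows must equal the number of labels, comparing `len(BERT_TEST_embd)` with `len(TEST_DATA)` at this point is a cheap sanity check.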


## Training NN models with the extracted embeddings

In [43]:
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

# This function is used to create new models
# I create a different model for every embedding 
def _create_model(n_in, n_h1, n_h2, n_h3, n_out):
    model = nn.Sequential(nn.Linear(n_in, n_h1),
                         nn.ReLU(),
                         nn.Linear(n_h1, n_h2),
                         nn.ReLU(),
                         nn.Linear(n_h2, n_h3),
                         nn.ReLU(),
                         nn.Linear(n_h3, n_out),
                         nn.Softmax(dim=1))  # normalise over the class dimension
    return model

# This function is used to train a given model
def train(model, data, criterion, optimizer, epoch, epochs):
    # measure time
    start = time.time()
    
    # extract training data
    x,y = data
    
    # switch to train mode
    model.train()
    
    # Forward Propagation
    y_pred = model(x)

    # Compute the training loss
    loss = criterion(y_pred, y)
    
    # Zero the gradients
    optimizer.zero_grad()
    
    # perform a backward pass (backpropagation)
    loss.backward()
    
    # Update the parameters
    optimizer.step()
    
    print('Train Epoch: [%d/%d] Losses: [%.6f] Time: %.3f sec.' %(epoch, epochs, loss.item(), time.time() - start))

    # clear memory
    gc.collect()

# Test models given validation or test data
def test(model, data, criterion, epoch, epochs, flag):
    # measure time
    start = time.time()

    # extract validation or test data
    x,y = data
    
    # switch to evaluate mode
    model.eval()
    
    # Forward Propagation
    y_pred = model(x)

    # Compute the validation loss
    loss = criterion(y_pred, y)
    
    # Compute the other measures from the arg-max class of each row
    y_true = [int(torch.max(i, 0)[1].item()) for i in y]
    y_pred = [int(torch.max(i, 0)[1].item()) for i in y_pred]
    
    # with micro averaging on single-label data, P, R and F1 all equal the accuracy
    P = precision_score(y_true, y_pred, average='micro') 
    R = recall_score(y_true, y_pred, average='micro')
    A = accuracy_score(y_true, y_pred)
    F1 = f1_score(y_true, y_pred, average='micro')
    T = time.time() - start
    
    # print the validation updated measures
    print('Validation_: [%d/%d] Losses: [%.3f] Precision: [%.3f]'
          ' Recall: [%.3f] Accuracy [%.3f] f1-score: [%.3f] Time'
          ': %.2f sec.' %(epoch, epochs, loss.item(), P, R, A, F1, T))

    # this is just to make it clear when we print
    if flag:
        print('  =------=  ')

    # clear memory
    gc.collect()
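
With one label per tweet, micro-averaged precision, recall and F1 are all equal to the accuracy, which is why `test` prints four identical numbers. SemEval-2016 Task 6 instead ranks systems by the average of the F1 scores for FAVOR and AGAINST. The sketch below computes both quantities in plain Python; the function names and the toy predictions are assumptions made for illustration.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score of a single class, treating it as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def favor_against_macro_f1(y_true, y_pred):
    """Average of the FAVOR and AGAINST F1 scores (NONE is not averaged in)."""
    return (f1_for_label(y_true, y_pred, "FAVOR")
            + f1_for_label(y_true, y_pred, "AGAINST")) / 2


# Made-up predictions, only to exercise the functions.
gold = ["FAVOR", "AGAINST", "AGAINST", "NONE", "FAVOR", "AGAINST"]
pred = ["FAVOR", "AGAINST", "NONE", "FAVOR", "AGAINST", "AGAINST"]
print(accuracy(gold, pred), favor_against_macro_f1(gold, pred))
```

Reporting the FAVOR/AGAINST average alongside accuracy would make the scores easier to compare with published results on this dataset.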

### Training the Flair and BERT models

In [None]:
path_save_embd = 'embeddings/Flair'

TRAIN_EMBD = pickle.load(open(path_save_embd+'/train_embd.p', 'rb'))
VAL_EMBD = pickle.load(open(path_save_embd+'/val_embd.p', 'rb'))
TEST_EMBD = pickle.load(open(path_save_embd+'/test_embd.p', 'rb'))

#---------------------------------------------------------#
# Creating three models;
#   1- model1 for Flair's DocumentPoolEmbeddings
#   2- model2 for Flair's DocumentLSTMEmbeddings
#   3- model3 for Google's BERT embeddings

m1_size = len(TRAIN_EMBD[0][0]) # input size for model 1 embeddings
m2_size = len(TRAIN_EMBD[1][0]) # input size for model 2 embeddings
m3_size = len(BERT_TRAIN_embd[0]) # BERT's embedding vector length
out_size = len(TRAIN_EMBD[2][0])

# hidden layer sizes (np.int has been removed from recent NumPy releases, so use int)
m11, m12, m13 = int(m1_size/64), int(m1_size/128), int(m1_size/256) 
m21, m22, m23 = int(m2_size/8), int(m2_size/16), int(m2_size/32) 
m31, m32, m33 = int(m3_size/8), int(m3_size/16), int(m3_size/32) 

model1 = _create_model(m1_size, m11, m12, m13, out_size)
model2 = _create_model(m2_size, m21, m22, m23, out_size)
model3 = _create_model(m3_size, m31, m32, m33, out_size)
            
print(model1)
print(model2)
print(model3)

path_save_model = 'models'
_check_dir(path_save_model)

# hyper parameters
epochs = 70
learning_rate = 1e-2

# criterion = nn.MultiLabelMarginLoss()
criterion = nn.MSELoss(reduction='sum')

# defining the optimizers, using Adam.
optimizer1 = torch.optim.Adam(model1.parameters(), lr=learning_rate)
optimizer2 = torch.optim.Adam(model2.parameters(), lr=learning_rate)
optimizer3 = torch.optim.Adam(model3.parameters(), lr=learning_rate)

# preparing training data for all models
data1_trn = [TRAIN_EMBD[0],TRAIN_EMBD[2]] # x,y, where x is the pool embedding
data2_trn = [TRAIN_EMBD[1],TRAIN_EMBD[2]] # x,y, where x is the LSTM embedding
data3_trn = [BERT_TRAIN_embd,TRAIN_EMBD[2]] # x,y, where x is the BERT embedding

# preparing validation data for all models
data1_val = [VAL_EMBD[0],VAL_EMBD[2]] # x,y, where x is the pool embedding
data2_val = [VAL_EMBD[1],VAL_EMBD[2]] # x,y, where x is the LSTM embedding
data3_val = [BERT_VAL_embd,VAL_EMBD[2]] # x,y, where x is the BERT embedding

# training and validation for model 1
for epoch in range(1, epochs+1):
    train(model1, data1_trn, criterion, optimizer1, epoch, epochs)
    test(model1, data1_val, criterion, epoch, epochs, 1)

# save current model
name_model = 'flair_1.pkl'
joblib.dump(model1.float(), path_save_model+'/'+name_model, compress=2)

# training and validation for model 2
for epoch in range(1, epochs+1):
    train(model2, data2_trn, criterion, optimizer2, epoch, epochs)
    test(model2, data2_val, criterion, epoch, epochs, 1)

# save current model
name_model = 'flair_2.pkl'
joblib.dump(model2.float(), path_save_model+'/'+name_model, compress=2)

# training and validation for model 3
for epoch in range(1, epochs+1):
    train(model3, data3_trn, criterion, optimizer3, epoch, epochs)
    test(model3, data3_val, criterion, epoch, epochs, 1)

# save current model
name_model = 'bert.pkl'
joblib.dump(model3.float(), path_save_model+'/'+name_model, compress=2)


Sequential(
  (0): Linear(in_features=8392, out_features=131, bias=True)
  (1): ReLU()
  (2): Linear(in_features=131, out_features=65, bias=True)
  (3): ReLU()
  (4): Linear(in_features=65, out_features=32, bias=True)
  (5): ReLU()
  (6): Linear(in_features=32, out_features=3, bias=True)
  (7): Softmax()
)
Sequential(
  (0): Linear(in_features=256, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=16, bias=True)
  (3): ReLU()
  (4): Linear(in_features=16, out_features=8, bias=True)
  (5): ReLU()
  (6): Linear(in_features=8, out_features=3, bias=True)
  (7): Softmax()
)
Sequential(
  (0): Linear(in_features=768, out_features=96, bias=True)
  (1): ReLU()
  (2): Linear(in_features=96, out_features=48, bias=True)
  (3): ReLU()
  (4): Linear(in_features=48, out_features=24, bias=True)
  (5): ReLU()
  (6): Linear(in_features=24, out_features=3, bias=True)
  (7): Softmax()
)
Train Epoch: [1/70] Losses: [1747.750610] Time: 0.083 sec.




Validation_: [1/70] Losses: [189.204] Precision: [0.478] Recall: [0.478] Accuracy [0.478] f1-score: [0.478] Time: 0.01 sec.
  =------=  
Train Epoch: [2/70] Losses: [1679.139404] Time: 0.040 sec.
Validation_: [2/70] Losses: [198.792] Precision: [0.478] Recall: [0.478] Accuracy [0.478] f1-score: [0.478] Time: 0.01 sec.
  =------=  
Train Epoch: [3/70] Losses: [1742.303101] Time: 0.026 sec.
Validation_: [3/70] Losses: [185.297] Precision: [0.478] Recall: [0.478] Accuracy [0.478] f1-score: [0.478] Time: 0.01 sec.
  =------=  
Train Epoch: [4/70] Losses: [1627.215820] Time: 0.019 sec.
Validation_: [4/70] Losses: [186.896] Precision: [0.478] Recall: [0.478] Accuracy [0.478] f1-score: [0.478] Time: 0.01 sec.
  =------=  
Train Epoch: [5/70] Losses: [1649.935547] Time: 0.023 sec.
Validation_: [5/70] Losses: [184.840] Precision: [0.478] Recall: [0.478] Accuracy [0.478] f1-score: [0.478] Time: 0.01 sec.
  =------=  
Train Epoch: [6/70] Losses: [1629.765625] Time: 0.021 sec.
Validation_: [6/70] 

Validation_: [43/70] Losses: [156.847] Precision: [0.610] Recall: [0.610] Accuracy [0.610] f1-score: [0.610] Time: 0.01 sec.
  =------=  
Train Epoch: [44/70] Losses: [1087.845337] Time: 0.022 sec.
Validation_: [44/70] Losses: [161.705] Precision: [0.569] Recall: [0.569] Accuracy [0.569] f1-score: [0.569] Time: 0.01 sec.
  =------=  
Train Epoch: [45/70] Losses: [1110.492920] Time: 0.025 sec.
Validation_: [45/70] Losses: [158.971] Precision: [0.617] Recall: [0.617] Accuracy [0.617] f1-score: [0.617] Time: 0.01 sec.
  =------=  
Train Epoch: [46/70] Losses: [1119.127319] Time: 0.023 sec.
Validation_: [46/70] Losses: [153.562] Precision: [0.597] Recall: [0.597] Accuracy [0.597] f1-score: [0.597] Time: 0.01 sec.
  =------=  
Train Epoch: [47/70] Losses: [1016.655640] Time: 0.029 sec.
Validation_: [47/70] Losses: [153.731] Precision: [0.614] Recall: [0.614] Accuracy [0.614] f1-score: [0.614] Time: 0.01 sec.
  =------=  
Train Epoch: [48/70] Losses: [992.518921] Time: 0.086 sec.
Validation_

## Evaluating the models

In [48]:

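data1_tst = [TEST_EMBD[0],TEST_EMBD[2]] # x,y, where x is the pool embedding
data2_tst = [TEST_EMBD[1],TEST_EMBD[2]] # x,y, where x is the LSTM embedding
# BERT produced 1238 test embeddings for 1249 test examples, so the labels are cut to
# match; this assumes the missing examples are the last ones in the file
data3_tst = [BERT_TEST_embd,TEST_EMBD[2][:len(BERT_TEST_embd)]] # x,y, where x is the BERT embedding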

print('model-1 Flair\'s DocumentPoolEmbeddings\n')
test(model1, data1_tst, criterion, 1, 1, 1)

print('model-2 Flair\'s DocumentLSTMEmbeddings\n')
test(model2, data2_tst, criterion, 1, 1, 1)

print('model-3 Google\'s BERT embeddings\n')
test(model3, data3_tst, criterion, 1, 1, 1)


model-1 Flair's DocumentPoolEmbeddings

Validation_: [1/1] Losses: [668.486] Precision: [0.588] Recall: [0.588] Accuracy [0.588] f1-score: [0.588] Time: 0.04 sec.
  =------=  
model-2 Flair's DocumentLSTMEmbeddings

Validation_: [1/1] Losses: [1031.414] Precision: [0.406] Recall: [0.406] Accuracy [0.406] f1-score: [0.406] Time: 0.02 sec.
  =------=  




model-3 Google's BERT embeddings

Validation_: [1/1] Losses: [585.097] Precision: [0.646] Recall: [0.646] Accuracy [0.646] f1-score: [0.646] Time: 0.02 sec.
  =------=  


## Results

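The final scores on the test set, taken from the evaluation output above (micro-averaged, so all three measures coincide), are as follows:

| measure | Flair's LSTM | Flair's Pool | BERT |
|------|------|------|------|
|  precision  | 0.406 | 0.588 | 0.646 |
|  recall  | 0.406 | 0.588 | 0.646 |
|  f1-score  | 0.406 | 0.588 | 0.646 |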

## Discussion

The results obtained are comparable with the state-of-the-art results presented in Sun et al. (2018). 