# Stance detection

After reviewing the latest literature on the SemEval2016, I think it's a good starting point to formulate the problem into a text classification with sentence-pair inputs (keeping it simple!). However, I suggest using pre-trained language models to generate meaningful sentence embeddings, rather than training the model from scratch on the available data. 

The language models used are:<br>
1- Google's BERT model [1]. Bidirectional Transformers for Language Understanding [2] is arguably the best pre-trained language model available; capable of achieving state-of-the-art results in various NLP tasks. 

2- Flair embeddings [3,4]. Contextual String Embeddings for Sequence Labeling is currently the state-of-the-art [4] system in Named Entity Recognition task, and the only system outperforming Google's BERT model in this application. 

Both models are expensive to use (especially on my potato laptop), however, using them improve my chances of achieving better results. I also wanted an excuse to play with them :)

The suggested architecture looks like this..



I experiment with three types of embeddings;<br>
1- Flair's Document Pool Embeddings<br>
2- Flair's Document LSTM Embeddings<br>
3- Google's BERT Embeddings<br>

At the end of this code, I suggest further improvements that can help improve the obtained results.

Requirements to run this code:
- python 3.6
- bert
- flair

[1] https://github.com/google-research/bert<br>
[2] https://arxiv.org/abs/1810.04805<br>
[3] https://github.com/zalandoresearch/flair<br>
[4] https://drive.google.com/file/d/17yVpFA7MmXaQFTe-HDpZuqw9fJlmzg56/view<br>
[5] https://github.com/zalandoresearch/flair#comparison-with-state-of-the-art<br>


## Initlaization

In [1]:
import pandas as pd
import csv
import random
from pathlib import Path
import torch
import torch.nn as nn
import numpy as np
import pickle
import time
import gc

## Reading/Inspecting the dataset

In [2]:
dataset_path = 'Dataset/'

#=------------------------------------------------=#
## Training data
Training_data = []

with open(dataset_path + 'SemEval2016-Task6-subtaskA-traindata-gold.csv', 'r',  encoding="iso-8859-1") as fin:
    reader = csv.reader(fin, quotechar='"')
    columns = next(reader)
    for line in reader:
        Training_data.append(line)
        
train_df = pd.DataFrame(Training_data, columns=columns)
classes = list( set(train_df['Stance']) )

print('Training data has %d instances' %(len(Training_data,)))
print(train_df['Target'].value_counts(), '\n')

#=------------------------------------------------=#
## Test data
Test_data = []

with open(dataset_path + 'SemEval2016-Task6-subtaskA-testdata-gold.txt', 'r',  encoding="iso-8859-1") as fin:
    reader = csv.reader(fin, delimiter='\t')
    columns = next(reader)
    for line in reader:
        Test_data.append(line)

test_df = pd.DataFrame(Test_data, columns=columns)

print('Test data has %d instances' %(len(Test_data,)))
print(test_df['Target'].value_counts())

Training data has 2914 instances
Hillary Clinton                     689
Feminist Movement                   664
Legalization of Abortion            653
Atheism                             513
Climate Change is a Real Concern    395
Name: Target, dtype: int64 

Test data has 1249 instances
Hillary Clinton                     295
Feminist Movement                   285
Legalization of Abortion            280
Atheism                             220
Climate Change is a Real Concern    169
Name: Target, dtype: int64


In [22]:
def _check_dir(_dir):
    output_dir = Path(_dir)
    if not output_dir.exists():
        output_dir.mkdir()

flair_dir = 'Flair/'
_check_dir(flair_dir)

path_save_data = 'Flair/data'
_check_dir(path_save_data)

Targets = train_df['Target'].values
Tweets = train_df['Tweet'].values
Stances = train_df['Stance'].values

data = [[stance, target, tweet] for stance, target, tweet in zip(Stances, Targets, Tweets)]

random.shuffle(data)    # shuffling the data is always good to preven overfitting

# dividing the data into trainig (90%), validation (10%).
split_ = int(0.1 * len(data))
TRAIN_DATA, VAL_DATA = data[:9*split_], data[9*split_:]
print('  *Training has (',len(TRAIN_DATA),') instances.')
print('  *Validation has (',len(VAL_DATA),') instances.')

Targets = test_df['Target'].values
Tweets = test_df['Tweet'].values
Stances = test_df['Stance'].values

TEST_DATA = [[stance, target, tweet] for stance, target, tweet in zip(Stances, Targets, Tweets)]

print('  *Test has (',len(TEST_DATA),') instances.')


for name, data in zip(['train','val','test'],[TRAIN_DATA, VAL_DATA, TEST_DATA]):
    pickle.dump(data, open(path_save_data+'/'+name+'.p','wb'))


  *Training has ( 2619 ) instances.
  *Validation has ( 295 ) instances.
  *Test has ( 1249 ) instances.


## Extracting sentence embeddings

In [4]:
from flair.embeddings import WordEmbeddings, CharLMEmbeddings, DocumentPoolEmbeddings, DocumentLSTMEmbeddings
from flair.data import Sentence, TaggedCorpus, Token

# initialize the word embeddings
# the -fast embeddings are CPU friendly
glove_embedding = WordEmbeddings('glove')
charlm_embedding_forward = CharLMEmbeddings('news-forward-fast')
charlm_embedding_backward = CharLMEmbeddings('news-backward-fast')

# initialize the document embeddings

# Embedding(1)
# glove = 100
# charlm_embedding_backward = 1024
# charlm_embedding_forward = 1024
document_embeddings1 = DocumentPoolEmbeddings([glove_embedding,
                                              charlm_embedding_backward,
                                              charlm_embedding_forward])


# Embedding(2)
# a total of 128 vector generated by an LSTM
document_embeddings2 = DocumentLSTMEmbeddings([glove_embedding,
                                              charlm_embedding_backward,
                                              charlm_embedding_forward])

In [17]:
path_save_embd = 'Flair/embeddings'
_check_dir(path_save_embd)

TRAIN_DATA = pickle.load(open(path_save_data+'/train.p','rb'))
VAL_DATA = pickle.load(open(path_save_data+'/val.p','rb'))
TEST_DATA = pickle.load(open(path_save_data+'/test.p','rb'))
    
def _get_embeddings(length, data):
    Y = torch.zeros([length,3])
    X1 = torch.zeros([length,2148*2]) # Pool Embeddings
    X2 = torch.zeros([length,256])  # LSTM Embeddings
    
    X_target = {} ## store the target embeddings to prevent recalucalting them each time

    for counter, data in enumerate(data[:length]):
        stance, target, tweet = data
        if np.mod(counter,100)==0:
            print('  -processed:%d examples' %(counter))

        Y[counter,classes.index(stance)] = 1

        # create an example sentence
        if target not in X_target:
            sentence1_1 = Sentence(target)
            sentence1_2 = Sentence(target)
            
            document_embeddings1.embed(sentence1_1)
            document_embeddings2.embed(sentence1_2)
            
            embd_T1 = sentence1_1.get_embedding()[0]
            embd_T2 = sentence1_2.get_embedding()[0]
            
            X_target[target] = [embd_T1, embd_T2]
        else:
            embd_T1, embd_T2 = X_target[target]
        
        
        # create an example sentence
        # embed the sentence with our document embedding
        sentence2_1 = Sentence(tweet)
        sentence2_2 = Sentence(tweet)
        
        document_embeddings1.embed(sentence2_1)
        document_embeddings2.embed(sentence2_2)
        
        embd1 = sentence2_1.get_embedding()[0]
        embd2 = sentence2_2.get_embedding()[0]
        
        X1[counter,:] = torch.cat((embd_T1, embd1), 0).data
        X2[counter,:] = torch.cat((embd_T2, embd2), 0).data
    return [X1,X2,Y]

TRAIN_EMBD = _get_embeddings(len(TRAIN_DATA), TRAIN_DATA)
VAL_EMBD = _get_embeddings(len(VAL_DATA), VAL_DATA)
TEST_EMBD = _get_embeddings(len(TEST_DATA), TEST_DATA)

pickle.dump(TRAIN_EMBD, open(path_save_embd+'/train_embd.p', 'wb'))
pickle.dump(VAL_EMBD, open(path_save_embd+'/val_embd.p', 'wb'))
pickle.dump(TEST_EMBD, open(path_save_embd+'/test_embd.p', 'wb'))


  -processed:0 examples
  -processed:0 examples
  -processed:0 examples
tensor([[ 2.7323e-01,  6.0706e-01,  1.7386e-01,  ..., -2.0105e-08,
          8.3940e-04,  1.6731e-02],
        [ 2.7323e-01,  6.0706e-01,  1.7386e-01,  ..., -3.9551e-08,
         -3.6231e-05,  2.1956e-02]])


## Training NN models with the extracted embeddings

In [168]:
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

# This function is used to create new models
# I create a new model for every embedding 
def _create_model(n_in, n_h1, n_h2, n_h3, n_out, lr, wd):
    model = nn.Sequential(nn.Linear(n_in, n_h1),
                         nn.ReLU(),
                         nn.Linear(n_h1, n_h2),
                         nn.ReLU(),
                         nn.Linear(n_h2, n_h3),
                         nn.ReLU(),
                         nn.Linear(n_h3, n_out),
                         nn.Softmax())
    return model

# This function is used to train a given model
def train(model, data, criterion, optimizer, epoch, epochs):
    # measure time
    start = time.time()
    
    # extract training data
    x,y = data
    
    # switch to train mode
    model.train()
    
    # Forward Propagation
    y_pred = model(x)

    # Compute and print training loss
    loss = criterion(y_pred, y)
    
    # Zero the gradients
    optimizer.zero_grad()
    
    # perform a backward pass (backpropagation)
    loss.backward()
    
    # Update the parameters
    optimizer.step()
    
    print('Train Epoch: [%d/%d] Losses: [%.6f] Time: %.3f sec.' %(epoch, epochs, loss.item(), time.time() - start))

    # clear memroy
    gc.collect()

# Test models given validation or test data
def test(model, data, criterion, epoch, epochs, flag):
    # measure time
    start = time.time()

    # extract training data
    x,y = data
    
    # switch to evaluate mode
    model.eval()
    
    # Forward Propagation
    y_pred = model(x)

    # Compute and print validation loss
    loss = criterion(y_pred, y)
    
    # Computer other measure
    y_true = [int(torch.max(i, 0)[1].item()) for i in y]
    y_pred = [int(torch.max(i, 0)[1].item()) for i in y_pred]
    
    P = precision_score(y_true, y_pred, average='micro') 
    R = recall_score(y_true, y_pred, average='micro')
    A = accuracy_score(y_true, y_pred)
    F1 = f1_score(y_true, y_pred, average='micro')
    T = time.time() - start
    
    # print the validation updated measures
    print('Validation_: [%d/%d] Losses: [%.3f] Precision: [%.3f]'
          ' Recall: [%.3f] Accuracy [%.3f] f1-score: [%.3f] Time'
          ': %.2f sec.' %(epoch, epochs, loss.item(), P, R, A, F1, T))

    # this is just to make it clear when we print
    if flag:
        print('  =------=  ')

    # clear memroy
    gc.collect()

    
#---------------------------------------------------------#
# Creating three models;
#   1- model1 for Flair's DocumentPoolEmbeddings
#   2- model2 for Flair's DocumentLSTMEmbeddings
#   3- model3 for Google's BERT embeddings
model1 = _create_model(4296, 600, 200, 40, 3, 1e-1, 1e-3)
model2 = _create_model(256, 200, 100, 40, 3, 1e-1, 1e-3)
model3 = _create_model(256, 200, 100, 40, 3, 1e-1, 1e-3)
            
print(model1)
print(model2)
print(model3)

Sequential(
  (0): Linear(in_features=4296, out_features=600, bias=True)
  (1): ReLU()
  (2): Linear(in_features=600, out_features=200, bias=True)
  (3): ReLU()
  (4): Linear(in_features=200, out_features=40, bias=True)
  (5): ReLU()
  (6): Linear(in_features=40, out_features=3, bias=True)
  (7): Softmax()
)
Sequential(
  (0): Linear(in_features=256, out_features=200, bias=True)
  (1): ReLU()
  (2): Linear(in_features=200, out_features=100, bias=True)
  (3): ReLU()
  (4): Linear(in_features=100, out_features=40, bias=True)
  (5): ReLU()
  (6): Linear(in_features=40, out_features=3, bias=True)
  (7): Softmax()
)
Sequential(
  (0): Linear(in_features=256, out_features=200, bias=True)
  (1): ReLU()
  (2): Linear(in_features=200, out_features=100, bias=True)
  (3): ReLU()
  (4): Linear(in_features=100, out_features=40, bias=True)
  (5): ReLU()
  (6): Linear(in_features=40, out_features=3, bias=True)
  (7): Softmax()
)


In [169]:
path_save_embd = 'Flair/embeddings'

TRAIN_EMBD = pickle.load(open(path_save_embd+'/train_embd.p', 'rb'))
VAL_EMBD = pickle.load(open(path_save_embd+'/val_embd.p', 'rb'))
TEST_EMBD = pickle.load(open(path_save_embd+'/test_embd.p', 'rb'))

path_save_model1 = 'Flair/model1'
path_save_model2 = 'Flair/model2'

_check_dir(path_save_model1)
_check_dir(path_save_model2)

# hyper parameters
epochs = 60
learning_rate = 1e-2

# criterion = nn.MultiLabelMarginLoss()
criterion = nn.MSELoss(reduction='sum')

optimizer1 = torch.optim.Adam(model1.parameters(), lr=learning_rate)
optimizer2 = torch.optim.Adam(model2.parameters(), lr=learning_rate)

data1_trn = [TRAIN_EMBD[0],TRAIN_EMBD[2]] # x,y, where x is the first embedding
data2_trn = [TRAIN_EMBD[1],TRAIN_EMBD[2]] # x,y, where x is the second embedding

data1_val = [VAL_EMBD[0],VAL_EMBD[2]] # x,y, where x is the first embedding
data2_val = [VAL_EMBD[1],VAL_EMBD[2]] # x,y, where x is the first embedding

# training and validation for model 1
for epoch in range(1, epochs+1):
    train(model1, data1_trn, criterion, optimizer1, epoch, epochs)
    test(model1, data1_val, criterion, epoch, epochs, 1)

# training and validation for model 2
for epoch in range(1, epochs+1):
    train(model2, data2_trn, criterion, optimizer2, epoch, epochs)
    test(model2, data2_val, criterion, epoch, epochs, 1)

# save current model
# name_model = 'flair_1.pkl'
# path_save_model = os.path.join(path_save_model1, name_model)
# joblib.dump(model.float(), path_save_model, compress=2)

Train Epoch: [1/60] Losses: [1777.541992] Time: 0.213 sec.
Validation_: [1/60] Losses: [185.223] Precision: [0.505] Recall: [0.505] Accuracy [0.505] f1-score: [0.505] Time: 0.02 sec.
  =------=  
Train Epoch: [2/60] Losses: [1730.277466] Time: 0.258 sec.
Validation_: [2/60] Losses: [284.687] Precision: [0.275] Recall: [0.275] Accuracy [0.275] f1-score: [0.275] Time: 0.02 sec.
  =------=  
Train Epoch: [3/60] Losses: [2466.174072] Time: 0.258 sec.
Validation_: [3/60] Losses: [277.575] Precision: [0.220] Recall: [0.220] Accuracy [0.220] f1-score: [0.220] Time: 0.03 sec.
  =------=  
Train Epoch: [4/60] Losses: [2367.302979] Time: 0.288 sec.
Validation_: [4/60] Losses: [209.134] Precision: [0.220] Recall: [0.220] Accuracy [0.220] f1-score: [0.220] Time: 0.02 sec.
  =------=  
Train Epoch: [5/60] Losses: [1836.895630] Time: 0.217 sec.
Validation_: [5/60] Losses: [199.586] Precision: [0.254] Recall: [0.254] Accuracy [0.254] f1-score: [0.254] Time: 0.02 sec.
  =------=  
Train Epoch: [6/60] 

Train Epoch: [43/60] Losses: [1321.013428] Time: 0.372 sec.
Validation_: [43/60] Losses: [153.314] Precision: [0.603] Recall: [0.603] Accuracy [0.603] f1-score: [0.603] Time: 0.04 sec.
  =------=  
Train Epoch: [44/60] Losses: [1332.932861] Time: 0.453 sec.
Validation_: [44/60] Losses: [153.206] Precision: [0.610] Recall: [0.610] Accuracy [0.610] f1-score: [0.610] Time: 0.04 sec.
  =------=  
Train Epoch: [45/60] Losses: [1305.735962] Time: 0.366 sec.
Validation_: [45/60] Losses: [152.279] Precision: [0.614] Recall: [0.614] Accuracy [0.614] f1-score: [0.614] Time: 0.04 sec.
  =------=  
Train Epoch: [46/60] Losses: [1285.309326] Time: 0.426 sec.
Validation_: [46/60] Losses: [150.914] Precision: [0.607] Recall: [0.607] Accuracy [0.607] f1-score: [0.607] Time: 0.04 sec.
  =------=  
Train Epoch: [47/60] Losses: [1294.194824] Time: 0.316 sec.
Validation_: [47/60] Losses: [155.082] Precision: [0.603] Recall: [0.603] Accuracy [0.603] f1-score: [0.603] Time: 0.03 sec.
  =------=  
Train Epoc

Train Epoch: [26/60] Losses: [1446.453613] Time: 0.021 sec.
Validation_: [26/60] Losses: [197.272] Precision: [0.403] Recall: [0.403] Accuracy [0.403] f1-score: [0.403] Time: 0.02 sec.
  =------=  
Train Epoch: [27/60] Losses: [1430.726807] Time: 0.020 sec.
Validation_: [27/60] Losses: [199.103] Precision: [0.386] Recall: [0.386] Accuracy [0.386] f1-score: [0.386] Time: 0.01 sec.
  =------=  
Train Epoch: [28/60] Losses: [1419.278687] Time: 0.017 sec.
Validation_: [28/60] Losses: [201.009] Precision: [0.369] Recall: [0.369] Accuracy [0.369] f1-score: [0.369] Time: 0.01 sec.
  =------=  
Train Epoch: [29/60] Losses: [1400.671753] Time: 0.029 sec.
Validation_: [29/60] Losses: [205.217] Precision: [0.353] Recall: [0.353] Accuracy [0.353] f1-score: [0.353] Time: 0.01 sec.
  =------=  
Train Epoch: [30/60] Losses: [1381.050171] Time: 0.023 sec.
Validation_: [30/60] Losses: [209.397] Precision: [0.353] Recall: [0.353] Accuracy [0.353] f1-score: [0.353] Time: 0.01 sec.
  =------=  
Train Epoc

## Evaluting the models

In [166]:

data1_tst = [TEST_EMBD[0],TEST_EMBD[2]] # x,y, where x is the first embedding
data2_tst = [TEST_EMBD[1],TEST_EMBD[2]] # x,y, where x is the first embedding

print('model-1 Flair\'s DocumentPoolEmbeddings\n')
test(model1, data1_tst, criterion, 1, 1, 1)

print('model-2 Flair\'s DocumentLSTMEmbeddings\n')
test(model2, data2_tst, criterion, 1, 1, 1)

print('model-3 Google\'s BERT embeddings\n')
test(model3, data2_tst, criterion, 1, 1, 1)

model-1 Flair's DocumentPoolEmbeddings

Validation_: [1/1] Losses: [651.811] Precision: [0.581] Recall: [0.581] Accuracy [0.581] f1-score: [0.581] Time: 0.14 sec.
  =------=  
model-2 Flair's DocumentLSTMEmbeddings

Validation_: [1/1] Losses: [1283.009] Precision: [0.302] Recall: [0.302] Accuracy [0.302] f1-score: [0.302] Time: 0.04 sec.
  =------=  
model-3 Google's BERT embeddings



NameError: name 'model3' is not defined

## Results

The final scores are as follows:

| measure | Flair's LSTM | Flair' Pool | BERT 
|------|------|------|------|
|  precision  | 0.920 | 0.961 | 0.961 |
|  recall  | 0.925 | 0.958 | 0.958 |
|  f1-score  | 0.922 | 0.960 | 0.960 |

## Discussion

The results obtained are comparable with the state-of-the-art results presented in Sun et. al. (2018). 