# TRAINING TEXT CLASSIFIERS WITH SPACY

In this lab we will train different text classifiers with spaCy.

1. Read through the code and try to add more inline documentation as you work out what each part does.

2. We will adapt the code to train two different fake news classifiers: one on general fake news from 6 different domains, and another on celebrity news, where there are legitimate news items but also false gossip.




In [39]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [40]:
# This lab uses spaCy v2 (the nlp.create_pipe / nlp.begin_training API), so do not upgrade to v3

In [41]:
# TODO install and test the language models of your choice, following https://spacy.io/usage

!python -m spacy download en_core_web_sm
#!python -m spacy download en_core_web_md
#!python -m spacy download en_core_web_lg

Collecting en_core_web_sm==2.2.5
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz (12.0 MB)
     |████████████████████████████████| 12.0 MB 5.5 MB/s 
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')


In [42]:
import spacy
import csv
import random
import time
import numpy as np
import pandas as pd
import re
import string

from spacy.util import minibatch, compounding
import sys
from spacy import displacy
from itertools import chain

from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

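# TODO add inline documentation describing the functionality of each function

def load_data(fnames):
    """Read one or more tab-separated SemEval-2016 stance files and concatenate them.

    Returns the combined DataFrame and the list of distinct targets
    (e.g. 'Hillary Clinton', 'Feminist Movement')."""
    data = []
    for fname in fnames:
        data.append(pd.read_csv(fname, sep='\t', encoding='utf-8'))
    data = pd.concat(data)
    targets = set(data['Target'])
    return data, list(targets)

def cleanup(tweet):
    """Pre-process a tweet: drop the '#' and '@' symbols (the hashtag and user-name
    words themselves are kept), replace newlines and tabs with spaces, and delete URLs."""
    tweet = tweet.replace("#", "").replace("@", "").replace('\n', ' ').replace('\t', ' ')
    tweet = re.sub(r"http\S+", "", tweet)
    return tweet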
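To see what the pre-processing does, here is a stand-alone copy of `cleanup` (same replacements and regex as above) applied to a made-up tweet, next to a hypothetical `cleanup_strict` variant that also collapses the double space a deleted URL leaves behind. The variant is only a suggestion; the lab itself uses `cleanup`.

```python
import re

def cleanup(tweet):
    """Same steps as the lab's cleanup(): drop '#' and '@', flatten newlines/tabs, delete URLs."""
    tweet = tweet.replace("#", "").replace("@", "").replace('\n', ' ').replace('\t', ' ')
    return re.sub(r"http\S+", "", tweet)

def cleanup_strict(tweet):
    """Hypothetical variant: cleanup() plus collapsing runs of whitespace and trimming the ends."""
    return " ".join(cleanup(tweet).split())

example = "@someone Read this #SemST\nhttp://t.co/abc123 now"  # made-up tweet
print(repr(cleanup(example)))         # the URL is gone but two spaces remain
print(repr(cleanup_strict(example)))  # single spaces only
```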

In [43]:
# Data paths. The trial data is also used as training data.
trial_file = "/content/drive/MyDrive/2022-ILTAPP/datasets/stance-semeval2016/semeval2016-task6-trialdata.utf-8.txt"
train_file = "/content/drive/MyDrive/2022-ILTAPP/datasets/stance-semeval2016/semeval2016-task6-trainingdata.utf-8.txt"
test_file = "/content/drive/MyDrive/2022-ILTAPP/datasets/stance-semeval2016/SemEval2016-Task6-subtaskA-testdata-gold.txt"

training_data, targets = load_data([trial_file, train_file])
training_data['Clean_tweet'] = training_data['Tweet'].apply(cleanup)

test_data, _ = load_data([test_file])
test_data['Clean_tweet'] = test_data['Tweet'].apply(cleanup)
display(training_data)

| | ID | Target | Tweet | Stance | Clean_tweet |
|---|---|---|---|---|---|
| 0 | 1 | Hillary Clinton | @tedcruz And, #HandOverTheServer she wiped cle... | AGAINST | tedcruz And, HandOverTheServer she wiped clean... |
| 1 | 2 | Hillary Clinton | Hillary is our best choice if we truly want to... | FAVOR | Hillary is our best choice if we truly want to... |
| 2 | 3 | Hillary Clinton | @TheView I think our country is ready for a fe... | AGAINST | TheView I think our country is ready for a fem... |
| 3 | 4 | Hillary Clinton | I just gave an unhealthy amount of my hard-ear... | AGAINST | I just gave an unhealthy amount of my hard-ear... |
| 4 | 5 | Hillary Clinton | @PortiaABoulger Thank you for adding me to you... | NONE | PortiaABoulger Thank you for adding me to your... |
| ... | ... | ... | ... | ... | ... |
| 2809 | 2910 | Legalization of Abortion | There's a law protecting unborn eagles, but no... | AGAINST | There's a law protecting unborn eagles, but no... |
| 2810 | 2911 | Legalization of Abortion | I am 1 in 3... I have had an abortion #Abortio... | AGAINST | I am 1 in 3... I have had an abortion Abortion... |
| 2811 | 2912 | Legalization of Abortion | How dare you say my sexual preference is a cho... | AGAINST | How dare you say my sexual preference is a cho... |
| 2812 | 2913 | Legalization of Abortion | Equal rights for those 'born that way', no rig... | AGAINST | Equal rights for those 'born that way', no rig... |


In [44]:
# Write one train and one test TSV per target, keeping only the Stance label and the cleaned tweet.
# QUOTE_NONE writes each tweet verbatim; this works because cleanup() has already replaced tabs and newlines.
for target in targets:
  training_data[training_data['Target'] == target][['Stance', 'Clean_tweet']].to_csv(f"/content/drive/MyDrive/2022-ILTAPP/datasets/stance-semeval2016/train.{target}.tsv",
          sep="\t", index=False, quoting=csv.QUOTE_NONE)
  test_data[test_data['Target'] == target][['Stance', 'Clean_tweet']].to_csv(f"/content/drive/MyDrive/2022-ILTAPP/datasets/stance-semeval2016/test.{target}.tsv",
          sep="\t", index=False, quoting=csv.QUOTE_NONE)
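Writing the tweets with `csv.QUOTE_NONE` and no escape character relies on `cleanup` having removed every tab: Python's `csv` module, which `DataFrame.to_csv` builds on, refuses to write a field that contains the delimiter when it may neither quote nor escape it. A small standard-library check of that behaviour (the rows and the helper name `write_tsv_row` are made up for illustration):

```python
import csv
import io

def write_tsv_row(row):
    """Write one row as TSV with QUOTE_NONE and no escapechar; return the text written."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_NONE, lineterminator="\n")
    writer.writerow(row)
    return buf.getvalue()

# A cleaned tweet (no tabs) is written as-is.
print(repr(write_tsv_row(["FAVOR", "equal pay now SemST"])))

# A tweet that still contains a tab cannot be written without quoting or escaping.
try:
    write_tsv_row(["FAVOR", "equal pay\tnow SemST"])
except csv.Error as err:
    print("csv.Error:", err)
```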

In [45]:
def load_data_spacy(fname):
  """Read a per-target TSV (columns Stance, Clean_tweet) and convert it to the
  spaCy v2 textcat training format: a list of (text, {"cats": {...}}) tuples
  with a one-hot score for each of AGAINST, FAVOR and NONE.
  Also returns the raw texts and the gold stance labels."""
  df = pd.read_csv(fname, sep='\t', encoding='utf-8')
  print(df['Stance'].value_counts())  # class distribution of this split

  train_texts = df['Clean_tweet'].tolist()
  train_cats = df['Stance'].tolist()

  final_train_cats = []
  for cat in train_cats:
    # Any label other than AGAINST or FAVOR is treated as NONE.
    if cat not in ('AGAINST', 'FAVOR'):
      cat = 'NONE'
    # One-hot dictionary, e.g. {'AGAINST': 0, 'FAVOR': 1, 'NONE': 0}
    final_train_cats.append({label: int(label == cat) for label in ('AGAINST', 'FAVOR', 'NONE')})

  train_data = list(zip(train_texts, [{"cats": cats} for cats in final_train_cats]))
  return train_data, train_texts, train_cats
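For a single label, `load_data_spacy` builds the one-hot `cats` dictionary that spaCy v2's `nlp.update` expects as the annotation for `textcat`. The same mapping as a small pure-Python helper; `to_cats` and the example text are hypothetical and only for illustration:

```python
STANCES = ("AGAINST", "FAVOR", "NONE")

def to_cats(stance):
    """Map a gold stance string to a one-hot {'cats': {...}} annotation.
    Anything that is not AGAINST or FAVOR falls back to NONE, as in load_data_spacy."""
    if stance not in ("AGAINST", "FAVOR"):
        stance = "NONE"
    return {"cats": {label: int(label == stance) for label in STANCES}}

example = ("equal pay now SemST", to_cats("FAVOR"))  # made-up (text, annotation) pair
print(example)
```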


In [46]:
training_data, train_texts, train_cats = load_data_spacy('/content/drive/MyDrive/2022-ILTAPP/datasets/stance-semeval2016/train.Feminist Movement.tsv')
print(training_data[:10])
print(len(training_data))
test_data, test_texts, test_cats = load_data_spacy('/content/drive/MyDrive/2022-ILTAPP/datasets/stance-semeval2016/test.Feminist Movement.tsv')
print(len(test_data))

AGAINST    328
FAVOR      210
NONE       126
Name: Stance, dtype: int64
[('Always a delight to see chest-drumming alpha males hiss and scuttle backwards up the wall when a feminist enters the room. manly SemST', {'cats': {'AGAINST': 0, 'FAVOR': 1, 'NONE': 0}}), ("Sometimes I overheat and want to take off my shirt but can't because of social expectations of people with breasts. ;n; SemST", {'cats': {'AGAINST': 0, 'FAVOR': 1, 'NONE': 0}}), ('If feminists spent 1/2 as much time reading papers as they do tumblr they would be real people, not ignorant sexist bigots. SemST', {'cats': {'AGAINST': 1, 'FAVOR': 0, 'NONE': 0}}), ('Stupid Feminists, the civilization you take for granted was built with the labour, blood sweat and tears of men. SemST', {'cats': {'AGAINST': 1, 'FAVOR': 0, 'NONE': 0}}), ("YOU'RE A GIRL AND HAVE A SEX DRIVE!? YOU MUST BE A SLUT! feminist SemST", {'cats': {'AGAINST': 0, 'FAVOR': 1, 'NONE': 0}}), ("Suns out....  Dresses out...  StreetHarassment out...  This shouldn't be 

In [47]:
def Sort(sub_li):
  """Return (label, score) pairs sorted by score, highest first."""
  # key=lambda x: x[1] sorts on the second element of each pair (the score);
  # reverse=True sorts in descending order.
  return sorted(sub_li, key=lambda x: x[1], reverse=True)

def evaluate(tokenizer, textcat, test_texts, test_cats):
  """Predict a stance for every test text and print a classification report.

  The prediction for each text is its highest-scoring category. The report is
  restricted to AGAINST and FAVOR, so NONE gets no row of its own and sklearn
  prints a "micro avg" row (over these two labels) instead of accuracy.
  Nothing is returned.
  """
  docs = (tokenizer(text) for text in test_texts)
  preds = []
  for doc in textcat.pipe(docs):
    scores = Sort(doc.cats.items())  # [(label, score), ...], best first
    preds.append(scores[0][0])       # keep the top label
  labels = ['AGAINST', 'FAVOR']
  print(classification_report(test_cats, preds, labels=labels))
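Because `evaluate` passes `labels=['AGAINST', 'FAVOR']`, the "macro avg" F1 in each report is the mean of the AGAINST and FAVOR F1 scores, which is also how the official SemEval-2016 Task 6 score is defined. A dependency-free sketch of that computation; `f1_for`, `stance_f_avg` and the toy label lists are illustrative, not part of the lab code:

```python
def f1_for(label, gold, pred):
    """Per-class F1 from parallel lists of gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    n_pred = sum(1 for p in pred if p == label)
    n_gold = sum(1 for g in gold if g == label)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def stance_f_avg(gold, pred):
    """Average of the AGAINST and FAVOR F1 scores; NONE only counts through the errors it causes."""
    return (f1_for("AGAINST", gold, pred) + f1_for("FAVOR", gold, pred)) / 2

gold = ["AGAINST", "AGAINST", "AGAINST", "FAVOR", "NONE"]
pred = ["AGAINST", "AGAINST", "FAVOR", "FAVOR", "AGAINST"]
print(f1_for("AGAINST", gold, pred), f1_for("FAVOR", gold, pred), stance_f_avg(gold, pred))
```

Here AGAINST has precision 2/3 and recall 2/3, FAVOR has precision 1/2 and recall 1, so both F1 scores are 2/3 and so is their average.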
    

In [48]:
def train_spacy(train_data, iterations, test_texts, test_cats, model_arch, dropout=0.3, model=None, init_tok2vec=None, target="Feminism"):
    '''Train a spaCy v2 text classifier (textcat) for stance detection,
    evaluate it on the test data after every iteration, and save it to disk.

    train_data   : list of (text, {"cats": {"AGAINST": 0|1, "FAVOR": 0|1, "NONE": 0|1}})
    iterations   : number of passes (epochs) over the training data
    test_texts   : list of test tweets
    test_cats    : gold stance labels for test_texts
    model_arch   : textcat architecture, e.g. "bow", "simple_cnn" or "ensemble"
    dropout      : dropout rate used during training
    model        : currently unused; the pipeline is always built on en_core_web_sm
    init_tok2vec : optional pathlib.Path to pretrained tok2vec weights
    target       : name used in the saved model directory
    '''

    nlp = spacy.load('en_core_web_sm')

    # Add the text classifier to the pipeline if it doesn't exist.
    # nlp.create_pipe works for built-ins that are registered with spaCy.
    # exclusive_classes=True: each tweet has exactly one stance, so the scores sum to 1.
    if "textcat" not in nlp.pipe_names:
        textcat = nlp.create_pipe(
            "textcat", config={"exclusive_classes": True, "architecture": model_arch}
        )
        nlp.add_pipe(textcat, last=True)
    # Otherwise, get it so we can add labels to it.
    else:
        textcat = nlp.get_pipe("textcat")

    # Register the three stance labels with the text classifier.
    for label in ("AGAINST", "FAVOR", "NONE"):
        textcat.add_label(label)

    # Disable the other pipes (tagger, parser, ner) so that only textcat is trained.
    pipe_exceptions = ["textcat", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
    with nlp.disable_pipes(*other_pipes):
        optimizer = nlp.begin_training()
        if init_tok2vec is not None:
            with init_tok2vec.open("rb") as file_:
                textcat.model.tok2vec.from_bytes(file_.read())
        print("Training the model...")
        # Batch size grows from 16 towards 64 by a factor of 1.5 per batch.
        batch_sizes = compounding(16.0, 64.0, 1.5)
        for i in range(iterations):
            print('Iteration: ' + str(i))
            start_time = time.perf_counter()  # time.clock() was removed in Python 3.8
            losses = {}
            # Shuffle, then batch up the examples using spaCy's minibatch.
            random.shuffle(train_data)
            batches = minibatch(train_data, size=batch_sizes)
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=dropout, losses=losses)
            print('Loss: {:.4f}'.format(losses.get('textcat', 0.0)))
            # Evaluate on the test data using the averaged parameters.
            with textcat.model.use_params(optimizer.averages):
                evaluate(nlp.tokenizer, textcat, test_texts, test_cats)
            print('Elapsed time: {:.2f} seconds'.format(time.perf_counter() - start_time))
        # Save the model with the averaged parameters.
        with nlp.use_params(optimizer.averages):
            model_name = model_arch + "_" + target + "_Stance_Semeval2016"
            filepath = "/content/drive/MyDrive/2022-ILTAPP/resources/" + model_name
            nlp.to_disk(filepath)
    return nlp
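`train_spacy` creates `batch_sizes = compounding(16.0, 64.0, 1.5)` once, outside the epoch loop, and passes it to `minibatch`. The sketch below re-implements both helpers with the standard library; it is assumed to mirror spaCy v2's `spacy.util.compounding` and `spacy.util.minibatch`, but it is an illustration, not the library code. It shows the consequence: batch sizes grow 16, 24, 36, 54 and are then capped at 64, and because the generator is shared across epochs, every epoch after the first uses batches of 64 only. The 664 stand-in examples match the Feminist Movement training split (328 + 210 + 126).

```python
import itertools

def compounding(start, stop, compound):
    """Yield start, start*compound, ... clipped at stop (assumed to mirror spacy.util.compounding)."""
    def clip(value):
        return max(value, stop) if start > stop else min(value, stop)
    curr = float(start)
    while True:
        yield clip(curr)
        curr *= compound

def minibatch(items, size):
    """Split items into batches whose sizes are drawn from the `size` iterator
    (assumed to mirror spacy.util.minibatch when given a generator)."""
    items = iter(items)
    while True:
        batch_size = next(size)
        batch = list(itertools.islice(items, int(batch_size)))
        if not batch:
            break
        yield batch

train_data = list(range(664))                  # stand-in for 664 training examples
batch_sizes = compounding(16.0, 64.0, 1.5)     # created once, as in train_spacy
epoch_1 = [len(b) for b in minibatch(train_data, size=batch_sizes)]
epoch_2 = [len(b) for b in minibatch(train_data, size=batch_sizes)]
print(epoch_1)
print(epoch_2)
```

If growing batch sizes in every epoch were wanted, the generator would have to be recreated inside the `for i in range(iterations)` loop; whether that helps here is an empirical question.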

In [49]:
nlp = train_spacy(training_data, 20, test_texts, test_cats, "bow")

Training the model...
LOSS 	  P  	  R  	  F  
Iteration: 0




              precision    recall  f1-score   support

     AGAINST       0.66      0.98      0.79       183
       FAVOR       0.64      0.12      0.20        58

   micro avg       0.66      0.78      0.71       241
   macro avg       0.65      0.55      0.50       241
weighted avg       0.65      0.78      0.65       241

Elapsed time0.6363750000000437seconds
Iteration: 1




              precision    recall  f1-score   support

     AGAINST       0.64      1.00      0.78       183
       FAVOR       0.00      0.00      0.00        58

   micro avg       0.64      0.76      0.70       241
   macro avg       0.32      0.50      0.39       241
weighted avg       0.49      0.76      0.60       241

Elapsed time0.4222440000000347seconds
Iteration: 2




              precision    recall  f1-score   support

     AGAINST       0.65      1.00      0.79       183
       FAVOR       0.50      0.02      0.03        58

   micro avg       0.65      0.76      0.70       241
   macro avg       0.57      0.51      0.41       241
weighted avg       0.61      0.76      0.60       241

Elapsed time0.48458400000004076seconds
Iteration: 3




              precision    recall  f1-score   support

     AGAINST       0.64      0.98      0.78       183
       FAVOR       0.29      0.03      0.06        58

   micro avg       0.64      0.75      0.69       241
   macro avg       0.46      0.51      0.42       241
weighted avg       0.56      0.75      0.60       241

Elapsed time0.32724300000000994seconds
Iteration: 4




              precision    recall  f1-score   support

     AGAINST       0.65      0.96      0.78       183
       FAVOR       0.38      0.10      0.16        58

   micro avg       0.64      0.76      0.69       241
   macro avg       0.51      0.53      0.47       241
weighted avg       0.59      0.76      0.63       241

Elapsed time0.3844659999999749seconds
Iteration: 5




              precision    recall  f1-score   support

     AGAINST       0.66      0.94      0.78       183
       FAVOR       0.40      0.17      0.24        58

   micro avg       0.64      0.76      0.69       241
   macro avg       0.53      0.56      0.51       241
weighted avg       0.60      0.76      0.65       241

Elapsed time0.45183900000000676seconds
Iteration: 6




              precision    recall  f1-score   support

     AGAINST       0.66      0.92      0.77       183
       FAVOR       0.34      0.17      0.23        58

   micro avg       0.62      0.74      0.68       241
   macro avg       0.50      0.55      0.50       241
weighted avg       0.58      0.74      0.64       241

Elapsed time0.47196700000000646seconds
Iteration: 7




              precision    recall  f1-score   support

     AGAINST       0.67      0.91      0.77       183
       FAVOR       0.37      0.24      0.29        58

   micro avg       0.63      0.75      0.68       241
   macro avg       0.52      0.57      0.53       241
weighted avg       0.60      0.75      0.66       241

Elapsed time0.4597210000000018seconds
Iteration: 8




              precision    recall  f1-score   support

     AGAINST       0.68      0.90      0.78       183
       FAVOR       0.38      0.29      0.33        58

   micro avg       0.64      0.75      0.69       241
   macro avg       0.53      0.59      0.55       241
weighted avg       0.61      0.75      0.67       241

Elapsed time0.4802050000000122seconds
Iteration: 9




              precision    recall  f1-score   support

     AGAINST       0.68      0.88      0.77       183
       FAVOR       0.35      0.29      0.32        58

   micro avg       0.62      0.74      0.68       241
   macro avg       0.51      0.59      0.54       241
weighted avg       0.60      0.74      0.66       241

Elapsed time0.4793000000000234seconds
Iteration: 10




              precision    recall  f1-score   support

     AGAINST       0.68      0.86      0.76       183
       FAVOR       0.35      0.33      0.34        58

   micro avg       0.62      0.73      0.67       241
   macro avg       0.51      0.59      0.55       241
weighted avg       0.60      0.73      0.66       241

Elapsed time0.4725090000000023seconds
Iteration: 11




              precision    recall  f1-score   support

     AGAINST       0.69      0.86      0.77       183
       FAVOR       0.40      0.40      0.40        58

   micro avg       0.63      0.75      0.69       241
   macro avg       0.55      0.63      0.58       241
weighted avg       0.62      0.75      0.68       241

Elapsed time0.46632099999999355seconds
Iteration: 12




              precision    recall  f1-score   support

     AGAINST       0.70      0.85      0.77       183
       FAVOR       0.40      0.43      0.41        58

   micro avg       0.63      0.75      0.69       241
   macro avg       0.55      0.64      0.59       241
weighted avg       0.63      0.75      0.68       241

Elapsed time0.4623320000000035seconds
Iteration: 13




              precision    recall  f1-score   support

     AGAINST       0.70      0.85      0.77       183
       FAVOR       0.41      0.45      0.43        58

   micro avg       0.64      0.75      0.69       241
   macro avg       0.56      0.65      0.60       241
weighted avg       0.63      0.75      0.69       241

Elapsed time0.47347100000001774seconds
Iteration: 14




              precision    recall  f1-score   support

     AGAINST       0.70      0.84      0.77       183
       FAVOR       0.39      0.43      0.41        58

   micro avg       0.63      0.74      0.68       241
   macro avg       0.55      0.64      0.59       241
weighted avg       0.63      0.74      0.68       241

Elapsed time0.4532029999999736seconds
Iteration: 15




              precision    recall  f1-score   support

     AGAINST       0.71      0.84      0.77       183
       FAVOR       0.39      0.45      0.42        58

   micro avg       0.64      0.75      0.69       241
   macro avg       0.55      0.64      0.59       241
weighted avg       0.63      0.75      0.69       241

Elapsed time0.4534680000000435seconds
Iteration: 16




              precision    recall  f1-score   support

     AGAINST       0.72      0.83      0.77       183
       FAVOR       0.39      0.48      0.43        58

   micro avg       0.63      0.74      0.68       241
   macro avg       0.55      0.65      0.60       241
weighted avg       0.64      0.74      0.69       241

Elapsed time0.45351199999998926seconds
Iteration: 17




              precision    recall  f1-score   support

     AGAINST       0.73      0.82      0.77       183
       FAVOR       0.41      0.53      0.46        58

   micro avg       0.64      0.75      0.69       241
   macro avg       0.57      0.68      0.62       241
weighted avg       0.65      0.75      0.70       241

Elapsed time0.46401300000002266seconds
Iteration: 18




              precision    recall  f1-score   support

     AGAINST       0.73      0.81      0.77       183
       FAVOR       0.40      0.53      0.46        58

   micro avg       0.64      0.75      0.69       241
   macro avg       0.56      0.67      0.61       241
weighted avg       0.65      0.75      0.69       241

Elapsed time0.46658899999999903seconds
Iteration: 19




              precision    recall  f1-score   support

     AGAINST       0.74      0.82      0.78       183
       FAVOR       0.42      0.55      0.47        58

   micro avg       0.65      0.76      0.70       241
   macro avg       0.58      0.69      0.63       241
weighted avg       0.66      0.76      0.70       241

Elapsed time0.4615699999999947seconds




In [50]:
nlp = train_spacy(training_data, 20, test_texts, test_cats, "simple_cnn")

Training the model...
LOSS 	  P  	  R  	  F  
Iteration: 0




              precision    recall  f1-score   support

     AGAINST       0.64      1.00      0.78       183
       FAVOR       0.00      0.00      0.00        58

   micro avg       0.64      0.76      0.70       241
   macro avg       0.32      0.50      0.39       241
weighted avg       0.49      0.76      0.59       241

Elapsed time2.640264000000002seconds
Iteration: 1
              precision    recall  f1-score   support

     AGAINST       0.65      0.92      0.76       183
       FAVOR       0.16      0.07      0.10        58

   micro avg       0.61      0.72      0.66       241
   macro avg       0.41      0.50      0.43       241
weighted avg       0.53      0.72      0.60       241

Elapsed time2.7376010000000406seconds
Iteration: 2




              precision    recall  f1-score   support

     AGAINST       0.66      0.80      0.72       183
       FAVOR       0.21      0.22      0.22        58

   micro avg       0.56      0.66      0.61       241
   macro avg       0.44      0.51      0.47       241
weighted avg       0.55      0.66      0.60       241

Elapsed time2.7256909999999834seconds
Iteration: 3




              precision    recall  f1-score   support

     AGAINST       0.71      0.65      0.68       183
       FAVOR       0.29      0.55      0.38        58

   micro avg       0.54      0.63      0.58       241
   macro avg       0.50      0.60      0.53       241
weighted avg       0.61      0.63      0.61       241

Elapsed time2.730004000000008seconds
Iteration: 4




              precision    recall  f1-score   support

     AGAINST       0.74      0.63      0.68       183
       FAVOR       0.30      0.45      0.36        58

   micro avg       0.58      0.59      0.58       241
   macro avg       0.52      0.54      0.52       241
weighted avg       0.63      0.59      0.60       241

Elapsed time2.774460999999974seconds
Iteration: 5




              precision    recall  f1-score   support

     AGAINST       0.75      0.64      0.69       183
       FAVOR       0.29      0.45      0.35        58

   micro avg       0.58      0.59      0.59       241
   macro avg       0.52      0.54      0.52       241
weighted avg       0.64      0.59      0.61       241

Elapsed time2.7664309999999546seconds
Iteration: 6




              precision    recall  f1-score   support

     AGAINST       0.73      0.63      0.68       183
       FAVOR       0.32      0.52      0.40        58

   micro avg       0.58      0.61      0.59       241
   macro avg       0.53      0.58      0.54       241
weighted avg       0.63      0.61      0.61       241

Elapsed time2.746117999999967seconds
Iteration: 7




              precision    recall  f1-score   support

     AGAINST       0.72      0.67      0.69       183
       FAVOR       0.32      0.50      0.39        58

   micro avg       0.58      0.63      0.60       241
   macro avg       0.52      0.58      0.54       241
weighted avg       0.62      0.63      0.62       241

Elapsed time2.6709769999999935seconds
Iteration: 8




              precision    recall  f1-score   support

     AGAINST       0.74      0.62      0.67       183
       FAVOR       0.32      0.57      0.41        58

   micro avg       0.57      0.61      0.59       241
   macro avg       0.53      0.60      0.54       241
weighted avg       0.64      0.61      0.61       241

Elapsed time1.8595980000000054seconds
Iteration: 9




              precision    recall  f1-score   support

     AGAINST       0.72      0.63      0.67       183
       FAVOR       0.33      0.57      0.42        58

   micro avg       0.57      0.61      0.59       241
   macro avg       0.53      0.60      0.55       241
weighted avg       0.63      0.61      0.61       241

Elapsed time1.8555989999999838seconds
Iteration: 10




              precision    recall  f1-score   support

     AGAINST       0.73      0.62      0.67       183
       FAVOR       0.34      0.62      0.44        58

   micro avg       0.57      0.62      0.59       241
   macro avg       0.53      0.62      0.55       241
weighted avg       0.63      0.62      0.61       241

Elapsed time1.8774980000000028seconds
Iteration: 11




              precision    recall  f1-score   support

     AGAINST       0.73      0.60      0.66       183
       FAVOR       0.32      0.59      0.41        58

   micro avg       0.56      0.60      0.58       241
   macro avg       0.52      0.59      0.54       241
weighted avg       0.63      0.60      0.60       241

Elapsed time1.8792040000000156seconds
Iteration: 12




              precision    recall  f1-score   support

     AGAINST       0.74      0.61      0.66       183
       FAVOR       0.33      0.60      0.43        58

   micro avg       0.57      0.61      0.59       241
   macro avg       0.53      0.61      0.55       241
weighted avg       0.64      0.61      0.61       241

Elapsed time1.8491149999999834seconds
Iteration: 13




              precision    recall  f1-score   support

     AGAINST       0.72      0.61      0.66       183
       FAVOR       0.33      0.59      0.42        58

   micro avg       0.56      0.60      0.58       241
   macro avg       0.52      0.60      0.54       241
weighted avg       0.63      0.60      0.60       241

Elapsed time1.8863810000000285seconds
Iteration: 14




              precision    recall  f1-score   support

     AGAINST       0.74      0.60      0.66       183
       FAVOR       0.32      0.60      0.42        58

   micro avg       0.57      0.60      0.58       241
   macro avg       0.53      0.60      0.54       241
weighted avg       0.64      0.60      0.61       241

Elapsed time1.7969899999999939seconds
Iteration: 15




              precision    recall  f1-score   support

     AGAINST       0.74      0.60      0.66       183
       FAVOR       0.31      0.60      0.41        58

   micro avg       0.56      0.60      0.58       241
   macro avg       0.53      0.60      0.54       241
weighted avg       0.64      0.60      0.60       241

Elapsed time1.8241870000000517seconds
Iteration: 16




              precision    recall  f1-score   support

     AGAINST       0.74      0.61      0.67       183
       FAVOR       0.32      0.60      0.42        58

   micro avg       0.56      0.61      0.59       241
   macro avg       0.53      0.61      0.54       241
weighted avg       0.64      0.61      0.61       241

Elapsed time1.8247279999999932seconds
Iteration: 17




              precision    recall  f1-score   support

     AGAINST       0.73      0.62      0.67       183
       FAVOR       0.32      0.57      0.41        58

   micro avg       0.57      0.61      0.59       241
   macro avg       0.53      0.60      0.54       241
weighted avg       0.63      0.61      0.61       241

Elapsed time1.8373759999999493seconds
Iteration: 18




              precision    recall  f1-score   support

     AGAINST       0.74      0.62      0.67       183
       FAVOR       0.32      0.59      0.41        58

   micro avg       0.57      0.61      0.59       241
   macro avg       0.53      0.60      0.54       241
weighted avg       0.64      0.61      0.61       241

Elapsed time1.848854000000017seconds
Iteration: 19




              precision    recall  f1-score   support

     AGAINST       0.73      0.62      0.67       183
       FAVOR       0.31      0.59      0.41        58

   micro avg       0.56      0.61      0.58       241
   macro avg       0.52      0.60      0.54       241
weighted avg       0.63      0.61      0.61       241

Elapsed time2.1596099999999865seconds




In [51]:
# Reload the saved bag-of-words model and classify one test tweet.
textcat_bow = spacy.load("/content/drive/MyDrive/2022-ILTAPP/resources/bow_Feminism_Stance_Semeval2016")
doc = textcat_bow(test_texts[10])
print("Text: " + test_texts[10])
print("Gold Label:" + test_cats[10])
print(" Predicted Label:")
print(doc.cats)  # one score per label; the highest score is the prediction
print("=======================================")

Text: sometiimes you just feel like punching a feminist in the face SemST
Gold Label:AGAINST
 Predicted Label:
{'AGAINST': 0.4215056300163269, 'FAVOR': 0.36623576283454895, 'NONE': 0.21225865185260773}
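Because the classifier was created with `"exclusive_classes": True`, the three scores in `doc.cats` form a single probability distribution (they sum to 1, up to float rounding), and the predicted label is the highest-scoring one. A quick check on the scores printed above; note how narrow the margin between AGAINST and FAVOR is for this tweet:

```python
scores = {'AGAINST': 0.4215056300163269, 'FAVOR': 0.36623576283454895, 'NONE': 0.21225865185260773}

predicted = max(scores, key=scores.get)   # label with the highest score
total = sum(scores.values())              # close to 1.0 for mutually exclusive classes
margin = scores['AGAINST'] - scores['FAVOR']
print(predicted, round(total, 6), round(margin, 3))
```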


# ASSIGNMENTS

1. TODO Train the classifiers for the other 4 targets in the SemEval 2016 stance dataset (Atheism, Hillary Clinton, Legalization of Abortion, Climate Change is a Real Concern).
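When training the other targets, it helps to compare against a majority-class baseline: on a skewed split, a classifier that always answers AGAINST already scores well on AGAINST and zero on FAVOR. The hypothetical helper below computes that baseline from class counts. With the Atheism test counts that `load_data_spacy` prints in the next cell's output (160 AGAINST, 32 FAVOR, 28 NONE), it gives AGAINST P 0.73, R 1.00, F1 0.84 and a macro F1 of 0.42, the same numbers as the Atheism iteration 0 report, which suggests the model starts out predicting AGAINST for every tweet.

```python
def majority_baseline(counts, majority="AGAINST", labels=("AGAINST", "FAVOR")):
    """Per-label (precision, recall, f1) when every example is predicted as `majority`,
    plus the macro-average F1 over `labels`. `counts` maps gold label -> number of examples."""
    total = sum(counts.values())
    scores = {}
    for label in labels:
        if label == majority:
            precision = counts[label] / total
            recall = 1.0 if counts[label] else 0.0
        else:
            precision, recall = 0.0, 0.0   # this label is never predicted
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(labels)
    return scores, macro_f1

scores, macro_f1 = majority_baseline({"AGAINST": 160, "FAVOR": 32, "NONE": 28})
for label, (p, r, f) in scores.items():
    print(f"{label:8} P={p:.2f} R={r:.2f} F1={f:.2f}")
print(f"macro F1={macro_f1:.2f}")
```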


In [52]:
# Train a simple_cnn classifier for each of the remaining four targets.
data_dir = '/content/drive/MyDrive/2022-ILTAPP/datasets/stance-semeval2016/'
for target in ['Atheism', 'Hillary Clinton', 'Legalization of Abortion', 'Climate Change is a Real Concern']:
  training_data, train_texts, train_cats = load_data_spacy(data_dir + 'train.' + target + '.tsv')
  test_data, test_texts, test_cats = load_data_spacy(data_dir + 'test.' + target + '.tsv')
  nlp = train_spacy(training_data, 20, test_texts, test_cats, "simple_cnn")




AGAINST    304
NONE       117
FAVOR       92
Name: Stance, dtype: int64
AGAINST    160
FAVOR       32
NONE        28
Name: Stance, dtype: int64
Training the model...
LOSS 	  P  	  R  	  F  
Iteration: 0




              precision    recall  f1-score   support

     AGAINST       0.73      1.00      0.84       160
       FAVOR       0.00      0.00      0.00        32

   micro avg       0.73      0.83      0.78       192
   macro avg       0.36      0.50      0.42       192
weighted avg       0.61      0.83      0.70       192

Elapsed time1.7056100000000356seconds
Iteration: 1




              precision    recall  f1-score   support

     AGAINST       0.73      1.00      0.84       160
       FAVOR       0.00      0.00      0.00        32

   micro avg       0.73      0.83      0.78       192
   macro avg       0.36      0.50      0.42       192
weighted avg       0.61      0.83      0.70       192

Elapsed time1.5686019999999985seconds
Iteration: 2
              precision    recall  f1-score   support

     AGAINST       0.73      0.99      0.84       160
       FAVOR       0.00      0.00      0.00        32

   micro avg       0.73      0.83      0.78       192
   macro avg       0.37      0.50      0.42       192
weighted avg       0.61      0.83      0.70       192

Elapsed time1.4718260000000214seconds
Iteration: 3




              precision    recall  f1-score   support

     AGAINST       0.81      0.91      0.86       160
       FAVOR       0.50      0.16      0.24        32

   micro avg       0.79      0.79      0.79       192
   macro avg       0.65      0.53      0.55       192
weighted avg       0.76      0.79      0.75       192

Elapsed time1.4896009999999933seconds
Iteration: 4




              precision    recall  f1-score   support

     AGAINST       0.85      0.76      0.80       160
       FAVOR       0.75      0.19      0.30        32

   micro avg       0.84      0.67      0.74       192
   macro avg       0.80      0.47      0.55       192
weighted avg       0.83      0.67      0.72       192

Elapsed time1.4537689999999657seconds
Iteration: 5




              precision    recall  f1-score   support

     AGAINST       0.83      0.81      0.82       160
       FAVOR       1.00      0.06      0.12        32

   micro avg       0.84      0.69      0.75       192
   macro avg       0.92      0.44      0.47       192
weighted avg       0.86      0.69      0.71       192

Elapsed time1.4697100000000205seconds
Iteration: 6




              precision    recall  f1-score   support

     AGAINST       0.83      0.78      0.80       160
       FAVOR       0.50      0.09      0.16        32

   micro avg       0.82      0.66      0.73       192
   macro avg       0.67      0.43      0.48       192
weighted avg       0.78      0.66      0.70       192

Elapsed time1.432657000000006seconds
Iteration: 7




              precision    recall  f1-score   support

     AGAINST       0.85      0.72      0.78       160
       FAVOR       0.41      0.22      0.29        32

   micro avg       0.80      0.64      0.71       192
   macro avg       0.63      0.47      0.53       192
weighted avg       0.77      0.64      0.70       192

Elapsed time1.4379260000000045seconds
Iteration: 8




              precision    recall  f1-score   support

     AGAINST       0.83      0.75      0.79       160
       FAVOR       0.32      0.25      0.28        32

   micro avg       0.76      0.67      0.71       192
   macro avg       0.58      0.50      0.54       192
weighted avg       0.75      0.67      0.70       192

Elapsed time1.457695000000001seconds
Iteration: 9




              precision    recall  f1-score   support

     AGAINST       0.84      0.71      0.77       160
       FAVOR       0.33      0.28      0.31        32

   micro avg       0.75      0.64      0.69       192
   macro avg       0.59      0.49      0.54       192
weighted avg       0.75      0.64      0.69       192

Elapsed time1.4472659999999564seconds
Iteration: 10




              precision    recall  f1-score   support

     AGAINST       0.83      0.73      0.78       160
       FAVOR       0.35      0.28      0.31        32

   micro avg       0.75      0.66      0.70       192
   macro avg       0.59      0.51      0.54       192
weighted avg       0.75      0.66      0.70       192

Elapsed time1.4661969999999656seconds
Iteration: 11




              precision    recall  f1-score   support

     AGAINST       0.83      0.76      0.79       160
       FAVOR       0.29      0.28      0.29        32

   micro avg       0.73      0.68      0.70       192
   macro avg       0.56      0.52      0.54       192
weighted avg       0.74      0.68      0.71       192

Elapsed time1.4260249999999814seconds
Iteration: 12




              precision    recall  f1-score   support

     AGAINST       0.83      0.74      0.78       160
       FAVOR       0.33      0.25      0.29        32

   micro avg       0.76      0.66      0.71       192
   macro avg       0.58      0.50      0.53       192
weighted avg       0.74      0.66      0.70       192

Elapsed time2.1125529999999912seconds
Iteration: 13




              precision    recall  f1-score   support

     AGAINST       0.82      0.77      0.79       160
       FAVOR       0.31      0.28      0.30        32

   micro avg       0.74      0.69      0.71       192
   macro avg       0.57      0.53      0.54       192
weighted avg       0.74      0.69      0.71       192

Elapsed time2.150938999999994seconds
Iteration: 14




              precision    recall  f1-score   support

     AGAINST       0.82      0.74      0.78       160
       FAVOR       0.26      0.22      0.24        32

   micro avg       0.73      0.66      0.69       192
   macro avg       0.54      0.48      0.51       192
weighted avg       0.73      0.66      0.69       192

Elapsed time2.1509849999999915seconds
Iteration: 15




              precision    recall  f1-score   support

     AGAINST       0.82      0.76      0.79       160
       FAVOR       0.27      0.28      0.28        32

   micro avg       0.72      0.68      0.70       192
   macro avg       0.55      0.52      0.53       192
weighted avg       0.73      0.68      0.70       192

Elapsed time2.1743470000000116seconds
Iteration: 16




              precision    recall  f1-score   support

     AGAINST       0.82      0.76      0.79       160
       FAVOR       0.27      0.25      0.26        32

   micro avg       0.72      0.67      0.70       192
   macro avg       0.54      0.50      0.52       192
weighted avg       0.73      0.67      0.70       192

Elapsed time2.1700889999999617seconds
Iteration: 17




              precision    recall  f1-score   support

     AGAINST       0.81      0.74      0.77       160
       FAVOR       0.25      0.25      0.25        32

   micro avg       0.71      0.66      0.68       192
   macro avg       0.53      0.49      0.51       192
weighted avg       0.72      0.66      0.69       192

Elapsed time2.193768000000034seconds
Iteration: 18




              precision    recall  f1-score   support

     AGAINST       0.81      0.74      0.77       160
       FAVOR       0.29      0.25      0.27        32

   micro avg       0.73      0.66      0.69       192
   macro avg       0.55      0.49      0.52       192
weighted avg       0.73      0.66      0.69       192

Elapsed time1.893213000000003seconds
Iteration: 19




              precision    recall  f1-score   support

     AGAINST       0.82      0.76      0.79       160
       FAVOR       0.32      0.25      0.28        32

   micro avg       0.75      0.68      0.71       192
   macro avg       0.57      0.51      0.54       192
weighted avg       0.74      0.68      0.70       192

Elapsed time1.4913310000000024seconds
AGAINST    393
NONE       178
FAVOR      118
Name: Stance, dtype: int64
AGAINST    172
NONE        78
FAVOR       45
Name: Stance, dtype: int64




Training the model...
LOSS 	  P  	  R  	  F  
Iteration: 0




              precision    recall  f1-score   support

     AGAINST       0.58      1.00      0.74       172
       FAVOR       0.00      0.00      0.00        45

   micro avg       0.58      0.79      0.67       217
   macro avg       0.29      0.50      0.37       217
weighted avg       0.46      0.79      0.58       217

Elapsed time2.2387539999999717seconds
Iteration: 1




              precision    recall  f1-score   support

     AGAINST       0.58      1.00      0.74       172
       FAVOR       0.00      0.00      0.00        45

   micro avg       0.58      0.79      0.67       217
   macro avg       0.29      0.50      0.37       217
weighted avg       0.46      0.79      0.58       217

Elapsed time1.893498999999963seconds
Iteration: 2
              precision    recall  f1-score   support

     AGAINST       0.59      0.98      0.74       172
       FAVOR       0.40      0.04      0.08        45

   micro avg       0.59      0.79      0.67       217
   macro avg       0.50      0.51      0.41       217
weighted avg       0.55      0.79      0.60       217

Elapsed time1.8816529999999716seconds
Iteration: 3




              precision    recall  f1-score   support

     AGAINST       0.63      0.94      0.75       172
       FAVOR       0.54      0.16      0.24        45

   micro avg       0.62      0.77      0.69       217
   macro avg       0.58      0.55      0.50       217
weighted avg       0.61      0.77      0.65       217

Elapsed time1.871313999999984seconds
Iteration: 4




              precision    recall  f1-score   support

     AGAINST       0.66      0.80      0.72       172
       FAVOR       0.62      0.29      0.39        45

   micro avg       0.65      0.69      0.67       217
   macro avg       0.64      0.54      0.56       217
weighted avg       0.65      0.69      0.65       217

Elapsed time1.8828179999999861seconds
Iteration: 5




              precision    recall  f1-score   support

     AGAINST       0.66      0.78      0.72       172
       FAVOR       0.56      0.20      0.30        45

   micro avg       0.65      0.66      0.66       217
   macro avg       0.61      0.49      0.51       217
weighted avg       0.64      0.66      0.63       217

Elapsed time1.8603919999999903seconds
Iteration: 6




              precision    recall  f1-score   support

     AGAINST       0.68      0.84      0.75       172
       FAVOR       0.65      0.29      0.40        45

   micro avg       0.67      0.72      0.70       217
   macro avg       0.66      0.56      0.57       217
weighted avg       0.67      0.72      0.68       217

Elapsed time1.8335429999999633seconds
Iteration: 7




              precision    recall  f1-score   support

     AGAINST       0.66      0.84      0.74       172
       FAVOR       0.58      0.24      0.34        45

   micro avg       0.66      0.71      0.68       217
   macro avg       0.62      0.54      0.54       217
weighted avg       0.65      0.71      0.66       217

Elapsed time1.8330799999999954seconds
Iteration: 8




              precision    recall  f1-score   support

     AGAINST       0.68      0.85      0.75       172
       FAVOR       0.57      0.27      0.36        45

   micro avg       0.67      0.73      0.70       217
   macro avg       0.62      0.56      0.56       217
weighted avg       0.65      0.73      0.67       217

Elapsed time1.839542999999992seconds
Iteration: 9




              precision    recall  f1-score   support

     AGAINST       0.68      0.84      0.75       172
       FAVOR       0.50      0.20      0.29        45

   micro avg       0.67      0.71      0.69       217
   macro avg       0.59      0.52      0.52       217
weighted avg       0.64      0.71      0.66       217

Elapsed time1.8609050000000025seconds
Iteration: 10




              precision    recall  f1-score   support

     AGAINST       0.69      0.84      0.76       172
       FAVOR       0.53      0.20      0.29        45

   micro avg       0.68      0.71      0.69       217
   macro avg       0.61      0.52      0.52       217
weighted avg       0.65      0.71      0.66       217

Elapsed time1.8345980000000282seconds
Iteration: 11




              precision    recall  f1-score   support

     AGAINST       0.67      0.84      0.75       172
       FAVOR       0.45      0.20      0.28        45

   micro avg       0.65      0.71      0.68       217
   macro avg       0.56      0.52      0.51       217
weighted avg       0.62      0.71      0.65       217

Elapsed time1.8402449999999817seconds
Iteration: 12




              precision    recall  f1-score   support

     AGAINST       0.69      0.84      0.76       172
       FAVOR       0.52      0.27      0.35        45

   micro avg       0.67      0.72      0.69       217
   macro avg       0.61      0.55      0.55       217
weighted avg       0.65      0.72      0.67       217

Elapsed time1.7934890000000223seconds
Iteration: 13




              precision    recall  f1-score   support

     AGAINST       0.67      0.83      0.74       172
       FAVOR       0.48      0.24      0.32        45

   micro avg       0.65      0.71      0.68       217
   macro avg       0.57      0.54      0.53       217
weighted avg       0.63      0.71      0.66       217

Elapsed time1.811302000000012seconds
Iteration: 14




              precision    recall  f1-score   support

     AGAINST       0.67      0.83      0.74       172
       FAVOR       0.45      0.22      0.30        45

   micro avg       0.65      0.70      0.67       217
   macro avg       0.56      0.52      0.52       217
weighted avg       0.62      0.70      0.65       217

Elapsed time1.8478759999999852seconds
Iteration: 15




              precision    recall  f1-score   support

     AGAINST       0.66      0.83      0.73       172
       FAVOR       0.48      0.24      0.32        45

   micro avg       0.64      0.71      0.67       217
   macro avg       0.57      0.54      0.53       217
weighted avg       0.62      0.71      0.65       217

Elapsed time1.81551300000001seconds
Iteration: 16




              precision    recall  f1-score   support

     AGAINST       0.67      0.83      0.74       172
       FAVOR       0.52      0.27      0.35        45

   micro avg       0.66      0.71      0.68       217
   macro avg       0.60      0.55      0.55       217
weighted avg       0.64      0.71      0.66       217

Elapsed time1.8223350000000096seconds
Iteration: 17




              precision    recall  f1-score   support

     AGAINST       0.67      0.85      0.75       172
       FAVOR       0.50      0.22      0.31        45

   micro avg       0.65      0.72      0.68       217
   macro avg       0.58      0.54      0.53       217
weighted avg       0.63      0.72      0.66       217

Elapsed time1.9637140000000386seconds
Iteration: 18




              precision    recall  f1-score   support

     AGAINST       0.66      0.87      0.75       172
       FAVOR       0.47      0.18      0.26        45

   micro avg       0.65      0.73      0.69       217
   macro avg       0.57      0.52      0.50       217
weighted avg       0.62      0.73      0.65       217

Elapsed time1.8144910000000323seconds
Iteration: 19




              precision    recall  f1-score   support

     AGAINST       0.67      0.87      0.76       172
       FAVOR       0.53      0.22      0.31        45

   micro avg       0.66      0.74      0.69       217
   macro avg       0.60      0.55      0.53       217
weighted avg       0.64      0.74      0.66       217

Elapsed time1.836274000000003seconds
AGAINST    355
NONE       177
FAVOR      121
Name: Stance, dtype: int64
AGAINST    189
FAVOR       46
NONE        45
Name: Stance, dtype: int64




Training the model...
LOSS 	  P  	  R  	  F  
Iteration: 0




              precision    recall  f1-score   support

     AGAINST       0.68      1.00      0.81       189
       FAVOR       0.00      0.00      0.00        46

   micro avg       0.68      0.80      0.73       235
   macro avg       0.34      0.50      0.40       235
weighted avg       0.54      0.80      0.65       235

Elapsed time2.259096999999997seconds
Iteration: 1




              precision    recall  f1-score   support

     AGAINST       0.68      1.00      0.81       189
       FAVOR       0.00      0.00      0.00        46

   micro avg       0.68      0.80      0.73       235
   macro avg       0.34      0.50      0.40       235
weighted avg       0.54      0.80      0.65       235

Elapsed time1.9088109999999574seconds
Iteration: 2
              precision    recall  f1-score   support

     AGAINST       0.68      0.99      0.81       189
       FAVOR       0.33      0.02      0.04        46

   micro avg       0.68      0.80      0.74       235
   macro avg       0.51      0.51      0.42       235
weighted avg       0.61      0.80      0.66       235

Elapsed time1.8400769999999511seconds
Iteration: 3




              precision    recall  f1-score   support

     AGAINST       0.73      0.86      0.79       189
       FAVOR       0.55      0.26      0.35        46

   micro avg       0.71      0.74      0.73       235
   macro avg       0.64      0.56      0.57       235
weighted avg       0.69      0.74      0.70       235

Elapsed time1.76325700000001seconds
Iteration: 4




              precision    recall  f1-score   support

     AGAINST       0.73      0.82      0.78       189
       FAVOR       0.50      0.33      0.39        46

   micro avg       0.71      0.72      0.71       235
   macro avg       0.62      0.57      0.58       235
weighted avg       0.69      0.72      0.70       235

Elapsed time1.8040100000000052seconds
Iteration: 5




              precision    recall  f1-score   support

     AGAINST       0.75      0.67      0.71       189
       FAVOR       0.44      0.41      0.43        46

   micro avg       0.69      0.62      0.65       235
   macro avg       0.60      0.54      0.57       235
weighted avg       0.69      0.62      0.65       235

Elapsed time1.9151360000000182seconds
Iteration: 6




              precision    recall  f1-score   support

     AGAINST       0.74      0.67      0.70       189
       FAVOR       0.46      0.50      0.48        46

   micro avg       0.68      0.63      0.65       235
   macro avg       0.60      0.58      0.59       235
weighted avg       0.69      0.63      0.66       235

Elapsed time1.8508770000000254seconds
Iteration: 7




              precision    recall  f1-score   support

     AGAINST       0.76      0.67      0.71       189
       FAVOR       0.42      0.37      0.40        46

   micro avg       0.69      0.61      0.65       235
   macro avg       0.59      0.52      0.55       235
weighted avg       0.69      0.61      0.65       235

Elapsed time1.832808seconds
Iteration: 8




              precision    recall  f1-score   support

     AGAINST       0.77      0.67      0.72       189
       FAVOR       0.47      0.52      0.49        46

   micro avg       0.70      0.64      0.67       235
   macro avg       0.62      0.59      0.61       235
weighted avg       0.71      0.64      0.67       235

Elapsed time1.7941690000000108seconds
Iteration: 9




              precision    recall  f1-score   support

     AGAINST       0.76      0.70      0.73       189
       FAVOR       0.45      0.43      0.44        46

   micro avg       0.70      0.65      0.67       235
   macro avg       0.61      0.57      0.59       235
weighted avg       0.70      0.65      0.67       235

Elapsed time1.7881379999999467seconds
Iteration: 10




              precision    recall  f1-score   support

     AGAINST       0.76      0.74      0.75       189
       FAVOR       0.50      0.43      0.47        46

   micro avg       0.71      0.68      0.70       235
   macro avg       0.63      0.59      0.61       235
weighted avg       0.71      0.68      0.69       235

Elapsed time1.815082000000018seconds
Iteration: 11




              precision    recall  f1-score   support

     AGAINST       0.76      0.71      0.74       189
       FAVOR       0.48      0.43      0.45        46

   micro avg       0.70      0.66      0.68       235
   macro avg       0.62      0.57      0.60       235
weighted avg       0.70      0.66      0.68       235

Elapsed time1.838596999999993seconds
Iteration: 12




              precision    recall  f1-score   support

     AGAINST       0.76      0.70      0.73       189
       FAVOR       0.49      0.39      0.43        46

   micro avg       0.71      0.64      0.67       235
   macro avg       0.62      0.54      0.58       235
weighted avg       0.71      0.64      0.67       235

Elapsed time1.8633060000000228seconds
Iteration: 13




              precision    recall  f1-score   support

     AGAINST       0.74      0.69      0.72       189
       FAVOR       0.44      0.43      0.44        46

   micro avg       0.68      0.64      0.66       235
   macro avg       0.59      0.56      0.58       235
weighted avg       0.68      0.64      0.66       235

Elapsed time1.83201600000001seconds
Iteration: 14




              precision    recall  f1-score   support

     AGAINST       0.74      0.71      0.72       189
       FAVOR       0.46      0.35      0.40        46

   micro avg       0.69      0.64      0.66       235
   macro avg       0.60      0.53      0.56       235
weighted avg       0.68      0.64      0.66       235

Elapsed time1.7821309999999926seconds
Iteration: 15




              precision    recall  f1-score   support

     AGAINST       0.74      0.68      0.71       189
       FAVOR       0.46      0.37      0.41        46

   micro avg       0.69      0.62      0.65       235
   macro avg       0.60      0.52      0.56       235
weighted avg       0.68      0.62      0.65       235

Elapsed time1.7742059999999924seconds
Iteration: 16




              precision    recall  f1-score   support

     AGAINST       0.75      0.68      0.71       189
       FAVOR       0.45      0.41      0.43        46

   micro avg       0.69      0.63      0.66       235
   macro avg       0.60      0.55      0.57       235
weighted avg       0.69      0.63      0.66       235

Elapsed time1.7956179999999904seconds
Iteration: 17




              precision    recall  f1-score   support

     AGAINST       0.75      0.67      0.70       189
       FAVOR       0.42      0.41      0.42        46

   micro avg       0.68      0.62      0.65       235
   macro avg       0.58      0.54      0.56       235
weighted avg       0.68      0.62      0.65       235

Elapsed time1.80948699999999seconds
Iteration: 18




              precision    recall  f1-score   support

     AGAINST       0.74      0.65      0.69       189
       FAVOR       0.41      0.41      0.41        46

   micro avg       0.67      0.60      0.64       235
   macro avg       0.58      0.53      0.55       235
weighted avg       0.68      0.60      0.64       235

Elapsed time1.8165609999999788seconds
Iteration: 19




              precision    recall  f1-score   support

     AGAINST       0.73      0.63      0.68       189
       FAVOR       0.37      0.37      0.37        46

   micro avg       0.65      0.58      0.61       235
   macro avg       0.55      0.50      0.52       235
weighted avg       0.66      0.58      0.62       235

Elapsed time1.7762590000000387seconds
FAVOR      212
NONE       168
AGAINST     15
Name: Stance, dtype: int64
FAVOR      123
NONE        35
AGAINST     11
Name: Stance, dtype: int64




Training the model...
LOSS 	  P  	  R  	  F  
Iteration: 0




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.73      1.00      0.84       123

   micro avg       0.73      0.92      0.81       134
   macro avg       0.36      0.50      0.42       134
weighted avg       0.67      0.92      0.77       134

Elapsed time1.3291760000000181seconds
Iteration: 1




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.73      1.00      0.84       123

   micro avg       0.73      0.92      0.81       134
   macro avg       0.36      0.50      0.42       134
weighted avg       0.67      0.92      0.77       134

Elapsed time1.0562549999999646seconds
Iteration: 2




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.73      0.99      0.84       123

   micro avg       0.73      0.91      0.81       134
   macro avg       0.36      0.50      0.42       134
weighted avg       0.67      0.91      0.77       134

Elapsed time1.0612829999999462seconds
Iteration: 3




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.73      1.00      0.84       123

   micro avg       0.73      0.92      0.81       134
   macro avg       0.36      0.50      0.42       134
weighted avg       0.67      0.92      0.77       134

Elapsed time1.0591140000000223seconds
Iteration: 4




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.79      0.94      0.86       123

   micro avg       0.79      0.87      0.83       134
   macro avg       0.40      0.47      0.43       134
weighted avg       0.73      0.87      0.79       134

Elapsed time1.085694999999987seconds
Iteration: 5




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.83      0.86      0.85       123

   micro avg       0.83      0.79      0.81       134
   macro avg       0.42      0.43      0.42       134
weighted avg       0.77      0.79      0.78       134

Elapsed time1.0753109999999992seconds
Iteration: 6




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.84      0.83      0.84       123

   micro avg       0.84      0.76      0.80       134
   macro avg       0.42      0.41      0.42       134
weighted avg       0.77      0.76      0.77       134

Elapsed time1.0724190000000249seconds
Iteration: 7




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.84      0.79      0.82       123

   micro avg       0.84      0.72      0.78       134
   macro avg       0.42      0.39      0.41       134
weighted avg       0.77      0.72      0.75       134

Elapsed time1.0627479999999991seconds
Iteration: 8




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.82      0.84      0.83       123

   micro avg       0.82      0.77      0.79       134
   macro avg       0.41      0.42      0.41       134
weighted avg       0.75      0.77      0.76       134

Elapsed time1.0763790000000313seconds
Iteration: 9
              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.81      0.81      0.81       123

   micro avg       0.81      0.75      0.78       134
   macro avg       0.41      0.41      0.41       134
weighted avg       0.75      0.75      0.75       134

Elapsed time1.056214999999952seconds
Iteration: 10




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.79      0.85      0.82       123

   micro avg       0.78      0.78      0.78       134
   macro avg       0.39      0.42      0.41       134
weighted avg       0.72      0.78      0.75       134

Elapsed time1.085696999999982seconds
Iteration: 11




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.82      0.77      0.79       123

   micro avg       0.81      0.71      0.76       134
   macro avg       0.41      0.39      0.40       134
weighted avg       0.75      0.71      0.73       134

Elapsed time1.0479990000000043seconds
Iteration: 12




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.79      0.84      0.81       123

   micro avg       0.78      0.77      0.77       134
   macro avg       0.39      0.42      0.41       134
weighted avg       0.72      0.77      0.74       134

Elapsed time1.3830659999999853seconds
Iteration: 13




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.81      0.82      0.82       123

   micro avg       0.81      0.75      0.78       134
   macro avg       0.41      0.41      0.41       134
weighted avg       0.75      0.75      0.75       134

Elapsed time1.0642539999999485seconds
Iteration: 14




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.82      0.83      0.82       123

   micro avg       0.81      0.76      0.78       134
   macro avg       0.41      0.41      0.41       134
weighted avg       0.75      0.76      0.76       134

Elapsed time1.0737340000000017seconds
Iteration: 15




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.79      0.84      0.81       123

   micro avg       0.79      0.77      0.78       134
   macro avg       0.40      0.42      0.41       134
weighted avg       0.73      0.77      0.75       134

Elapsed time1.074294000000009seconds
Iteration: 16




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.79      0.84      0.81       123

   micro avg       0.78      0.77      0.77       134
   macro avg       0.39      0.42      0.41       134
weighted avg       0.72      0.77      0.74       134

Elapsed time1.1541579999999954seconds
Iteration: 17




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.79      0.84      0.81       123

   micro avg       0.78      0.77      0.77       134
   macro avg       0.39      0.42      0.41       134
weighted avg       0.72      0.77      0.74       134

Elapsed time1.0476540000000227seconds
Iteration: 18




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.78      0.85      0.82       123

   micro avg       0.78      0.78      0.78       134
   macro avg       0.39      0.43      0.41       134
weighted avg       0.72      0.78      0.75       134

Elapsed time1.043875000000014seconds
Iteration: 19




              precision    recall  f1-score   support

     AGAINST       0.00      0.00      0.00        11
       FAVOR       0.78      0.85      0.81       123

   micro avg       0.78      0.78      0.78       134
   macro avg       0.39      0.42      0.41       134
weighted avg       0.72      0.78      0.75       134

Elapsed time1.0551209999999855seconds




2. TODO Reuse the above code to train a new classifier for fake news using the celebrity and the fake news datasets: 

  Data: "/content/drive/My Drive/Colab Notebooks/2022-ILTAPP/datasets/fake_rada"

  2.1 HINT: You need to (i) load the data into a pandas dataframe and (ii) change the labels used in the converter and training functions.

  2.2 HINT: Once you have a pandas dataframe, it is easy to split the data into 80% for training and 20% for testing.
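A minimal sketch of both hints, under assumptions that should be checked against the actual dataset: the texts under fake_rada are stored as plain .txt files inside folders named `fake` and `legit` (for example `<domain>/fake/*.txt`), and those folder names are used directly as the class labels. The helper names `load_fake_news`, `split_80_20` and `to_spacy_format` are hypothetical, not part of the lab code. The converter produces the `(text, {"cats": {...}})` tuples that spaCy v2's `nlp.update` expects for a text classifier.

```python
import os

import pandas as pd

# Assumption: class labels equal the names of the folders holding the texts.
LABELS = ("fake", "legit")


def load_fake_news(root):
    """Read every .txt file under `root` whose parent folder is named
    'fake' or 'legit' into a DataFrame with columns text, label, source."""
    rows = []
    for dirpath, _, filenames in os.walk(root):
        label = os.path.basename(dirpath)
        if label not in LABELS:
            continue  # skip folders that are not a class folder
        # source = folder above the class folder, e.g. the domain name
        source = os.path.relpath(os.path.dirname(dirpath), root)
        for fname in sorted(filenames):
            if not fname.endswith(".txt"):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8", errors="ignore") as f:
                rows.append({"text": f.read().strip(), "label": label, "source": source})
    return pd.DataFrame(rows, columns=["text", "label", "source"])


def split_80_20(df, seed=42):
    """Random 80/20 train/test split of the rows of `df`."""
    df = df.reset_index(drop=True)  # unique index so drop() removes exactly the train rows
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


def to_spacy_format(df):
    """Turn (text, label) rows into spaCy v2 textcat tuples:
    (text, {'cats': {'fake': 1.0, 'legit': 0.0}})."""
    return [(text, {"cats": {lab: float(label == lab) for lab in LABELS}})
            for text, label in zip(df["text"], df["label"])]


# Hypothetical usage in this notebook ('simple_cnn' is just one of the
# spaCy v2 textcat architectures; 'ensemble' and 'bow' also exist):
# df = load_fake_news("/content/drive/My Drive/Colab Notebooks/2022-ILTAPP/datasets/fake_rada")
# train_df, test_df = split_80_20(df)
# train_spacy(to_spacy_format(train_df), 20, list(test_df["text"]),
#             list(test_df["label"]), "simple_cnn")
```

To train the two requested classifiers separately (general fake news vs. celebrities), filter `df` on its `source` column before calling `split_80_20`.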

In [53]:
def Sort(sub_li):
  # sort the (label, score) pairs by their second element (the score)
  # in descending order, so the most probable label comes first
  return sorted(sub_li, key=lambda x: x[1], reverse=True)

# run the predictions on each text in the evaluation dataset and print the metrics
def evaluate(tokenizer, textcat, test_texts, test_cats):
  # tokenize lazily; textcat.pipe() then fills doc.cats with a score per label
  docs = (tokenizer(text) for text in test_texts)
  preds = []
  for doc in textcat.pipe(docs):
    # keep only the highest-scoring label as the prediction
    scores = Sort(doc.cats.items())
    preds.append(scores[0][0])

  labels = ['fake', 'legit']
  # zero_division=0 reports 0.0 instead of emitting UndefinedMetricWarning
  # when a label is never predicted (requires scikit-learn >= 0.22)
  print(classification_report(test_cats, preds, labels=labels, zero_division=0))

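def train_spacy(train_data, iterations, test_texts, test_cats, model_arch, dropout=0.3, model=None, init_tok2vec=None):
    ''' Train a spaCy text classifier (textcat) and evaluate it on the test data after every iteration

    train_data : training data in the format of (text, {'cats': {'fake': 1.0, 'legit': 0.0}})
    iterations : number of training iterations
    test_texts : texts of the evaluation set
    test_cats : gold labels ('fake' | 'legit') of the evaluation set
    model_arch : textcat architecture, e.g. 'ensemble', 'simple_cnn' or 'bow'
    dropout : dropout proportion for training
    model : name or path of the spaCy model to start from (defaults to 'en_core_web_sm')
    init_tok2vec : optional pathlib.Path to pretrained tok2vec weights
    '''

    nlp = spacy.load(model or 'en_core_web_sm')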
    

    # add the text classifier to the pipeline if it doesn't exist
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if "textcat" not in nlp.pipe_names:
        textcat = nlp.create_pipe(
            "textcat", config={"exclusive_classes": True, "architecture": model_arch}
        )
        nlp.add_pipe(textcat, last=True)
        
    # otherwise, get it, so we can add labels to it
    else:
        textcat = nlp.get_pipe("textcat")

    # add label to text classifier
    textcat.add_label("fake")
    textcat.add_label("legit")

    # get names of other pipes to disable them during training
    pipe_exceptions = ["textcat", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
    with nlp.disable_pipes(*other_pipes):  # only train textcat
        optimizer = nlp.begin_training()
        if init_tok2vec is not None:
            with init_tok2vec.open("rb") as file_:
                textcat.model.tok2vec.from_bytes(file_.read())
        print("Training the model...")
        print("{:^5}\t{:^5}\t{:^5}\t{:^5}".format("LOSS", "P", "R", "F"))
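        # batch sizes start at 16 and grow by a factor of 1.5 per batch, up to 64;
        # the generator is created once, so the sizes keep growing across iterations
        batch_sizes = compounding(16.0, 64.0, 1.5)
        for i in range(iterations):
            print('Iteration: '+str(i))
            # time.clock() was removed in Python 3.8; perf_counter() measures elapsed time
            start_time = time.perf_counter()
            losses = {}
            # shuffle the training examples, then batch them up with spaCy's minibatch
            random.shuffle(train_data)
            batches = minibatch(train_data, size=batch_sizes)
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=dropout, losses=losses)
            # evaluate on the test data using the averaged parameters
            with textcat.model.use_params(optimizer.averages):
                evaluate(nlp.tokenizer, textcat, test_texts, test_cats)
            print('Elapsed time'+str(time.perf_counter() - start_time)+"seconds")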
        with nlp.use_params(optimizer.averages):
            model_name = model_arch + "_fake_news"
            filepath = "/content/drive/MyDrive/2022-ILTAPP/resources/" + model_name 
            nlp.to_disk(filepath)
    return nlp

def load_data_spacy(fname):
    # Read the dataset: a tab-separated file with a label column and a text column
    #   header = 0    -> treat the first row of the file as a header row and replace it
    #                    with the names below (if the file has no header row, use
    #                    header=None instead, otherwise the first example is dropped)
    #   names = [...] -> the column names used to refer to the columns from now on
    data = pd.read_csv(fname, sep='\t', encoding='utf-8', header=0, names=['Type', 'Text'])
    #data = pd.read_csv(fname, sep='\t', encoding='utf-8')

    # Split data
    
    training_data, test_data = train_test_split(data, test_size=0.2)

    # Now we can refer to the columns directly by the names given above.
    # If you do not know the names, you can select a column by position instead,
    # e.g. training_data[training_data.keys()[1]].tolist()

    train_texts = training_data['Text'].tolist()
    train_cats = training_data['Type'].tolist()

    test_texts = test_data['Text'].tolist()
    test_cats = test_data['Type'].tolist()
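
    # Convert each string label into the dict of category scores that spaCy's
    # textcat expects, e.g. 'fake' -> {'fake': 1, 'legit': 0}
    final_train_cats = []
    for cat in train_cats:
        if cat == 'fake':
            final_train_cats.append({'fake': 1, 'legit': 0})
        else:
            final_train_cats.append({'fake': 0, 'legit': 1})

    # Pair each text with its annotations: (text, {"cats": {...}})
    train_data = list(zip(train_texts, [{"cats": cats} for cats in final_train_cats]))
    # Also return the held-out test split (as a dataframe and as text/label lists)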
    return train_data, test_data, train_texts, train_cats, test_texts, test_cats
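
`load_data_spacy` hard-codes the two labels `fake` and `legit`, which is the part hint 2.1 asks you to adapt. Below is a minimal sketch of a label-agnostic version, assuming the spaCy v2 annotation format `(text, {"cats": {label: score}})` used above; `to_cats`, `best_label` and `LABELS` are hypothetical names. `best_label` picks the highest-scoring label from a `doc.cats`-style dict, which is what `evaluate` does by sorting the scores.

```python
def to_cats(label, labels):
    """Hypothetical helper: one-hot dict of category scores for spaCy v2 textcat."""
    if label not in labels:
        raise ValueError("unknown label: %r" % (label,))
    return {name: int(name == label) for name in labels}


def best_label(cats):
    """Return the label with the highest score in a doc.cats-style dict."""
    return max(cats, key=cats.get)


LABELS = ["fake", "legit"]
toy_texts = ["story A", "story B", "story C"]
toy_cats = ["fake", "legit", "fake"]
toy_data = list(zip(toy_texts, [{"cats": to_cats(c, LABELS)} for c in toy_cats]))
print(toy_data[0])                                 # ('story A', {'cats': {'fake': 1, 'legit': 0}})
print(best_label({"fake": 0.31, "legit": 0.69}))   # legit
```

Unlike the `else` branch in `load_data_spacy`, `to_cats` rejects unexpected labels instead of silently mapping them to `legit`. The same `LABELS` list could also drive the `textcat.add_label(...)` calls in `train_spacy` and the `labels=` argument in `evaluate`, so switching datasets only means changing one list.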



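`train_spacy` draws its batch sizes from `compounding(16.0, 64.0, 1.5)` and passes them to `minibatch`. The stdlib re-implementation below is only an illustration of that schedule under the assumption that it multiplies the size by 1.5 per batch and caps it at 64; `compounding_sketch` and `minibatch_sketch` are hypothetical names, and the notebook itself uses spaCy's `spacy.util.compounding` and `spacy.util.minibatch`.

```python
import itertools


def compounding_sketch(start, stop, compound):
    """Illustrative schedule: start, start*compound, start*compound**2, ...
    capped at stop (assumes start <= stop, as in the notebook)."""
    current = float(start)
    while True:
        yield min(current, stop)
        current *= compound


def minibatch_sketch(items, sizes):
    """Cut items into consecutive batches whose lengths are drawn from `sizes`."""
    items = iter(items)
    while True:
        batch = list(itertools.islice(items, int(next(sizes))))
        if not batch:
            return
        yield batch


# First six batch sizes produced by the schedule used in train_spacy
print([int(s) for s in itertools.islice(compounding_sketch(16.0, 64.0, 1.5), 6)])
# [16, 24, 36, 54, 64, 64]

# Batching 200 toy examples with a fresh schedule
batches = list(minibatch_sketch(range(200), compounding_sketch(16.0, 64.0, 1.5)))
print([len(b) for b in batches])  # [16, 24, 36, 54, 64, 6]
```

Because `batch_sizes` is created once, before the iteration loop, the schedule is not restarted per iteration: with a few hundred training examples it reaches 64 during the first iteration, and later iterations use batches of 64 throughout.
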
In [54]:

train_data, test_data, train_texts, train_cats, test_texts, test_cats = load_data_spacy("/content/drive/MyDrive/2022-ILTAPP/datasets/fake_rada/fake_news_full.tsv")
nlp = train_spacy(train_data, 20, test_texts, test_cats, "simple_cnn")


Training the model...
LOSS 	  P  	  R  	  F  
Iteration: 0
              precision    recall  f1-score   support

        fake       0.85      0.22      0.35        49
       legit       0.54      0.96      0.69        47

    accuracy                           0.58        96
   macro avg       0.69      0.59      0.52        96
weighted avg       0.70      0.58      0.52        96

Elapsed time5.512871000000018seconds
Iteration: 1
              precision    recall  f1-score   support

        fake       0.00      0.00      0.00        49
       legit       0.49      1.00      0.66        47

    accuracy                           0.49        96
   macro avg       0.24      0.50      0.33        96
weighted avg       0.24      0.49      0.32        96

Elapsed time4.971512000000018seconds
Iteration: 2
              precision    recall  f1-score   support

        fake       0.60      0.84      0.70        49
       legit       0.71      0.43      0.53        47

    accuracy                           0.64        96
   macro avg       0.66      0.63      0.62        96
weighted avg       0.66      0.64      0.62        96

Elapsed time5.201062000000036seconds
Iteration: 3
              precision    recall  f1-score   support

        fake       0.72      0.80      0.76        49
       legit       0.76      0.68      0.72        47

    accuracy                           0.74        96
   macro avg       0.74      0.74      0.74        96
weighted avg       0.74      0.74      0.74        96

Elapsed time4.981127000000015seconds
Iteration: 4
              precision    recall  f1-score   support

        fake       0.67      0.86      0.75        49
       legit       0.79      0.55      0.65        47

    accuracy                           0.71        96
   macro avg       0.73      0.71      0.70        96
weighted avg       0.73      0.71      0.70        96

Elapsed time5.244661000000008seconds
Iteration: 5
              precision    recall  f1-score   support

        fake       0.74      0.76      0.75        49
       legit       0.74      0.72      0.73        47

    accuracy                           0.74        96
   macro avg       0.74      0.74      0.74        96
weighted avg       0.74      0.74      0.74        96

Elapsed time4.949882000000002seconds
Iteration: 6
              precision    recall  f1-score   support

        fake       0.79      0.63      0.70        49
       legit       0.68      0.83      0.75        47

    accuracy                           0.73        96
   macro avg       0.74      0.73      0.73        96
weighted avg       0.74      0.73      0.73        96

Elapsed time4.928399000000013seconds
Iteration: 7
              precision    recall  f1-score   support

        fake       0.61      0.88      0.72        49
       legit       0.77      0.43      0.55        47

    accuracy                           0.66        96
   macro avg       0.69      0.65      0.64        96
weighted avg       0.69      0.66      0.64        96

Elapsed time4.927389000000062seconds
Iteration: 8
              precision    recall  f1-score   support

        fake       0.78      0.65      0.71        49
       legit       0.69      0.81      0.75        47

    accuracy                           0.73        96
   macro avg       0.74      0.73      0.73        96
weighted avg       0.74      0.73      0.73        96

Elapsed time4.9346910000000435seconds
Iteration: 9
              precision    recall  f1-score   support

        fake       0.65      0.86      0.74        49
       legit       0.77      0.51      0.62        47

    accuracy                           0.69        96
   macro avg       0.71      0.68      0.68        96
weighted avg       0.71      0.69      0.68        96

Elapsed time5.0361780000000635seconds
Iteration: 10
              precision    recall  f1-score   support

        fake       0.76      0.69      0.72        49
       legit       0.71      0.77      0.73        47

    accuracy                           0.73        96
   macro avg       0.73      0.73      0.73        96
weighted avg       0.73      0.73      0.73        96

Elapsed time4.9054570000000695seconds
Iteration: 11
              precision    recall  f1-score   support

        fake       0.65      0.82      0.72        49
       legit       0.74      0.53      0.62        47

    accuracy                           0.68        96
   macro avg       0.69      0.67      0.67        96
weighted avg       0.69      0.68      0.67        96

Elapsed time4.902674000000047seconds
Iteration: 12
              precision    recall  f1-score   support

        fake       0.73      0.76      0.74        49
       legit       0.73      0.70      0.72        47

    accuracy                           0.73        96
   macro avg       0.73      0.73      0.73        96
weighted avg       0.73      0.73      0.73        96

Elapsed time4.983365000000049seconds
Iteration: 13
              precision    recall  f1-score   support

        fake       0.70      0.76      0.73        49
       legit       0.72      0.66      0.69        47

    accuracy                           0.71        96
   macro avg       0.71      0.71      0.71        96
weighted avg       0.71      0.71      0.71        96

Elapsed time5.217180999999982seconds
Iteration: 14
              precision    recall  f1-score   support

        fake       0.76      0.71      0.74        49
       legit       0.72      0.77      0.74        47

    accuracy                           0.74        96
   macro avg       0.74      0.74      0.74        96
weighted avg       0.74      0.74      0.74        96

Elapsed time5.229831999999988seconds
Iteration: 15
              precision    recall  f1-score   support

        fake       0.72      0.73      0.73        49
       legit       0.72      0.70      0.71        47

    accuracy                           0.72        96
   macro avg       0.72      0.72      0.72        96
weighted avg       0.72      0.72      0.72        96

Elapsed time5.442413999999985seconds
Iteration: 16
              precision    recall  f1-score   support

        fake       0.71      0.76      0.73        49
       legit       0.73      0.68      0.70        47

    accuracy                           0.72        96
   macro avg       0.72      0.72      0.72        96
weighted avg       0.72      0.72      0.72        96

Elapsed time5.11072999999999seconds
Iteration: 17
              precision    recall  f1-score   support

        fake       0.71      0.73      0.72        49
       legit       0.71      0.68      0.70        47

    accuracy                           0.71        96
   macro avg       0.71      0.71      0.71        96
weighted avg       0.71      0.71      0.71        96

Elapsed time4.9923069999999825seconds
Iteration: 18
              precision    recall  f1-score   support

        fake       0.73      0.61      0.67        49
       legit       0.65      0.77      0.71        47

    accuracy                           0.69        96
   macro avg       0.69      0.69      0.69        96
weighted avg       0.69      0.69      0.69        96

Elapsed time4.781829000000016seconds
Iteration: 19
              precision    recall  f1-score   support

        fake       0.69      0.69      0.69        49
       legit       0.68      0.68      0.68        47

    accuracy                           0.69        96
   macro avg       0.69      0.69      0.69        96
weighted avg       0.69      0.69      0.69        96

Elapsed time4.893314999999916seconds
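
In this run the scores on the 96 test examples peak early (macro F1 of 0.74 at iteration 3) and afterwards fluctuate between 0.64 and 0.74, while `train_spacy` saves the model as it is after the last iteration (macro F1 of 0.69 here). To keep the best iteration instead, you first need one score per iteration, for example by having `evaluate` return `classification_report(test_cats, preds, labels=labels, output_dict=True)["macro avg"]["f1-score"]`. The sketch below, using the hypothetical helper `best_iteration`, then picks the best one; the list holds the macro-F1 values printed above.

```python
# Macro-averaged F1 per iteration, copied from the reports printed above
macro_f1 = [0.52, 0.33, 0.62, 0.74, 0.70, 0.74, 0.73, 0.64, 0.73, 0.68,
            0.73, 0.67, 0.73, 0.71, 0.74, 0.72, 0.72, 0.71, 0.69, 0.69]


def best_iteration(scores):
    """Hypothetical helper: index and value of the highest score;
    max() keeps the first maximum, so ties go to the earliest iteration."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


print(best_iteration(macro_f1))  # (3, 0.74)
print(macro_f1[-1])              # 0.69, the iteration that is saved to disk
```

Keep in mind that choosing the iteration on the test set makes that test score optimistic, and with only 96 examples a difference of a few points may be noise; holding out a separate validation split from the training data avoids the first problem.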




# ASSIGNMENTS

3. TODO Try different spaCy language models (e.g. en_core_web_sm, en_core_web_md, en_core_web_lg) and compare their performance.
