# Stance Detection in Political Debates Using Deep Learning Techniques II

This is Part II of "Stance Detection in Political Debates Using Deep Learning Techniques". Training deep learning (DL) models usually requires a GPU. If you do not have access to a GPU of your own, you can use Google Colab.

Google Colab: https://colab.research.google.com/?utm_source=scs-index

If you do so, upload this notebook to Colab and upload the saved data files (see Part I) to your Google Drive account. The `data_path` variable defined below must point to the folder that holds them.

In [1]:
# needed in Colab as flair is not pre-installed
%pip install flair



Note: you may need to restart the kernel to use updated packages.


In [2]:
# mount your Google Drive in Colab
from google.colab import drive
drive.mount('/content/drive')

import io
import os
import pandas as pd
import numpy as np

from flair.data import Corpus
from flair.datasets import ClassificationCorpus
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer


ModuleNotFoundError: No module named 'google'
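
The `ModuleNotFoundError` above is what this cell raises when it runs outside Colab, where the `google.colab` package is typically not installed. A minimal sketch of a guard that mounts the drive only when that package can be found; the helper name `in_colab` is hypothetical and not part of Colab or flair:

```python
import importlib.util


def in_colab() -> bool:
    """Return True if the google.colab package can be located (hypothetical helper)."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ImportError:
        # raised when the parent package `google` itself is missing
        return False


if in_colab():
    # only reached inside Colab
    from google.colab import drive
    drive.mount('/content/drive')
```

On a local machine the guard is skipped, and `data_path` and `model_path` can point to ordinary local folders instead.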

In [None]:
# adjust these paths to your platform
data_path = '/content/drive/MyDrive/StanceDetection2021/data'
model_path = '/content/drive/MyDrive/StanceDetection2021/models'
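
Before loading the corpus, it can help to confirm that the three data files from Part I are actually present in `data_path`. The following is a small sketch using only the standard library; the helper name `missing_files` is hypothetical, and the file names are the ones passed to `ClassificationCorpus` in the next section:

```python
from pathlib import Path


def missing_files(data_dir, names=("train.csv", "dev.csv", "test.csv")):
    """Return the expected file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in names if not (folder / name).is_file()]


# example call with the Colab path used in this notebook
missing = missing_files('/content/drive/MyDrive/StanceDetection2021/data')
if missing:
    print("Missing data files:", missing)
```

An empty list means all three files were found.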

## 3. Loading the Corpus

Loading the corpus is simple. We instantiate a `ClassificationCorpus` object with the data folder and the names of the train, dev, and test files.

In [None]:
corpus: Corpus = ClassificationCorpus(data_path, dev_file='dev.csv', train_file='train.csv', test_file='test.csv')

2021-07-01 09:23:24,733 Reading data from /content/drive/MyDrive/StanceDetection2021/data
2021-07-01 09:23:24,737 Train: /content/drive/MyDrive/StanceDetection2021/data/train.csv
2021-07-01 09:23:24,740 Dev: /content/drive/MyDrive/StanceDetection2021/data/dev.csv
2021-07-01 09:23:24,742 Test: /content/drive/MyDrive/StanceDetection2021/data/test.csv
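
`ClassificationCorpus` reads files in FastText format, where each line starts with a `__label__<name>` prefix followed by the text. That the files saved in Part I follow this layout despite their `.csv` extension is an assumption here, since Part I is not shown. A minimal sketch of producing one such line; the helper name and the sample sentence are hypothetical:

```python
def to_fasttext_line(label, text):
    """Format one example as a FastText-style line: '__label__<label> <text>'."""
    # collapse internal whitespace so each example stays on a single line
    clean_text = " ".join(str(text).split())
    return f"__label__{label} {clean_text}"


# hypothetical example, not taken from the course data
print(to_fasttext_line("1", "We should invest\nmore in public transport."))
```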


In [None]:
label_dict = corpus.make_label_dictionary()

2021-07-01 09:23:42,621 Computing label dictionary. Progress:


100%|██████████| 973/973 [00:03<00:00, 284.95it/s]

2021-07-01 09:23:46,412 [b'2', b'1']
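
The log shows that the corpus contains two labels, `2` and `1`. It is often worth checking how balanced they are before training. The sketch below counts labels in FastText-style lines using only the standard library. It assumes the `__label__` prefix format, and the helper name `count_labels` and the sample lines are hypothetical:

```python
from collections import Counter

PREFIX = "__label__"


def count_labels(lines):
    """Count the leading '__label__' tags across FastText-style lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            if not token.startswith(PREFIX):
                break  # labels only appear at the start of a line
            counts[token[len(PREFIX):]] += 1
    return counts


# hypothetical sample lines, not taken from the course data
sample = ["__label__1 first text", "__label__2 second text", "__label__1 third text"]
print(count_labels(sample))
```

To apply it to the real data, pass an open file handle for `train.csv` instead of the sample list.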





## 4. Creating the Embeddings and Classifier

Next, we define the embedding object using `TransformerDocumentEmbeddings` and a classifier using `TextClassifier`.

In [None]:
# German BERT checkpoint, matching the model name reported in the output below
embeddings = TransformerDocumentEmbeddings('bert-base-german-cased', fine_tune=True)

Some weights of the model checkpoint at bert-base-german-cased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
classifier = TextClassifier(embeddings, label_dictionary=label_dict)
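
Conceptually, the classifier maps the document embedding to one score per label and picks the most probable label. The following pure-Python sketch illustrates that idea with a single linear layer followed by a softmax. It is an illustration under simplifying assumptions, not flair's implementation, and the tiny weights and vector are made up:

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(doc_vector, weights, biases, labels):
    """Apply a linear layer and a softmax to one document vector."""
    logits = [
        sum(w * x for w, x in zip(row, doc_vector)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


# made-up 2-dimensional "embedding" and weights, for illustration only
label, prob = classify([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0], ["1", "2"])
print(label, round(prob, 3))
```

In the real model the document vector has 768 dimensions, as the BERT layer sizes in the training log show, and the weights are learned during training.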

## 5. Model Training

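Training uses a `ModelTrainer` object, which takes the classifier and the corpus. Its `.train()` method runs the training loop and writes the model files and logs to `model_path`.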

In [None]:
trainer = ModelTrainer(classifier, corpus)

In [None]:
trainer.train(model_path,
              learning_rate=0.01,
              mini_batch_size=32,
              anneal_factor=0.5,
              patience=5,
              max_epochs=5)

2021-07-01 08:10:57,822 ----------------------------------------------------------------------------------------------------
2021-07-01 08:10:57,828 Model: "TextClassifier(
  (document_embeddings): TransformerDocumentEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(30000, 768, padding_idx=0)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
               

RuntimeError: ignored
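
The `anneal_factor` and `patience` arguments passed to `.train()` above control how the learning rate is reduced when the dev score stops improving. The sketch below simulates a plateau-style schedule in plain Python. It assumes the rate is multiplied by `anneal_factor` once the number of epochs without improvement exceeds `patience`. flair's scheduler may differ in detail, and the dev scores are made up:

```python
def simulate_annealing(dev_scores, learning_rate=0.01, anneal_factor=0.5, patience=5):
    """Return the learning rate in effect after each epoch (illustrative only)."""
    best = float("-inf")
    bad_epochs = 0
    history = []
    for score in dev_scores:
        if score > best:
            best = score
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # plateau detected: shrink the learning rate and start counting again
            learning_rate *= anneal_factor
            bad_epochs = 0
        history.append(learning_rate)
    return history


# made-up dev scores for five epochs, using the settings from this notebook
print(simulate_annealing([0.60, 0.50, 0.50, 0.50, 0.50]))
# same scores with a shorter patience, to show a reduction taking place
print(simulate_annealing([0.60, 0.50, 0.50, 0.50], patience=1))
```

Under these assumptions, `patience=5` combined with `max_epochs=5` never reduces the learning rate, because a plateau cannot last longer than the run itself.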