#Multi-Class Classification Using Simple Transformers

---

In the past few years we have seen tremendous improvements in the ability of machines to deal with natural language. Algorithm after algorithm has broken the state of the art on a variety of language-specific tasks, largely thanks to transformers. In this article we will discuss transformers and put one to work in the simplest way possible using a library called Simple Transformers.

##The Seq2Seq Model

Before stepping into transformer territory, let's take a brief look at Sequence-to-Sequence models.

The Sequence-to-Sequence (seq2seq) model converts an input sequence of text into an output sequence whose length may differ from the input's, which maps naturally onto machine translation. But seq2seq is not limited to translation; it is well suited to tasks that require text generation in general. The model uses an encoder-decoder architecture and has been very successful in machine translation and question answering tasks. The encoder and decoder are each built from a stack of Long Short-Term Memory (LSTM) networks or Gated Recurrent Units (GRUs).

Here is a simple illustration of the Seq2Seq model:

![alt text](https://analyticsindiamag.com/wp-content/uploads/2019/12/seq-seq_aim.png)

Image Source : [A ten-minute introduction to sequence-to-sequence learning in Keras](https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html)

One major drawback of the Seq2Seq model comes from the limitations of its underlying RNNs. Though LSTMs are meant to handle long-term dependencies between word vectors, performance drops as the distance between related words increases. The recurrent, step-by-step computation also restricts parallelization.

##Transformer Architecture 

The transformer model introduces an architecture that is based solely on attention mechanisms and uses no recurrent networks, yet produces results superior in quality to Seq2Seq models. It addresses the long-term dependency problem of the Seq2Seq model. The transformer architecture is also parallelizable, so the training process is considerably faster.

![alt text](https://analyticsindiamag.com/wp-content/uploads/2019/12/transformer_architecture_aim.png)

Image Source : [Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf)


Let's take a look at some of the important features:

* Encoder: The encoder has 6 identical layers, each consisting of a multi-head self-attention mechanism and a fully connected feed-forward network. Both the multi-head attention sublayer and the feed-forward network have a residual connection around them, followed by layer normalization.

* Decoder: The decoder also consists of 6 identical layers with an additional sublayer in each of the 6 layers. The additional sublayer performs multi-head attention over the output of the encoder stack.

* Attention Mechanism: 

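Attention is the mapping of a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is a weighted sum of the values, with each weight computed from how well the query matches the corresponding key. This mechanism lets the model take the context of each word in a text into account.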

* Scaled Dot-Product Attention:

![alt text](https://analyticsindiamag.com/wp-content/uploads/2019/12/scaled_dot_attention_aim.png)

* Multi-Head Attention:

![alt text](https://analyticsindiamag.com/wp-content/uploads/2019/12/multi-head-attention_aim.png)


![alt text](https://analyticsindiamag.com/wp-content/uploads/2019/12/self-attention_aim.png)
Image Source : [Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf)
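
To make the scaled dot-product attention pictured above concrete, here is a minimal NumPy sketch of the formula from the paper, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The function name, the toy shapes and the random inputs are assumptions chosen for illustration; masking and the learned projections of multi-head attention are left out.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).

    Returns the attended output (n_queries, d_v) and the
    attention weights (n_queries, n_keys).
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted sum of the value vectors.
    return weights @ V, weights

# Toy inputs (hypothetical sizes): 3 queries, 5 key-value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 16))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)
```

Multi-head attention runs several such attention functions in parallel on different learned projections of Q, K and V, then concatenates the results.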

The transformer architecture is a breakthrough in NLP, giving rise to many state-of-the-art models such as Google’s BERT, RoBERTa, OpenAI’s GPT and many others.


---


In this hands-on session, you will be introduced to the Simple Transformers library. The library is built on top of the popular Hugging Face Transformers library, which contains implementations of various transformer-based models and algorithms.

The library makes it effortless to implement various NLP tasks. Simple Transformers currently supports tasks such as Sequence Classification, Token Classification (NER), and Question Answering.

So without further ado, let's get our hands dirty!

##About The Dataset - [Predict The News Category Hackathon](https://www.machinehack.com/course/predict-the-news-category-hackathon/)

Since the first printed newspaper, every news item that makes it onto a page has had a specific section allotted to it. Although pretty much everything about newspapers has changed, from the ink to the type of paper used, this categorization of news has been carried over through generations and even into the digital versions of newspapers. Newspaper articles are not limited to a few topics or subjects; they cover a wide range of interests, from politics to sports to movies and so on. For a long time, this sectioning was done manually by people, but now technology can do it without much effort. In this hackathon, Data Science and Machine Learning enthusiasts like you will use Natural Language Processing to predict, from the story, which genre or category a piece of news falls into.

* Size of training set: 7,628 records
* Size of test set: 2,748 records

FEATURES:

* STORY:  A part of the main content of the article to be published as a piece of news.
* SECTION: The genre/category the STORY falls in.

Each story falls into one of four distinct sections. The sections are labelled as follows:

* Politics: 0
* Technology: 1
* Entertainment: 2
* Business: 3
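
Since the model will output these integer ids, it helps to keep the mapping in code. The sketch below simply encodes the list above; the names `SECTION_NAMES`, `SECTION_IDS` and `to_names` are hypothetical helpers, not part of the dataset or of Simple Transformers.

```python
# Label ids as listed above.
SECTION_NAMES = {0: "Politics", 1: "Technology", 2: "Entertainment", 3: "Business"}

# Reverse lookup, in case section names need to be turned back into ids.
SECTION_IDS = {name: idx for idx, name in SECTION_NAMES.items()}

def to_names(label_ids):
    """Translate an iterable of integer label ids into section names."""
    return [SECTION_NAMES[int(i)] for i in label_ids]

print(to_names([1, 0, 3]))
```

The `int(i)` cast lets the helper accept the NumPy integers that a prediction array contains.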


##Mounting Google Drive

In [1]:
from google.colab import drive
drive.mount("/GD")

Mounted at /GD


##Importing Modules

In [2]:
try:
  %tensorflow_version 2.x  #gpu
except Exception:
  pass
import tensorflow as tf

`%tensorflow_version` only switches the major version: `1.x` or `2.x`.
You set: `2.x  #gpu`. This will be interpreted as: `2.x`.


TensorFlow 2.x selected.


In [0]:
import os
import re
import pandas as pd

##Loading & Splitting The Data

In [0]:
train = pd.read_excel("/GD/My Drive/Colab Notebooks/News_category/Datasets/Data_Train.xlsx")

#Reducing the training sample for fast execution
train = train.sample(frac = 0.2)

#splitting the training set in to training and validation sets
from sklearn.model_selection import train_test_split
train, val =  train_test_split(train, test_size = 0.2, random_state = 120)
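
The cell above samples without a seed and splits without stratification, so the subset and the class balance can change between runs. The sketch below shows one possible variant on a toy DataFrame (hypothetical data standing in for `Data_Train.xlsx`). It also renames the columns to `text` and `labels`, which, as far as the Simple Transformers documentation describes, are the names the library looks for; the notebook itself relies on column order instead.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the training file: 40 rows, 10 per section (hypothetical data).
toy = pd.DataFrame({
    "STORY": [f"story {i}" for i in range(40)],
    "SECTION": [i % 4 for i in range(40)],
})

# Assumed column names for Simple Transformers: 'text' and 'labels'.
toy = toy.rename(columns={"STORY": "text", "SECTION": "labels"})

# stratify keeps the section proportions the same in both splits;
# random_state makes the split repeatable.
train_df, val_df = train_test_split(
    toy, test_size=0.2, random_state=120, stratify=toy["labels"]
)

print(train_df.shape, val_df.shape)
print(val_df["labels"].value_counts().sort_index())
```

Passing `random_state` to `DataFrame.sample` in the same way would make the 20% subsample repeatable as well.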

In [48]:
train.head()

| | STORY | SECTION |
|---|---|---|
| 7306 | Pichai said that by offering a search engine i... | 1 |
| 6675 | Meanwhile, an AAP release said on Tuesday that... | 0 |
| 596 | This coincided with Congress-NCP leaders doubl... | 0 |
| 5534 | McIntosh said Perry’s character is an homage t... | 2 |
| 5691 | Most phones have screen sizes in between 6.2 a... | 1 |


In [6]:
train.shape

(1220, 2)

In [7]:
val.shape

(306, 2)

##Installing & Importing Simple Transformers

In [8]:
!pip install simpletransformers

Collecting simpletransformers
  Downloading https://files.pythonhosted.org/packages/34/58/eb37623d9671c123d21f1ed0b1f96fe0501586ae62f9d261dedde202a817/simpletransformers-0.10.2-py3-none-any.whl (93kB)
Collecting seqeval
  Downloading https://files.pythonhosted.org/packages/34/91/068aca8d60ce56dd9ba4506850e876aba5e66a6f2f

## Creating A Classification Model

In [9]:
from simpletransformers.classification import ClassificationModel

#Create a ClassificationModel
model = ClassificationModel('roberta', 'roberta-base', num_labels=4, use_cuda = False)


100%|██████████| 473/473 [00:00<00:00, 133318.04B/s]
100%|██████████| 898823/898823 [00:00<00:00, 1833660.79B/s]
100%|██████████| 456318/456318 [00:00<00:00, 1114849.26B/s]
100%|██████████| 501200538/501200538 [00:18<00:00, 26839413.43B/s]
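
The model above is created with the library's default settings. `ClassificationModel` also accepts an `args` dictionary to override them. The sketch below is an assumption-laden example: the key names follow the library's documented options as best understood here and should be checked against the installed version, and the values are illustrative rather than the ones used in this notebook. The library import sits inside the function so that the settings can be inspected without downloading the model.

```python
# Illustrative overrides (assumed values, not taken from this notebook's run).
train_args = {
    "num_train_epochs": 1,         # passes over the training data
    "train_batch_size": 8,         # examples per training step
    "eval_batch_size": 8,          # examples per evaluation step
    "max_seq_length": 128,         # longer stories are truncated to this many tokens
    "overwrite_output_dir": True,  # allow re-running training into the same output folder
}

def build_model(use_cuda=False):
    """Create a 4-class RoBERTa classifier with the overrides above."""
    # Imported lazily: constructing the model downloads the pretrained weights.
    from simpletransformers.classification import ClassificationModel
    return ClassificationModel(
        "roberta", "roberta-base", num_labels=4, use_cuda=use_cuda, args=train_args
    )
```

Calling `build_model(use_cuda=True)` on a GPU runtime should speed up training considerably; the notebook keeps `use_cuda = False` so that it also runs on CPU.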


##Training the Classifier

In [10]:
model.train_model(train)

Converting to features started.


Running loss: 0.003528
Training of roberta model complete. Saved to outputs/.


##Evaluating The Classifier

In [19]:
scores1, model_outputs, wrong_predictions = model.eval_model(val)

Features loaded from cache at cache_dir/cached_dev_roberta_128_4_306



In [20]:
scores1

{'eval_loss': 0.20702565842881226, 'mcc': 0.9280285195386848}

In [0]:
#Evaluating With F1 Score & Accuracy

from sklearn.metrics import f1_score, accuracy_score
def f1_multiclass(labels, preds):
    return f1_score(labels, preds, average='micro')

In [22]:
scores2, model_outputs, wrong_predictions = model.eval_model(val, f1=f1_multiclass, acc=accuracy_score)

Features loaded from cache at cache_dir/cached_dev_roberta_128_4_306



In [23]:
scores2

{'acc': 0.9477124183006536,
 'eval_loss': 0.20702565842881226,
 'f1': 0.9477124183006536,
 'mcc': 0.9280285195386848}
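
Notice that `acc` and `f1` are identical above. That is expected with `average='micro'` when every example has exactly one label: each wrong prediction counts as one false positive (for the predicted class) and one false negative (for the true class), so micro precision, micro recall and accuracy all coincide. The pure-Python sketch below illustrates this on made-up labels; `micro_f1` and `accuracy` are hypothetical helpers written for the illustration, not the scikit-learn implementations.

```python
def micro_f1(labels, preds):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    classes = set(labels) | set(preds)
    tp = fp = fn = 0
    for c in classes:
        tp += sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp += sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn += sum(1 for y, p in zip(labels, preds) if y == c and p != c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(labels, preds):
    """Fraction of predictions that match the true label."""
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)

# Made-up example: 8 stories, 2 of them misclassified.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
preds = [0, 1, 2, 3, 1, 1, 2, 0]

print(micro_f1(labels, preds), accuracy(labels, preds))
```

If per-class performance matters, `average='macro'` in `f1_score` weights every section equally and will generally differ from accuracy.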

##Predicting
---

Classes & Labels

* Politics: 0
* Technology: 1
* Entertainment: 2
* Business: 3


In [46]:
predictions, raw_output  = model.predict(['Indian is lead by prime minister Modi '])

Converting to features started.



In [47]:
predictions

array([0])

In [38]:
raw_output

array([[-1.2096044 ,  0.51245356,  4.533729  , -2.7435937 ]],
      dtype=float32)
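
`raw_output` holds one row of unnormalised scores (logits) per input text, one column per class. A minimal sketch of how to read it, assuming the predicted id is simply the index of the largest logit and that a softmax turns the row into probabilities:

```python
import numpy as np

# Logits copied from the raw_output cell above (one row per input text).
raw_output = np.array(
    [[-1.2096044, 0.51245356, 4.533729, -2.7435937]], dtype=np.float32
)

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp()."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

probs = softmax(raw_output)        # class probabilities, each row sums to 1
pred = raw_output.argmax(axis=1)   # index of the largest logit per row

print(pred)
print(probs.round(3))
```

For the logits printed above this gives class 2 (Entertainment), not the 0 shown for `predictions`. The execution counts (In [38] versus In [46]) suggest the `raw_output` cell was last run on an earlier input, so re-run it after `model.predict` to see matching values.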

In [34]:
predictions2, _ = model.predict(['my phone is soo dumb and slow'])

Converting to features started.



In [35]:
predictions2

array([1])

##Predicting For A Test Set

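In [0]:
#test_data was never loaded above; the file name below is an assumption mirroring Data_Train.xlsx
test_data = pd.read_excel("/GD/My Drive/Colab Notebooks/News_category/Datasets/Data_Test.xlsx")
predictions3, _ = model.predict(test_data.STORY.tolist())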

In [0]:
df = pd.DataFrame(predictions3, columns = ['SECTION'])

In [0]:
#Saving the predictions in an excel file
df.to_excel("/GD/My Drive/Colab Notebooks/Transformers/simple_transformers.xlsx", index = False)

Upload the file above to MachineHack to check your score!