# Fine Tuning Transformer for MultiLabel Text Classification

### Introduction

In this tutorial we will fine-tune a transformer model for the **multilabel text classification** problem. 
This is one of the most common business problems: a given piece of text (a sentence or a document) needs to be classified into one or more categories from a given list. For example, a movie can be categorized into one or more genres.

#### Flow of the notebook

The notebook is divided into separate sections to provide an organized walkthrough of the process used. This process can be modified for individual use cases. The sections are:

1. [Importing Python Libraries and preparing the environment](#section01)
2. [Importing and Pre-Processing the domain data](#section02)
3. [Preparing the Dataset and Dataloader](#section03)
4. [Creating the Neural Network for Fine Tuning](#section04)
5. [Fine Tuning the Model](#section05)
6. [Validating the Model Performance](#section06)
7. [Saving the model and artifacts for Inference in Future](#section07)

#### Technical Details

This script relies on multiple tools designed by other teams. Details of the tools used are given below. Please ensure that these elements are present in your setup to successfully run this script.

 - Data: 
	 - We are using the Jigsaw toxic comment data from [Kaggle](https://www.kaggle.com/)
	 - The source dataset is provided by this competition: [Toxic Comment Competition](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge)
	 - We are referring only to the first csv file from the data dump: `train.csv`
	 - Each row of data has the following data points: 
		 - Comment Text
		 - `toxic`
		 - `severe_toxic`
		 - `obscene`
		 - `threat`
		 - `insult`
		 - `identity_hate`

Each comment can be marked for multiple categories. If the comment is `toxic` and `obscene`, then for both those headers the value will be `1` and for the others it will be `0`.


 - Language Model Used:
	 - BERT is used for this project. It was the transformer model created by the Google AI Team.  
	 - [Blog-Post](https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html)
	 - [Research Paper](https://arxiv.org/abs/1810.04805)
     - [Documentation for python](https://huggingface.co/transformers/model_doc/bert.html)

---
***NOTE***
- *Note that the outputs of the BERT model differ from those of the DistilBERT model implemented by the Hugging Face team. The DistilBERT tokenizer does not generate `token_type_ids`, and the final outputs of the network also differ.*
- *This will be explained further in the notebook*
---

 - Hardware and Software Requirements:
	 - Python 3.6 and above
	 - Pytorch, Transformers and All the stock Python ML Libraries
	 - GPU enabled setup 


 - Script Objective:
	 - The objective of this script is to fine-tune BERT to label a comment with one or more of the following categories:
		 - `toxic`
		 - `severe_toxic`
		 - `obscene`
		 - `threat`
		 - `insult`
		 - `identity_hate`

---
***NOTE***
- *The overall mechanisms for multiclass and multilabel problems are similar, except for a few differences, namely:*
	- *The loss function is designed to evaluate the probability of each category individually rather than relative to the other categories. Hence the use of `BCE` rather than `Cross Entropy` when defining the loss.*
	- *A Sigmoid of the outputs is calculated rather than a Softmax, for the reason given in the previous point.*
	- *The [accuracy metrics](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) and [F1 scores](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score) from the sklearn package are used, rather than a direct comparison of expected vs predicted values.*
---
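
To make the Sigmoid-versus-Softmax distinction concrete, here is a minimal sketch in plain Python. The logit values are invented for illustration and the helper names `softmax` and `sigmoid` are not part of the notebook; the point is only that Softmax forces the categories to compete, while Sigmoid scores each category on its own.

```python
import math

def softmax(logits):
    """Multiclass view: scores compete, probabilities sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Multilabel view: each score is squashed independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Hypothetical logits for three labels; two of them are clearly positive.
logits = [2.0, 1.5, -3.0]

softmax_probs = softmax(logits)
sigmoid_probs = sigmoid(logits)

# With Softmax the two likely labels have to share the probability mass.
print([round(p, 3) for p in softmax_probs])
# With Sigmoid both likely labels can exceed 0.5 at the same time.
print([round(p, 3) for p in sigmoid_probs])

# Thresholding the Sigmoid outputs gives a multi-hot prediction.
predicted_labels = [int(p >= 0.5) for p in sigmoid_probs]
print(predicted_labels)
```

Because the Sigmoid outputs do not have to sum to 1, a comment can be flagged for several categories at once, which is exactly what the multilabel setting requires.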

<a id='section01'></a>
### Importing Python Libraries and preparing the environment

At this step we will be importing the libraries and modules needed to run our script. Libraries are:
* Pandas
* Pytorch
* Pytorch Utils for Dataset and Dataloader
* Transformers
* BERT Model and Tokenizer

After that, we will prepare the device for GPU execution. This configuration is needed if you want to use the onboard GPU. 

*I have included the code for TPU configuration, but commented it out. If you plan to use the TPU, please comment the GPU execution codes and uncomment the TPU ones to install the packages and define the device.*

In [1]:
# Installing the transformers library and any additional libraries, if needed

# !pip install -q transformers

# Code for TPU packages install
# !curl -q https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
# !python pytorch-xla-env-setup.py --apt-packages libomp5 libopenblas-dev

In [46]:

LABELS = [
    "Evolution des procédés industriels",
    "Secteur Infrastructure & Déchéterie",
    "Innovation produits & services",
    "Gestion des bâtiments",
    "Production & distribution d'énergie",
    "Gestion des déchets",
    "Secteur Ville durable",
    "Secteur Agriculture & Zones rurales",
    "Mobilité des employés",
    "Secteur Eau & écosystèmes",
    "Ressources humaines",
    "Secteur Bois"
    ]
df = pd.read_csv("../src/final.csv", sep="\t", encoding="utf-16")
df["Thématique"] = df["best_cat"].map(lambda x: LABELS[int(x)])

In [47]:
df[df["Thématique"]=="Secteur Eau & écosystèmes"].name.values[0]

'Investir pour la qualité des eaux et Re-UTE'

"  La crise sanitaire a profondément affecté notre tissu industriel en portant un coup d'arrêt brutal à l'investissement de nombreuses entreprises, dans des territoires déjà souvent exposés à de profondes mutations.    Face à l'urgence, la réponse réside dans une accélération des investissements et une action rapide au plus près des territoires.    Cela passe par une démarche ambitieuse et coconstruite entre l'Etat et les Régions, animée à travers le programme Territoires d'Industrie.\r  Dans le cadre du Plan de relance, l'Etat met en place un fonds de 150 M€ de subventions en faveur des projets industriels les plus structurants pour les territoires.      La sélection se fera selon un processus simplifié, dans une logique de proximité, lors de Revues régionales d'accélération Etat - Région    . Elle s'appuiera sur une instruction financière et des diligences liées à la connaissance client, réalisée par Bpifrance.    Le présent dispositif vise à soutenir des investissements à dimension 

In [58]:
idx = np.random.choice(df.index)
print(idx)
print("####")
print(df.aidDetails.iloc[idx])
print("######")
df.summary.iloc[idx]

42
####
  La BPI participe au financement de votre projet d'innovation de rupture avant son lancement industriel et commercial.    Projets éligibles    Tous projets de R&D visant le développement d'une innovation de rupture à fort contenu technologique, qualifiée deeptech.    Le terme deeptech qualifie des projets reposant sur des technologies ou des combinaisons de technologies :      Issues d'un laboratoire de recherche (public/privé) et/ou s'appuyant sur une équipe/gouvernance en lien fort avec le monde scientifique (profil scientifique/technologie clé)      Qui présentent de fortes barrières à l'entrée, matérialisées par des verrous technologiques difficiles à lever,      Qui constituent un avantage fortement différenciateur par rapport à la concurrence,      Caractérisées par un go-to-market (développement, industrialisation, commercialisation) long/complexe donc probablement capitalistique.      Dépenses éligibles    Dépenses internes et externes directement liées aux phases de r

"<pad> BPI participe au financement de votre projet d'innovation de rupture avant son lancement industriel et commercial. Le terme deeptech qualifie les projets reposant sur des technologies ou des combinaisons de technologies. Les entreprises doivent avoir déposé leur dossier de demande d'aide au projet d'innovation.</s>"

In [1]:
# Importing stock ml libraries
import numpy as np
import pandas as pd
from sklearn import metrics
import transformers
import torch
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler
from transformers import CamembertTokenizerFast, CamembertModel, CamembertConfig

# Preparing for TPU usage
# import torch_xla
# import torch_xla.core.xla_model as xm
# device = xm.xla_device()

In [2]:
# Setting up the device for GPU usage

from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'

<a id='section02'></a>
### Importing and Pre-Processing the domain data

We will be working with the data and preparing it for fine-tuning. 
*Assuming that the `train.csv` is already downloaded, unzipped and saved in your `data` folder*

* Import the file into a dataframe and give it the headers as per the documentation.
* Take the values of all the categories and convert them into a list.
* The list is appended as a new column and the other columns are removed.
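
The steps above can be sketched on a toy dataframe. This is only an illustration under stated assumptions: the rows are invented, and only three of the category columns are included to keep it short.

```python
import pandas as pd

# Toy stand-in for the real file: one text column followed by one 0/1 column per category.
# The rows are invented for illustration.
raw = pd.DataFrame({
    "text": ["first comment", "second comment", "third comment"],
    "toxic": [1, 0, 0],
    "obscene": [1, 0, 1],
    "threat": [0, 0, 0],
})

label_columns = ["toxic", "obscene", "threat"]

# Collapse the label columns into a single list-valued column.
raw["list"] = raw[label_columns].values.tolist()

# Keep only the text and the label list; the individual label columns are dropped.
prepared = raw[["text", "list"]].copy()
print(prepared)
```

Each row now carries its labels as one multi-hot list, which is the shape the dataset class expects for its targets.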

In [14]:
0.05*349

17.45

In [32]:
import ast

df = pd.read_csv("../src/processings_train.csv", sep="\t", encoding="utf-16")
# The `list` column is stored as a string; parse it back into a list of ints
df["list"] = df["list"].apply(lambda e : [int(x) for x in ast.literal_eval(e)])
# df['list'] = df[df.columns[2:]].values.tolist()
# Keep only the text and its label list
new_df = df[['text', 'list']].copy()
new_df.head()
df = new_df.copy()
# df["list"] = df.list.apply(lambda x: ' '.join([str(e) for e in x]))


In [34]:
df["nb_themes"] = df.list.apply(lambda X: np.sum(X))

In [45]:
df[df.nb_themes==2].iloc[10].text

"Réduire l'impact des activités humaines sur l'environnement.   La Fondation UEM soutient des projets locaux et régionaux mais peut aussi accompagner des projets portant sur des problématiques plus vastes, dans un périmètre national.    Elle apporte son soutien à des actions concrètes dont la finalité vise à réduire l'impact des activités humaines sur l'environnement. Cet engagement se traduit par la lutte contre les pollutions et les nuisances, la prévention des risques naturels et technologiques et la préservation des équilibres naturels (faune, flore, sites et milieux naturels).    La Fondation se donne pour priorité de contribuer à réduire le changement climatique et de préserver la biodiversité par tous les moyens possibles. Ce soutien peut être financier mais peut aussi consister à assurer une meilleure visibilité à votre projet grâce à une meilleure communication autour de celui-ci, en informant et sensibilisant le grand public sur les questions environnementales soulevées par v

In [40]:
349-279

70

<a id='section03'></a>
### Preparing the Dataset and Dataloader

We will start by defining a few key variables that will be used later during the training/fine-tuning stage.
This is followed by the creation of the CustomDataset class, which defines how the text is pre-processed before it is sent to the neural network. We will also define the Dataloader that feeds the data in batches to the neural network for training and processing. 
Dataset and Dataloader are constructs of the PyTorch library for defining and controlling the data pre-processing and its passage to the neural network. For further reading on Dataset and Dataloader, see the [docs at PyTorch](https://pytorch.org/docs/stable/data.html).

#### *CustomDataset* Dataset Class
- This class is defined to accept the `tokenizer`, `dataframe` and `max_len` as input and to generate the tokenized output and tags that are used by the BERT model for training. 
- We are using the BERT tokenizer to tokenize the data in the `comment_text` column of the dataframe.
- The tokenizer uses the `encode_plus` method to perform tokenization and generate the necessary outputs, namely: `ids`, `attention_mask`, `token_type_ids`
---
- *This is the first difference between DistilBERT and BERT: the tokenizer generates the `token_type_ids` in the case of BERT.*
---
- To read further into the tokenizer, [refer to this document](https://huggingface.co/transformers/model_doc/bert.html#berttokenizer)
- `targets` is the list of categories labeled as `0` or `1` in the dataframe. 
- The *CustomDataset* class is used to create 2 datasets, for training and for validation.
- *Training Dataset* is used to fine tune the model: **80% of the original data**
- *Validation Dataset* is used to evaluate the performance of the model. The model has not seen this data during training. 
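
To give an intuition for what fixed-length tokenizer output looks like, here is a minimal sketch in plain Python of padding or truncating a sequence to `max_len` and building the matching attention mask. The function name `pad_or_truncate`, the token ids and the pad id of `1` are assumptions made for illustration. A real tokenizer also handles special tokens, which this sketch ignores.

```python
def pad_or_truncate(token_ids, max_len, pad_id):
    """Return (ids, mask), each with exactly max_len entries.

    ids  : the token ids, cut or right-padded to max_len
    mask : 1 for real tokens, 0 for padding
    """
    ids = list(token_ids[:max_len])   # truncation
    mask = [1] * len(ids)
    n_pad = max_len - len(ids)
    ids += [pad_id] * n_pad           # padding
    mask += [0] * n_pad
    return ids, mask

# A sequence shorter than max_len is padded on the right.
short_ids, short_mask = pad_or_truncate([5, 42, 17, 6], max_len=6, pad_id=1)
# A sequence longer than max_len is cut.
long_ids, long_mask = pad_or_truncate([5, 42, 17, 99, 23, 8, 77, 6], max_len=6, pad_id=1)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Every sample ends up with the same length, which is what allows the Dataloader to stack samples into a batch tensor.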

#### Dataloader
- Dataloaders are created for training and validation; they load data into the neural network in a defined manner. This is needed because all the data from the dataset cannot be loaded into memory at once, so the amount of data loaded into memory and then passed to the neural network needs to be controlled.
- This control is achieved using parameters such as `batch_size` and `max_len`.
- Training and Validation dataloaders are used in the training and validation part of the flow respectively
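
As a rough illustration of how `batch_size` controls what reaches the network, the sketch below wraps a toy dataset in a `DataLoader`. The `ToyDataset` class and its contents are hypothetical stand-ins, not the notebook's data.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical dataset: 10 samples of 4 token ids and 3 binary labels each."""

    def __init__(self, n_samples=10):
        self.ids = torch.arange(n_samples * 4).reshape(n_samples, 4)
        self.targets = torch.zeros(n_samples, 3)

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        # Same dictionary layout idea as a tokenized sample: inputs plus targets.
        return {"ids": self.ids[index], "targets": self.targets[index]}

loader = DataLoader(ToyDataset(), batch_size=4, shuffle=False)

# Each batch stacks up to batch_size samples; the last batch holds the remainder.
batch_shapes = [tuple(batch["ids"].shape) for batch in loader]
print(batch_shapes)
```

Only one batch is materialized at a time, so memory use is governed by `batch_size` and the per-sample length rather than by the size of the whole dataset.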

In [4]:
# Sections of config

# Defining some key variables that will be used later on in the training
MAX_LEN = 256
TRAIN_BATCH_SIZE = 1
VALID_BATCH_SIZE = 2
EPOCHS = 30
LEARNING_RATE = 1e-6
tokenizer = CamembertTokenizerFast.from_pretrained('camembert-base')

In [5]:
class CustomDataset(Dataset):

    def __init__(self, dataframe, tokenizer, max_len):
        self.tokenizer = tokenizer
        self.data = dataframe
        self.comment_text = dataframe.text
        self.targets = self.data.list
        self.max_len = max_len

    def __len__(self):
        return len(self.comment_text)

    def __getitem__(self, index):
        comment_text = str(self.comment_text[index])
        comment_text = " ".join(comment_text.split())

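        inputs = self.tokenizer.encode_plus(
            comment_text,
            None,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length',  # replaces the deprecated pad_to_max_length=True
            truncation=True,       # cut texts that are longer than max_len
            return_token_type_ids=True
        )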
        ids = inputs['input_ids']
        mask = inputs['attention_mask']
        token_type_ids = inputs["token_type_ids"]


        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'token_type_ids': torch.tensor(token_type_ids, dtype=torch.long),
            'targets': torch.tensor(self.targets[index], dtype=torch.float)
        }

In [6]:
# Creating the dataset and dataloader for the neural network

train_size = 0.8
train_dataset=new_df.sample(frac=train_size,random_state=123)
test_dataset=new_df.drop(train_dataset.index).reset_index(drop=True)
train_dataset = train_dataset.reset_index(drop=True)


print("FULL Dataset: {}".format(new_df.shape))
print("TRAIN Dataset: {}".format(train_dataset.shape))
print("TEST Dataset: {}".format(test_dataset.shape))

training_set = CustomDataset(train_dataset, tokenizer, MAX_LEN)
testing_set = CustomDataset(test_dataset, tokenizer, MAX_LEN)

FULL Dataset: (390, 2)
TRAIN Dataset: (312, 2)
TEST Dataset: (78, 2)


In [7]:
train_params = {'batch_size': TRAIN_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

test_params = {'batch_size': VALID_BATCH_SIZE,
                'shuffle': False,  # keep validation outputs aligned with the rows of test_dataset
                'num_workers': 0
                }

training_loader = DataLoader(training_set, **train_params)
testing_loader = DataLoader(testing_set, **test_params)

In [9]:
# test_dataset.text.loc[30]

<a id='section04'></a>
### Creating the Neural Network for Fine Tuning

#### Neural Network
 - We will be creating a neural network with the `BERTClass`. 
 - This network will have the `Bert` model, followed by a `Dropout` and a `Linear Layer`. They are added for the purpose of **Regularization** and **Classification** respectively. 
 - In the forward loop, there are 2 outputs from the `BertModel` layer.
 - The second output, `output_1`, also called the `pooled output`, is passed to the `Dropout layer`, and the subsequent output is given to the `Linear layer`. 
 - Note that the output dimension of the `Linear Layer` is **6**, because that is the total number of categories into which we are looking to classify our text.
 - The data will be fed to the `BERTClass` as defined in the dataset. 
 - The final layer's outputs are used to calculate the loss and to determine the accuracy of the model's predictions. 
 - We will initiate an instance of the network called `model`. This instance will be used for training and then to save the final trained model for future inference. 
 
#### Loss Function and Optimizer
 - The Loss is defined in the next cell as `loss_fn`.
 - As described above, the loss function used is a combination of a Sigmoid layer and Binary Cross Entropy, which is implemented as [BCEWithLogitsLoss](https://pytorch.org/docs/stable/nn.html#bcewithlogitsloss) in PyTorch.
 - `Optimizer` is defined in the next cell.
 - `Optimizer` is used to update the weights of the neural network to improve its performance.
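
For intuition about the loss, here is a minimal plain-Python sketch of binary cross entropy computed directly from logits. It is an illustration of the formula, not PyTorch's implementation; the function names and the logit values are assumptions.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross entropy computed directly from raw logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))].
    """
    terms = [
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(logits, targets)
    ]
    return sum(terms) / len(terms)

def bce_naive(logits, targets):
    """Same loss via an explicit sigmoid; fine for moderate logits."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    terms = [-(y * math.log(p) + (1 - y) * math.log(1 - p)) for p, y in zip(probs, targets)]
    return sum(terms) / len(terms)

# Hypothetical logits and multi-hot targets for one sample with four labels.
logits = [2.0, -1.0, 0.5, -3.0]
targets = [1.0, 0.0, 0.0, 0.0]

stable = bce_with_logits(logits, targets)
naive = bce_naive(logits, targets)
print(stable, naive)

# All-zero logits (probability 0.5 for every label) give a loss of log(2).
uninformed = bce_with_logits([0.0] * 4, targets)
print(uninformed)
```

The value log(2), roughly 0.693, is a useful reference point when reading training logs: it is the loss of a model that assigns probability 0.5 to every label.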
 
#### Further Reading
- You can refer to my [Pytorch Tutorials](https://github.com/abhimishra91/pytorch-tutorials) to get an intuition of Loss Function and Optimizer.
- [Pytorch Documentation for Loss Function](https://pytorch.org/docs/stable/nn.html#loss-functions)
- [Pytorch Documentation for Optimizer](https://pytorch.org/docs/stable/optim.html)
- Refer to the links provided on the top of the notebook to read more about `BertModel`. 

In [16]:
# Creating the model. The commented-out class below adds a dropout and a dense layer on top of CamemBERT;
# the active code uses the ready-made CamembertForSequenceClassification head instead.

# class BERTClass(torch.nn.Module):
#     def __init__(self):
#         super(BERTClass, self).__init__()
#         self.l1 = transformers.CamembertModel.from_pretrained('camembert-base')
#         self.l2 = torch.nn.Dropout(0.3)
#         self.l3 = torch.nn.Linear(768, 12)
    
#     def forward(self, ids, mask, token_type_ids):
#         res = self.l1(ids, attention_mask = mask, token_type_ids = token_type_ids)
#         self.res = res
#         output_2 = self.l2(res)
#         output = self.l3(output_2)
#         return output

# model = BERTClass()
# model.to(device)
from transformers import CamembertForSequenceClassification
model = CamembertForSequenceClassification.from_pretrained('camembert-base', num_labels=5)
model.to(device)


Some weights of the model checkpoint at camembert-base were not used when initializing CamembertForSequenceClassification: ['lm_head.decoder.weight', 'roberta.pooler.dense.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.weight', 'lm_head.dense.bias']
- This IS expected if you are initializing CamembertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CamembertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of CamembertForSequenceClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.out_proj.weig

CamembertForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(32005, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0): RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (Laye

In [17]:
def loss_fn(outputs, targets):
    return torch.nn.BCEWithLogitsLoss()(outputs.logits, targets)

In [18]:
optimizer = torch.optim.Adam(params =  model.parameters(), lr=LEARNING_RATE)

<a id='section05'></a>
### Fine Tuning the Model

After all the effort of loading and preparing the data and datasets, creating the model and defining its loss and optimizer, this is probably the easiest step in the process. 

Here we define a training function that trains the model on the training dataset created above for a specified number of epochs (`EPOCHS`). The number of epochs defines how many times the complete data will be passed through the network. 

Following events happen in this function to fine tune the neural network:
- The dataloader passes data to the model based on the batch size. 
- Subsequent output from the model and the actual category are compared to calculate the loss. 
- Loss value is used to optimize the weights of the neurons in the network.
- After every 5000 steps the loss value is printed in the console.
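
The events listed above can be sketched on a toy problem. This is a minimal illustration under stated assumptions: a single `Linear` layer and random tensors stand in for the transformer and the tokenized text, and the learning rate and step count are chosen only so that the loop runs quickly.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: 16 samples with 8 features and 3 independent binary labels.
features = torch.randn(16, 8)
targets = (torch.rand(16, 3) > 0.5).float()

model = torch.nn.Linear(8, 3)
loss_fn = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for step in range(200):
    optimizer.zero_grad()             # clear gradients from the previous step
    logits = model(features)          # forward pass
    loss = loss_fn(logits, targets)   # compare the outputs with the actual labels
    loss.backward()                   # compute gradients of the loss
    optimizer.step()                  # update the weights
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The order of calls (zero the gradients, forward, loss, backward, step) is the same pattern the training function follows for each batch.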

In the run recorded below, the loss printed at the first step of each epoch moves from about 0.71 at epoch 0 to about 0.61 at epoch 15, with fluctuations in between.

In [19]:
def train(epoch):
    model.train()
    # Note: best_loss is re-initialised on every call, so the checkpoint below
    # is overwritten at the first step of each epoch.
    best_loss = float('inf')
    for step, data in enumerate(training_loader, 0):
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        token_type_ids = data['token_type_ids'].to(device, dtype = torch.long)
        targets = data['targets'].to(device, dtype = torch.float)

        outputs = model(ids, mask, token_type_ids)

        loss = loss_fn(outputs, targets)
        if step%5000==0:
            print(f'Epoch: {epoch}, Loss:  {loss.item()}')
            if loss.item() < best_loss:
                model.save_pretrained('./results/camembert_v2_subclasses_C1/')
                best_loss = loss.item()
                print("Best model saved (epoch {})!".format(epoch))
        # Reset gradients, backpropagate and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

In [20]:
for epoch in range(EPOCHS):
    train(epoch)

Epoch: 0, Loss:  0.7074956893920898
Best model saved (epoch 0)!
Epoch: 1, Loss:  0.6815589070320129
Best model saved (epoch 1)!
Epoch: 2, Loss:  0.6760404706001282
Best model saved (epoch 2)!
Epoch: 3, Loss:  0.6832512021064758
Best model saved (epoch 3)!
Epoch: 4, Loss:  0.6698747277259827
Best model saved (epoch 4)!
Epoch: 5, Loss:  0.6879416704177856
Best model saved (epoch 5)!
Epoch: 6, Loss:  0.6851610541343689
Best model saved (epoch 6)!
Epoch: 7, Loss:  0.6528114676475525
Best model saved (epoch 7)!
Epoch: 8, Loss:  0.658505916595459
Best model saved (epoch 8)!
Epoch: 9, Loss:  0.6382265090942383
Best model saved (epoch 9)!
Epoch: 10, Loss:  0.6632227897644043
Best model saved (epoch 10)!
Epoch: 11, Loss:  0.6695339679718018
Best model saved (epoch 11)!
Epoch: 12, Loss:  0.6752375364303589
Best model saved (epoch 12)!
Epoch: 13, Loss:  0.6709195971488953
Best model saved (epoch 13)!
Epoch: 14, Loss:  0.6815897822380066
Best model saved (epoch 14)!
Epoch: 15, Loss:  0.61126488447

<a id='section06'></a>
### Validating the Model

During the validation stage we pass the unseen data (the testing dataset) to the model. This step determines how well the model performs on unseen data. 

This unseen data is the 20% of `train.csv` which was separated during the Dataset creation stage. 
During the validation stage the weights of the model are not updated. Only the final output is compared to the actual value. This comparison is then used to calculate the accuracy of the model. 

As defined above, to get a measure of our model's performance we are using the following metrics. 
- Accuracy Score
- F1 Micro
- F1 Macro

In the run recorded below, with the sigmoid outputs thresholded at 0.2, the accuracy score is 0.0 while the micro and macro F1 scores are about 0.65 and 0.60.
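
It helps to know what these metrics measure on multilabel data. For multilabel input, sklearn's `accuracy_score` computes exact-match (subset) accuracy: a sample counts as correct only if every one of its labels is predicted correctly. The NumPy sketch below spells the metrics out by hand on invented values; the 0.5 threshold and the arrays are assumptions for illustration.

```python
import numpy as np

# Hypothetical sigmoid outputs and multi-hot targets for 3 samples and 3 labels.
probs = np.array([[0.9, 0.3, 0.1],
                  [0.2, 0.8, 0.6],
                  [0.1, 0.1, 0.7]])
targets = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1]])

preds = (probs >= 0.5).astype(int)

# Exact-match (subset) accuracy: a sample counts only if every label is right.
subset_accuracy = np.mean(np.all(preds == targets, axis=1))

# Per-label accuracy: fraction of individual label decisions that are right.
label_accuracy = np.mean(preds == targets)

# Micro F1 pools true/false positives over all labels before computing F1.
tp = np.sum((preds == 1) & (targets == 1))
fp = np.sum((preds == 1) & (targets == 0))
fn = np.sum((preds == 0) & (targets == 1))
f1_micro = 2 * tp / (2 * tp + fp + fn)

# Macro F1 computes F1 per label, then averages over the labels.
tp_l = np.sum((preds == 1) & (targets == 1), axis=0)
fp_l = np.sum((preds == 1) & (targets == 0), axis=0)
fn_l = np.sum((preds == 0) & (targets == 1), axis=0)
f1_macro = np.mean(2 * tp_l / (2 * tp_l + fp_l + fn_l))

print(subset_accuracy, label_accuracy, f1_micro, f1_macro)
```

A single wrong label in a sample is enough to make that sample count as a miss for subset accuracy, so this score can be far lower than the per-label accuracy computed on the same predictions.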

In [21]:
def validation(epoch):
    model.eval()
    fin_targets=[]
    fin_outputs=[]
    with torch.no_grad():
        for _, data in enumerate(testing_loader, 0):
            ids = data['ids'].to(device, dtype = torch.long)
            mask = data['mask'].to(device, dtype = torch.long)
            token_type_ids = data['token_type_ids'].to(device, dtype = torch.long)
            targets = data['targets'].to(device, dtype = torch.float)
            outputs = model(ids, mask, token_type_ids)
            fin_targets.extend(targets.cpu().detach().numpy().tolist())
            fin_outputs.extend(torch.sigmoid(outputs.logits).cpu().detach().numpy().tolist())
    return fin_outputs, fin_targets

In [23]:
# for epoch in range(EPOCHS):
outputs, targets = validation(EPOCHS-1)
outputs = np.array(outputs) >= 0.2
accuracy = metrics.accuracy_score(targets, outputs)
f1_score_micro = metrics.f1_score(targets, outputs, average='micro')
f1_score_macro = metrics.f1_score(targets, outputs, average='macro')
cm = metrics.multilabel_confusion_matrix(targets, outputs)
print(f"Accuracy Score = {accuracy}")
print(f"F1 Score (Micro) = {f1_score_micro}")
print(f"F1 Score (Macro) = {f1_score_macro}")



Accuracy Score = 0.0
F1 Score (Micro) = 0.6538461538461539
F1 Score (Macro) = 0.5955244755244754


In [35]:
LABELS = [
    "Evolution des procédés industriels",
    "Secteur Infrastructure & Déchéterie",
    "Innovation produits & services",
    "Gestion des bâtiments",
    "Production & distribution d'énergie",
    "Gestion des déchets",
    "Secteur Ville durable",
    "Secteur Agriculture & Zones rurales",
    "Mobilité des employés",
    "Secteur Eau & écosystèmes",
    "Ressources humaines",
    "Secteur Bois"
    ]


In [38]:
np.where(outputs!=targets)

(array([ 0,  0,  0,  0,  0,  2,  2,  4,  4,  5,  6,  6,  6,  6,  6,  6,  6,
         6,  7,  7,  7,  8,  8,  9,  9,  9, 10, 10, 10, 11, 11, 13, 13, 14,
        14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
        15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 17, 17, 18, 18,
        18, 19, 19, 21, 21, 21, 21, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25,
        26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 28, 28, 28, 28, 28,
        28, 28, 29, 29, 30, 30, 30, 30, 31, 31, 32, 32, 33, 33, 34, 34, 35,
        35, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 39, 39, 39, 39,
        40, 40, 40, 41, 41, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
        43, 43, 43, 43, 44, 44, 46, 46, 47, 47, 47, 47, 48, 48, 48, 48, 49,
        49, 49, 50, 50, 52, 52, 53, 53, 54, 54, 57, 57, 58, 58, 59, 59, 59,
        60, 60, 61, 61, 61, 61, 62, 62, 63, 63, 63, 63, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 65, 65, 65, 66, 66, 67, 67, 68, 69, 69, 69, 69,
        69, 

In [26]:
targets[0]

[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

In [27]:
outputs[0]

array([False, False, False, False, False, False,  True, False, False,
       False, False, False])

In [19]:
targets[0]

[0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

In [23]:
np.mean(outputs==targets)

0.9423076923076923

In [36]:
op = [' '.join(str(int(e)) for e in i) for i in outputs]
tg = [' '.join(str(int(e)) for e in i) for i in targets]


In [42]:
idx = 0
for i in range(len(op)):
    if tg[i] != op[i]:
        tmp_tg = tg[i].split(" ")
        tmp_op = op[i].split(" ")
        tmp_tg = [int(x) for x in tmp_tg]
        tmp_op = [int(x) for x in tmp_op]
        print(tmp_op)
        labels_op = [LABELS[x] for x in range(len(tmp_op)) if tmp_op[x]==1]
        labels_tg = [LABELS[x] for x in range(len(tmp_tg)) if tmp_tg[x]==1]
        print("Targets: ",labels_tg)
        print("Outputs: ", labels_op)
        print(test_dataset.text.loc[i])
        print("###############################################################")

[0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]
Targets:  ['Innovation produits & services']
Outputs:  ['Innovation produits & services', 'Gestion des bâtiments', 'Secteur Ville durable', 'Secteur Agriculture & Zones rurales', 'Mobilité des employés']
Vous aider dans la préparation de votre projet d'innovation - Aide pour la faisabilité de l'innovation.   La BPI vous aide dans la préparation de votre projet d'innovation (Subvention ou avance récupérable pour valider la faisabilité de votre projet).      Projets éligibles    Tout projet de recherche, développement et innovation (RDI) de produits, procédés ou services innovants présentant des perspectives d'industrialisation et/ou de commercialisation.    Bpifrance a noué un partenariat avec le ministère de l'Agriculture et de l'Alimentation pour renforcer son soutien à des projets d'innovation dans l'industrie agroalimentaire en finançant les études amont de faisabilité ou de recherche de partenaires.    Dépenses éligibles      Etudes d'évaluation

In [16]:
test_dataset.text.loc[np.where(outputs!=targets)[0]].values

array(["Vous aider dans la préparation de votre projet d'innovation - Aide pour la faisabilité de l'innovation.   La BPI vous aide dans la préparation de votre projet d'innovation (Subvention ou avance récupérable pour valider la faisabilité de votre projet).      Projets éligibles    Tout projet de recherche, développement et innovation (RDI) de produits, procédés ou services innovants présentant des perspectives d'industrialisation et/ou de commercialisation.    Bpifrance a noué un partenariat avec le ministère de l'Agriculture et de l'Alimentation pour renforcer son soutien à des projets d'innovation dans l'industrie agroalimentaire en finançant les études amont de faisabilité ou de recherche de partenaires.    Dépenses éligibles      Etudes d'évaluation et d'analyse du potentiel d'un projet mettant en exergue les perspectives et les risques du projet et précisant les ressources nécessaires pour le mener à bien.      Conception et définition du projet, planification, validation de l

<a id='section07'></a>
### Saving the Trained Model Artifacts for inference

This is the final step in the process of fine tuning the model. 

The model and its vocabulary are saved locally. These files are then used in the future to make inferences on new inputs.

Please remember that a trained neural network is only useful when used in actual inference after its training. 

In the lifecycle of an ML project this is only half the job done. We will leave the inference of these models for some other day. 