___

**project**: `First Aid Recommendation Bot (FARB)`

**date**: `2022-12-14`

**description**: `This notebook covers everything from data preparation to model training for an AI chatbot that assists people by giving them first aid recommendations.`

**main**: `Natural Language Processing (NLP) pytorch`

**programmer**: `crispengari`

**architecture**: `Convolutional Neural Networks (torchtext)`

**language:** `python` 

____


### Problem Statement
"Human beings tend to forget first aid precautions when someone needs them, or they give the wrong first aid treatment. Thanks to the power of artificial intelligence in Natural Language Processing using deep learning techniques, we can create AI models that help humans as bots."

In this project I introduce natural language processing techniques into the health system that can be used to help people who need first aid treatment.

### Data
The dataset used in this notebook comes from [kaggle](https://www.kaggle.com/code/therealsampat/first-aid-recommendation-deep-learning-chatbot). We are going to generate our dataset from the file called `intents.json`, which contains `44` intents. We are going to save the prepared data into `.csv` files for three sets:

1. train
2. test
3. validation


### Model Architecture
`CNN`s are not the usual choice for processing sequential data, but in this notebook, based on my `torchtext` repository, we are going to use a `CNN` for `multi-class` classification of intents for our `Bot`. We are going to use the following notebook as a reference:

> [05_TORCHTEXT_CNN_1D](https://github.com/CrispenGari/torchtext/blob/main/sentiment-analyisis/05_TORCHTEXT_CNN_1D.ipynb)

### Installing Helper Packages
In the following code cell we are going to install the package called `helperfns`, which provides some useful helper functions for machine learning.

In [1]:
pip install helperfns -q

#### Imports

In the following code cell we are going to import all the packages that we are going to use throughout this `notebook`.

In [2]:
import time
import json
import torch
import os
import random
import torchtext
from matplotlib import pyplot as plt
from torch import nn
from torchtext import data
from collections import Counter
from torchtext import vocab

from helperfns.tables import tabulate_data
from helperfns.visualization import plot_complicated_confusion_matrix, plot_simple_confusion_matrix
from helperfns.torch import models
from helperfns.utils import hms_string

import torch.nn.functional as F
import numpy as np
import pandas as pd

from google.colab import drive, files

torchtext.__version__, torch.__version__

('0.14.0', '1.13.0+cu116')

### Seed
In the following code cell we are going to set the seed for all random operations for reproducibility.

In [3]:
SEED = 42

random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
np.random.seed(SEED)
torch.backends.cudnn.deterministic = True
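
As a small illustration of what seeding buys us, the sketch below uses only Python's `random` module (the helper name `shuffled_after_seed` is hypothetical and not part of this notebook). Re-seeding with the same value reproduces the same shuffle.

```python
import random

def shuffled_after_seed(seed, items):
    """Return a shuffled copy of `items` after seeding Python's RNG."""
    random.seed(seed)
    copy = list(items)
    random.shuffle(copy)
    return copy

first = shuffled_after_seed(42, range(10))
second = shuffled_after_seed(42, range(10))
print(first == second)  # same seed, same order
```

The same idea applies to `torch` and `numpy`, each of which keeps its own generator and therefore needs its own seed call.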

### Device
In the following code cell we are going to get the `gpu` device if one is available.

In [4]:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

### Data

The dataset that we are going to use comes from [`kaggle`](https://www.kaggle.com/code/therealsampat/first-aid-recommendation-deep-learning-chatbot/data) and will be loaded from Google Drive, where I uploaded it to a folder called `FARB`. In the following code cell we are going to mount our Google Drive to this Colab instance.

In [5]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Path to the dataset.
In the following code cell we define a variable that holds the path to the location of our `intents.json` file.

In [6]:
base_dir = "/content/drive/My Drive/NLP Data/FARB"

assert os.path.exists(base_dir), f"The path '{base_dir}' does not exist."

intents_path = os.path.join(base_dir, 'intents.json')

assert os.path.exists(intents_path), f"The path '{intents_path}' does not exist."

Then we need to read the `intents.json` file and create a classification dataset from it.

In [7]:
with open(intents_path, "r") as f:
  intents_data = json.load(f)

An example of a single intent in that file looks as follows:

```json
{
    "tag": "Cuts",
   "patterns": ["What to do if Cuts?",
    "How to cure Cuts?",
    "Which medicine to apply for Cuts?",
    "what to apply on cuts?",
    "Cuts"],
   "responses": ["Wash the cut properly to prevent infection and stop the bleeding by applying pressure for 1-2minutes until bleeding stops. Apply Petroleum Jelly to make sure that the wound is moist for quick healing. Finally cover the cut with a sterile bandage. Pain relievers such as acetaminophen can be applied."],
   "context_set": ""
}
```
What we are interested in for model training are the `patterns` and their `tag`: the target we are trying to predict is the `tag` for a given `pattern`. From this we are going to generate our data by mapping all the `patterns` to their labels.

In [8]:
dataset = list()
for intent in intents_data.get('intents'):
  label = intent.get('tag').lower()
  for pattern in intent.get('patterns'):
    feature = pattern.lower()
    dataset.append((feature, label))

print("Dataset size: {}".format(len(dataset)))

Dataset size: 188
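
To make the flattening step concrete, here is a minimal sketch on a made-up two-intent structure (the `toy_intents` data, including the `Fever` tag, is an assumption for illustration and not taken from `intents.json`). Each pattern becomes one `(text, label)` pair, both lower-cased.

```python
toy_intents = {
    "intents": [
        {"tag": "Cuts", "patterns": ["How to cure Cuts?", "Cuts"]},
        {"tag": "Fever", "patterns": ["How to treat a fever?"]},
    ]
}

# One (pattern, tag) pair per pattern, mirroring the nested loop above.
pairs = [
    (pattern.lower(), intent["tag"].lower())
    for intent in toy_intents["intents"]
    for pattern in intent["patterns"]
]
print(pairs)
```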


We have a small dataset that contains `188` examples, let's check the first `10` examples in the dataset:

In [9]:
dataset[:10]

[('what to do if cuts?', 'cuts'),
 ('how to cure cuts?', 'cuts'),
 ('which medicine to apply for cuts?', 'cuts'),
 ('what to apply on cuts?', 'cuts'),
 ('cuts', 'cuts'),
 ('how do you treat abrasions?', 'abrasions'),
 ('do abrasions cause scars?', 'abrasions'),
 ('abrasions', 'abrasions'),
 ('what to do if abrasions?', 'abrasions'),
 ('which medicine to apply for abrasions?', 'abrasions')]

Next we are going to use the `random` module to shuffle our dataset and then check the first `10` examples again before creating dataframes.

In [10]:
random.shuffle(dataset)

In [11]:
dataset[:10]

[('which medicine to apply for cuts?', 'cuts'),
 ('how do you treat a broken toe?', 'broken toe'),
 ('stings', 'stings'),
 ('how to help a drowning person in cpr?', 'cpr'),
 ('how to cure testicle pain?', 'testicle pain'),
 ('what to do if i have a blocked nose?', 'nasal congestion'),
 ('what to do if someone drowned?', 'drowning'),
 ('how do you treat a fracture?', 'fracture'),
 ('what to do if i get a wound?', 'wound'),
 ('what to do if you get a sting?', 'stings')]

For the training data we take all the examples in the `dataset`. For the `testing` and `validation` sets we reshuffle the dataset and take `60%` and `40%` of it respectively, so both of these sets overlap with the training data.

In [12]:


train_df = pd.DataFrame(dataset, columns=["text", "label" ])

TEST_EXAMPLES = int(.6 * len(dataset))

random.shuffle(dataset)
test_df = pd.DataFrame(dataset[:TEST_EXAMPLES], columns=["text", "label" ])
val_df = pd.DataFrame(dataset[TEST_EXAMPLES: ], columns=["text", "label" ])
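
The resulting set sizes can be worked out by hand. The sketch below repeats the arithmetic of the cell above for the `188` examples reported earlier; the variable names are only illustrative.

```python
n_examples = 188                 # dataset size reported above
n_test = int(0.6 * n_examples)   # same expression as TEST_EXAMPLES
n_val = n_examples - n_test      # the remainder goes to validation
n_train = n_examples             # the training frame keeps every example

print(n_train, n_val, n_test)
```

Because the test and validation examples are drawn from the same list that forms the training set, the three sizes add up to twice the dataset size.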

Checking our dataframes.


1. train dataframe

In [13]:
train_df.head(5)

| | text | label |
|---|---|---|
| 0 | which medicine to apply for cuts? | cuts |
| 1 | how do you treat a broken toe? | broken toe |
| 2 | stings | stings |
| 3 | how to help a drowning person in cpr? | cpr |
| 4 | how to cure testicle pain? | testicle pain |


2. test dataframe

In [14]:
test_df.head(5)

| | text | label |
|---|---|---|
| 0 | which medicine to take if i get sun burn? | sun burn |
| 1 | what to do if i my nose is bleeding? | nose bleed |
| 2 | is heat or ice better for a pulled muscle? | strains |
| 3 | how to cure insect bite? | insect bites |
| 4 | what to apply on cuts? | cuts |


3. validation dataframe

In [15]:
val_df.head(5)

| | text | label |
|---|---|---|
| 0 | what to do if i get a broken toe? | broken toe |
| 1 | how to cure a mild headache? | headache |
| 2 | which medicine to take if i get a snake bite? | snake bite |
| 3 | which medicine to take if i get a broken toe? | broken toe |
| 4 | how do you treat faint? | fainting |


Now that we have text matched to labels, we can go ahead and save the `csv` files for these 3 different sets of data

In [16]:
train_df.to_csv(os.path.join(base_dir, "train.csv"),  index = False, header = True)
test_df.to_csv(os.path.join(base_dir, "test.csv"),  index = False, header = True)
val_df.to_csv(os.path.join(base_dir, "val.csv"),  index = False, header = True)
print("Done")

Done


In the following code cell we are going to count the examples that are in each set of our whole dataset.

In [17]:
columns = ["Set", "Example(s)"]

examples = [
    ['training', len(train_df)],
    ['validation', len(val_df)],
    ['testing', len(test_df)],
    ['total', len(train_df) +  len(test_df) + len(val_df)],
]

tabulate_data(columns, examples, "Exmples")

+-------------------------+
|         Exmples         |
+------------+------------+
| Set        | Example(s) |
+------------+------------+
| training   |        188 |
| validation |         76 |
| testing    |        112 |
| total      |        376 |
+------------+------------+


### Features and Labels
Our features are the actual text in the dataframes, in the column named `text`, and our labels come from the column called `label`. In the following code cell we are going to read the features and labels of each set into numpy arrays.

In [18]:
# train
train_texts = train_df.text.values
train_labels = train_df.label.values

# test
test_texts = test_df.text.values
test_labels = test_df.label.values

# val
val_texts = val_df.text.values
val_labels = val_df.label.values

### Text Preprocessing
In our text processing pipeline we need to do the following steps:

1. tokenize sentences
* this is the process of converting a sentence or text into a sequence of words. For this process we are going to use a pre-trained spaCy language model. You can read more about other tokenizers that you can use at [pytorch.org](https://pytorch.org/text/stable/data_utils.html).

2. vocabulary
* we will create a vocabulary based on the sentences in the train dataset. A `vocabulary` is essentially a `word` to `index` mapping that lets us refer to each word by its integer representation, since machine learning models do not understand words. This vocabulary will be used during model training and can also be used at model inference.

### Tokenizer
In the following code cell we are going to get a tokenizer object that converts a sentence into a sequence of words using the `spacy-en` language model. We are using the English language model because our intents are in English.

In [19]:
tokenizer = data.utils.get_tokenizer('spacy', 'en')
tokenizer("This is a boy.")



['This', 'is', 'a', 'boy', '.']

### Vocabulary
In the following code cell we are going to create a `vocabulary` object from torchtext. This vocabulary takes in an ordered dictionary mapping words to their counts, so we are going to use the `Counter` class from `collections` to generate these counts from our train features.

We are going to set `min_freq` to `2`, meaning that words that do not appear at least 2 times will be treated as unknown. We are also going to specify the special tokens when creating the vocabulary object.

In [20]:
counter = Counter()
for line in train_texts:
    counter.update(tokenizer(line))

#  our special tokens are (unknown, padding, start of sentence, end of sentence)
vocabulary = vocab.vocab(counter, min_freq=2, specials=('<unk>', '<pad>', '<sos>', '<eos>'))
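
To show what `min_freq` and `specials` do, here is a pure-Python sketch that approximates the resulting word to index mapping. It is not torchtext's implementation; the helper `build_stoi` and the toy sentence are assumptions made for illustration.

```python
from collections import Counter

def build_stoi(counter, min_freq, specials):
    """Map tokens to indices: specials first, then tokens seen >= min_freq times."""
    itos = list(specials)
    for token, count in counter.items():  # Counter keeps first-seen order
        if count >= min_freq and token not in specials:
            itos.append(token)
    return {token: index for index, token in enumerate(itos)}

toy_counter = Counter("how to cure cuts ? how to treat cuts ?".split())
toy_stoi = build_stoi(toy_counter, min_freq=2, specials=("<unk>", "<pad>"))
print(toy_stoi)
```

Words such as `cure` and `treat` appear only once in the toy sentence, so they get no index of their own and would later be mapped to `<unk>`.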

### STOI - String To Integer
This is a dictionary that contains a string to integer mapping, which is our actual vocabulary. In the following code cell we are going to create an object called `stoi`, which is essentially a dictionary of word to index mappings. This dictionary will be used during training as well as during model inference.

In [21]:
stoi = vocabulary.get_stoi()

### Text Pipeline
After our text has been tokenized we need a way of converting those words into numbers, because machine learning models understand numbers, not words. That's where the `text_pipeline` function comes into play. This function takes in a sentence, tokenizes it and then converts each word to a number. Note that a word that does not exist in the vocabulary (`stoi`) will be converted to the unknown `('<unk>')` token (0).

In [22]:
def text_pipeline(x: str):
  values = list()
  tokens = tokenizer(x.lower()) # convert to lower case.
  for token in tokens:
    try:
      v = stoi[token]
    except KeyError as e:
      v = stoi['<unk>']
    values.append(v)
  return values
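
The same lookup can be sketched on a toy vocabulary. In this illustration a whitespace split stands in for the spaCy tokenizer, and `toy_stoi` and `toy_text_pipeline` are hypothetical names. `dict.get` with a default plays the role of the `try`/`except KeyError` above.

```python
toy_stoi = {"<unk>": 0, "<pad>": 1, "how": 2, "to": 3, "cure": 4, "cuts": 5}

def toy_text_pipeline(sentence):
    """Lower-case, split on whitespace, and look up each token (unknown -> <unk>)."""
    tokens = sentence.lower().split()
    return [toy_stoi.get(token, toy_stoi["<unk>"]) for token in tokens]

print(toy_text_pipeline("How to cure burns"))
```

The word `burns` is not in the toy vocabulary, so it maps to index `0`.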

### Label pipeline
Our labels are currently just text, and we also need to convert them into numbers. This is simple: we get all the unique labels and then create a `labels_dict`, which is a label to integer mapping that looks as follows:

```json
{
"head injur": 0,
"gastrointestinal problem": 1,
"bruise": 2,
"rectal bleedin": 3,
"snake bit": 4,
"abrasion": 5,
"abdonominal pai": 6,
"diarrhe": 7,
"eye injur": 8,
"cut": 9,
"cp": 10,
"heat strok": 11,
"normal bleedin": 12,
"vertig": 13,
"testicle pai": 14,
"seizur": 15,
"sprain": 16,
"drownin": 17,
"pulled muscl": 18,
"broken to": 19,
"coug": 20,
"splinte": 21,
"chemical bur": 22,
"faintin": 23,
"frost bit": 24,
"nasal congestio": 25,
"sting": 26,
"strain": 27,
"woun": 28,
"fractur": 29,
"teet": 30,
"poiso": 31,
"animal bit": 32,
"sun bur": 33,
"skin problem": 34,
"col": 35,
"heat exhaustio": 36,
"headach": 37,
"feve": 38,
"insect bite": 39,
"nose blee": 40,
"ras": 41,
"sore throa": 42,
"chokin": 43}
```

> As you have noticed we have `44` labels which are tags that we need to be able to predict.

The `label_pipeline` function then takes in a label and returns its integer representation.

In [23]:
labels_dict = {k: v for v, k in enumerate(train_df.label.unique())}

In [24]:
label_pipeline = lambda x: labels_dict[x]

Now that we have our vocabularies for labels (`labels_dict`) and features (`stoi`), we can save them to files, since they will be used during model inference. We are going to save them as `.json` files.

In [25]:
with open(os.path.join(base_dir, "vocab.json"), 'w') as f:
  f.write(json.dumps(stoi, indent=2))

with open(os.path.join(base_dir, "labels_dict.json"), 'w') as f:
  f.write(json.dumps(labels_dict, indent=2))

print("Saved!!")

Saved!!
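
At inference time these files need to be read back, and the label mapping has to be inverted so that a predicted index can be turned into a tag. The following is a minimal sketch of that round trip using a temporary directory and a made-up three-label mapping (`toy_labels` is an assumption, not the real `labels_dict`).

```python
import json
import os
import tempfile

toy_labels = {"cuts": 0, "stings": 1, "fever": 2}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "labels_dict.json")
    with open(path, "w") as f:
        json.dump(toy_labels, f, indent=2)
    with open(path, "r") as f:
        loaded = json.load(f)

# Invert the mapping so a predicted index can be turned back into a tag.
id_to_tag = {index: tag for tag, index in loaded.items()}
print(id_to_tag[1])
```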


### Pretrained vectors
In the following code cell we are going to download the pretrained word vectors. We are going to use `GloVe.6B.100d`. These vectors were trained on about `6B` tokens and represent each word as a `100`-dimensional vector.

In [26]:
EMBEDDING_DIM = 100
glove_vectors = vocab.GloVe('6B', dim=EMBEDDING_DIM)

### Creating Embedding matrix
Now that we have our GloVe vectors we need to customize them so that they fit our use case. We are going to create an embedding matrix that suits our vocabulary. Essentially, this embedding matrix will be the word to vector mapping for all the words that are in our vocabulary.

In [27]:
VOCAB_SIZE = len(stoi)
EMBEDDING_MATRIX= torch.zeros([VOCAB_SIZE, EMBEDDING_DIM])
for i, word in enumerate(vocabulary.get_itos()):
  EMBEDDING_MATRIX[i] = glove_vectors[word]
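
The idea behind the loop can be sketched without torch. Row `i` of the matrix holds the vector of the word whose index is `i`. In this illustration a small dictionary stands in for GloVe, and words it does not contain keep an all-zero row, which is an assumption about how missing words are handled. All names and values below are made up.

```python
toy_vectors = {"cuts": [0.1, 0.2, 0.3], "cure": [0.4, 0.5, 0.6]}  # stand-in for GloVe
toy_itos = ["<unk>", "<pad>", "cuts", "cure", "farb"]
dim = 3

# Row i holds the vector of the word with index i; unknown words stay at zero.
toy_matrix = [list(toy_vectors.get(word, [0.0] * dim)) for word in toy_itos]

for word, row in zip(toy_itos, toy_matrix):
    print(word, row)
```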

In the following code cell we are going to check the row of the embedding matrix for the word `"the"`.

In [28]:
EMBEDDING_MATRIX[stoi['the']]

tensor([-0.0382, -0.2449,  0.7281, -0.3996,  0.0832,  0.0440, -0.3914,  0.3344,
        -0.5755,  0.0875,  0.2879, -0.0673,  0.3091, -0.2638, -0.1323, -0.2076,
         0.3340, -0.3385, -0.3174, -0.4834,  0.1464, -0.3730,  0.3458,  0.0520,
         0.4495, -0.4697,  0.0263, -0.5415, -0.1552, -0.1411, -0.0397,  0.2828,
         0.1439,  0.2346, -0.3102,  0.0862,  0.2040,  0.5262,  0.1716, -0.0824,
        -0.7179, -0.4153,  0.2033, -0.1276,  0.4137,  0.5519,  0.5791, -0.3348,
        -0.3656, -0.5486, -0.0629,  0.2658,  0.3020,  0.9977, -0.8048, -3.0243,
         0.0125, -0.3694,  2.2167,  0.7220, -0.2498,  0.9214,  0.0345,  0.4674,
         1.1079, -0.1936, -0.0746,  0.2335, -0.0521, -0.2204,  0.0572, -0.1581,
        -0.3080, -0.4162,  0.3797,  0.1501, -0.5321, -0.2055, -1.2526,  0.0716,
         0.7056,  0.4974, -0.4206,  0.2615, -1.5380, -0.3022, -0.0734, -0.2831,
         0.3710, -0.2522,  0.0162, -0.0171, -0.3898,  0.8742, -0.7257, -0.5106,
        -0.5203, -0.1459,  0.8278,  0.27

### Creating Dataset for Training

In the following code cell we are going to create a dataset class called `FARBDataset`. This dataset takes in the labels and the text of a set.

In [29]:
class FARBDataset(torch.utils.data.Dataset):
  def __init__(self, labels, text):
    super(FARBDataset, self).__init__()
    self.labels = labels
    self.text = text
      
  def __getitem__(self, index):
    return self.labels[index], self.text[index]
  
  def __len__(self):
    return len(self.labels)

### collate_fn
We are going to create a collate function called `tokenize_batch`. This function takes in a `batch` and preprocesses the text and labels. It will be passed to the `DataLoader` class to preprocess the features and labels.

`tokenize_batch` function:

* this function takes in a batch from a set and converts the features and labels to their integer representation. It then pads or truncates each sequence to the same `length` and returns the `labels` and `features`.

In [30]:
def tokenize_batch(batch, max_len=100, padding="pre"):
  assert padding=="pre" or padding=="post", "the padding can be either pre or post"
  labels_list, text_list = [], []
  for _label, _text in batch:
    labels_list.append(label_pipeline(_label))
    # fixed-size holder filled with 0, which is also the index of '<unk>' in `stoi`
    text_holder = torch.zeros(max_len, dtype=torch.int32)
    processed_text = torch.tensor(text_pipeline(_text.lower()), dtype=torch.int32)
    pos = min(max_len, len(processed_text))
    if padding == "pre":
      # the first `pos` tokens go to the start of the holder; the zeros follow them
      text_holder[:pos] = processed_text[:pos]
    else:
      # the last `pos` tokens go to the end of the holder; the zeros precede them
      text_holder[-pos:] = processed_text[-pos:]
    text_list.append(text_holder.unsqueeze(dim=0))
  #  the labels will be torch long tensors since it is a multi-class classification.
  return torch.LongTensor(labels_list), torch.cat(text_list, dim=0)
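
The pad and truncate logic is easier to follow on plain lists. The sketch below is a dependency-free illustration; the helper `pad_or_truncate` and its `tokens_first` flag are hypothetical and are not used elsewhere in this notebook.

```python
def pad_or_truncate(ids, max_len, pad_value=0, tokens_first=True):
    """Return a list of exactly `max_len` ids, cutting or filling as needed."""
    if tokens_first:
        kept = ids[:max_len]                      # keep the first max_len ids
        return kept + [pad_value] * (max_len - len(kept))
    kept = ids[-max_len:] if ids else []          # keep the last max_len ids
    return [pad_value] * (max_len - len(kept)) + kept

print(pad_or_truncate([4, 5, 6], max_len=5))
print(pad_or_truncate([4, 5, 6, 7, 8, 9], max_len=5))
print(pad_or_truncate([4, 5, 6], max_len=5, tokens_first=False))
```

Short sequences are filled up to `max_len` and long ones are cut, so every row in a batch ends up with the same length and the rows can be stacked into one tensor.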

### Datasets
In the following code cell we are going to create the datasets for all our three sets using the `FARBDataset` class.

In [31]:
train_dataset = FARBDataset(train_labels, train_texts)
test_dataset = FARBDataset(test_labels, test_texts)
val_dataset = FARBDataset(val_labels, val_texts)

### Iterators
In the following code cell we are going to create loaders using the `DataLoader` class from `torch.utils.data` for our `3` sets. We are going to use a `batch_size` of `128`, and our `collate_fn` is `tokenize_batch`. For the validation and testing sets we set `shuffle` to `False` because there is no need to shuffle these examples.

In [32]:
BATCH_SIZE = 128
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=tokenize_batch)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, collate_fn=tokenize_batch)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, collate_fn=tokenize_batch)

Checking a single batch of data.

In [33]:
lbl, txt = next(iter(train_loader))

Labels in the first batch.

In [34]:
lbl

tensor([17, 41,  2, 35, 17, 24, 17, 30, 42,  1,  0, 32, 26, 18, 12, 36, 27,  0,
        38, 30, 20, 27, 35, 10, 40, 38, 19,  8, 35, 34,  0, 18, 42, 43, 17, 29,
        10, 34, 23, 21,  0, 22, 25, 13, 23, 39,  5, 35, 17,  5, 16, 36, 17,  7,
        21,  5,  3, 14, 15,  1, 21, 38, 33, 12, 22,  1, 40, 19, 10,  3, 43, 26,
        29, 37, 31, 23, 18, 24, 15, 13, 31, 15, 27, 13, 31,  1, 28,  6, 28, 33,
        18,  6, 20, 29, 16, 11,  4, 40, 42,  9, 37,  8, 37, 33,  3, 25,  9, 14,
        11, 13,  2, 14,  5, 29, 10, 14,  7, 33, 24,  3, 41, 28,  2, 34, 15, 16,
        42, 43])

The first sentence in the batch.

In [35]:
txt[0]

tensor([ 4,  5,  6,  7,  8, 51, 10,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0], dtype=torch.int32)

### Model Creation
Now that we have our loaders we can create a model. The model that we are going to create is called `FARBModel`. As mentioned, we are going to use a `Convolutional Neural Network (CNN)` to build this model.

In [36]:
class FARBModel(nn.Module):
  def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, 
            dropout, pad_idx):
    super(FARBModel, self).__init__()
    self.embedding = nn.Sequential(
        nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
    )
    self.convs = nn.Sequential(
        nn.ModuleList([
            nn.Conv1d(
                in_channels = embedding_dim, 
                out_channels = n_filters, 
                kernel_size = fs
              ) for fs in filter_sizes
        ])
    )
    self.out = nn.Sequential(
        nn.Linear(len(filter_sizes) * n_filters, output_dim)
    )
    self.dropout = nn.Dropout(dropout)
        
  def forward(self, text):
    embedded = self.embedding(text)  
    embedded = embedded.permute(0, 2, 1)
    conved = [F.relu(conv(embedded)) for conv in self.convs[0]]
    pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]
    cat = self.dropout(torch.cat(pooled, dim = 1))
    return self.out(cat)
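
To follow the tensor shapes through `forward`, here is a small sketch that uses random numbers in place of real embeddings. The sizes match the values used later in this notebook (`max_len=100`, embedding size `100`, `100` filters, kernel sizes `3`, `4`, `5`); the batch size of `2` is arbitrary.

```python
import torch
from torch import nn
import torch.nn.functional as F

batch_size, seq_len, emb_dim, n_filters = 2, 100, 100, 100
embedded = torch.randn(batch_size, seq_len, emb_dim)   # stand-in for embedding output
channels_first = embedded.permute(0, 2, 1)             # [batch, emb_dim, seq_len]

shapes = []
for kernel_size in (3, 4, 5):
    conv = nn.Conv1d(emb_dim, n_filters, kernel_size)
    conved = F.relu(conv(channels_first))              # [batch, n_filters, seq_len - k + 1]
    pooled = F.max_pool1d(conved, conved.shape[2]).squeeze(2)  # [batch, n_filters]
    shapes.append((tuple(conved.shape), tuple(pooled.shape)))

print(shapes)
```

Each convolution shortens the sequence to `seq_len - kernel_size + 1`, and max-pooling over that whole dimension leaves one value per filter. Concatenating the three pooled tensors gives `3 * 100 = 300` features for the linear layer.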

### Model Instance
In the following code cell we are going to create a model instance.

In [37]:
INPUT_DIM = len(stoi) 
EMBEDDING_DIM = 100
OUTPUT_DIM = len(labels_dict)
DROPOUT = 0.5
PAD_IDX = stoi['<pad>'] 
N_FILTERS = 100
FILTER_SIZES = [3, 4, 5]

farb_model = FARBModel(
    INPUT_DIM, EMBEDDING_DIM, N_FILTERS, FILTER_SIZES, OUTPUT_DIM, DROPOUT, PAD_IDX
).to(device)
farb_model

FARBModel(
  (embedding): Sequential(
    (0): Embedding(98, 100, padding_idx=1)
  )
  (convs): Sequential(
    (0): ModuleList(
      (0): Conv1d(100, 100, kernel_size=(3,), stride=(1,))
      (1): Conv1d(100, 100, kernel_size=(4,), stride=(1,))
      (2): Conv1d(100, 100, kernel_size=(5,), stride=(1,))
    )
  )
  (out): Sequential(
    (0): Linear(in_features=300, out_features=44, bias=True)
  )
  (dropout): Dropout(p=0.5, inplace=False)
)

### Counting Model Parameters
In the following code cell we are going to count the model parameters.

In [38]:
models.model_params(farb_model)

TOTAL MODEL PARAMETERS: 	143,344
TOTAL TRAINABLE PARAMETERS: 	143,344
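
The reported total can be checked by hand from the layer sizes printed in the model summary above. This is a worked calculation only; the variable names are illustrative.

```python
vocab_size, emb_dim, n_filters, n_classes = 98, 100, 100, 44
filter_sizes = [3, 4, 5]

embedding_params = vocab_size * emb_dim
# Each Conv1d has emb_dim * n_filters * kernel_size weights plus n_filters biases.
conv_params = sum(emb_dim * n_filters * k + n_filters for k in filter_sizes)
# The linear layer maps the 300 concatenated features to 44 classes, plus biases.
linear_params = len(filter_sizes) * n_filters * n_classes + n_classes

total = embedding_params + conv_params + linear_params
print(f"{total:,}")
```

The three parts come to `9,800`, `120,300` and `13,244` parameters, which add up to the `143,344` shown above.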


### Loading Embedding Vectors
In the following code cell we are going to load the pretrained custom vectors into our embedding layer. We load the embedding vectors that suit our data using `farb_model.embedding[0].weight.data.copy_(EMBEDDING_MATRIX)` as follows:

In [39]:
farb_model.embedding[0].weight.data.copy_(EMBEDDING_MATRIX)

tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        ...,
        [-0.3809, -0.3992,  0.3630,  ..., -0.0667,  0.7872, -0.9540],
        [-0.0788,  0.6902,  0.4748,  ..., -0.2634, -0.6086, -0.1911],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000]],
       device='cuda:0')

### Optimizer and Criterion

In the following code cell we are going to define the `optimizer` and `criterion`. For the `optimizer` we are going to use the `Adam` optimizer with default parameters and for the criterion we are going to use the `CrossEntropyLoss()` function since this is a `multi-class` classification.

In [40]:
optimizer = torch.optim.Adam(farb_model.parameters())
criterion = nn.CrossEntropyLoss().to(device)
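
A useful reference point for the loss values is the cross-entropy of a model that assigns the same probability to each of the `44` classes. The short calculation below is a sketch of that baseline, not part of the training code.

```python
import math

n_classes = 44
uniform_loss = -math.log(1.0 / n_classes)  # cross-entropy when every class gets 1/44
print(round(uniform_loss, 3))
```

This gives roughly `3.784`, which is close to the training loss of `3.803` reported for the first epoch in the training log, as expected for a model that has not learned anything yet.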

In the following code cell we are going to create our `categorical_accuracy` function, which calculates the categorical accuracy between the predicted labels and the real labels.

In [41]:
def categorical_accuracy(preds, y):
  top_pred = preds.argmax(1, keepdim = True)
  correct = top_pred.eq(y.view_as(top_pred)).sum()
  acc = correct.float() / y.shape[0]
  return acc
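
The same computation can be written on plain Python lists, which makes the steps easy to verify by hand. The helper `toy_categorical_accuracy` and the scores below are made up for illustration.

```python
def toy_categorical_accuracy(scores, targets):
    """Fraction of rows whose highest-scoring class equals the target class."""
    predicted = [row.index(max(row)) for row in scores]
    correct = sum(int(p == t) for p, t in zip(predicted, targets))
    return correct / len(targets)

toy_scores = [[2.0, 0.1, 0.3], [0.2, 0.1, 1.5], [0.9, 0.8, 0.1]]
toy_targets = [0, 2, 1]
print(toy_categorical_accuracy(toy_scores, toy_targets))
```

The first two rows are classified correctly and the third is not, so the accuracy is two out of three.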

### Train and Evaluate functions
In the following code cell we are going to create our `train` and `evaluate` functions:

In [42]:
def train(model, iterator, optimizer, criterion):
  epoch_loss,epoch_acc = 0, 0
  model.train()
  for batch in iterator:
    y, X = batch
    X = X.to(device)
    y = y.to(device)
    optimizer.zero_grad()

    predictions = model(X).squeeze(1)
    loss = criterion(predictions, y)
    acc = categorical_accuracy(predictions, y)
    loss.backward()
    optimizer.step()
    epoch_loss += loss.item()
    epoch_acc += acc.item()
  return epoch_loss / len(iterator), epoch_acc / len(iterator)

def evaluate(model, iterator, criterion):
  epoch_loss,epoch_acc = 0, 0
  model.eval()
  with torch.no_grad():
    for batch in iterator:
      y, X = batch
      X = X.to(device)
      y = y.to(device)
      predictions = model(X).squeeze(1)
      loss = criterion(predictions, y)
      acc = categorical_accuracy(predictions, y)
      epoch_loss += loss.item()
      epoch_acc += acc.item()
  return epoch_loss / len(iterator), epoch_acc / len(iterator)

### Training Loop
In the following code cell we are going to run the training loop. We are going to save the model whenever the validation loss decreases.

In [43]:
N_EPOCHS = 200
MODEL_NAME = 'farb-model.pt'

best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
  start = time.time()
  train_loss, train_acc = train(farb_model, train_loader, optimizer, criterion)
  valid_loss, valid_acc = evaluate(farb_model, val_loader, criterion)
  title = f"EPOCH: {epoch+1:02}/{N_EPOCHS:02} {'saving best model...' if valid_loss < best_valid_loss else 'not saving...'}"
  if valid_loss < best_valid_loss:
      best_valid_loss = valid_loss
      torch.save(farb_model.state_dict(), MODEL_NAME)
  end = time.time()
  data = [
       ["Training", f'{train_loss:.3f}', f'{train_acc:.3f}', f"{hms_string(end - start)}" ],
       ["Validation", f'{valid_loss:.3f}', f'{valid_acc:.3f}', "" ],       
   ]
  columns = ["CATEGORY", "LOSS", "ACCURACY", "ETA"]
  tabulate_data(columns, data, title)


+--------------------------------------------+
|     EPOCH: 01/200 saving best model...     |
+------------+-------+----------+------------+
| CATEGORY   |  LOSS | ACCURACY |        ETA |
+------------+-------+----------+------------+
| Training   | 3.803 |    0.016 | 0:00:02.10 |
| Validation | 3.696 |    0.053 |            |
+------------+-------+----------+------------+
+--------------------------------------------+
|     EPOCH: 02/200 saving best model...     |
+------------+-------+----------+------------+
| CATEGORY   |  LOSS | ACCURACY |        ETA |
+------------+-------+----------+------------+
| Training   | 3.678 |    0.053 | 0:00:00.06 |
| Validation | 3.602 |    0.158 |            |
+------------+-------+----------+------------+
+--------------------------------------------+
|     EPOCH: 03/200 saving best model...     |
+------------+-------+----------+------------+
| CATEGORY   |  LOSS | ACCURACY |        ETA |
+------------+-------+----------+------------+
| Training   
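
The checkpointing rule used in the loop can be isolated in a few lines. In this sketch the validation losses are made-up values, and appending to a list stands in for the call to `torch.save`.

```python
toy_valid_losses = [3.70, 3.60, 3.65, 3.40, 3.45]  # made-up values for illustration

best_valid_loss = float("inf")
saved_epochs = []
for epoch, valid_loss in enumerate(toy_valid_losses, start=1):
    if valid_loss < best_valid_loss:       # same test as the training loop
        best_valid_loss = valid_loss
        saved_epochs.append(epoch)         # stands in for torch.save(...)

print(saved_epochs, best_valid_loss)
```

Only epochs that improve on the best validation loss seen so far overwrite the saved file, so the checkpoint on disk always corresponds to the lowest validation loss.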

### Evaluating the best model.
In the following code cell we are going to evaluate the best model on the `test` data as follows:

In [44]:
column_names = ["Set", "Loss", "Accuracy", "ETA (time)"]
farb_model.load_state_dict(torch.load(MODEL_NAME))
test_loss, test_acc = evaluate(farb_model, test_loader, criterion)
title = "Model Evaluation Summary"
data_rows = [["Test", f'{test_loss:.3f}', f'{test_acc * 100:.2f}%', ""]]

tabulate_data(column_names, data_rows, title)

+--------------------------------------+
|       Model Evaluation Summary       |
+------+-------+----------+------------+
| Set  |  Loss | Accuracy | ETA (time) |
+------+-------+----------+------------+
| Test | 0.034 |   99.11% |            |
+------+-------+----------+------------+


### Model Inference
In the following code cell we are going to prepare for making predictions with the best model. We will use a function called `inference_preprocess_text`, which processes the text for inference.

In [45]:
def inference_preprocess_text(text, max_len=100, padding="pre"):
  assert padding=="pre" or padding=="post", "the padding can be either pre or post"
  text_holder = torch.zeros(max_len, dtype=torch.int32) # fixed-size tensor of length max_len filled with 0
  processed_text = torch.tensor(text_pipeline(text), dtype=torch.int32)
  pos = min(max_len, len(processed_text))
  if padding == "pre":
    text_holder[:pos] = processed_text[:pos]
  else:
    text_holder[-pos:] = processed_text[-pos:]
  text_list= text_holder.unsqueeze(dim=0)
  return text_list

### Predicting Tags
In the following code cells we are going to create a `Prediction` class that holds the result of a prediction, followed by a function called `predict_tag` that predicts the `tag` for a given `pattern`.

In [46]:
class Prediction:
  def __init__(self, pattern: str, tag: str, tagId: int, confidence: float):
    self.pattern = pattern
    self.tag = tag
    self.tagId = tagId
    self.confidence = confidence

  def __repr__(self) -> str:
    return f"<FARB Preciction: {self.tag}>"

  def __str__(self) -> str:
    return f"<FARB Preciction: {self.tag}>"

  def to_json(self):
    return {
        'pattern':  self.pattern,
        'tag':  self.tag,
        'tagId':  self.tagId,
        'confidence':  self.confidence,
    }

In [47]:
def predict_tag(model, sentence, device): 
  model.eval()
  with torch.no_grad():
    tensor = inference_preprocess_text(sentence).to(device)
    probabilities = torch.softmax(model(tensor).squeeze(0), dim=0)
    prediction = torch.argmax(probabilities)
    prediction = prediction.detach().cpu().item()
    tags = {v:k for k, v in labels_dict.items()}
    tag = tags[prediction]
   
    return Prediction(
        sentence.lower(), tag, int(prediction), float(round(probabilities[prediction].item(), 2))
    )
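
The step from raw model outputs to a tag and a confidence can be sketched without torch. The logits and the `toy_tags` mapping below are made-up values, and `softmax` is a small helper written for this illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_tags = {0: "cuts", 1: "stings", 2: "fever"}   # hypothetical index -> tag mapping
toy_logits = [2.0, 0.5, -1.0]

probabilities = softmax(toy_logits)
best = max(range(len(probabilities)), key=probabilities.__getitem__)
print(toy_tags[best], round(probabilities[best], 2))
```

The confidence reported by `predict_tag` is this softmax probability of the winning class, rounded to two decimal places.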

In [48]:
predict_tag(farb_model, "How to cure Cuts?", device)

<FARB Preciction: cuts>

### Downloading the model.
We are going to download the saved model file from the Colab instance.

In [49]:
files.download(MODEL_NAME)

<IPython.core.display.Javascript object>
