# **Deep Natural Language Processing @ PoliTO**

---


**Teaching Assistant:** Lorenzo Vaiani

**Credits:** Moreno La Quatra

**Practice 4:** Named Entities Recognition & Intent Detection

## Named Entities Recognition (NER)

The Named Entity Recognition task aims at identifying and classifying named entities in a text. Named entities are real-world objects such as persons, locations, organizations, etc. The task takes as input a sentence and determines the boundaries of the named entities and their type.

For example, given the sentence:

```
I went to Paris last week.
```

the task is to identify the named entity `Paris` as a location.
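As a minimal sketch of what such an output can look like, the snippet below represents the entity both as a character span and as one tag per token. The whitespace tokenisation and the `LOC` tag name are assumptions made for illustration; real NER models use their own tokenisers and label sets.

```python
sentence = "I went to Paris last week."

# Entity-level view: the surface string, its type, and its character span.
entity_text, entity_type = "Paris", "LOC"
start = sentence.find(entity_text)
end = start + len(entity_text)
entity = {"text": entity_text, "type": entity_type, "start": start, "end": end}

# Token-level view: one tag per token, "O" for tokens outside any entity.
tokens = sentence.replace(".", " .").split()
tags = [entity_type if token == entity_text else "O" for token in tokens]

print(entity)
print(list(zip(tokens, tags)))
```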

The figure below illustrates the NER task:

![https://miro.medium.com/max/875/0*mlwDqNm7DFc_4maP.jpeg](https://miro.medium.com/max/875/0*mlwDqNm7DFc_4maP.jpeg)   

In the first part of this practice, you will:
- explore the NER task using pre-trained models available in SpaCy and HuggingFace
- evaluate the performance of a SpaCy NER model on a custom dataset
- evaluate the performance of a HuggingFace NER model on a custom dataset

NB: the library used to evaluate the performance of the models is `seqeval`, which is a library for evaluating sequence labeling tasks.

### **Question 1: Data preparation**

The first step is to prepare the data. In this practice, you will use the WikiGold dataset [1][2], a collection of annotated sentences from Wikipedia. The dataset is distributed in [CoNLL](https://simpletransformers.ai/docs/ner-data-formats/#text-file-in-conll-format) format and can be downloaded [here](https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P4/NER/wikigold.conll.txt).

**Please, read carefully the following instructions before starting to work on the practice.**

You need to extract the clean sentences (without annotations) and, for each sentence, the corresponding annotations. The extracted data should have the following format:

- `sentences`: list of sentences
- `annotations`: list of lists of entities (both string and class information). E.g., `[[('010', 'MISC'), ('Japanese', 'MISC'), ('The Mad Capsule Markets', 'ORG')], [('Osc-Dis', 'MISC'), ('Introduction 010', 'MISC'), ('Come', 'MISC')], ...]`. You can remove the I- prefix because the data collection does not actually contain valuable prefixes.

---


[1] Balasuriya, Dominic, et al. "Named entity recognition in wikipedia."
    Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources. Association for Computational Linguistics, 2009.

[2] Nothman, Joel, et al. "Learning multilingual named entity recognition
    from Wikipedia." Artificial Intelligence 194 (2013): 151-175

---

The following cell downloads the dataset on Google Colab.

In [1]:
!wget https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P4/NER/wikigold.conll.txt

--2023-11-16 16:10:02--  https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P4/NER/wikigold.conll.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 318530 (311K) [text/plain]
Saving to: 'wikigold.conll.txt.1'


2023-11-16 16:10:02 (14.8 MB/s) - 'wikigold.conll.txt.1' saved [318530/318530]



To avoid spending too much time on data processing, the following cells prepare the dataset for you.
After running the cells, you will have the following variables:
- `sentences_with_labels`: a list of sentences, each one a list of `[token, label]` pairs
- `sentences`: a list of the sentences in the dataset, each one a string of space-separated tokens
- `labels`: a list of lists of `(entity text, entity type)` tuples. Each element in the outer list holds the entities of one sentence in the dataset.

Here is an example of the provided data:

```python
sentences_with_labels[0] = [
    ['010', 'I-MISC'],
    ['is', 'O'],
    ['the', 'O'],
    ['tenth', 'O'],
    ['album', 'O'],
    ['from', 'O'],
    ['Japanese', 'I-MISC'],
    ['Punk', 'O'],
    ['Techno', 'O'],
    ['band', 'O'],
    ['The', 'I-ORG'],
    ['Mad', 'I-ORG'],
    ['Capsule', 'I-ORG'],
    ['Markets', 'I-ORG'],
    ['.', 'O']
]

sentences[0] = '010 is the tenth album from Japanese Punk Techno band The Mad Capsule Markets .'

labels[0] = [
    ('010', 'MISC'),
    ('Japanese', 'MISC'),
    ('The Mad Capsule Markets', 'ORG')
]
```

Please note that the labels are not in true IOB format. You can ignore the I- prefix because the data collection does not actually contain valuable prefixes.
Get familiar with the data by printing the first 10 sentences and their corresponding labels. Which labels appear in the dataset?
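One way to answer the question about the labels is to count them. The sketch below does this for the single example sentence shown above; the helper name `count_label_types` is hypothetical and not part of the provided cells.

```python
from collections import Counter

# First sentence of the dataset, copied from the example above.
sample = [
    ['010', 'I-MISC'], ['is', 'O'], ['the', 'O'], ['tenth', 'O'],
    ['album', 'O'], ['from', 'O'], ['Japanese', 'I-MISC'], ['Punk', 'O'],
    ['Techno', 'O'], ['band', 'O'], ['The', 'I-ORG'], ['Mad', 'I-ORG'],
    ['Capsule', 'I-ORG'], ['Markets', 'I-ORG'], ['.', 'O'],
]

def count_label_types(sentence_pairs):
    """Count token labels after dropping the I- prefix."""
    counts = Counter()
    for _, label in sentence_pairs:
        counts[label.split("-")[-1]] += 1
    return counts

label_counts = count_label_types(sample)
print(label_counts)
```

Once `sentences_with_labels` is available, the same helper can be applied to every sentence and the resulting counters summed to get dataset-wide counts.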

In [2]:
%%capture
! pip install datasets
! pip install transformers
! pip install spacy
! python -m spacy download en_core_web_sm

In [3]:
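def split_text_label(filename):
    """Read a CoNLL file and return one list of [token, label] pairs per sentence."""
    split_labeled_text = []
    sentence = []
    with open(filename) as f:
        for line in f:
            # a blank line or a -DOCSTART- marker closes the current sentence
            if len(line) == 0 or line.startswith('-DOCSTART') or line[0] == "\n":
                if len(sentence) > 0:
                    split_labeled_text.append(sentence)
                    sentence = []
                continue
            # the token is the first column, the label the last one
            splits = line.split(' ')
            sentence.append([splits[0], splits[-1].rstrip("\n")])
    # keep the last sentence when the file does not end with a blank line
    if len(sentence) > 0:
        split_labeled_text.append(sentence)
    return split_labeled_text

sentences_with_labels = split_text_label("wikigold.conll.txt")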

In [4]:
print(sentences_with_labels[0])

[['010', 'I-MISC'], ['is', 'O'], ['the', 'O'], ['tenth', 'O'], ['album', 'O'], ['from', 'O'], ['Japanese', 'I-MISC'], ['Punk', 'O'], ['Techno', 'O'], ['band', 'O'], ['The', 'I-ORG'], ['Mad', 'I-ORG'], ['Capsule', 'I-ORG'], ['Markets', 'I-ORG'], ['.', 'O']]


In [6]:
sentences = []

for sent_list in sentences_with_labels:
    sentence = [s[0] for s in sent_list]
    sentence = " ".join(sentence)
    sentences.append(sentence)

In [7]:
print (sentences[0])

010 is the tenth album from Japanese Punk Techno band The Mad Capsule Markets .


In [8]:
labels = []
overall_labels = []
for sent_list in sentences_with_labels:
    current_labels = []
    prev = "O"
    current_entity = ""
    for w, l in sent_list:
        overall_labels.append(l)
        if l != "O" and prev != "O":
            if l == prev:
                # continue the current entity
                current_entity += w + " "
            else:
                # close the previous entity and start a new one
                current_labels.append((current_entity.strip(), prev.split("-")[1]))
                current_entity = w + " "
        elif l == "O" and prev != "O":
            # close the previous entity
            current_labels.append((current_entity.strip(), prev.split("-")[1]))
            current_entity = ""
        elif l != "O" and prev == "O":
            # start a new entity
            current_entity = w + " "

        prev = l
    if prev != "O":
        # close an entity that runs up to the last token of the sentence
        current_labels.append((current_entity.strip(), prev.split("-")[1]))
    labels.append(current_labels)

print(labels)
# distinct entity types, without the "O" tag and the I- prefix
overall_labels = list(set(overall_labels))
overall_labels = [o for o in overall_labels if o != "O"]
overall_labels = [o.split("-")[1] for o in overall_labels]
print(overall_labels)

[[('010', 'MISC'), ('Japanese', 'MISC'), ('The Mad Capsule Markets', 'ORG')], [('Osc-Dis', 'MISC'), ('Introduction 010', 'MISC'), ('Come', 'MISC')], [('Kojima Minoru', 'PER'), ('Good Day', 'MISC'), ('Wardanceis', 'MISC'), ('UK', 'LOC'), ('Killing Joke', 'ORG')], [('XXX can of This', 'MISC')], [('Cannabis', 'MISC'), ('Cannabis', 'MISC'), ('P.O.P', 'MISC'), ('HUMANITY', 'MISC')], [('UK', 'LOC'), ('OSC-DIS', 'MISC')], [('139th', 'ORG'), ('Camp Howe', 'LOC'), ('Pittsburgh', 'LOC')], [('Frederick H. Collier', 'PER')], [('Second Battle of Bull Run', 'MISC'), ("Howe 's Brigade", 'ORG'), ("Couch 's Division", 'ORG'), ('IV Corps', 'ORG'), ('Army of the Potomac', 'ORG'), ("De Trobriand 's 55th New York", 'ORG'), ('Gardes Lafayette', 'ORG')], [('62d NY', 'ORG'), ('93d PVI', 'ORG'), ('98th PVI', 'ORG'), ('102d PVI', 'ORG')], [("Couch 's Division", 'ORG'), ('Army', 'ORG'), ('Potomac', 'LOC')], [('139th', 'ORG'), ('Poolesville', 'LOC'), ('Sandy Hook', 'LOC'), ('Maryland', 'LOC'), ('Battle of Antieta

In [9]:
print (labels[0])

[('010', 'MISC'), ('Japanese', 'MISC'), ('The Mad Capsule Markets', 'ORG')]


### **Question 2: Inference with SpaCy for entity recognition**

SpaCy is a free, open-source library for advanced Natural Language Processing in Python. It features NER models for different languages including English.
The models are available [here](https://spacy.io/models).

For this question you are asked to instantiate a SpaCy model for English and perform inference on the sentences in the dataset. The English model contains a superset of the labels in the dataset. For this reason, you need to map the labels that are not in the dataset to the `MISC` label.

You are expected to generate an output similar to the following:
```python
[('010', 'MISC'), ('Japanese', 'MISC'), ('The Mad Capsule Markets', 'ORG')]
```

Please pay attention to the token attributes (you can find more information [here](https://spacy.io/api/token#attributes)) and the entity attributes (you can find more information [here](https://spacy.io/api/entityrecognizer)).

The following cell instantiates a spacy model for English.

In [11]:
import spacy
nlp = spacy.load("en_core_web_sm")

In [19]:
print (sentences[0])
print(len(sentences))

010 is the tenth album from Japanese Punk Techno band The Mad Capsule Markets .
1696


In [24]:
# your code here

pred_ner = []
for sentence in sentences:
    doc = nlp(sentence)
    current_ner = []
    for entity in doc.ents:
        # keep labels that exist in the dataset, map every other label to MISC
        label = entity.label_ if entity.label_ in overall_labels else "MISC"
        current_ner.append((entity.text, label))

    pred_ner.append(current_ner)
print(pred_ner)

[[('010', 'MISC'), ('tenth', 'MISC'), ('Japanese Punk Techno', 'ORG'), ('The Mad Capsule Markets', 'ORG')], [('Osc-Dis', 'MISC'), ('Introduction 010', 'MISC')], [('Kojima Minoru', 'MISC'), ('Good Day', 'MISC'), ('Wardanceis', 'ORG'), ('UK', 'MISC'), ('Killing Joke', 'ORG')], [], [('Cannabis', 'MISC'), ('Cannabis', 'MISC'), ('P.O.P', 'MISC'), ('HUMANITY', 'MISC')], [('UK Edition', 'MISC')], [('139th', 'MISC'), ('Camp Howe', 'MISC'), ('Pittsburgh', 'MISC'), ('September 1 , 1862', 'MISC')], [('Frederick H. Collier', 'MISC'), ('first', 'MISC')], [('Second Battle of Bull Run', 'MISC'), ("Howe 's Brigade of Couch 's Division of the IV Corps of the Army of the Potomac", 'ORG'), ("De Trobriand 's", 'MISC'), ('New York', 'MISC'), ('Gardes Lafayette', 'MISC'), ('September 11 , 1862', 'MISC')], [('62d', 'MISC'), ('PVI', 'ORG'), ('PVI', 'ORG'), ('the 102d PVI', 'ORG')], [("Couch 's Division", 'ORG'), ('Army', 'ORG'), ('Potomac', 'LOC')], [('139th', 'MISC'), ('the next week', 'MISC'), ('Poolesville
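
The English SpaCy models use label names such as `PERSON` and `GPE`, which do not appear in the dataset's label set, so the cell above turns them into `MISC` (see `('Pittsburgh', 'MISC')` in the output). A possible refinement, sketched below, is to translate close equivalents first. The mapping table and the helper name `map_entities` are assumptions made for illustration, not part of the assignment.

```python
# Assumed correspondence between SpaCy (OntoNotes-style) labels and the
# four dataset labels; anything not listed falls back to MISC.
SPACY_TO_DATASET = {"PERSON": "PER", "GPE": "LOC", "LOC": "LOC", "ORG": "ORG"}

def map_entities(entities, mapping=SPACY_TO_DATASET, default="MISC"):
    """Convert (text, SpaCy label) pairs into (text, dataset label) pairs."""
    return [(text, mapping.get(label, default)) for text, label in entities]

# Hypothetical SpaCy output, written by hand for illustration.
example = [
    ("Frederick H. Collier", "PERSON"),
    ("Pittsburgh", "GPE"),
    ("first", "ORDINAL"),
]
mapped = map_entities(example)
print(mapped)
```

With a loaded model, the input can be built as `[(ent.text, ent.label_) for ent in doc.ents]`.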

### **Question 3: Compute metrics for evaluating NER**

The output of NER models consists of a set of named entities. To evaluate the performance of a model, we need to compare the predicted named entities with the ground truth.

For this question, you need to use [`eval4ner`](https://github.com/cyk1337/eval4ner) package to evaluate the performance of the model.

**Note**: please use `pip install git+https://github.com/MorenoLaQuatra/eval4ner` to use a fixed version of the library. Before passing the parameter to the evaluation function, create a deepcopy of each variable:
```python
from copy import deepcopy
sentences_copy = deepcopy(sentences) #pass sentences_copy to the eval script
```

The issue has already been reported to the original author.
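
To make the comparison between predictions and ground truth concrete, here is a minimal sketch of a strict score for a single sentence, where a prediction counts as correct only if both its text and its type match a gold entity. It is a simplification written for illustration: `strict_scores` is a hypothetical helper, and `eval4ner` may differ in details such as averaging over sentences. The two lists are the first gold and predicted entries printed earlier.

```python
def strict_scores(pred, gold):
    """Precision, recall and F1 where text and type must both match."""
    remaining = list(gold)
    correct = 0
    for entity in pred:
        if entity in remaining:
            # each gold entity can be matched at most once
            remaining.remove(entity)
            correct += 1
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

gold = [("010", "MISC"), ("Japanese", "MISC"), ("The Mad Capsule Markets", "ORG")]
pred = [("010", "MISC"), ("tenth", "MISC"),
        ("Japanese Punk Techno", "ORG"), ("The Mad Capsule Markets", "ORG")]

p, r, f = strict_scores(pred, gold)
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")  # P=0.5000 R=0.6667 F1=0.5714
```

Two of the four predictions match a gold entity exactly, and two of the three gold entities are recovered.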

In [15]:
%%capture
! pip install git+https://github.com/MorenoLaQuatra/eval4ner

In [27]:
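# your code here
import eval4ner.muc as muc
from copy import deepcopy

# pass a copy of the sentences to the evaluation function, as recommended above
sentences_copy = deepcopy(sentences)

# swap the (text, label) tuples into (label, text) tuples before calling eval4ner
labels_new = [[(label, text) for text, label in sent_labels] for sent_labels in labels]
pred_ner_new = [[(label, text) for text, label in sent_preds] for sent_preds in pred_ner]

result = muc.evaluate_all(pred_ner_new, labels_new, sentences_copy)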



 NER evaluation scores:
  strict mode, Precision=0.2186, Recall=0.2580, F1:0.2295
   exact mode, Precision=0.5249, Recall=0.6387, F1:0.5561
 partial mode, Precision=0.5801, Recall=0.7113, F1:0.6157
    type mode, Precision=0.2807, Recall=0.3409, F1:0.2968


### **Question 4: Inference with transformers pipeline**

Transformer-based models can be fine-tuned for token-level classification. The task is to classify each token in a sentence and assign it to a class.
The NER task is a token-level classification task and the models can be used for performing inference on the sentences in the dataset.

You can use the pipeline available in the HuggingFace [transformers library](https://huggingface.co/docs/transformers/main_classes/pipelines). The pipeline lets you run inference on a list of sentences.

Evaluate the **standard** model using the pipeline (`pipe = pipeline("ner")`). Check the documentation here: https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TokenClassificationPipeline


A few notes about the question (**read carefully**):
1. The output of the pipeline differs from SpaCy's. Make sure to process the data correctly before running the evaluation.
2. The `ignore_labels` parameter can be used to exclude labels from the prediction.
3. The `##` symbol marks a token that continues the previous one (Poli + ##TO). You may need to handle this case to merge the tokens correctly.
4. Use seqeval to evaluate the performance of the model.
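
The merging described in note 3 can be sketched as follows. The helper name `merge_wordpieces` is hypothetical, and the function only handles the `##` convention mentioned above.

```python
def merge_wordpieces(pieces):
    """Join '##' continuation pieces onto the word they continue."""
    words = []
    for piece in pieces:
        if piece.startswith("##") and words:
            # continuation piece: glue it to the previous word
            words[-1] += piece[2:]
        else:
            words.append(piece)
    return words

print(merge_wordpieces(["Poli", "##TO"]))
print(merge_wordpieces(["The", "Mad", "Cap", "##sul", "##e", "Markets"]))
```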

In [28]:
! pip install datasets transformers



In [29]:
# your code here
from transformers import pipeline
from tqdm import tqdm

pipe = pipeline("ner")
# intended to keep "O" predictions; note that the sample output printed
# further down still contains only non-"O" tokens
pipe.ignore_labels = []

  from .autonotebook import tqdm as notebook_tqdm
No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
(…)conll03-english/resolve/main/config.json: 100%|██████████| 998/998 [00:00<00:00, 1.41MB/s]
model.safetensors: 100%|██████████| 1.33G/1.33G [00:44<00:00, 29.9MB/s]
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForToke

In [32]:
for i, s in enumerate(sentences[:4]):
    out = pipe(s)
    for o in out:
        print(o)

{'entity': 'I-MISC', 'score': 0.99836165, 'index': 8, 'word': 'Japanese', 'start': 28, 'end': 36}
{'entity': 'I-MISC', 'score': 0.8849569, 'index': 9, 'word': 'Punk', 'start': 37, 'end': 41}
{'entity': 'I-MISC', 'score': 0.82731473, 'index': 10, 'word': 'Tech', 'start': 42, 'end': 46}
{'entity': 'I-MISC', 'score': 0.9429677, 'index': 11, 'word': '##no', 'start': 46, 'end': 48}
{'entity': 'I-ORG', 'score': 0.8888787, 'index': 13, 'word': 'The', 'start': 54, 'end': 57}
{'entity': 'I-ORG', 'score': 0.9945515, 'index': 14, 'word': 'Mad', 'start': 58, 'end': 61}
{'entity': 'I-ORG', 'score': 0.99449736, 'index': 15, 'word': 'Cap', 'start': 62, 'end': 65}
{'entity': 'I-ORG', 'score': 0.99419725, 'index': 16, 'word': '##sul', 'start': 65, 'end': 68}
{'entity': 'I-ORG', 'score': 0.997577, 'index': 17, 'word': '##e', 'start': 68, 'end': 69}
{'entity': 'I-ORG', 'score': 0.9970092, 'index': 18, 'word': 'Markets', 'start': 70, 'end': 77}
{'entity': 'I-MISC', 'score': 0.87035537, 'index': 15, 'word'

In [34]:

pred_transformers_ner = []
overall_labels_transformers = []
for s in tqdm(sentences):
    out = pipe(s)
    current_labels = []
    prev = "O"
    current_entity = ""
    # `out` holds one dict per predicted token; as the sample output above
    # shows, tokens tagged "O" are not part of it
    for o in out:
        overall_labels_transformers.append(o['entity'])
        l = o['entity']
        w = o['word']
        if l != "O" and prev != "O":
            if l == prev:
                # continue the current entity
                current_entity += w + " "
            else:
                # close the previous entity and start a new one
                current_entity = current_entity.strip()
                current_entity = current_entity.replace(" ##", "")
                current_labels.append((current_entity, prev.split("-")[1]))
                current_entity = w + " "
        elif l == "O" and prev != "O":
            # close the previous entity (only reached if "O" tokens are present)
            current_entity = current_entity.strip()
            current_entity = current_entity.replace(" ##", "")
            current_labels.append((current_entity, prev.split("-")[1]))
            current_entity = ""
        elif l != "O" and prev == "O":
            # start a new entity
            current_entity = w + " "

        prev = l
    pred_transformers_ner.append(current_labels)

print(pred_transformers_ner[0])

print(pred_transformers_ner[0])

100%|██████████| 1696/1696 [03:27<00:00,  8.17it/s]

[('Japanese Punk Techno', 'MISC')]





In [38]:
print(pred_transformers_ner[:5])

overall_labels_transformers = list(set(overall_labels_transformers))
print (overall_labels_transformers)
overall_labels_transformers = [o for o in overall_labels_transformers if o != "O"]
print (overall_labels_transformers)
overall_labels_transformers = [o.split("-")[1] for o in overall_labels_transformers]
print (overall_labels_transformers)

[[('Japanese Punk Techno', 'MISC')], [], [('Kojima Minoru', 'PER'), ('Good Day', 'MISC'), ('Wardance', 'ORG'), ('UK', 'LOC')], [], [('Cannabis Cannabis P .', 'MISC'), ('P', 'ORG'), ('H', 'MISC')]]
['I-ORG', 'I-MISC', 'I-PER', 'I-LOC']
['I-ORG', 'I-MISC', 'I-PER', 'I-LOC']
['ORG', 'MISC', 'PER', 'LOC']
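
In the token-level output printed earlier, tokens tagged `O` are absent (index 12 is missing between `##no` and `The`), so a grouping loop that waits for an `O` label never closes the last entity of a sentence; this is why `The Mad Capsule Markets` does not appear in `pred_transformers_ner[0]`. The sketch below is one possible alternative, not the method used above: it starts a new entity whenever the label changes or the `index` values stop being consecutive, and it flushes the pending entity at the end. The function name `group_entities` is hypothetical; the input dictionaries are copied from the recorded output, with scores and offsets omitted.

```python
def group_entities(tokens):
    """Group token-level predictions into (text, label) entities."""
    entities = []
    words, label, last_index = [], None, None
    for tok in tokens:
        tok_label = tok["entity"].split("-")[-1]
        is_piece = tok["word"].startswith("##")
        contiguous = last_index is not None and tok["index"] == last_index + 1
        if words and contiguous and tok_label == label:
            if is_piece:
                # wordpiece continuation: glue it to the previous word
                words[-1] += tok["word"][2:]
            else:
                words.append(tok["word"])
        else:
            # label change or gap in the token indices: close the open entity
            if words:
                entities.append((" ".join(words), label))
            words = [tok["word"][2:] if is_piece else tok["word"]]
            label = tok_label
        last_index = tok["index"]
    if words:
        # flush the entity that is still open at the end of the sentence
        entities.append((" ".join(words), label))
    return entities

tokens = [
    {"entity": "I-MISC", "index": 8, "word": "Japanese"},
    {"entity": "I-MISC", "index": 9, "word": "Punk"},
    {"entity": "I-MISC", "index": 10, "word": "Tech"},
    {"entity": "I-MISC", "index": 11, "word": "##no"},
    {"entity": "I-ORG", "index": 13, "word": "The"},
    {"entity": "I-ORG", "index": 14, "word": "Mad"},
    {"entity": "I-ORG", "index": 15, "word": "Cap"},
    {"entity": "I-ORG", "index": 16, "word": "##sul"},
    {"entity": "I-ORG", "index": 17, "word": "##e"},
    {"entity": "I-ORG", "index": 18, "word": "Markets"},
]
grouped = group_entities(tokens)
print(grouped)
```

The pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that groups tokens internally; the manual version is shown here only to keep the logic visible.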


In [40]:
sentences_copy = deepcopy(sentences)
labels_new = [[(label, text) for text, label in sent_labels] for sent_labels in labels]
pred_transformers_ner_new = [[(label, text) for text, label in sent_preds] for sent_preds in pred_transformers_ner]


result = muc.evaluate_all(pred_transformers_ner_new, labels_new, sentences_copy)


 NER evaluation scores:
  strict mode, Precision=0.4177, Recall=0.3239, F1:0.3527
   exact mode, Precision=0.4342, Recall=0.3325, F1:0.3634
 partial mode, Precision=0.4604, Recall=0.3545, F1:0.3859
    type mode, Precision=0.4452, Recall=0.3480, F1:0.3768


## Intent Detection

In data mining, intent mining (or intention mining) is the problem of determining a user's intention from logs of their interaction with a computer system, such as a search engine. Intent detection identifies and categorizes what a user wants when they type or speak to a conversational agent (or a search engine).

![https://d33wubrfki0l68.cloudfront.net/32e2326762c75a0357ab1ae1976a60d4bbce724b/f4ac0/static/a5878ba6b0e4e77163dc07d07ecf2291/2b6c7/intent-classification-normal.png](https://d33wubrfki0l68.cloudfront.net/32e2326762c75a0357ab1ae1976a60d4bbce724b/f4ac0/static/a5878ba6b0e4e77163dc07d07ecf2291/2b6c7/intent-classification-normal.png)

In this section, you will use the [ATIS dataset](https://github.com/yvchen/JointSLU) (see also https://www.kaggle.com/siddhadev/atis-dataset-clean/home).

The task is to classify the intent of a sentence. The dataset is split into train, validation and test sets. **Use the provided splits** to train and evaluate your models.


In [35]:
!wget https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P4/IntentDetection/atis.train.csv
!wget https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P4/IntentDetection/atis.dev.csv
!wget https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P4/IntentDetection/atis.test.csv

--2023-11-16 17:27:48--  https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P4/IntentDetection/atis.train.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 838864 (819K) [text/plain]
Saving to: 'atis.train.csv.1'


2023-11-16 17:27:49 (15.8 MB/s) - 'atis.train.csv.1' saved [838864/838864]

--2023-11-16 17:27:49--  https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P4/IntentDetection/atis.dev.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 112033 (109K) [

### **Question 5: Two-step classification model**

Train a classification model to identify the intent of a given sentence. The model must use a pretrained BERT model to generate sentence embeddings (important: **no fine-tuning**) and then use those embeddings to perform classification.

Once the embeddings are extracted, you can use any classifier you want. For example, you can use a linear classifier (e.g. Logistic Regression) or a neural network (e.g. MLP). For convenience, you can use the `sklearn` library for training the classifier (https://scikit-learn.org/stable/supervised_learning.html).

![https://github.com/MorenoLaQuatra/DeepNLP/blob/main/practices/P4/IntentDetection/no_finetuning.png?raw=true](https://github.com/MorenoLaQuatra/DeepNLP/blob/main/practices/P4/IntentDetection/no_finetuning.png?raw=true)


Assess the performance of the trained model (the model on top of BERT) on the test set by using the **classification accuracy**, **precision**, **recall** and **F1-score**. You can use the `sklearn` library for computing the metrics (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics).


Note: you can use the `sentence-transformers` library to generate sentence embeddings (https://www.sbert.net/docs/pretrained_models.html).
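
For reference, the requested metrics can be computed from raw counts as in the sketch below. It is a plain-Python illustration on a hand-written toy example, not a replacement for `sklearn.metrics`; the helper name `classification_scores` is hypothetical. A class that is never predicted has an undefined precision, which is reported here as 0.0 (scikit-learn does the same by default and emits a warning).

```python
from collections import Counter

def classification_scores(y_true, y_pred):
    """Accuracy plus per-class (precision, recall, F1) from raw counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1
            fn[truth] += 1

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        per_class[label] = (precision, recall, f1)

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, per_class

# Toy example written by hand for illustration.
y_true = ["atis_flight", "atis_flight", "atis_airfare", "atis_abbreviation"]
y_pred = ["atis_flight", "atis_flight", "atis_flight", "atis_abbreviation"]

accuracy, per_class = classification_scores(y_true, y_pred)
macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(per_class)
print(accuracy, per_class, macro_f1)
```

The toy example also shows why accuracy alone can be misleading on an imbalanced dataset: three of four predictions are correct, yet one class is never recognised.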

In [36]:
%%capture
!pip install sentence-transformers
!pip install scikit-learn

In [43]:
# your code here
from sentence_transformers import SentenceTransformer
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder
import pandas as pd
from sklearn.neural_network import MLPClassifier

In [45]:
train_df = pd.read_csv("atis.train.csv")
test_df = pd.read_csv("atis.test.csv")
dev_df = pd.read_csv("atis.dev.csv")

print (train_df.head())


train_sentences = train_df["tokens"].tolist()
test_sentences = test_df["tokens"].tolist()
dev_sentences = dev_df["tokens"].tolist()

train_sentences = [s.replace("BOS ", "") for s in train_sentences]
test_sentences = [s.replace("BOS ", "") for s in test_sentences]
dev_sentences = [s.replace("BOS ", "") for s in dev_sentences]

train_sentences = [s.replace(" EOS", "") for s in train_sentences]
test_sentences = [s.replace(" EOS", "") for s in test_sentences]
dev_sentences = [s.replace(" EOS", "") for s in dev_sentences]

            id                                             tokens  \
0  train-00001  BOS what is the cost of a round trip flight fr...   
1  train-00002  BOS now i need a flight leaving fort worth and...   
2  train-00003  BOS i need to fly from kansas city to chicago ...   
3  train-00004         BOS what is the meaning of meal code s EOS   
4  train-00005  BOS show me all flights from denver to pittsbu...   

                                               slots             intent  
0  O O O O O O O B-round_trip I-round_trip O O B-...       atis_airfare  
1  O O O O O O O B-fromloc.city_name I-fromloc.ci...        atis_flight  
2  O O O O O O B-fromloc.city_name I-fromloc.city...        atis_flight  
3  O O O O O O B-meal_code I-meal_code I-meal_code O  atis_abbreviation  
4  O O O O O O B-fromloc.city_name O B-toloc.city...        atis_flight  


In [48]:
# Data encoding
BERT_MODEL_NAME = "stsb-mpnet-base-v2"
bert_model = SentenceTransformer(BERT_MODEL_NAME)

X_train = bert_model.encode(train_sentences, show_progress_bar=True)
X_test = bert_model.encode(test_sentences, show_progress_bar=True)
X_dev = bert_model.encode(dev_sentences, show_progress_bar=True)

(…)5066b519ab1e99c3f54a0594e/.gitattributes: 100%|██████████| 868/868 [00:00<00:00, 2.01MB/s]
(…)9ab1e99c3f54a0594e/1_Pooling/config.json: 100%|██████████| 190/190 [00:00<00:00, 1.12MB/s]
(…)82d245066b519ab1e99c3f54a0594e/README.md: 100%|██████████| 3.67k/3.67k [00:00<00:00, 19.9MB/s]
(…)d245066b519ab1e99c3f54a0594e/config.json: 100%|██████████| 588/588 [00:00<00:00, 4.35MB/s]
(…)a0594e/config_sentence_transformers.json: 100%|██████████| 122/122 [00:00<00:00, 522kB/s]
pytorch_model.bin: 100%|██████████| 438M/438M [00:17<00:00, 25.2MB/s] 
(…)e99c3f54a0594e/sentence_bert_config.json: 100%|██████████| 52.0/52.0 [00:00<00:00, 202kB/s]
(…)b1e99c3f54a0594e/special_tokens_map.json: 100%|██████████| 239/239 [00:00<00:00, 1.28MB/s]
(…)5066b519ab1e99c3f54a0594e/tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 1.16MB/s]
(…)9ab1e99c3f54a0594e/tokenizer_config.json: 100%|██████████| 1.19k/1.19k [00:00<00:00, 5.25MB/s]
(…)82d245066b519ab1e99c3f54a0594e/vocab.txt: 100%|██████████| 232k/232k [

In [49]:
train_labels = train_df["intent"].tolist()
test_labels = test_df["intent"].tolist()
dev_labels = dev_df["intent"].tolist()

def label_encoding(labels, le):
    # map intent names to integer ids with an already fitted LabelEncoder
    return le.transform(labels)

# the encoder is fitted on the intents seen in the training split
atis_labels = list(set(train_labels))

le = LabelEncoder()
le.fit(atis_labels)

train_y = label_encoding(train_labels, le)
test_y = label_encoding(test_labels, le)
dev_y = label_encoding(dev_labels, le)
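
`LabelEncoder` stores its classes in sorted order, so the integer ids that appear in the classification reports can be mapped back to intent names. It also raises an error in `transform` when it meets a label it was not fitted on, which is worth checking since the encoder above is fitted on the training split only. The plain-Python sketch below mirrors both points; the helper names and the toy label lists are made up for illustration.

```python
def build_label_index(train_labels):
    """Map each intent name to an integer id, in sorted order of the names."""
    return {name: idx for idx, name in enumerate(sorted(set(train_labels)))}

def unseen_labels(train_labels, other_labels):
    """Intent names that occur in another split but not in the training split."""
    return sorted(set(other_labels) - set(train_labels))

# Toy label lists written by hand for illustration.
train = ["atis_flight", "atis_airfare", "atis_flight", "atis_abbreviation"]
test = ["atis_flight", "atis_meal"]

label_index = build_label_index(train)
missing = unseen_labels(train, test)
print(label_index)
print(missing)
```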

In [52]:
def mlp_training(X, y):
    # Create an MLP classifier with scikit-learn's default hyperparameters
    clf = MLPClassifier()

    # Fit the classifier using the training data
    clf.fit(X, y)
    return clf

model = mlp_training(X_train, train_y)


# Validation step

def validation(model, X, y, test_type):
    # Predict the labels
    y_pred = model.predict(X)

    # Count the number of correct predictions
    n_correct = sum(pred == truth for pred, truth in zip(y_pred, y))

    print(f"Predicted {n_correct} correctly out of {len(y)} {test_type} examples")

validation(model, X_dev, dev_y, "dev")
validation(model, X_test, test_y, "test")

print("\n\n Dev Classification Report:")

y_true, y_pred = dev_y, model.predict(X_dev)
print(classification_report(y_true, y_pred))

print("\n\n Test Classification Report:")

y_true, y_pred = test_y, model.predict(X_test)
print(classification_report(y_true, y_pred))

Predicted 547 correctly out of 572 dev examples
Predicted 561 correctly out of 586 test examples


 Dev Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.76      0.84        17
           1       1.00      0.88      0.93         8
           2       0.90      0.93      0.91        46
           3       0.92      0.69      0.79        16
           4       1.00      0.67      0.80         3
           5       1.00      1.00      1.00         4
           6       1.00      0.50      0.67         2
           7       1.00      1.00      1.00         3
           8       0.97      0.99      0.98       423
           9       0.75      1.00      0.86         3
          10       0.00      0.00      0.00         2
          11       0.86      0.86      0.86         7
          12       1.00      1.00      1.00         2
          13       0.96      0.96      0.96        28
          14       0.00      0.00      0.00         1
        



### **Question 6: Finetuning end-to-end classification model**

Another approach is to fine-tune the BERT model for the classification task. A classification head is added on top of the pretrained BERT model and trained end-to-end together with it.
This approach is typically more effective than the previous one because the encoder itself is adapted to the task. However, it requires more training time and resources.

Train a new BERT model for the task of [sequence classification](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForSequenceClassification) (including BERT fine-tuning).  

![https://github.com/MorenoLaQuatra/DeepNLP/blob/main/practices/P4/IntentDetection/finetuning.png?raw=true](https://github.com/MorenoLaQuatra/DeepNLP/blob/main/practices/P4/IntentDetection/finetuning.png?raw=true)

Assess the performance of the generated model by using the same metrics used in the previous question.

Which model has better performance? Why?

In [None]:
# your code here
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# RoBERTa checkpoint with a classification head sized to the number of training intents
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=len(set(train_y)))
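
The cell above stops after loading the model. As a rough sketch of how the fine-tuning step could continue, the code below runs a plain PyTorch training loop over mini-batches. Everything in it is an assumption made for illustration: the function names, the `bert-base-uncased` checkpoint, and the hyperparameters (3 epochs, batch size 16, learning rate 2e-5) are not taken from the assignment, and the loop omits shuffling, validation and learning-rate scheduling.

```python
def build_label_maps(labels):
    """Build name -> id and id -> name dictionaries for the intent labels."""
    names = sorted(set(labels))
    label2id = {name: idx for idx, name in enumerate(names)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label

def fine_tune(sentences, labels, model_name="bert-base-uncased",
              epochs=3, batch_size=16, lr=2e-5):
    """Fine-tune an encoder with a classification head (illustrative sketch)."""
    # heavy imports are kept inside the function so the helper above
    # can be used without loading torch or transformers
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    label2id, id2label = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(label2id), id2label=id2label, label2id=label2id
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for start in range(0, len(sentences), batch_size):
            batch_text = sentences[start:start + batch_size]
            batch_y = torch.tensor(
                [label2id[name] for name in labels[start:start + batch_size]],
                device=device,
            )
            enc = tokenizer(batch_text, padding=True, truncation=True,
                            return_tensors="pt").to(device)
            # the model returns the cross-entropy loss when labels are passed
            loss = model(**enc, labels=batch_y).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return tokenizer, model, id2label

label2id, id2label = build_label_maps(["atis_flight", "atis_airfare", "atis_flight"])
print(label2id, id2label)
```

A call such as `fine_tune(train_sentences, train_labels)` would update both the encoder weights and the classification head, in contrast to the two-step model of Question 5, where the encoder stays frozen.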