# Practice exercises after Lecture 5
This notebook contains the practice exercise with instructions and explanations.

Work through the cells below in sequential order, executing each cell as you progress. Throughout the notebook, you will encounter instructions marked with the words **YOUR CODE HERE** followed by **raise NotImplementedError()**. You will have to substitute  *raise NotImplementedError()* with your own code.
Follow the instructions and write the code to complete the tasks.

Along the way, you will also find questions. Try to reflect on the questions before/after running the code. 


You will get familiar with HuggingFace, a "platform where the machine learning community collaborates on models, datasets, and applications".  [[cit. HuggingFace main page]](https://huggingface.co/).

You will learn how to finetune a transformer-based model for a binary text classification task using a pretrained model available on HuggingFace. Specifically, you will finetune a transformer to classify movie reviews from the [IMDB dataset](https://huggingface.co/datasets/imdb) as either positive or negative.

Moreover, the goal of this exercise is to learn how to use HuggingFace for fast model prototyping, which allows you to quickly experiment with different architectures, adapt existing pretrained models, and create baseline models for custom tasks (for example, the Mini-Project problem).


This notebook was developed at the [Idiap Research Institute](https://www.idiap.ch) by [Alina Elena Baia](mailto:alina.baia.idiap.ch), [Darya Baranouskaya](mailto:darya.baranouskaya.idiap.ch) and [Olena Hrynenko](mailto:olena.hrynenko.idiap.ch) (equal contribution). This notebook is based on the [HuggingFace fine-tuning tutorial](https://huggingface.co/docs/transformers/training#fine-tune-a-pretrained-model). Any reproduction or distribution of this document, in whole or in part, is prohibited unless permission is granted by the authors.
<!--
SPDX-FileCopyrightText: Copyright (c) 2024 Idiap Research Institute <contact@idiap.ch>
SPDX-FileContributor: Alina Elena Baia <alina.baia.idiap.ch>
SPDX-FileContributor: Darya Baranouskaya <darya.baranouskaya.idiap.ch>
SPDX-FileContributor: Olena Hrynenko <olena.hrynenko.idiap.ch>
-->

**Why use a pretrained model?**

"There are significant benefits to using a pretrained model. It reduces computation costs, your carbon footprint, and allows you to use state-of-the-art models without having to train one from scratch. Huggingface Transformers provides access to thousands of pretrained models for a wide range of tasks. When you use a pretrained model, you train it on a dataset specific to your task. This is known as fine-tuning, an incredibly powerful training technique. " [[cit. HuggingFace fine-tuning tutorial]](https://huggingface.co/docs/transformers/training#fine-tune-a-pretrained-model).

This notebook uses [BERT-tiny model](https://github.com/google-research/bert/blob/master/README.md), named "google/bert_uncased_L-2_H-128_A-2". We use it for teaching purposes as it requires fewer computational resources and it allows fast training on CPU.

NOTE: The code can be easily adapted to use more performant models by changing the model name to another one from the list of [available models](https://huggingface.co/docs/transformers/model_doc/auto) (e.g. DistilBERT or BERT). However, GPU access and additional training time are needed when using bigger models.


In this exercise, you will:

  - 5.1 Get familiar with HuggingFace datasets and tokenizers.

  - 5.2 Load and customize a HuggingFace pretrained transformer for text classification.

  - 5.3 Fine-tune and evaluate the transformer.

  - 5.4 Use the fine-tuned model for inference.

In [None]:
import getpass
import os
import re
import torch

# For efficient usage of the hardware resources when running on JupyterHub EPFL,
# we will limit the number of threads. If you are running this code on your local
# machine, the following code will not do anything.

if re.search('^https://gnoto\\.epfl\\.ch$', os.environ.get("EXTERNAL_URL", "")) is not None:
    num_threads_limit = 4
else:
    num_threads_limit = torch.get_num_threads()
print(f"Limiting the number of threads to {num_threads_limit}")
torch.set_num_threads(num_threads_limit)
print(f"PyTorch is using {torch.get_num_threads()} threads")

_ = torch.set_flush_denormal(True) # To avoid long training time on CPU

# Dataset and tokenization

  - 5.1 Get familiar with HuggingFace datasets and tokenizers.


Next, prepare the IMDB dataset. You can use the HuggingFace dataset library to:
 - load a dataset from the HuggingFace Hub: "Over 1,000 datasets for many NLP tasks like text classification, question answering, language modeling, etc, are provided on the HuggingFace Hub and can be viewed and explored online with the 🤗 Datasets viewer." [[cit. HuggingFace loading datasets]](https://huggingface.co/docs/datasets/v1.11.0/loading_datasets.html). Check [this link](https://huggingface.co/datasets/viewer/) if you want to explore a dataset using a user-friendly interface.
 - load a dataset from local files: "Generic loading scripts are provided for: CSV files (with the csv script), JSON files (with the json script), text files (read as a line-by-line dataset with the text script), pandas pickled dataframe (with the pandas script)" [[cit. HuggingFace loading datasets]](https://huggingface.co/docs/datasets/v1.11.0/loading_datasets.html). For example and future reference, you can load a dataset from a local .csv file into a dataset object with the following code:
 ```
 from datasets import load_dataset
 imdb = load_dataset('csv', data_files=["<your-path-to-dataset>/IMDB-Dataset.csv"])
 ```


 In this notebook, you will load the IMDB dataset from the HuggingFace Hub, and you will also learn how to adapt the dataset to be used with native PyTorch. On Gnoto, the dataset is already downloaded for you in the "/EE559-shared/huggingface/imdb" directory, so instead of `load_dataset` you can use the alternative function `load_from_disk`.

 IMDB "is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well." [[cit. HuggingFace IMDB dataset]](https://huggingface.co/datasets/imdb)


In [None]:
# load the dataset from the existing directory using the HuggingFace function load_from_disk
from datasets import load_from_disk
imdb = load_from_disk("/EE559-shared/huggingface/imdb")

Let us explore the dataset and its content.

"What is in the Dataset object?

The datasets.Dataset object that you get when you execute for instance the following commands:

```
from datasets import load_dataset
dataset = load_dataset("imdb")
```
It behaves like a normal python container. You can query its length, get rows, columns and also a lot of metadata on the dataset (description, citation, split sizes, etc)." [[cit. HuggingFace What’s in the Dataset object]](https://huggingface.co/docs/datasets/v1.11.0/exploring.html)


In [None]:
# check the dataset structure
imdb.pop('unsupervised') #we will not need this part of the dataset for this exercise
print(imdb)

NOTE: the dataset does not come with a validation split. As the goal of this exercise is to learn the process of finetuning HuggingFace models and not to find the best model that solves the task, we will not be using a validation set. However, keep in mind that for real-world problems using a validation set is important because it gives "an unbiased estimate of the skill of the final tuned model when comparing or selecting between final models" [[cit. Machine Learning Mastery Difference Between Test and Validation]](https://machinelearningmastery.com/difference-test-validation-datasets/).
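If you later want a validation set, one simple option is to hold out a random part of the training split. The sketch below only illustrates the idea with plain Python indices; the helper `split_indices` and the 10% fraction are assumptions made for this example, not part of the exercise.

```python
import random

def split_indices(num_examples, validation_fraction=0.1, seed=42):
    """Return (train_indices, validation_indices) for a reproducible random split."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # seeded shuffle, so the split is repeatable
    num_validation = round(num_examples * validation_fraction)
    return indices[num_validation:], indices[:num_validation]

# 25,000 is the size of the IMDB training split quoted above.
train_idx, val_idx = split_indices(25_000, validation_fraction=0.1)
print(len(train_idx), len(val_idx))
```

The resulting index lists could then be passed to the dataset's `select` method. The `datasets` library also offers a `train_test_split` method on a `Dataset`, which is likely the more convenient route in practice.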



In [None]:
# check the size of the dataset
# like this
print(imdb.shape)

#or like this
print(len(imdb["train"]["text"]), len(imdb["train"]["label"]))
print(len(imdb["test"]["text"]), len(imdb["test"]["label"]))

In [None]:
# take a look at an example
imdb["train"][0]

There are two fields in this dataset: [[cit. HuggingFace Text classification]](https://huggingface.co/docs/transformers/tasks/sequence_classification).
 - text: the movie review text.
 - label: a value that is either 0 for a negative review or 1 for a positive review.


In [None]:
# take a look at an example
print("review: ", imdb["train"][0]["text"])
print("label: ", imdb["train"][0]["label"])

print("review: ", imdb["train"][-1]["text"])
print("label: ", imdb["train"][-1]["label"])

print(len(imdb["train"]["label"]), imdb["train"]["label"][:10],imdb["train"]["label"][-10:] )

In [None]:
#check class distribution in training and testing set

from collections import Counter
print(Counter(imdb["train"]["label"]))

print(Counter(imdb["test"]["label"]))

Next, load the BERT-tiny tokenizer to preprocess the text data and create a preprocessing function that tokenizes the text and truncates sequences to be no longer than the model's maximum input length [[cit. HuggingFace Text classification]](https://huggingface.co/docs/transformers/tasks/sequence_classification).



"In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you are supplying to the from_pretrained()
method. AutoClasses are here to do this job for you so that you automatically
retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary." [[cit. HuggingFace Auto Classes]](https://huggingface.co/docs/transformers/model_doc/auto).


Details about the tokenizer class can also be found on the [HuggingFace tokenizer](https://huggingface.co/transformers/v2.11.0/main_classes/tokenizer.html) page:

- model_max_length (int): "the maximum length in number of tokens
  for the inputs to the transformer model. When the tokenizer is loaded with
  from_pretrained, this will be set to the value stored for the associated
  model in max_model_input_sizes (see above).
  If no value is provided, will default to VERY_LARGE_INTEGER (int(1e30)).
  no associated max_length can be found in max_model_input_sizes." [[cit. HuggingFace tokenizer]](https://huggingface.co/transformers/v2.11.0/main_classes/tokenizer.html)





In [None]:
from transformers import AutoTokenizer

#  Instantiating  AutoTokenizer will directly create a class of the relevant architecture.
tokenizer = AutoTokenizer.from_pretrained("google/bert_uncased_L-2_H-128_A-2", model_max_length=512)


In [None]:
#explore the tokenizer

tokenizer


Why use padding and truncation?
"Batched inputs are often different lengths, so they can’t be converted to fixed-size tensors. Padding and truncation are strategies for dealing with this problem, to create rectangular tensors from batches of varying lengths. Padding adds a special padding token to ensure shorter sequences will have the same length as either the longest sequence in a batch or the maximum length accepted by the model. Truncation works in the other direction by truncating long sequences." [[cit. HuggingFace Padding and truncation]](https://huggingface.co/docs/transformers/en/pad_truncation#padding-and-truncation)

More information about padding and truncation is available at [HuggingFace Padding and truncation](https://huggingface.co/docs/transformers/en/pad_truncation#padding-and-truncation) page.
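To make the two strategies concrete, here is a minimal pure-Python sketch of what padding to a fixed length with truncation does to a single sequence of token ids. It is not the tokenizer's actual implementation: the helper `pad_or_truncate` is hypothetical, and the special-token ids (101 for [CLS], 102 for [SEP], 0 for [PAD]) are assumed to be those of the uncased BERT vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # assumed ids of [CLS], [SEP], [PAD]

def pad_or_truncate(token_ids, max_length):
    """Mimic padding to max_length with truncation for one sequence (max_length >= 2)."""
    # Truncation: keep room for the two special tokens [CLS] and [SEP].
    content = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + content + [SEP_ID]
    # Attention mask: 1 for real tokens, 0 for padding.
    attention_mask = [1] * len(input_ids)
    # Padding: append [PAD] tokens until the sequence reaches max_length.
    num_padding = max_length - len(input_ids)
    input_ids += [PAD_ID] * num_padding
    attention_mask += [0] * num_padding
    return input_ids, attention_mask

short_ids, short_mask = pad_or_truncate([7, 8, 9], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(1000, 1010)), max_length=6)
print(short_ids, short_mask)  # the short sequence gets one padding token
print(long_ids, long_mask)    # the long sequence is cut to four content tokens
```

Both outputs have exactly `max_length` entries, which is what allows a batch of reviews to be stacked into one rectangular tensor.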


In [None]:
def tokenize_function(examples):
    '''
     padding: 'max_length': pad to a length specified by the max_length argument or the
     maximum length accepted by the model if no max_length is provided (max_length=None).
     Padding will still be applied if you only provide a single sequence. [from documentation]


     truncation: True or 'longest_first': truncate to a maximum length specified
     by the max_length argument or the maximum length accepted by the model if
     no max_length is provided (max_length=None). This will truncate the token by
     token, removing a token from the longest sequence in the pair until the
     proper length is reached. [from documentation]
    '''
    return tokenizer(examples["text"], padding="max_length", truncation=True)

'''
 apply the function to all the elements in the dataset (individually or in batches)
 https://huggingface.co/docs/datasets/v1.11.0/package_reference/main_classes.html?highlight=dataset%20map#datasets.Dataset.map
 batch mode is very powerful. It allows you to speed up processing
 more info here: https://huggingface.co/docs/datasets/en/about_map_batch
'''
# Paths to the local cache files where the tokenized splits will be stored.
# Caching saves RAM when working with large datasets and avoids redoing the transformation on the fly.
# "~" is expanded explicitly and the directory is created, since it may not exist yet.
cache_dir = os.path.expanduser("~/.cache/imdb")
os.makedirs(cache_dir, exist_ok=True)
cache_files = {
    "train": os.path.join(cache_dir, "imdb_train_tokenized.arrow"),
    "test": os.path.join(cache_dir, "imdb_test_tokenized.arrow"),
}
tokenized_imdb = imdb.map(tokenize_function, batched=True, cache_file_names=cache_files)

Explore the tokenized dataset and check the differences in structure with respect to the original dataset.

In [None]:
# explore the tokenized dataset

# print the tokenized dataset
print(tokenized_imdb)


Now inspect a single sample of the tokenized dataset.

Can you spot the [CLS] and [SEP] tokens in the tokenized reviews?

In [None]:
# print the first review of the tokenized dataset
# YOUR CODE HERE
raise NotImplementedError()


# print the label of the first review of the tokenized dataset
# YOUR CODE HERE
raise NotImplementedError()

# print the tokens of the first review of the tokenized dataset
# Can you spot the [CLS] and [SEP] tokens in the tokenized review?
# YOUR CODE HERE
raise NotImplementedError()


# print the attention_mask of the first review of the tokenized dataset
# YOUR CODE HERE
raise NotImplementedError()



Look at the token_type_ids of a sample.
What do you notice?

Check [HuggingFace Glossary](https://huggingface.co/docs/transformers/en/glossary) for the explanation.

In [None]:
# print the token_type_ids of the first review of the tokenized dataset
# YOUR CODE HERE
raise NotImplementedError()

So far, you used the HuggingFace dataset object to explore the dataset. We need to do some dataset manipulation in order to make the dataset compatible with native PyTorch.

Therefore, manually postprocess `tokenized_imdb` to prepare it for training with PyTorch, following these steps:
 -  "Remove the text column because the model does not accept raw text as an input"
 - "Rename the label column to labels because the model expects the argument to be named labels"
 - "Set the format of the dataset to return PyTorch tensors instead of lists"

  [[cit. HuggingFace Train in Native PyTorch]](https://huggingface.co/docs/transformers/training#train-in-native-pytorch)

In [None]:
# before any manual postprocessing
print(tokenized_imdb)
print(type(tokenized_imdb), type(tokenized_imdb["train"]["label"]))

In [None]:
# Remove the text column because the model does not accept raw text as an input

tokenized_imdb = tokenized_imdb.remove_columns(["text"])
print(tokenized_imdb)
print(type(tokenized_imdb), type(tokenized_imdb["train"]["label"]))

In [None]:
#Rename the label column to labels because the model expects the argument to be named labels

tokenized_imdb = tokenized_imdb.rename_column("label", "labels")
print(tokenized_imdb)
print(type(tokenized_imdb), type(tokenized_imdb["train"]["labels"]))

In [None]:
# Set the format of the dataset to return PyTorch tensors instead of lists

tokenized_imdb.set_format("torch")
print(tokenized_imdb)
print(type(tokenized_imdb), type(tokenized_imdb["train"]["labels"]))

The dataset has 25k samples for training and 25k for testing. Using the entire training set will take a long time to finish the fine-tuning step. Therefore, for teaching purposes, we will "create a smaller subset of the original dataset to speed up the fine-tuning" [[cit. HuggingFace Train in Native PyTorch]](https://huggingface.co/docs/transformers/training#train-in-native-pytorch)

In [None]:
# create a smaller subset of the dataset as previously shown to speed up the fine-tuning

small_train_dataset = tokenized_imdb["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_imdb["test"].shuffle(seed=42).select(range(1000))

Then, "create a DataLoader for your training and test datasets so you can iterate over batches of data." [[cit. HuggingFace Train in Native PyTorch]](https://huggingface.co/docs/transformers/training#train-in-native-pytorch)

In [None]:
# create a DataLoader for your training and test datasets so you can iterate over batches of data:
from torch.utils.data import DataLoader

train_dataloader = DataLoader(small_train_dataset, shuffle=True, batch_size=8)
eval_dataloader = DataLoader(small_eval_dataset, batch_size=8)

In [None]:
# get some random training samples
dataiter = iter(train_dataloader)
batch_data= next(dataiter)

print(batch_data)

# NOTICE: the CLS token has id 101 and the SEP token has id 102

# Model and optimizer

  - 5.2 Load and customize a HuggingFace pretrained transformer for text classification.



"Before you start training your model, create a map of the expected ids to their labels with id2label and label2id." [[cit. HuggingFace Text classification]](https://huggingface.co/docs/transformers/tasks/sequence_classification).

Read [HuggingFace Discussion: "Change label names on inference API"](https://discuss.huggingface.co/t/change-label-names-on-inference-api/3063) for more information on changing the label names.

In [None]:
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}

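Load the model with the number of expected labels using Auto classes. Note that the checkpoint loaded below, `google/bert_uncased_L-4_H-128_A-2`, is a 4-layer variant of BERT-tiny (hence the variable name `model_bert_l4`), whereas the tokenizer above was loaded from the 2-layer `google/bert_uncased_L-2_H-128_A-2` checkpoint.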

Documentation of BERT-tiny available [here](https://github.com/google-research/bert/blob/master/README.md).


In [None]:
from transformers import AutoModelForSequenceClassification

'''
BERT-tiny license
Copyright 2018 The Google AI Language Team Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''


model_bert_l4 = AutoModelForSequenceClassification.from_pretrained(
    "google/bert_uncased_L-4_H-128_A-2", num_labels=2, id2label=id2label, label2id=label2id)



What to do about this warning message: *Some weights ... were not initialized from the model checkpoint ... newly initialized* ?

"The warning is telling you that some weights were randomly initialized (here you classification head), which is normal since you are instantiating a pretrained model for a different task. It’s there to remind you to finetune your model (it’s not usable for inference directly)" [[cit. HuggingFace Discussion: "Some weights of the model were not used warning"]](https://discuss.huggingface.co/t/is-some-weights-of-the-model-were-not-used-warning-normal-when-pre-trained-bert-only-by-mlm/5672)

More discussions on this warning here:

[GitHub Discussion: "What to do about this warning message: "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification"](https://github.com/huggingface/transformers/issues/5421)

[Stack Overflow Discussion: "Python: BERT Error - Some weights of the model checkpoint at were not used when initializing BertModel"](https://stackoverflow.com/questions/67546911/python-bert-error-some-weights-of-the-model-checkpoint-at-were-not-used-when)

NOTE: If you have GPU access and time to wait for the training to finish, you can use  [DistilBERT](https://arxiv.org/pdf/1910.01108v4.pdf), a variant of BERT.

According to the paper: *While most prior work investigated the use of
distillation for building task-specific models, we leverage knowledge distillation during the pretraining phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster.*

```
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id)
```
Finetuning DistilBERT on the 25k training samples of IMDB for 2 epochs yields an accuracy of about 92%, but it takes around 40 minutes of training on a T4 GPU.

In [None]:
print("bert-tiny variant number of parameters: ", model_bert_l4.num_parameters())


You need to define the optimizer and learning rate scheduler to fine-tune the model. Let’s use the AdamW optimizer from PyTorch:

AdamW optimizer "implements gradient bias correction as well as weight decay. The optimizer allows us to apply different hyperpameters for specific parameter groups. For example, we can apply weight decay to all parameters other than bias and layer normalization terms." [[cit. HuggingFace transformers training]](https://huggingface.co/transformers/v3.5.1/training.html)

Why use AdamW? From the [paper](https://arxiv.org/abs/1711.05101): "We empirically showed that our version of Adam with decoupled weight decay yields substantially better generalization performance than the common implementation of Adam with L2 regularization." "Our proposed decoupled weight decay has
already been adopted by many researchers, and the community has implemented
it in TensorFlow and PyTorch."


A separate study also reports that "fine-tuning with AdamW performs substantially better than SGD on modern Vision Transformer and ConvNeXt models."
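To see what "decoupled weight decay" means in practice, the following sketch performs one AdamW update on a single scalar parameter in plain Python. It follows the usual textbook form of the update and is meant as an illustration under that assumption, not as a copy of the PyTorch implementation; the function name `adamw_step` and the numbers in the example are made up.

```python
import math

def adamw_step(param, grad, m, v, step, lr=5e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (illustrative only)."""
    # Decoupled weight decay: shrink the parameter directly, independent of the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    # Adam step using the bias-corrected statistics.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# A large learning rate is used here only to make the effect visible.
new_param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1, lr=0.1)
print(new_param)
```

In Adam with L2 regularization, the decay term would instead be added to `grad` and then rescaled by the adaptive denominator; here it acts on the parameter directly.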


In [None]:
from torch.optim import AdamW

optimizer = AdamW(model_bert_l4.parameters(), lr=5e-5)

Define a learning rate scheduler.

"One commonly used technique for training a Transformer is learning rate warm-up. This means that we gradually increase the learning rate from 0 on to our originally specified learning rate in the first few iterations. Thus, we slowly start learning instead of taking very large steps from the beginning. In fact, training a deep Transformer without learning rate warm-up can make the model diverge and achieve a much worse performance on training and testing." [[cit. UvA DL - Transformers and Multi-Head Attention ]](https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial6/Transformers_and_MHAttention.html)

[Stack Overflow Discussion: "What does "learning rate warm-up" mean?"](https://stackoverflow.com/questions/55933867/what-does-learning-rate-warm-up-mean).


What is learning rate warm-up and when do we need it?
"If your data set is highly differentiated, you can suffer from a sort of "early over-fitting". If your shuffled data happens to include a cluster of related, strongly-featured observations, your model's initial training can skew badly toward those features -- or worse, toward incidental features that aren't truly related to the topic at all.

Warm-up is a way to reduce the primacy effect of the early training examples. Without it, you may need to run a few extra epochs to get the convergence desired, as the model un-trains those early superstitions." [[cit. Stack Overflow Discussion: Optimizer and scheduler for BERT fine-tuning]](https://stackoverflow.com/questions/60120043/optimizer-and-scheduler-for-bert-fine-tuning)


Linear scheduler: creates "a schedule with a learning rate that decreases linearly from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer" [[cit. HuggingFace: optimizer_schedules]](https://huggingface.co/docs/transformers/en/main_classes/optimizer_schedules)
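The shape of this schedule can be reproduced with a few lines of plain Python. The sketch below follows the description quoted above; `linear_schedule_lr` is a hypothetical helper, and the 50 warm-up steps are chosen only to make the ramp visible.

```python
import math

def linear_schedule_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate at `step` for linear warm-up followed by linear decay to 0."""
    if step < num_warmup_steps:
        # Warm-up phase: increase linearly from 0 to base_lr.
        factor = step / max(1, num_warmup_steps)
    else:
        # Decay phase: decrease linearly from base_lr to 0.
        remaining = num_training_steps - step
        factor = max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))
    return base_lr * factor

# Illustrative numbers: 1000 training samples, batch size 8, 2 epochs.
steps_per_epoch = math.ceil(1000 / 8)
num_training_steps = 2 * steps_per_epoch
for step in (0, 25, 50, 150, 250):
    lr = linear_schedule_lr(step, base_lr=5e-5, num_warmup_steps=50,
                            num_training_steps=num_training_steps)
    print(step, lr)
```

The learning rate peaks exactly at the end of the warm-up and reaches 0 at the last training step, so changing the number of epochs or the batch size also changes how fast the rate decays.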

In [None]:
from transformers import get_scheduler

num_epochs = 2
num_training_steps = num_epochs * len(train_dataloader)
# feel free to experiment with different num_warmup_steps
lr_scheduler = get_scheduler(
    name="linear", optimizer=optimizer, num_warmup_steps=1, num_training_steps=num_training_steps
)

Specify the device to use: CPU or GPU (if you have access to one). Keep in mind that training on a CPU may take several hours instead of a couple of minutes, depending on the model you are using.

In [None]:
import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model_bert_l4.to(device)

# Training and evaluation

  - 5.3 Fine-tune and evaluate the transformer.





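According to [HuggingFace documentation](https://huggingface.co/docs/transformers/en/training), "you don’t have to pass a loss argument to your models when you compile() them! Hugging Face models automatically choose a loss that is appropriate for their task and model architecture if this argument is left blank. You can always override this by specifying a loss yourself if you want to. " The quoted passage refers to the Keras `compile()` workflow, but the same idea applies to the PyTorch loop below: when the batch contains `labels`, the model's output includes a loss computed with a task-appropriate default.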
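For single-label classification such as this task, the default loss is, to our understanding, the cross-entropy between the logits and the integer class labels. The plain-Python sketch below shows what that quantity measures; the function `cross_entropy` and the logit values are illustrative assumptions, not output of the model.

```python
import math

def cross_entropy(logits_batch, labels):
    """Mean cross-entropy over a batch of logit vectors and integer class labels."""
    losses = []
    for logits, label in zip(logits_batch, labels):
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        # Negative log-probability assigned to the correct class.
        losses.append(log_sum_exp - logits[label])
    return sum(losses) / len(losses)

uninformed = cross_entropy([[0.0, 0.0]], [0])         # equal logits: loss is ln(2)
confident_right = cross_entropy([[-4.0, 4.0]], [1])   # confident and correct: near 0
confident_wrong = cross_entropy([[-4.0, 4.0]], [0])   # confident but wrong: large
print(uninformed, confident_right, confident_wrong)
```

A freshly initialized classification head tends to start close to the "uninformed" case, so a training loss that stays near ln(2) ≈ 0.69 suggests the model has not yet learned to separate the two classes.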

In [None]:
# use the tqdm library to add a progress bar over the number of training steps
from tqdm.auto import tqdm

progress_bar = tqdm(range(num_training_steps))

# put the model in train mode
model_bert_l4.train()

# iterate over epochs
for epoch in range(num_epochs):
    # iterate over batches in training set
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        '''
        **kwargs is a common idiom to allow an arbitrary number of arguments to functions
        The **kwargs will give you all keyword arguments as a dictionary
        https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters
        '''
        

        outputs = model_bert_l4(**batch)
        '''
        Note that Transformers models all have a default task-relevant loss function,
        so you don’t need to specify one unless you want to

        Get the loss from the outputs
        in this example, the outputs are instances of subclasses of ModelOutput
        https://huggingface.co/transformers/v4.3.0/main_classes/output.html
        Those are data structures containing all the information returned by
        the model, but that can also be used as tuples or dictionaries.

        The outputs object has a loss and logits attribute
        You can access each attribute as you would usually do,
        and if that attribute has not been returned by the model, you will get None.
        for instance, outputs.loss is the loss computed by the model
        '''

        # YOUR CODE HERE
        raise NotImplementedError()

        # do the backward pass
        # YOUR CODE HERE
        raise NotImplementedError()

        # perform one step of the optimizer
        # YOUR CODE HERE
        raise NotImplementedError()

        # perform one step of the lr_scheduler, similar to the optimizer
        # YOUR CODE HERE
        raise NotImplementedError()

        # zero the gradients, call zero_grad() on the optimizer
        # YOUR CODE HERE
        raise NotImplementedError()

        progress_bar.update(1)



[HuggingFace Evaluate Metric](https://huggingface.co/evaluate-metric)

[HuggingFace Metrics](https://huggingface.co/docs/datasets/en/how_to_metrics)

To evaluate the model's performance on the test set, you can use the HuggingFace evaluate library. Keep in mind that "instead of calculating and reporting the metric at the end of each epoch, this time you’ll accumulate all the batches with add_batch and calculate the metric at the very end. " [[cit. HuggingFace Train in Native PyTorch]](https://huggingface.co/docs/transformers/training#train-in-native-pytorch).

Metric Card for [accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy).
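The accumulate-then-compute pattern can be illustrated without the library. The class `RunningAccuracy` below is a hypothetical stand-in that mirrors the `add_batch` / `compute` interface; it is a sketch of the idea, not the implementation used by the evaluate library.

```python
class RunningAccuracy:
    """Hypothetical stand-in that accumulates batches, then computes accuracy once."""

    def __init__(self):
        self.num_correct = 0
        self.num_seen = 0

    def add_batch(self, predictions, references):
        # Count matches in this batch and remember how many samples were seen.
        self.num_correct += sum(int(p == r) for p, r in zip(predictions, references))
        self.num_seen += len(references)

    def compute(self):
        # Accuracy over all accumulated samples, not an average of batch accuracies.
        return {"accuracy": self.num_correct / self.num_seen}

running = RunningAccuracy()
running.add_batch(predictions=[1, 0, 1], references=[1, 1, 1])  # 2 of 3 correct
running.add_batch(predictions=[0], references=[0])              # 1 of 1 correct
result = running.compute()
print(result)
```

Accumulating counts matters when batches have different sizes: averaging the two per-batch accuracies here would give a different number than the accuracy over all four samples.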

In [None]:
import evaluate

# define the metric you want to use to evaluate your model
metric = evaluate.load("accuracy")
progress_bar = tqdm(range(len(eval_dataloader)))

# put the model in eval mode
model_bert_l4.eval()
# iterate over batches of evaluation dataset
for batch in eval_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        '''
        pass the batches to the model and get the outputs
        (hint: look at the training loop)
        outputs = ...
        '''
        # YOUR CODE HERE
        raise NotImplementedError()

    
    '''
    get the logits from the outputs,
    similar as you did for the loss in the training loop
    logits = ...
    '''
    # YOUR CODE HERE
    raise NotImplementedError()

    # use argmax to get the predicted class
    predictions = torch.argmax(logits, dim=-1)
    
    '''
    metric.add_batch() accumulates a batch of model predictions together with
    the references (ground-truth labels) they should be evaluated against
    '''
    metric.add_batch(predictions=predictions, references=batch["labels"])
    progress_bar.update(1)
# calculate a metric by  calling metric.compute()
metric.compute()



# Inference
  - 5.4 Use the fine-tuned model for inference.

Write your own positive and negative movie reviews and use the finetuned model to classify them.

Is the model predicting as you would expect?
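When judging the predictions, it can help to look at how confident the model is rather than only at the predicted class. The sketch below converts a pair of logits into probabilities with a softmax and maps the most likely class to its name; the logit values are made up for illustration and are not produced by the model.

```python
import math

id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # same mapping as defined earlier in the notebook

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    max_logit = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [-1.2, 0.8]  # made-up values, not produced by the model
probabilities = softmax(example_logits)
predicted_id = max(range(len(probabilities)), key=probabilities.__getitem__)
print(id2label[predicted_id], round(probabilities[predicted_id], 3))
```

Probabilities close to 0.5 indicate that the model is unsure, which is plausible for a small model trained on only 1000 reviews.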

In [None]:
text_pos = "This movie is awesome."
text_neg = "This was a very bad movie. It was so boring that I fell asleep while watching it."
text_pos_neg = [text_pos, text_neg]

In [None]:

'''
Inference on single review - get the prediction for the positive review. 

first, use the tokenizer to tokenize the input text
inputs = ...

hint: look at the tokenizer definition provided previously
remember to specify the padding and truncation
'''


# YOUR CODE HERE
raise NotImplementedError()

inputs = inputs.to(device)

with torch.no_grad():
    # pass the input to the model
    # outputs = ...
    outputs = model_bert_l4(**inputs)

    # get the logits
    # logits = ...
    # YOUR CODE HERE
    raise NotImplementedError()

#get the predicted id class
predicted_class_id = logits.argmax().item()
print("predicted_class_id: ", predicted_class_id)

# get the predicted class name
print(model_bert_l4.config.id2label[predicted_class_id])

'''
Now, get the prediction for the negative review
first, use the tokenizer to tokenize the input text
inputs = ...
hint: look at the tokenizer definition provided previously
remember to specify the padding and truncation
'''

# YOUR CODE HERE
raise NotImplementedError()

with torch.no_grad():
    '''
    pass the input to the model and get the logits using a single line of code
    also, in the same line of code remember to move the inputs to the correct device
    logits = ...
    '''
    # YOUR CODE HERE
    raise NotImplementedError()

#get the predicted id class
predicted_class_id = logits.argmax().item()
print("predicted_class_id: ", predicted_class_id)

# get the predicted class name
print(model_bert_l4.config.id2label[predicted_class_id])

In [None]:
'''
Inference on multiple reviews - use text_pos_neg list of reviews

first, use the tokenizer to tokenize the input text
inputs = ...

hint: look at the tokenizer definition provided previously
remember to specify the padding and truncation
'''

# YOUR CODE HERE
raise NotImplementedError()

with torch.no_grad():
    '''
    pass the input to the model and get the logits using a single line of code
    also, in the same line of code remember to move the inputs to the correct device
    logits = ...
    '''
    # YOUR CODE HERE
    raise NotImplementedError()

# check out the logits
print("logits: ", logits)

#get the predicted id class
predicted_class_id = logits.argmax(dim=1)
print(predicted_class_id)

# get the predicted class name
for pred in predicted_class_id:
  print(model_bert_l4.config.id2label[pred.item()])

Congratulations, you have reached the end of this exercise!

Additionally, you can try using a larger model, for example DistilBERT, and see how the accuracy changes.