In [None]:
# Transformers installation
! pip install transformers datasets evaluate accelerate
# To install from source instead of the latest release, comment out the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/transformers.git

# Text classification

In [None]:
#@title
from IPython.display import HTML

HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/leNG9fN9FQU?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>')

Text classification is a common NLP task that assigns a label or class to text. Some of the largest companies run text classification in production for a wide range of practical applications. One of the most popular forms of text classification is sentiment analysis, which assigns a label like 🙂 positive, 🙁 negative, or 😐 neutral to a sequence of text.

This guide will show you how to:

1. Finetune [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) on the [IMDb](https://huggingface.co/datasets/imdb) dataset to determine whether a movie review is positive or negative.
2. Use your finetuned model for inference.

<Tip>

To see all architectures and checkpoints compatible with this task, we recommend checking the [task-page](https://huggingface.co/tasks/text-classification).

</Tip>

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install transformers datasets evaluate accelerate
```

We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:

In [None]:
from huggingface_hub import notebook_login

notebook_login()

## Load IMDb dataset

Start by loading the IMDb dataset from the 🤗 Datasets library:

In [None]:
from datasets import load_dataset

imdb = load_dataset("imdb")

Then take a look at an example:

In [None]:
imdb["test"][0]

{
    "label": 0,
    "text": "I love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood. I tried to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original). Silly prosthetics, cheap cardboard sets, stilted dialogues, CG that doesn't match the background, and painfully one-dimensional characters cannot be overcome with a 'sci-fi' setting. (I'm sure there are those of you out there who think Babylon 5 is good sci-fi TV. It's not. It's clichéd and uninspiring.) While US viewers might like emotion and character development, sci-fi is a genre that does not take itself seriously (cf. Star Trek). It may treat important issues, yet not as a serious philosophy. It's really difficult to care about the characters here as they are not simply foolish, just missing a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of Earth KNOW it

There are two fields in this dataset:

- `text`: the movie review text.
- `label`: a value that is either `0` for a negative review or `1` for a positive review.

## Preprocess

The next step is to load a DistilBERT tokenizer to preprocess the `text` field:

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

Create a preprocessing function to tokenize `text` and truncate sequences to be no longer than DistilBERT's maximum input length:

In [None]:
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)
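
To make the effect of `truncation=True` more concrete, here is a minimal pure-Python sketch of what truncating to a maximum length does to a token sequence. It is not the tokenizer's actual implementation: the `truncate_tokens` helper and the placeholder "tokens" are hypothetical, and the limit of 512 is DistilBERT's maximum input length with special tokens counted.

```python
# Hypothetical helper: illustrates truncation, not the real tokenizer code.
MAX_LENGTH = 512  # DistilBERT's maximum input length, special tokens included


def truncate_tokens(tokens, max_length=MAX_LENGTH):
    """Keep at most max_length tokens, reserving 2 slots for [CLS] and [SEP]."""
    body = tokens[: max_length - 2]
    return ["[CLS]"] + body + ["[SEP]"]


long_review = ["word"] * 1000      # stand-in for a long tokenized review
short_review = ["great", "movie"]  # short inputs are left untouched

truncated_long = truncate_tokens(long_review)
truncated_short = truncate_tokens(short_review)

print(len(truncated_long))  # 512
print(truncated_short)      # ['[CLS]', 'great', 'movie', '[SEP]']
```

Only sequences longer than the limit lose tokens; shorter ones keep their original length, which is why padding is still needed later when batching.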

To apply the preprocessing function over the entire dataset, use the 🤗 Datasets [map](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.map) function. You can speed up `map` by setting `batched=True` to process multiple elements of the dataset at once:

In [None]:
tokenized_imdb = imdb.map(preprocess_function, batched=True)

Now create a batch of examples using [DataCollatorWithPadding](https://huggingface.co/docs/transformers/main/en/main_classes/data_collator#transformers.DataCollatorWithPadding). It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.

In [None]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
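
The saving from dynamic padding can be illustrated with a small dependency-free sketch. This is not what `DataCollatorWithPadding` runs internally; the `pad_to` helper, the toy token ids, and the padding id of `0` are assumptions made for the illustration.

```python
# Toy token-id sequences of different lengths (values are arbitrary).
batch = [[101, 7592, 102], [101, 2023, 2003, 1037, 3185, 102], [101, 102]]
PAD_ID = 0        # assumed padding id for this sketch
MAX_LENGTH = 512  # fixed length used by the "pad everything" strategy


def pad_to(sequences, length, pad_id=PAD_ID):
    """Right-pad every sequence with pad_id up to `length`."""
    return [seq + [pad_id] * (length - len(seq)) for seq in sequences]


longest = max(len(seq) for seq in batch)
dynamic = pad_to(batch, longest)   # pad to the longest sequence in the batch
fixed = pad_to(batch, MAX_LENGTH)  # pad to the model's maximum length

dynamic_pad_tokens = sum(row.count(PAD_ID) for row in dynamic)
fixed_pad_tokens = sum(row.count(PAD_ID) for row in fixed)
print(dynamic_pad_tokens, fixed_pad_tokens)  # 7 1525
```

In this toy batch, padding to the longest sequence adds 7 padding tokens, while padding to 512 adds 1525 that the model would have to process for no benefit.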

## Evaluate

Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):

In [None]:
import evaluate

accuracy = evaluate.load("accuracy")

Then create a function that passes your predictions and labels to [compute](https://huggingface.co/docs/evaluate/main/en/package_reference/main_classes#evaluate.EvaluationModule.compute) to calculate the accuracy:

In [None]:
import numpy as np


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
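
To see what `compute_metrics` computes, the following dependency-free sketch reproduces the two steps (argmax over the class dimension, then the fraction of correct predictions) on made-up logits. The `argmax` and `accuracy_from_logits` helpers and the toy values are hypothetical; the real function relies on NumPy and the 🤗 Evaluate accuracy metric.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Mirror compute_metrics: pick the top class per row, then score it."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Four made-up examples with two classes each.
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.8]]
toy_labels = [0, 1, 0, 0]

result = accuracy_from_logits(toy_logits, toy_labels)
print(result)  # {'accuracy': 0.75}
```

The third example is predicted as class `1` but labeled `0`, so three of four predictions are correct.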

## Train

Before you start training your model, create a map of the expected ids to their labels with `id2label` and `label2id`:

In [None]:
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
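
Because the two dictionaries must be exact inverses of each other, one option is to derive `label2id` from `id2label` rather than typing both by hand. This is only a sketch of that idea, not something the guide requires:

```python
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Invert the mapping so the two dictionaries cannot drift apart.
label2id = {label: idx for idx, label in id2label.items()}

# Sanity check: a round trip through both mappings returns the same label.
assert all(id2label[label2id[label]] == label for label in label2id)
print(label2id)  # {'NEGATIVE': 0, 'POSITIVE': 1}
```

The same pattern extends to tasks with more classes, where keeping two hand-written dictionaries in sync is more error-prone.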

<Tip>

If you aren't familiar with finetuning a model with the [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer), take a look at the basic tutorial [here](https://huggingface.co/docs/transformers/main/en/tasks/../training#train-with-pytorch-trainer)!

</Tip>

You're ready to start training your model now! Load DistilBERT with [AutoModelForSequenceClassification](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForSequenceClassification) along with the number of expected labels, and the label mappings:

In [None]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
)

At this point, only three steps remain:

1. Define your training hyperparameters in [TrainingArguments](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments). The only required parameter is `output_dir` which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) will evaluate the accuracy and save the training checkpoint.
2. Pass the training arguments to [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) along with the model, dataset, tokenizer, data collator, and `compute_metrics` function.
3. Call [train()](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.train) to finetune your model.
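
As a rough guide to how long training will run, the number of optimizer steps can be estimated from the dataset size and the hyperparameters. The sketch below assumes a single device, no gradient accumulation, and that the last incomplete batch is kept; `total_optimizer_steps` is a hypothetical helper, and 25,000 is the size of the IMDb training split.

```python
import math


def total_optimizer_steps(num_examples, batch_size, num_epochs):
    """Return (steps per epoch, total steps) for a single-device run."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * num_epochs


# IMDb training split, batch size 16, 2 epochs.
steps_per_epoch, total_steps = total_optimizer_steps(25_000, 16, 2)
print(steps_per_epoch, total_steps)  # 1563 3126
```

With several devices or gradient accumulation, the effective batch size grows and the step count shrinks accordingly.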

In [None]:
training_args = TrainingArguments(
    output_dir="my_awesome_model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_imdb["train"],
    eval_dataset=tokenized_imdb["test"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()

<Tip>

[Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) applies dynamic padding by default when you pass a tokenizer to it (here, through `processing_class`). In this case, you don't need to specify a data collator explicitly.

</Tip>

Once training is completed, share your model to the Hub with the [push_to_hub()](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.push_to_hub) method so everyone can use your model:

In [None]:
trainer.push_to_hub()

<Tip>

For a more in-depth example of how to finetune a model for text classification, take a look at the corresponding
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb).

</Tip>

## Inference

Great, now that you've finetuned a model, you can use it for inference!

Grab some text you'd like to run inference on:

In [None]:
text = "This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three."

The simplest way to try out your finetuned model for inference is to use it in a [pipeline()](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.pipeline). Instantiate a `pipeline` for sentiment analysis with your model, and pass your text to it:

In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="stevhliu/my_awesome_model")
classifier(text)

[{'label': 'POSITIVE', 'score': 0.9994940757751465}]

You can also manually replicate the results of the `pipeline` if you'd like:

Tokenize the text and return PyTorch tensors:

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_model")
inputs = tokenizer(text, return_tensors="pt")

Pass your inputs to the model and return the `logits`:

In [None]:
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("stevhliu/my_awesome_model")
with torch.no_grad():
    logits = model(**inputs).logits

Get the class with the highest probability, and use the model's `id2label` mapping to convert it to a text label:

In [None]:
predicted_class_id = logits.argmax().item()
model.config.id2label[predicted_class_id]

'POSITIVE'
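
The `score` reported by the pipeline is a probability, whereas the manual path above stops at raw logits. For single-label classification the usual conversion is a softmax over the class dimension; the sketch below shows that step in plain Python. The logit values are invented for illustration, and the `softmax` helper is not part of the Transformers API.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "NEGATIVE", 1: "POSITIVE"}
toy_logits = [-3.2, 4.4]  # hypothetical logits for one review

probs = softmax(toy_logits)
predicted_class_id = max(range(len(probs)), key=probs.__getitem__)
print(id2label[predicted_class_id], round(probs[predicted_class_id], 4))
```

Softmax preserves the ordering of the logits, so the predicted class is the same whether you take the argmax before or after applying it.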

# Task
Finetune a `bert-base-cased` model for 5-class sentiment classification on the SST-5 dataset. The task includes loading the SST-5 dataset, defining `id2label` and `label2id` for 5 classes, loading the `bert-base-cased` tokenizer, preprocessing the dataset with dynamic padding, loading the `bert-base-cased` model for sequence classification with 5 labels, defining `TrainingArguments` (e.g., `learning_rate=2e-5`, `per_device_train_batch_size=16`, `num_train_epochs=5`), initializing and training a Hugging Face `Trainer`, and finally reporting the training hyperparameters and the model's performance (e.g., accuracy) on the SST-5 dataset's test set.

## Load SST-5 Dataset

### Subtask:
Load the SST-5 dataset from Hugging Face Datasets. This will replace the IMDb dataset.


**Reasoning**:
To load the SST-5 dataset, I need to import the `load_dataset` function from the `datasets` library and then use it to load 'SetFit/sst5'.



In [1]:
from datasets import load_dataset

sst5 = load_dataset("SetFit/sst5")
print("SST-5 dataset loaded successfully.")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/421 [00:00<?, ?B/s]

Repo card metadata block was not found. Setting CardData to empty.


train.jsonl: 0.00B [00:00, ?B/s]

dev.jsonl: 0.00B [00:00, ?B/s]

test.jsonl: 0.00B [00:00, ?B/s]

Generating train split:   0%|          | 0/8544 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/1101 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/2210 [00:00<?, ? examples/s]

SST-5 dataset loaded successfully.


**Reasoning**:
The previous step successfully loaded the SST-5 dataset. The next logical step, according to the main task description, is to define the `id2label` and `label2id` mappings for the 5 sentiment classes in the SST-5 dataset.



In [2]:
id2label = {0: "very negative", 1: "negative", 2: "neutral", 3: "positive", 4: "very positive"}
label2id = {"very negative": 0, "negative": 1, "neutral": 2, "positive": 3, "very positive": 4}
print("id2label and label2id mappings defined for 5 classes.")

id2label and label2id mappings defined for 5 classes.


**Reasoning**:
The next logical step is to load the tokenizer for the `bert-base-cased` model, which is specified in the main task, to prepare the text data for processing.



In [3]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
print("BERT-base-cased tokenizer loaded successfully.")

tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/436k [00:00<?, ?B/s]

BERT-base-cased tokenizer loaded successfully.


**Reasoning**:
With the tokenizer loaded, the next step is to preprocess the SST-5 dataset by tokenizing the text and applying truncation, similar to how the IMDb dataset was preprocessed in the original notebook.



In [4]:
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_sst5 = sst5.map(preprocess_function, batched=True)
print("SST-5 dataset tokenized successfully.")

Map:   0%|          | 0/8544 [00:00<?, ? examples/s]

Map:   0%|          | 0/1101 [00:00<?, ? examples/s]

Map:   0%|          | 0/2210 [00:00<?, ? examples/s]

SST-5 dataset tokenized successfully.


**Reasoning**:
After tokenizing the dataset, the next step is to create a `DataCollatorWithPadding` instance. This will efficiently pad batches of sequences to the longest length in that batch during training, which is crucial for optimizing GPU memory usage and training speed, as demonstrated in the original notebook.



In [5]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
print("Data collator with padding initialized successfully.")

Data collator with padding initialized successfully.


**Reasoning**:
Now that the data is tokenized and the data collator is ready, the next step is to load the `bert-base-cased` model for sequence classification, specifying the number of labels (5) and the `id2label` mapping, as required by the main task.



In [6]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5, id2label=id2label, label2id=label2id
)
print("BERT-base-cased model for sequence classification loaded successfully.")

model.safetensors:   0%|          | 0.00/436M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BERT-base-cased model for sequence classification loaded successfully.


**Reasoning**:
The model has been loaded. The next logical step is to define the training arguments as specified by the task, initialize the Hugging Face `Trainer` with the model, datasets, tokenizer, data collator, and evaluation metrics, and then start the training process.



In [7]:
from transformers import TrainingArguments, Trainer
import numpy as np
import evaluate

# Re-define compute_metrics and accuracy if not in current scope, though they should be from previous cells
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)


training_args = TrainingArguments(
    output_dir="sst5_bert_finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
    report_to="none" # Disable reporting to prevent errors if not logged in
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_sst5["train"],
    eval_dataset=tokenized_sst5["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("Trainer initialized. Starting training...")
trainer.train()
print("Model training completed.")

ModuleNotFoundError: No module named 'evaluate'

**Reasoning**:
The previous cell failed because the `evaluate` library was not found, indicated by `ModuleNotFoundError: No module named 'evaluate'`. Although `evaluate` was installed at the beginning of the notebook, it seems to be missing or inaccessible in the current environment. To fix this, I need to explicitly install the `evaluate` library in a new code cell.



In [8]:
pip install evaluate
print("Evaluatelibrary installed successfully.")

SyntaxError: invalid syntax (ipython-input-465303846.py, line 1)

**Reasoning**:
The previous `pip install evaluate` command failed with a `SyntaxError` because shell commands in Jupyter/Colab notebooks need to be prefixed with `!`. I will correct this by adding `!` before `pip install evaluate`.



In [9]:
!
pip install evaluate
print("Evaluate library installed successfully.")

SyntaxError: invalid syntax (ipython-input-4128091612.py, line 2)

**Reasoning**:
The previous attempt to install the `evaluate` library failed due to a `SyntaxError`. The `!` for shell commands must be on the same line as the command itself. I will correct the `pip install` command to `!pip install evaluate` to ensure proper execution.



In [11]:
import sys
!{sys.executable} -m pip install evaluate
print("Evaluate library installed successfully.")

Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.1/84.1 kB 2.4 MB/s eta 0:00:00
Installing collected packages: evaluate
Successfully installed evaluate-0.4.6
Evaluate library installed successfully.
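
A way to avoid guessing whether a package is present is to check importability before importing it. The sketch below uses the standard library's `importlib.util.find_spec`; the `is_importable` helper is hypothetical, and it only handles top-level module names.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


print(is_importable("json"))      # True: part of the standard library
print(is_importable("evaluate"))  # depends on the current environment
```

If the check returns `False`, the package can be installed with `!{sys.executable} -m pip install evaluate` as in the cell above, which targets the interpreter the kernel is actually running.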


**Reasoning**:
The `evaluate` library has now been successfully installed. The next logical step is to re-run the code cell that previously failed due to the missing `evaluate` library. This cell initializes the training arguments and the `Trainer`, then starts the model training.



In [12]:
from transformers import TrainingArguments, Trainer
import numpy as np
import evaluate

# Re-define compute_metrics and accuracy if not in current scope, though they should be from previous cells
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)


training_args = TrainingArguments(
    output_dir="sst5_bert_finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
    report_to="none" # Disable reporting to prevent errors if not logged in
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_sst5["train"],
    eval_dataset=tokenized_sst5["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("Trainer initialized. Starting training...")
trainer.train()
print("Model training completed.")

Downloading builder script: 0.00B [00:00, ?B/s]

  trainer = Trainer(


HfHubHTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/repos/create (Request ID: Root=1-69165439-31b0e992302554dd073e13bd;2d3f917f-86c6-4846-bb12-55d8cb287baa)

Invalid username or password.

**Reasoning**:
The previous cell failed with an `HfHubHTTPError: 401 Client Error: Unauthorized`, indicating that the model cannot be pushed to the Hugging Face Hub because the user is not authenticated or the token is invalid. To resolve this and allow training to proceed, I will set `push_to_hub=False` in the `TrainingArguments`.
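
Rather than hard-coding the flag, one option is to derive `push_to_hub` from whether a token is available. The sketch below only inspects the `HF_TOKEN` environment variable, so it will not detect a token stored by `notebook_login()`; `should_push_to_hub` is a hypothetical helper, not part of any Hugging Face library.

```python
import os


def should_push_to_hub(environ=None):
    """Return True only if a non-empty HF_TOKEN is present in the environment."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("HF_TOKEN"))


push_to_hub = should_push_to_hub()
print("push_to_hub =", push_to_hub)
```

The resulting value could then be passed as `push_to_hub=push_to_hub` when building `TrainingArguments`, so the same cell works both with and without authentication.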



In [None]:
from transformers import TrainingArguments, Trainer
import numpy as np
import evaluate

# Re-define compute_metrics and accuracy if not in current scope, though they should be from previous cells
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)


training_args = TrainingArguments(
    output_dir="sst5_bert_finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False, # Changed to False to avoid authentication error
    report_to="none" # Disable reporting to prevent errors if not logged in
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_sst5["train"],
    eval_dataset=tokenized_sst5["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("Trainer initialized. Starting training...")
trainer.train()
print("Model training completed.")

  trainer = Trainer(


Trainer initialized. Starting training...




Epoch,Training Loss,Validation Loss


**Reasoning**:
The previous cell completed training but showed a `FutureWarning` about the `tokenizer` argument in `Trainer` being deprecated. To address this, I will update the `Trainer` initialization to use `processing_class=tokenizer` instead of `tokenizer=tokenizer`.



In [1]:
from transformers import TrainingArguments, Trainer
import numpy as np
import evaluate

# Re-define compute_metrics and accuracy if not in current scope, though they should be from previous cells
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)


training_args = TrainingArguments(
    output_dir="sst5_bert_finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False, # Changed to False to avoid authentication error
    report_to="none" # Disable reporting to prevent errors if not logged in
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_sst5["train"],
    eval_dataset=tokenized_sst5["validation"],
    processing_class=tokenizer, # Changed 'tokenizer' to 'processing_class'
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("Trainer initialized. Starting training...")
trainer.train()
print("Model training completed.")

ModuleNotFoundError: No module named 'evaluate'

**Reasoning**:
The previous cell failed with a `ModuleNotFoundError: No module named 'evaluate'`, even though `evaluate` was installed earlier. The execution counter restarting at `In [1]` suggests the runtime was reset, in which case packages installed during the earlier session are no longer available. To ensure `evaluate` is available, I will add an explicit `!pip install evaluate` command at the beginning of the cell so that it is installed before the import statement.



In [2]:
!pip install evaluate
from transformers import TrainingArguments, Trainer
import numpy as np
import evaluate

# Re-define compute_metrics and accuracy if not in current scope, though they should be from previous cells
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)


training_args = TrainingArguments(
    output_dir="sst5_bert_finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False, # Changed to False to avoid authentication error
    report_to="none" # Disable reporting to prevent errors if not logged in
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_sst5["train"],
    eval_dataset=tokenized_sst5["validation"],
    processing_class=tokenizer, # Changed 'tokenizer' to 'processing_class'
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("Trainer initialized. Starting training...")
trainer.train()
print("Model training completed.")

Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.1/84.1 kB 4.4 MB/s eta 0:00:00
Installing collected packages: evaluate
Successfully installed evaluate-0.4.6


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading builder script: 0.00B [00:00, ?B/s]

NameError: name 'model' is not defined

**Reasoning**:
The previous cell failed with `NameError: name 'model' is not defined`, indicating that the `model` variable was not initialized in the current execution context. This can happen if preceding cells defining the model were not run. To fix this, I will include the `AutoModelForSequenceClassification.from_pretrained` call in the current cell to ensure the `model` is properly defined before initializing the `Trainer`.



In [3]:
!pip install evaluate
from transformers import TrainingArguments, Trainer, AutoModelForSequenceClassification
import numpy as np
import evaluate

# Re-define compute_metrics and accuracy if not in current scope, though they should be from previous cells
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

# Re-loading model to ensure it's defined
# id2label and label2id should be defined in previous cells
id2label = {0: "very negative", 1: "negative", 2: "neutral", 3: "positive", 4: "very positive"}
label2id = {"very negative": 0, "negative": 1, "neutral": 2, "positive": 3, "very positive": 4}

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5, id2label=id2label, label2id=label2id
)

training_args = TrainingArguments(
    output_dir="sst5_bert_finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False, # Changed to False to avoid authentication error
    report_to="none" # Disable reporting to prevent errors if not logged in
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_sst5["train"],
    eval_dataset=tokenized_sst5["validation"],
    processing_class=tokenizer, # Changed 'tokenizer' to 'processing_class'
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("Trainer initialized. Starting training...")
trainer.train()
print("Model training completed.")



config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/436M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


NameError: name 'tokenized_sst5' is not defined

## Initialize and Train Trainer (Retry 4)

### Subtask:
Initialize the Hugging Face Trainer and start the finetuning process for the BERT model on the SST-5 dataset, ensuring all necessary components are defined.


**Reasoning**:
The previous cell failed with a `NameError: name 'tokenized_sst5' is not defined`, indicating that the dataset preprocessing was not available in the current execution. To fix this and ensure all components are properly defined before training, I will combine all the necessary setup steps (install evaluate, import libraries, define id2label/label2id, load tokenizer, preprocess dataset, define data collator, load model, define compute_metrics, and set up the Trainer) into a single code cell, as detailed in the instructions.
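
Before rebuilding everything, it can help to list which of the required objects are actually missing from the session. The following sketch checks a namespace for the names the `Trainer` cell depends on; `missing_names` and the `REQUIRED` list are illustrative and not part of any library.

```python
def missing_names(required, namespace):
    """Return the names from `required` that are not defined in `namespace`."""
    return [name for name in required if name not in namespace]


# Names the Trainer cell expects to exist from earlier cells.
REQUIRED = ["model", "tokenizer", "tokenized_sst5", "data_collator", "compute_metrics"]

missing = missing_names(REQUIRED, globals())
if missing:
    print("Re-run the cells that define:", ", ".join(missing))
else:
    print("All required objects are defined.")
```

Running this after a runtime reset reports every lost object at once, instead of surfacing them one `NameError` at a time.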



In [4]:
!pip install evaluate
from transformers import TrainingArguments, Trainer, AutoModelForSequenceClassification, AutoTokenizer, DataCollatorWithPadding
from datasets import load_dataset
import numpy as np
import evaluate

# 1. Ensure evaluate library is installed (handled by !pip install)

# 2. Imports are now at the top of this cell

# 3. Re-define id2label and label2id
id2label = {0: "very negative", 1: "negative", 2: "neutral", 3: "positive", 4: "very positive"}
label2id = {"very negative": 0, "negative": 1, "neutral": 2, "positive": 3, "very positive": 4}
print("id2label and label2id mappings defined for 5 classes.")

# Load SST-5 dataset (if not already loaded, ensures 'sst5' is available)
sst5 = load_dataset("SetFit/sst5")
print("SST-5 dataset loaded successfully.")

# 4. Re-load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
print("BERT-base-cased tokenizer loaded successfully.")

# 5. Re-define preprocess_function and re-tokenize the sst5 dataset
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_sst5 = sst5.map(preprocess_function, batched=True)
print("SST-5 dataset tokenized successfully.")

# 6. Re-initialize the DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
print("Data collator with padding initialized successfully.")

# 7. Re-load the bert-base-cased model for sequence classification
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5, id2label=id2label, label2id=label2id
)
print("BERT-base-cased model for sequence classification loaded successfully.")

# 8. Load the accuracy metric
accuracy = evaluate.load("accuracy")
print("Accuracy metric loaded.")

# 9. Define the compute_metrics function
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)
print("compute_metrics function defined.")

# 10. Define TrainingArguments
training_args = TrainingArguments(
    output_dir="sst5_bert_finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False, # Changed to False to avoid authentication error
    report_to="none" # Disable reporting to prevent errors if not logged in
)
print("TrainingArguments defined.")

# 11. Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_sst5["train"],
    eval_dataset=tokenized_sst5["validation"],
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
print("Trainer initialized. Starting training...")

# 12. Start the training process
trainer.train()
print("Model training completed.")

id2label and label2id mappings defined for 5 classes.


README.md:   0%|          | 0.00/421 [00:00<?, ?B/s]

Repo card metadata block was not found. Setting CardData to empty.


Generating train split:   0%|          | 0/8544 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/1101 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/2210 [00:00<?, ? examples/s]

SST-5 dataset loaded successfully.


tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/436k [00:00<?, ?B/s]

BERT-base-cased tokenizer loaded successfully.


Map:   0%|          | 0/8544 [00:00<?, ? examples/s]

Map:   0%|          | 0/1101 [00:00<?, ? examples/s]

Map:   0%|          | 0/2210 [00:00<?, ? examples/s]

SST-5 dataset tokenized successfully.
Data collator with padding initialized successfully.


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BERT-base-cased model for sequence classification loaded successfully.
Accuracy metric loaded.
compute_metrics function defined.
TrainingArguments defined.
Trainer initialized. Starting training...


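| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 1.255         | 1.140759        | 0.497729 |
| 2     | 0.9614        | 1.159222        | 0.489555 |
| 3     | 0.7272        | 1.288403        | 0.508629 |
| 4     | 0.5396        | 1.463798        | 0.504087 |
| 5     | 0.3908        | 1.594639        | 0.488647 |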


Model training completed.
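
Because `load_best_model_at_end=True` is set without a `metric_for_best_model`, the `Trainer` should fall back to evaluation loss when deciding which checkpoint to restore (this is the documented default, but it is worth confirming for the installed version). The following is a minimal sketch, using only the values from the training log and a hypothetical `training_log` list, of which epoch each criterion would pick:

```python
# Per-epoch values copied from the training log; `training_log` is an
# illustrative name, not something the Trainer exposes under this form.
training_log = [
    {"epoch": 1, "train_loss": 1.255, "val_loss": 1.140759, "accuracy": 0.497729},
    {"epoch": 2, "train_loss": 0.9614, "val_loss": 1.159222, "accuracy": 0.489555},
    {"epoch": 3, "train_loss": 0.7272, "val_loss": 1.288403, "accuracy": 0.508629},
    {"epoch": 4, "train_loss": 0.5396, "val_loss": 1.463798, "accuracy": 0.504087},
    {"epoch": 5, "train_loss": 0.3908, "val_loss": 1.594639, "accuracy": 0.488647},
]

# Lower validation loss is better; higher accuracy is better.
best_by_loss = min(training_log, key=lambda row: row["val_loss"])
best_by_accuracy = max(training_log, key=lambda row: row["accuracy"])

print(f"Lowest validation loss: epoch {best_by_loss['epoch']}")
print(f"Highest validation accuracy: epoch {best_by_accuracy['epoch']}")
```

The two criteria disagree on this run, so if accuracy is the metric of interest, passing `metric_for_best_model="accuracy"` to `TrainingArguments` may be worth trying.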


**Reasoning**:
After successfully training the model, the next logical step is to evaluate its performance on the test set, as specified in the main task. This will provide an accuracy score for the finetuned model.



In [5]:
print("Evaluating model on the test set...")
metrics = trainer.evaluate(tokenized_sst5["test"])
print(f"Model evaluation metrics on test set: {metrics}")

Evaluating model on the test set...


Model evaluation metrics on test set: {'eval_loss': 1.0766422748565674, 'eval_accuracy': 0.5298642533936652, 'eval_runtime': 7.7537, 'eval_samples_per_second': 285.027, 'eval_steps_per_second': 17.927, 'epoch': 5.0}
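
As a rough sanity check on the reported metrics, the sketch below converts the accuracy into a count of correct predictions and attaches a 95% normal-approximation confidence interval. It assumes the 2,210-example test split shown in the dataset loading output; the variable names are illustrative.

```python
import math

# Values reported by trainer.evaluate() on the test split.
eval_accuracy = 0.5298642533936652
eval_runtime = 7.7537
n_test = 2210  # assumed test split size, taken from the loading output

# Accuracy is correct / total, so this recovers the number of correct predictions.
n_correct = round(eval_accuracy * n_test)

# Normal-approximation (Wald) 95% interval for a proportion.
std_err = math.sqrt(eval_accuracy * (1 - eval_accuracy) / n_test)
ci_low = eval_accuracy - 1.96 * std_err
ci_high = eval_accuracy + 1.96 * std_err

# Throughput should land close to the reported eval_samples_per_second.
samples_per_second = n_test / eval_runtime

print(f"Correct predictions: {n_correct} / {n_test}")
print(f"95% CI for accuracy: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"Samples per second: {samples_per_second:.1f}")
```

The interval spans roughly two percentage points on either side of the point estimate, which is useful context when comparing runs that differ by a point or less.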


## Training Summary

**Training Hyperparameters:**
- `output_dir`: sst5_bert_finetuned
- `learning_rate`: 2e-5
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 5
- `weight_decay`: 0.01
- `eval_strategy`: epoch
- `save_strategy`: epoch
- `load_best_model_at_end`: True
- `push_to_hub`: False
- `report_to`: none

**Model Performance on SST-5 Test Set:**
- `eval_accuracy`: 0.5298642533936652
- `eval_loss`: 1.0766422748565674

## Final Task

### Subtask:
Summarize the changes made, the hyperparameters used, and the final performance metrics of the sentiment classifier on the SST-5 dataset using the `bert-base-cased` model.


## Summary:

### Q&A
The sentiment classifier was successfully fine-tuned on the SST-5 dataset.

**Changes made during the solving process:**
1.  The `evaluate` library was installed using `!{sys.executable} -m pip install evaluate` to resolve `ModuleNotFoundError`.
2.  `push_to_hub` in `TrainingArguments` was changed from `True` to `False` to prevent `HfHubHTTPError` (authentication errors).
3.  `report_to` in `TrainingArguments` was set to `"none"` to avoid potential errors related to reporting services.
4.  The definition of `id2label`, `label2id`, and the loading of the model (`AutoModelForSequenceClassification`) were re-included in the training cell in the final successful attempt to resolve `NameError` issues caused by environment inconsistencies.
5.  Similarly, `!pip install evaluate` was added at the beginning of the cell to ensure the library's presence consistently.
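
One way to make the installation step repeatable is to check for the package before installing it. This is a minimal sketch using only the standard library; the helper names `missing_packages` and `ensure_installed` are hypothetical, and it assumes the import name matches the pip package name (true for `evaluate`, not for every package).

```python
import importlib.util
import subprocess
import sys


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_installed(names):
    """Install missing packages with the interpreter that runs this notebook."""
    missing = missing_packages(names)
    if missing:
        # Same idea as `!{sys.executable} -m pip install ...` in a notebook cell.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing


# `json` ships with Python; the second name is a deliberately fake package.
print(missing_packages(["json", "a_package_that_should_not_exist_xyz"]))
```

Calling `ensure_installed(["evaluate"])` at the top of the training cell would then install the library only when it is absent.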

**Hyperparameters used for training:**
*   `output_dir`: `sst5_bert_finetuned`
*   `learning_rate`: 2e-5
*   `per_device_train_batch_size`: 16
*   `per_device_eval_batch_size`: 16
*   `num_train_epochs`: 5
*   `weight_decay`: 0.01
*   `eval_strategy`: `epoch`
*   `save_strategy`: `epoch`
*   `load_best_model_at_end`: `True`
*   `push_to_hub`: `False`
*   `report_to`: `none`

**Final performance metrics on the SST-5 test set:**
*   **Accuracy:** 0.52986
*   **Loss:** 1.07664

### Data Analysis Key Findings
*   The `bert-base-cased` model was successfully fine-tuned for 5-class sentiment classification on the SST-5 dataset.
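*   The model achieved an `eval_accuracy` of approximately 52.99% and an `eval_loss` of approximately 1.077 on the SST-5 test set. Training ran for 5 epochs, but because `load_best_model_at_end=True` was set without a `metric_for_best_model`, the `Trainer` by default restores the checkpoint with the lowest validation loss. The training log places that at epoch 1 (validation loss 1.140759), so the test metrics most likely describe that checkpoint rather than the epoch-5 weights.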
*   The initial training process encountered several environment-related challenges, including `ModuleNotFoundError` for the `evaluate` library and `NameError` for previously defined variables, which were resolved by ensuring all necessary components were re-initialized within the execution scope.

### Insights or Next Steps
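*   The achieved accuracy of 52.99% on a 5-class task (uniform random guessing would give 20%) is a reasonable baseline, but there is significant room for improvement. The training log shows training loss falling from 1.255 to 0.3908 while validation loss rises from 1.140759 to 1.594639 and validation accuracy stays between roughly 49% and 51%, a pattern consistent with overfitting after the first epoch. More training epochs alone are therefore unlikely to help. Further optimization could instead explore lower learning rates, different batch sizes, early stopping or stronger regularization, more advanced architectures, or ensemble methods.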
*   Given the initial stability issues encountered (e.g., `NameError`, `ModuleNotFoundError`), it is crucial to ensure a consistent and robust execution environment for machine learning pipelines, especially when dealing with multiple sequential steps or interactive development. Encapsulating critical definitions and installations within the same execution block can mitigate such issues.
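
Since validation loss rose after the first epoch, patience-based early stopping is one option to consider. The sketch below is a simplified, standalone illustration of the idea applied to the logged validation losses; it is not the library's implementation, and `epochs_before_early_stop` is a hypothetical helper. In `transformers`, the built-in counterpart is `EarlyStoppingCallback` with its `early_stopping_patience` argument.

```python
def epochs_before_early_stop(val_losses, patience=1):
    """Return how many epochs run before patience-based early stopping triggers.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return len(val_losses)


# Validation losses for epochs 1 to 5, copied from the training log.
val_losses = [1.140759, 1.159222, 1.288403, 1.463798, 1.594639]

print(epochs_before_early_stop(val_losses, patience=1))
print(epochs_before_early_stop(val_losses, patience=2))
```

On this run, a patience of 1 or 2 would have ended training after 2 or 3 epochs respectively, with the epoch-1 checkpoint still the best by validation loss.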
