
## Project 1: Fine-tuning Transformer Models for Optimal Performance

### Objectives:
1. Fine-tune the model from Chapter 2 for additional epochs
2. Determine the optimal number of epochs
3. Compute and compare performance metrics
4. Analyze the impact of fine-tuning for one extra epoch
5. Perform error analysis
6. Attempt to trick the model with a made-up tweet

Fine-tune the model from Chapter 2 of *Natural Language Processing with Transformers* for more epochs. What number of epochs is optimal? Compute the same performance metrics as the chapter. Does fine-tuning for one extra epoch improve model performance? Discuss why or why not. Perform an error analysis and use it to see whether you can trick your model into making an obviously wrong prediction about a made-up tweet. Attach your Jupyter notebook as an appendix.

### Load Dataset, Model, and Performance Metrics


In [None]:
# Pip install libraries:
!pip install transformers
!pip install datasets
!pip install torch

Collecting datasets
  Downloading datasets-3.0.0-py3-none-any.whl.metadata (19 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading datasets-3.0.0-py3-none-any.whl (474 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (39.9 MB)

In [None]:
# Load the dataset:
from datasets import load_dataset
dataset = load_dataset("emotion", revision="main") # TODO: Remove the revision part - Try loading the main branch of the dataset

In [None]:
# Load the pretrained model and tokenizer:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model_ckpt = "distilbert-base-uncased"
num_labels = len(dataset["train"].features["label"].names) # 6 labels expressing 6 emotions

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=num_labels).to(device)


# Tokenize the dataset:
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
def tokenize(batch):
    return tokenizer(batch["text"], padding=True, truncation=True)


emotions_encoded = dataset.map(tokenize, batched=True, batch_size=None)

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.



Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



In [None]:
# Compute the same performance metrics as the chapter (accuracy and weighted F1)
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    f1 = f1_score(labels, preds, average="weighted")
    acc = accuracy_score(labels, preds)
    return {"accuracy": acc, "f1": f1}
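To make the two metrics concrete, here is a minimal pure-Python sketch of what accuracy and weighted F1 compute. The toy labels and the helper names `accuracy` and `weighted_f1` are illustrative assumptions, not part of the chapter. Weighted F1 is the per-class F1 averaged with weights equal to the number of true instances of each class.

```python
from collections import Counter

def accuracy(labels, preds):
    """Fraction of predictions that match the true labels."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)

def weighted_f1(labels, preds):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(labels)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(l == cls and p == cls for l, p in zip(labels, preds))
        fp = sum(l != cls and p == cls for l, p in zip(labels, preds))
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n_true
    return total / len(labels)

# Made-up toy data with two classes (0 and 1)
toy_labels = [0, 0, 0, 0, 1, 1]
toy_preds = [0, 0, 0, 1, 1, 1]

toy_acc = accuracy(toy_labels, toy_preds)
toy_f1 = weighted_f1(toy_labels, toy_preds)
print(f"accuracy={toy_acc:.4f}, weighted F1={toy_f1:.4f}")
```

Because the weights follow the class frequencies, frequent classes dominate the weighted F1 on an imbalanced dataset.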

### Create Trainer
#### Create Training Arguments for base comparison

I start by creating training arguments for a base comparison similar to the chapter, which I will use to compare the performance of the model after fine-tuning for additional epochs.


In [None]:
from transformers import TrainingArguments, Trainer

num_epochs = 10  # An arbitrary, relatively high number of epochs
batch_size = 64
logging_steps = len(emotions_encoded["train"]) // batch_size
model_name = f"{model_ckpt}-finetuned-emotion"

training_args_base = TrainingArguments(
    output_dir=model_name,
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    num_train_epochs=num_epochs,
    disable_tqdm=False,
    logging_steps=logging_steps,
    push_to_hub=False,
)


# Create a Trainer
trainer = Trainer(
    model=model,
    args=training_args_base,
    train_dataset=emotions_encoded["train"],
    eval_dataset=emotions_encoded["validation"],
    compute_metrics=compute_metrics,
)

# Train the model; the returned TrainOutput holds the final training metrics
training = trainer.train()

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the training set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 16,000
  Num Epochs = 10
  Instantaneous batch size per device = 64
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 1
  Total optimization steps = 2,500
  Number of trainable parameters = 66,958,086
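
The step counts in this log follow directly from the dataset size and batch size. A quick check, using only numbers from the setup above (plain arithmetic, not a call into the `Trainer`):

```python
# Values taken from the training setup and log above
num_train_examples = 16_000
batch_size = 64
num_epochs = 10

# 16,000 is divisible by 64, so floor division gives the exact number of batches
steps_per_epoch = num_train_examples // batch_size  # also used as logging_steps
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)
```

With 250 steps per epoch, the `checkpoint-500` directory that appears later in the log would correspond to the end of epoch 2, assuming the `Trainer` default of saving a checkpoint every 500 steps.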


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 0.1313 | 0.157411 | 0.9355 | 0.93593 |
| 2 | 0.0897 | 0.159653 | 0.9375 | 0.936769 |
| 3 | 0.0818 | 0.149566 | 0.9395 | 0.940102 |
| 4 | 0.068 | 0.170692 | 0.9365 | 0.936612 |
| 5 | 0.0533 | 0.184219 | 0.9365 | 0.93632 |
| 6 | 0.043 | 0.20198 | 0.9365 | 0.93628 |
| 7 | 0.0325 | 0.21725 | 0.936 | 0.935936 |
| 8 | 0.0279 | 0.226165 | 0.9355 | 0.93532 |
| 9 | 0.0207 | 0.223781 | 0.939 | 0.939222 |
| 10 | 0.0188 | 0.229085 | 0.9395 | 0.939468 |
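
One simple way to read an optimal number of epochs off these results is to pick the epoch with the lowest validation loss, or the highest F1. The sketch below does that on the logged numbers only; the selection criterion is an assumption, and other criteria are possible.

```python
import csv
import io

# Per-epoch results copied from the training log above
log_csv = """Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.1313,0.157411,0.9355,0.93593
2,0.0897,0.159653,0.9375,0.936769
3,0.0818,0.149566,0.9395,0.940102
4,0.068,0.170692,0.9365,0.936612
5,0.0533,0.184219,0.9365,0.93632
6,0.043,0.20198,0.9365,0.93628
7,0.0325,0.21725,0.936,0.935936
8,0.0279,0.226165,0.9355,0.93532
9,0.0207,0.223781,0.939,0.939222
10,0.0188,0.229085,0.9395,0.939468
"""

# Parse every column as a float so the rows can be compared numerically
rows = [
    {key: float(value) for key, value in row.items()}
    for row in csv.DictReader(io.StringIO(log_csv))
]

best_by_loss = min(rows, key=lambda r: r["Validation Loss"])
best_by_f1 = max(rows, key=lambda r: r["F1"])

print("Lowest validation loss at epoch", int(best_by_loss["Epoch"]))
print("Highest F1 at epoch", int(best_by_f1["Epoch"]))
```

On these numbers both criteria point to epoch 3. Training loss keeps falling afterwards while validation loss trends upward, which is the usual sign of overfitting. Accuracy and F1 differ by only about half a percentage point across epochs, so the ranking may not be stable across runs.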


The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.

***** Running Evaluation *****
  Num examples = 2000
  Batch size = 64
Saving model checkpoint to distilbert-base-uncased-finetuned-emotion/checkpoint-500
Configuration saved in distilbert-base-uncased-finetuned-emotion/checkpoint-500/config.json
Model weights saved in distilbert-base-uncased-finetuned-emotion/checkpoint-500/model.safetensors


{'eval_loss': 0.22908547520637512, 'eval_accuracy': 0.9395, 'eval_f1': 0.939467671106937, 'eval_runtime': 4.2015, 'eval_samples_per_second': 476.023, 'eval_steps_per_second': 7.616, 'epoch': 10.0}


### Perform error analysis
We perform error analysis by sorting the validation samples by model loss. The code below computes the per-sample loss and stores it, together with the predicted and true (ground-truth) labels, in a DataFrame.



In [None]:
from torch.nn.functional import cross_entropy

def forward_pass_with_label(batch):
    # Move only the model inputs (e.g. input_ids, attention_mask) to the device
    inputs = {k: v.to(device) for k, v in batch.items()
              if k in tokenizer.model_input_names}
    with torch.no_grad():
        output = model(**inputs)
        pred_label = torch.argmax(output.logits, dim=-1)
        # reduction="none" keeps one loss value per sample instead of the batch mean
        loss = cross_entropy(output.logits, batch["label"].to(device),
                             reduction="none")

    # Place outputs on CPU for compatibility with other dataset columns
    return {"loss": loss.cpu().numpy(),
            "predicted_label": pred_label.cpu().numpy()}

def label_int2str(row):
    # Map an integer class id to its emotion name
    return dataset["train"].features["label"].int2str(row)
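
To illustrate what the per-sample loss measures, here is a minimal pure-Python sketch of cross-entropy for a single example: the negative log of the softmax probability assigned to the true label. The logits are made up, and `per_sample_cross_entropy` is a hypothetical helper, not a library function.

```python
import math

def per_sample_cross_entropy(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]

# Made-up logits for a 6-class problem: the model strongly favours class 0
logits = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0]

confident_right = per_sample_cross_entropy(logits, label=0)  # true label is class 0
confident_wrong = per_sample_cross_entropy(logits, label=2)  # true label is class 2

# A loss value maps back to the probability the model gave the true label
prob_true_label = math.exp(-confident_wrong)

print(f"loss when right: {confident_right:.4f}")
print(f"loss when wrong: {confident_wrong:.4f}")
print(f"probability of the true label when wrong: {prob_true_label:.6f}")
```

This is why sorting by loss surfaces confident mistakes: a large loss means the model assigned a very small probability to the true label.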

In [None]:
# Convert our dataset back to PyTorch tensors
emotions_encoded.set_format("torch",
                            columns=["input_ids", "attention_mask", "label"])
# Compute loss values
emotions_encoded["validation"] = emotions_encoded["validation"].map(
    forward_pass_with_label, batched=True, batch_size=16)

# Create a pandas DataFrame (despite its name, df_test holds the validation split)
emotions_encoded.set_format("pandas")
cols = ["text", "label", "predicted_label", "loss"]
df_test = emotions_encoded["validation"][:][cols]
df_test["label"] = df_test["label"].apply(label_int2str)
df_test["predicted_label"] = df_test["predicted_label"].apply(label_int2str)

# Display the 10 samples with the highest losses
df_test.sort_values("loss", ascending=False).head(10)



| Index | text | label | predicted_label | loss |
|---|---|---|---|---|
| 882 | i feel badly about reneging on my commitment t... | love | sadness | 9.507489 |
| 1950 | i as representative of everything thats wrong ... | surprise | sadness | 9.414063 |
| 1111 | im lazy my characters fall into categories of ... | joy | fear | 8.743126 |
| 1963 | i called myself pro life and voted for perry w... | joy | sadness | 8.490437 |
| 1801 | i feel that he was being overshadowed by the s... | love | sadness | 8.328728 |
| 1919 | i should admit when consuming alcohol myself i... | fear | sadness | 8.101649 |
| 1672 | i feel that being faithful isnt enough in your... | love | joy | 7.987519 |
| 318 | i felt ashamed of these feelings and was scare... | fear | sadness | 7.735647 |
| 1658 | i said before i feel like a hypocrite advocati... | love | joy | 7.124881 |
| 1509 | i guess this is a memoir so it feels like that... | joy | fear | 7.063156 |
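
A quick tally of these ten rows hints at where the model is most confidently wrong. The sketch below counts the true and predicted labels copied from the table above. Ten samples are a very small basis, so this is only a pointer for choosing a made-up tweet.

```python
from collections import Counter

# (true label, predicted label) pairs copied from the ten highest-loss rows
top_loss_pairs = [
    ("love", "sadness"), ("surprise", "sadness"), ("joy", "fear"),
    ("joy", "sadness"), ("love", "sadness"), ("fear", "sadness"),
    ("love", "joy"), ("fear", "sadness"), ("love", "joy"), ("joy", "fear"),
]

# How often each label was predicted, and how often each label was the truth
predicted_counts = Counter(pred for _, pred in top_loss_pairs)
true_counts = Counter(true for true, _ in top_loss_pairs)

print("Predicted:", predicted_counts.most_common())
print("True:     ", true_counts.most_common())
```

In this sample the model most often predicts sadness where another label was annotated, and love is the true label most often missed.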


In [25]:
# Display the 10 samples with the lowest losses:
df_test.sort_values("loss", ascending=True).head(10)

| Index | text | label | predicted_label | loss |
|---|---|---|---|---|
| 323 | im starting to feel unwelcome in there | sadness | sadness | 0.000129 |
| 1621 | i feel so disturbed and unsettled that i m not... | sadness | sadness | 0.00013 |
| 69 | i have no extra money im worried all of the ti... | sadness | sadness | 0.00013 |
| 558 | i hope she leaves you and i hope you feel hear... | sadness | sadness | 0.00013 |
| 1303 | i feel pathetic and uninspired | sadness | sadness | 0.000131 |
| 600 | i learnt that expectations of people are not a... | sadness | sadness | 0.000133 |
| 375 | i mention that i feel really unwelcome | sadness | sadness | 0.000133 |
| 1965 | i started feeling pathetic and ashamed | sadness | sadness | 0.000133 |
| 369 | i just need a few minutes to feel put upon and... | sadness | sadness | 0.000133 |
| 866 | i feel quite jaded and unenthusiastic about li... | sadness | sadness | 0.000134 |


### Trick the model into a wrong prediction

In [36]:
# Start by pushing the fine-tuned model to the Hub:
import os
from google.colab import userdata
from huggingface_hub import login

# Read the token once from Colab's secret store, then reuse it
HF_TOKEN = userdata.get("HF_TOKEN")
os.environ["HF_TOKEN"] = HF_TOKEN
login(token=HF_TOKEN)

trainer.push_to_hub(commit_message="Lets go - Initial Commit")

Saving model checkpoint to distilbert-base-uncased-finetuned-emotion
Configuration saved in distilbert-base-uncased-finetuned-emotion/config.json


The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


Model weights saved in distilbert-base-uncased-finetuned-emotion/model.safetensors
Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.9395}, {'name': 'F1', 'type': 'f1', 'value': 0.939467671106937}]}
No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/Jonasbukhave/distilbert-base-uncased-finetuned-emotion/commit/df8cd9800926cfaa8be4488abd3473e1a6beb7dc', commit_message='Lets go - Initial Commit', commit_description='', oid='df8cd9800926cfaa8be4488abd3473e1a6beb7dc', pr_url=None, pr_revision=None, pr_num=None)

In [40]:
# Load the fine-tuned model from the Hub and use it to predict.
# Loading the tokenizer from the Hub repo raised an OSError (the repo apparently
# has no tokenizer files), so the tokenizer comes from the base checkpoint instead.
# Note: the model config uses generic label names (LABEL_0 ... LABEL_5).
from transformers import pipeline
classifier = pipeline("text-classification",
                      model="Jonasbukhave/distilbert-base-uncased-finetuned-emotion",
                      tokenizer=model_ckpt)

In [41]:
import matplotlib.pyplot as plt
import pandas as pd

labels = dataset["train"].features["label"].names  # emotion names in label-id order

tweet = "Feels like everything is slipping away, and I’m not sure how to stop it."  # Intended emotion: fear
pred = classifier(tweet, return_all_scores=True)
preds_df = pd.DataFrame(pred[0])
plt.bar(labels, 100 * preds_df["score"], color="C0")
plt.title(f'"{tweet}"')
plt.ylabel("Class probability (%)")
plt.show()
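
The pushed model's config maps class ids to generic names (`LABEL_0` to `LABEL_5`), so the pipeline returns those rather than emotion names. Below is a minimal sketch of turning them back into readable labels. The helper `rename_labels`, the hard-coded label order, and the scores are all illustrative assumptions; in the notebook the order should come from `dataset["train"].features["label"].names`.

```python
# Assumed label order of the emotion dataset; in the notebook, prefer
# dataset["train"].features["label"].names over this hard-coded list.
label_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def rename_labels(scores, names):
    """Replace generic 'LABEL_<i>' strings with the i-th emotion name."""
    renamed = []
    for item in scores:
        idx = int(item["label"].split("_")[-1])
        renamed.append({"label": names[idx], "score": item["score"]})
    return renamed

# Hypothetical pipeline output for one tweet (scores are made up)
fake_scores = [
    {"label": "LABEL_0", "score": 0.70},
    {"label": "LABEL_1", "score": 0.02},
    {"label": "LABEL_2", "score": 0.01},
    {"label": "LABEL_3", "score": 0.05},
    {"label": "LABEL_4", "score": 0.20},
    {"label": "LABEL_5", "score": 0.02},
]

readable = rename_labels(fake_scores, label_names)
top = max(readable, key=lambda d: d["score"])
print(top)
```

With readable names it is easier to check whether a tweet intended as fear was in fact classified as something else.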