# Baseline Model Fine-tuning Demo

In [62]:
# We start with a quick example of fine-tuning a model. This demo is not expected to make complete sense yet. Its purpose is to give a high-level view of the fine-tuning process: the steps involved and the general flow of the code.
# It also serves as a baseline that we can iterate on by choosing different models, hyperparameters, and data augmentation strategies. Different models are explored in the "Models" section of this notebook; hyperparameters and data augmentation strategies are explored in the "Hyperparameters" section.

# The steps include:
# 1. Load in the data
# 2. Load the tokenizer
# 3. Load in the pre-trained model
# 4. Define an evaluation metric
# 5. Train the model
# 6. Evaluate the model

In [58]:
%%capture
!pip install transformers
!pip install datasets
!pip install evaluate
!pip install torch
!pip install scikit-learn

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
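
The warning above names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it runs in a cell before any tokenizer is used and before any `!` shell command forks the process. Choosing `"false"` here is an assumption; `"true"` is the other value the warning lists.

```python
import os

# Set this before any tokenizer is used in the process.
# "false" turns off the tokenizers library's internal parallelism, which is
# what the warning says the library does anyway once it detects a fork.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```

With the variable set explicitly, the library no longer has to guess, so the message should not be repeated for every forked `!pip` call.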

In [59]:
import numpy as np
import evaluate
import torch
from datasets import load_from_disk
from transformers import AutoTokenizer, DistilBertForSequenceClassification, Trainer, TrainingArguments

In [60]:
# %%capture

# 1. Load in the data
dataset = load_from_disk("data/yelp_review_tiny/")

# 2. Load the tokenizer and define the tokenization function
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

# 3. Tokenize our training and test data
tokenized_datasets = dataset.map(tokenize_function, batched=True)

# 4. Load in the pre-trained model
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=5)

# 5. Define an evaluation metric
metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# 6. Set up the Trainer
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    compute_metrics=compute_metrics,
)

# 7. Train the model
trainer.train()

Loading cached processed dataset at /Users/d/repos/bert/data/yelp_review_tiny/train/cache-82a3bf31b6b0592c.arrow


Map:   0%|          | 0/20 [00:00<?, ? examples/s]

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_projector.weight', 'vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_transform.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier

  0%|          | 0/39 [00:00<?, ?it/s]

  0%|          | 0/3 [00:00<?, ?it/s]

{'eval_loss': 1.6107069253921509, 'eval_accuracy': 0.2, 'eval_runtime': 1.5442, 'eval_samples_per_second': 12.952, 'eval_steps_per_second': 1.943, 'epoch': 1.0}


{'eval_loss': 1.5988649129867554, 'eval_accuracy': 0.2, 'eval_runtime': 1.5265, 'eval_samples_per_second': 13.102, 'eval_steps_per_second': 1.965, 'epoch': 2.0}


{'eval_loss': 1.587171196937561, 'eval_accuracy': 0.25, 'eval_runtime': 1.5297, 'eval_samples_per_second': 13.074, 'eval_steps_per_second': 1.961, 'epoch': 3.0}
{'train_runtime': 93.2984, 'train_samples_per_second': 3.215, 'train_steps_per_second': 0.418, 'train_loss': 1.6029120225172777, 'epoch': 3.0}


TrainOutput(global_step=39, training_loss=1.6029120225172777, metrics={'train_runtime': 93.2984, 'train_samples_per_second': 3.215, 'train_steps_per_second': 0.418, 'train_loss': 1.6029120225172777, 'epoch': 3.0})
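
To put the metrics above in context: with 5 classes, a model that gives every class equal probability has a cross-entropy loss of ln 5 ≈ 1.609 and, on a balanced test set, an accuracy of 0.2. The reported `eval_loss` of about 1.61 to 1.59 and `eval_accuracy` of 0.2 to 0.25 are therefore close to chance. The sketch below shows that arithmetic and mirrors what `compute_metrics` does, in pure Python. The helpers `argmax` and `accuracy` are hypothetical stand-ins for `np.argmax` and the accuracy metric, the logits and labels are made up for illustration, and the balanced-classes assumption is mine.

```python
import math

NUM_LABELS = 5  # matches num_labels=5 in the model definition

# Cross-entropy of a uniform guess over 5 classes: -ln(1/5) = ln(5)
chance_loss = math.log(NUM_LABELS)
chance_accuracy = 1 / NUM_LABELS  # assumes the classes are balanced


def argmax(row):
    """Index of the largest value in one row, like np.argmax on that row."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits, labels):
    """Fraction of rows whose argmax equals the reference label."""
    predictions = [argmax(row) for row in logits]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up logits for four examples (illustration only, not model output)
toy_logits = [
    [0.1, 0.2, 0.0, 0.3, 2.0],  # argmax is 4
    [1.5, 0.2, 0.0, 0.3, 0.1],  # argmax is 0
    [0.1, 0.2, 0.9, 0.3, 0.0],  # argmax is 2
    [0.1, 0.2, 0.0, 0.3, 0.4],  # argmax is 4
]
toy_labels = [4, 0, 1, 4]

print(f"chance loss     = {chance_loss:.4f}")
print(f"chance accuracy = {chance_accuracy:.2f}")
print(f"toy accuracy    = {accuracy(toy_logits, toy_labels):.2f}")
```

Comparing a run against these two chance-level numbers is a quick way to tell whether fine-tuning has learned anything yet.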

In [45]:
model.to('cpu')
model.eval()  # make sure dropout is disabled for inference

def predict(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    return predicted_class_id

review_5stars = "This is a great restaurant!"
review_4stars = "This was overall a good experience, above average restaurant"
review_3stars = "This was an alright restaurant"
review_2stars = "This is a pretty bad restaurant. I might try again but I was disappointed"
review_1stars = "This is a terrible restaurant!"

reviews = [review_5stars, review_4stars, review_3stars, review_2stars, review_1stars]

for review in reviews:
    print(f"Model predicted {predict(review)+1} stars for review: {review}")

Model predicted 5 stars for review: This is a great restaurant!
Model predicted 4 stars for review: This was overall a good experience, above average restaurant
Model predicted 4 stars for review: This was an alright restaurant
Model predicted 2 stars for review: This is a pretty bad restaurant. I might try again but I was disappointed
Model predicted 5 stars for review: This is a terrible restaurant!
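
The printed predictions can be scored against the ratings the example reviews were written to represent. A small sketch, assuming the intended ratings are the ones in the variable names (`review_5stars` through `review_1stars`); the predicted values are copied from the output above.

```python
# Intended ratings, taken from the variable names review_5stars ... review_1stars
intended = [5, 4, 3, 2, 1]
# Predicted ratings, copied from the printed output above
predicted = [5, 4, 4, 2, 5]

# Count exact matches between intended and predicted stars
matches = sum(i == p for i, p in zip(intended, predicted))
# Mean absolute error in stars; the 1-star review predicted as 5 dominates it
mean_abs_error = sum(abs(i - p) for i, p in zip(intended, predicted)) / len(intended)

print(f"{matches} of {len(intended)} match, mean absolute error = {mean_abs_error:.1f} stars")
```

Five handwritten reviews are only a spot check; the `eval_accuracy` reported during training is the number to compare between runs.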


# Don't run this! With the same code and simply more data, the accuracy improves

In [18]:
# %%capture

# 1. Load in the data
dataset = load_from_disk("data/yelp_review_small/")

# 2. Load the tokenizer and define the tokenization function
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

# 3. Tokenize our training and test data
tokenized_datasets = dataset.map(tokenize_function, batched=True)

# 4. Load in the pre-trained model
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=5)

# 5. Define an evaluation metric
metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# 6. Set up the Trainer
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    compute_metrics=compute_metrics,
)

# 7. Train the model
trainer.train()

Map:   0%|          | 0/800 [00:00<?, ? examples/s]

Map:   0%|          | 0/200 [00:00<?, ? examples/s]

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias', 'vocab_transform.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'pre_classifier.bias', 'pre_classifi


  0%|          | 0/300 [00:00<?, ?it/s]

  0%|          | 0/25 [00:00<?, ?it/s]

{'eval_loss': 1.0930134057998657, 'eval_accuracy': 0.525, 'eval_runtime': 15.3889, 'eval_samples_per_second': 12.996, 'eval_steps_per_second': 1.625, 'epoch': 1.0}


{'eval_loss': 1.0063376426696777, 'eval_accuracy': 0.58, 'eval_runtime': 15.3839, 'eval_samples_per_second': 13.001, 'eval_steps_per_second': 1.625, 'epoch': 2.0}


{'eval_loss': 1.015762448310852, 'eval_accuracy': 0.595, 'eval_runtime': 15.3847, 'eval_samples_per_second': 13.0, 'eval_steps_per_second': 1.625, 'epoch': 3.0}
{'train_runtime': 745.9547, 'train_samples_per_second': 3.217, 'train_steps_per_second': 0.402, 'train_loss': 0.9802055867513021, 'epoch': 3.0}


TrainOutput(global_step=300, training_loss=0.9802055867513021, metrics={'train_runtime': 745.9547, 'train_samples_per_second': 3.217, 'train_steps_per_second': 0.402, 'train_loss': 0.9802055867513021, 'epoch': 3.0})
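
The step counts in the two runs follow from dataset size, batch size, and number of epochs. The sketch below assumes the `TrainingArguments` defaults of a batch size of 8 and 3 epochs, on a single device with no gradient accumulation. The sizes 800, 200, and 20 come from the `Map` progress lines; the size of about 100 for the tiny training set is inferred from `train_runtime` times `train_samples_per_second`, not stated in the notebook. `total_steps` is a hypothetical helper.

```python
import math


def total_steps(num_examples, batch_size=8, num_epochs=3):
    """Optimizer steps for a run: batches per epoch times number of epochs."""
    return math.ceil(num_examples / batch_size) * num_epochs


small_train_steps = total_steps(800)  # yelp_review_small, 800 training examples
tiny_train_steps = total_steps(100)   # yelp_review_tiny, size inferred (see above)

# Evaluation runs once per epoch, so count the batches in a single pass
small_eval_steps = math.ceil(200 / 8)
tiny_eval_steps = math.ceil(20 / 8)

print(small_train_steps, tiny_train_steps)
print(small_eval_steps, tiny_eval_steps)
```

These values line up with `global_step=300` and `global_step=39`, and with the evaluation progress bars counting to 25 and 3.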

In [42]:
model.to('cpu')
model.eval()  # make sure dropout is disabled for inference

def predict(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    return predicted_class_id

review_5stars = "This is a great restaurant!"
review_4stars = "This was overall a good experience, above average restaurant"
review_3stars = "This was an alright restaurant"
review_2stars = "This is a pretty bad restaurant. I might try again but I was disappointed"
review_1stars = "This is a terrible restaurant!"

reviews = [review_5stars, review_4stars, review_3stars, review_2stars, review_1stars]

for review in reviews:
    print(f"Model predicted class {predict(review)} for review: {review}")

Model predicted class 4 for review: This is a great restaurant!
Model predicted class 3 for review: This was overall a good experience, above average restaurant
Model predicted class 2 for review: This was an alright restaurant
Model predicted class 1 for review: This is a pretty bad restaurant. I might try again but I was disappointed
Model predicted class 0 for review: This is a terrible restaurant!
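
This cell prints raw class ids (0 to 4), while the earlier prediction cell printed `predict(review)+1` as a star rating. A minimal sketch of the conversion, assuming class id 0 corresponds to 1 star, which is what the `+1` in the earlier cell implies. `class_id_to_stars` is a hypothetical helper, and the class ids are copied from the output above.

```python
def class_id_to_stars(class_id, num_labels=5):
    """Map a zero-based class id to a 1..num_labels star rating."""
    if not 0 <= class_id < num_labels:
        raise ValueError(f"class_id must be in [0, {num_labels - 1}], got {class_id}")
    return class_id + 1


# Class ids copied from the printed output above
class_ids = [4, 3, 2, 1, 0]
stars = [class_id_to_stars(c) for c in class_ids]

print(stars)
```

Read this way, the model trained on the larger dataset gives each of the five example reviews the rating it was written to represent.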
