## Intro
In this notebook we fine-tune a pretrained transformer model with a single optimizer step on a very small batch of data, and compare its predictions before and after the update.

In [1]:
import torch
from torch.optim import AdamW  # transformers.AdamW is deprecated; use the PyTorch optimizer instead
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load tokenizer and model
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Data
raw_data = [
    "There may be too many of them!",
    "You will not believe this",
    "Hey man! That is not cool at all."
]
batch = tokenizer(raw_data, padding=True, truncation=True, return_tensors="pt")

# Set the targets
batch["labels"] = torch.tensor([0, 1, 0])

# Predict on the examples before the update
with torch.no_grad():
    outputs = torch.nn.functional.softmax(model(**batch).logits, dim=-1)

print(outputs)

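# Fine-tune: one forward pass, one backward pass, one optimizer step
optimizer = AdamW(model.parameters())  # default hyperparameters
loss = model(**batch).loss  # cross-entropy computed from batch["labels"]
loss.backward()
optimizer.step()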

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


tensor([[0.5432, 0.4568],
        [0.5321, 0.4679],
        [0.5821, 0.4179]])
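
The probabilities above sit close to 0.5 for both classes, which is consistent with the warning that `classifier.weight` and `classifier.bias` are newly initialized: an untrained head tends to produce logits near zero, and softmax maps near-equal logits to a near-uniform distribution. A minimal pure-Python sketch of that mapping follows. The `softmax` helper and the example logits are illustrative assumptions, not values taken from the model.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Subtract the max for numerical stability; the result is unchanged.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: near-equal values, as an untrained head might produce.
near_zero = softmax([0.05, -0.05])
# Hypothetical logits with one class clearly preferred.
confident = softmax([2.0, -0.5])

print(near_zero)
print(confident)
```

The first result stays near an even split, while the second is strongly skewed toward class 0, which mirrors the qualitative difference between an untrained and an updated classification head.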




Now let us predict on the same examples after the update:

In [3]:
# Predict on the same examples after the update
with torch.no_grad():
    outputs = torch.nn.functional.softmax(model(**batch).logits, dim=-1)

print(outputs)

tensor([[0.9155, 0.0845],
        [0.9383, 0.0617],
        [0.8877, 0.1123]])
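
One way to read the two printouts together is to compute the mean cross-entropy against the targets `[0, 1, 0]` from the printed probabilities. The sketch below does this in plain Python. The `mean_cross_entropy` helper is a hypothetical name introduced here, and the inputs are the rounded values shown in the outputs, so the results are approximate. By this measure the second example, whose target is class 1, receives a lower probability for its target after the step, which suggests that a single update with default optimizer settings can shift predictions strongly without necessarily improving them.

```python
import math

labels = [0, 1, 0]  # targets assigned to batch["labels"]

# Probabilities copied from the two printed tensors (rounded to 4 decimals).
before = [[0.5432, 0.4568], [0.5321, 0.4679], [0.5821, 0.4179]]
after = [[0.9155, 0.0845], [0.9383, 0.0617], [0.8877, 0.1123]]

def mean_cross_entropy(probs, targets):
    """Average negative log-probability assigned to the target class."""
    losses = [-math.log(row[t]) for row, t in zip(probs, targets)]
    return sum(losses) / len(losses)

loss_before = mean_cross_entropy(before, labels)
loss_after = mean_cross_entropy(after, labels)

print(f"before: {loss_before:.3f}")
print(f"after:  {loss_after:.3f}")
```

Because the loss here is recomputed from rounded outputs, it may differ slightly from the value of `model(**batch).loss` evaluated on the same batch.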
