In [1]:
!pip install transformers datasets accelerate



Load the pre-trained BERT model and tokenizer

In [2]:
import torch
from transformers import BertTokenizer, BertForSequenceClassification

model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2) # Binary classification

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at
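The warning says the pre-training heads (`cls.*`) are discarded and some weights of the new model are freshly initialized. For `BertForSequenceClassification` this includes the classification layer placed on top of the pooled [CLS] output, which is why the predictions further down are essentially arbitrary before fine-tuning. A quick size estimate of that new layer, assuming the standard `bert-base-uncased` hidden size of 768:

```python
hidden_size = 768   # hidden size of bert-base-uncased
num_labels = 2      # as passed to from_pretrained above

# The classification head is a linear layer: a (num_labels x hidden_size) weight
# matrix plus one bias per label, all randomly initialized.
weight_params = num_labels * hidden_size
bias_params = num_labels
new_params = weight_params + bias_params
print(new_params)  # 1538
```

Only these roughly 1.5k parameters start from scratch; the rest of the roughly 110M BERT parameters come from the checkpoint.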

Prepare and tokenize the input sentences

In [3]:
sentences = ["This is a positive sentence.", "This is a negative sentence."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
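With `padding=True`, the tokenizer pads every sentence in the batch to the longest one and returns an `attention_mask` so the model ignores the padding. A pure-Python sketch of that idea; `pad_batch` is a hypothetical helper, the token IDs are illustrative, and `PAD_ID = 0` assumes the `bert-base-uncased` vocabulary (where [CLS] = 101 and [SEP] = 102):

```python
PAD_ID = 0  # [PAD] id in bert-base-uncased (assumption for this sketch)

def pad_batch(batch_ids, pad_id=PAD_ID):
    """Pad each sequence to the longest one and build the matching attention mask."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token the model attends to, 0 marks padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[101, 2023, 2003, 102], [101, 2023, 102]])
print(batch["input_ids"])       # [[101, 2023, 2003, 102], [101, 2023, 102, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```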

Perform a forward pass through the model and get the logits (the raw, unnormalized score for each class)

In [4]:
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

Calculate the probabilities for each class using the softmax function and get the class predictions

In [5]:
probabilities = torch.softmax(logits, dim=-1)
predictions = torch.argmax(probabilities, dim=-1)
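Softmax turns the logits into probabilities that sum to 1, but because it is monotonic it never changes which class has the highest score. A pure-Python sketch with illustrative logits for one sentence:

```python
import math

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing and does not change the result
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

logits = [0.3, -0.1]  # illustrative logits for one sentence, 2 classes
probs = softmax(logits)
print([round(p, 3) for p in probs])    # [0.599, 0.401]
print(argmax(probs) == argmax(logits))  # True
```

So `torch.argmax(logits, dim=-1)` gives the same predictions; the softmax step is only needed if you want the probabilities themselves.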

Print the results

In [8]:
for sentence, prediction in zip(sentences, predictions):
    print(f"Sentence: {sentence}")
    print(f"Prediction: {'Positive' if prediction == 1 else 'Negative'}\n")

Sentence: This is a positive sentence.
Prediction: Negative

Sentence: This is a negative sentence.
Prediction: Negative



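These predictions are meaningless: the classification head on top of BERT is randomly initialized (see the warning printed when the model was loaded), so the model first has to be fine-tuned on a labelled dataset. Here we use the IMDb movie-review sentiment dataset.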
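The IMDb dataset on the Hugging Face Hub labels negative reviews `0` and positive reviews `1`, which matches the `prediction == 1` → "Positive" convention used above. A small sketch of keeping that mapping in one place; `ID2LABEL`, `LABEL2ID` and `decode` are hypothetical names for illustration:

```python
# Label convention of the Hugging Face IMDb dataset: 0 = negative, 1 = positive
ID2LABEL = {0: "Negative", 1: "Positive"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

def decode(predictions):
    """Map integer class predictions (e.g. from argmax) to label names."""
    return [ID2LABEL[int(p)] for p in predictions]

print(decode([1, 0]))  # ['Positive', 'Negative']
```

The same dictionaries can be passed as `id2label=` and `label2id=` to `from_pretrained`, which stores them in `model.config` so the names travel with a saved model; this sketch does not do that.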

In [9]:
from datasets import load_dataset
dataset = load_dataset("imdb")

Downloading and preparing dataset imdb/plain_text to /Users/dougwoodrow/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0...


Dataset imdb downloaded and prepared to /Users/dougwoodrow/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0. Subsequent calls will reuse this data.
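The output shows a 25,000-review `test` split, which this notebook does not use. `Trainer` accepts an `eval_dataset` and a `compute_metrics` function, so held-out accuracy could be reported after training. A sketch of such a function, assuming the test split is tokenized the same way as the training split; it relies on the evaluation output unpacking into `(logits, labels)`:

```python
def compute_metrics(eval_pred):
    """Accuracy from the (logits, labels) pair produced during evaluation."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == l) for p, l in zip(preds, labels))
    return {"accuracy": correct / len(labels)}

# Illustrative values: 3 examples, 2 classes
result = compute_metrics(([[0.2, 1.5], [2.0, -1.0], [0.1, 0.3]], [1, 0, 0]))
print(result)  # {'accuracy': 0.6666666666666666}
```

The row-wise indexing works for both nested lists and NumPy arrays, which is what the `Trainer` typically hands to `compute_metrics`.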



Tokenize the training data with a fast tokenizer and `Dataset.map`

In [10]:
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained(model_name)
train_dataset = dataset["train"]

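def tokenize(batch):
    # Pad/truncate every review to BERT's 512-token limit so all examples have the
    # same length and the Trainer's default collator can stack them into tensors
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=512)

# Tokenize in batches (1,000 examples per batch by default) rather than one 25,000-review batch
train_dataset = train_dataset.map(tokenize, batched=True)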
train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
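Padding every review to one fixed length means short reviews carry many [PAD] tokens through the model. `transformers` also offers `DataCollatorWithPadding`, which pads each training batch only to the longest sequence in that batch. A pure-Python sketch of the idea; `collate` is a hypothetical helper, the token IDs are illustrative, and `PAD_ID = 0` assumes the `bert-base-uncased` vocabulary:

```python
PAD_ID = 0  # [PAD] id in bert-base-uncased (assumption for this sketch)

def collate(examples, pad_id=PAD_ID):
    """Pad a list of tokenized examples to the longest sequence in this batch only."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        n_pad = max_len - len(ex["input_ids"])
        batch["input_ids"].append(ex["input_ids"] + [pad_id] * n_pad)
        batch["attention_mask"].append(ex["attention_mask"] + [0] * n_pad)
        # The model's forward() expects the key "labels"
        batch["labels"].append(ex["label"])
    return batch

examples = [
    {"input_ids": [101, 7592, 102], "attention_mask": [1, 1, 1], "label": 1},
    {"input_ids": [101, 102], "attention_mask": [1, 1], "label": 0},
]
out = collate(examples)
print(out["input_ids"])  # [[101, 7592, 102], [101, 102, 0]]
```

To use the library version, you would tokenize with `truncation=True` only (no padding) and pass `data_collator=DataCollatorWithPadding(tokenizer)` to the `Trainer`.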

Fine-tune the model with the Hugging Face `Trainer`

In [11]:
from transformers import Trainer, TrainingArguments, BertForSequenceClassification

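training_args = TrainingArguments(
    output_dir="output",             # checkpoints and training state are written here
    num_train_epochs=3,
    per_device_train_batch_size=8,
    logging_dir="logs",              # TensorBoard log directory
)

# Start again from the pre-trained weights with a fresh 2-class classification head
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)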

trainer.train()

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Step,Training Loss


KeyboardInterrupt: 
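The run above was stopped by hand. A rough count of optimizer steps shows why a full run is slow on a laptop: each step is a forward and backward pass over 8 sequences of up to 512 tokens. This assumes a single device and the default `gradient_accumulation_steps=1`:

```python
import math

num_examples = 25_000  # size of the IMDb train split (from the dataset output above)
batch_size = 8         # per_device_train_batch_size, assuming one device
epochs = 3             # num_train_epochs

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 3125 9375
```

For a quick experiment, training on a subset such as `train_dataset.shuffle(seed=42).select(range(2000))`, or capping `max_steps` in `TrainingArguments`, cuts this down considerably.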

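Save the fine-tuned model. `trainer.save_model` only saves the tokenizer if one was passed to the `Trainer`, so save it explicitly so both can be reloaded from the same directory. (If training was interrupted, as above, this saves the partially trained weights.)

In [None]:
trainer.save_model("fine-tuned-bert")
tokenizer.save_pretrained("fine-tuned-bert")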

Now let's reload the saved model and tokenizer and predict the same two sentences again.

In [None]:
import torch

tokenizer = BertTokenizerFast.from_pretrained("fine-tuned-bert")
model = BertForSequenceClassification.from_pretrained("fine-tuned-bert")

sentences = ["This is a positive sentence.", "This is a negative sentence."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

probabilities = torch.softmax(logits, dim=-1)
predictions = torch.argmax(probabilities, dim=-1)

for sentence, prediction in zip(sentences, predictions):
    print(f"Sentence: {sentence}")
    print(f"Prediction: {'Positive' if prediction == 1 else 'Negative'}\n")