<a href="https://colab.research.google.com/github/dietmarja/LLM-Elements/blob/main/model_evaluation/evaluation_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Fine-tuning and evaluation of a pre-trained BERT model on a custom text classification dataset


In [2]:

# Install dependencies, uninstalling and reinstalling packages where needed
import os
os.system("pip install -q transformers[torch] datasets evaluate")
os.system("pip uninstall -y pyarrow")
os.system("pip install pyarrow==14.0.1")
os.system("pip uninstall -y cudf-cu12 ibis-framework")
os.system("pip install cudf-cu12 ibis-framework")


import pandas as pd
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments
from evaluate import load
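
The install cell above calls `os.system`, which returns the exit status but does not raise if a `pip` command fails, so a broken install can go unnoticed until a later import error. A minimal sketch of an alternative, assuming you prefer the cell to stop on failure: run pip through `subprocess.run(..., check=True)` with the current interpreter. The helper names `pip_command` and `run_pip` are hypothetical and not part of the original notebook.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip and raise subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(pip_command(*args), check=True)


# Show the command that would be run for the pyarrow pin used above.
# Call run_pip("install", "pyarrow==14.0.1") to actually execute it.
print(pip_command("install", "pyarrow==14.0.1"))
```

Using `sys.executable -m pip` targets the same Python environment as the running kernel, which is usually what a notebook install cell intends.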

In [3]:
# Load the pre-trained model and tokenizer
model_name = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the custom CSV dataset (expects a `text` column and a `label` column)
data = pd.read_csv("text_classification_data.csv")
print("Data loaded successfully:")
print(data.head())

# Convert the DataFrame to a Hugging Face Dataset
dataset = Dataset.from_pandas(data)
print("Dataset converted successfully")

# Tokenize the text column: truncate to 128 tokens and pad each batch to its longest sequence
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True, padding=True, max_length=128)

encoded_dataset = dataset.map(preprocess_function, batched=True)
print("Dataset preprocessed successfully")

# Define the evaluation metric
metric = load("accuracy", trust_remote_code=True)

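def compute_metrics(eval_pred):
    # eval_pred unpacks into (logits, labels); logits have shape (n_samples, num_labels)
    predictions, labels = eval_pred
    # Pick the highest-scoring class for each sample
    predictions = predictions.argmax(axis=1)
    return metric.compute(predictions=predictions, references=labels)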

# Set up the training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_dir='./logs',
    logging_steps=10,
)

# Split the dataset 80/20; the held-out "test" split is used for validation below
encoded_dataset = encoded_dataset.train_test_split(test_size=0.2)
print("Dataset split into train and test sets")

# Create the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

# Train the model
print("Starting training...")
trainer.train()
print("Training completed")

# Evaluate the model
eval_result = trainer.evaluate()
print(f"Evaluation result: {eval_result}")





Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Data loaded successfully:
                                                text  label
0  This movie was amazing! I loved the plot and c...      1
1  The acting was superb, and the cinematography ...      1
2  I found the book to be quite boring and unenga...      0
3  The restaurant had excellent service and delic...      1
4     The software was buggy and crashed frequently.      0
Dataset converted successfully


Map:   0%|          | 0/10 [00:00<?, ? examples/s]

Dataset preprocessed successfully





Dataset split into train and test sets
Starting training...


Epoch  Training Loss  Validation Loss  Accuracy
1      No log         0.625457         0.5
2      No log         0.753126         0.5
3      No log         0.752591         0.5
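
The "No log" entries and the constant accuracy of 0.5 follow from the dataset size rather than from a fault in the training loop. The arithmetic can be sketched as below, assuming the 10 examples reported by the `Map` progress line, a single device, and that `train_test_split` rounds the held-out size up (which makes no difference here, since 10 × 0.2 is exactly 2).

```python
import math

n_examples = 10        # dataset size reported during preprocessing
test_size = 0.2        # fraction passed to train_test_split
train_batch_size = 16  # per_device_train_batch_size (single device assumed)
num_epochs = 3         # num_train_epochs
logging_steps = 10     # training loss is logged every 10 optimizer steps

n_eval = math.ceil(n_examples * test_size)               # held-out examples
n_train = n_examples - n_eval                            # training examples
steps_per_epoch = math.ceil(n_train / train_batch_size)  # optimizer steps per epoch
total_steps = steps_per_epoch * num_epochs

# With n_eval examples, accuracy can only take n_eval + 1 distinct values.
possible_accuracies = [k / n_eval for k in range(n_eval + 1)]

print(f"train={n_train}, eval={n_eval}")
print(f"total optimizer steps={total_steps}, logging every {logging_steps} steps")
print(f"possible accuracy values={possible_accuracies}")
```

Training runs for only 3 optimizer steps, so the first logging step (step 10) is never reached and no training loss is recorded. With 2 validation examples, an accuracy of 0.5 means exactly one was classified correctly, which is too small a sample to say anything about model quality.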


Training completed


Evaluation result: {'eval_loss': 0.6254569888114929, 'eval_accuracy': 0.5, 'eval_runtime': 0.1906, 'eval_samples_per_second': 10.494, 'eval_steps_per_second': 5.247, 'epoch': 3.0}
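
The final `eval_loss` matches the epoch 1 validation loss, not the epoch 3 one, even though the result reports `'epoch': 3.0`. This is consistent with `load_best_model_at_end=True`: as far as I know, when `metric_for_best_model` is not set, `Trainer` selects the checkpoint with the lowest evaluation loss and reloads it after training. A small sketch of that selection, using the numbers from the training log above (the variable names are illustrative only):

```python
# Per-epoch validation losses copied from the training log
eval_loss_by_epoch = {1: 0.625457, 2: 0.753126, 3: 0.752591}

# Lower loss is better, so the best checkpoint is the epoch with the minimum loss
best_epoch = min(eval_loss_by_epoch, key=eval_loss_by_epoch.get)

# eval_loss reported by trainer.evaluate() after training
final_eval_loss = 0.6254569888114929

matches_best = abs(final_eval_loss - eval_loss_by_epoch[best_epoch]) < 1e-6
print(f"best epoch: {best_epoch}, final evaluation uses it: {matches_best}")
```

The validation loss rises after the first epoch while accuracy stays flat, so on this tiny dataset the later epochs did not improve on the first checkpoint.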
