# Week 3 exercises

## Part 1 dataset preparation

In [None]:
!pip install -q scikit-learn
!pip install -q evaluate
!pip install -q datasets

In [None]:
# Load some modules
import pandas as pd
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, AutoModelForSequenceClassification

Steps 1 and 2. Load the IMDB dataset. The exercise Zoom session mentioned that the Hugging Face IMDB dataset can also be used. That dataset is already in the right format, so its train and test parts would only need to be concatenated vertically. This notebook uses the Kaggle copy.
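
If the Hugging Face copy were used instead, the vertical concatenation could look roughly like the sketch below. The two small frames are hypothetical stand-ins for the train and test splits (nothing is downloaded here), and the `text`/`label` column names are an assumption about that dataset's layout.

```python
import pandas as pd

# Hypothetical stand-ins for the two Hugging Face splits.
hf_train = pd.DataFrame({"text": ["Great film.", "Dull plot."], "label": [1, 0]})
hf_test = pd.DataFrame({"text": ["Loved it.", "Not for me."], "label": [1, 0]})

# Stack the splits vertically and renumber the rows from zero.
df_hf = pd.concat([hf_train, hf_test], ignore_index=True)

# Match the column name used in the rest of this notebook.
df_hf = df_hf.rename(columns={"text": "review"})

print(df_hf.shape)
```

After this the frame has the same `review`/`label` layout as the Kaggle data, so the same `train_test_split` call would apply.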

In [None]:
# Download imdb from Kaggle
!pip install kaggle
!kaggle datasets download lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

Dataset URL: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
License(s): other
imdb-dataset-of-50k-movie-reviews.zip: Skipping, found more recently modified local copy (use --force to force download)


In [None]:
# Unzip
!unzip imdb-dataset-of-50k-movie-reviews.zip

Archive:  imdb-dataset-of-50k-movie-reviews.zip
replace IMDB Dataset.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: n


In [None]:
# load to DataFrame
df_imdb = pd.read_csv('IMDB Dataset.csv')

# Rename sentiment feature to label
df_imdb = df_imdb.rename(columns={"sentiment": "label"})

# Map labels to 0 and 1
df_imdb["label"] = df_imdb["label"].map({"positive": 1, "negative": 0})

df_imdb.head()

|   | review | label |
|---|--------|-------|
| 0 | One of the other reviewers has mentioned that ... | 1 |
| 1 | A wonderful little production. <br /><br />The... | 1 |
| 2 | I thought this was a wonderful way to spend ti... | 1 |
| 3 | Basically there's a family where a little boy ... | 0 |
| 4 | Petter Mattei's "Love in the Time of Money" is... | 1 |


In [None]:
# Split to train and test sets
df_imdb_train, df_imdb_test = train_test_split(df_imdb, test_size=0.2, random_state=42)

Step 3. Model selection and tokenization

In [None]:
# Load the DistilBERT tokenizer and model (the classification head defaults to two labels)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
# Tokenize the training dataset
train_encodings = tokenizer(
    list(df_imdb_train["review"]),
    truncation=True,
    padding=True,
    max_length=256,
    return_tensors="pt"
)

# Tokenize the test dataset
test_encodings = tokenizer(
    list(df_imdb_test["review"]),
    truncation=True,
    padding=True,
    max_length=256,
    return_tensors="pt"
)

# And prepare the dataset
from torch.utils.data import Dataset

class SentimentDataset(Dataset):
    # Wraps the tokenizer output in the dict format the Hugging Face Trainer expects
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return {
            "input_ids": self.encodings["input_ids"][idx],
            "attention_mask": self.encodings["attention_mask"][idx],
            "labels": self.labels[idx]
        }

# Create datasets for train and test
train_dataset = SentimentDataset(train_encodings, df_imdb_train["label"].tolist())
test_dataset = SentimentDataset(test_encodings, df_imdb_test["label"].tolist())  # Create test dataset
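
To make the effect of `truncation=True`, `padding=True` and `max_length` more concrete, here is a simplified pure-Python sketch. It is not the tokenizer's actual implementation: the real tokenizer also keeps the closing special token when it truncates, which this sketch ignores. The token ids and the `pad_and_truncate` helper are made up for illustration.

```python
def pad_and_truncate(token_ids, max_length, pad_id=0):
    """Cut each sequence to max_length, then pad all to the longest remaining one."""
    truncated = [ids[:max_length] for ids in token_ids]
    width = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in truncated]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in truncated]
    return input_ids, attention_mask

# Made-up token ids for three "reviews" of different lengths.
batch = [[101, 7, 8, 102], [101, 9, 102], [101, 3, 4, 5, 6, 102]]
input_ids, attention_mask = pad_and_truncate(batch, max_length=5)
print(input_ids)
print(attention_mask)
```

Every row ends up with the same length, which is what allows the encodings to be returned as one tensor.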

Step 4. Finetune the model

In [None]:
from transformers import Trainer, TrainingArguments
from evaluate import load

# Load the metrics once, rather than on every evaluation call
accuracy_metric = load("accuracy")
precision_metric = load("precision")
recall_metric = load("recall")
f1_metric = load("f1")

# Evaluator
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)  # Get predicted class indices

    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    precision = precision_metric.compute(predictions=predictions, references=labels, average="weighted")
    recall = recall_metric.compute(predictions=predictions, references=labels, average="weighted")
    f1 = f1_metric.compute(predictions=predictions, references=labels, average="weighted")

    return {
        "accuracy": accuracy["accuracy"],
        "precision": precision["precision"],
        "recall": recall["recall"],
        "f1": f1["f1"],
    }

# Define training arguments
training_args = TrainingArguments(
    learning_rate=5e-5, # requested
    output_dir="./wk3ex_bert_imdb_sentiment",
    evaluation_strategy="epoch", # requested; recent transformers releases name this argument eval_strategy
    save_strategy="epoch",
    num_train_epochs=2, # Two epochs requested
    per_device_train_batch_size=16, # requested 16 or 32
    per_device_eval_batch_size=16,
    logging_dir="./logs",
    logging_steps=10,
    report_to=["none"]  # Proper way to disable reporting tools like W&B
)

# Define Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()



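| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|-------|---------------|-----------------|----------|-----------|--------|----|
| 1 | 0.2405 | 0.239236 | 0.9093 | 0.910731 | 0.9093 | 0.909244 |
| 2 | 0.1183 | 0.280448 | 0.9201 | 0.920116 | 0.9201 | 0.920101 |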


TrainOutput(global_step=5000, training_loss=0.20962510048747063, metrics={'train_runtime': 2002.8288, 'train_samples_per_second': 39.944, 'train_steps_per_second': 2.496, 'total_flos': 5298695946240000.0, 'train_loss': 0.20962510048747063, 'epoch': 2.0})
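
In the results above, Recall equals Accuracy in both epochs. That is expected with `average="weighted"`: weighting each class's recall by its share of the references gives back the overall fraction of correct predictions. A small pure-Python check with made-up labels (not the notebook's data) illustrates this; the helper names are chosen here for illustration.

```python
from collections import Counter

def accuracy(preds, refs):
    """Fraction of predictions that match the reference labels."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)

def weighted_recall(preds, refs):
    """Per-class recall, weighted by how often each class occurs in refs."""
    support = Counter(refs)
    hits = Counter(r for p, r in zip(preds, refs) if p == r)
    return sum((support[c] / len(refs)) * (hits[c] / support[c]) for c in support)

# Made-up labels: 1 = positive, 0 = negative.
refs = [1, 1, 1, 0, 0, 1, 0, 1]
preds = [1, 0, 1, 0, 1, 1, 0, 1]
print(accuracy(preds, refs), weighted_recall(preds, refs))
```

Precision and F1 do not collapse in the same way, which is why those columns differ slightly from Accuracy.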

Step 5. Save locally and upload to HF

In [None]:
# Save locally
trainer.save_model("./wk3ex_bert_imdb_sentiment")

In [None]:
from google.colab import userdata
HF_TOKEN = userdata.get('HF_TOKEN')

# login to HF
from huggingface_hub import login

login(HF_TOKEN)

trainer.push_to_hub("wk3ex_bert_imdb_sentiment")


CommitInfo(commit_url='https://huggingface.co/Kelmeilia/wk3ex_bert_imdb_sentiment/commit/945aacc8695556855e016656b4fe1d029cea57bf', commit_message='wk3ex_bert_imdb_sentiment', commit_description='', oid='945aacc8695556855e016656b4fe1d029cea57bf', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Kelmeilia/wk3ex_bert_imdb_sentiment', endpoint='https://huggingface.co', repo_type='model', repo_id='Kelmeilia/wk3ex_bert_imdb_sentiment'), pr_revision=None, pr_num=None)

In [None]:
tokenizer.push_to_hub("wk3ex_bert_imdb_sentiment")

No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/Kelmeilia/wk3ex_bert_imdb_sentiment/commit/945aacc8695556855e016656b4fe1d029cea57bf', commit_message='Upload tokenizer', commit_description='', oid='945aacc8695556855e016656b4fe1d029cea57bf', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Kelmeilia/wk3ex_bert_imdb_sentiment', endpoint='https://huggingface.co', repo_type='model', repo_id='Kelmeilia/wk3ex_bert_imdb_sentiment'), pr_revision=None, pr_num=None)

Test the model

In [None]:
# test the model
from transformers import pipeline

# My model
model_name = "kelmeilia/wk3ex_bert_imdb_sentiment"

classifier = pipeline("text-classification", model=model_name)

# Make predictions
predictions = classifier("This movie was absolutely fantastic!")
print(predictions)

predictions = classifier(["This movie was terrible.", "I loved it!"]) #Batch prediction
print(predictions)

Device set to use cuda:0


[{'label': 'LABEL_1', 'score': 0.9977992177009583}]
[{'label': 'LABEL_0', 'score': 0.9984985589981079}, {'label': 'LABEL_1', 'score': 0.9980068802833557}]


The model should now be publicly available on Hugging Face.

Link: https://huggingface.co/Kelmeilia/wk3ex_bert_imdb_sentiment

Step 6. Implemented the backend with Flask; see the repository linked at the end.

Step 7. The backend accesses my own model and also uses Groq. The code is in the repository linked at the end. The Groq token and the HF token must be set as environment variables.
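
One way a backend could read such tokens at startup is sketched below. This is not taken from the repository: `require_env` is a hypothetical helper, and `GROQ_API_KEY` is an assumed variable name (`HF_TOKEN` is the name used earlier in this notebook).

```python
import os

def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# In the backend this might be called once at startup, for example:
#   hf_token = require_env("HF_TOKEN")
#   groq_key = require_env("GROQ_API_KEY")  # assumed name
```

Failing early like this gives a clearer error than a rejected API call later on.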

Step 8. Testing the backend.

Instead of curl, the request can be sent from PowerShell with `Invoke-WebRequest`:
```
$body = @{
    text  = "value"
    model = "llama"
} | ConvertTo-Json

Invoke-WebRequest -Uri "http://127.0.0.1:5000/analyze/" -Method Post -Body $body -ContentType "application/json"
```
Which returns:
```
StatusCode        : 200
StatusDescription : OK
Content           : {
                      "confidence": 0.5,
                      "sentiment": "positive"
                    }

RawContent        : HTTP/1.1 200 OK
                    Connection: close
                    Content-Length: 51
                    Content-Type: application/json
                    Date: Fri, 31 Jan 2025 18:36:58 GMT
                    Server: Werkzeug/3.1.3 Python/3.10.11

                    {
                      "confidence": 0.5,
                      "sentimen...
Forms             : {}
Headers           : {[Connection, close], [Content-Length, 51], [Content-Type, application/json], [Date, Fri, 31 Jan
                    2025 18:36:58 GMT]...}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        : System.__ComObject
RawContentLength  : 51
```

The same endpoint with the custom model, called from a REST client extension in VS Code:
>POST http://127.0.0.1:5000/analyze/ HTTP/1.1
>Content-Type: application/json
>
>{
>    "text" : "It was a hilarious comedy.",
>    "model" : "custom"
>}

returns:
```
HTTP/1.1 200 OK
Server: Werkzeug/3.1.3 Python/3.10.11
Date: Fri, 31 Jan 2025 18:40:20 GMT
Content-Type: application/json
Content-Length: 65
Connection: close

{
  "confidence": 0.995728075504303,
  "sentiment": "positive"
}
```

Tests using Python's requests library are in test_groq.py.
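
For reference, the same POST can be built in Python roughly as follows. This is not the code from test_groq.py: it swaps in the standard library's `urllib` so the sketch has no extra dependencies, and the function names are made up. The URL and JSON fields are the ones used in the requests above.

```python
import json
import urllib.request

def build_analyze_request(text, model, base_url="http://127.0.0.1:5000"):
    """Build (but do not send) a POST request for the /analyze/ endpoint."""
    body = json.dumps({"text": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/analyze/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def analyze(text, model):
    """Send the request; requires the Flask backend to be running locally."""
    with urllib.request.urlopen(build_analyze_request(text, model)) as response:
        return json.loads(response.read().decode("utf-8"))

request = build_analyze_request("It was a hilarious comedy.", "custom")
print(request.get_method(), request.full_url)
```

With the backend running, `analyze("It was a hilarious comedy.", "custom")` would send the request and return the parsed JSON.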


Step 9. Define the LLaMA prompt. After some experimentation I ended up with this:
```
{
    "role" : "user",
    "content" : f"Estimate the following text and answer in just one word, 'positive', \
    or 'negative' if the sentiment of the text is of positive or negative sentiment.\n\
    text: {text}"
}
```

It seems to perform very well. Strictly speaking, the model usually responds with "Positive." rather than the bare word, so I check whether output.lower() contains "positive" or "negative".
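
That check could be written along these lines. This is a sketch rather than the repository's code: `parse_sentiment` is a hypothetical name, and returning `None` for replies that contain both words or neither is an extra safeguard added here.

```python
def parse_sentiment(output):
    """Map a free-text model reply to 'positive', 'negative' or None."""
    text = output.lower()
    has_positive = "positive" in text
    has_negative = "negative" in text
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    return None  # ambiguous or unrecognised reply

print(parse_sentiment("Positive."))
```

Lowercasing first means replies such as "Positive." or "NEGATIVE" are handled the same way as the bare words.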

Step 10. Testing done

Step 11. React frontend done; it is in the GitHub repository.

Step 12. Sent the full stack to GitHub:
https://github.com/SakuOrdrTab/week3_exercises

Step 13. The final adversary: the YouTube video. Here it is in all its glory:
https://www.youtube.com/watch?v=BHlUUc5A-Is