In [5]:
my_variable = "Hello World"
print(my_variable)


Hello World


In [3]:
print(my_variable)

Hello World


Basic testing using an existing pre-trained model

In [17]:
from transformers import pipeline

# Load a pre-trained sentiment analysis model
classifier = pipeline("sentiment-analysis")

# Test with a review
review = "The movie was not ok."
result = classifier(review)

print(result)  


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'NEGATIVE', 'score': 0.9997368454933167}]
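The warning above recommends naming the model and revision explicitly instead of relying on the default. A minimal sketch of that, reusing the model name and revision reported in the log. The constant names and the `build_classifier` helper are illustrative assumptions, not part of the original notebook:

```python
# Model name and revision copied from the warning printed by the default pipeline.
MODEL_NAME = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "714eb0f"


def build_classifier():
    """Create a sentiment pipeline pinned to a known model and revision."""
    # Imported lazily so the constants can be inspected without loading transformers.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_NAME, revision=MODEL_REVISION)


# Usage (downloads the model on first call):
# classifier = build_classifier()
# print(classifier("The movie was not ok."))
```

Pinning both values should keep results comparable between runs even if the library's default model changes later.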


In [5]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # Pre-trained BERT model
num_labels = 3  # Positive, Neutral, Negative

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
# Define your dataset (movie reviews + labels)
movie_reviews = [
    "This movie was absolutely wonderful.",  # Positive
    "I really loved the film!",  # Positive
    "The movie was alright.",  # Neutral
    "I didn't enjoy the film that much.",  # Negative
    "This movie was not to my taste."  # Negative
]

labels = [2, 2, 1, 0, 0]  # Map labels: Negative=0, Neutral=1, Positive=2

# Tokenize the dataset
inputs = tokenizer(movie_reviews, padding=True, truncation=True, return_tensors="pt")


In [7]:
from transformers import Trainer, TrainingArguments
import torch
from torch.utils.data import Dataset

class MovieReviewDataset(Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
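        # The encodings are already tensors (return_tensors="pt"), so copy them
        # rather than wrapping them in torch.tensor(), which triggers a warning.
        item = {key: val[idx].clone().detach() for key, val in self.encodings.items()}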
        item["labels"] = torch.tensor(self.labels[idx])
        return item

dataset = MovieReviewDataset(inputs, labels)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)

# Train the model
trainer.train()


100%|██████████| 6/6 [00:55<00:00,  9.19s/it]


{'train_runtime': 55.0357, 'train_samples_per_second': 0.273, 'train_steps_per_second': 0.109, 'train_loss': 1.0507909456888835, 'epoch': 3.0}


TrainOutput(global_step=6, training_loss=1.0507909456888835, metrics={'train_runtime': 55.0357, 'train_samples_per_second': 0.273, 'train_steps_per_second': 0.109, 'total_flos': 92500810920.0, 'train_loss': 1.0507909456888835, 'epoch': 3.0})
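The numbers in the training output can be checked by hand. The sketch below assumes a single device and no gradient accumulation (the `TrainingArguments` defaults, as far as this notebook shows) and reproduces the step count and throughput from the dataset size, batch size, and epoch count used above:

```python
import math

num_examples = 5          # reviews in movie_reviews
batch_size = 4            # per_device_train_batch_size
num_epochs = 3            # num_train_epochs
train_runtime = 55.0357   # seconds, from the training output

# The last, partial batch still counts as a step, hence the ceiling.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

samples_per_second = num_examples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(steps_per_epoch, total_steps)  # 2 6
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

Six optimizer steps over five examples is a very small amount of training for a freshly initialized classification head.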

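The following does not work as desired: the model seems to predict positive almost every time, so more tinkering is required. It can predict the other classes for drastic inputs like "bad" or for sentences taken directly from the training set, such as "I didn't enjoy the film that much". Not sure whether this is an issue with the setup or evidence that more training is needed.

Upon further testing, the model behaves very differently from run to run, with a different class being favoured each time. This is consistent with the classifier head being newly initialized with random weights (see the warning printed when the model was loaded) and then trained on only five examples. It could suggest that the model simply requires more data and further training.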
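One way to judge whether more training is needed is to compare the reported training loss with the loss of a classifier that guesses uniformly across the three classes. This is a rough sanity check, not a diagnosis; the variable names are illustrative:

```python
import math

num_labels = 3
observed_train_loss = 1.0507909456888835  # train_loss reported by trainer.train()

# Cross-entropy of a classifier that assigns probability 1/num_labels to every class.
chance_loss = -math.log(1 / num_labels)

print(f"chance-level loss: {chance_loss:.4f}")
print(f"observed loss:     {observed_train_loss:.4f}")
print(f"difference:        {chance_loss - observed_train_loss:.4f}")
```

The observed loss sits only slightly below the chance-level value of ln(3), which suggests the model has learned very little so far.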

In [13]:
new_review = ["I really loved the film!"]

# Tokenize new review
new_inputs = tokenizer(new_review, padding=True, truncation=True, return_tensors="pt")

# Predict (eval mode disables dropout; no_grad skips gradient tracking)
model.eval()
with torch.no_grad():
    outputs = model(**new_inputs)
predicted_class = torch.argmax(outputs.logits, dim=1).item()

# Label mapping
label_map = {0: "Negative", 1: "Neutral", 2: "Positive"}
print(f"Predicted Sentiment: {label_map[predicted_class]}")


Predicted Sentiment: Negative
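Because `argmax` only reports the winning class, it hides how close the three scores are. Converting the logits to probabilities shows this. The sketch below uses made-up logits purely for illustration (they are not output from the model above), and `softmax` is a hypothetical helper written in plain Python:

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


label_map = {0: "Negative", 1: "Neutral", 2: "Positive"}

# Made-up logits for illustration only; not output from the model above.
example_logits = [0.12, 0.05, 0.08]

probs = softmax(example_logits)
for idx, p in enumerate(probs):
    print(f"{label_map[idx]}: {p:.3f}")
```

With the real model, `torch.softmax(outputs.logits, dim=1)` gives the equivalent probabilities. If they come out nearly equal, as in this illustration, the predicted label is close to a coin toss.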
