<a href="https://colab.research.google.com/github/SandeepKonduruFeb12/aiml/blob/master/gold/GoldAssignment1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Objective
Fine-tune a large language model (LLM) for sentiment analysis on movie reviews, and then evaluate its performance.



1.  **Task Definition**: The primary task is to perform **Sentiment Analysis** on movie reviews. This involves classifying each review into one of several sentiment categories, typically 'positive', 'negative', or 'neutral'.

2.  **Dataset Structure**: The dataset required for fine-tuning should adhere to a specific structure. It must contain at least two key fields:
    *   `review_text`: This field will hold the actual text content of the movie review.
    *   `sentiment_label`: This field will contain the corresponding sentiment assigned to the `review_text` (e.g., 'positive', 'negative', 'neutral').

3.  **Dataset Characteristics**: The dataset will be a relatively small collection, typically between **200 and 500 entries**, provided in a common structured format such as **CSV** or **JSONL** (JSON Lines), in line with standard practice for fine-tuning data. For demonstration, this notebook instead defines a 6-entry sample inline as a Python list of dictionaries.
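
The structure described above can be checked before any fine-tuning starts. The following is a minimal sketch, assuming the data arrives as JSONL with exactly the two fields named above; `parse_reviews_jsonl`, `REQUIRED_FIELDS`, and `ALLOWED_LABELS` are hypothetical names introduced here for illustration, and the sample lines are made up.

```python
import json

# Hypothetical constants mirroring the fields and labels described above.
REQUIRED_FIELDS = ("review_text", "sentiment_label")
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def parse_reviews_jsonl(lines):
    """Parse JSONL lines into a list of review dicts, validating each record."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        for field in REQUIRED_FIELDS:
            if field not in record:
                raise ValueError(f"line {line_number}: missing field {field!r}")
        if record["sentiment_label"] not in ALLOWED_LABELS:
            raise ValueError(
                f"line {line_number}: unknown label {record['sentiment_label']!r}"
            )
        # Keep only the two fields the fine-tuning pipeline needs.
        records.append({field: record[field] for field in REQUIRED_FIELDS})
    return records


sample_lines = [
    '{"review_text": "This movie was fantastic!", "sentiment_label": "positive"}',
    '{"review_text": "Absolutely terrible.", "sentiment_label": "negative"}',
]
reviews = parse_reviews_jsonl(sample_lines)
print(len(reviews), reviews[0]["sentiment_label"])
```

A list validated this way has the same shape as the inline `data` list used later in the notebook, so it could be passed to `Dataset.from_list` in the same manner.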

In [1]:
print('Installing libraries...')
%pip install accelerate bitsandbytes transformers trl peft datasets scipy -qq
print('Libraries installed.')

Installing libraries...
Libraries installed.




### Pretrained LLM
Choose an open-source LLM suitable for fine-tuning, such as 'distilbert-base-uncased', and load the pre-trained model and its corresponding tokenizer.


In [5]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Name of the pretrained checkpoint on the Hugging Face Hub
model_name = 'distilbert-base-uncased'

# Load the tokenizer that matches the checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
print(f"Tokenizer for {model_name} loaded successfully.")

# Load the pretrained model with a 3-way classification head,
# one output per sentiment: 'positive', 'negative', 'neutral'.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
print(f"Model {model_name} loaded successfully with 3 labels.")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Tokenizer for distilbert-base-uncased loaded successfully.
Model distilbert-base-uncased loaded successfully with 3 labels.


### Dataset
Format our small dataset of movie reviews into an appropriate structure for fine-tuning. This will involve tokenizing the 'review_text' and mapping 'sentiment_label' to numerical IDs, suitable for a text classification task. We'll ensure the input format is compatible with the chosen LLM's expectations (e.g., input IDs, attention mask, and labels).


In [7]:
from datasets import Dataset

# 1. Define a small Python dictionary or list of dictionaries to represent your dataset.
# This dataset contains movie reviews and their corresponding sentiment labels.
data = [
    {"review_text": "This movie was fantastic! I loved every minute of it.", "sentiment_label": "positive"},
    {"review_text": "Absolutely terrible, a waste of time and money.", "sentiment_label": "negative"},
    {"review_text": "It was okay, nothing special, but not bad either.", "sentiment_label": "neutral"},
    {"review_text": "The acting was superb, and the story was engaging.", "sentiment_label": "positive"},
    {"review_text": "I fell asleep halfway through, utterly boring.", "sentiment_label": "negative"},
    {"review_text": "Could have been better, but had some good moments.", "sentiment_label": "neutral"}
]

# 2. Convert this dataset into a Hugging Face `Dataset` object.
dataset = Dataset.from_list(data)

print("Dataset created successfully:")
print(dataset)
print("First entry:")
print(dataset[0])

Dataset created successfully:
Dataset({
    features: ['review_text', 'sentiment_label'],
    num_rows: 6
})
First entry:
{'review_text': 'This movie was fantastic! I loved every minute of it.', 'sentiment_label': 'positive'}


Next, create a mapping from sentiment labels to numerical IDs and store it in the model's configuration. The mapping is needed to encode the labels for training, and storing it in the configuration lets the model report label names rather than bare IDs.



In [8]:
import numpy as np

# 3. Create a mapping from your sentiment labels to numerical IDs.
# Ensure the model's configuration is updated with these labels.
label_names = sorted(list(set(dataset['sentiment_label'])))
label_to_id = {label: i for i, label in enumerate(label_names)}
id_to_label = {i: label for i, label in enumerate(label_names)}

# Update the model's configuration with the new label mapping
model.config.label2id = label_to_id
model.config.id2label = id_to_label

print(f"Label to ID mapping: {label_to_id}")
print(f"ID to Label mapping: {id_to_label}")
print("Model config updated with label mappings.")


Label to ID mapping: {'negative': 0, 'neutral': 1, 'positive': 2}
ID to Label mapping: {0: 'negative', 1: 'neutral', 2: 'positive'}
Model config updated with label mappings.


In [9]:
def preprocess_function(examples):
    # Tokenize the review_text
    tokenized_inputs = tokenizer(examples['review_text'], truncation=True, padding='max_length', max_length=128)

    # Map sentiment_label to numerical IDs
    tokenized_inputs['labels'] = [label_to_id[label] for label in examples['sentiment_label']]
    return tokenized_inputs

print("Preprocessing function defined.")

Preprocessing function defined.


In [10]:
tokenized_dataset = dataset.map(preprocess_function, batched=True, remove_columns=['review_text', 'sentiment_label'])

print("Tokenized dataset created successfully:")
print(tokenized_dataset)
print("First entry of tokenized dataset:")
print(tokenized_dataset[0])


Tokenized dataset created successfully:
Dataset({
    features: ['input_ids', 'attention_mask', 'labels'],
    num_rows: 6
})
First entry of tokenized dataset:
{'input_ids': [101, 2023, 3185, 2001, 10392, 999, 1045, 3866, 2296, 3371, 1997, 2009, 1012, 102, 0, 0, 0, ...], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, ...], ...
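
The output above shows the first review padded with zeros, with `attention_mask` marking real tokens as 1 and padding as 0. A minimal sketch of that relationship in plain Python follows. `pad_batch` is a hypothetical helper, not part of `transformers`, and it pads only to the longest sequence in the batch (the per-batch padding that `DataCollatorWithPadding` is meant for) rather than to a fixed `max_length`.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-ID lists to the longest one and build matching attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Illustrative token IDs; 101 and 102 open and close the sequence in the output above.
batch = pad_batch([[101, 2023, 3185, 102], [101, 2009, 2001, 3100, 1012, 102]])
print(batch["input_ids"][0])       # [101, 2023, 3185, 102, 0, 0]
print(batch["attention_mask"][0])  # [1, 1, 1, 1, 0, 0]
```

Because `preprocess_function` already pads every review with `padding='max_length'`, the data collator used later has nothing left to pad. Dropping that argument would likely let the collator pad each batch only as far as needed.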

In [11]:
from transformers import TrainingArguments, Trainer

print("Imported TrainingArguments and Trainer.")

Imported TrainingArguments and Trainer.


In [13]:
# Directory for checkpoints and logs
output_dir = './results'

# Training hyperparameters
training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    # evaluation_strategy='epoch',  # Removed due to TypeError: unexpected keyword argument (newer transformers releases name it eval_strategy)
    logging_dir=output_dir,
    # logging_strategy='epoch',     # Removed due to TypeError: unexpected keyword argument
    push_to_hub=False
)

print(f"TrainingArguments instantiated: {training_args.to_dict()}")



## Train the Model
Initiate the fine-tuning process using the prepared dataset and training arguments. Monitor the training progress.


In [15]:
import numpy as np
from sklearn.metrics import accuracy_score

# Metric callback for the Trainer: receives (logits, labels) and returns a dict of scores
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Convert logits to predicted class IDs
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": accuracy_score(labels, predictions)}

print("compute_metrics function defined.")

compute_metrics function defined.
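
To make the argmax step in `compute_metrics` concrete, here is a small sketch in plain Python that applies the same logic to made-up logits. The helper names `argmax` and `accuracy` and all the numbers are hypothetical; only the label order is taken from the mapping printed earlier in the notebook.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    predicted = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)


# Hypothetical logits for three reviews; columns follow the notebook's
# mapping {0: 'negative', 1: 'neutral', 2: 'positive'}.
toy_logits = [
    [0.1, 0.2, 1.5],   # highest score at index 2 -> 'positive'
    [2.0, 0.3, -1.0],  # highest score at index 0 -> 'negative'
    [0.4, 0.1, 0.9],   # highest score at index 2 -> 'positive'
]
toy_labels = [2, 0, 1]  # the third review is actually 'neutral'

print(accuracy(toy_logits, toy_labels))  # 2 of 3 predictions are correct
```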


In [17]:
from transformers import DataCollatorWithPadding

# Create a data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# 3. Instantiate a Trainer object
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    processing_class=tokenizer, # Changed from tokenizer=tokenizer to processing_class=tokenizer
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("Trainer instantiated successfully.")

Trainer instantiated successfully.


In [18]:
print("Starting model training...")
trainer.train()
print("Model training completed.")

Starting model training...


wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice:

 2


wandb: You chose 'Use an existing W&B account'
wandb: Logging into https://api.wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: Find your API key here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:

 ··········


wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: kondurusandeep33 (kondurusandeep33-personal) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin




Step,Training Loss


Model training completed.


In [19]:
# Directory for checkpoints and logs
output_dir = './results'

# Same hyperparameters as before, plus report_to=[] to switch off experiment trackers
training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    logging_dir=output_dir,
    push_to_hub=False,
    report_to=[] # Disable Weights & Biases reporting to avoid interactive prompts
)

print(f"TrainingArguments instantiated: {training_args.to_dict()}")



**Reasoning**:
With `TrainingArguments` now set to disable Weights & Biases reporting, the `Trainer` must be re-instantiated so that it picks up the updated arguments. Otherwise training would stop at the interactive prompt again. Note that `model` already carries the weights updated by the first run, so the next call to `trainer.train()` continues from them rather than from the pretrained checkpoint.



In [20]:
from transformers import DataCollatorWithPadding

# Create a data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# 3. Instantiate a Trainer object
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    processing_class=tokenizer, # Changed from tokenizer=tokenizer to processing_class=tokenizer
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("Trainer instantiated successfully.")

Trainer instantiated successfully.


In [21]:
print("Starting model training...")
trainer.train()
print("Model training completed.")

Starting model training...




Step,Training Loss


Model training completed.
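
The empty `Step,Training Loss` table above is what one would expect from a run this short. The sketch below works through the arithmetic, assuming a single device, no gradient accumulation, and the `TrainingArguments` default of logging every 500 steps; `total_optimizer_steps` and `assumed_logging_steps` are hypothetical names used only for this illustration.

```python
import math


def total_optimizer_steps(num_samples, batch_size, num_epochs):
    """Optimizer steps for a run, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


# Values from this notebook: 6 reviews, per_device_train_batch_size=8, num_train_epochs=3.
steps = total_optimizer_steps(num_samples=6, batch_size=8, num_epochs=3)
print(steps)  # 3

# Assumed default: a loss row is logged every 500 steps unless logging_steps is set.
assumed_logging_steps = 500
print(steps >= assumed_logging_steps)  # False, so no loss row is due before training ends
```

Under these assumptions, passing a small `logging_steps` value (for example 1) to `TrainingArguments` should make the per-step training loss visible.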


## Evaluate the Model

Evaluate the fine-tuned model on a test set to assess its sentiment analysis capabilities. This will involve generating predictions, computing relevant metrics (e.g., accuracy, precision, recall, F1-score), and presenting the results.


In [24]:
import numpy as np

# 1. Split the tokenized_dataset into a training set and a test set.
# With only 6 entries, test_size=0.2 rounds up to 2 test samples and leaves 4 for training.
# Caveat: the Trainer above was already fit on all 6 entries, so these test samples
# are not unseen data. The split is for demonstration only; a real evaluation needs
# a split made before training, and far more data.

train_test_split_dataset = tokenized_dataset.train_test_split(test_size=0.2, seed=42)

train_dataset = train_test_split_dataset['train']
test_dataset = train_test_split_dataset['test']

print(f"Training dataset size: {len(train_dataset)}")
print(f"Test dataset size: {len(test_dataset)}")

Training dataset size: 4
Test dataset size: 2
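
The 4/2 sizes follow from rounding the test share up: 0.2 × 6 = 1.2, which becomes 2. The sketch below reproduces that arithmetic and the property a held-out set needs, namely no overlap with the training set. `split_indices` is a hypothetical helper, the round-up rule is assumed to match the scikit-learn convention, and the shuffle will not reproduce the exact rows that `train_test_split` selected.

```python
import math
import random


def split_indices(num_samples, test_size, seed):
    """Shuffle indices and split them; the test count rounds up (assumed convention)."""
    n_test = math.ceil(test_size * num_samples)
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # First n_test shuffled indices form the test set, the rest the training set.
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(num_samples=6, test_size=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 4 2

# A held-out set is only meaningful if it shares nothing with the training set.
assert set(train_idx).isdisjoint(test_idx)
```

For the evaluation in this notebook to be held-out, the split would have to happen before `trainer.train()`, with only the training portion passed as `train_dataset`.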


In [25]:
predictions_output = trainer.predict(test_dataset)
predictions = predictions_output.predictions
labels = predictions_output.label_ids

print("Predictions and true labels generated successfully.")



Predictions and true labels generated successfully.


In [26]:
from sklearn.metrics import precision_score, recall_score, f1_score

# 3. Extract the predicted labels from the predictions object
# The predictions are logits, so we take argmax to get the predicted class ID.
predicted_labels = np.argmax(predictions, axis=1)

# 4. Compute evaluation metrics
accuracy = accuracy_score(labels, predicted_labels)
precision = precision_score(labels, predicted_labels, average='weighted')
recall = recall_score(labels, predicted_labels, average='weighted')
f1 = f1_score(labels, predicted_labels, average='weighted')

# 5. Print the computed metrics
print("\n--- Model Evaluation Results ---")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")


--- Model Evaluation Results ---
Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1-Score: 1.0000
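
As an illustration of what `average='weighted'` means in the calls above, the sketch below computes per-class precision, recall, and F1 in plain Python and averages them weighted by each class's number of true samples. `weighted_scores` is a hypothetical re-implementation for explanation only, not scikit-learn's code, and the example labels are made up.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over the classes present in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        # A class that is never predicted gets precision 0 in this sketch.
        cls_precision = tp / predicted if predicted else 0.0
        cls_recall = tp / count
        denom = cls_precision + cls_recall
        cls_f1 = 2 * cls_precision * cls_recall / denom if denom else 0.0
        weight = count / total
        precision += weight * cls_precision
        recall += weight * cls_recall
        f1 += weight * cls_f1
    return precision, recall, f1


# Hypothetical labels: the single 'neutral' (1) review is misread as 'positive' (2).
p, r, f = weighted_scores([2, 0, 1, 2], [2, 0, 2, 2])
print(round(p, 4), round(r, 4), round(f, 4))  # 0.5833 0.75 0.65
```

When every prediction is correct, as with the two test samples here, each per-class score is 1, so all three weighted averages are 1 as well.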


## Summary:

### Data Analysis Key Findings

*   The 6-entry dataset was split into 4 training samples and 2 test samples, but only after the model had been trained on all 6 entries.
*   The model achieved perfect evaluation metrics on the small test set, with an Accuracy of 1.0000, Precision of 1.0000, Recall of 1.0000, and F1-Score of 1.0000.

### Insights or Next Steps

*   The perfect scores show only that the fine-tuned model classified both test samples correctly. Those 2 samples were also part of the data the model was trained on, and a test set this small cannot separate real skill from chance, so the scores say little about performance on unseen reviews.
*   To assess the model reliably, split the data before training and evaluate on a significantly larger and more varied held-out test set, such as one drawn from the 200 to 500 entries the task calls for. This would give a far more robust indication of its generalization and practical utility.
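
One way to quantify how little 2 test samples can show is a confidence interval on the accuracy. The sketch below uses the Wilson score interval, which is a choice made for this illustration and not part of the notebook's evaluation; `wilson_interval` is a hypothetical helper, and the 500-sample case is an invented comparison.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    p_hat = correct / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    return center - half_width, center + half_width


# This notebook's result: 2 correct out of 2 test samples.
low, high = wilson_interval(correct=2, total=2)
print(f"{low:.2f} to {high:.2f}")  # 0.34 to 1.00

# Hypothetical comparison: a perfect score on 500 held-out samples.
low_large, high_large = wilson_interval(correct=500, total=500)
print(f"{low_large:.2f} to {high_large:.2f}")  # 0.99 to 1.00
```

Under this approximation, a perfect score on 2 samples is compatible with a true accuracy anywhere from about 34% upward, while the same score on 500 samples would narrow that range to about 99% and above.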
