<a href="https://colab.research.google.com/github/SwapnilGanguly/GenAI_Project_6thSem/blob/main/IMDb_Sentiment_Analysis_Finetuned.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
# gradio is needed by the demo cell at the end of the notebook
!pip install transformers datasets gradio



In [3]:
import torch
from datasets import load_dataset
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments


In [4]:
# Example: IMDb movie reviews dataset
dataset = load_dataset("imdb")
dataset = dataset.shuffle(seed=42)  # shuffle for training



In [5]:
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
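def tokenize(batch):
    # Pad or truncate every review to the model's maximum length (512 tokens for bert-base-uncased)
    return tokenizer(batch["text"], padding="max_length", truncation=True)

# Tokenize all splits in batches, then expose only the columns the model needs as PyTorch tensors
tokenized_dataset = dataset.map(tokenize, batched=True)
tokenized_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])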
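
With `padding="max_length"`, every review is padded to the full 512 tokens even when it is much shorter, which costs memory and compute. The toy sketch below contrasts that with padding each batch only to its longest member. It is an illustration, not the `transformers` API: `pad_batch` is a hypothetical helper and the word token ids are made up. In a real run, `DataCollatorWithPadding` from `transformers` is one way to get the second behavior.

```python
# Toy illustration only: pad_batch is a hypothetical helper, not a transformers function.
PAD_ID = 0  # bert-base-uncased uses id 0 for [PAD]

def pad_batch(sequences, target_len=None, pad_id=PAD_ID):
    """Pad each sequence to target_len, or to the longest sequence in the batch."""
    if target_len is None:
        target_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        seq = seq[:target_len]  # truncate, as truncation=True would
        n_pad = target_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

# Two short "reviews" with illustrative token ids (101 = [CLS], 102 = [SEP])
batch = [[101, 2023, 102], [101, 2023, 3185, 2001, 2307, 102]]

fixed_ids, fixed_mask = pad_batch(batch, target_len=512)  # like padding="max_length"
dynamic_ids, dynamic_mask = pad_batch(batch)              # pad to the longest in the batch

print(len(fixed_ids[0]), len(dynamic_ids[0]))
```

The attention mask marks real tokens with 1 and padding with 0, so the model ignores the padded positions in either scheme.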


In [7]:
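# eval_strategy (used below) needs a recent transformers release; restart the runtime after upgrading
!pip install --upgrade transformers datasets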



In [8]:
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    report_to="none",  # disable external logging integrations such as W&B
)


In [9]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"].select(range(2000)),  # use subset for quick training
    eval_dataset=tokenized_dataset["test"].select(range(500)),
)


In [10]:
trainer.train()


Epoch,Training Loss,Validation Loss
1,No log,0.395659
2,0.339300,0.414076
3,0.339300,0.410811


TrainOutput(global_step=750, training_loss=0.26527942657470704, metrics={'train_runtime': 657.0349, 'train_samples_per_second': 9.132, 'train_steps_per_second': 1.141, 'total_flos': 1578666332160000.0, 'train_loss': 0.26527942657470704, 'epoch': 3.0})
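
The `global_step=750` value and the "No log" entry in the table can be reproduced with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, and the `Trainer` default of `logging_steps=500`, none of which is set explicitly in this notebook.

```python
import math

train_samples = 2000   # size of the training subset passed to Trainer
batch_size = 8         # per_device_train_batch_size
epochs = 3             # num_train_epochs
logging_steps = 500    # Trainer default, assumed unchanged

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

for epoch in range(1, epochs + 1):
    step = epoch * steps_per_epoch
    logged = step >= logging_steps  # has any training loss been logged by now?
    print(f"epoch {epoch}: step {step}, training loss logged yet: {logged}")

print("total steps:", total_steps)
```

Under these assumptions the first epoch ends at step 250, before the first log at step 500, hence "No log". The table likely repeats 0.339300 at epoch 3 because no newer log exists at step 750; the next one would come at step 1000. The `training_loss` in `TrainOutput` is the average over all steps, which is why it differs from the logged value.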

In [11]:
trainer.evaluate()


{'eval_loss': 0.41081055998802185,
 'eval_runtime': 14.6639,
 'eval_samples_per_second': 34.097,
 'eval_steps_per_second': 4.296,
 'epoch': 3.0}
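
The evaluation output above reports only the loss because no `compute_metrics` function was passed to `Trainer`. A minimal sketch of an accuracy metric follows; the function name and the example logits are illustrative, and accuracy is just one reasonable choice for this balanced two-class task.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Return accuracy from the (logits, labels) pair that Trainer passes in."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # pick the higher-scoring class
    return {"accuracy": float((predictions == labels).mean())}

# Quick check with made-up logits for four reviews
example_logits = np.array([[2.0, -1.0], [0.3, 0.9], [1.5, 0.2], [-0.4, 0.1]])
example_labels = np.array([0, 1, 1, 1])
print(compute_metrics((example_logits, example_labels)))
```

Passing `compute_metrics=compute_metrics` when constructing `Trainer` should make `trainer.evaluate()` include an `eval_accuracy` entry alongside `eval_loss`.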

In [None]:
model.save_pretrained("finetuned-bert-imdb")
tokenizer.save_pretrained("finetuned-bert-imdb")


In [22]:
from transformers import BertTokenizer, BertForSequenceClassification, pipeline
import gradio as gr

# Load your fine-tuned model and tokenizer
model_path = "finetuned-bert-imdb"
tokenizer = BertTokenizer.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)

# Create pipeline for inference
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Define prediction function for Gradio
def predict_sentiment(text):
    label_map = {"LABEL_0": "NEGATIVE", "LABEL_1": "POSITIVE"}  # adjust based on how your labels are assigned

    try:
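        # Truncate long reviews so they fit BERT's 512-token limit instead of raising an error
        result = classifier(text, truncation=True)[0]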
        label = label_map.get(result['label'], result['label'])  # convert to friendly label
        score = round(result['score'] * 100, 2)
        return f"{label} ({score}%)"
    except Exception as e:
        return f"Error: {str(e)}"


# Build Gradio UI
interface = gr.Interface(
    fn=predict_sentiment,
    inputs="text",
    outputs="text",
    title="🎬 Movie Review Sentiment Classifier",
    description="Enter a movie review and get its sentiment prediction.",
)

# Launch the app
interface.launch(share=True)


Device set to use cuda:0


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://3445d8029aff7758e3.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
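
For reference, the label and percentage returned by `predict_sentiment` come from the model's two output logits. The sketch below shows roughly how that conversion works, assuming a softmax over the logits and the IMDb convention of 0 = negative, 1 = positive. `ID2LABEL`, `softmax`, and `format_prediction` are hypothetical names, and the logits are made up.

```python
import math

ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}  # IMDb convention: 0 = negative, 1 = positive

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def format_prediction(logits):
    """Format the most probable class the same way predict_sentiment does."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return f"{ID2LABEL[best]} ({round(probs[best] * 100, 2)}%)"

print(format_prediction([-1.2, 2.3]))  # made-up logits for a positive review
```

Because the mapping matches the dataset's label order, the `label_map` in the demo cell is consistent with how the model was trained. Passing `id2label` and `label2id` to `from_pretrained` when creating the model is another option that would let the pipeline return friendly labels directly.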


