# IMDB Movie Reviews Sentiment Classification

This notebook finetunes and compares multiple transformer models on the IMDB movie reviews dataset. The best model is then further finetuned on the full dataset, and inference is run on random test samples.

## Plan

1. **Setup & Imports**
   - Install and import required libraries (transformers, datasets, sklearn, torch, etc.)

2. **Data Loading & Preprocessing**
   - Download and load the IMDB dataset
   - Preprocess the data (tokenization, train/val/test split, label encoding)

3. **Custom F1 Score Function**
   - Implement a custom F1 score function for evaluation

4. **Model Selection**
   - Select 5 different encoder models (a mix of older and newer architectures)

5. **Finetuning on Subset**
   - For each model:
     - Finetune on a subset of the training data
     - Evaluate on the validation set using the custom F1 score

6. **Model Comparison**
   - Compare the F1 scores of all models
   - Select the best performing model

7. **Finetuning Best Model on Full Dataset**
   - Finetune the best model on the entire training set
   - Evaluate on the test set

8. **Inference on Random Samples**
   - Randomly sample 10 reviews from the test set
   - Run inference using the best model
   - Display the reviews, predicted labels, and true labels

9. **Conclusion**
   - Summarize findings and observations

In [2]:
import os
import random
import numpy as np
import pandas as pd
import torch
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments)
from sklearn.metrics import f1_score
from tqdm import tqdm
import matplotlib.pyplot as plt

# Check if GPU is available
print('CUDA available:', torch.cuda.is_available())

CUDA available: True


In [3]:
# Load IMDB dataset from Kaggle using kagglehub
import kagglehub
import pandas as pd

# Download the dataset
path = kagglehub.dataset_download('lakshmi25npathi/imdb-dataset-of-50k-movie-reviews')

# The CSV file is inside the downloaded directory
csv_path = os.path.join(path, 'IMDB Dataset.csv')
imdb = pd.read_csv(csv_path)
print(imdb.head())
print(imdb['sentiment'].value_counts())

                                              review sentiment
0  One of the other reviewers has mentioned that ...  positive
1  A wonderful little production. <br /><br />The...  positive
2  I thought this was a wonderful way to spend ti...  positive
3  Basically there's a family where a little boy ...  negative
4  Petter Mattei's "Love in the Time of Money" is...  positive
sentiment
positive    25000
negative    25000
Name: count, dtype: int64


In [4]:
# Split the Kaggle IMDB dataset using sklearn's train_test_split and convert to Hugging Face Datasets
from sklearn.model_selection import train_test_split
from datasets import Dataset

# Encode sentiment labels to integers
imdb['label'] = imdb['sentiment'].map({'positive': 1, 'negative': 0})

# Split into train and test (90% train, 10% test)
train_df, test_df = train_test_split(imdb, test_size=0.1, random_state=42, stratify=imdb['label'])

# Split the remaining 90% into train/val (90/10), i.e. 81% train, 9% val, 10% test overall
train_df, val_df = train_test_split(train_df, test_size=0.1, random_state=42, stratify=train_df['label'])

# Convert pandas DataFrames to Hugging Face Datasets
train_dataset = Dataset.from_pandas(train_df.reset_index(drop=True))
val_dataset = Dataset.from_pandas(val_df.reset_index(drop=True))
test_dataset = Dataset.from_pandas(test_df.reset_index(drop=True))

print('Train size:', len(train_dataset))
print('Validation size:', len(val_dataset))
print('Test size:', len(test_dataset))

# Example sample
print('Sample review:', train_dataset[0]['review'])
print('Label:', train_dataset[0]['label'])

Train size: 40500
Validation size: 4500
Test size: 5000
Sample review: Lately I have been watching a lot of Tom Hanks films and old Chaplin films and even some of Rowan Atkinson's early Bean performances, and it seems that all of them have their own unique charm that permeates throughout their work, something that allows them to identify with audience members of all ages, in a way that just makes you feel good. A Bug's Life has that same charm, it has a connection with real life that allows us to easily suspend disbelief and accept a lot of talking insects, because even though they talk, they still ACT just like real bugs. It's like the team that made the movie found a way to bring us into the mind of a child and allow us to think like them, to imagine bugs the way a young mind does.<br /><br />Honey, I Shrunk The Kids was one of my favorite films when I was younger, and to me, A Bug's Life is like a more realistic version of that movie, if only because the animation is so breathtaking
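
The split sizes printed above can be checked with a little arithmetic. The sketch below assumes the held-out count is the ceiling of `n * test_size`, which is an assumption about how `train_test_split` rounds a fractional `test_size`; the helper name `split_sizes` is hypothetical and not part of the notebook.

```python
import math

def split_sizes(n_rows, test_size):
    """Return (n_kept, n_held_out) for a fractional test_size (hypothetical helper)."""
    n_held_out = math.ceil(n_rows * test_size)  # assumption: held-out count is rounded up
    return n_rows - n_held_out, n_held_out

n_total = 50_000
n_trainval, n_test = split_sizes(n_total, 0.1)  # first split: 90% / 10%
n_train, n_val = split_sizes(n_trainval, 0.1)   # second split, applied to the 90% remainder

print('train / val / test:', n_train, n_val, n_test)
print('fractions of full data:', n_train / n_total, n_val / n_total, n_test / n_total)
```

Under that assumption the counts agree with the sizes printed by the cell, and the validation set is 9% of the full data rather than 10%.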

In [5]:
# Custom F1 score function for HuggingFace Trainer
from sklearn.metrics import f1_score

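def compute_f1(pred):
    """Support-weighted F1 from a Trainer EvalPrediction (logits + label ids)."""
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)  # logits -> predicted class id
    f1 = f1_score(labels, preds, average='weighted')
    return {'f1': f1}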
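
To make `average='weighted'` concrete, here is a minimal pure-Python sketch of the same idea: compute F1 per class, then average the per-class scores weighted by how often each class appears in the true labels. The function name `weighted_f1` and the toy labels are illustrative assumptions, not part of the notebook.

```python
from collections import Counter

def weighted_f1(labels, preds):
    """Support-weighted mean of per-class F1 scores (illustrative sketch)."""
    support = Counter(labels)  # number of true examples per class
    total = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n_cls  # weight each class by its support
    return total / len(labels)

# Toy example: 3 positive and 5 negative reviews, two mistakes.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_preds = [1, 1, 0, 0, 0, 0, 1, 0]
print(round(weighted_f1(toy_labels, toy_preds), 4))
```

Because the IMDB classes are balanced (25,000 each), the weighted score should stay close to the unweighted (macro) average here.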

In [9]:
# Define model names (old and new)
model_names = {
    'bert': 'google-bert/bert-base-uncased',
    'roberta': 'FacebookAI/roberta-base',
    'deberta': 'microsoft/deberta-v3-base',  
    'electra': 'google/electra-small-discriminator',
    'distilbert': 'distilbert/distilbert-base-uncased',
    # Newer models
    'modernbert': 'answerdotai/ModernBERT-base',
    'ettin': 'jhu-clsp/ettin-encoder-17m',
    'gte': 'thenlper/gte-small'
}

def preprocess_function(examples, tokenizer, max_length=512):
    return tokenizer(examples['review'], truncation=True, padding='max_length', max_length=max_length)
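
As a rough illustration of what `truncation=True, padding='max_length'` produces, the sketch below pads or cuts a list of token ids to a fixed length and builds the matching attention mask. It is a stand-in for intuition only: `pad_or_truncate` is a hypothetical helper, the token ids are made up, and real tokenizers also handle special tokens, which this ignores.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Illustrative stand-in for fixed-length padding and truncation (not the tokenizer's code)."""
    ids = token_ids[:max_length]        # truncation: drop anything past max_length
    n_pad = max_length - len(ids)       # padding: fill the remainder with pad_id
    return {
        'input_ids': ids + [pad_id] * n_pad,
        'attention_mask': [1] * len(ids) + [0] * n_pad,  # 1 = real token, 0 = padding
    }

short_review = pad_or_truncate([5, 6, 7, 8], max_length=8)
long_review = pad_or_truncate(list(range(1, 13)), max_length=8)
print(short_review)
print(long_review)
```

With `max_length=512`, every review is stored and processed as 512 tokens regardless of its real length, so short reviews cost as much compute as long ones; padding each batch only to its longest sequence is a common alternative.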

In [None]:
# Finetune and evaluate each model on a subset of the data
from transformers import EarlyStoppingCallback

results = {}
subset_size = 2000  # Use a small subset for quick comparison

model_ckpt = model_names['bert']
print(f'\n===== Finetuning {model_ckpt} =====')
# Load the tokenizer (default fast implementation) and a 2-class classification head
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=2)

# Tokenize subset
train_subset = train_dataset.select(range(subset_size))
val_subset = val_dataset.select(range(500))
tokenized_train = train_subset.map(lambda x: preprocess_function(x, tokenizer), batched=True)
tokenized_val = val_subset.map(lambda x: preprocess_function(x, tokenizer), batched=True)

# Set format for PyTorch
tokenized_train.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
tokenized_val.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])

training_args = TrainingArguments(
    output_dir=f'./results/{model_ckpt.replace("/", "_")}',
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    logging_steps=50,
    learning_rate=2e-5,
    weight_decay=0.01,
    remove_unused_columns=False,
    load_best_model_at_end=True,
    eval_strategy='steps',  # must match save_strategy when load_best_model_at_end=True
    save_strategy='steps',
    eval_steps=50,          # evaluate every 50 steps
    save_steps=50,          # save a checkpoint every 50 steps
    report_to='none',       # 'none' disables logging integrations; None may fall back to the library default
    seed=42
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    compute_metrics=compute_f1,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1)]
)

trainer.train()
eval_result = trainer.evaluate()
results[model_ckpt] = eval_result['eval_f1']
print(f'F1 score for {model_ckpt}:', eval_result['eval_f1'])


===== Finetuning google-bert/bert-base-uncased =====


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 100%|██████████| 2000/2000 [00:01<00:00, 1803.48 examples/s]
Map: 100%|██████████| 500/500 [00:00<00:00, 1968.38 examples/s]


Step,Training Loss,Validation Loss,F1
50,0.6989,0.652537,0.479534
100,0.4952,0.331605,0.874291
150,0.3235,0.335131,0.870295


F1 score for google-bert/bert-base-uncased: 0.8742905263157895


In [8]:
model_ckpt = model_names['roberta']
print(f'\n===== Finetuning {model_ckpt} =====')
# Load the tokenizer (default fast implementation) and a 2-class classification head
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=2)

# Tokenize subset
train_subset = train_dataset.select(range(subset_size))
val_subset = val_dataset.select(range(500))
tokenized_train = train_subset.map(lambda x: preprocess_function(x, tokenizer), batched=True)
tokenized_val = val_subset.map(lambda x: preprocess_function(x, tokenizer), batched=True)

# Set format for PyTorch
tokenized_train.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
tokenized_val.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])

training_args = TrainingArguments(
    output_dir=f'./results/{model_ckpt.replace("/", "_")}',
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    logging_steps=50,
    learning_rate=2e-5,
    weight_decay=0.01,
    remove_unused_columns=False,
    load_best_model_at_end=True,
    eval_strategy='steps',  # must match save_strategy when load_best_model_at_end=True
    save_strategy='steps',
    eval_steps=50,          # evaluate every 50 steps
    save_steps=50,          # save a checkpoint every 50 steps
    report_to='none',       # 'none' disables logging integrations; None may fall back to the library default
    seed=42
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    compute_metrics=compute_f1,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1)]
)

trainer.train()
eval_result = trainer.evaluate()
results[model_ckpt] = eval_result['eval_f1']
print(f'F1 score for {model_ckpt}:', eval_result['eval_f1'])


===== Finetuning FacebookAI/roberta-base =====


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at FacebookAI/roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 100%|██████████| 2000/2000 [00:00<00:00, 2335.69 examples/s]
Map: 100%|██████████| 500/500 [00:00<00:00, 2637.79 examples/s]


Step,Training Loss,Validation Loss,F1
50,0.6476,0.400483,0.853822
100,0.3206,0.356182,0.899213
150,0.3279,0.441154,0.900217


F1 score for FacebookAI/roberta-base: 0.8992131651698811
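
The two tables above show training halting at step 150 for both BERT and RoBERTa, with the reported F1 taken from step 100, even though RoBERTa's F1 at step 150 is slightly higher. That pattern is consistent with early stopping and best-checkpoint selection tracking validation loss rather than F1, which is what one would expect when `metric_for_best_model` is left unset (an assumption about the `Trainer` default, not something the notebook states). A minimal simulation of patience-based stopping on the logged losses; `simulate_early_stopping` is a hypothetical helper, not a library function:

```python
def simulate_early_stopping(eval_losses, eval_steps=50, patience=1):
    """Return (best_step, stop_step) when stopping on non-improving validation loss."""
    best_loss, best_step, bad_evals = float('inf'), None, 0
    for i, loss in enumerate(eval_losses, start=1):
        step = i * eval_steps
        if loss < best_loss:
            best_loss, best_step, bad_evals = loss, step, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_step, step  # stop at the first evaluation that exhausts patience
    return best_step, len(eval_losses) * eval_steps  # never triggered: ran to the last evaluation

# Validation losses copied from the logged tables above.
bert_losses = [0.652537, 0.331605, 0.335131]
roberta_losses = [0.400483, 0.356182, 0.441154]

print('bert:', simulate_early_stopping(bert_losses))
print('roberta:', simulate_early_stopping(roberta_losses))
```

If F1 should drive both decisions instead, `TrainingArguments` accepts `metric_for_best_model` and `greater_is_better`; switching them would change which checkpoint is kept, so the scores above would not be directly comparable.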


In [10]:
model_ckpt = model_names['electra']
print(f'\n===== Finetuning {model_ckpt} =====')
# Load the tokenizer (default fast implementation) and a 2-class classification head
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=2)

# Tokenize subset
train_subset = train_dataset.select(range(subset_size))
val_subset = val_dataset.select(range(500))
tokenized_train = train_subset.map(lambda x: preprocess_function(x, tokenizer), batched=True)
tokenized_val = val_subset.map(lambda x: preprocess_function(x, tokenizer), batched=True)

# Set format for PyTorch
tokenized_train.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
tokenized_val.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])

training_args = TrainingArguments(
    output_dir=f'./results/{model_ckpt.replace("/", "_")}',
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    logging_steps=50,
    learning_rate=2e-5,
    weight_decay=0.01,
    remove_unused_columns=False,
    load_best_model_at_end=True,
    eval_strategy='steps',  # must match save_strategy when load_best_model_at_end=True
    save_strategy='steps',
    eval_steps=50,          # evaluate every 50 steps
    save_steps=50,          # save a checkpoint every 50 steps
    report_to='none',       # 'none' disables logging integrations; None may fall back to the library default
    seed=42
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    compute_metrics=compute_f1,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1)]
)

trainer.train()
eval_result = trainer.evaluate()
results[model_ckpt] = eval_result['eval_f1']
print(f'F1 score for {model_ckpt}:', eval_result['eval_f1'])


===== Finetuning google/electra-small-discriminator =====


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Some weights of ElectraForSequenceClassification were not initialized from the model checkpoint at google/electra-small-discriminator and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 100%|██████████| 2000/2000 [00:01<00:00, 1757.68 examples/s]
Map: 100%|██████████| 500/500 [00:00<00:00, 1502.46 examples/s]


Step,Training Loss,Validation Loss,F1
50,0.6926,0.685363,0.396925
100,0.6886,0.67678,0.504568
150,0.6752,0.666409,0.651035
200,0.6568,0.642896,0.698613
250,0.6339,0.623895,0.766896


F1 score for google/electra-small-discriminator: 0.7668961053525369


In [11]:
model_ckpt = model_names['ettin']
print(f'\n===== Finetuning {model_ckpt} =====')
# Load the tokenizer (default fast implementation) and a 2-class classification head
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=2)

# Tokenize subset
train_subset = train_dataset.select(range(subset_size))
val_subset = val_dataset.select(range(500))
tokenized_train = train_subset.map(lambda x: preprocess_function(x, tokenizer), batched=True)
tokenized_val = val_subset.map(lambda x: preprocess_function(x, tokenizer), batched=True)

# Set format for PyTorch
tokenized_train.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
tokenized_val.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])

training_args = TrainingArguments(
    output_dir=f'./results/{model_ckpt.replace("/", "_")}',
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    logging_steps=50,
    learning_rate=2e-5,
    weight_decay=0.01,
    remove_unused_columns=False,
    load_best_model_at_end=True,
    eval_strategy='steps',  # must match save_strategy when load_best_model_at_end=True
    save_strategy='steps',
    eval_steps=50,          # evaluate every 50 steps
    save_steps=50,          # save a checkpoint every 50 steps
    report_to='none',       # 'none' disables logging integrations; None may fall back to the library default
    seed=42
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    compute_metrics=compute_f1,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1)]
)

trainer.train()
eval_result = trainer.evaluate()
results[model_ckpt] = eval_result['eval_f1']
print(f'F1 score for {model_ckpt}:', eval_result['eval_f1'])


===== Finetuning jhu-clsp/ettin-encoder-17m =====


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Some weights of ModernBertForSequenceClassification were not initialized from the model checkpoint at jhu-clsp/ettin-encoder-17m and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 100%|██████████| 2000/2000 [00:00<00:00, 4363.38 examples/s]
Map: 100%|██████████| 500/500 [00:00<00:00, 3860.87 examples/s]


Step,Training Loss,Validation Loss,F1
50,0.8131,0.579469,0.693205
100,0.5957,0.497668,0.748438
150,0.4973,0.610944,0.656288


F1 score for jhu-clsp/ettin-encoder-17m: 0.7484377251672673


In [12]:
model_ckpt = model_names['gte']
print(f'\n===== Finetuning {model_ckpt} =====')
# Load the tokenizer (default fast implementation) and a 2-class classification head
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=2)

# Tokenize subset
train_subset = train_dataset.select(range(subset_size))
val_subset = val_dataset.select(range(500))
tokenized_train = train_subset.map(lambda x: preprocess_function(x, tokenizer), batched=True)
tokenized_val = val_subset.map(lambda x: preprocess_function(x, tokenizer), batched=True)

# Set format for PyTorch
tokenized_train.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
tokenized_val.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])

training_args = TrainingArguments(
    output_dir=f'./results/{model_ckpt.replace("/", "_")}',
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    logging_steps=50,
    learning_rate=2e-5,
    weight_decay=0.01,
    remove_unused_columns=False,
    load_best_model_at_end=True,
    eval_strategy='steps',  # must match save_strategy when load_best_model_at_end=True
    save_strategy='steps',
    eval_steps=50,          # evaluate every 50 steps
    save_steps=50,          # save a checkpoint every 50 steps
    report_to='none',       # 'none' disables logging integrations; None may fall back to the library default
    seed=42
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    compute_metrics=compute_f1,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1)]
)

trainer.train()
eval_result = trainer.evaluate()
results[model_ckpt] = eval_result['eval_f1']
print(f'F1 score for {model_ckpt}:', eval_result['eval_f1'])


===== Finetuning thenlper/gte-small =====


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at thenlper/gte-small and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 100%|██████████| 2000/2000 [00:00<00:00, 3891.90 examples/s]
Map: 100%|██████████| 500/500 [00:00<00:00, 3305.83 examples/s]


Step,Training Loss,Validation Loss,F1
50,0.6538,0.520883,0.912089
100,0.4105,0.306399,0.924028
150,0.3192,0.359289,0.86789


F1 score for thenlper/gte-small: 0.9240282067418631
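
The five finetuning cells above differ only in the checkpoint name, so the comparison could be driven by a single loop. The sketch below is one way to structure that, under stated assumptions: `finetune_fn` is a hypothetical callable that would wrap the tokenize, train, and evaluate steps of one cell and return the validation F1. Here a stub returns the F1 values logged above so the loop itself can run without training anything.

```python
def compare_models(checkpoints, finetune_fn):
    """Run finetune_fn on each checkpoint; collect F1 scores and record failures."""
    scores, failures = {}, {}
    for ckpt in checkpoints:
        try:
            scores[ckpt] = finetune_fn(ckpt)
        except Exception as exc:  # e.g. out-of-memory or a tokenizer loading error
            failures[ckpt] = repr(exc)
    return scores, failures

# Stub standing in for real finetuning: F1 values copied from the logs above.
logged_f1 = {
    'google-bert/bert-base-uncased': 0.874291,
    'FacebookAI/roberta-base': 0.899213,
    'google/electra-small-discriminator': 0.766896,
    'jhu-clsp/ettin-encoder-17m': 0.748438,
    'thenlper/gte-small': 0.924028,
}

def stub_finetune(ckpt):
    if ckpt not in logged_f1:
        raise KeyError(f'no logged run for {ckpt}')
    return logged_f1[ckpt]

# One extra checkpoint without a logged run shows how a failure is recorded.
candidates = list(logged_f1) + ['microsoft/deberta-v3-base']
scores, failures = compare_models(candidates, stub_finetune)

best_ckpt = max(scores, key=scores.get)  # highest validation F1
print('best:', best_ckpt, scores[best_ckpt])
print('skipped:', list(failures))
```

Collecting failures separately keeps one bad checkpoint from aborting the whole comparison, and picking the winner with `max` avoids hardcoding the checkpoint name.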


In [14]:
# Display all F1 scores in a sorted table (kept in a new variable so the `results` dict is not overwritten)
results_df = pd.DataFrame.from_dict(results, orient='index', columns=['F1 Score']).sort_values(by='F1 Score', ascending=False)
print('\nAll model F1 scores:')
display(results_df)


All model F1 scores:


Model,F1 Score
thenlper/gte-small,0.924028
FacebookAI/roberta-base,0.899213
google-bert/bert-base-uncased,0.874291
google/electra-small-discriminator,0.766896
jhu-clsp/ettin-encoder-17m,0.748438


In [16]:
# Select the best model based on F1 score
best_model_ckpt = 'thenlper/gte-small'

In [18]:
# Finetune the best model on the full training set and evaluate on the test set
print(f'\nFinetuning best model {best_model_ckpt} on the full dataset...')

# Reload tokenizer and model for best checkpoint
best_tokenizer = AutoTokenizer.from_pretrained(best_model_ckpt, use_fast=False)
best_model = AutoModelForSequenceClassification.from_pretrained(best_model_ckpt, num_labels=2)

# Tokenize full train and test sets
full_tokenized_train = train_dataset.map(lambda x: preprocess_function(x, best_tokenizer), batched=True)
full_tokenized_test = test_dataset.map(lambda x: preprocess_function(x, best_tokenizer), batched=True)

full_tokenized_train.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
full_tokenized_test.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])

full_training_args = TrainingArguments(
    output_dir=f'./results/best_full_{best_model_ckpt.replace("/", "_")}',
    num_train_epochs=2,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    logging_steps=50,
    learning_rate=2e-5,
    weight_decay=0.01,
    remove_unused_columns=False,
    load_best_model_at_end=True,
    eval_strategy='steps',  # must match save_strategy when load_best_model_at_end=True
    save_strategy='steps',
    eval_steps=50,          # evaluate every 50 steps
    save_steps=50,          # save a checkpoint every 50 steps
    report_to='none',       # 'none' disables logging integrations; None may fall back to the library default
    seed=42
)

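# Note: no EarlyStoppingCallback here, and the test set doubles as the eval set,
# so load_best_model_at_end picks its checkpoint using test data.
full_trainer = Trainer(
    model=best_model,
    args=full_training_args,
    train_dataset=full_tokenized_train,
    eval_dataset=full_tokenized_test,
    compute_metrics=compute_f1
)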

full_trainer.train()
full_eval_result = full_trainer.evaluate()
print(f'\nTest F1 score for best model: {full_eval_result["eval_f1"]:.4f}')


Finetuning best model thenlper/gte-small on the full dataset...


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at thenlper/gte-small and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 100%|██████████| 40500/40500 [02:45<00:00, 244.00 examples/s]
Map: 100%|██████████| 5000/5000 [00:19<00:00, 260.49 examples/s]


Step,Training Loss,Validation Loss,F1
50,0.6595,0.518654,0.897784
100,0.4342,0.316876,0.903045
150,0.2724,0.254883,0.921
200,0.2625,0.244927,0.91971
250,0.2831,0.249364,0.920988
300,0.302,0.295854,0.900108
350,0.2375,0.280737,0.918473
400,0.2456,0.242212,0.924997
450,0.274,0.219163,0.928999
500,0.2559,0.251821,0.924732
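
The table above covers the first 500 steps. For a sense of the full schedule, the arithmetic below estimates how many optimizer steps and evaluations these settings imply. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these is stated in the notebook.

```python
import math

n_train, n_test = 40_500, 5_000
train_batch_size, eval_batch_size = 8, 16
num_epochs, eval_every = 2, 50

steps_per_epoch = math.ceil(n_train / train_batch_size)  # batches per pass over the train set
total_steps = steps_per_epoch * num_epochs
num_evals = total_steps // eval_every                    # one evaluation every 50 steps
batches_per_eval = math.ceil(n_test / eval_batch_size)   # forward passes per evaluation

print('steps per epoch:', steps_per_epoch)
print('total training steps:', total_steps)
print('evaluations:', num_evals)
print('eval batches in total:', num_evals * batches_per_eval)
```

Under these assumptions the run spends several times more batches on evaluation than on training, so raising `eval_steps` and `save_steps` for the full run would likely shorten it considerably. The same arithmetic gives 250 steps for the 2,000-example subset, which matches the last logged step of the ELECTRA run.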


In [21]:
# Inference on 10 randomly sampled reviews from the test set
num_samples = 10
sample_indices = random.sample(range(len(test_dataset)), num_samples)
samples = [test_dataset[i] for i in sample_indices]

best_model.eval()

for i, sample in enumerate(samples):
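    text = sample['review']
    true_label = sample['sentiment']  # string label: 'positive' or 'negative'
    # Note: training truncated at 512 tokens (preprocess_function default); inference here truncates at 256
    inputs = best_tokenizer(text, return_tensors='pt', truncation=True, padding='max_length', max_length=256)
    inputs = {k: v.to(best_model.device) for k, v in inputs.items()}  # move tensors to the model's device
    with torch.no_grad():
        outputs = best_model(**inputs)
        pred_label = torch.argmax(outputs.logits, dim=1).item()  # class id: 1 = positive, 0 = negative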
    print(f'--- Sample {i+1} ---')
    print('Review:', text[:500] + ('...' if len(text) > 500 else ''))
    print('True label:', true_label, '| Predicted label:', pred_label)
    print()

--- Sample 1 ---
Review: When I was young I had seen very few movies. My parents in all their wisdom rented this one. I was very wary of what the movie was about, in fact I wasn't even allowed to watch it. My brother and sister got to of course and this made me very angry. So what did I do? Late at night I trashed the VCR! Kicked the screen of the TV in and called the police and reported vandals. I was arrested of course, I was unable to get my foot out of the TV set before the police arrived. I was only given a stern t...
True label: positive | Predicted label: 1

--- Sample 2 ---
Review: - When the local sheriff is killed, his wife takes over until and is determined to clean-up the town. Not everyone in town, however, is happy with what she's doing. When the sheriff orders a curfew in town, the local saloon owner (also a woman) hires a killer to take care of the sheriff. There's no way the saloon owner could know that the sheriff and the killer would fall in love.<br /><br />- Gunsli