# Fine-tuning BERTweet for Sentiment Analysis

This notebook demonstrates how to fine-tune the BERTweet model for sentiment analysis and save the result in PKL format for later use.

## 1. Install Required Packages

First, let's install the necessary packages for fine-tuning:

In [None]:
# in one cell, *only* this line:
%pip install --upgrade "accelerate>=0.26.0"













In [62]:
import subprocess, sys

def ensure_accelerate(min_version="0.26.0"):
    try:
        import accelerate
        from packaging import version
        if version.parse(accelerate.__version__) < version.parse(min_version):
            raise ImportError
    except ImportError:
        print(f"Installing/Upgrading accelerate>={min_version}…")
        subprocess.check_call([sys.executable, "-m", "pip",
                               "install", f"accelerate>={min_version}"])
        # after installation you must restart the process to pick it up
        print("Please restart the Python process now that accelerate is installed.")
        sys.exit(0)

ensure_accelerate()


















In [63]:
# Install required packages
!pip install transformers[torch] --upgrade
!pip install 'accelerate>=0.26.0'  # Required for Trainer with PyTorch
!pip install torch
!pip install nltk
!pip install emoji==0.6.0  # Must use version 0.6.0 or 0.5.4 for compatibility
!pip install datasets
!pip install scikit-learn
!pip install tqdm
!pip install pandas
!pip install matplotlib
!pip install seaborn
!pip install joblib





ERROR: Invalid requirement: "'accelerate": Expected package name at the start of dependency specifier
    'accelerate
    ^

    'accelerate
    ^





ERROR: Invalid requirement: '#': Expected package name at the start of dependency specifier
    #
    ^

    #
    ^











## 2. Import Libraries and Setup TweetNormalizer

Now let's import the necessary libraries and set up the TweetNormalizer for data preprocessing:

In [64]:
import os
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.notebook import tqdm
import re
import joblib
import random

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

from transformers import (
    AutoTokenizer, 
    AutoModelForSequenceClassification, 
    Trainer, 
    TrainingArguments,
    EvalPrediction,
    EarlyStoppingCallback
)
from datasets import Dataset

# Set random seeds for reproducibility
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

# Create TweetNormalizer.py if it doesn't exist or has issues
tweet_normalizer_content = '''
# TweetNormalizer module for BERTweet
# Based on https://github.com/VinAIResearch/BERTweet

import re
import emoji

def normalizeTweet(tweet):
    """
    Normalize tweet text:
    1. Replace URLs with HTTPURL
    2. Replace user mentions with @USER
    3. Replace emojis with text
    4. Other normalizations for Twitter-specific content
    """
    # Replace URLs with HTTPURL
    tweet = re.sub(r\'https?://\\S+\', \'HTTPURL\', tweet)
    
    # Replace user mentions with @USER
    tweet = re.sub(r\'@\\w+\', \'@USER\', tweet)
    
    # Replace emojis with text representation
    tweet = emoji.demojize(tweet)
    
    # Other normalizations
    tweet = tweet.replace(\'#\', \' #\')  # Add space before hashtags
    tweet = re.sub(r\'\\s+\', \' \', tweet)  # Replace multiple spaces with single space
    
    return tweet.strip()
'''

with open('TweetNormalizer.py', 'w', encoding='utf-8') as f:
    f.write(tweet_normalizer_content)

print("TweetNormalizer.py created successfully!")

from TweetNormalizer import normalizeTweet

TweetNormalizer.py created successfully!



## 3. Load and Prepare Dataset

Load the sentiment140 dataset and prepare it for fine-tuning using a small sample for quick testing:

In [65]:
# Load the dataset
df = pd.read_csv('sentiment140.csv', encoding='ISO-8859-1')
print("Columns in the dataset:", df.columns.tolist())

# Rename columns if needed
df.columns = ['target', 'id', 'date', 'flag', 'user', 'text']

# Convert labels: Twitter Sentiment140 uses 0 for negative, 4 for positive
# Convert to 0 for negative, 1 for positive for easier handling
df['target'] = df['target'].replace(4, 1)

# Verify balance of classes
class_distribution = df['target'].value_counts()
print("\nClass distribution:")
print(class_distribution)

# Normalize tweets with improved function to handle errors
def safe_normalize(text):
    try:
        if pd.isna(text) or text == '':
            return ''
        return normalizeTweet(str(text))
    except Exception as e:
        print(f"Error normalizing text: {str(e)}")
        return str(text)  # Return original text if normalization fails

# Take a small sample first to verify the pipeline works
# For actual training, you'll want to increase this number
sample_size = 5000  # Small sample for quick testing and validation

# Sample from the dataset before normalization to speed things up
df_positive = df[df['target'] == 1].sample(sample_size//2, random_state=seed)
df_negative = df[df['target'] == 0].sample(sample_size//2, random_state=seed)
df_sample = pd.concat([df_positive, df_negative]).reset_index(drop=True)

# Now normalize only the sampled data
df_sample['normalized_text'] = df_sample['text'].apply(safe_normalize)

# Filter out empty or extremely short tweets as they may not be useful for training
df_sample = df_sample[df_sample['normalized_text'].str.len() > 5]

# Rebalance if filtering removed any tweets
if df_sample['target'].value_counts().size > 1:  # Check that both classes still exist
    min_class_size = min(df_sample['target'].value_counts())
    df_positive = df_sample[df_sample['target'] == 1].sample(min_class_size, random_state=seed)
    df_negative = df_sample[df_sample['target'] == 0].sample(min_class_size, random_state=seed)
    df_sample = pd.concat([df_positive, df_negative]).reset_index(drop=True)

# Shuffle the data
df_sample = df_sample.sample(frac=1, random_state=seed).reset_index(drop=True)

# Display some examples
print("\nSample data:")
print(df_sample[['text', 'normalized_text', 'target']].head())

# Print dataset size
print(f"\nDataset size: {len(df_sample)} tweets")
print(f"Class distribution in sample: {df_sample['target'].value_counts()}")
print("\nNote: Using a small sample size of {0} tweets for quick testing.".format(sample_size))
print("For better model performance, increase the sample_size variable.")

Columns in the dataset: ['0', '1467810369', 'Mon Apr 06 22:19:45 PDT 2009', 'NO_QUERY', '_TheSpecialOne_', "@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D"]
 ['0', '1467810369', 'Mon Apr 06 22:19:45 PDT 2009', 'NO_QUERY', '_TheSpecialOne_', "@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D"]

Class distribution:
target
1    800000
0    799999
Name: count, dtype: int64

Class distribution:
target
1    800000
0    799999
Name: count, dtype: int64

Sample data:
                                                text  \
0                          today is a beautiful day    
1  i've wasted my time making ANOTHER background!...   
2  Ughh todays beach weather... But I have work. ...   
3  @fashionesedaily @irlita @yunita_dee @Velasss ...   
4      is working on a SUNDAY!!!!!!!!!!!!!!!!!!!!!!    

                                     normalized_text  target  
0

## 4. Split the Data and Create Datasets

Split the data into training, validation, and test sets, and prepare the datasets for the Transformers library:

In [66]:
# Split the data into train and temp (for validation and test)
train_df, temp_df = train_test_split(df_sample, test_size=0.3, random_state=seed, stratify=df_sample['target'])
# Split temp into validation and test
val_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=seed, stratify=temp_df['target'])

# Convert pandas DataFrames to Hugging Face datasets
train_dataset = Dataset.from_pandas(train_df[['normalized_text', 'target']])
val_dataset = Dataset.from_pandas(val_df[['normalized_text', 'target']])
test_dataset = Dataset.from_pandas(test_df[['normalized_text', 'target']])

print(f"Training set size: {len(train_dataset)}")
print(f"Validation set size: {len(val_dataset)}")
print(f"Testing set size: {len(test_dataset)}")

# Verify class balance
print(f"\nTraining class distribution: {train_df['target'].value_counts()}")
print(f"Validation class distribution: {val_df['target'].value_counts()}")
print(f"Testing class distribution: {test_df['target'].value_counts()}")

Training set size: 3494
Validation set size: 749
Testing set size: 749

Training class distribution: target
0    1747
1    1747
Name: count, dtype: int64

Validation set size: 749
Testing set size: 749

Training class distribution: target
0    1747
1    1747
Name: count, dtype: int64
Validation class distribution: target
1    375
0    374
Name: count, dtype: int64
Testing class distribution: target
0    375
1    374
Name: count, dtype: int64
Validation class distribution: target
1    375
0    374
Name: count, dtype: int64
Testing class distribution: target
0    375
1    374
Name: count, dtype: int64


## 5. Load BERTweet Model and Tokenizer

Load the pre-trained BERTweet model and tokenizer with proper initialization:

In [67]:
# Select BERTweet model
model_name = "vinai/bertweet-base"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model with initialization appropriate for classification
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, 
    num_labels=2,
    problem_type="single_label_classification",
    # Ensure proper weight initialization
    attention_probs_dropout_prob=0.1,
    hidden_dropout_prob=0.1,
    classifier_dropout=0.1,
)

# Verify model config
print(f"Loaded {model_name} for sequence classification with 2 labels")
print(f"Model config: {model.config}")

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at vinai/bertweet-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Loaded vinai/bertweet-base for sequence classification with 2 labels
Model config: RobertaConfig {
  "_attn_implementation_autoset": true,
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": 0.1,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 130,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "tokenizer_class": "BertweetTokenizer",
  "torch_dtype": "float32",
  "transformers_version": "4.51.3",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 64001
}


Model config: RobertaConfig {
  "_attn_implementation_autoset": true,
  "architectures": [
    "Rob

## 6. Tokenize the Datasets

Prepare the datasets by tokenizing the texts with improved error handling:

In [68]:
def tokenize_function(examples):
    """Tokenize the texts with error handling"""
    # Clean and validate input texts
    texts = [str(text) if text else "" for text in examples['normalized_text']]
    
    # Use truncation and padding for consistent lengths
    return tokenizer(
        texts, 
        padding='max_length', 
        truncation=True, 
        max_length=128,
        return_tensors="pt"
    )

# Tokenize datasets
tokenized_train_dataset = train_dataset.map(tokenize_function, batched=True)
tokenized_val_dataset = val_dataset.map(tokenize_function, batched=True)
tokenized_test_dataset = test_dataset.map(tokenize_function, batched=True)

# Set the format for PyTorch - FIX: Use tokenized datasets for renaming columns
tokenized_train_dataset = tokenized_train_dataset.rename_column("target", "labels")
tokenized_val_dataset = tokenized_val_dataset.rename_column("target", "labels")  # Fixed: was using val_dataset
tokenized_test_dataset = tokenized_test_dataset.rename_column("target", "labels")  # Fixed: was using test_dataset

tokenized_train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
tokenized_val_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
tokenized_test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

print("Datasets have been tokenized and formatted for PyTorch")

# Check a sample of tokenized data
sample_item = tokenized_train_dataset[0]
print(f"\nSample input_ids length: {len(sample_item['input_ids'])}")
print(f"Sample label: {sample_item['labels']}")

Map:   0%|          | 0/3494 [00:00<?, ? examples/s]

Map:   0%|          | 0/749 [00:00<?, ? examples/s]

Map:   0%|          | 0/749 [00:00<?, ? examples/s]

Datasets have been tokenized and formatted for PyTorch

Sample input_ids length: 128
Sample label: 0


Sample input_ids length: 128
Sample label: 0


## 7. Define Training Arguments and Metrics

Define the training configuration and evaluation metrics with fixes for zero-division warnings:

In [None]:
from transformers import TrainingArguments, Trainer, get_scheduler
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import os
import inspect

# Define compute_metrics function for evaluation with zero_division handling
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    
    # Use zero_division=0 to handle cases where a class has no predictions
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision_score(labels, preds, zero_division=0),
        "recall": recall_score(labels, preds, zero_division=0),
        "f1": f1_score(labels, preds, zero_division=0),
    }

# Calculate class weights to handle any imbalance
total = len(df_sample)
neg_weight = total / (2 * (df_sample['target'] == 0).sum())
pos_weight = total / (2 * (df_sample['target'] == 1).sum())
print(f"Class weights - Negative: {neg_weight:.4f}, Positive: {pos_weight:.4f}")

# Clean up any existing checkpoints to start fresh
checkpoint_dir = "./bertweet-sentiment"
if os.path.exists(checkpoint_dir):
    import shutil
    try:
        print(f"Removing old checkpoint directory: {checkpoint_dir}")
        shutil.rmtree(checkpoint_dir)
    except Exception as e:
        print(f"Warning: Could not remove checkpoint directory: {str(e)}")

# Check which arguments are supported by this version of transformers
training_args_params = inspect.signature(TrainingArguments).parameters
args_dict = {
    "output_dir": checkpoint_dir,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "logging_dir": "./logs",
    "logging_steps": 10,
    "save_steps": 50,
    "save_total_limit": 1,
    "dataloader_num_workers": 2,
    "fp16": torch.cuda.is_available(),
}

# Add optional parameters only if supported
if "eval_steps" in training_args_params:
    args_dict["eval_steps"] = 50

if "evaluation_strategy" in training_args_params:
    # Import and use IntervalStrategy if available
    try:
        from transformers import IntervalStrategy
        args_dict["evaluation_strategy"] = IntervalStrategy.STEPS
    except ImportError:
        # Fallback to string version
        args_dict["evaluation_strategy"] = "steps"
elif "eval_strategy" in training_args_params:  # older versions
    args_dict["eval_strategy"] = "steps"

if "save_strategy" in training_args_params:
    # Import and use SaveStrategy if available
    try:
        from transformers import SaveStrategy
        args_dict["save_strategy"] = SaveStrategy.STEPS
    except ImportError:
        # Fallback to string version
        args_dict["save_strategy"] = "steps"

if "load_best_model_at_end" in training_args_params:
    args_dict["load_best_model_at_end"] = True
    
if "metric_for_best_model" in training_args_params:
    args_dict["metric_for_best_model"] = "f1"

if "optim" in training_args_params:
    args_dict["optim"] = "adamw_torch"

if "warmup_ratio" in training_args_params:
    args_dict["warmup_ratio"] = 0.1

if "gradient_accumulation_steps" in training_args_params:
    args_dict["gradient_accumulation_steps"] = 2

if "report_to" in training_args_params:
    args_dict["report_to"] = None

if "save_safetensors" in training_args_params:
    args_dict["save_safetensors"] = True

# Create training arguments with only supported parameters
training_args = TrainingArguments(**args_dict)

print("Training arguments initialized successfully with compatible settings")
print(f"Using output_dir: {training_args.output_dir}")

Using IntervalStrategy for both evaluation and saving (older transformers version)
Class weights - Negative: 1.0000, Positive: 1.0000
Class weights - Negative: 1.0000, Positive: 1.0000



ValueError: --load_best_model_at_end requires the save and eval strategy to match, but found
- Evaluation strategy: IntervalStrategy.NO
- Save strategy: SaveStrategy.STEPS

## 8. Create Custom Loss Function For Class Weights

Create a custom Trainer class to incorporate class weights into the training process:

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class WeightedLossTrainer(Trainer):
    """Custom trainer class that applies weighted loss"""
    def __init__(self, class_weights=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights
        
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        
        if self.class_weights is not None:
            # Apply class weights to the loss
            weights = torch.tensor(self.class_weights, device=labels.device)
            loss_fct = nn.CrossEntropyLoss(weight=weights)
            loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        else:
            # Standard loss if no class weights
            loss = F.cross_entropy(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        
        return (loss, outputs) if return_outputs else loss

# Apply class weights
class_weights = [neg_weight, pos_weight]
print(f"Using class weights: {class_weights}")

## 9. Create Trainer and Fine-tune the Model

Initialize the Trainer with our custom weighted loss and fine-tune BERTweet on the sentiment dataset:

In [None]:
# Initialize Trainer with custom weighted loss
trainer = WeightedLossTrainer(
    class_weights=class_weights,
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_val_dataset,  # Use validation set during training
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]  # Stop if no improvement for 3 evaluations
)

# Record start time
from datetime import datetime
start_time = datetime.now()
print(f"Starting fine-tuning at {start_time}")
print(f"Training on {len(tokenized_train_dataset)} examples, validating on {len(tokenized_val_dataset)} examples")
print("Using small sample size for quick testing - this should complete in minutes")

# Fine-tune the model
try:
    trainer.train()
    print("Training completed successfully!")
except Exception as e:
    print(f"Training error: {str(e)}")
    print("Attempting to continue with reduced settings...")
    
    # Try again with more conservative settings
    training_args.per_device_train_batch_size = 8
    training_args.gradient_accumulation_steps = 1
    training_args.fp16 = False
    
    trainer = WeightedLossTrainer(
        class_weights=class_weights,
        model=model,
        args=training_args,
        train_dataset=tokenized_train_dataset,
        eval_dataset=tokenized_val_dataset,
        compute_metrics=compute_metrics,
    )
    trainer.train()

# Record end time
end_time = datetime.now()
training_duration = end_time - start_time
print(f"Fine-tuning complete at {end_time}!")
print(f"Training duration: {training_duration}")

In [None]:
# Save the final model with safetensors format
try:
    # Create a separate directory for the final model
    final_model_dir = "./bertweet-sentiment-final"
    os.makedirs(final_model_dir, exist_ok=True)
    
    # Save just the model weights (no optimizer state)
    model.save_pretrained(final_model_dir, safe_serialization=True)
    tokenizer.save_pretrained(final_model_dir)
    
    print(f"Final model saved successfully to {final_model_dir}")
except Exception as e:
    print(f"Error saving final model: {str(e)}")
    print("You can still use the model in memory for evaluation and predictions.")

## 10. Evaluate the Fine-tuned Model

Evaluate the fine-tuned model on the test dataset with detailed metrics:

In [None]:
# Evaluate the model
eval_results = trainer.evaluate(eval_dataset=tokenized_test_dataset)
print(f"Evaluation results: {eval_results}")

# Get predictions for confusion matrix
predictions = trainer.predict(tokenized_test_dataset)
preds = np.argmax(predictions.predictions, axis=-1)
labels = tokenized_test_dataset['labels'].numpy()

# Print more detailed metrics
accuracy = accuracy_score(labels, preds)
precision = precision_score(labels, preds, zero_division=0)
recall = recall_score(labels, preds, zero_division=0)
f1 = f1_score(labels, preds, zero_division=0)

print(f"\nDetailed metrics:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

# Create confusion matrix
cm = confusion_matrix(labels, preds)
print(f"\nConfusion Matrix:")
print(cm)

# Plot confusion matrix with percentages
plt.figure(figsize=(10, 8))
sns.heatmap(
    cm, 
    annot=True, 
    fmt='d',
    cmap='Blues',
    xticklabels=['Negative', 'Positive'],
    yticklabels=['Negative', 'Positive']
)
plt.title('BERTweet Confusion Matrix')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()

# Check for any class imbalance in predictions
unique, counts = np.unique(preds, return_counts=True)
pred_distribution = dict(zip(unique, counts))
print(f"\nPrediction distribution: {pred_distribution}")

## 11. Save the Model in PKL Format

Save the fine-tuned model, tokenizer, and a prediction function for later use:

In [None]:
# Save the model
model_path = "./bertweet-sentiment-finetuned"
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

# Create an improved prediction function
def predict_sentiment(texts, model=model, tokenizer=tokenizer):
    """Predicts sentiment for a list of texts with proper error handling"""
    if not isinstance(texts, list):
        texts = [texts]  # Handle single text input
    
    # Normalize tweets
    normalized_texts = [safe_normalize(text) for text in texts]
    
    # Tokenize with error handling
    try:
        inputs = tokenizer(normalized_texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
        
        # Move to GPU if available
        if torch.cuda.is_available():
            inputs = {k: v.cuda() for k, v in inputs.items()}
            model.cuda()
        
        # Get predictions
        with torch.no_grad():
            outputs = model(**inputs)
        
        # Get probabilities and predictions
        probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
        preds = torch.argmax(probs, dim=-1)
        
        # Move back to CPU for processing
        preds = preds.cpu().numpy()
        probs = probs.cpu().numpy()
        
        results = []
        for i, pred in enumerate(preds):
            sentiment = "Positive" if pred == 1 else "Negative"
            confidence = probs[i][pred]
            results.append({
                "text": texts[i],
                "sentiment": sentiment,
                "confidence": float(confidence),  # Convert numpy float to Python float
                "probabilities": {"negative": float(probs[i][0]), "positive": float(probs[i][1])}
            })
        
        return results
    except Exception as e:
        print(f"Error during prediction: {str(e)}")
        return [{"text": text, "error": str(e)} for text in texts]

# Save the function with the model and tokenizer
model_package = {
    "model": model,
    "tokenizer": tokenizer,
    "predict_function": predict_sentiment
}

joblib.dump(model_package, "bertweet_sentiment_model.pkl")
print("Model saved as bertweet_sentiment_model.pkl")

## 12. Test the Saved Model

Load the saved model and test it with some example tweets:

In [None]:
# Load the saved model package
loaded_package = joblib.load("bertweet_sentiment_model.pkl")

# Extract components
loaded_predict_function = loaded_package["predict_function"]

# Test with some example tweets
test_tweets = [
    "I absolutely love this new phone! The battery life is amazing! #happy",
    "This service is terrible. I've been waiting for hours and still no response. #angry",
    "Just received my order. Can't wait to try it out.",
    "The weather today is quite nice.",
    "I'm so disappointed with the quality of this product. 😡"
]

# Get predictions
results = loaded_predict_function(test_tweets)

# Display results
for result in results:
    print(f"Text: {result['text']}")
    print(f"Sentiment: {result['sentiment']} (confidence: {result['confidence']:.4f})")
    print(f"Probabilities: Negative: {result['probabilities']['negative']:.4f}, Positive: {result['probabilities']['positive']:.4f}")
    print("---")

## 13. Batch Processing Function

Create an improved function for batch processing with error handling:

In [None]:
def batch_predict(texts, model_path="bertweet_sentiment_model.pkl", batch_size=32):
    """
    Loads the saved BERTweet model package, normalizes the texts,
    and returns the sentiment predictions with batching for efficiency.
    """
    # Load model package
    try:
        package = joblib.load(model_path)
        model = package["model"]
        tokenizer = package["tokenizer"]
        
        # Move model to GPU if available
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model.to(device)
        model.eval()
        
        # Process in batches
        results = []
        for i in range(0, len(texts), batch_size):
            batch_texts = texts[i:i + batch_size]
            
            # Normalize texts
            normalized_texts = [safe_normalize(text) for text in batch_texts]
            
            # Tokenize
            inputs = tokenizer(normalized_texts, padding=True, truncation=True, return_tensors="pt")
            inputs = {k: v.to(device) for k, v in inputs.items()}
            
            # Get predictions
            with torch.no_grad():
                outputs = model(**inputs)
            
            # Get probabilities and predictions
            probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
            preds = torch.argmax(probs, dim=-1)
            
            # Move back to CPU
            preds = preds.cpu().numpy()
            probs = probs.cpu().numpy()
            
            for j, pred in enumerate(preds):
                results.append({
                    "text": batch_texts[j],
                    "sentiment": "Positive" if pred == 1 else "Negative",
                    "pred_label": int(pred),
                    "confidence": float(probs[j][pred]),
                    "negative_prob": float(probs[j][0]),
                    "positive_prob": float(probs[j][1])
                })
        
        return results
    
    except Exception as e:
        print(f"Error in batch_predict: {str(e)}")
        return [{"error": str(e), "text": text} for text in texts]

# Example usage
sample_texts = ["I love this product!", "This is the worst experience ever.", "The weather is nice today."]
batch_results = batch_predict(sample_texts)
print("Batch prediction results:")
for result in batch_results:
    print(f"Text: {result['text']}")
    print(f"Sentiment: {result['sentiment']} (confidence: {result['confidence']:.4f})")
    print(f"Negative prob: {result['negative_prob']:.4f}, Positive prob: {result['positive_prob']:.4f}")
    print("---")

## 14. Error Analysis & Model Interpretability

Let's analyze some of the model's errors and understand why it makes certain predictions:

In [None]:
# Get a sample of test predictions to analyze errors
test_texts = test_df['normalized_text'].tolist()[:50]  # Use a smaller sample for quicker analysis
true_labels = test_df['target'].tolist()[:50]

# Get predictions
error_analysis_results = batch_predict(test_texts)

# Find examples of errors
errors = []
for i, result in enumerate(error_analysis_results):
    pred_label = result['pred_label']
    true_label = true_labels[i]
    confidence = result['confidence']
    
    if pred_label != true_label:
        errors.append({
            "text": result['text'],
            "true_sentiment": "Positive" if true_label == 1 else "Negative",
            "predicted_sentiment": result['sentiment'],
            "confidence": confidence,
            "neg_prob": result['negative_prob'],
            "pos_prob": result['positive_prob']
        })

# Display some errors
print(f"Found {len(errors)} errors in the sample")
print("\nSample of errors:")
for i, error in enumerate(errors[:5]):  # Show first 5 errors
    print(f"Error {i+1}:")
    print(f"Text: {error['text']}")
    print(f"True: {error['true_sentiment']}, Predicted: {error['predicted_sentiment']} (confidence: {error['confidence']:.4f})")
    print(f"Negative prob: {error['neg_prob']:.4f}, Positive prob: {error['pos_prob']:.4f}")
    print("---")

## Conclusion

We've successfully fixed the BERTweet fine-tuning process using a small sample size for quick testing and validation. The key improvements include:

1. Using a small sample (5,000 tweets) to quickly test the pipeline
2. Implementing class balancing to ensure equal representation of positive and negative tweets
3. Adding weighted loss function to handle class imbalance
4. Using proper metrics with zero_division handling
5. Including validation during training to monitor progress
6. Adding error analysis to understand model predictions

With this approach, you can quickly validate your fine-tuning pipeline and ensure it works correctly before scaling up to larger datasets. To improve model performance on real-world data, simply increase the `sample_size` variable in the data preparation step.