# Task 5: Fine-Tuning FinBERT for Sentiment Analysis

## Overview
This notebook fine-tunes FinBERT to reach ≥90% accuracy on the Financial PhraseBank dataset.

**Status**: Fine-tuning is **REQUIRED** because all three earlier methods scored below 90% accuracy:
- FinBERT: 25.37%
- Local LLM: 71.59%
- RAG-Enhanced: 84.48%

## Objectives
1. Load and prepare the dataset for fine-tuning
2. Split the data into train/validation/test sets
3. Fine-tune the FinBERT model with suitable hyperparameters
4. Evaluate the fine-tuned model (must achieve ≥90% accuracy)
5. Save the fine-tuned model and document the training details



## Step 1: Install Required Libraries


In [1]:
# Install required libraries
!pip install transformers torch scikit-learn datasets -q


## Step 2: Import Libraries


In [30]:
import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    EarlyStoppingCallback
)
from transformers.trainer_utils import IntervalStrategy  # enum for eval/save strategies
from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support,
    confusion_matrix,
    classification_report
)
from sklearn.model_selection import train_test_split
import warnings
import time
import os
from datetime import datetime

warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

print("✓ Libraries imported successfully!")
print(f"✓ PyTorch version: {torch.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")

✓ Libraries imported successfully!
✓ PyTorch version: 2.9.0+cu126
✓ CUDA available: True
✓ GPU: Tesla T4


## Step 3: Load Dataset


In [31]:
# For Google Colab: upload preprocessed_dataset.csv when prompted.
# Outside Colab, fall back to the local path, then to a recursive search.
import io
import glob

df = None
try:
    from google.colab import files  # only importable inside Colab
    uploaded = files.upload()
    for filename in uploaded.keys():
        # Colab may rename repeat uploads, e.g. "preprocessed_dataset (2).csv"
        if filename.startswith('preprocessed_dataset') and filename.endswith('.csv'):
            df = pd.read_csv(io.BytesIO(uploaded[filename]))
            print(f"✓ Loaded {filename} from upload")
            break
except ImportError:
    pass  # not running in Colab

if df is None:
    if os.path.exists('preprocessed_dataset.csv'):
        df = pd.read_csv('preprocessed_dataset.csv')
        print("✓ Loaded preprocessed_dataset.csv from local path")
    else:
        csv_files = glob.glob('**/preprocessed_dataset.csv', recursive=True)
        if csv_files:
            df = pd.read_csv(csv_files[0])
            print(f"✓ Loaded from: {csv_files[0]}")
        else:
            raise FileNotFoundError("Could not find preprocessed_dataset.csv. Please upload it.")

print(f"\nDataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print(f"\nFirst few rows:")
print(df[['sentence', 'sentiment']].head())
print(f"\nSentiment distribution:")
print(df['sentiment'].value_counts())
print(f"\nSentiment distribution (percentages):")
print(df['sentiment'].value_counts(normalize=True) * 100)


Saving preprocessed_dataset.csv to preprocessed_dataset (2).csv

Dataset shape: (2264, 6)
Columns: ['sentence', 'processed_text', 'sentiment', 'token_count', 'original_length', 'processed_length']

First few rows:
                                            sentence sentiment
0  According to Gran , the company has no plans t...   neutral
1  For the last quarter of 2010 , Componenta 's n...  positive
2  In the third quarter of 2010 , net sales incre...  positive
3  Operating profit rose to EUR 13.1 mn from EUR ...  positive
4  Operating profit totalled EUR 21.1 mn , up fro...  positive

Sentiment distribution:
sentiment
neutral     1391
positive     570
negative     303
Name: count, dtype: int64

Sentiment distribution (percentages):
sentiment
neutral     61.439929
positive    25.176678
negative    13.383392
Name: proportion, dtype: float64
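The distribution above is imbalanced: neutral sentences make up about 61% of the data. A quick sanity check, assuming the counts printed above, is the accuracy of a trivial classifier that always predicts the majority class; every accuracy figure in this notebook should be read against that floor. For reference, the 25.37% reported for the off-the-shelf FinBERT run sits well below it.

```python
# Majority-class baseline from the sentiment counts printed above
counts = {"neutral": 1391, "positive": 570, "negative": 303}

total = sum(counts.values())
majority_label = max(counts, key=counts.get)
majority_accuracy = counts[majority_label] / total

print(f"Total samples: {total}")
print(f"Always predicting '{majority_label}' scores {majority_accuracy:.2%}")
```

Macro-averaged precision, recall, and F1 (used from Step 8 onward) are less flattered by this imbalance than plain accuracy, because each class counts equally.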


## Step 4: Prepare Dataset Class for PyTorch


In [32]:
class SentimentDataset(Dataset):
    """Custom Dataset class for sentiment analysis"""

    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = self.labels[idx]

        # Tokenize the text
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding='max_length',
            max_length=self.max_length,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }

print("✓ SentimentDataset class defined")


✓ SentimentDataset class defined


## Step 5: Load FinBERT Model and Tokenizer


In [33]:
# Model configuration
MODEL_NAME = "ProsusAI/finbert"
NUM_LABELS = 3  # positive, neutral, negative
MAX_LENGTH = 128

print(f"Loading FinBERT model: {MODEL_NAME}")
print("This may take a moment on first run...")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Load model for sequence classification
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS
)

# Move model to GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

print(f"✓ FinBERT model loaded successfully!")
print(f"✓ Model device: {device}")
print(f"✓ Number of labels: {NUM_LABELS}")
print(f"✓ Model architecture: {model.__class__.__name__}")


Loading FinBERT model: ProsusAI/finbert
This may take a moment on first run...
✓ FinBERT model loaded successfully!
✓ Model device: cuda
✓ Number of labels: 3
✓ Model architecture: BertForSequenceClassification
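The pretrained classification head comes with its own label order, exposed as `model.config.id2label`. The published ProsusAI/finbert config appears to use positive=0, negative=1, neutral=2, while Step 6 below defines positive=0, neutral=1, negative=2; verify this against your loaded model. If the orders differ, the pretrained head starts out predicting the wrong index for two classes, which is one plausible contributor to a low zero-shot score and a large first-epoch training loss. The sketch below uses a hypothetical helper, `build_label_maps`, to derive the mapping from a config-style `id2label` dict and to list the classes whose indices disagree.

```python
def build_label_maps(id2label):
    """Build (label_map, reverse_label_map) from a config-style id2label dict.

    Hypothetical helper: keys may be ints or strings (JSON configs store them
    as strings), and names are lower-cased to match the dataset's labels.
    """
    reverse_map = {int(i): name.lower() for i, name in id2label.items()}
    forward_map = {name: i for i, name in reverse_map.items()}
    return forward_map, reverse_map


# Order the ProsusAI/finbert config appears to use (assumption; in the
# notebook, pass model.config.id2label instead of this literal)
checkpoint_id2label = {0: "positive", 1: "negative", 2: "neutral"}
ckpt_label_map, ckpt_reverse_map = build_label_maps(checkpoint_id2label)

notebook_label_map = {"positive": 0, "neutral": 1, "negative": 2}  # Step 6
mismatched = sorted(k for k in notebook_label_map
                    if notebook_label_map[k] != ckpt_label_map[k])
print(ckpt_label_map)   # {'positive': 0, 'negative': 1, 'neutral': 2}
print(mismatched)       # ['negative', 'neutral']
```

Fine-tuning re-learns whichever order the training labels use, so a mismatch is not fatal here. To keep the saved config consistent with training, one option is to pass `id2label=` and `label2id=` to `from_pretrained`, which forwards them to the model config.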


## Step 6: Prepare Data for Training


In [34]:
# Extract sentences and sentiments
sentences = df['sentence'].tolist()
sentiments = df['sentiment'].tolist()

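# Create label mapping (the index order used for training and evaluation).
# Note: the pretrained ProsusAI/finbert head may order its labels differently
# (check model.config.id2label); fine-tuning re-learns the order used here.
label_map = {'positive': 0, 'neutral': 1, 'negative': 2}
reverse_label_map = {v: k for k, v in label_map.items()}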

# Convert sentiments to numeric labels
labels = [label_map[sent] for sent in sentiments]

print(f"Total samples: {len(sentences)}")
print(f"Label distribution: {pd.Series(sentiments).value_counts().to_dict()}")

# Stratified split: 70% train, 15% validation, 15% test
# First split: 70% train, 30% temp
X_train, X_temp, y_train, y_temp = train_test_split(
    sentences, labels,
    test_size=0.3,
    random_state=seed,
    stratify=labels
)

# Second split: 50% of temp (15% total) for validation, 50% (15% total) for test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp,
    test_size=0.5,
    random_state=seed,
    stratify=y_temp
)

print(f"\nData split:")
print(f"  Training set: {len(X_train)} samples ({len(X_train)/len(sentences)*100:.1f}%)")
print(f"  Validation set: {len(X_val)} samples ({len(X_val)/len(sentences)*100:.1f}%)")
print(f"  Test set: {len(X_test)} samples ({len(X_test)/len(sentences)*100:.1f}%)")

print(f"\nTraining label distribution:")
train_sentiments = [reverse_label_map[label] for label in y_train]
print(pd.Series(train_sentiments).value_counts())


Total samples: 2264
Label distribution: {'neutral': 1391, 'positive': 570, 'negative': 303}

Data split:
  Training set: 1584 samples (70.0%)
  Validation set: 340 samples (15.0%)
  Test set: 340 samples (15.0%)

Training label distribution:
neutral     973
positive    399
negative    212
Name: count, dtype: int64
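The exact split sizes follow from how `train_test_split` rounds a fractional `test_size`. Assuming scikit-learn's rounding rule (the test part is `ceil(test_size * n_samples)` and the train part gets the remainder), this small sketch reproduces the 1584/340/340 counts printed above.

```python
import math

def split_counts(n_samples, test_size):
    """Return (n_train, n_test), assuming test = ceil(test_size * n) and train = the rest."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_total = 2264
n_train, n_temp = split_counts(n_total, 0.3)   # first split: 70% / 30%
n_val, n_test = split_counts(n_temp, 0.5)      # second split: half of the 30% each

print(n_train, n_val, n_test)  # 1584 340 340
```

Since 0.3 × 2264 = 679.2, the temporary split rounds up to 680, which then halves evenly into 340 validation and 340 test samples.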


## Step 7: Create DataLoaders


In [35]:
# Create datasets
train_dataset = SentimentDataset(X_train, y_train, tokenizer, max_length=MAX_LENGTH)
val_dataset = SentimentDataset(X_val, y_val, tokenizer, max_length=MAX_LENGTH)
test_dataset = SentimentDataset(X_test, y_test, tokenizer, max_length=MAX_LENGTH)

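# Create data loaders. Note: Trainer builds its own loaders from the Dataset
# objects above (using per_device_*_batch_size), so these loaders are not
# passed to Trainer; here they only report the batch counts.
BATCH_SIZE = 16
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

print(f"✓ DataLoaders created")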
print(f"  Batch size: {BATCH_SIZE}")
print(f"  Training batches: {len(train_loader)}")
print(f"  Validation batches: {len(val_loader)}")
print(f"  Test batches: {len(test_loader)}")


✓ DataLoaders created
  Batch size: 16
  Training batches: 99
  Validation batches: 22
  Test batches: 22
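The batch counts are ceiling divisions, because `DataLoader` keeps the last partial batch by default (`drop_last=False`). This sketch recomputes them from the split sizes above and shows that the training set divides evenly into 99 full batches, while the validation and test sets each end with a 4-sample batch.

```python
import math

BATCH_SIZE = 16
n_samples_per_split = {"train": 1584, "validation": 340, "test": 340}

batch_counts = {}
for name, n in n_samples_per_split.items():
    n_batches = math.ceil(n / BATCH_SIZE)           # partial batch is kept
    last_batch = n - (n_batches - 1) * BATCH_SIZE   # size of the final batch
    batch_counts[name] = (n_batches, last_batch)
    print(f"{name}: {n_batches} batches, last batch has {last_batch} samples")
```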


## Step 8: Define Metrics Function


In [36]:
def compute_metrics(eval_pred):
    """Compute metrics for evaluation"""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)

    # Calculate metrics
    accuracy = accuracy_score(labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average='macro', zero_division=0
    )

    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }

print("✓ Metrics function defined")


✓ Metrics function defined
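`compute_metrics` receives raw logits and reference labels, takes the argmax of each row, and macro-averages precision, recall, and F1 so that every class counts equally regardless of size. As a cross-check, here is a numpy-only sketch of the same computation (the name `macro_metrics` is hypothetical, written to mirror `precision_recall_fscore_support(average='macro', zero_division=0)`, which averages over the labels present in either the references or the predictions), run on a made-up batch of four logit rows.

```python
import numpy as np

def macro_metrics(logits, labels):
    """Accuracy plus macro precision/recall/F1 from logits, numpy only."""
    preds = np.argmax(logits, axis=1)
    labels = np.asarray(labels)
    precisions, recalls, f1s = [], [], []
    for c in np.union1d(preds, labels):          # labels seen in either array
        tp = np.sum((preds == c) & (labels == c))
        fp = np.sum((preds == c) & (labels != c))
        fn = np.sum((preds != c) & (labels == c))
        p = tp / (tp + fp) if tp + fp else 0.0   # zero_division=0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": float(np.mean(preds == labels)),
        "precision": float(np.mean(precisions)),
        "recall": float(np.mean(recalls)),
        "f1": float(np.mean(f1s)),
    }

# Made-up logits for 4 samples (0=positive, 1=neutral, 2=negative)
toy_logits = np.array([
    [2.0, 0.1, -1.0],   # argmax 0
    [0.2, 1.5, 0.3],    # argmax 1
    [0.1, 0.9, 0.4],    # argmax 1 (true label is 2)
    [-1.0, 0.0, 2.5],   # argmax 2
])
toy_labels = [0, 1, 2, 2]
toy_metrics = macro_metrics(toy_logits, toy_labels)
print(toy_metrics)
```

The single error lowers neutral's precision and negative's recall, so macro precision and recall both land at 0.8333 while accuracy is 0.75.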


## Step 9: Configure Training Arguments


In [54]:
# Training hyperparameters
EPOCHS = 5
LEARNING_RATE = 2e-5
WEIGHT_DECAY = 0.01
WARMUP_STEPS = 100

# Create output directory for model checkpoints
output_dir = "./finbert_finetuned"
os.makedirs(output_dir, exist_ok=True)

# Training arguments
training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    warmup_steps=WARMUP_STEPS,
    logging_dir=f"{output_dir}/logs",
    logging_steps=50,
    eval_strategy=IntervalStrategy.EPOCH,  # evaluate after every epoch (older releases: evaluation_strategy)
    save_strategy=IntervalStrategy.EPOCH,  # checkpoint every epoch; must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    save_total_limit=3,
    seed=seed,
    fp16=torch.cuda.is_available(),  # Use mixed precision if GPU available
    report_to="none"  # Disable wandb/tensorboard
)

print("="*80)
print("TRAINING CONFIGURATION")
print("="*80)
print(f"Model: {MODEL_NAME}")
print(f"Epochs: {EPOCHS}")
print(f"Learning Rate: {LEARNING_RATE}")
print(f"Batch Size: {BATCH_SIZE}")
print(f"Weight Decay: {WEIGHT_DECAY}")
print(f"Warmup Steps: {WARMUP_STEPS}")
print(f"Max Length: {MAX_LENGTH}")
print(f"Output Directory: {output_dir}")
print("="*80)

TRAINING CONFIGURATION
Model: ProsusAI/finbert
Epochs: 5
Learning Rate: 2e-05
Batch Size: 16
Weight Decay: 0.01
Warmup Steps: 100
Max Length: 128
Output Directory: ./finbert_finetuned
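These settings fix the length of the schedule: 99 optimizer steps per epoch (1584 samples / batch size 16) times 5 epochs gives 495 steps, so the 100 warmup steps cover roughly the first 20% of training. Assuming the Trainer's default `lr_scheduler_type="linear"`, the learning rate rises linearly from 0 to 2e-5 during warmup, then decays linearly to 0 at the final step. The plain-Python sketch below reproduces that shape (it mirrors the formula used by `transformers.get_linear_schedule_with_warmup`; it is not the library code).

```python
import math

def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

steps_per_epoch = math.ceil(1584 / 16)   # 99, with no gradient accumulation
total_steps = steps_per_epoch * 5        # 495
warmup_fraction = 100 / total_steps

for step in (0, 50, 100, 300, total_steps):
    print(step, f"{linear_warmup_lr(step, 2e-5, 100, total_steps):.2e}")
print(f"warmup covers {warmup_fraction:.1%} of training")
```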


## Step 10: Initialize Trainer


In [55]:
# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
)

print("✓ Trainer initialized successfully!")


✓ Trainer initialized successfully!


## Step 11: Fine-Tune the Model


In [56]:
print("="*80)
print("STARTING FINE-TUNING")
print("="*80)
print(f"Training started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"This may take several minutes...")
print("="*80)

start_time = time.time()

# Train the model
trainer.train()

end_time = time.time()
training_time = end_time - start_time

print("="*80)
print("FINE-TUNING COMPLETED")
print("="*80)
print(f"Training time: {training_time/60:.2f} minutes ({training_time:.2f} seconds)")
print(f"Completed at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*80)


STARTING FINE-TUNING
Training started at: 2025-11-29 22:48:39
This may take several minutes...


| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 1.8793 | 0.110314 | 0.970588 | 0.954465 | 0.962113 | 0.956868 |
| 2 | 0.0775 | 0.112210 | 0.970588 | 0.942585 | 0.974744 | 0.956574 |
| 3 | 0.0181 | 0.072168 | 0.982353 | 0.963399 | 0.988104 | 0.974861 |
| 4 | 0.0055 | 0.077298 | 0.979412 | 0.964100 | 0.981856 | 0.972022 |
| 5 | 0.0032 | 0.073755 | 0.979412 | 0.966097 | 0.981856 | 0.973367 |


FINE-TUNING COMPLETED
Training time: 2.42 minutes (145.04 seconds)
Completed at: 2025-11-29 22:51:04
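The log also explains why `EarlyStoppingCallback(early_stopping_patience=3)` never stopped training: validation accuracy peaked at epoch 3 (0.9824, the same value Step 12 reports for the restored model), and only two epochs followed. The sketch below is a simplified re-implementation of patience logic, not the callback's actual code, assuming an improvement must be strictly higher (as with the default `early_stopping_threshold=0.0`). It returns the best epoch, which is the checkpoint `load_best_model_at_end=True` restores, and the epoch at which training would have stopped.

```python
def early_stopping_trace(scores, patience):
    """Return (best_epoch, stopped_at) for a higher-is-better metric.

    The counter resets on a strict improvement and training stops once it
    reaches `patience`; stopped_at is None if every epoch ran.
    """
    best_score, best_epoch, counter = float("-inf"), None, 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch, counter = score, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, epoch
    return best_epoch, None

# Validation accuracy per epoch, copied from the training log above
val_accuracy = [0.970588, 0.970588, 0.982353, 0.979412, 0.979412]
print(early_stopping_trace(val_accuracy, patience=3))   # (3, None)
```

With `patience=1`, the tie at epoch 2 would already have ended training after two epochs, before the epoch-3 peak was reached.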


## Step 12: Evaluate on Validation Set


In [57]:
print("="*80)
print("VALIDATION SET EVALUATION")
print("="*80)

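# Evaluate on the validation set. This set also selected the best checkpoint
# (load_best_model_at_end), so its score is optimistic; the test set in
# Step 13 gives the unbiased estimate.
val_results = trainer.evaluate(eval_dataset=val_dataset)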

print(f"\nValidation Metrics:")
print(f"  Accuracy: {val_results['eval_accuracy']:.4f} ({val_results['eval_accuracy']*100:.2f}%)")
print(f"  Precision: {val_results['eval_precision']:.4f}")
print(f"  Recall: {val_results['eval_recall']:.4f}")
print(f"  F1-Score: {val_results['eval_f1']:.4f}")

if val_results['eval_accuracy'] >= 0.90:
    print(f"\n✅ SUCCESS! Validation accuracy ≥ 90%: {val_results['eval_accuracy']*100:.2f}%")
else:
    print(f"\n⚠️ Validation accuracy below 90%: {val_results['eval_accuracy']*100:.2f}%")
print("="*80)


VALIDATION SET EVALUATION



Validation Metrics:
  Accuracy: 0.9824 (98.24%)
  Precision: 0.9634
  Recall: 0.9881
  F1-Score: 0.9749

✅ SUCCESS! Validation accuracy ≥ 90%: 98.24%


## Step 13: Evaluate on Test Set


In [58]:
print("="*80)
print("TEST SET EVALUATION")
print("="*80)

# Evaluate on test set
test_results = trainer.evaluate(eval_dataset=test_dataset)

print(f"\nTest Set Metrics:")
print(f"  Accuracy: {test_results['eval_accuracy']:.4f} ({test_results['eval_accuracy']*100:.2f}%)")
print(f"  Precision: {test_results['eval_precision']:.4f}")
print(f"  Recall: {test_results['eval_recall']:.4f}")
print(f"  F1-Score: {test_results['eval_f1']:.4f}")

# Check if we achieved ≥90% accuracy
if test_results['eval_accuracy'] >= 0.90:
    print(f"\n{'='*80}")
    print(f"🎉 SUCCESS! Fine-tuned model achieved ≥90% accuracy!")
    print(f"   Test Accuracy: {test_results['eval_accuracy']*100:.2f}%")
    print(f"{'='*80}")
else:
    print(f"\n{'='*80}")
    print(f"⚠️ WARNING: Test accuracy is below 90%")
    print(f"   Test Accuracy: {test_results['eval_accuracy']*100:.2f}%")
    print(f"   Consider increasing epochs or adjusting hyperparameters")
    print(f"{'='*80}")
print("="*80)


TEST SET EVALUATION



Test Set Metrics:
  Accuracy: 0.9765 (97.65%)
  Precision: 0.9765
  Recall: 0.9536
  F1-Score: 0.9639

🎉 SUCCESS! Fine-tuned model achieved ≥90% accuracy!
   Test Accuracy: 97.65%


## Step 14: Detailed Test Set Evaluation with Confusion Matrix


In [59]:
# Get predictions on test set
predictions = trainer.predict(test_dataset)
y_pred = np.argmax(predictions.predictions, axis=1)
y_true = predictions.label_ids

# Calculate detailed metrics
accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average=None, labels=[0, 1, 2], zero_division=0
)
macro_precision, macro_recall, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='macro', zero_division=0
)

# Confusion matrix: rows = true labels, columns = predicted labels
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
cm_df = pd.DataFrame(
    cm,
    index=['positive', 'neutral', 'negative'],
    columns=['positive', 'neutral', 'negative']
)

print("="*80)
print("DETAILED TEST SET EVALUATION")
print("="*80)

print(f"\nOverall Metrics:")
print(f"  Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")
print(f"  Macro Precision: {macro_precision:.4f}")
print(f"  Macro Recall: {macro_recall:.4f}")
print(f"  Macro F1-Score: {macro_f1:.4f}")

print(f"\nPer-Class Metrics:")
class_names = ['positive', 'neutral', 'negative']
for i, class_name in enumerate(class_names):
    print(f"\n  {class_name.upper()}:")
    print(f"    Precision: {precision[i]:.4f}")
    print(f"    Recall: {recall[i]:.4f}")
    print(f"    F1-Score: {f1[i]:.4f}")

print(f"\nConfusion Matrix:")
print(cm_df)

print(f"\nClassification Report:")
print(classification_report(
    y_true, y_pred,
    target_names=['positive', 'neutral', 'negative'],
    digits=4
))

print("="*80)


DETAILED TEST SET EVALUATION

Overall Metrics:
  Accuracy: 0.9765 (97.65%)
  Macro Precision: 0.9765
  Macro Recall: 0.9536
  Macro F1-Score: 0.9639

Per-Class Metrics:

  POSITIVE:
    Precision: 0.9438
    Recall: 0.9767
    F1-Score: 0.9600

  NEUTRAL:
    Precision: 0.9858
    Recall: 0.9952
    F1-Score: 0.9905

  NEGATIVE:
    Precision: 1.0000
    Recall: 0.8889
    F1-Score: 0.9412

Confusion Matrix:
          positive  neutral  negative
positive        84        2         0
neutral          1      208         0
negative         4        1        40

Classification Report:
              precision    recall  f1-score   support

    positive     0.9438    0.9767    0.9600        86
     neutral     0.9858    0.9952    0.9905       209
    negative     1.0000    0.8889    0.9412        45

    accuracy                         0.9765       340
   macro avg     0.9765    0.9536    0.9639       340
weighted avg     0.9770    0.9765    0.9762       340
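Every per-class figure in the report can be read off the confusion matrix (rows are true labels, columns are predictions, following scikit-learn's convention): precision divides each diagonal entry by its column sum, recall divides it by its row sum. This sketch recomputes the reported numbers from the matrix printed above, with the values copied by hand. It also makes the main weakness visible: 5 of the 45 negative sentences are misclassified, 4 of them as positive.

```python
import numpy as np

classes = ["positive", "neutral", "negative"]
# Test-set confusion matrix from the output above: rows = true, columns = predicted
test_cm = np.array([
    [84,   2,  0],
    [ 1, 208,  0],
    [ 4,   1, 40],
])

diag = np.diag(test_cm)
class_precision = diag / test_cm.sum(axis=0)   # column sums = predicted counts
class_recall = diag / test_cm.sum(axis=1)      # row sums = true counts (support)
class_f1 = 2 * class_precision * class_recall / (class_precision + class_recall)
overall_accuracy = diag.sum() / test_cm.sum()

for name, p, r, f in zip(classes, class_precision, class_recall, class_f1):
    print(f"{name:>8}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
print(f"accuracy={overall_accuracy:.4f}, macro precision={class_precision.mean():.4f}")
```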



## Step 15: Save Fine-Tuned Model


In [60]:
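# Save the fine-tuned model
final_model_dir = f"{output_dir}/final_model"
os.makedirs(final_model_dir, exist_ok=True)

# Record the label order used in training, so a reloaded model (or a
# pipeline) reports the right class names rather than the checkpoint's
trainer.model.config.id2label = reverse_label_map
trainer.model.config.label2id = label_map

trainer.save_model(final_model_dir)
tokenizer.save_pretrained(final_model_dir)

print(f"✓ Fine-tuned model saved to: {final_model_dir}")
print(f"✓ Model and tokenizer saved successfully!")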


✓ Fine-tuned model saved to: ./finbert_finetuned/final_model
✓ Model and tokenizer saved successfully!
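The saved directory can be reloaded with `AutoTokenizer.from_pretrained(final_model_dir)` and `AutoModelForSequenceClassification.from_pretrained(final_model_dir)`; the model's forward pass returns raw logits. Turning those logits into labels requires the same index order used in training (`reverse_label_map` from Step 6). Below is a numpy sketch of that final step, using a hypothetical helper `logits_to_labels` and made-up logits in place of real model output.

```python
import numpy as np

# Same order as the notebook's reverse_label_map (Step 6)
INDEX_TO_LABEL = {0: "positive", 1: "neutral", 2: "negative"}

def logits_to_labels(logits, index_to_label=INDEX_TO_LABEL):
    """Map a (batch, num_labels) logit array to (label, confidence) pairs."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    best = probs.argmax(axis=1)
    return [(index_to_label[int(i)], float(probs[row, i])) for row, i in enumerate(best)]

# Made-up logits for two sentences (not real model output)
example_logits = [[3.0, 0.5, -1.0], [-0.5, 0.2, 2.2]]
for label, confidence in logits_to_labels(example_logits):
    print(f"{label}: {confidence:.3f}")
```

Subtracting the row maximum before exponentiating leaves the softmax unchanged but avoids overflow for large logits.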


## Step 16: Save Training Results and Metrics


In [61]:
# Create results summary
results_summary = {
    'Training Parameters': {
        'Model': MODEL_NAME,
        'Epochs': EPOCHS,
        'Learning Rate': LEARNING_RATE,
        'Batch Size': BATCH_SIZE,
        'Weight Decay': WEIGHT_DECAY,
        'Warmup Steps': WARMUP_STEPS,
        'Max Length': MAX_LENGTH,
        'Training Time (minutes)': round(training_time / 60, 2),
        'Training Time (seconds)': round(training_time, 2),
        'Random Seed': seed
    },
    'Validation Metrics': {
        'Accuracy': round(val_results['eval_accuracy'], 4),
        'Precision': round(val_results['eval_precision'], 4),
        'Recall': round(val_results['eval_recall'], 4),
        'F1-Score': round(val_results['eval_f1'], 4)
    },
    'Test Metrics': {
        'Accuracy': round(test_results['eval_accuracy'], 4),
        'Precision': round(macro_precision, 4),
        'Recall': round(macro_recall, 4),
        'F1-Score': round(macro_f1, 4)
    },
    'Per-Class Test Metrics': {
        'Positive Precision': round(precision[0], 4),
        'Positive Recall': round(recall[0], 4),
        'Positive F1': round(f1[0], 4),
        'Neutral Precision': round(precision[1], 4),
        'Neutral Recall': round(recall[1], 4),
        'Neutral F1': round(f1[1], 4),
        'Negative Precision': round(precision[2], 4),
        'Negative Recall': round(recall[2], 4),
        'Negative F1': round(f1[2], 4)
    }
}

# Save to CSV (keep the category: 'Accuracy' etc. appear in more than one group)
results_df = pd.DataFrame([
    {
        'category': category,
        'metric': key,
        'value': value
    }
    for category, metrics in results_summary.items()
    for key, value in metrics.items()
])

results_df.to_csv('finbert_finetuning_results.csv', index=False)
print("✓ Training results saved to: finbert_finetuning_results.csv")

# Save test predictions (test_dataset is not shuffled, so y_pred aligns with X_test)
y_test_arr = np.array(y_test)
test_predictions_df = pd.DataFrame({
    'sentence': X_test,
    'true_label': [reverse_label_map[label] for label in y_test],
    'predicted_label': [reverse_label_map[label] for label in y_pred],
    'true_label_numeric': y_test,
    'predicted_label_numeric': y_pred,
    'is_correct': (y_test_arr == y_pred)
})

test_predictions_df.to_csv('finbert_finetuned_test_predictions.csv', index=False)
print("✓ Test predictions saved to: finbert_finetuned_test_predictions.csv")

# Save confusion matrix
cm_df.to_csv('finbert_finetuned_confusion_matrix.csv')
print("✓ Confusion matrix saved to: finbert_finetuned_confusion_matrix.csv")

print("\n" + "="*80)
print("TRAINING SUMMARY")
print("="*80)
for category, metrics in results_summary.items():
    print(f"\n{category}:")
    for key, value in metrics.items():
        print(f"  {key}: {value}")
print("="*80)


✓ Training results saved to: finbert_finetuning_results.csv
✓ Test predictions saved to: finbert_finetuned_test_predictions.csv
✓ Confusion matrix saved to: finbert_finetuned_confusion_matrix.csv

TRAINING SUMMARY

Training Parameters:
  Model: ProsusAI/finbert
  Epochs: 5
  Learning Rate: 2e-05
  Batch Size: 16
  Weight Decay: 0.01
  Warmup Steps: 100
  Max Length: 128
  Training Time (minutes): 2.42
  Training Time (seconds): 145.04
  Random Seed: 42

Validation Metrics:
  Accuracy: 0.9824
  Precision: 0.9634
  Recall: 0.9881
  F1-Score: 0.9749

Test Metrics:
  Accuracy: 0.9765
  Precision: 0.9765
  Recall: 0.9536
  F1-Score: 0.9639

Per-Class Test Metrics:
  Positive Precision: 0.9438
  Positive Recall: 0.9767
  Positive F1: 0.96
  Neutral Precision: 0.9858
  Neutral Recall: 0.9952
  Neutral F1: 0.9905
  Negative Precision: 1.0
  Negative Recall: 0.8889
  Negative F1: 0.9412


## Step 17: Download Results (Google Colab)


In [62]:
# For Google Colab: download the result files
try:
    from google.colab import files  # only importable inside Colab

    print("Downloading results files...")
    files.download('finbert_finetuning_results.csv')
    files.download('finbert_finetuned_test_predictions.csv')
    files.download('finbert_finetuned_confusion_matrix.csv')
    print("\n✓ All result files downloaded!")
except ImportError:
    print("Not running in Google Colab. Files are saved locally.")
    print("Files saved:")
    print("  - finbert_finetuning_results.csv")
    print("  - finbert_finetuned_test_predictions.csv")
    print("  - finbert_finetuned_confusion_matrix.csv")


Downloading results files...




✓ All result files downloaded!


## Step 18: Compare with Original FinBERT Performance


In [63]:
# Load original FinBERT results if available
try:
    original_results = pd.read_csv('comparison_metrics_summary.csv')
    finbert_original = original_results[original_results['Method'] == 'FinBERT']

    if not finbert_original.empty:
        original_accuracy = finbert_original['Accuracy'].values[0]
        fine_tuned_accuracy = test_results['eval_accuracy']

        improvement = fine_tuned_accuracy - original_accuracy

        print("="*80)
        print("PERFORMANCE COMPARISON: ORIGINAL vs FINE-TUNED")
        print("="*80)
        print(f"\nOriginal FinBERT Accuracy: {original_accuracy:.4f} ({original_accuracy*100:.2f}%)")
        print(f"Fine-Tuned FinBERT Accuracy: {fine_tuned_accuracy:.4f} ({fine_tuned_accuracy*100:.2f}%)")
        print(f"Improvement: {improvement:.4f} ({improvement*100:.2f} percentage points)")
        print("="*80)

        if fine_tuned_accuracy >= 0.90:
            print("\n✅ SUCCESS: Fine-tuning achieved the ≥90% accuracy requirement!")
        else:
            print("\n⚠️ Fine-tuning improved performance but did not reach 90%.")
            print("   Consider:")
            print("   - Increasing number of epochs")
            print("   - Adjusting learning rate")
            print("   - Using different batch size")
    else:
        print("Original FinBERT results not found in comparison_metrics_summary.csv")
except FileNotFoundError:
    print("comparison_metrics_summary.csv not found. Skipping comparison.")
except Exception as e:
    print(f"Error loading comparison: {e}")


comparison_metrics_summary.csv not found. Skipping comparison.
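Because `comparison_metrics_summary.csv` was not available, the comparison can be done by hand from the figures reported at the top of this notebook. A minimal sketch, assuming the earlier accuracies are comparable with the 97.65% test accuracy; the baselines may have been measured on a different sample than this 340-sentence test split, so the gaps are indicative rather than exact.

```python
# Accuracies reported in this notebook, as fractions
baselines = {"FinBERT": 0.2537, "Local LLM": 0.7159, "RAG-Enhanced": 0.8448}
fine_tuned_test_accuracy = 0.9765
target = 0.90

gains_pp = {}
for method, acc in baselines.items():
    gains_pp[method] = (fine_tuned_test_accuracy - acc) * 100   # percentage points
    print(f"{method:>12}: {acc:.2%} -> {fine_tuned_test_accuracy:.2%} "
          f"(+{gains_pp[method]:.2f} pp)")

meets_target = fine_tuned_test_accuracy >= target
print(f"Meets the >= 90% target: {meets_target}")
```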
