In [None]:
# Step 1: Install required libraries
!pip install transformers tensorflow openpyxl scikit-learn -q

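Purpose:
Installs the Python packages the notebook depends on: transformers (for pretrained NLP models), tensorflow (for deep learning), openpyxl (for Excel file handling), and scikit-learn (for machine learning utilities).

Observation:
Installing up front makes the listed packages available before any other cell runs, and -q suppresses most of the installation log. Note that the later cells train with PyTorch rather than TensorFlow, and they also import torch, datasets, and nltk, which this command does not install, so those packages must already be present in the runtime.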

In [None]:
# Step 2: Import libraries
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
from google.colab import files
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import ParameterGrid
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer
import nltk
import time
import random
from datetime import datetime
import itertools

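The purpose of this code cell is to import the libraries used for model fine-tuning, data handling, hyperparameter search, and sentiment labeling.

**Lines 2–3:** Import torch for GPU-backed training and the Hugging Face datasets classes.

**Lines 4–5:** Import transformers components for tokenization, model loading, and training.

**Lines 7–8:** Import numpy for numerical computation and the scikit-learn accuracy and F1 metrics.

**Line 9:** ParameterGrid enumerates every combination in a hyperparameter grid.

**Lines 10–12:** Import pandas for data processing, plus NLTK and its VADER SentimentIntensityAnalyzer for labeling.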


Observation:
This cell prepares the environment for NLP fine-tuning and model evaluation.

In [None]:
# Step 3: Check GPU
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("GPU not available, using CPU.")

Using GPU: Tesla T4


This code cell detects if a GPU is available and assigns it for model training to speed up computation.

Line 2: torch.cuda.is_available() checks GPU availability.

Line 3: Assigns GPU as the processing device if present.

Observation:
Using GPU significantly improves model training efficiency; fallback to CPU ensures compatibility.

In [None]:
# Step 4: Load and preprocess data
print("\n--- Loading and Preprocessing Data ---")
uploaded = files.upload()

df = pd.read_csv('Philippines-News-Headlines-Dataset-for-Sentiment-Analysis.csv')


--- Loading and Preprocessing Data ---


Saving Philippines-News-Headlines-Dataset-for-Sentiment-Analysis.csv to Philippines-News-Headlines-Dataset-for-Sentiment-Analysis.csv


The purpose of this code cell is to upload the dataset interactively and load it into a pandas DataFrame.

Line 2: Prints a section header.

Line 3: files.upload() opens the browser file picker and uploads the selected file to Colab's temporary storage.

Line 5: pd.read_csv() reads the CSV file into a pandas DataFrame. It assumes the file is in the current directory after upload, so the filename must match exactly (case-sensitive).


In [None]:
nltk.download('vader_lexicon', quiet=True)
sia = SentimentIntensityAnalyzer()

def vader_label(score):
    if score >= 0.05:
        return 2
    elif score <= -0.05:
        return 0
    else:
        return 1

df['sentiment_score'] = df['Headlines'].apply(lambda x: sia.polarity_scores(str(x))['compound'])
df['label'] = df['sentiment_score'].apply(vader_label)

texts = df['Headlines'].tolist()
labels = df['label'].tolist()
dataset = Dataset.from_dict({"text": texts, "label": labels})

train_data = dataset.select(range(2000))
# NOTE: rows 0-499 are also part of train_data, so the two sets overlap.
eval_data = dataset.select(range(500))

print(f"Loaded dataset with {len(train_data)} training and {len(eval_data)} evaluation samples.")

Loaded dataset with 2000 training and 500 evaluation samples.


The primary purpose of this code cell is to generate sentiment labels for the headlines automatically using the lexicon-based VADER analyzer. VADER assigns each headline a continuous compound score between -1 and +1. The vader_label function then applies a neutral band of $\pm0.05$ to convert that score into one of three discrete labels: Negative (0), Neutral (1), or Positive (2). These labels serve as the target variable for supervised training, so the fine-tuned model is effectively learning to reproduce VADER's judgments.

After labeling, the code wraps the texts and labels in a Hugging Face Dataset object and selects the first 2,000 rows for training and the first 500 rows for evaluation. This is not a disjoint 80/20 split: every evaluation row also appears in the training set, so the evaluation metrics reported later measure how well the model fits data it has already seen and are likely optimistic.
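
A non-overlapping split could be sketched as follows. This is an illustration only, not the notebook's method: the helper name disjoint_split_indices, the seed, and the assumption that the dataset holds at least 2,500 rows are all hypothetical, since the source does not state the row count.

```python
import random


def disjoint_split_indices(n_rows, n_train, n_eval, seed=42):
    """Return two non-overlapping lists of row indices (train, eval)."""
    if n_train + n_eval > n_rows:
        raise ValueError("n_train + n_eval exceeds the number of rows")
    rng = random.Random(seed)   # private RNG, independent of the global state
    indices = list(range(n_rows))
    rng.shuffle(indices)        # shuffle once so both sets are random samples
    train_idx = indices[:n_train]
    eval_idx = indices[n_train:n_train + n_eval]
    return train_idx, eval_idx


# Hypothetical row count of 2,500; replace with len(dataset) in the notebook.
train_idx, eval_idx = disjoint_split_indices(2500, 2000, 500)
print(len(train_idx), len(eval_idx), set(train_idx).isdisjoint(eval_idx))
```

The two index lists can then be passed to dataset.select(...) in place of range(2000) and range(500), so that no evaluation headline is seen during training.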

In [None]:
# Step 5: Tokenization
MODEL_NAME = "ProsusAI/finbert"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding=True)

tokenized_train = train_data.map(tokenize_function, batched=True)
tokenized_eval = eval_data.map(tokenize_function, batched=True)

tokenized_train = tokenized_train.rename_column("label", "labels")
tokenized_eval = tokenized_eval.rename_column("label", "labels")

tokenized_train.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])
tokenized_eval.set_format("torch", columns=['input_ids', 'attention_mask', 'labels'])

print("Tokenization complete!")


Tokenization complete!


The main purpose of this code is to convert the raw headlines into the numerical inputs that the model can process.

The cell first downloads the tokenizer that ships with the ProsusAI/finbert checkpoint, a BERT model adapted to financial text. tokenize_function then converts each headline into token IDs and an attention mask, truncating any sequence that exceeds the model's maximum length and padding shorter ones. Because padding=True pads to the longest sequence in each batch passed to map, lengths are uniform within a map batch rather than fixed across the whole dataset. Finally, the code renames the target column to "labels", the name the Trainer expects, and sets the dataset format to PyTorch tensors. Passing batched=True lets map tokenize many headlines per call, which is considerably faster than processing them one at a time.
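
The padding and truncation step can be pictured with a small pure-Python sketch. It is a simplified illustration, not the tokenizer's actual implementation: the function pad_and_truncate, the token IDs, and pad_id=0 are made up for the example, and unlike a real BERT tokenizer it does not keep the closing special token when it truncates.

```python
def pad_and_truncate(batch_ids, max_length, pad_id=0):
    """Pad every sequence to the longest one in the batch, capped at max_length."""
    truncated = [ids[:max_length] for ids in batch_ids]
    target_len = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = target_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two made-up token-ID sequences of different lengths
batch = [[101, 7592, 102], [101, 2023, 2003, 1037, 2742, 102]]
out = pad_and_truncate(batch, max_length=5)
print(out["input_ids"])
print(out["attention_mask"])
```

The attention mask is what lets the model distinguish real tokens from padding once all sequences in a batch share one length.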

In [None]:
# Step 6: Define metrics
def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    acc = accuracy_score(p.label_ids, preds)
    f1 = f1_score(p.label_ids, preds, average="weighted")
    return {"accuracy": acc, "f1": f1}

This short function takes the model's raw predictions (logits), picks the highest-scoring class for each sample, and computes two scores: accuracy and the F1 score. Accuracy is the fraction of samples whose predicted label matches the true label. The F1 score is the harmonic mean of precision and recall; with average="weighted", it is computed separately for each class and then averaged with weights proportional to each class's number of true samples. The Trainer calls this function at every evaluation and logs the resulting scores.

Observations:
Reporting weighted F1 alongside accuracy is useful because accuracy alone can look high on imbalanced data when the model mostly predicts the majority class. The function could be extended to report per-class precision and recall or a confusion matrix for deeper error analysis.
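
To make the weighted average concrete, here is a minimal pure-Python sketch of the calculation. It is meant to illustrate the idea behind average="weighted", not to replace scikit-learn's f1_score; the function name weighted_f1 and the toy labels are made up for the example.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # same as 2PR / (P + R)
        total += n_true * f1
    return total / len(y_true)


# Toy labels: 0 = Negative, 1 = Neutral, 2 = Positive
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.4f} weighted_f1={weighted_f1(y_true, y_pred):.4f}")
```

Here the per-class F1 values are 0.75, 0.80, and about 0.67, so the weighted F1 falls slightly below the accuracy because the smallest class is predicted least reliably.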

In [None]:
# Step 7: Define hyperparameter search space
hyperparameter_space = {
    "num_train_epochs": [5, 6, 7],
    "per_device_train_batch_size": [8, 16, 32],
    "warmup_steps": [300, 700, 1000],
    "weight_decay": [0.01, 0.05, 0.1],
    "learning_rate": [2e-5, 3e-5, 5e-5]
}

print("\n" + "="*80)
print("HYPERPARAMETER SEARCH SPACE")
print("="*80)
for param, values in hyperparameter_space.items():
    print(f"{param}: {values}")

total_combinations = np.prod([len(v) for v in hyperparameter_space.values()])
print(f"\nTotal possible combinations: {total_combinations}")


HYPERPARAMETER SEARCH SPACE
num_train_epochs: [5, 6, 7]
per_device_train_batch_size: [8, 16, 32]
warmup_steps: [300, 700, 1000]
weight_decay: [0.01, 0.05, 0.1]
learning_rate: [2e-05, 3e-05, 5e-05]

Total possible combinations: 243


The code defines three candidate values for each of five training hyperparameters, such as how many passes the model makes over the data (epochs) and the step size of each update (learning rate). Three values across five parameters gives $3^5 = 243$ unique combinations. Keeping the grid this small bounds the cost of the search, but the later cells still run only 5 grid-search trials and 5 random-search trials rather than all 243.
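
One thing worth checking in this search space is how the warmup_steps values compare with the total number of optimizer steps in a run. The sketch below does that arithmetic for the 2,000 training samples used here; it assumes a single device and no gradient accumulation, and the helper total_training_steps is a hypothetical name introduced for the example.

```python
import math


def total_training_steps(n_train, batch_size, epochs):
    """Optimizer steps in a run, assuming one device and no gradient accumulation."""
    return math.ceil(n_train / batch_size) * epochs


n_train = 2000
for batch_size in (8, 16, 32):
    steps = total_training_steps(n_train, batch_size, epochs=5)
    for warmup in (300, 700, 1000):
        share = min(warmup / steps, 1.0)  # fraction of the run spent warming up
        print(f"batch={batch_size:2d} warmup={warmup:4d} "
              f"total_steps={steps:4d} warmup_share={share:.0%}")
```

Under these assumptions a 5-epoch run with batch size 32 has only 315 steps, so a warmup of 700 or 1,000 steps would never finish, and even at batch size 8 a 1,000-step warmup covers most of the 1,250 steps. Smaller warmup values, or a warmup expressed as a fraction of the run, may be a better fit for a dataset of this size.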

In [None]:
# Step 8: Helper function to train and evaluate
def train_and_evaluate(config, experiment_name, trial_num):
    """Train model with given hyperparameters and return results"""

    print(f"\n{'='*80}")
    print(f"Running: {experiment_name} - Trial {trial_num}")
    print(f"{'='*80}")
    print("Configuration:")
    for key, value in config.items():
        print(f"  {key}: {value}")

    # Create fresh model
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME,
        num_labels=3
    ).to(device)

    # Setup training arguments
    training_args = TrainingArguments(
        output_dir=f"./results_{experiment_name}_trial_{trial_num}",
        num_train_epochs=config["num_train_epochs"],
        per_device_train_batch_size=config["per_device_train_batch_size"],
        per_device_eval_batch_size=config["per_device_train_batch_size"],
        warmup_steps=config["warmup_steps"],
        weight_decay=config["weight_decay"],
        learning_rate=config["learning_rate"],
        logging_dir=f"./logs_{experiment_name}_trial_{trial_num}",
        logging_steps=50,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        fp16=torch.cuda.is_available(),
        report_to=[]
    )

    # Create trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        compute_metrics=compute_metrics,
        processing_class=tokenizer,
    )

    # Train and time it
    start_time = time.time()
    trainer.train()
    training_time = time.time() - start_time

    # Evaluate
    eval_results = trainer.evaluate()

    # Prepare results
    result = {
        "experiment_type": experiment_name,
        "trial_number": trial_num,
        "num_train_epochs": config["num_train_epochs"],
        "per_device_train_batch_size": config["per_device_train_batch_size"],
        "warmup_steps": config["warmup_steps"],
        "weight_decay": config["weight_decay"],
        "learning_rate": config["learning_rate"],
        "eval_accuracy": eval_results["eval_accuracy"],
        "eval_f1": eval_results["eval_f1"],
        "eval_loss": eval_results["eval_loss"],
        "training_time_seconds": round(training_time, 2),
        "training_time_minutes": round(training_time / 60, 2),
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    }

    print(f"\nResults:")
    print(f"  Accuracy: {eval_results['eval_accuracy']:.4f}")
    print(f"  F1 Score: {eval_results['eval_f1']:.4f}")
    print(f"  Loss: {eval_results['eval_loss']:.4f}")
    print(f"  Training Time: {training_time/60:.2f} minutes")

    # Clean up
    del model
    del trainer
    torch.cuda.empty_cache()

    return result

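This function encapsulates the complete training pipeline for a single hyperparameter configuration: it loads a fresh copy of the model, builds the TrainingArguments and Trainer, times the training run, evaluates the result, and returns the configuration and metrics as a dictionary. Because save_strategy="epoch" writes a full checkpoint after every epoch into a separate output directory per trial, disk usage grows quickly across trials.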
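
Since each trial writes its checkpoints to its own directory, one way to keep disk usage bounded is to delete that directory once the trial's metrics have been recorded. The following is a minimal sketch under that assumption; remove_trial_outputs is a hypothetical helper, and the demonstration uses a throwaway temporary directory in place of a real ./results_<experiment>_trial_<n> folder.

```python
import os
import shutil
import tempfile


def remove_trial_outputs(*dirs):
    """Delete per-trial output directories; return the ones that existed."""
    removed = []
    for path in dirs:
        if os.path.isdir(path):
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed


# Demonstration with a fake checkpoint folder inside a temporary directory
base = tempfile.mkdtemp()
fake_output = os.path.join(base, "results_GridSearch_trial_1")
os.makedirs(os.path.join(fake_output, "checkpoint-250"))
print(remove_trial_outputs(fake_output, os.path.join(base, "missing_dir")))
print(os.path.exists(fake_output))
```

In the notebook, such a call could follow torch.cuda.empty_cache() with the trial's output_dir and logging_dir as arguments. Setting save_total_limit in TrainingArguments is another option for limiting how many checkpoints are kept per run.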

In [None]:
# Step 9: GRID SEARCH Implementation
print("\n" + "="*80)
print("STARTING GRID SEARCH")
print("="*80)

grid_results = []
param_grid = list(ParameterGrid(hyperparameter_space))

print(f"Grid Search will test {len(param_grid)} combinations")
print("Note: This is comprehensive but can be time-consuming!")

# Limit grid search for demonstration (you can remove this limit)
MAX_GRID_TRIALS = 5  # Change this to len(param_grid) for full grid search
grid_start_time = time.time()

for i, params in enumerate(param_grid[:MAX_GRID_TRIALS], 1):
    try:
        result = train_and_evaluate(params, "GridSearch", i)
        grid_results.append(result)
    except Exception as e:
        print(f"Error in Grid Search trial {i}: {str(e)}")
        continue

grid_total_time = time.time() - grid_start_time

print("\n" + "="*80)
print(f"GRID SEARCH COMPLETED - Total Time: {grid_total_time/60:.2f} minutes")
print("="*80)


STARTING GRID SEARCH
Grid Search will test 243 combinations
Note: This is comprehensive but can be time-consuming!

Running: GridSearch - Trial 1
Configuration:
  learning_rate: 2e-05
  num_train_epochs: 5
  per_device_train_batch_size: 8
  warmup_steps: 300
  weight_decay: 0.01


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.8299,0.587154,0.784,0.748421
2,0.4668,0.233948,0.918,0.919078
3,0.3075,0.064302,0.986,0.986029
4,0.1202,0.03173,0.994,0.993992
5,0.0578,0.015363,0.996,0.995994



Results:
  Accuracy: 0.9960
  F1 Score: 0.9960
  Loss: 0.0154
  Training Time: 5.39 minutes

Running: GridSearch - Trial 2
Configuration:
  learning_rate: 2e-05
  num_train_epochs: 5
  per_device_train_batch_size: 8
  warmup_steps: 300
  weight_decay: 0.05


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.8324,0.579627,0.794,0.770633
2,0.4874,0.202581,0.944,0.943959
3,0.3205,0.054895,0.988,0.987986
4,0.1014,0.022307,0.992,0.991976
5,0.0425,0.006482,0.998,0.997992



Results:
  Accuracy: 0.9980
  F1 Score: 0.9980
  Loss: 0.0065
  Training Time: 4.14 minutes

Running: GridSearch - Trial 3
Configuration:
  learning_rate: 2e-05
  num_train_epochs: 5
  per_device_train_batch_size: 8
  warmup_steps: 300
  weight_decay: 0.1


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.828,0.584004,0.778,0.738119
2,0.4678,0.237469,0.92,0.920932
3,0.3088,0.059823,0.984,0.98414
4,0.1213,0.035179,0.994,0.993992
5,0.0703,0.019864,0.996,0.995993



Results:
  Accuracy: 0.9960
  F1 Score: 0.9960
  Loss: 0.0199
  Training Time: 3.16 minutes

Running: GridSearch - Trial 4
Configuration:
  learning_rate: 2e-05
  num_train_epochs: 5
  per_device_train_batch_size: 8
  warmup_steps: 700
  weight_decay: 0.01


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.9161,0.757379,0.696,0.635839
2,0.6668,0.417082,0.87,0.865155
3,0.5014,0.163585,0.948,0.946568
4,0.1811,0.046578,0.99,0.989974
5,0.0647,0.028579,0.994,0.994002



Results:
  Accuracy: 0.9940
  F1 Score: 0.9940
  Loss: 0.0286
  Training Time: 2.57 minutes

Running: GridSearch - Trial 5
Configuration:
  learning_rate: 2e-05
  num_train_epochs: 5
  per_device_train_batch_size: 8
  warmup_steps: 700
  weight_decay: 0.05


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.9151,0.75883,0.684,0.620034
2,0.6675,0.402383,0.868,0.863245
3,0.4912,0.155484,0.956,0.955119
4,0.2013,0.063531,0.988,0.987991
5,0.068,0.031979,0.994,0.994002



Results:
  Accuracy: 0.9940
  F1 Score: 0.9940
  Loss: 0.0320
  Training Time: 2.64 minutes

GRID SEARCH COMPLETED - Total Time: 18.09 minutes


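This cell runs grid search, which in full would evaluate every combination in the search space. Here MAX_GRID_TRIALS caps the run at the first 5 of the 243 combinations. ParameterGrid iterates with the alphabetically last parameter changing fastest, so these five trials all share the same learning rate, epoch count, and batch size and differ only in warmup_steps and weight_decay.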
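
The effect of capping an ordered grid can be reproduced without training anything. The sketch below rebuilds the grid with itertools.product over alphabetically sorted keys, which is an assumption about the iteration order that matches the trial configurations logged above, and counts how many distinct values each parameter takes in the first five combinations. The function name grid_in_sorted_key_order is hypothetical.

```python
import itertools

space = {
    "num_train_epochs": [5, 6, 7],
    "per_device_train_batch_size": [8, 16, 32],
    "warmup_steps": [300, 700, 1000],
    "weight_decay": [0.01, 0.05, 0.1],
    "learning_rate": [2e-5, 3e-5, 5e-5],
}


def grid_in_sorted_key_order(space):
    """Yield every combination, with the last sorted key varying fastest."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


grid = list(grid_in_sorted_key_order(space))
first_five = grid[:5]
distinct = {k: len({cfg[k] for cfg in first_five}) for k in sorted(space)}
print(len(grid))
print(distinct)
```

Only warmup_steps and weight_decay take more than one value in the first five combinations, so a capped run says nothing about learning rate, epochs, or batch size. Shuffling the list before slicing is one way to spread a small budget across the whole grid.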

In [None]:
# Step 10: RANDOM SEARCH Implementation
print("\n" + "="*80)
print("STARTING RANDOM SEARCH")
print("="*80)

random_results = []
NUM_RANDOM_TRIALS = 5  # Number of random combinations to try

print(f"Random Search will test {NUM_RANDOM_TRIALS} random combinations")
print("Note: This samples the search space more efficiently!")

random_start_time = time.time()

for i in range(1, NUM_RANDOM_TRIALS + 1):
    # Randomly sample hyperparameters
    random_config = {
        "num_train_epochs": random.choice(hyperparameter_space["num_train_epochs"]),
        "per_device_train_batch_size": random.choice(hyperparameter_space["per_device_train_batch_size"]),
        "warmup_steps": random.choice(hyperparameter_space["warmup_steps"]),
        "weight_decay": random.choice(hyperparameter_space["weight_decay"]),
        "learning_rate": random.choice(hyperparameter_space["learning_rate"])
    }

    try:
        result = train_and_evaluate(random_config, "RandomSearch", i)
        random_results.append(result)
    except Exception as e:
        print(f"Error in Random Search trial {i}: {str(e)}")
        continue

random_total_time = time.time() - random_start_time

print("\n" + "="*80)
print(f"RANDOM SEARCH COMPLETED - Total Time: {random_total_time/60:.2f} minutes")
print("="*80)


STARTING RANDOM SEARCH
Random Search will test 5 random combinations
Note: This samples the search space more efficiently!

Running: RandomSearch - Trial 1
Configuration:
  num_train_epochs: 7
  per_device_train_batch_size: 8
  warmup_steps: 300
  weight_decay: 0.1
  learning_rate: 3e-05


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.7493,0.49693,0.814,0.792317
2,0.4356,0.149118,0.956,0.956097
3,0.2614,0.052388,0.988,0.988025
4,0.0989,0.017444,0.996,0.996002
5,0.0163,0.000755,1.0,1.0
6,0.0225,0.000514,1.0,1.0
7,0.0167,0.000441,1.0,1.0



Results:
  Accuracy: 1.0000
  F1 Score: 1.0000
  Loss: 0.0004
  Training Time: 4.09 minutes

Running: RandomSearch - Trial 2
Configuration:
  num_train_epochs: 7
  per_device_train_batch_size: 8
  warmup_steps: 300
  weight_decay: 0.1
  learning_rate: 3e-05


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.7493,0.49693,0.814,0.792317
2,0.4356,0.149118,0.956,0.956097
3,0.2614,0.052388,0.988,0.988025
4,0.0989,0.017444,0.996,0.996002
5,0.0163,0.000755,1.0,1.0
6,0.0225,0.000514,1.0,1.0
7,0.0167,0.000441,1.0,1.0



Results:
  Accuracy: 1.0000
  F1 Score: 1.0000
  Loss: 0.0004
  Training Time: 3.92 minutes

Running: RandomSearch - Trial 3
Configuration:
  num_train_epochs: 7
  per_device_train_batch_size: 8
  warmup_steps: 300
  weight_decay: 0.1
  learning_rate: 3e-05


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.7493,0.49693,0.814,0.792317
2,0.4356,0.149118,0.956,0.956097
3,0.2614,0.052388,0.988,0.988025
4,0.0989,0.017444,0.996,0.996002
5,0.0163,0.000755,1.0,1.0
6,0.0225,0.000514,1.0,1.0
7,0.0167,0.000441,1.0,1.0



Results:
  Accuracy: 1.0000
  F1 Score: 1.0000
  Loss: 0.0004
  Training Time: 3.50 minutes

Running: RandomSearch - Trial 4
Configuration:
  num_train_epochs: 7
  per_device_train_batch_size: 8
  warmup_steps: 300
  weight_decay: 0.1
  learning_rate: 3e-05


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.7493,0.49693,0.814,0.792317
2,0.4356,0.149118,0.956,0.956097
3,0.2614,0.052388,0.988,0.988025
4,0.0989,0.017444,0.996,0.996002
5,0.0163,0.000755,1.0,1.0
6,0.0225,0.000514,1.0,1.0
7,0.0167,0.000441,1.0,1.0



Results:
  Accuracy: 1.0000
  F1 Score: 1.0000
  Loss: 0.0004
  Training Time: 4.29 minutes

Running: RandomSearch - Trial 5
Configuration:
  num_train_epochs: 7
  per_device_train_batch_size: 8
  warmup_steps: 300
  weight_decay: 0.1
  learning_rate: 3e-05


Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.7493,0.49693,0.814,0.792317
2,0.4356,0.149118,0.956,0.956097
3,0.2614,0.052388,0.988,0.988025
4,0.0989,0.017444,0.996,0.996002
5,0.0163,0.000755,1.0,1.0
6,0.0225,0.000514,1.0,1.0
7,0.0167,0.000441,1.0,1.0


Error in Random Search trial 5: Error while serializing: I/O error: No space left on device (os error 28)

RANDOM SEARCH COMPLETED - Total Time: 19.42 minutes


In [None]:
# Step 11: Compare and Analyze Results
print("\n" + "="*80)
print("COMPARATIVE ANALYSIS")
print("="*80)

# Combine all results
all_results = grid_results + random_results
results_df = pd.DataFrame(all_results)

# Analysis by method
if grid_results:
    grid_df = pd.DataFrame(grid_results)
    best_grid = grid_df.loc[grid_df['eval_accuracy'].idxmax()]
    avg_grid_time = grid_df['training_time_minutes'].mean()

    print("\nGRID SEARCH SUMMARY:")
    print(f"  Trials Completed: {len(grid_results)}")
    print(f"  Best Accuracy: {best_grid['eval_accuracy']:.4f}")
    print(f"  Best F1 Score: {best_grid['eval_f1']:.4f}")
    print(f"  Average Time per Trial: {avg_grid_time:.2f} minutes")
    print(f"  Total Time: {grid_total_time/60:.2f} minutes")

if random_results:
    random_df = pd.DataFrame(random_results)
    best_random = random_df.loc[random_df['eval_accuracy'].idxmax()]
    avg_random_time = random_df['training_time_minutes'].mean()

    print("\nRANDOM SEARCH SUMMARY:")
    print(f"  Trials Completed: {len(random_results)}")
    print(f"  Best Accuracy: {best_random['eval_accuracy']:.4f}")
    print(f"  Best F1 Score: {best_random['eval_f1']:.4f}")
    print(f"  Average Time per Trial: {avg_random_time:.2f} minutes")
    print(f"  Total Time: {random_total_time/60:.2f} minutes")

# Overall best
if all_results:
    best_overall = results_df.loc[results_df['eval_accuracy'].idxmax()]
    print("\nOVERALL BEST CONFIGURATION:")
    print(f"  Method: {best_overall['experiment_type']}")
    print(f"  Accuracy: {best_overall['eval_accuracy']:.4f}")
    print(f"  F1 Score: {best_overall['eval_f1']:.4f}")
    print(f"  Configuration:")
    print(f"    - Epochs: {best_overall['num_train_epochs']}")
    print(f"    - Batch Size: {best_overall['per_device_train_batch_size']}")
    print(f"    - Learning Rate: {best_overall['learning_rate']}")
    print(f"    - Warmup Steps: {best_overall['warmup_steps']}")
    print(f"    - Weight Decay: {best_overall['weight_decay']}")


COMPARATIVE ANALYSIS

GRID SEARCH SUMMARY:
  Trials Completed: 5
  Best Accuracy: 0.9980
  Best F1 Score: 0.9980
  Average Time per Trial: 3.58 minutes
  Total Time: 18.09 minutes

RANDOM SEARCH SUMMARY:
  Trials Completed: 4
  Best Accuracy: 1.0000
  Best F1 Score: 1.0000
  Average Time per Trial: 3.95 minutes
  Total Time: 19.42 minutes

OVERALL BEST CONFIGURATION:
  Method: RandomSearch
  Accuracy: 1.0000
  F1 Score: 1.0000
  Configuration:
    - Epochs: 7
    - Batch Size: 8
    - Learning Rate: 3e-05
    - Warmup Steps: 300
    - Weight Decay: 0.1


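The random search cell (Step 10) is meant to sample hyperparameter combinations at random so that a few trials can cover more of the search space. In this run, however, all five trials drew the same configuration and produced identical metrics. A likely cause is that Trainer re-seeds Python's global random module with the default seed each time it is created, so every random.choice sequence starts from the same state. The fifth trial also failed with a "No space left on device" error, which is why only four random trials are recorded. The comparison cell (Step 11) then summarizes each method's best accuracy, best F1, and timing, and reports the overall best configuration, but because the random search effectively tested one configuration and the evaluation rows overlap the training rows, the ranking should be read with caution.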
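
Every random search trial in the output above uses the same configuration. One way to avoid that, sketched here under the assumption that the repetition comes from the global random state being reset between trials, is to draw all configurations up front from a private random.Random instance. The function sample_configs and its seed are hypothetical names for illustration.

```python
import itertools
import random

space = {
    "num_train_epochs": [5, 6, 7],
    "per_device_train_batch_size": [8, 16, 32],
    "warmup_steps": [300, 700, 1000],
    "weight_decay": [0.01, 0.05, 0.1],
    "learning_rate": [2e-5, 3e-5, 5e-5],
}


def sample_configs(space, n_trials, seed=0):
    """Draw n_trials distinct configurations using a private RNG."""
    rng = random.Random(seed)  # unaffected by random.seed(...) calls elsewhere
    keys = list(space)
    all_combos = [
        dict(zip(keys, values))
        for values in itertools.product(*space.values())
    ]
    # sample() draws without replacement, so no configuration repeats
    return rng.sample(all_combos, n_trials)


configs = sample_configs(space, n_trials=5)
for i, cfg in enumerate(configs, 1):
    print(i, cfg)
```

Because the list is built before any training starts and rng.sample draws without replacement, the five trials are guaranteed to differ regardless of what happens to the global seed during training.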

In [None]:
# Clean up checkpoint directories first
import shutil
import os
from google.colab import files as colab_files  # Use a different name to avoid conflicts

print("Cleaning up checkpoint directories to free space...")
for root, dirs, file_list in os.walk('.'):  # Renamed 'files' to 'file_list'
    for dir_name in dirs:
        if dir_name.startswith('results_') or dir_name.startswith('logs_'):
            dir_path = os.path.join(root, dir_name)
            try:
                shutil.rmtree(dir_path)
                print(f"Removed: {dir_path}")
            except Exception as e:
                print(f"Error removing {dir_path}: {e}")

# Step 12: Export to Excel
excel_filename = 'ExcerciseF3_LogSheet-CatayongBejay.xlsx'

try:
    with pd.ExcelWriter(excel_filename, engine='openpyxl') as writer:
        # All results
        results_df_sorted = results_df.sort_values('eval_accuracy', ascending=False).reset_index(drop=True)
        results_df_sorted.insert(0, 'Rank', range(1, len(results_df_sorted) + 1))
        results_df_sorted.to_excel(writer, sheet_name='All Results', index=False)

        # Grid Search results
        if grid_results:
            grid_df_sorted = grid_df.sort_values('eval_accuracy', ascending=False).reset_index(drop=True)
            grid_df_sorted.insert(0, 'Rank', range(1, len(grid_df_sorted) + 1))
            grid_df_sorted.to_excel(writer, sheet_name='Grid Search', index=False)

        # Random Search results
        if random_results:
            random_df_sorted = random_df.sort_values('eval_accuracy', ascending=False).reset_index(drop=True)
            random_df_sorted.insert(0, 'Rank', range(1, len(random_df_sorted) + 1))
            random_df_sorted.to_excel(writer, sheet_name='Random Search', index=False)

        # Comparison summary
        comparison_data = []
        if grid_results:
            comparison_data.append({
                'Method': 'Grid Search',
                'Trials': len(grid_results),
                'Best Accuracy': best_grid['eval_accuracy'],
                'Best F1': best_grid['eval_f1'],
                'Avg Time per Trial (min)': avg_grid_time,
                'Total Time (min)': grid_total_time/60,
                'Efficiency Score': best_grid['eval_accuracy'] / (grid_total_time/60)
            })

        if random_results:
            comparison_data.append({
                'Method': 'Random Search',
                'Trials': len(random_results),
                'Best Accuracy': best_random['eval_accuracy'],
                'Best F1': best_random['eval_f1'],
                'Avg Time per Trial (min)': avg_random_time,
                'Total Time (min)': random_total_time/60,
                'Efficiency Score': best_random['eval_accuracy'] / (random_total_time/60)
            })

        comparison_df = pd.DataFrame(comparison_data)
        comparison_df.to_excel(writer, sheet_name='Method Comparison', index=False)

        # Best configurations
        top_10 = results_df_sorted.head(10)
        top_10.to_excel(writer, sheet_name='Top 10 Configurations', index=False)

    print("\nExcel file created successfully!")
    print("Downloading file...")
    colab_files.download(excel_filename)  # Use colab_files instead of files

    print("\n" + "="*80)
    print("ANALYSIS COMPLETE!")
    print("="*80)

except Exception as e:
    print(f"Error creating Excel file: {e}")
    print("\nSaving results as CSV instead...")
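    # Fallback to CSV; sort results_df directly because results_df_sorted
    # may not exist if the error occurred before it was created
    results_df.sort_values('eval_accuracy', ascending=False).to_csv('results_all.csv', index=False)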
    colab_files.download('results_all.csv')  # Use colab_files instead of files

Cleaning up checkpoint directories to free space...

Excel file created successfully!
Downloading file...



ANALYSIS COMPLETE!


This code cell was generated by Claude. It first deletes the checkpoint and log directories created during the runs above to free up Google Colab storage, then exports the experiment log to an Excel workbook and downloads it.