## Next Character Prediction - Hyperparameter Optimization with Optuna

To find the hyperparameter configuration that maximizes next-character prediction accuracy, Bayesian optimization was carried out with Optuna.

Instead of grid search or other search methods, the study uses Optuna's TPE (Tree-structured Parzen Estimator) sampler, which is also Optuna's default sampler.



In [1]:
"""
RNN Next Character Prediction - Hyperparameter Optimization with Optuna
========================================================================
This script optimizes a Vanilla RNN for next character prediction using 
Bayesian optimization (Optuna) and then runs 20 independent experiments.

Fixed Constraints:
- Train/Val/Test split: 30%/10%/60%
- Sequence length: 20
- Vanilla RNN architecture
- One-hot encoding (via embedding)

Optimized Hyperparameters:
- Hidden units, epochs, learning rate, optimizer, activation, dropout, 
  clipping, weight init, patience, batch size
"""

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import optuna
from optuna.trial import TrialState
import copy
import json
from pathlib import Path
import gc
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}\n")

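# Clear the GPU cache
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    gc.collect()
    print("GPU cache limpiado")
    # memory_summary() needs CUDA, so it stays inside the guard. It returns a
    # string; wrap the call in print() to display the summary.
    torch.cuda.memory_summary()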


# ============================================================================
# 1. DATA PREPARATION (FIXED CONSTRAINTS)
# ============================================================================

# Load text from file
file_path = "./timemachine.txt"
print(f"Loading text from {file_path}...")
with open(file_path, 'r', encoding='utf-8') as f:
    raw_text = f.read()
full_text = raw_text.lower()

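# Text preprocessing: lowercase ASCII, no punctuation; 26 letters + space + unk
allowed_chars_set = set('abcdefghijklmnopqrstuvwxyz ')
unk_token = '<unk>'

# NOTE: the text is modelled character by character, so every disallowed
# character is replaced by the five characters '<', 'u', 'n', 'k', '>' rather
# than by a single symbol. This is why '<' and '>' appear in the vocabulary
# (29 entries) and why the cleaned text is longer than the raw text.
cleaned_full_text = [
    char if char in allowed_chars_set else unk_token
    for char in full_text
]
full_text = "".join(cleaned_full_text)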

# FIXED: Split text into train, val, test (30%, 10%, 60%)
total_length = len(full_text)
train_split_idx = int(0.3 * total_length)
val_split_idx = int(0.1 * total_length) + train_split_idx

train_text = full_text[:train_split_idx]
val_text = full_text[train_split_idx:val_split_idx]
test_text = full_text[val_split_idx:]

print(f"Total text length: {total_length}")
print(f"Train text length: {len(train_text)}")
print(f"Validation text length: {len(val_text)}")
print(f"Test text length: {len(test_text)}")

# Character mapping
chars = sorted(list(set(full_text)))
vocab_size = len(chars)
char2idx = {ch: i for i, ch in enumerate(chars)}
idx2char = {i: ch for i, ch in enumerate(chars)}

print(f"Vocabulary size: {vocab_size}")
print(f"Characters: {chars}\n")

# FIXED: Hyperparameters that cannot be modified
seq_length = 20
future_steps = 1

# Dataset preparation
def create_sequences(text, seq_length, future_steps):
    X, Y = [], []
    for i in range(len(text) - seq_length - future_steps):
        seq = text[i:i+seq_length]
        target = text[i+seq_length:i+seq_length+future_steps]
        X.append([char2idx[ch] for ch in seq])
        Y.append([char2idx[ch] for ch in target])
    return np.array(X), np.array(Y)

print("Creating sequences...")
X_train_seq, Y_train_seq = create_sequences(train_text, seq_length, future_steps)
X_val_seq, Y_val_seq = create_sequences(val_text, seq_length, future_steps)
X_test_seq, Y_test_seq = create_sequences(test_text, seq_length, future_steps)

X_train = torch.tensor(X_train_seq, dtype=torch.long).to(device)
Y_train = torch.tensor(Y_train_seq, dtype=torch.long).to(device)
X_val = torch.tensor(X_val_seq, dtype=torch.long).to(device)
Y_val = torch.tensor(Y_val_seq, dtype=torch.long).to(device)
X_test_final = torch.tensor(X_test_seq, dtype=torch.long).to(device)
Y_test_final = torch.tensor(Y_test_seq, dtype=torch.long).to(device)

print(f"Training sequences: {X_train.shape}")
print(f"Validation sequences: {X_val.shape}")
print(f"Test sequences: {X_test_final.shape}\n")

# ============================================================================
# 2. VANILLA RNN MODEL (WITH OPTIMIZABLE COMPONENTS)
# ============================================================================

class VanillaRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size, future_steps, activation='relu', 
                 dropout_rate=0.0, weight_init='xavier'):
        super(VanillaRNN, self).__init__()
        self.hidden_size = hidden_size
        self.future_steps = future_steps
        self.activation = activation
        self.weight_init = weight_init
        
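        # FIXED: vocab_size x vocab_size embedding standing in for one-hot encoding.
        # Note: nn.Embedding weights are randomly initialized and trainable, so this
        # is a learned lookup table rather than a strict one-hot representation.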
        self.embedding = nn.Embedding(vocab_size, vocab_size)
        
        # RNN with configurable activation
        self.rnn = nn.RNN(vocab_size, hidden_size, batch_first=True, 
                         nonlinearity=activation)
        
        # Dropout layer
        self.dropout = nn.Dropout(dropout_rate)
        
        # Output layer
        self.fc = nn.Linear(hidden_size, future_steps * vocab_size)
        
        # Initialize weights
        self.init_weights()

    def init_weights(self):
        """Initialize weights based on selected method"""
        for name, param in self.rnn.named_parameters():
            if 'weight' in name:
                if self.weight_init == 'xavier':
                    nn.init.xavier_uniform_(param)
                elif self.weight_init == 'kaiming':
                    nn.init.kaiming_uniform_(param, nonlinearity=self.activation)
                elif self.weight_init == 'orthogonal':
                    nn.init.orthogonal_(param)
                elif self.weight_init == 'normal':
                    nn.init.normal_(param, mean=0, std=0.01)
            elif 'bias' in name:
                nn.init.zeros_(param)
        
        # Initialize output layer
        if self.weight_init == 'xavier':
            nn.init.xavier_uniform_(self.fc.weight)
        elif self.weight_init == 'kaiming':
            nn.init.kaiming_uniform_(self.fc.weight, nonlinearity='linear')
        elif self.weight_init == 'orthogonal':
            nn.init.orthogonal_(self.fc.weight)
        elif self.weight_init == 'normal':
            nn.init.normal_(self.fc.weight, mean=0, std=0.01)
        nn.init.zeros_(self.fc.bias)

    def forward(self, x):
        x = self.embedding(x)
        out, _ = self.rnn(x)
        out = out[:, -1, :]  # last time step
        out = self.dropout(out)
        out = self.fc(out)
        # Use the model's own vocabulary size instead of the global `vocab_size`
        out = out.view(-1, self.future_steps, self.embedding.num_embeddings)
        return out

# ============================================================================
# 3. TRAINING FUNCTION
# ============================================================================

def train_model(hidden_size, num_epochs, learning_rate, optimizer_name, 
                activation, dropout_rate, clip_value, weight_init, patience,
                batch_size, verbose=False):
    """
    Train the RNN model with given hyperparameters.
    Returns the best validation accuracy, the test accuracy, and the per-epoch
    training losses, validation losses, and validation accuracies.
    """
    # Create model
    model = VanillaRNN(vocab_size, hidden_size, future_steps, 
                       activation=activation, dropout_rate=dropout_rate,
                       weight_init=weight_init).to(device)
    
    criterion = nn.CrossEntropyLoss()
    
    # Select optimizer
    if optimizer_name == 'adam':
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    elif optimizer_name == 'adamw':
        optimizer = optim.AdamW(model.parameters(), lr=learning_rate)
    elif optimizer_name == 'sgd':
        optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
    elif optimizer_name == 'rmsprop':
        optimizer = optim.RMSprop(model.parameters(), lr=learning_rate)
    else:
        # Fail early instead of hitting an undefined `optimizer` later
        raise ValueError(f"Unknown optimizer: {optimizer_name}")
    
    # Training variables
    best_val_loss = float('inf')
    best_val_acc = 0.0
    patience_counter = 0
    best_model_state = None
    train_losses = []
    val_losses = []
    val_accuracies = []
    
    # Create data loader for batching
    train_dataset = torch.utils.data.TensorDataset(X_train, Y_train)
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    
    for epoch in range(num_epochs):
        # Training
        model.train()
        epoch_train_loss = 0.0
        
        for batch_X, batch_Y in train_loader:
            optimizer.zero_grad()
            output = model(batch_X)
            loss = sum(criterion(output[:, t, :], batch_Y[:, t]) for t in range(future_steps))
            loss.backward()
            
            # Gradient clipping
            if clip_value > 0:
                torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
            
            optimizer.step()
            epoch_train_loss += loss.item()
        
        avg_train_loss = epoch_train_loss / len(train_loader)
        train_losses.append(avg_train_loss)
        
        # Validation
        model.eval()
        with torch.no_grad():
            val_output = model(X_val)
            val_loss = sum(criterion(val_output[:, t, :], Y_val[:, t]) for t in range(future_steps))
            
            # Calculate validation accuracy
            preds = val_output.argmax(dim=2)
            val_acc = (preds == Y_val).float().mean().item()
        
        val_losses.append(val_loss.item())
        val_accuracies.append(val_acc)
        
        if verbose and (epoch + 1) % 10 == 0:
            print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {avg_train_loss:.4f}, "
                  f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc*100:.2f}%")
        
        # Early stopping based on validation loss (stored as a Python float)
        if val_loss.item() < best_val_loss:
            best_val_loss = val_loss.item()
            best_val_acc = val_acc
            patience_counter = 0
            best_model_state = copy.deepcopy(model.state_dict())
        else:
            patience_counter += 1
            if patience_counter >= patience:
                if verbose:
                    print(f"Early stopping at epoch {epoch+1}")
                break
    
    # Load best model and evaluate on test set
    if best_model_state is not None:
        model.load_state_dict(best_model_state)
    
    # === TEST EVALUATION WITH A DATALOADER ===
    test_dataset = torch.utils.data.TensorDataset(X_test_final, Y_test_final)
    test_loader = torch.utils.data.DataLoader(
        test_dataset,
        batch_size=128,      # use 64 to be even safer on memory
        shuffle=False
    )

    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_X, batch_Y in test_loader:
            output = model(batch_X)
            preds = output.argmax(dim=2)
            correct += (preds == batch_Y).float().sum().item()
            total += batch_Y.numel()

    test_acc = correct / total
    
    return best_val_acc, test_acc, train_losses, val_losses, val_accuracies

# ============================================================================
# 4. OPTUNA HYPERPARAMETER OPTIMIZATION
# ============================================================================

def objective(trial):
    """
    Optuna objective function to maximize validation accuracy.
    """
    # Suggest hyperparameters
    hidden_size = trial.suggest_categorical('hidden_size', [256, 288])
    learning_rate = trial.suggest_float('learning_rate', 0.0018, 0.0030, log=True)
    optimizer_name = trial.suggest_categorical('optimizer', ['adam', 'adamw', 'rmsprop'])
    activation = trial.suggest_categorical('activation', ['tanh', 'relu'])
    dropout_rate = trial.suggest_float('dropout_rate', 0.08, 0.4)
    clip_value = trial.suggest_float('clip_value', 0.8, 6)
    weight_init = trial.suggest_categorical('weight_init', ['xavier', 'kaiming', 'orthogonal', 'normal'])
    patience = trial.suggest_int('patience', 5, 20)
    batch_size = trial.suggest_categorical('batch_size', [64, 128])
    num_epochs = trial.suggest_int('num_epochs', 80, 160)
    
    # Train model
    val_acc, test_acc, _, _, _ = train_model(
        hidden_size=hidden_size,
        num_epochs=num_epochs,
        learning_rate=learning_rate,
        optimizer_name=optimizer_name,
        activation=activation,
        dropout_rate=dropout_rate,
        clip_value=clip_value,
        weight_init=weight_init,
        patience=patience,
        batch_size=batch_size,
        verbose=False
    )
    
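    # Report the final validation accuracy to the study's pruner.
    # NOTE: training has already finished at this point, so pruning cannot save
    # compute, and with MedianPruner(n_warmup_steps=10) a single report at
    # step 0 is never pruned. Per-epoch reports from inside the training loop
    # would be needed for pruning to take effect.
    trial.report(val_acc, step=0)
    
    # Check if trial should be pruned
    if trial.should_prune():
        raise optuna.TrialPruned()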
    
    return val_acc  # Maximize validation accuracy

# Run Optuna optimization
print("="*80)
print("STARTING OPTUNA HYPERPARAMETER OPTIMIZATION")
print("="*80)
print("This may take a while depending on n_trials...\n")

study = optuna.create_study(
    direction='maximize',
    sampler=optuna.samplers.TPESampler(seed=42),
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=10)
)


# Run optimization (adjust n_trials based on computational resources)
n_trials = 100
print(f"Running {n_trials} optimization trials...")
study.optimize(objective, n_trials=n_trials, timeout=None, show_progress_bar=True)

print("\n" + "="*80)
print("OPTIMIZATION COMPLETE")
print("="*80)
print(f"Number of finished trials: {len(study.trials)}")
print(f"Best trial: {study.best_trial.number}")
print(f"Best validation accuracy: {study.best_value*100:.4f}%")
print("\nBest hyperparameters:")
for key, value in study.best_params.items():
    print(f"  {key}: {value}")




Using device: cuda

GPU cache limpiado
Loading text from ./timemachine.txt...
Total text length: 214451
Train text length: 64335
Validation text length: 21445
Test text length: 128671
Vocabulary size: 29
Characters: [' ', '<', '>', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

Creating sequences...


[I 2025-12-07 20:12:22,047] A new study created in memory with name: no-name-9a13da96-183a-45a5-97cf-675765ce6128


Training sequences: torch.Size([64314, 20])
Validation sequences: torch.Size([21424, 20])
Test sequences: torch.Size([128650, 20])

STARTING OPTUNA HYPERPARAMETER OPTIMIZATION
This may take a while depending on n_trials...

Running 100 optimization trials...


  0%|          | 0/100 [00:00<?, ?it/s]

[I 2025-12-07 20:13:16,307] Trial 0 finished with value: 0.5719753503799438 and parameters: {'hidden_size': 288, 'learning_rate': 0.00261616087122516, 'optimizer': 'adam', 'activation': 'relu', 'dropout_rate': 0.27235680375782684, 'clip_value': 4.481977404539436, 'weight_init': 'kaiming', 'patience': 7, 'batch_size': 128, 'num_epochs': 122}. Best is trial 0 with value: 0.5719753503799438.
[I 2025-12-07 20:13:36,743] Trial 1 finished with value: 0.5656273365020752 and parameters: {'hidden_size': 256, 'learning_rate': 0.0024604316486968986, 'optimizer': 'rmsprop', 'activation': 'relu', 'dropout_rate': 0.1438956102906751, 'clip_value': 3.4740190797507804, 'weight_init': 'orthogonal', 'patience': 6, 'batch_size': 128, 'num_epochs': 145}. Best is trial 0 with value: 0.5719753503799438.
[I 2025-12-07 20:14:06,654] Trial 2 finished with value: 0.5603995323181152 and parameters: {'hidden_size': 256, 'learning_rate': 0.0025531054135347884, 'optimizer': 'rmsprop', 'activation': 'relu', 'dropout_

## Final Training
Once Optuna returns the best hyperparameter configuration for this problem, the model is retrained with a learning-rate scheduler added. The scheduler progressively lowers the learning rate over the epochs, which shrinks the parameter updates in the late stages of training and favors more stable convergence and, in many cases, a slight gain in accuracy. In this case, accuracy improves slightly.
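
To make the decay concrete, the sketch below evaluates the closed-form cosine annealing expression given in the PyTorch documentation for `CosineAnnealingLR`. The helper name `cosine_annealing_lr` and the values of `base_lr` and `t_max` are illustrative assumptions; only the `eta_min = 0.01 * base_lr` ratio mirrors the training function in this notebook.

```python
import math


def cosine_annealing_lr(epoch, base_lr, t_max, eta_min):
    """Learning rate after `epoch` scheduler steps under cosine annealing."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


# Illustrative values (assumptions, not the tuned hyperparameters).
base_lr = 2e-3
t_max = 100
eta_min = 0.01 * base_lr  # same ratio as in train_model_scheduler

schedule = [
    cosine_annealing_lr(epoch, base_lr, t_max, eta_min)
    for epoch in range(t_max + 1)
]

# The rate starts at base_lr, falls slowly at first, fastest around the
# midpoint, and flattens out again as it approaches eta_min.
for epoch in (0, 25, 50, 75, 100):
    print(f"epoch {epoch:3d}: lr = {schedule[epoch]:.6f}")
```

Because early stopping can end training before `t_max` epochs, the learning rate may never reach `eta_min` in practice; the schedule above only describes the rate as a function of the epochs that are actually run.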

In [2]:
def train_model_scheduler(hidden_size, num_epochs, learning_rate, optimizer_name, 
                activation, dropout_rate, clip_value, weight_init, patience,
                batch_size, use_scheduler=True, verbose=False):
    """
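    Train the RNN model with given hyperparameters and an optional
    cosine-annealing learning-rate scheduler.
    Returns the best validation accuracy, the test accuracy, and the per-epoch
    training losses, validation losses, and validation accuracies.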
    """
    # Create model
    model = VanillaRNN(vocab_size, hidden_size, future_steps, 
                       activation=activation, dropout_rate=dropout_rate,
                       weight_init=weight_init).to(device)
    
    criterion = nn.CrossEntropyLoss()
    
    # Select optimizer
    if optimizer_name == 'adam':
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    elif optimizer_name == 'adamw':
        optimizer = optim.AdamW(model.parameters(), lr=learning_rate)
    elif optimizer_name == 'sgd':
        optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
    elif optimizer_name == 'rmsprop':
        optimizer = optim.RMSprop(model.parameters(), lr=learning_rate)
    else:
        # Fail early instead of hitting an undefined `optimizer` later
        raise ValueError(f"Unknown optimizer: {optimizer_name}")
    
    # ===== ADDED: SCHEDULER =====
    scheduler = None
    if use_scheduler:
        scheduler = optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=num_epochs, eta_min=learning_rate * 0.01
        )
    
    # Training variables
    best_val_loss = float('inf')
    best_val_acc = 0.0
    patience_counter = 0
    best_model_state = None
    train_losses = []
    val_losses = []
    val_accuracies = []
    
    # Create data loader for batching
    train_dataset = torch.utils.data.TensorDataset(X_train, Y_train)
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    
    for epoch in range(num_epochs):
        # Training
        model.train()
        epoch_train_loss = 0.0
        
        for batch_X, batch_Y in train_loader:
            optimizer.zero_grad()
            output = model(batch_X)
            loss = sum(criterion(output[:, t, :], batch_Y[:, t]) for t in range(future_steps))
            loss.backward()
            
            # Gradient clipping
            if clip_value > 0:
                torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
            
            optimizer.step()
            epoch_train_loss += loss.item()
        
        avg_train_loss = epoch_train_loss / len(train_loader)
        train_losses.append(avg_train_loss)
        
        # ===== ADDED: STEP THE SCHEDULER (once per epoch) =====
        if scheduler is not None:
            scheduler.step()
        
        # Validation
        model.eval()
        with torch.no_grad():
            val_output = model(X_val)
            val_loss = sum(criterion(val_output[:, t, :], Y_val[:, t]) for t in range(future_steps))
            
            # Calculate validation accuracy
            preds = val_output.argmax(dim=2)
            val_acc = (preds == Y_val).float().mean().item()
        
        val_losses.append(val_loss.item())
        val_accuracies.append(val_acc)
        
        if verbose and (epoch + 1) % 10 == 0:
            current_lr = optimizer.param_groups[0]['lr']
            print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {avg_train_loss:.4f}, "
                  f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc*100:.2f}%, LR: {current_lr:.6f}")
        
        # Early stopping based on validation loss (stored as a Python float)
        if val_loss.item() < best_val_loss:
            best_val_loss = val_loss.item()
            best_val_acc = val_acc
            patience_counter = 0
            best_model_state = copy.deepcopy(model.state_dict())
        else:
            patience_counter += 1
            if patience_counter >= patience:
                if verbose:
                    print(f"Early stopping at epoch {epoch+1}")
                break
    
    # Load best model and evaluate on test set
    if best_model_state is not None:
        model.load_state_dict(best_model_state)
    
    # === TEST EVALUATION WITH A DATALOADER ===
    test_dataset = torch.utils.data.TensorDataset(X_test_final, Y_test_final)
    test_loader = torch.utils.data.DataLoader(
        test_dataset,
        batch_size=128,
        shuffle=False
    )

    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_X, batch_Y in test_loader:
            output = model(batch_X)
            preds = output.argmax(dim=2)
            correct += (preds == batch_Y).float().sum().item()
            total += batch_Y.numel()

    test_acc = correct / total
    
    return best_val_acc, test_acc, train_losses, val_losses, val_accuracies


## Final Evaluation with 20 Independent Runs
I add 20 epochs to `num_epochs` and 5 to `patience` to maximize the effectiveness of the scheduler.
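
The independent runs are summarized as mean ± standard deviation. The sketch below shows one way to compute that summary with the standard library; the helper name `summarize_runs` and the three accuracy values are made up for illustration and are not results from this notebook. It also shows the difference between the population standard deviation (what `np.std` computes by default, with `ddof=0`) and the sample standard deviation (`ddof=1`).

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, population std, sample std) for a list of accuracies."""
    mean = statistics.fmean(accuracies)
    population_std = statistics.pstdev(accuracies)  # same as np.std(..., ddof=0)
    sample_std = statistics.stdev(accuracies)       # same as np.std(..., ddof=1)
    return mean, population_std, sample_std


# Hypothetical accuracies, for illustration only.
example_accuracies = [0.57, 0.58, 0.59]
mean, population_std, sample_std = summarize_runs(example_accuracies)

print(f"mean:           {mean * 100:.4f}%")
print(f"population std: {population_std * 100:.4f}%")
print(f"sample std:     {sample_std * 100:.4f}%")
```

With 20 runs the two estimates differ by a factor of sqrt(20/19), roughly 2.6%, so the choice matters little here, but stating which one is reported makes the ± figure easier to compare with other work.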

In [11]:
# ============================================================================
# 5. FINAL EVALUATION WITH 20 INDEPENDENT RUNS
# ============================================================================

# Extract best hyperparameters
best_params = study.best_params

print("\n" + "="*80)
print("RUNNING 20 INDEPENDENT EXPERIMENTS WITH BEST HYPERPARAMETERS")
print("="*80)

# Storage for results
final_test_accuracies = []
final_val_accuracies = []
all_train_losses = []
all_val_losses = []
all_val_accuracies = []

num_runs = 20

for run in range(num_runs):
    print(f"\nRun {run+1}/{num_runs}...")
    
    # Set different random seed for each run
    torch.manual_seed(run)
    np.random.seed(run)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(run)
    
    val_acc, test_acc, train_losses, val_losses, val_accuracies = train_model_scheduler(
        hidden_size=best_params['hidden_size'],
        num_epochs=best_params['num_epochs'] + 20,
        learning_rate=best_params['learning_rate'],
        optimizer_name=best_params['optimizer'],
        activation=best_params['activation'],
        dropout_rate=best_params['dropout_rate'],
        clip_value=best_params['clip_value'],
        weight_init=best_params['weight_init'],
        patience=best_params['patience'] + 5,
        batch_size=best_params['batch_size'],
        verbose=False
    )
    
    final_val_accuracies.append(val_acc)
    final_test_accuracies.append(test_acc)
    all_train_losses.append(train_losses)
    all_val_losses.append(val_losses)
    all_val_accuracies.append(val_accuracies)
    
    print(f"  Val Acc: {val_acc*100:.2f}%, Test Acc: {test_acc*100:.2f}%")

# Calculate statistics
mean_test_acc = np.mean(final_test_accuracies)
std_test_acc = np.std(final_test_accuracies)
mean_val_acc = np.mean(final_val_accuracies)
std_val_acc = np.std(final_val_accuracies)


print("\n" + "="*80)
print("FINAL RESULTS (20 independent runs)")
print("="*80)
print(f"Validation Accuracy: {mean_val_acc*100:.4f}% ± {std_val_acc*100:.4f}%")
print(f"Test Accuracy: {mean_test_acc*100:.4f}% ± {std_test_acc*100:.4f}%")
print(f"\nBest Test Accuracy: {max(final_test_accuracies)*100:.4f}%")
print(f"Worst Test Accuracy: {min(final_test_accuracies)*100:.4f}%")
print(f"Median Test Accuracy: {np.median(final_test_accuracies)*100:.4f}%")



RUNNING 20 INDEPENDENT EXPERIMENTS WITH BEST HYPERPARAMETERS

Run 1/20...
  Val Acc: 57.50%, Test Acc: 58.02%

Run 2/20...
  Val Acc: 57.87%, Test Acc: 58.03%

Run 3/20...
  Val Acc: 57.88%, Test Acc: 58.28%

Run 4/20...
  Val Acc: 57.48%, Test Acc: 57.88%

Run 5/20...
  Val Acc: 57.48%, Test Acc: 57.84%

Run 6/20...
  Val Acc: 57.44%, Test Acc: 57.66%

Run 7/20...
  Val Acc: 57.28%, Test Acc: 57.55%

Run 8/20...
  Val Acc: 56.58%, Test Acc: 56.87%

Run 9/20...
  Val Acc: 57.30%, Test Acc: 57.70%

Run 10/20...
  Val Acc: 57.09%, Test Acc: 57.47%

Run 11/20...
  Val Acc: 57.09%, Test Acc: 57.68%

Run 12/20...
  Val Acc: 57.16%, Test Acc: 57.47%

Run 13/20...
  Val Acc: 57.72%, Test Acc: 58.19%

Run 14/20...
  Val Acc: 57.49%, Test Acc: 57.83%

Run 15/20...
  Val Acc: 57.20%, Test Acc: 57.63%

Run 16/20...
  Val Acc: 57.91%, Test Acc: 58.17%

Run 17/20...
  Val Acc: 57.82%, Test Acc: 57.91%

Run 18/20...
  Val Acc: 57.03%, Test Acc: 57.48%

Run 19/20...
  Val Acc: 57.53%, Test Acc: 57.6

In [13]:

# ============================================================================
# 6. VISUALIZATION AND REPORT GENERATION
# ============================================================================

# ============================================================================
# QUICK FIX: redefine the problematic variables
# ============================================================================

# Force conversion to plain Python floats
mean_test_acc = float(np.mean(final_test_accuracies))
std_test_acc = float(np.std(final_test_accuracies))
mean_val_acc = float(np.mean(final_val_accuracies))
std_val_acc = float(np.std(final_val_accuracies))

print("✅ Variables corregidas!")
print(f"mean_test_acc type: {type(mean_test_acc)}")
print(f"mean_val_acc type: {type(mean_val_acc)}")
print(f"\nTest Accuracy: {mean_test_acc*100:.4f}% ± {std_test_acc*100:.4f}%")
print(f"Val Accuracy: {mean_val_acc*100:.4f}% ± {std_val_acc*100:.4f}%")


# Plot learning curves with standard deviation
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Find maximum length for padding
max_len_train = max(len(losses) for losses in all_train_losses)
max_len_val = max(len(losses) for losses in all_val_losses)

# Pad sequences with last value
def pad_sequence(seq, max_len):
    if len(seq) < max_len:
        return seq + [seq[-1]] * (max_len - len(seq))
    return seq

padded_train_losses = [pad_sequence(losses, max_len_train) for losses in all_train_losses]
padded_val_losses = [pad_sequence(losses, max_len_val) for losses in all_val_losses]
padded_val_accuracies = [pad_sequence(accs, max_len_val) for accs in all_val_accuracies]

# Calculate mean and std
mean_train_loss = np.mean(padded_train_losses, axis=0)
std_train_loss = np.std(padded_train_losses, axis=0)
mean_val_loss = np.mean(padded_val_losses, axis=0)
std_val_loss = np.std(padded_val_losses, axis=0)
mean_val_acc_curve = np.mean(padded_val_accuracies, axis=0)
std_val_acc_curve = np.std(padded_val_accuracies, axis=0)

epochs_train = range(1, max_len_train + 1)
epochs_val = range(1, max_len_val + 1)

# Plot Training Loss
axes[0].plot(epochs_train, mean_train_loss, 'b-', linewidth=2, label='Mean')
axes[0].fill_between(epochs_train, 
                     mean_train_loss - std_train_loss, 
                     mean_train_loss + std_train_loss,
                     alpha=0.3, color='blue')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Training Loss')
axes[0].set_title('Training Loss (Mean ± Std)')
axes[0].grid(True, alpha=0.3)
axes[0].legend()

# Plot Validation Loss
axes[1].plot(epochs_val, mean_val_loss, 'r-', linewidth=2, label='Mean')
axes[1].fill_between(epochs_val, 
                     mean_val_loss - std_val_loss, 
                     mean_val_loss + std_val_loss,
                     alpha=0.3, color='red')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Validation Loss')
axes[1].set_title('Validation Loss (Mean ± Std)')
axes[1].grid(True, alpha=0.3)
axes[1].legend()

# Plot Validation Accuracy
axes[2].plot(epochs_val, mean_val_acc_curve * 100, 'g-', linewidth=2, label='Mean')
axes[2].fill_between(epochs_val,
                     (mean_val_acc_curve - std_val_acc_curve) * 100,
                     (mean_val_acc_curve + std_val_acc_curve) * 100,
                     alpha=0.3, color='green')
axes[2].set_xlabel('Epoch')
axes[2].set_ylabel('Validation Accuracy (%)')
axes[2].set_title('Validation Accuracy (Mean ± Std)')
axes[2].grid(True, alpha=0.3)
axes[2].legend()

plt.tight_layout()
plt.savefig('learning_curves_final.png', dpi=300, bbox_inches='tight')
print("\nLearning curves saved to 'learning_curves_final.png'")
plt.close()

# Plot distribution of test accuracies
plt.figure(figsize=(10, 6))
plt.hist(np.array(final_test_accuracies)*100, bins=15, edgecolor='black', alpha=0.7)
plt.axvline(mean_test_acc*100, color='red', linestyle='--', linewidth=2, 
            label=f'Mean: {mean_test_acc*100:.2f}%')
plt.xlabel('Test Accuracy (%)')
plt.ylabel('Frequency')
plt.title('Distribution of Test Accuracies (20 runs)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig('test_accuracy_distribution.png', dpi=300, bbox_inches='tight')
print("Test accuracy distribution saved to 'test_accuracy_distribution.png'")
plt.close()

# Create comprehensive summary report
summary_report = f"""
{'='*80}
FINAL EXPERIMENT SUMMARY REPORT
{'='*80}

DATASET INFORMATION:
- Total text length: {total_length}
- Train/Val/Test split: 30%/10%/60% (FIXED)
- Sequence length: {seq_length} (FIXED)
- Vocabulary size: {vocab_size}
- Future steps (prediction): {future_steps}

BEST HYPERPARAMETERS (from Optuna optimization with {n_trials} trials):
"""

for key, value in best_params.items():
    summary_report += f"- {key}: {value}\n"

summary_report += f"""
FINAL RESULTS (20 independent runs with different initializations):
- Mean Test Accuracy: {mean_test_acc*100:.4f}% ± {std_test_acc*100:.4f}%
- Mean Validation Accuracy: {mean_val_acc*100:.4f}% ± {std_val_acc*100:.4f}%
- Best Test Accuracy: {max(final_test_accuracies)*100:.4f}%
- Worst Test Accuracy: {min(final_test_accuracies)*100:.4f}%
- Median Test Accuracy: {np.median(final_test_accuracies)*100:.4f}%

INDIVIDUAL RUN RESULTS:
"""

for i, (val_acc, test_acc) in enumerate(zip(final_val_accuracies, final_test_accuracies), 1):
    summary_report += f"Run {i:2d}: Val Acc = {val_acc*100:.2f}%, Test Acc = {test_acc*100:.2f}%\n"

summary_report += f"""
{'='*80}
END OF REPORT
{'='*80}
"""

print(summary_report)

# Save report to file (explicit UTF-8, since the report contains '±')
with open('final_experiment_report.txt', 'w', encoding='utf-8') as f:
    f.write(summary_report)
print("\nReport saved to 'final_experiment_report.txt'")

# Save best hyperparameters to file
with open('best_hyperparameters.json', 'w') as f:
    json.dump(best_params, f, indent=4)
print("Best hyperparameters saved to 'best_hyperparameters.json'")

# Save all results data
results_data = {
    'best_hyperparameters': best_params,
    'test_accuracies': final_test_accuracies,
    'val_accuracies': final_val_accuracies,
    'statistics': {
        'mean_test_acc': mean_test_acc,
        'std_test_acc': std_test_acc,
        'mean_val_acc': mean_val_acc,
        'std_val_acc': std_val_acc,
        'best_test_acc': max(final_test_accuracies),
        'worst_test_acc': min(final_test_accuracies),
        'median_test_acc': float(np.median(final_test_accuracies))
    }
}

with open('all_results.json', 'w') as f:
    json.dump(results_data, f, indent=4)
print("All results saved to 'all_results.json'")

print("\n" + "="*80)
print("EXPERIMENT COMPLETE!")
print("="*80)


✅ Variables corregidas!
mean_test_acc type: <class 'float'>
mean_val_acc type: <class 'float'>

Test Accuracy: 57.7365% ± 0.3230%
Val Accuracy: 57.4027% ± 0.3323%

Learning curves saved to 'learning_curves_final.png'
Test accuracy distribution saved to 'test_accuracy_distribution.png'

FINAL EXPERIMENT SUMMARY REPORT

DATASET INFORMATION:
- Total text length: 214451
- Train/Val/Test split: 30%/10%/60% (FIXED)
- Sequence length: 20 (FIXED)
- Vocabulary size: 29
- Future steps (prediction): 1

BEST HYPERPARAMETERS (from Optuna optimization with 100 trials):
- hidden_size: 288
- learning_rate: 0.0018800118847900327
- optimizer: adamw
- activation: relu
- dropout_rate: 0.3998466527993291
- clip_value: 1.1518650819354024
- weight_init: kaiming
- patience: 10
- batch_size: 64
- num_epochs: 106

FINAL RESULTS (20 independent runs with different initializations):
- Mean Test Accuracy: 57.7365% ± 0.3230%
- Mean Validation Accuracy: 57.4027% ± 0.3323%
- Best Test Accuracy: 58.2822%
- Worst Test 