**Note**: We recommend running this notebook in Google Colab to ensure compatibility and avoid dependency problems. Training the models requires a significant amount of RAM and compute, which may not be available in every local environment.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/antoniotrapote/chord-prediction-tfm/blob/main/anexos/notebooks/03_modelado/04_modelo_lstm.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-black?logo=github)](https://github.com/antoniotrapote/chord-prediction-tfm/blob/main/anexos/notebooks/03_modelado/04_modelo_lstm.ipynb)

# Long Short-Term Memory (LSTM) model - PyTorch

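We used PyTorch to implement and train a recurrent neural network based on Long Short-Term Memory (LSTM) for chord prediction in musical sequences.

The most recent dataset used was `songdb_funcional_v4`.

Notebook contents:
1. Environment (Colab): version checks and seeds
2. Downloading the dataset from GitHub
3. Loading and tokenizing the dataset
4. Train/val/test split
5. Vocabulary and encoding
6. Dataset (context→next) and DataLoaders
7. LSTM model
8. Training and metrics (Top@K, MRR, PPL)
9. Training the LSTM
10. Evaluation on the test set
11. `predict_next(context, k=5)`
12. Incremental inference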

## 1) Environment (Colab)

In [1]:
#@title Seeds and determinism
import random, os, numpy as np, torch
SEED = 42
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

In [2]:

#@title Check GPU/versions
import sys, torch
print("Python:", sys.version)
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("⚠️ Enable the GPU: Runtime ▶ Change runtime type ▶ GPU")


Python: 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0]
PyTorch: 2.8.0+cu126
CUDA available: True
GPU: Tesla T4


## 2) Fetch the CSV from GitHub
We download the processed dataset directly from the repository.

In [None]:
import urllib.request

# Settings for downloading the dataset from GitHub
USER = "antoniotrapote"
REPO = "chord-prediction-tfm"
BRANCH = "main"
PATH_IN_REPO = "anexos/data/songdb_funcional_v4.csv"
URL = f"https://raw.githubusercontent.com/{USER}/{REPO}/{BRANCH}/{PATH_IN_REPO}"

# Local path where the file is saved
data_path = "/content/songdb_funcional_v4.csv"

# Download the CSV file from GitHub
urllib.request.urlretrieve(URL, data_path)
print(f"Dataset downloaded to: {data_path}")



## 3) Load the CSV and tokenize (whitespace)

In [None]:
import pandas as pd, ast, re

# Filtering parameters
sequence_col = "funcional_prog"  # Column that holds the chord sequences
min_seq_len = 8  # Ignore very short sequences

# Load the dataset
df = pd.read_csv(data_path)
assert sequence_col in df.columns, f"Column {sequence_col} not found in the CSV."
print("Total rows:", len(df))
display(df[[sequence_col]].head(3))

def parse_tokens_simple(s: str):
    """Convert a chord sequence into a list of tokens."""
    # If the cell holds a Python-style list literal, parse it directly
    if isinstance(s, str) and s.strip().startswith("[") and s.strip().endswith("]"):
        try:
            lst = ast.literal_eval(s)
            if isinstance(lst, list):
                return [str(t) for t in lst]
        except Exception:
            pass

    # Normalize bar separators and newlines to spaces, then split on whitespace
    s = str(s).replace("|", " ").replace("\n", " ")
    toks = [t for t in re.findall(r"\S+", s) if t.strip()]
    return toks

# Tokenize and drop very short sequences
df["_tokens_"] = df[sequence_col].apply(parse_tokens_simple)
df = df[df["_tokens_"].apply(len) >= min_seq_len].reset_index(drop=True)
print("Rows after min_seq_len filter:", len(df))

Total rows: 2613


| | funcional_prog |
|---|---|
| 0 | vi #ivø V/III V/VI vi IV ii V7 iii vi ii V7 I ... |
| 1 | VII VII I vi ii V7 VII VII I vi ii V7 I IV #iv... |
| 2 | i VI V/V V7 i VI V/V V7 i VI iiø V7 i VI iiø V... |


Rows after min_seq_len filter: 2612
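
To make the whitespace tokenization concrete, here is a minimal, self-contained sketch of how a progression string could be split into chord tokens. The helper name `split_chords` and the sample progression are illustrative assumptions, not part of the notebook; the sketch mirrors only the fallback branch of `parse_tokens_simple`.

```python
import re

def split_chords(progression: str) -> list[str]:
    """Split a progression on whitespace, treating bar lines as separators."""
    # Bar separators and newlines are normalized to spaces before splitting.
    cleaned = progression.replace("|", " ").replace("\n", " ")
    return re.findall(r"\S+", cleaned)

# Hypothetical sample written in the same functional notation as the dataset
sample = "vi #ivø | V/III V/VI |\nvi IV ii V7"
tokens = split_chords(sample)
print(tokens)
```

A progression like this one yields exactly eight tokens, so it would just pass the `min_seq_len = 8` filter.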


## 4) Train/val/test split (simple, row-wise)

In [None]:
from sklearn.model_selection import train_test_split

# Split parameters
val_size = 0.10     # 10% for validation
test_size = 0.10    # 10% for test
random_state = 42   # Seed for reproducibility

# Split into train/val/test: hold out val+test first, then divide that holdout
train_df, tmp_df = train_test_split(df, test_size=val_size+test_size, random_state=random_state, shuffle=True)
rel_test = test_size / (val_size + test_size) if (val_size + test_size) > 0 else 0.5
val_df, test_df = train_test_split(tmp_df, test_size=rel_test, random_state=random_state, shuffle=True)

# Extract the tokenized sequences
train_seqs = train_df["_tokens_"].tolist()
val_seqs   = val_df["_tokens_"].tolist()
test_seqs  = test_df["_tokens_"].tolist()

print(f"Train: {len(train_seqs)}, Val: {len(val_seqs)}, Test: {len(test_seqs)}")

2089 261 262
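
The split sizes follow from two consecutive calls. As a rough check, and assuming scikit-learn rounds the held-out share up to a whole row (which is how `train_test_split` treats a fractional `test_size`), the arithmetic for the 2612 filtered rows can be reproduced without the library. The function name `two_stage_split_sizes` is illustrative.

```python
import math

def two_stage_split_sizes(n_rows: int, val_size: float, test_size: float):
    """Return (train, val, test) row counts for a holdout-then-divide split."""
    # Stage 1: hold out val+test; the held-out count is rounded up.
    n_holdout = math.ceil((val_size + test_size) * n_rows)
    n_train = n_rows - n_holdout
    # Stage 2: divide the holdout; the "test" side is rounded up again.
    rel_test = test_size / (val_size + test_size)
    n_test = math.ceil(rel_test * n_holdout)
    n_val = n_holdout - n_test
    return n_train, n_val, n_test

sizes = two_stage_split_sizes(2612, val_size=0.10, test_size=0.10)
print(sizes)
```

Under that rounding assumption the result agrees with the three counts printed above.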


## 5) Vocabulary and encoding

In [None]:
# Vocabulary settings
min_count = 4  # Minimum frequency for a token to enter the vocabulary

# Build the vocabulary from train_seqs only, so val/test tokens do not leak in.
# NOTE: build_vocab_from_list is not defined in this notebook; it is assumed to be a project helper.
vocab = build_vocab_from_list(train_seqs, min_count=min_count)

# Add special tokens
vocab.add_token("[PAD]")
vocab.add_token("[UNK]")
vocab.add_token("[SOS]")
vocab.add_token("[EOS]")

# Tokens outside the vocabulary map to [UNK]
vocab.set_default_index(vocab["[UNK]"])

print(f"Vocab size: {len(vocab)}")

Vocab size: 86
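
The helper `build_vocab_from_list` is not shown in this notebook. As a rough idea of what such a step involves, the sketch below builds a frequency-filtered vocabulary with plain dictionaries. The names `build_vocab`, `encode`, `stoi` and `itos`, and the choice to place the special tokens first, are assumptions made for illustration.

```python
from collections import Counter

SPECIALS = ["[PAD]", "[UNK]", "[SOS]", "[EOS]"]

def build_vocab(sequences, min_count=1):
    """Map tokens seen at least `min_count` times to integer ids."""
    counts = Counter(tok for seq in sequences for tok in seq)
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    kept = sorted((t for t, c in counts.items() if c >= min_count),
                  key=lambda t: (-counts[t], t))
    itos = SPECIALS + kept
    stoi = {tok: i for i, tok in enumerate(itos)}
    return stoi, itos

def encode(tokens, stoi, add_bos=False):
    """Convert tokens to ids; unseen tokens fall back to [UNK]."""
    ids = [stoi.get(t, stoi["[UNK]"]) for t in tokens]
    return ([stoi["[SOS]"]] + ids) if add_bos else ids

# Toy corpus (hypothetical): "bII" appears once and is dropped with min_count=2
toy = [["ii", "V7", "I"], ["ii", "V7", "I", "bII"]]
stoi, itos = build_vocab(toy, min_count=2)
ids = encode(["ii", "bII", "I"], stoi, add_bos=True)
print(itos, ids)
```

Building the mapping from the training split only keeps rare validation or test chords out of the vocabulary; they are encoded as `[UNK]` instead.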


## 6) Dataset (context→next) and DataLoaders

In [9]:
import torch
from torch.utils.data import Dataset, DataLoader
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class NextTokenDataset(Dataset):
    """Sliding-window samples: `seq_len` context tokens -> the next token."""
    def __init__(self, sequences, seq_len):
        self.samples = []
        for seq in sequences:
            ids = encode(seq, add_bos=True)
            # Skip songs too short to fill one full context window
            if len(ids) <= seq_len:
                continue
            for i in range(seq_len, len(ids)):
                self.samples.append((ids[i-seq_len:i], ids[i]))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x, y = self.samples[idx]
        return torch.tensor(x, dtype=torch.long), torch.tensor(y, dtype=torch.long)

train_data = NextTokenDataset(train_seqs, cfg.seq_len)
val_data   = NextTokenDataset(val_seqs,   cfg.seq_len)
test_data  = NextTokenDataset(test_seqs,  cfg.seq_len)

train_loader = DataLoader(train_data, batch_size=cfg.batch_size, shuffle=True, drop_last=True)
val_loader   = DataLoader(val_data,   batch_size=cfg.batch_size, shuffle=False)
test_loader  = DataLoader(test_data,  batch_size=cfg.batch_size, shuffle=False)

len(train_data), len(val_data), len(test_data)


(60979, 7044, 7669)
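
The sample counts above come from sliding a fixed-length window over each encoded song. The pure-Python sketch below illustrates that windowing on a toy sequence; `make_windows` is an illustrative name and the ids are made up.

```python
def make_windows(ids, seq_len):
    """Return (context, target) pairs: `seq_len` ids followed by the next id."""
    if len(ids) <= seq_len:
        return []  # too short to yield even one full context
    return [(ids[i - seq_len:i], ids[i]) for i in range(seq_len, len(ids))]

toy_ids = [2, 10, 11, 12, 13, 14]   # hypothetical ids, starting with [SOS]
pairs = make_windows(toy_ids, seq_len=3)
for context, target in pairs:
    print(context, "->", target)
```

A sequence of length `n` contributes `n - seq_len` samples, which is why the number of samples is much larger than the number of songs.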

## 7) LSTM model

In [None]:
# LSTM model settings
vocab_size = len(vocab)     # Vocabulary size
embed_dim = 256             # Embedding dimension
hidden_dim = 512            # Hidden size of the LSTM
num_layers = 2              # Number of LSTM layers
dropout = 0.2               # Dropout for regularization

# Create the model
model = LSTMModel(
    vocab_size=vocab_size,
    embed_dim=embed_dim,
    hidden_dim=hidden_dim,
    num_layers=num_layers,
    dropout=dropout
)

# Move the model to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Print model information
print(f"Model created with {sum(p.numel() for p in model.parameters()):,} parameters")
print(f"Device: {device}")
print(model)
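
`LSTMModel` itself is not defined in this notebook. One plausible shape for it, given that the training code expects a single logit vector per context, is an embedding layer followed by a stacked LSTM and a linear projection of the last time step. The class below is a sketch under that assumption, not the author's implementation; `TinyLSTMModel` is a made-up name.

```python
import torch
import torch.nn as nn

class TinyLSTMModel(nn.Module):
    """Embedding -> stacked LSTM -> linear head on the last time step."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers, dropout):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # nn.LSTM applies dropout between layers only, so it needs num_layers > 1.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True,
                            dropout=dropout if num_layers > 1 else 0.0)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):                  # x: (batch, seq_len) token ids
        out, _ = self.lstm(self.embed(x))  # out: (batch, seq_len, hidden_dim)
        return self.head(out[:, -1, :])    # logits: (batch, vocab_size)

# Smoke test with the sizes used in this notebook (vocab of 86, context of 24)
demo = TinyLSTMModel(vocab_size=86, embed_dim=256, hidden_dim=512,
                     num_layers=2, dropout=0.2)
logits = demo(torch.randint(0, 86, (4, 24)))
print(logits.shape)
```

Returning only the last time step keeps the output shape compatible with a `CrossEntropyLoss` over one target id per context.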

## 8) Training and metrics (Top@K, MRR, PPL)

In [11]:
import math, time, os, torch
import torch.nn.functional as F

def topk_metrics(logits, targets, ks=(1,3,5)):
    """Top@K hit rates and mean reciprocal rank (MRR) for one batch."""
    out = {}
    with torch.no_grad():
        for k in ks:
            topk = logits.topk(k, dim=-1).indices
            # Hit if the target id is among the k highest-scoring ids
            out[f"Top@{k}"] = (topk == targets.unsqueeze(1)).any(dim=1).float().mean().item()
        # 1-based rank of each target in the descending ordering of the logits
        ranks = (logits.argsort(dim=-1, descending=True) == targets.unsqueeze(1)).nonzero(as_tuple=False)[:,1] + 1
        out["MRR"] = (1.0 / ranks.float()).mean().item()
    return out

def evaluate(model, loader, criterion):
    """Sample-weighted average of loss, perplexity, Top@K and MRR over a loader."""
    model.eval()
    total, n = 0.0, 0
    agg = {"Top@1":0.0,"Top@3":0.0,"Top@5":0.0,"MRR":0.0}
    with torch.no_grad():
        for x,y in loader:
            x,y = x.to(device), y.to(device)
            logits = model(x)
            loss = criterion(logits, y)
            # Weight each batch by its size so a smaller last batch is not over-counted
            b = x.size(0)
            total += loss.item()*b
            n += b
            m = topk_metrics(logits, y)
            for k in agg: agg[k] += m[k]*b
    for k in agg: agg[k] /= max(1,n)
    return {"loss": total/max(1,n), "ppl": math.exp(total/max(1,n)), **agg}

def train_model(model, train_loader, val_loader, epochs, lr, weight_decay, grad_clip=1.0, amp=True, save_dir=".", save_name="best.pt"):
    os.makedirs(save_dir, exist_ok=True)
    # Mixed precision is only used on CUDA; torch.amp replaces the deprecated torch.cuda.amp API
    use_amp = amp and device.type == "cuda"
    scaler = torch.amp.GradScaler("cuda", enabled=use_amp)
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    crit = torch.nn.CrossEntropyLoss()
    best_mrr, best_path = -1.0, os.path.join(save_dir, save_name)
    for ep in range(1, epochs+1):
        model.train(); t0 = time.time()
        for i,(x,y) in enumerate(train_loader,1):
            x,y = x.to(device), y.to(device)
            opt.zero_grad(set_to_none=True)
            with torch.amp.autocast(device_type=device.type, enabled=use_amp):
                logits = model(x); loss = crit(logits,y)
            scaler.scale(loss).backward()
            if grad_clip is not None:
                # Unscale first so clipping acts on the true gradient norm
                scaler.unscale_(opt)
                torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
            scaler.step(opt); scaler.update()
            if i % 100 == 0: print(f"Ep{ep} step {i}/{len(train_loader)} loss {loss.item():.4f}")
        valm = evaluate(model, val_loader, crit)
        print(f"Epoch {ep} | val loss {valm['loss']:.4f} ppl {valm['ppl']:.2f} Top@1 {valm['Top@1']:.3f} Top@3 {valm['Top@3']:.3f} Top@5 {valm['Top@5']:.3f} MRR {valm['MRR']:.3f}")
        # Keep the checkpoint with the best validation MRR
        if valm["MRR"] > best_mrr:
            best_mrr = valm["MRR"]
            torch.save({"model_state": model.state_dict(), "config": dict(vars(cfg)), "stoi": stoi, "itos": itos}, best_path)
            print("🔥 Saved best ->", best_path, "| MRR:", best_mrr)
    return best_mrr, best_path
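
As a small worked example of these metrics, the sketch below computes Top@K, MRR and perplexity for hand-written scores in plain Python. The function name `rank_of_target` and the toy scores are illustrative; the notebook itself computes the same quantities with tensors. The loss value 2.1566 is the first-epoch validation loss from the training log in this notebook.

```python
import math

def rank_of_target(scores, target):
    """1-based rank of `target` when scores are sorted in descending order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order.index(target) + 1

# Three toy predictions over a 4-token vocabulary: (scores, true index)
examples = [
    ([0.1, 0.6, 0.2, 0.1], 1),   # target ranked 1st
    ([0.5, 0.1, 0.3, 0.1], 2),   # target ranked 2nd
    ([0.4, 0.3, 0.2, 0.1], 3),   # target ranked 4th
]
ranks = [rank_of_target(s, t) for s, t in examples]

top1 = sum(r <= 1 for r in ranks) / len(ranks)
top3 = sum(r <= 3 for r in ranks) / len(ranks)
mrr = sum(1.0 / r for r in ranks) / len(ranks)

# Perplexity is the exponential of the mean cross-entropy loss (in nats).
ppl = math.exp(2.1566)

print(ranks, round(top1, 3), round(top3, 3), round(mrr, 3), round(ppl, 2))
```

MRR rewards near misses: the second example scores 0.5 even though it does not count toward Top@1.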


## 9) Train the LSTM

In [None]:
# Training settings
seq_len = 24               # Length of the input sequences
batch_size = 128           # Batch size
epochs = 6                 # Number of epochs
lr = 2e-3                  # Learning rate
weight_decay = 5e-4        # Weight decay for regularization

# Special tokens
pad_idx = vocab["[PAD]"]
sos_idx = vocab["[SOS]"]
eos_idx = vocab["[EOS]"]

# Create the dataloaders
train_loader = create_dataloader(
    train_seqs, vocab, seq_len=seq_len, batch_size=batch_size,
    pad_idx=pad_idx, sos_idx=sos_idx, eos_idx=eos_idx, shuffle=True
)

val_loader = create_dataloader(
    val_seqs, vocab, seq_len=seq_len, batch_size=batch_size,
    pad_idx=pad_idx, sos_idx=sos_idx, eos_idx=eos_idx, shuffle=False
)

# Set up the optimizer and the loss function
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
criterion = torch.nn.CrossEntropyLoss(ignore_index=pad_idx)

print("Training configuration:")
print(f"  - Input sequence length: {seq_len}")
print(f"  - Batch size: {batch_size}")
print(f"  - Epochs: {epochs}")
print(f"  - Learning rate: {lr}")
print(f"  - Weight decay: {weight_decay}")
print(f"  - Train batches: {len(train_loader)}")
print(f"  - Val batches: {len(val_loader)}")



Ep1 step 100/476 loss 2.2384
Ep1 step 200/476 loss 2.3146
Ep1 step 300/476 loss 2.0664
Ep1 step 400/476 loss 2.2425
Epoch 1 | val loss 2.1566 ppl 8.64 Top@1 0.437 Top@3 0.680 Top@5 0.767 MRR 0.584
🔥 Saved best -> /content/models_lstm_v1/lstm_best.pt | MRR: 0.5841175761837502
Ep2 step 100/476 loss 2.2561
Ep2 step 200/476 loss 1.9753
Ep2 step 300/476 loss 2.1835
Ep2 step 400/476 loss 1.9502
Epoch 2 | val loss 2.0980 ppl 8.15 Top@1 0.453 Top@3 0.691 Top@5 0.774 MRR 0.597
🔥 Saved best -> /content/models_lstm_v1/lstm_best.pt | MRR: 0.5970214201197713
Ep3 step 100/476 loss 2.3348
Ep3 step 200/476 loss 1.9495
Ep3 step 300/476 loss 1.7481
Ep3 step 400/476 loss 1.7912
Epoch 3 | val loss 2.0810 ppl 8.01 Top@1 0.460 Top@3 0.695 Top@5 0.776 MRR 0.602
🔥 Saved best -> /content/models_lstm_v1/lstm_best.pt | MRR: 0.6020486079651411
Ep4 step 100/476 loss 1.8587
Ep4 step 200/476 loss 1.7421
Ep4 step 300/476 loss 1.8021
Ep4 step 400/476 loss 2.1285
Epoch 4 | val loss 2.0956 ppl 8.13 Top@1 0.452 

## 10) Evaluation on the test set

In [None]:
# Evaluation settings
top_k = 10  # Top-k used for the metrics

# Evaluate the trained model
print("Evaluating the model on the test set...")

test_loader = create_dataloader(
    test_seqs, vocab, seq_len=seq_len, batch_size=batch_size,
    pad_idx=pad_idx, sos_idx=sos_idx, eos_idx=eos_idx, shuffle=False
)

# Compute the metrics
test_loss, test_acc, test_top5, test_mrr, test_perplexity = evaluate_model(
    model, test_loader, criterion, device, top_k=top_k
)

print(f"\nTest results:")
print(f"  Loss: {test_loss:.4f}")
print(f"  Top-1 Accuracy: {test_acc:.4f}")
print(f"  Top-5 Accuracy: {test_top5:.4f}")
print(f"  MRR: {test_mrr:.4f}")
print(f"  Perplexity: {test_perplexity:.4f}")

Test: {'loss': 2.169651968660403, 'ppl': 8.755236413779173, 'Top@1': 0.4305646108755591, 'Top@3': 0.6828791240480612, 'Top@5': 0.7668535665390716, 'MRR': 0.580885472159635}


## 11) predict_next(context, k=5)

In [None]:
# Save settings
save_dir = "/content/models_lstm_v3"        # Directory where the models are saved
model_filename = "lstm_v3_final.pt"        # Model file name
tokenizer_filename = "lstm_tokenizer.json" # Tokenizer file name

import json
import os

# Create the directory if it does not exist
os.makedirs(save_dir, exist_ok=True)

# Save the trained model together with its hyperparameters and test metrics
model_path = os.path.join(save_dir, model_filename)
torch.save({
    'model_state_dict': model.state_dict(),
    'vocab_size': vocab_size,
    'embed_dim': embed_dim,
    'hidden_dim': hidden_dim,
    'num_layers': num_layers,
    'dropout': dropout,
    'seq_len': seq_len,
    'test_acc': test_acc,
    'test_mrr': test_mrr,
    'test_perplexity': test_perplexity
}, model_path)

# Save the tokenizer
tokenizer_path = os.path.join(save_dir, tokenizer_filename)
with open(tokenizer_path, 'w', encoding="utf-8") as f:
    # Convert the vocabulary to a JSON-serializable dictionary
    vocab_dict = dict(vocab.get_stoi())
    json.dump({
        'vocab': vocab_dict,
        'default_index': vocab.get_default_index()
    }, f, indent=2)

print(f"Model saved to: {model_path}")
print(f"Tokenizer saved to: {tokenizer_path}")
print(f"Model metrics:")
print(f"  - Top-1 Accuracy: {test_acc:.4f}")
print(f"  - MRR: {test_mrr:.4f}")
print(f"  - Perplexity: {test_perplexity:.4f}")

Context: ['III', 'III', '#IV', '#IV', 'bII', 'bII', 'natIII', 'natIII', 'III', '#IV', 'IV', 'VI', 'V', 'bVII', 'natVI', 'I', 'II', 'VII', 'VI', 'IV', 'III', 'I', 'VII', 'natVI']
Pred: [('II', 0.24083514511585236), ('VI', 0.11550020426511765), ('VII', 0.09397098422050476), ('natIII', 0.05708475410938263), ('natVI', 0.05385180190205574)]
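
The notebook does not show the body of `predict_next`. A minimal sketch, assuming the model maps a batch of token ids to one logit vector per context and that `stoi`/`itos` are a token-to-id dict and an id-to-token list, could look like this. The stand-in model `FixedLogits` exists only to make the example runnable.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def predict_next(model, context, stoi, itos, k=5, unk="[UNK]"):
    """Return the k most likely next tokens with their probabilities."""
    model.eval()
    ids = [stoi.get(tok, stoi[unk]) for tok in context]
    logits = model(torch.tensor([ids], dtype=torch.long))[0]
    probs = torch.softmax(logits, dim=-1)
    top = probs.topk(min(k, probs.numel()))
    return [(itos[i], p) for i, p in zip(top.indices.tolist(), top.values.tolist())]

class FixedLogits(nn.Module):
    """Stand-in model (hypothetical): always returns the same logits."""
    def __init__(self, logits):
        super().__init__()
        self.register_buffer("logits", torch.tensor(logits))

    def forward(self, x):
        return self.logits.expand(x.size(0), -1)

itos = ["[UNK]", "ii", "V7", "I"]
stoi = {tok: i for i, tok in enumerate(itos)}
preds = predict_next(FixedLogits([0.0, 1.0, 2.0, 3.0]), ["ii", "V7"], stoi, itos, k=2)
print(preds)
```

The return value has the same shape as the `Pred:` output above: a list of `(token, probability)` pairs sorted from most to least likely.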


## 12) Incremental inference (quick test)

In [None]:
# Settings for the prediction examples
num_examples = 3      # Number of examples to show
max_predictions = 5   # Maximum number of predictions per example

# Helper that prints prediction examples
def show_prediction_examples(model, test_loader, vocab, num_examples=3, max_predictions=5):
    model.eval()
    device = next(model.parameters()).device

    # Take a few examples from the test loader
    examples_shown = 0

    with torch.no_grad():
        for batch_inputs, batch_targets in test_loader:
            if examples_shown >= num_examples:
                break

            batch_inputs = batch_inputs.to(device)
            batch_targets = batch_targets.to(device)

            # Get the predictions
            outputs = model(batch_inputs)

            # Process each example in the batch
            for i in range(min(batch_inputs.size(0), num_examples - examples_shown)):
                input_seq = batch_inputs[i]
                target_seq = batch_targets[i]
                pred_logits = outputs[i]

                # Convert the id sequences back to tokens, skipping padding
                input_tokens = [vocab.get_itos()[idx.item()] for idx in input_seq if idx.item() != vocab["[PAD]"]]
                target_tokens = [vocab.get_itos()[idx.item()] for idx in target_seq if idx.item() != vocab["[PAD]"]]

                print(f"\n--- Example {examples_shown + 1} ---")
                print(f"Input:  {' → '.join(input_tokens)}")
                print(f"Target: {' → '.join(target_tokens)}")

                # Show the top-k predictions for each position
                for pos in range(min(len(target_tokens), max_predictions)):
                    if pos < pred_logits.size(0):
                        probs = torch.softmax(pred_logits[pos], dim=0)
                        top_k_probs, top_k_indices = torch.topk(probs, k=min(5, len(vocab)))

                        pred_tokens = [vocab.get_itos()[idx.item()] for idx in top_k_indices]
                        pred_probs = [prob.item() for prob in top_k_probs]

                        print(f"Pos {pos+1}: {target_tokens[pos] if pos < len(target_tokens) else '[PAD]'}")
                        for j, (token, prob) in enumerate(zip(pred_tokens, pred_probs)):
                            marker = "✓" if token == target_tokens[pos] else " "
                            print(f"  {marker} {j+1}. {token}: {prob:.3f}")

                examples_shown += 1
                if examples_shown >= num_examples:
                    break

# Show prediction examples
print("Prediction examples from the LSTM model:")
show_prediction_examples(model, test_loader, vocab, num_examples, max_predictions)

In [16]:
sp = StatefulPredictor(model, stoi, itos, device)
sp.suggest(["ii","V"], k=5)

[('V', 0.19276435673236847),
 ('V7', 0.1380607634782791),
 ('v', 0.06758282333612442),
 ('ii', 0.06204349175095558),
 ('bVII', 0.049799397587776184)]

In [17]:
sp.suggest(["i","vi", 'ii'], k=5)

[('V7', 0.27057090401649475),
 ('i', 0.12024112790822983),
 ('vi', 0.11982262134552002),
 ('ii', 0.10807666927576065),
 ('viiø', 0.05500981584191322)]

In [18]:
sp.suggest(["i","bvi"], k=5)

[('VI', 0.09826728701591492),
 ('bII7', 0.09121581166982651),
 ('i', 0.08353113383054733),
 ('bii', 0.07816524803638458),
 ('viø', 0.06509044766426086)]
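
`StatefulPredictor` is not defined in this notebook. The idea behind incremental inference is to carry the LSTM state between calls, so each new chord costs a single time step instead of re-encoding the whole context. The sketch below illustrates that under the assumption that the embedding, LSTM and output layers are accessible separately; `IncrementalLSTM` and the tiny layer sizes are made up for the example.

```python
import torch
import torch.nn as nn

class IncrementalLSTM:
    """Feeds one token at a time, carrying the LSTM state between calls."""
    def __init__(self, embed, lstm, head):
        self.embed, self.lstm, self.head = embed, lstm, head
        self.state = None  # (h, c); None lets nn.LSTM start from zeros

    @torch.no_grad()
    def step(self, token_id):
        x = torch.tensor([[token_id]], dtype=torch.long)   # shape (1, 1)
        out, self.state = self.lstm(self.embed(x), self.state)
        return torch.softmax(self.head(out[:, -1, :]), dim=-1)[0]

torch.manual_seed(0)
embed = nn.Embedding(10, 8)
lstm = nn.LSTM(8, 16, batch_first=True)
head = nn.Linear(16, 10)
lstm.eval()

# Incremental: three single-token steps
inc = IncrementalLSTM(embed, lstm, head)
for tok in [3, 7, 1]:
    probs_inc = inc.step(tok)

# Reference: the same three tokens in one forward pass
with torch.no_grad():
    out, _ = lstm(embed(torch.tensor([[3, 7, 1]])))
    probs_full = torch.softmax(head(out[:, -1, :]), dim=-1)[0]

print(torch.allclose(probs_inc, probs_full, atol=1e-5))
```

Both paths should give the same distribution up to floating-point error, which is what makes the stateful variant a drop-in replacement for interactive suggestion.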

### Roadmap
- Hyperparameter tuning (Random Search)
- Soft re-ranking to avoid repeated chords and favor cadences.
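
As a possible starting point for the re-ranking item, the sketch below down-weights candidates that repeat the last chord of the context and then renormalizes the distribution. The function name `rerank_no_repeat`, the penalty value and the example probabilities are all assumptions; nothing like this is implemented in the notebook yet.

```python
def rerank_no_repeat(candidates, context, repeat_penalty=0.5):
    """Scale down the probability of repeating the last chord, then renormalize."""
    last = context[-1] if context else None
    scaled = [(tok, p * repeat_penalty if tok == last else p)
              for tok, p in candidates]
    total = sum(p for _, p in scaled)
    reranked = [(tok, p / total) for tok, p in scaled]
    return sorted(reranked, key=lambda tp: -tp[1])

# Hypothetical model output after the context ii -> V
candidates = [("V", 0.40), ("V7", 0.35), ("I", 0.25)]
reranked = rerank_no_repeat(candidates, ["ii", "V"], repeat_penalty=0.5)
print(reranked)
```

Because the penalty is multiplicative rather than a hard filter, a repeated chord can still win when the model is confident enough, which keeps the re-ranking "soft".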
