# Functional chord transcription with the original method

This notebook builds the **dataset `songdb_funcional_v3.csv`** (Roman numerals plus an estimated major/minor key) and trains a chord-autocompletion **baseline**: a **trigram model** with *stupid backoff*. It reports basic metrics (Top-1/Top-3 accuracy and MRR) and shows example suggestions.

> **How to use this notebook**  
> 1. Run the cells in order, from top to bottom.  
> 2. Make sure the base file `songdb.csv` is at the path read in section 2 (`../../data/songdb.csv`).  
> 3. The processed dataset is written to the same `data/` folder as `songdb_funcional_v3.csv`.


## 1) Imports and utilities

In [17]:

import re, os, json, math, zipfile, random
from collections import defaultdict
from typing import List, Tuple, Optional

import numpy as np
import pandas as pd

pd.set_option("display.max_colwidth", 120)
print("Versions -> pandas:", pd.__version__, "| numpy:", np.__version__)

Versions -> pandas: 2.3.1 | numpy: 2.3.2


## 2) Loading the dataset and preview

In [None]:
df = pd.read_csv('../../data/songdb.csv')  # base dataset with chord progressions
print("Rows:", len(df))
df.head(5)

Rows: 2613


Unnamed: 0,title,composedby,key,timesig,bars,chordprog
0,Lullaby of Birdland,George Shearing,Ab,4 4,32,Fm7 Dm7b5 | G7b9 C7b9 | Fm7 DbM7 | Bbm7 Eb7 |\nCm7 Fm7 | Bbm7 Eb7b9 | AbM7 Db7 | Gm7b5 C7 |\nFm7 Dm7b5 | G7b9 C7b9 |...
1,It's A Most Unusual Day,Jimmy McHugh and HYarold Adamson,G,3 4,72,F#/G F#/G G | Em7 | Am7 | D7 |\nF#/G F#/G G | Em7 | Am7 | D7 |\nG/B | CM7 | C#o7 | G/D |\nBm7 | Em7 | A7 | D7 |\nF#/...
2,Jump Monk,Charles Mingus,Ab,4 4,54,Fm7 DbM7 | G7b5 C7 | Fm7 DbM7 | G7b5 C7 |\nFm7 | DbM7 | Gm7b5 | C7 |\nFm7 | DbM7 | Gm7b5 | C7 |\nFm7 | Fm7/Eb | Db7 ...
3,Nuages,Django Reinhardt and Jacques Larme,G,4 4,32,Bbm7 Eb7 | Am7b5 D7b9 | G6 Am7 | Bm7 |\nBbm7 Eb7 | Am7b5 D7b9 | G6 | G6 |\nF#m7b5 | B7 | Em7 | Em7 |\nA7 Ab7 | A7 | ...
4,Love Me Do,John Lennon and Paul McCartney,G,4 4,48,G | C | G | C |\nG | C | C | C |\nC | G | C | G |\nC | G | C | G |\nC | G | C | C |\nC | C | G | C |\nG | G | D | D ...
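
In the preview, `chordprog` separates bars with `|` and lines of bars with newlines. The parser in section 3 discards those boundaries, so here is a minimal sketch of how the bar structure could be kept if it were needed. `split_bars` is a hypothetical helper that is not part of this notebook, and the sample string is a shortened copy of the first row.

```python
def split_bars(raw: str) -> list:
    """Split a progression string into bars, each bar being a list of chord tokens."""
    bars = []
    for chunk in raw.replace("\n", " ").split("|"):
        tokens = chunk.split()
        if tokens:  # skip the empty chunk left by a trailing '|'
            bars.append(tokens)
    return bars

sample = "Fm7 Dm7b5 | G7b9 C7b9 | Fm7 DbM7 | Bbm7 Eb7 |\nCm7 Fm7 | Bbm7 Eb7b9 |"
for bar in split_bars(sample):
    print(bar)
```

Each bar keeps its chords in order, so the flat token list used later can still be recovered by concatenating the bars.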


## 3) Chord parser and normalization to base classes

In [19]:
PITCHES_SHARP = ["C","C#","D","D#","E","F","F#","G","G#","A","A#","B"]
PITCHES_FLAT  = ["C","Db","D","Eb","E","F","Gb","G","Ab","A","Bb","B"]
PITCH_TO_PC = {p:i for i,p in enumerate(PITCHES_SHARP)}
PITCH_TO_PC.update({p:i for i,p in enumerate(PITCHES_FLAT)}) 
# PC = pitch class: the number assigned to each note {C:0, C#/Db:1, D:2, ...}

ENHARMONIC_ROOT = {"Cb": "B", "B#": "C", "Fb": "E", "E#": "F"}

QUAL_CANON = {
    "maj7":"maj7","M7":"maj7","Δ":"maj7","maj":"maj","M":"maj",
    "min":"m","m":"m","-":"m","m7":"m7","mMaj7":"mMaj7","mM7":"mMaj7",
    "dim":"dim","o":"dim","o7": "dim7", "dim7":"dim7","aug":"aug","+":"aug",
    "7":"7","9":"7","11":"7","13":"7",
    "ø":"m7b5","m7b5":"m7b5","halfdim":"m7b5",
    "sus2":"sus","sus4":"sus","sus":"sus",
    "6":"maj","69":"maj",
    "add9":"maj"
}

REDUCE_TO_CLASS = {
    "maj7":"maj7","maj":"maj7",
    "m":"m7","m7":"m7","mMaj7":"m7",
    "7":"7",
    "m7b5":"m7b5",
    "dim":"dim7","dim7":"dim7",
    "aug":"7","sus":"7",
}

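# Longer alternatives come before their prefixes ("min" before "m", "69" before "6")
# so that every listed spelling can actually match.
CHORD_RE = re.compile(r"""^\s*
    (?P<root>[A-Ga-g])(?P<acc>[#b♭♯]?)
    \s*
    (?P<qual>maj7|maj|M7|M|Δ|dim7|dim|m7b5|ø|o7|o|mMaj7|mM7|m7|min|m|aug|\+|7|9|11|13|69|6|sus2|sus4|sus|add9)?
    (?P<rest>.*?)
    \s*$""", re.VERBOSE)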

def parse_chord(token: str) -> Optional[Tuple[int, str]]:
    """Parse a chord symbol given as text and return it as
    (pitch_class, reduced_quality), or None if it cannot be parsed."""
    t = token.strip()
    if not t:
        return None
    m = CHORD_RE.match(t)
    if not m:
        return None
    root = m.group("root").upper()
    acc = m.group("acc").replace("♭","b").replace("♯","#")
    root_name = root + (acc if acc in ["#","b"] else "")
    if root_name in ENHARMONIC_ROOT:
        root_name = ENHARMONIC_ROOT[root_name]
    if root_name not in PITCH_TO_PC:
        return None
    pc = PITCH_TO_PC[root_name]
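    qual = m.group("qual") or ""
    qual = QUAL_CANON.get(qual, qual)
    if qual == "":
        # No recognised quality: guess minor from a stray "m", otherwise assume major
        rest = (m.group("rest") or "").lower()
        if "m" in rest and "maj" not in rest:
            qual = "m"
        else:
            qual = "maj"
    # Collapse to one of five classes (maj7, m7, 7, m7b5, dim7); unknown qualities fall back to "maj7"
    reduced = REDUCE_TO_CLASS.get(qual, "maj7")
    return (pc, reduced)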

def split_sequence(raw: str) -> List[str]:
    toks = re.split(r"[,\s;\|]+", str(raw).strip())
    return [t for t in toks if t]

def parse_sequence(raw: str) -> List[Tuple[int,str]]:
    out = []
    for t in split_sequence(raw):
        p = parse_chord(t)
        if p:
            out.append(p)
    return out

# Quick test
example = df['chordprog'].dropna().astype(str).iloc[0]
print("Raw example:", example[:120], "...")
print("Parsed:", parse_sequence(example)[:120], "...")


Raw example: Fm7 Dm7b5 | G7b9 C7b9 | Fm7 DbM7 | Bbm7 Eb7 |
Cm7 Fm7 | Bbm7 Eb7b9 | AbM7 Db7 | Gm7b5 C7 |
Fm7 Dm7b5 | G7b9 C7b9 | Fm7 D ...
Parsed: [(5, 'm7'), (2, 'm7b5'), (7, '7'), (0, '7'), (5, 'm7'), (1, 'maj7'), (10, 'm7'), (3, '7'), (0, 'm7'), (5, 'm7'), (10, 'm7'), (3, '7'), (8, 'maj7'), (1, '7'), (7, 'm7b5'), (0, '7'), (5, 'm7'), (2, 'm7b5'), (7, '7'), (0, '7'), (5, 'm7'), (1, 'maj7'), (10, 'm7'), (3, '7'), (0, 'm7'), (5, 'm7'), (10, 'm7'), (3, '7'), (8, 'maj7'), (3, '7'), (8, 'maj7'), (5, '7'), (10, 'm7'), (10, 'm7'), (3, '7'), (8, 'maj7'), (5, '7'), (10, 'm7'), (3, '7'), (8, 'maj7'), (0, '7'), (5, 'm7'), (2, 'm7b5'), (7, '7'), (0, '7'), (5, 'm7'), (1, 'maj7'), (10, 'm7'), (3, '7'), (0, 'm7'), (5, 'm7'), (10, 'm7'), (3, '7'), (8, 'maj7'), (3, '7'), (8, 'maj7'), (0, '7')] ...


In [20]:
tests = ["Cb", "B#", "Fb", "E#", "Fbo7", "Bbo7/F", "C/G", "NC"]
for s in tests:
    print(s, "->", parse_chord(s))

Cb -> (11, 'maj7')
B# -> (0, 'maj7')
Fb -> (4, 'maj7')
E# -> (5, 'maj7')
Fbo7 -> (4, 'dim7')
Bbo7/F -> (10, 'dim7')
C/G -> (0, 'maj7')
NC -> None
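
The enharmonic cases above (`Cb`, `B#`, `Fb`, `E#`) are handled in the notebook with a lookup table. As an illustration only, the same pitch classes can be obtained by arithmetic on the natural note plus its accidentals. `root_to_pc` and `NATURAL_PC` are hypothetical names that do not appear in the notebook, and this sketch also accepts double accidentals, which the notebook's regex does not.

```python
# Pitch classes of the seven natural notes
NATURAL_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def root_to_pc(root: str) -> int:
    """Return the pitch class (0-11) of a root name such as 'Cb', 'F#' or 'Bbb'."""
    letter, accidentals = root[0].upper(), root[1:]
    pc = NATURAL_PC[letter]
    for acc in accidentals:
        if acc in "#♯":
            pc += 1      # each sharp raises the note by a semitone
        elif acc in "b♭":
            pc -= 1      # each flat lowers it by a semitone
        else:
            raise ValueError(f"unexpected accidental {acc!r} in {root!r}")
    return pc % 12       # wrap around the octave (Cb -> 11, B# -> 0)

for name in ["Cb", "B#", "Fb", "E#", "Bb"]:
    print(name, "->", root_to_pc(name))
```

The modulo at the end is what makes `Cb` and `B#` wrap around the octave, matching the pitch classes printed by `parse_chord` above.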


## 4) Key detection (major/minor) and functional transcription

In [21]:
MAJOR_SCALE = [0,2,4,5,7,9,11]
MINOR_HARM  = [0,2,3,5,7,8,11]
ROMAN = ["I","II","III","IV","V","VI","VII"]

EXPECTED_QUAL_MAJOR = {0:"maj7", 1:"m7", 2:"m7", 3:"maj7", 4:"7", 5:"m7", 6:"m7b5"}
EXPECTED_QUAL_MINOR = {0:"m7",  1:"m7b5", 2:"maj7", 3:"m7", 4:"7", 5:"maj7", 6:"dim7"}

def degree_index_for_pc(pc: int, tonic: int, mode: str) -> Optional[int]:
    """
    Return the scale-degree index (0-6) of pitch class `pc` relative to `tonic`,
    or None if it is outside the scale (major, or harmonic minor for any other mode).
    """
    rel = (pc - tonic) % 12
    scale = MAJOR_SCALE if mode=="major" else MINOR_HARM
    return scale.index(rel) if rel in scale else None

# For each chord class, the expected classes that count as a near match
NEAR_MATCH = {"m7": {"m7b5"}, "m7b5": {"dim7"}, "dim7": {"m7b5"}}

def chord_score_for_key(pc: int, cls: str, tonic: int, mode: str) -> float:
    """
    Score how well one chord fits a candidate key:
    2.0 for a diatonic root with the expected quality, 1.0 for a near match,
    0.4 for a diatonic root with any other quality, 0.3 for a non-diatonic
    dominant seventh, and 0.0 otherwise.
    """
    deg = degree_index_for_pc(pc, tonic, mode)
    if deg is None:
        return 0.3 if cls == "7" else 0.0
    exp = EXPECTED_QUAL_MAJOR[deg] if mode=="major" else EXPECTED_QUAL_MINOR[deg]
    if cls == exp:
        return 2.0
    if exp in NEAR_MATCH.get(cls, ()):
        return 1.0
    return 0.4

def detect_key_for_sequence(parsed_seq: List[Tuple[int,str]]) -> Tuple[int,str,float]:
    """
    Estimate the key of a chord sequence: score all 24 (tonic, mode) candidates
    and return the best one as (tonic_pc, mode, score).
    """
    best = None
    for tonic in range(12):
        for mode in ["major","minor"]:
            total = 0.0
            for pc, cls in parsed_seq:
                total += chord_score_for_key(pc, cls, tonic, mode)
            # Bonus for a V -> I/i cadence at the end of the sequence
            if len(parsed_seq) >= 2:
                pc_prev, cls_prev = parsed_seq[-2]
                pc_last, cls_last = parsed_seq[-1]
                if degree_index_for_pc(pc_last, tonic, mode) == 0:
                    rel_prev = (pc_prev - tonic) % 12
                    if rel_prev == 7:  # V
                        total += 1.5
            key = (tonic, mode, total)
            if best is None or total > best[2]:
                best = key
    return best

def roman_for_chord(pc: int, cls: str, tonic: int, mode: str) -> str:
    """
    Return the Roman-numeral label of a chord in the given key.
    """
    rel = (pc - tonic) % 12
    scale = MAJOR_SCALE if mode=="major" else MINOR_HARM
    if rel in scale:
        deg = scale.index(rel)
        base = ROMAN[deg]
        if cls in ["m7","m7b5","dim7"]:
            rn = base.lower()
            if cls=="m7b5":
                rn += "ø"
            elif cls=="dim7":
                rn += "o"
        elif cls == "7":
            rn = base + "7"
        else:
            rn = base
        return rn
    # Common chromatic roots: bII, bIII, bVI, bVII (m7b5/dim7 get no ø/o suffix here)
    if rel in [1,3,8,10]:
        mapping = {1:"bII", 3:"bIII", 8:"bVI", 10:"bVII"}
        rn = mapping[rel]
        if cls in ["m7","m7b5","dim7"]:
            rn = rn.lower()
        elif cls=="7":
            rn += "7"
        return rn
    return f"({rel})"

def sequence_to_roman(parsed_seq: List[Tuple[int,str]], tonic: int, mode: str) -> List[str]:
    return [roman_for_chord(pc, cls, tonic, mode) for pc, cls in parsed_seq]


In [22]:
# Demo on a single row
parsed = parse_sequence(df['chordprog'].dropna().astype(str).iloc[0])
tonic, mode, score = detect_key_for_sequence(parsed)
romans = sequence_to_roman(parsed, tonic, mode)
print("Detected key -> tonic_pc:", tonic, "| mode:", mode, "| score:", round(score,2))
romans[:12]

Detected key -> tonic_pc: 8 | mode: major | score: 88.8


['vi', '(6)', 'VII7', 'III7', 'vi', 'IV', 'ii', 'V7', 'iii', 'vi', 'ii', 'V7']
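
To see where the first labels come from, the interval arithmetic can be reproduced by hand. The sketch below is a simplified, self-contained restatement of the major-mode lookup, not the notebook's own functions. `degree_in_major` is a hypothetical name, and the pitch classes are copied from the parsed output in section 3.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII"]

def degree_in_major(pc: int, tonic: int):
    """Return the Roman degree of `pc` in the major key on `tonic`, or None if chromatic."""
    rel = (pc - tonic) % 12  # semitones above the tonic
    return ROMAN[MAJOR_SCALE.index(rel)] if rel in MAJOR_SCALE else None

tonic = 8  # Ab, the detected tonic
for name, pc in [("Fm7", 5), ("Dm7b5", 2), ("G7b9", 7), ("C7b9", 0)]:
    print(name, "rel =", (pc - tonic) % 12, "->", degree_in_major(pc, tonic))
```

Fm7 sits 9 semitones above Ab, which is degree VI (lower-cased to `vi` because the chord is minor). Dm7b5 sits 6 semitones above the tonic, which is neither in the scale nor in the chromatic mapping, hence the `(6)` label.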

## 5) Creating the 'funcional_prog' column

In [23]:
rows = []
for idx, row in df.iterrows():
    raw = row.get("chordprog")
    if not isinstance(raw, str) or not raw.strip():
        continue
    parsed = parse_sequence(raw)
    if len(parsed) < 2:
        continue
    tonic, mode, score = detect_key_for_sequence(parsed)
    roman_seq = sequence_to_roman(parsed, tonic, mode)

    rows.append({
        # original columns
        "title": row.get("title"),
        "composedby": row.get("composedby"),
        "key": row.get("key"),
        "timesig": row.get("timesig"),
        "bars": row.get("bars"),
        "chordprog": raw,
        # new columns
        "key_tonic_pc": tonic,
        "key_mode": mode,
        "key_score": score,
        "num_tokens": len(roman_seq),
        "funcional_prog": " ".join(roman_seq),
    })

proc = pd.DataFrame(rows)
print("Processed rows:", len(proc))
proc.head(5)

Processed rows: 2613


Unnamed: 0,title,composedby,key,timesig,bars,chordprog,key_tonic_pc,key_mode,key_score,num_tokens,funcional_prog
0,Lullaby of Birdland,George Shearing,Ab,4 4,32,Fm7 Dm7b5 | G7b9 C7b9 | Fm7 DbM7 | Bbm7 Eb7 |\nCm7 Fm7 | Bbm7 Eb7b9 | AbM7 Db7 | Gm7b5 C7 |\nFm7 Dm7b5 | G7b9 C7b9 |...,8,major,88.8,57,vi (6) VII7 III7 vi IV ii V7 iii vi ii V7 I IV7 viiø III7 vi (6) VII7 III7 vi IV ii V7 iii vi ii V7 I V7 I VI7 ii ii...
1,It's A Most Unusual Day,Jimmy McHugh and HYarold Adamson,G,3 4,72,F#/G F#/G G | Em7 | Am7 | D7 |\nF#/G F#/G G | Em7 | Am7 | D7 |\nG/B | CM7 | C#o7 | G/D |\nBm7 | Em7 | A7 | D7 |\nF#/...,7,major,137.4,92,VII VII I vi ii V7 VII VII I vi ii V7 I IV (6) I iii vi II7 V7 VII VII I vi ii V7 VII VII I vi ii V7 I IV (6) I ii V...
2,Jump Monk,Charles Mingus,Ab,4 4,54,Fm7 DbM7 | G7b5 C7 | Fm7 DbM7 | G7b5 C7 |\nFm7 | DbM7 | Gm7b5 | C7 |\nFm7 | DbM7 | Gm7b5 | C7 |\nFm7 | Fm7/Eb | Db7 ...,5,minor,85.0,58,i VI II7 V7 i VI II7 V7 i VI iiø V7 i VI iiø V7 i i VI7 V7 iv iv iiø bII7 iv bII vø I7 iv bII vø I7 iv bII vø I7 iv ...
3,Nuages,Django Reinhardt and Jacques Larme,G,4 4,32,Bbm7 Eb7 | Am7b5 D7b9 | G6 Am7 | Bm7 |\nBbm7 Eb7 | Am7b5 D7b9 | G6 | G6 |\nF#m7b5 | B7 | Em7 | Em7 |\nA7 Ab7 | A7 | ...,7,major,55.2,49,biii bVI7 iiø V7 I ii iii biii bVI7 iiø V7 I I viiø III7 vi vi II7 bII7 II7 V7 ii V7 biii bVI7 iiø V7 I I bvi bII7 v...
4,Love Me Do,John Lennon and Paul McCartney,G,4 4,48,G | C | G | C |\nG | C | C | C |\nC | G | C | G |\nC | G | C | G |\nC | G | C | C |\nC | C | G | C |\nG | G | D | D ...,7,major,93.6,50,I IV I IV I IV IV IV IV I IV I IV I IV I IV I IV IV IV IV I IV I I V V IV I I V V IV I I I IV I IV I IV IV IV IV I I...


## 6) Adding the 'key_estimada' column

In [24]:
# Build the 'key_estimada' column: estimated key name, with an "m" suffix for minor keys
tonicpc_to_key = {0: 'C', 1: 'Db', 2: 'D', 3: 'Eb', 4: 'E', 5: 'F', 6: 'Gb', 7: 'G', 8: 'Ab', 9: 'A', 10: 'Bb', 11: 'B'}

proc["key_estimada"] = np.where(
    proc["key_mode"].eq("minor"),
    proc["key_tonic_pc"].map(tonicpc_to_key).astype(str) + "m",
    proc["key_tonic_pc"].map(tonicpc_to_key).astype(str)
)

# Quick look at the result
proc[['key', 'key_tonic_pc', 'key_mode', 'key_estimada']].head()

Unnamed: 0,key,key_tonic_pc,key_mode,key_estimada
0,Ab,8,major,Ab
1,G,7,major,G
2,Ab,5,minor,Fm
3,G,7,major,G
4,G,7,major,G
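
In the preview above, row 2 is annotated as Ab but estimated as Fm, which is the relative minor of Ab. A minimal sketch of how the two columns could be compared follows. `key_relation` and `NAME_TO_PC` are hypothetical names, and the sketch assumes the `key` column holds flat-spelled major key names like the five shown, which may not be true of the whole dataset.

```python
# Flat-spelled key names -> pitch class (assumption: `key` uses this spelling)
NAME_TO_PC = {"C": 0, "Db": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
              "Gb": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11}

def key_relation(annotated: str, tonic_pc: int, mode: str) -> str:
    """Compare an annotated major key with an estimated (tonic_pc, mode) pair."""
    ann_pc = NAME_TO_PC.get(annotated)
    if ann_pc is None:
        return "unknown"                      # spelling not covered by the table
    if mode == "major" and tonic_pc == ann_pc:
        return "same"
    if mode == "minor" and tonic_pc == (ann_pc + 9) % 12:
        return "relative minor"               # tonic a major sixth above the major tonic
    return "different"

preview_rows = [("Ab", 8, "major"), ("G", 7, "major"), ("Ab", 5, "minor")]
for annotated, tonic_pc, mode in preview_rows:
    print(annotated, "->", key_relation(annotated, tonic_pc, mode))
```

Counting the three outcomes over all rows would give a rough sanity check of the key detector against the annotated keys.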


## 7) Adding a column that counts 'rare' chords

In [25]:
def acordes_unicos(df, columna="funcional_prog"):
    """
    Print the number of distinct chord tokens found in the given column and the
    sorted list of tokens, ignoring the bar-line symbol '|'.
    """
    progresiones = df[columna].dropna().tolist()
    tokens = set()
    for prog in progresiones:
        for t in prog.split():
            if t != '|':
                tokens.add(t)
    print(f"Total distinct chords: {len(tokens)}")
    print(sorted(tokens))

In [26]:
print("Unique chords in the functional transcription (original method):")
acordes_unicos(proc, columna='funcional_prog')

Unique chords in the functional transcription (original method):
Total distinct chords: 50
['(4)', '(6)', '(9)', 'I', 'I7', 'II', 'II7', 'III', 'III7', 'IV', 'IV7', 'V', 'V7', 'VI', 'VI7', 'VII', 'VII7', 'bII', 'bII7', 'bIII', 'bIII7', 'bVI', 'bVI7', 'bVII', 'bVII7', 'bii', 'biii', 'bvi', 'bvii', 'i', 'ii', 'iii', 'iiio', 'iiiø', 'iio', 'iiø', 'io', 'iv', 'ivo', 'ivø', 'iø', 'v', 'vi', 'vii', 'viio', 'viiø', 'vio', 'viø', 'vo', 'vø']


In [27]:
# Create the 'rare_chords' column: number of '(4)', '(6)' or '(9)' labels in each progression (literal parentheses)
proc['rare_chords'] = (
    proc['funcional_prog']
        .astype(str)
        .str.count(r'\(4\)|\(6\)|\(9\)')
)
# (optional) quick look
proc[['funcional_prog','rare_chords']].head()

Unnamed: 0,funcional_prog,rare_chords
0,vi (6) VII7 III7 vi IV ii V7 iii vi ii V7 I IV7 viiø III7 vi (6) VII7 III7 vi IV ii V7 iii vi ii V7 I V7 I VI7 ii ii...,3
1,VII VII I vi ii V7 VII VII I vi ii V7 I IV (6) I iii vi II7 V7 VII VII I vi ii V7 VII VII I vi ii V7 I IV (6) I ii V...,3
2,i VI II7 V7 i VI II7 V7 i VI iiø V7 i VI iiø V7 i i VI7 V7 iv iv iiø bII7 iv bII vø I7 iv bII vø I7 iv bII vø I7 iv ...,0
3,biii bVI7 iiø V7 I ii iii biii bVI7 iiø V7 I I viiø III7 vi vi II7 bII7 II7 V7 ii V7 biii bVI7 iiø V7 I I bvi bII7 v...,0
4,I IV I IV I IV IV IV IV I IV I IV I IV I IV I IV IV IV IV I IV I I V V IV I I V V IV I I I IV I IV I IV IV IV IV I I...,0
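
Beyond a per-song count, it can help to know what share of all tokens ends up with a parenthesised, unlabelled root. The sketch below shows one way to measure that with `collections.Counter`. It runs on two short prefixes copied from the rows above, not on the full `proc` table, and the variable names are illustrative.

```python
import re
from collections import Counter

# Prefixes of rows 0 and 4 from the preview above
progressions = [
    "vi (6) VII7 III7 vi IV ii V7",
    "I IV I IV I IV IV IV",
]

# Frequency of every functional token across all progressions
counts = Counter(tok for prog in progressions for tok in prog.split())

# Tokens of the form "(n)" are roots without a Roman-numeral label
unlabelled = {tok: c for tok, c in counts.items() if re.fullmatch(r"\(\d+\)", tok)}
share = sum(unlabelled.values()) / sum(counts.values())

print(counts.most_common(3))
print(unlabelled, f"{share:.1%}")
```

Matching `\(\d+\)` instead of listing `(4)`, `(6)` and `(9)` explicitly would also catch any other unlabelled root that a different scale or mapping might produce.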


In [None]:
# Export the processed DataFrame to a CSV file
proc.to_csv('../../data/songdb_funcional_v3.csv', index=False, encoding='utf-8')
print("Saved:", '../../data/songdb_funcional_v3.csv')

Saved: ../../data/songdb_funcional_v3.csv


## 8) Baseline 0: trigram model with simple backoff

In [12]:
class StupidBackoffLM:
    def __init__(self, n:int=3, alpha:float=0.4):
        self.n = n
        self.alpha = alpha
        self.counts = [defaultdict(int) for _ in range(n)]
        self.context_counts = [defaultdict(int) for _ in range(n)]
        self.vocab = set()

    def fit(self, sequences: List[List[str]]):
        self.counts = [defaultdict(int) for _ in range(self.n)]
        self.context_counts = [defaultdict(int) for _ in range(self.n)]
        self.vocab = set()
        for seq in sequences:
            tokens = ["<s>"]*(self.n-1) + seq + ["</s>"]
            self.vocab.update(tokens)
            for i in range(len(tokens)):
                for k in range(1, self.n+1):
                    if i-k+1 < 0: 
                        continue
                    ngram = tuple(tokens[i-k+1:i+1])
                    context = ngram[:-1]
                    self.counts[k-1][ngram] += 1
                    self.context_counts[k-1][context] += 1

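    def predict_next(self, context_tokens: List[str], top_k:int=5) -> List[Tuple[str,float]]:
        """Return the top_k (token, score) candidates for the token that follows the context."""
        # Left-pad with <s> and keep only the last n-1 context tokens
        ctx = tuple(["<s>"]*(self.n-1) + context_tokens)[-(self.n-1):]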
        candidates = [t for t in self.vocab if t not in {"<s>","</s>"}]
        scores = []
        for tok in candidates:
            weight = 1.0
            score = 0.0
            for k in range(self.n, 0, -1):
                subctx = ctx[-(k-1):] if k>1 else tuple()
                ngram = subctx + (tok,)
                c = self.counts[k-1].get(ngram, 0)
                if c > 0:
                    denom = self.context_counts[k-1].get(subctx, 1)
                    score += weight * (c / denom)
                    break
                else:
                    weight *= self.alpha
            if score > 0:
                scores.append((tok, score))
        scores.sort(key=lambda x: x[1], reverse=True)
        return scores[:top_k]

# Prepare the sequences and train
seqs = []
for s in proc["funcional_prog"].dropna().astype(str):
    toks = s.split()
    if len(toks) >= 2:
        seqs.append(toks)

lm = StupidBackoffLM(n=3, alpha=0.4)
lm.fit(seqs)
print("Vocab size:", len(lm.vocab))


Vocab size: 50
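
The scoring rule in `predict_next` uses the relative frequency at the longest context that has been seen with the candidate, and multiplies by `alpha` once for every order it had to drop. The worked example below reproduces that arithmetic on three toy progressions. It is a simplified sketch, not the class above: it omits the `<s>`/`</s>` padding, and `count_ngrams`, `backoff_score` and `toy_seqs` are hypothetical names.

```python
from collections import Counter

def count_ngrams(sequences, n=3):
    """Count all 1..n-grams and how often each context precedes some token."""
    ngrams, contexts = Counter(), Counter()
    for seq in sequences:
        for i in range(len(seq)):
            for k in range(1, n + 1):
                if i - k + 1 >= 0:
                    gram = tuple(seq[i - k + 1:i + 1])
                    ngrams[gram] += 1
                    contexts[gram[:-1]] += 1
    return ngrams, contexts

def backoff_score(ngrams, contexts, context, token, alpha=0.4):
    """Stupid-backoff score of `token` after `context` (at most n-1 tokens)."""
    ctx, weight = tuple(context), 1.0
    while True:
        count = ngrams.get(ctx + (token,), 0)
        if count > 0:
            return weight * count / contexts[ctx]
        if not ctx:
            return 0.0           # token never seen, even as a unigram
        ctx = ctx[1:]            # drop the oldest context token
        weight *= alpha          # and pay the backoff penalty

toy_seqs = [["ii", "V7", "I"], ["ii", "V7", "vi"], ["IV", "V7", "I"]]
ngrams, contexts = count_ngrams(toy_seqs)

print(backoff_score(ngrams, contexts, ["ii", "V7"], "I"))   # trigram seen: 1/2
print(backoff_score(ngrams, contexts, ["IV", "V7"], "vi"))  # backs off to the bigram: 0.4 * 1/3
print(backoff_score(ngrams, contexts, ["I", "ii"], "IV"))   # backs off to the unigram: 0.4**2 * 1/9
```

These scores are not normalised probabilities, which is why the suggestions for one context do not need to sum to 1.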


## 9) Evaluation: Top-1 / Top-3 and MRR

In [13]:
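def topk_accuracy(seqs: List[List[str]], k:int=3) -> float:
    """Fraction of positions whose true next chord is among the top-k predictions of the global `lm`."""
    correct = total = 0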
    for seq in seqs:
        for i in range(1, len(seq)):
            ctx = seq[max(0, i-2):i]
            gold = seq[i]
            preds = lm.predict_next(ctx, top_k=k)
            pred_tokens = [p[0] for p in preds]
            if gold in pred_tokens:
                correct += 1
            total += 1
    return correct / total if total>0 else 0.0

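def mean_reciprocal_rank(seqs: List[List[str]]) -> float:
    """Mean of 1/rank of the true next chord (0 when it gets no score), using the global `lm`."""
    s = 0.0
    n = 0
    for seq in seqs:
        for i in range(1, len(seq)):
            ctx = seq[max(0, i-2):i]
            gold = seq[i]
            preds = lm.predict_next(ctx, top_k=len(lm.vocab))  # rank over the whole vocabulary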
            ranks = [j+1 for j,(t,_) in enumerate(preds) if t==gold]
            s += (1.0 / ranks[0]) if ranks else 0.0
            n += 1
    return s/n if n>0 else 0.0

# `lm` was fitted on these same sequences, so the figures below are in-sample
acc1 = topk_accuracy(seqs, k=1)
acc3 = topk_accuracy(seqs, k=3)
mrr = mean_reciprocal_rank(seqs)

print(f"Top-1: {acc1:.3f} | Top-3: {acc3:.3f} | MRR: {mrr:.3f}")

Top-1: 0.435 | Top-3: 0.693 | MRR: 0.593
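
Because the model above is evaluated on the sequences it was trained on, the metrics are likely optimistic. A held-out estimate needs a split at song level, so that no song contributes to both sets. The following is a minimal sketch of such a split. `split_sequences`, `test_frac` and `seed` are hypothetical names, and the 80/20 proportion is an arbitrary choice.

```python
import random
from typing import List, Tuple

def split_sequences(seqs: List[List[str]], test_frac: float = 0.2,
                    seed: int = 0) -> Tuple[List[List[str]], List[List[str]]]:
    """Shuffle song indices with a fixed seed and split them into train and test."""
    idx = list(range(len(seqs)))
    random.Random(seed).shuffle(idx)          # local RNG: does not touch global state
    n_test = max(1, int(round(len(seqs) * test_frac)))
    test_idx = set(idx[:n_test])
    train = [s for i, s in enumerate(seqs) if i not in test_idx]
    test = [s for i, s in enumerate(seqs) if i in test_idx]
    return train, test

toy_seqs = [[f"song{i}", "ii", "V7", "I"] for i in range(10)]
train, test = split_sequences(toy_seqs)
print(len(train), len(test))
```

With such a split, the model would be fitted on `train` only and the three metrics computed on `test`. Chords that never occur in `train` get no score from the model, so they count as misses.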


## 10) Autocompletion demo (suggestions)

In [16]:

examples = []
for s in random.sample(seqs, min(5, len(seqs))):
    context = s[:2] if len(s)>=2 else s[:1]
    preds = lm.predict_next(context, top_k=5)
    examples.append({
        "context": " ".join(context),
        "suggestions": " | ".join([f"{tok} ({score:.3f})" for tok,score in preds])
    })
pd.DataFrame(examples)


Unnamed: 0,context,suggestions
0,I I7,IV (0.528) | I7 (0.125) | I (0.103) | IV7 (0.084) | ii (0.029)
1,I IV,I (0.377) | IV (0.160) | iii (0.076) | V7 (0.069) | V (0.051)
2,I I,I (0.343) | ii (0.086) | V7 (0.069) | IV (0.061) | I7 (0.048)
3,I VI,III (0.235) | VI (0.235) | I (0.118) | ii (0.118) | VII7 (0.059)
4,I I,I (0.343) | ii (0.086) | V7 (0.069) | IV (0.061) | I7 (0.048)


## 11) Roadmap (next improvements)

- **Local key segmentation** with an HMM/Viterbi decoder (24 keys) to detect passages in minor and modulations.
- **Secondary dominants and substitutions**: explicit labels (V/ii, V/V, tritone substitution) to enrich the functional vocabulary.
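
As a rough idea of what the second item could look like, a dominant seventh can be labelled by how its root moves to the next chord: down a perfect fifth for an applied dominant, down a semitone for a tritone substitution. This is only a sketch under simplifying assumptions (major keys, root motion alone, no check of the target's quality). `applied_dominant_label` and `TARGET_NAME` are hypothetical names.

```python
# Names of diatonic targets in a major key, indexed by semitones above the tonic
TARGET_NAME = {0: "I", 2: "ii", 4: "iii", 5: "IV", 7: "V", 9: "vi"}

def applied_dominant_label(pc: int, cls: str, next_pc: int, tonic: int):
    """Label a '7' chord by its resolution to the next root, or return None."""
    if cls != "7":
        return None
    move = (pc - next_pc) % 12
    if move == 7:
        prefix = "V7"        # root falls a perfect fifth
    elif move == 1:
        prefix = "subV7"     # root falls a semitone (tritone substitution)
    else:
        return None
    target = (next_pc - tonic) % 12
    if target == 0:
        return prefix        # resolves to the tonic: primary function
    name = TARGET_NAME.get(target, f"({target})")
    return f"{prefix}/{name}"

# In C major: A7 -> Dm7, D7 -> G7, G7 -> C, Db7 -> C
for pc, nxt in [(9, 2), (2, 7), (7, 0), (1, 0)]:
    print(applied_dominant_label(pc, "7", nxt, tonic=0))
```

Labels of this kind would replace tokens such as `VI7` or `II7` in `funcional_prog`, at the cost of a larger vocabulary for the n-gram model.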
