# Functional chord transcription

This notebook generates the **dataset 'songdb_funcional_v4.csv'** (Roman numerals plus major/minor key).

1) Importing libraries
2) Loading the original dataset
3) Chord parsing and type normalization (m7, maj7, 7, m7b5, dim7)
4) Key detection and functional transcription (Dm, G7, C > ii, V7, I)
5) Building and exporting the *songdb_funcional_v4.csv* dataset


> **How to use this notebook**  
> 1. Run the cells in order, from top to bottom.  
> 2. Make sure the base file `songdb.csv` is at the expected path (the loading cell reads `../../data/songdb.csv`).  
> 3. At the end, the processed results are saved in the same `data/` folder.


## 1) Importing libraries

In [1]:

import re, os, json, math, zipfile, random
from collections import defaultdict
from typing import List, Tuple, Optional

import numpy as np
import pandas as pd

pd.set_option("display.max_colwidth", 120)
print("Versions -> pandas:", pd.__version__, "| numpy:", np.__version__)

Versions -> pandas: 2.3.2 | numpy: 2.3.2


## 2) Loading the original dataset

In [2]:
df = pd.read_csv('../../data/songdb.csv')  # base dataset with chord progressions
print("Rows:", len(df))
df.head(5)

Rows: 2613


Unnamed: 0,title,composedby,key,timesig,bars,chordprog
0,Lullaby of Birdland,George Shearing,Ab,4 4,32,Fm7 Dm7b5 | G7b9 C7b9 | Fm7 DbM7 | Bbm7 Eb7 |\nCm7 Fm7 | Bbm7 Eb7b9 | AbM7 Db7 | Gm7b5 C7 |\nFm7 Dm7b5 | G7b9 C7b9 |...
1,It's A Most Unusual Day,Jimmy McHugh and HYarold Adamson,G,3 4,72,F#/G F#/G G | Em7 | Am7 | D7 |\nF#/G F#/G G | Em7 | Am7 | D7 |\nG/B | CM7 | C#o7 | G/D |\nBm7 | Em7 | A7 | D7 |\nF#/...
2,Jump Monk,Charles Mingus,Ab,4 4,54,Fm7 DbM7 | G7b5 C7 | Fm7 DbM7 | G7b5 C7 |\nFm7 | DbM7 | Gm7b5 | C7 |\nFm7 | DbM7 | Gm7b5 | C7 |\nFm7 | Fm7/Eb | Db7 ...
3,Nuages,Django Reinhardt and Jacques Larme,G,4 4,32,Bbm7 Eb7 | Am7b5 D7b9 | G6 Am7 | Bm7 |\nBbm7 Eb7 | Am7b5 D7b9 | G6 | G6 |\nF#m7b5 | B7 | Em7 | Em7 |\nA7 Ab7 | A7 | ...
4,Love Me Do,John Lennon and Paul McCartney,G,4 4,48,G | C | G | C |\nG | C | C | C |\nC | G | C | G |\nC | G | C | G |\nC | G | C | C |\nC | C | G | C |\nG | G | D | D ...
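
The `chordprog` strings shown above separate bars with `|` and lines with newlines. As a minimal sketch of how such a string can be tokenized, the snippet below splits on any run of whitespace, commas, semicolons or bar lines, which is the same separator pattern that `split_sequence` uses in section 3. The helper name `tokenize` is hypothetical and the sample string is shortened from the first row.

```python
import re

def tokenize(raw: str) -> list:
    """Split a raw progression on whitespace, commas, semicolons and bar lines."""
    tokens = re.split(r"[,\s;|]+", raw.strip())
    # A trailing bar line leaves an empty string at the end; drop empty tokens.
    return [t for t in tokens if t]

sample = "Fm7 Dm7b5 | G7b9 C7b9 |\nCm7 Fm7 |"
print(tokenize(sample))
# ['Fm7', 'Dm7b5', 'G7b9', 'C7b9', 'Cm7', 'Fm7']
```

Bar boundaries are discarded at this point, so the result is a flat list of chord symbols with no information about how long each chord lasts.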


## 3) Chord parser and type normalization

In [3]:
PITCHES_SHARP = ["C","C#","D","D#","E","F","F#","G","G#","A","A#","B"]
PITCHES_FLAT  = ["C","Db","D","Eb","E","F","Gb","G","Ab","A","Bb","B"]
PITCH_TO_PC = {p:i for i,p in enumerate(PITCHES_SHARP)}
PITCH_TO_PC.update({p:i for i,p in enumerate(PITCHES_FLAT)}) 
# PC = pitch class: the number assigned to each note {C:0, C#/Db:1, D:2, ...}

ENHARMONIC_ROOT = {"Cb": "B", "B#": "C", "Fb": "E", "E#": "F"}

QUAL_CANON = {
    "maj7":"maj7","M7":"maj7","Δ":"maj7","maj":"maj","M":"maj",
    "min":"m","m":"m","-":"m","m7":"m7","mMaj7":"mMaj7","mM7":"mMaj7",
    "dim":"dim","o":"dim","o7": "dim7", "dim7":"dim7","aug":"aug","+":"aug",
    "7":"7","9":"7","11":"7","13":"7",
    "ø":"m7b5","m7b5":"m7b5","halfdim":"m7b5",
    "sus2":"sus","sus4":"sus","sus":"sus",
    "6":"maj","69":"maj",
    "add9":"maj"
}

REDUCE_TO_CLASS = {
    "maj7":"maj7","maj":"maj7",
    "m":"m7","m7":"m7","mMaj7":"m7",
    "7":"7",
    "m7b5":"m7b5",
    "dim":"dim7","dim7":"dim7",
    "aug":"7","sus":"7",
}

CHORD_RE = re.compile(r"""^\s*
    (?P<root>[A-Ga-g])(?P<acc>[#b♭♯]?)
    \s*
    (?P<qual>maj7|maj|M7|M|Δ|dim7|dim|m7b5|ø|o7|o|mMaj7|mM7|m7|m|min|aug|\+|7|9|11|13|6|69|sus2|sus4|sus|add9)?
    (?P<rest>.*?)
    \s*$""", re.VERBOSE)

def parse_chord(token: str) -> Optional[Tuple[int, str]]:
    """Parse a chord given as text and return it as
    (pitch_class, reduced_quality_class), or None if it cannot be parsed."""
    t = token.strip()
    if not t:
        return None
    m = CHORD_RE.match(t)
    if not m:
        return None
    root = m.group("root").upper()
    acc = m.group("acc").replace("♭","b").replace("♯","#")
    root_name = root + (acc if acc in ["#","b"] else "")
    if root_name in ENHARMONIC_ROOT:
        root_name = ENHARMONIC_ROOT[root_name]
    if root_name not in PITCH_TO_PC:
        return None
    pc = PITCH_TO_PC[root_name]
    qual = m.group("qual") or ""
    qual = QUAL_CANON.get(qual, qual)
    if qual == "":
        rest = (m.group("rest") or "").lower()
        if "m" in rest and "maj" not in rest:
            qual = "m"
        else:
            qual = "maj"
    qual = QUAL_CANON.get(qual, qual)
    reduced = REDUCE_TO_CLASS.get(qual, "maj7")
    return (pc, reduced)

def split_sequence(raw: str) -> List[str]:
    toks = re.split(r"[,\s;\|]+", str(raw).strip())
    return [t for t in toks if t]

def parse_sequence(raw: str) -> List[Tuple[int,str]]:
    out = []
    for t in split_sequence(raw):
        p = parse_chord(t)
        if p:
            out.append(p)
    return out
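
Quality normalization in the cell above happens in two steps: a spelling is first mapped to a canonical name, and that name is then collapsed into one of five classes (`maj7`, `m7`, `7`, `m7b5`, `dim7`). The sketch below reproduces the idea with a small excerpt of the two tables. The names `CANON_EXCERPT`, `CLASS_EXCERPT` and `reduce_quality` are illustrative and not part of the notebook.

```python
# Excerpt of the spelling -> canonical-name table (assumed subset of QUAL_CANON).
CANON_EXCERPT = {"M7": "maj7", "6": "maj", "-": "m", "9": "7",
                 "sus4": "sus", "ø": "m7b5", "o": "dim"}
# Excerpt of the canonical-name -> class table (assumed subset of REDUCE_TO_CLASS).
CLASS_EXCERPT = {"maj7": "maj7", "maj": "maj7", "m": "m7", "m7": "m7",
                 "7": "7", "sus": "7", "m7b5": "m7b5", "dim": "dim7"}

def reduce_quality(spelling: str) -> str:
    """Two-step reduction: spelling -> canonical name -> quality class."""
    canonical = CANON_EXCERPT.get(spelling, spelling)
    # Unknown names fall back to 'maj7', mirroring the default in parse_chord.
    return CLASS_EXCERPT.get(canonical, "maj7")

for spelling in ["6", "9", "sus4", "o", "m7"]:
    print(spelling, "->", reduce_quality(spelling))
```

The reduction is lossy by design: a sixth chord and a major seventh chord end up in the same class, and suspended chords are treated as dominants.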


In [4]:
#@title Chord parsing tests
tests = ["Cb", "B#", "Fb", "E#", "Fbo7", "Bbo7/F", "C/G", "NC"]
for s in tests:
    print(s, "->", parse_chord(s))

Cb -> (11, 'maj7')
B# -> (0, 'maj7')
Fb -> (4, 'maj7')
E# -> (5, 'maj7')
Fbo7 -> (4, 'dim7')
Bbo7/F -> (10, 'dim7')
C/G -> (0, 'maj7')
NC -> None
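
One subtlety when reading `CHORD_RE`: Python tries regex alternatives from left to right and keeps the first one that lets the match succeed, so a short alternative listed before a longer one with the same prefix shadows it. The toy patterns below are simplified and are not the notebook's actual expression; they show the effect for `6|69`. In the parser this is harmless, because both spellings normalize to the same class and the leftover text is only inspected when no quality was matched.

```python
import re

# Same alternatives, different order; the root is fixed to "C" for brevity.
short_first = re.compile(r"^C(?P<qual>6|69)?(?P<rest>.*)$")
long_first = re.compile(r"^C(?P<qual>69|6)?(?P<rest>.*)$")

m_short = short_first.match("C69")
m_long = long_first.match("C69")

print(m_short.group("qual"), repr(m_short.group("rest")))  # 6 '9'
print(m_long.group("qual"), repr(m_long.group("rest")))    # 69 ''
```

Listing longer alternatives first is the usual way to avoid this, which is what the pattern already does for `m7b5` before `m7` before `m`.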


## 4) Key detection (major/minor) and functional transcription

In [5]:
MAJOR_SCALE = [0,2,4,5,7,9,11]
MINOR_HARM  = [0,2,3,5,7,8,11]
ROMAN = ["I","II","III","IV","V","VI","VII"]

EXPECTED_QUAL_MAJOR = {0:"maj7", 1:"m7", 2:"m7", 3:"maj7", 4:"7", 5:"m7", 6:"m7b5"}
EXPECTED_QUAL_MINOR = {0:"m7",  1:"m7b5", 2:"maj7", 3:"m7", 4:"7", 5:"maj7", 6:"dim7"}

def degree_index_for_pc(pc: int, tonic: int, mode: str) -> Optional[int]:
    """
    Return the scale-degree index (0-6) of a pitch class relative to the tonic and mode,
    or None if the pitch class is not in the scale.
    """
    rel = (pc - tonic) % 12
    scale = MAJOR_SCALE if mode=="major" else MINOR_HARM
    return scale.index(rel) if rel in scale else None

def chord_score_for_key(pc: int, cls: str, tonic: int, mode: str) -> float:
    """
    Score how well a chord (root pitch class + quality class) fits a given key.
    """
    deg = degree_index_for_pc(pc, tonic, mode)
    score = 0.0
    if deg is not None:
        exp = EXPECTED_QUAL_MAJOR[deg] if mode=="major" else EXPECTED_QUAL_MINOR[deg]
        if cls == exp:
            score += 2.0
        # near match: minor vs. half-diminished, or half-diminished vs. diminished
        elif (cls, exp) in {("m7", "m7b5"), ("m7b5", "dim7"), ("dim7", "m7b5")}:
            score += 1.0
        else:
            score += 0.4
    else:
        if cls == "7":
            score += 0.3
    return score

def detect_key_for_sequence(parsed_seq: List[Tuple[int,str]]) -> Tuple[int,str,float]:
    """
    Detect the key of a chord sequence by scoring all 24 major/minor keys.
    Returns (tonic_pc, mode, score) for the best-scoring key.
    """
    best = None
    for tonic in range(12):
        for mode in ["major","minor"]:
            total = 0.0
            for pc, cls in parsed_seq:
                total += chord_score_for_key(pc, cls, tonic, mode)
            # Bonus for a V->I/i cadence at the end (only the chord roots are checked)
            if len(parsed_seq) >= 2:
                pc_prev, cls_prev = parsed_seq[-2]
                pc_last, cls_last = parsed_seq[-1]
                if degree_index_for_pc(pc_last, tonic, mode) == 0:
                    rel_prev = (pc_prev - tonic) % 12
                    if rel_prev == 7:  # V
                        total += 1.5
            key = (tonic, mode, total)
            if best is None or total > best[2]:
                best = key
    return best

def roman_for_chord(pc: int, cls: str, tonic: int, mode: str) -> str:
    """
    Return the Roman-numeral representation (functional degree)
    of a given chord.
    """
    rel = (pc - tonic) % 12
    scale = MAJOR_SCALE if mode=="major" else MINOR_HARM
    if rel in scale:
        deg = scale.index(rel)
        base = ROMAN[deg]
        if cls in ["m7","m7b5","dim7"]:
            rn = base.lower()
            if cls=="m7b5":
                rn += "ø"
            elif cls=="dim7":
                rn += "o"
        elif cls == "7":
            rn = base + "7"
        else:
            rn = base
        return rn
    # Common chromatic degrees:
    # bII, bIII, #IV, bVI, bVII
    mapping = {1:"bII", 3:"bIII", 6:"#IV", 8:"bVI", 10:"bVII"}
    if mode == "minor":
        # borrowed from the major mode: natural third and sixth
        mapping.update({4:"natIII", 9:"natVI"})
    if rel in mapping:
        rn = mapping[rel]
        if cls in ["m7","m7b5","dim7"]:
            rn = rn.lower()  # NOTE: open question, should chromatic degrees be lowercased here?
            if cls == "m7b5":rn += "ø"
            elif cls=="dim7":rn += "o"
        elif cls=="7":
            rn += "7"
        return rn
    return f"({rel})"

def _roman_secondary_or_sub(pc:int, cls:str, next_pc:Optional[int], tonic:int, mode:str) -> Optional[str]:
    """
    If the current chord (pc, cls) is a dominant, or its tritone substitute,
    that RESOLVES to the next chord (whose root must be diatonic), return 'V/XX' or 'Vsub/XX'.
    Otherwise return None.
    """
    if next_pc is None or cls != "7":
        return None
    scale = MAJOR_SCALE if mode == "major" else MINOR_HARM
    rel_cur = (pc - tonic) % 12
    # diatonic degree of the resolution chord
    t = degree_index_for_pc(next_pc, tonic, mode)
    if t is None:
        return None
    # V/I is not tagged (already covered as V7); only secondary targets (ii, iii, IV, V, vi, vii°)
    if t == 0:
        return None
    # V of degree t: (scale[t] + 7) % 12
    rel_V   = (scale[t] + 7) % 12
    # Tritone substitute of that V: (scale[t] + 1) % 12   --> bII for I, bVI for V, bIII for ii, etc.
    rel_sub = (scale[t] + 1) % 12
    if rel_cur == rel_V:
        return f"V/{ROMAN[t]}"
    if rel_cur == rel_sub:
        return f"Vsub/{ROMAN[t]}"
    return None

def sequence_to_roman(parsed_seq: List[Tuple[int,str]], tonic: int, mode: str) -> List[str]:
    out = []
    n = len(parsed_seq)
    for i, (pc, cls) in enumerate(parsed_seq):
        next_pc = parsed_seq[i+1][0] if i+1 < n else None
        tag = _roman_secondary_or_sub(pc, cls, next_pc, tonic, mode)
        if tag is not None:
            out.append(tag)
        else:
            out.append(roman_for_chord(pc, cls, tonic, mode))
    return out
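
The key search above is an exhaustive argmax: every candidate key is scored against the whole progression and the best total wins. The sketch below is a deliberately simplified version (major keys only, no cadence bonus, no near-match tier), and the names `score_major_key` and `best_major_key` are hypothetical. It only illustrates the interval arithmetic `(pc - tonic) % 12` and the scoring loop.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]
# Expected quality class on each degree of the major scale (I..VII).
EXPECTED = ["maj7", "m7", "m7", "maj7", "7", "m7", "m7b5"]

def score_major_key(chords, tonic):
    """Sum per-chord scores: 2.0 for a diatonic root with the expected class,
    0.4 for a diatonic root with another class, 0 for a non-diatonic root."""
    total = 0.0
    for pc, cls in chords:
        rel = (pc - tonic) % 12  # semitones above the candidate tonic
        if rel in MAJOR_SCALE:
            deg = MAJOR_SCALE.index(rel)
            total += 2.0 if cls == EXPECTED[deg] else 0.4
    return total

def best_major_key(chords):
    """Return the tonic pitch class (0-11) with the highest score."""
    return max(range(12), key=lambda t: score_major_key(chords, t))

ii_v_i = [(2, "m7"), (7, "7"), (0, "maj7")]  # Dm7 G7 Cmaj7
print(best_major_key(ii_v_i))      # 0, i.e. C major
print(score_major_key(ii_v_i, 0))  # 6.0
```

Because the total grows with the number of chords, scores such as `key_score` are not comparable between songs of different length unless they are normalized.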

In [6]:
# Demonstration on a single row
parsed = parse_sequence(df['chordprog'].dropna().astype(str).iloc[0])
tonic, mode, score = detect_key_for_sequence(parsed)
romans = sequence_to_roman(parsed, tonic, mode)
print("Detected key -> tonic_pc:", tonic, "| mode:", mode, "| score:", round(score,2))
romans[:12]

Detected key -> tonic_pc: 8 | mode: major | score: 88.8


['vi',
 '#ivø',
 'V/III',
 'V/VI',
 'vi',
 'IV',
 'ii',
 'V7',
 'iii',
 'vi',
 'ii',
 'V7']
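
The `V/III` label in the output above can be checked by hand. In A-flat major (tonic pitch class 8), G7 is followed by a chord rooted on C. C sits 4 semitones above the tonic (degree III), and the dominant of that degree lies 7 semitones higher, at relative pitch class 11, which is where G falls. A small sketch of that arithmetic follows, with a hypothetical helper `secondary_label` that only handles major keys and assumes the current chord is already known to be a dominant seventh.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII"]

def secondary_label(pc, next_pc, tonic):
    """Label a dominant rooted on pc that targets the diatonic root next_pc."""
    target = (next_pc - tonic) % 12
    if target not in MAJOR_SCALE or target == 0:
        return None  # non-diatonic target, or a plain V7 -> I
    deg = MAJOR_SCALE.index(target)
    rel = (pc - tonic) % 12
    if rel == (target + 7) % 12:   # a fifth above the target
        return f"V/{ROMAN[deg]}"
    if rel == (target + 1) % 12:   # a semitone above the target (tritone substitute)
        return f"Vsub/{ROMAN[deg]}"
    return None

print(secondary_label(7, 0, 8))  # G7 -> C in Ab major: V/III
print(secondary_label(1, 0, 8))  # Db7 -> C in Ab major: Vsub/III
print(secondary_label(3, 8, 8))  # Eb7 -> Ab: None (left as V7)
```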

## 5) Building the *songdb_funcional_v4.csv* dataset

In [52]:
rows = []
for idx, row in df.iterrows():
    raw = row.get("chordprog")
    if not isinstance(raw, str) or not raw.strip():
        continue
    parsed = parse_sequence(raw)
    if len(parsed) < 2:
        continue
    tonic, mode, score = detect_key_for_sequence(parsed)
    roman_seq = sequence_to_roman(parsed, tonic, mode)

    rows.append({
        # original columns
        "title": row.get("title"),
        "composedby": row.get("composedby"),
        "key": row.get("key"),
        "timesig": row.get("timesig"),
        "bars": row.get("bars"),
        "chordprog": raw,
        # new columns
        "key_tonic_pc": tonic,
        "key_mode": mode,
        "key_score": score,
        "num_tokens": len(roman_seq),
        "funcional_prog": " ".join(roman_seq),
    })

proc = pd.DataFrame(rows)
print("Processed rows:", len(proc))

Processed rows: 2613


In [53]:
proc.head(5)

Unnamed: 0,title,composedby,key,timesig,bars,chordprog,key_tonic_pc,key_mode,key_score,num_tokens,funcional_prog
0,Lullaby of Birdland,George Shearing,Ab,4 4,32,Fm7 Dm7b5 | G7b9 C7b9 | Fm7 DbM7 | Bbm7 Eb7 |\nCm7 Fm7 | Bbm7 Eb7b9 | AbM7 Db7 | Gm7b5 C7 |\nFm7 Dm7b5 | G7b9 C7b9 |...,8,major,88.8,57,vi #ivø V/III V/VI vi IV ii V7 iii vi ii V7 I IV7 viiø V/VI vi #ivø V/III V/VI vi IV ii V7 iii vi ii V7 I V7 I V/II ...
1,It's A Most Unusual Day,Jimmy McHugh and HYarold Adamson,G,3 4,72,F#/G F#/G G | Em7 | Am7 | D7 |\nF#/G F#/G G | Em7 | Am7 | D7 |\nG/B | CM7 | C#o7 | G/D |\nBm7 | Em7 | A7 | D7 |\nF#/...,7,major,137.4,92,VII VII I vi ii V7 VII VII I vi ii V7 I IV #ivo I iii vi V/V V7 VII VII I vi ii V7 VII VII I vi ii V7 I IV #ivo I ii...
2,Jump Monk,Charles Mingus,Ab,4 4,54,Fm7 DbM7 | G7b5 C7 | Fm7 DbM7 | G7b5 C7 |\nFm7 | DbM7 | Gm7b5 | C7 |\nFm7 | DbM7 | Gm7b5 | C7 |\nFm7 | Fm7/Eb | Db7 ...,5,minor,85.0,58,i VI V/V V7 i VI V/V V7 i VI iiø V7 i VI iiø V7 i i Vsub/V V7 iv iv iiø bII7 iv bII vø V/IV iv bII vø V/IV iv bII vø...
3,Nuages,Django Reinhardt and Jacques Larme,G,4 4,32,Bbm7 Eb7 | Am7b5 D7b9 | G6 Am7 | Bm7 |\nBbm7 Eb7 | Am7b5 D7b9 | G6 | G6 |\nF#m7b5 | B7 | Em7 | Em7 |\nA7 Ab7 | A7 | ...,7,major,55.2,49,biii bVI7 iiø V7 I ii iii biii bVI7 iiø V7 I I viiø V/VI vi vi II7 bII7 V/V V7 ii V7 biii bVI7 iiø V7 I I bvi bII7 v...
4,Love Me Do,John Lennon and Paul McCartney,G,4 4,48,G | C | G | C |\nG | C | C | C |\nC | G | C | G |\nC | G | C | G |\nC | G | C | C |\nC | C | G | C |\nG | G | D | D ...,7,major,93.6,50,I IV I IV I IV IV IV IV I IV I IV I IV I IV I IV IV IV IV I IV I I V V IV I I V V IV I I I IV I IV I IV IV IV IV I I...


In [54]:
# Create the 'key_estimada' (estimated key) column in the dataset
tonicpc_to_key = {0: 'C', 1: 'Db', 2: 'D', 3: 'Eb', 4: 'E', 5: 'F', 6: 'Gb', 7: 'G', 8: 'Ab', 9: 'A', 10: 'Bb', 11: 'B'}

import numpy as np
proc["key_estimada"] = np.where(
    proc["key_mode"].eq("minor"),
    proc["key_tonic_pc"].map(tonicpc_to_key).astype(str) + "m",
    proc["key_tonic_pc"].map(tonicpc_to_key).astype(str)
)

# Take a quick look at the result
proc[['key', 'key_tonic_pc', 'key_mode', 'key_estimada']].head()

Unnamed: 0,key,key_tonic_pc,key_mode,key_estimada
0,Ab,8,major,Ab
1,G,7,major,G
2,Ab,5,minor,Fm
3,G,7,major,G
4,G,7,major,G
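
Row 2 above shows an annotated key of Ab but an estimated key of Fm, which is the relative minor of A-flat major (its tonic lies 9 semitones above the major tonic). The sketch below uses hypothetical helpers `key_name` and `is_relative_minor` to show how the label is formed and how such pairs could be flagged. It is an illustration, not part of the notebook's pipeline.

```python
PC_TO_NAME = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def key_name(tonic_pc, mode):
    """Build a label such as 'Ab' or 'Fm' from a pitch class and a mode."""
    return PC_TO_NAME[tonic_pc] + ("m" if mode == "minor" else "")

def is_relative_minor(major_tonic_pc, minor_tonic_pc):
    """True if the minor tonic is the sixth degree of the given major key."""
    return (major_tonic_pc + 9) % 12 == minor_tonic_pc

print(key_name(5, "minor"))     # Fm
print(key_name(8, "major"))     # Ab
print(is_relative_minor(8, 5))  # True: F minor is the relative minor of Ab major
```

A disagreement of this kind is therefore not necessarily a detection error, since both keys share the same key signature.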


In [55]:
songdb_funcional = proc[['title', 'composedby', 'key', 'timesig', 'bars', 'chordprog',
       'key_tonic_pc', 'key_mode', 'key_estimada', 'key_score', 'num_tokens', 
       'funcional_prog']].copy()

In [56]:
songdb_funcional.head(5)

Unnamed: 0,title,composedby,key,timesig,bars,chordprog,key_tonic_pc,key_mode,key_estimada,key_score,num_tokens,funcional_prog
0,Lullaby of Birdland,George Shearing,Ab,4 4,32,Fm7 Dm7b5 | G7b9 C7b9 | Fm7 DbM7 | Bbm7 Eb7 |\nCm7 Fm7 | Bbm7 Eb7b9 | AbM7 Db7 | Gm7b5 C7 |\nFm7 Dm7b5 | G7b9 C7b9 |...,8,major,Ab,88.8,57,vi #ivø V/III V/VI vi IV ii V7 iii vi ii V7 I IV7 viiø V/VI vi #ivø V/III V/VI vi IV ii V7 iii vi ii V7 I V7 I V/II ...
1,It's A Most Unusual Day,Jimmy McHugh and HYarold Adamson,G,3 4,72,F#/G F#/G G | Em7 | Am7 | D7 |\nF#/G F#/G G | Em7 | Am7 | D7 |\nG/B | CM7 | C#o7 | G/D |\nBm7 | Em7 | A7 | D7 |\nF#/...,7,major,G,137.4,92,VII VII I vi ii V7 VII VII I vi ii V7 I IV #ivo I iii vi V/V V7 VII VII I vi ii V7 VII VII I vi ii V7 I IV #ivo I ii...
2,Jump Monk,Charles Mingus,Ab,4 4,54,Fm7 DbM7 | G7b5 C7 | Fm7 DbM7 | G7b5 C7 |\nFm7 | DbM7 | Gm7b5 | C7 |\nFm7 | DbM7 | Gm7b5 | C7 |\nFm7 | Fm7/Eb | Db7 ...,5,minor,Fm,85.0,58,i VI V/V V7 i VI V/V V7 i VI iiø V7 i VI iiø V7 i i Vsub/V V7 iv iv iiø bII7 iv bII vø V/IV iv bII vø V/IV iv bII vø...
3,Nuages,Django Reinhardt and Jacques Larme,G,4 4,32,Bbm7 Eb7 | Am7b5 D7b9 | G6 Am7 | Bm7 |\nBbm7 Eb7 | Am7b5 D7b9 | G6 | G6 |\nF#m7b5 | B7 | Em7 | Em7 |\nA7 Ab7 | A7 | ...,7,major,G,55.2,49,biii bVI7 iiø V7 I ii iii biii bVI7 iiø V7 I I viiø V/VI vi vi II7 bII7 V/V V7 ii V7 biii bVI7 iiø V7 I I bvi bII7 v...
4,Love Me Do,John Lennon and Paul McCartney,G,4 4,48,G | C | G | C |\nG | C | C | C |\nC | G | C | G |\nC | G | C | G |\nC | G | C | C |\nC | C | G | C |\nG | G | D | D ...,7,major,G,93.6,50,I IV I IV I IV IV IV IV I IV I IV I IV I IV I IV IV IV IV I IV I I V V IV I I V V IV I I I IV I IV I IV IV IV IV I I...


In [None]:
# Export the functional dataset
songdb_funcional.to_csv('../../data/songdb_funcional_v4.csv', index=False, encoding='utf-8')
print("Saved:", '../../data/songdb_funcional_v4.csv')

Saved: ../../data/songdb_funcional_v4.csv
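
Since `funcional_prog` is stored as a space-joined string, a quick consistency check after export is to split it again and compare the count with `num_tokens`. The sketch below uses only the standard library and an in-memory CSV with two made-up rows instead of the real file. The helper name `token_count_mismatches` is an assumption.

```python
import csv
import io

def token_count_mismatches(csv_text):
    """Return the titles whose num_tokens differs from the number of
    space-separated tokens in funcional_prog."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["num_tokens"]) != len(row["funcional_prog"].split()):
            bad.append(row["title"])
    return bad

# Made-up rows for illustration only; "Song B" is deliberately inconsistent.
demo = (
    "title,num_tokens,funcional_prog\n"
    "Song A,3,ii V7 I\n"
    "Song B,4,I IV V7\n"
)
print(token_count_mismatches(demo))  # ['Song B']
```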
