# SmashChords Heuristic Experiments

This notebook explores two heuristics for matching mashup candidates, using a toy dataset of Taylor Swift chord-progression snippets.

- **Experiment 1:** Find snippets that share the same key *and* the same Roman-numeral progression (direct mashup candidates)
- **Experiment 2:** Find snippets that share the same Roman-numeral progression but are in *different* keys (transposition-based mashup candidates)
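
The two rules can be stated as a single pairwise check. The sketch below is only an illustration: `classify_pair` is a hypothetical helper, and it assumes each snippet is a mapping with the `full_key` and `progression_key` fields that this notebook derives in the Load Data step.

```python
def classify_pair(a, b):
    """Label a pair of snippets under the two experiments.

    Returns "direct" (Experiment 1), "transpose" (Experiment 2),
    or None when the Roman-numeral progressions differ.
    """
    if a["progression_key"] != b["progression_key"]:
        return None
    if a["full_key"] == b["full_key"]:
        return "direct"
    return "transpose"


# Hypothetical snippets, for illustration only
s1 = {"full_key": "A major", "progression_key": ("I", "IV")}
s2 = {"full_key": "A major", "progression_key": ("I", "IV")}
s3 = {"full_key": "G major", "progression_key": ("I", "IV")}
s4 = {"full_key": "G major", "progression_key": ("I", "V")}

print(classify_pair(s1, s2))
print(classify_pair(s1, s3))
print(classify_pair(s3, s4))
```

The experiments themselves use grouping rather than pairwise comparison, which avoids checking every pair of snippets.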

In [17]:
import ast
import pandas as pd
from itertools import combinations

## Load Data

In [18]:
df = pd.read_csv("toy_smashchords_tswift.csv")

# Parse string-encoded lists into actual Python lists
df["roman_progression"] = df["roman_progression"].apply(ast.literal_eval)
df["chord_progression_raw"] = df["chord_progression_raw"].apply(ast.literal_eval)

# Create a tuple version of the progression for use as a dict/groupby key
df["progression_key"] = df["roman_progression"].apply(tuple)

# Combined key identity (e.g. "A minor" vs "A major" are distinct)
df["full_key"] = df["key"] + " " + df["key_mode"]

print(f"Loaded {len(df)} snippets across {df['song_title'].nunique()} songs")
df.head()

Loaded 1163 snippets across 234 songs


| Unnamed: 0 | snippet_id | song_title | section | artist | key | key_mode | roman_progression | chord_progression_raw | progression_key | full_key |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TS_001 | seven | Verse | Taylor Swift | G | major | [I, v, IV] | [G, Dm, C] | (I, v, IV) | G major |
| 1 | TS_002 | seven | Chorus | Taylor Swift | D | minor | [i, VII, III, IV] | [Dm, C, F, G] | (i, VII, III, IV) | D minor |
| 2 | TS_003 | seven | Bridge | Taylor Swift | A | minor | [i, III, VII] | [Am, C, G] | (i, III, VII) | A minor |
| 3 | TS_004 | seven | Outro | Taylor Swift | A | minor | [i, III, VII, III] | [Am, C, G, C] | (i, III, VII, III) | A minor |
| 4 | TS_005 | it's nice to have a friend | Intro | Taylor Swift | A | minor | [i, VII] | [Am, G] | (i, VII) | A minor |
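
Some raw chord labels in this dataset are not chords at all: the Experiment 2 output lists `['BRIDGE']` as the chords of several Outro snippets. A sanity check before grouping could flag such rows. The sketch below is one possible check and is not part of the original pipeline. The pattern `CHORD_RE` and the helper `invalid_chord_tokens` are hypothetical, and the pattern covers only common spellings (root, accidental, quality, extension, slash bass), so it would also reject legitimate but unusual labels.

```python
import re

# Hypothetical pattern for common chord spellings such as G, Dm, C#m, Bsus4, F#m7, D/F#
CHORD_RE = re.compile(
    r"^[A-G][#b]?"                      # root note with optional accidental
    r"(?:maj|min|m|dim|aug|sus|add)?"   # optional quality
    r"\d*"                              # optional extension number
    r"(?:/[A-G][#b]?)?$"                # optional slash bass note
)


def invalid_chord_tokens(chords):
    """Return the tokens in `chords` that do not look like chord names."""
    return [c for c in chords if not CHORD_RE.match(c)]


print(invalid_chord_tokens(["G", "Dm", "C"]))
print(invalid_chord_tokens(["Bsus4", "BRIDGE", "C#m", "F#m7", "D/F#"]))
```

Applied to the loaded frame, `df[df["chord_progression_raw"].apply(lambda c: bool(invalid_chord_tokens(c)))]` would list the rows worth inspecting by hand.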


---
## Experiment 1: Same Key + Same Progression

These snippets are the most directly compatible: they share both tonal center and harmonic structure, which makes them ideal candidates for a direct mashup with no transposition needed.
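
Grouping on key and progression also pairs sections of the same song (for example, a pre-chorus and a bridge of one track), which is rarely a useful mashup. The sketch below shows one way to enumerate candidate pairs while skipping same-song pairs. It runs on hand-made records with hypothetical snippet IDs and song titles; `cross_song_pairs` is a hypothetical helper, and the same-song exclusion is an assumption that the experiment itself does not make.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records using the notebook's column names
toy = [
    {"snippet_id": "X1", "song_title": "song a", "full_key": "A major", "progression_key": ("I", "IV")},
    {"snippet_id": "X2", "song_title": "song b", "full_key": "A major", "progression_key": ("I", "IV")},
    {"snippet_id": "X3", "song_title": "song b", "full_key": "A major", "progression_key": ("I", "IV")},
    {"snippet_id": "X4", "song_title": "song c", "full_key": "G major", "progression_key": ("I", "IV")},
]


def cross_song_pairs(records):
    """Pair snippets that share key and progression but come from different songs."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["full_key"], rec["progression_key"])].append(rec)

    pairs = []
    for members in groups.values():
        for a, b in combinations(members, 2):
            if a["song_title"] != b["song_title"]:  # assumed filter: skip same-song pairs
                pairs.append((a["snippet_id"], b["snippet_id"]))
    return pairs


print(cross_song_pairs(toy))
```

With the notebook's frame, the same helper could be called as `cross_song_pairs(df.to_dict("records"))`.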

In [19]:
# Group by (full key, progression) and keep groups with 2+ snippets
same_key_same_prog = (
    df.groupby(["full_key", "progression_key"])
    .filter(lambda g: len(g) >= 2)
    .sort_values(["full_key", "progression_key"])
)

print(f"Snippets that share key AND progression: {len(same_key_same_prog)}")
print(f"Distinct (key, progression) groups: {same_key_same_prog.groupby(['full_key','progression_key']).ngroups}\n")

Snippets that share key AND progression: 697
Distinct (key, progression) groups: 196



In [20]:
# Show each matching group with its candidate pairs
for (full_key, prog), group in same_key_same_prog.groupby(["full_key", "progression_key"]):
    print(f"Key: {full_key}  |  Progression: {list(prog)}")
    print(f"  Matching snippets ({len(group)}):")
    for _, row in group.iterrows():
        print(f"    [{row['snippet_id']}] {row['song_title']} — {row['section']}")
        print(f"           Chords: {row['chord_progression_raw']}")
    
    # List all pairwise mashup candidates
    pairs = list(combinations(group["snippet_id"].tolist(), 2))
    print(f"  Mashup pairs: {pairs}")
    print()

Key: A major  |  Progression: ['I']
  Matching snippets (4):
    [TS_181] imgonnagetyouback — Intro
           Chords: ['A']
    [TS_401] this is me trying — Outro
           Chords: ['A']
    [TS_648] the lucky one (taylor's version) — Pre-Chorus
           Chords: ['A']
    [TS_1147] cruel summer — Intro
           Chords: ['A']
  Mashup pairs: [('TS_181', 'TS_401'), ('TS_181', 'TS_648'), ('TS_181', 'TS_1147'), ('TS_401', 'TS_648'), ('TS_401', 'TS_1147'), ('TS_648', 'TS_1147')]

Key: A major  |  Progression: ['I', 'II', 'V', 'iii']
  Matching snippets (3):
    [TS_274] starlight (taylor's version) — Chorus
           Chords: ['A', 'B', 'E', 'C#m']
    [TS_1144] you're on your own kid — Pre-Chorus
           Chords: ['A', 'B', 'E', 'C#m']
    [TS_1146] you're on your own kid — Bridge
           Chords: ['A', 'B', 'E', 'C#m']
  Mashup pairs: [('TS_274', 'TS_1144'), ('TS_274', 'TS_1146'), ('TS_1144', 'TS_1146')]

Key: A major  |  Progression: ['I', 'IV', 'vi', 'IV']
  Matching snippets 

In [21]:
# Songs (not snippets) that qualify under Experiment 1
exp1_songs = sorted(same_key_same_prog["song_title"].unique().tolist())
print("Qualifying songs:", exp1_songs)
print()
print("Matching snippets:")
print(same_key_same_prog[["song_title", "snippet_id", "section", "full_key", "roman_progression"]].to_string(index=False))


Qualifying songs: ["'slut!' (taylor's version) (from the vault)", "22 (taylor's version)", 'a perfectly good heart', "all too well (10 minute version) (taylor's version) (from the vault)", "all too well (taylor's version)", "all you had to do was stay (taylor's version)", 'anti-hero', 'august', "babe (taylor's version) (from the vault)", "back to december (taylor's version)", "bad blood (taylor's version)", "begin again (taylor's version)", 'bejeweled', "better man (taylor's version) (from the vault)", "better than revenge (taylor's version)", 'betty', 'bigger than the whole sky', "blank space (taylor's version)", "breathe (taylor's version)", 'but daddy i love him', "bye bye baby (taylor's version) (from the vault)", 'call it what you want', 'cassandra', "castles crumbling taylor's version) (from the vault)", 'champagne problems', "change (taylor's version)", 'chloe or sam or sophia or marcus', 'clara bow', "clean (taylor's version)", 'closure', 'cold as you', "come back be here (tayl

---
## Experiment 2: Different Key, Same Progression

These snippets share the same Roman-numeral structure but are in different keys. A DJ/producer would transpose one snippet to match the other before mashing them up.
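
The size of that transposition follows from the two tonics. The sketch below computes the smallest signed shift in semitones; `PITCH_CLASS` and `semitone_shift` are hypothetical names, and the function assumes it receives the tonic only (the `key` column), with both snippets in the same mode. Flat spellings are included as aliases in case the data contains them.

```python
# Pitch classes in semitones above C; sharp and flat spellings map to the same value
PITCH_CLASS = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}


def semitone_shift(source_tonic, target_tonic):
    """Smallest signed shift (from -5 to +6 semitones) that moves source to target."""
    up = (PITCH_CLASS[target_tonic] - PITCH_CLASS[source_tonic]) % 12
    return up if up <= 6 else up - 12


print(semitone_shift("A", "B"))
print(semitone_shift("G", "D#"))
print(semitone_shift("C", "F#"))
```

Choosing the smaller of the upward and downward shifts keeps the transposed snippet closer to its original register; a tritone is reported as +6 by convention here.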

In [22]:
# Group by progression only, then keep groups that span at least 2 distinct keys
diff_key_same_prog = (
    df.groupby("progression_key")
    .filter(lambda g: g["full_key"].nunique() >= 2)
    .sort_values("progression_key")
)

print(f"Snippets that share a progression across different keys: {len(diff_key_same_prog)}")
print(f"Distinct progressions: {diff_key_same_prog['progression_key'].nunique()}\n")

Snippets that share a progression across different keys: 766
Distinct progressions: 99



In [23]:
# Songs (not snippets) that qualify under Experiment 2
exp2_songs = sorted(diff_key_same_prog["song_title"].unique().tolist())
print("Qualifying songs:", exp2_songs)
print()
print("Matching snippets:")
print(diff_key_same_prog[["song_title", "snippet_id", "section", "full_key", "roman_progression"]].to_string(index=False))


Qualifying songs: ["'slut!' (taylor's version) (from the vault)", "22 (taylor's version)", 'a perfectly good heart', 'afterglow', "all too well (10 minute version) (taylor's version) (from the vault)", "all too well (taylor's version)", "all you had to do was stay (taylor's version)", 'anti-hero', 'august', "babe (taylor's version) (from the vault)", "back to december (taylor's version)", "bad blood (taylor's version)", "begin again (taylor's version)", 'bejeweled', "better man (taylor's version) (from the vault)", "better than revenge (taylor's version)", 'betty', 'bigger than the whole sky', "blank space (taylor's version)", "breathe (taylor's version)", 'but daddy i love him', "bye bye baby (taylor's version) (from the vault)", 'call it what you want', 'cardigan', 'cassandra', "castles crumbling taylor's version) (from the vault)", 'champagne problems', "change (taylor's version)", 'chloe or sam or sophia or marcus', "clean (taylor's version)", 'cold as you', "come back be here (tay

In [24]:
# Show each progression group, organized by key sub-group, with cross-key pairs
for prog, prog_group in diff_key_same_prog.groupby("progression_key"):
    keys_present = sorted(prog_group["full_key"].unique())
    print(f"Progression: {list(prog)}")
    print(f"  Keys present: {keys_present}")

    for full_key, key_group in prog_group.groupby("full_key"):
        for _, row in key_group.iterrows():
            print(f"    [{row['snippet_id']}] {row['song_title']} — {row['section']}  (key: {full_key})")
            print(f"           Chords: {row['chord_progression_raw']}")

    # Cross-key pairs only (skip pairs that share the same key)
    snippet_rows = prog_group[["snippet_id", "full_key"]].values.tolist()
    cross_key_pairs = [
        (a[0], b[0])
        for a, b in combinations(snippet_rows, 2)
        if a[1] != b[1]
    ]
    print(f"  Cross-key mashup pairs: {cross_key_pairs}")
    print()

Progression: ['I']
  Keys present: ['A major', 'B major', 'C major', 'D major', 'D# major', 'E major', 'F major', 'F# major', 'G major', 'G# major']
    [TS_1147] cruel summer — Intro  (key: A major)
           Chords: ['A']
    [TS_181] imgonnagetyouback — Intro  (key: A major)
           Chords: ['A']
    [TS_401] this is me trying — Outro  (key: A major)
           Chords: ['A']
    [TS_648] the lucky one (taylor's version) — Pre-Chorus  (key: A major)
           Chords: ['A']
    [TS_995] fortnight (featuring post malone) — Intro  (key: B major)
           Chords: ['B']
    [TS_266] so it goes… — Intro  (key: B major)
           Chords: ['Bsus4']
    [TS_203] so high school — Outro  (key: B major)
           Chords: ['BRIDGE']
    [TS_1152] cruel summer — Outro  (key: B major)
           Chords: ['BRIDGE']
    [TS_110] would've could've should’ve — Outro  (key: B major)
           Chords: ['BRIDGE']
    [TS_368] london boy — Outro  (key: B major)
           Chords: ['BRIDGE']
    [