# 🎵 Split Liked Songs into Monthly & Genre Playlists

Create **monthly playlists**, **genre-split playlists**, and **top played playlists** from your Spotify data.

**Prerequisites:** Run `01_sync_data.ipynb` first to download your library.

**This notebook creates:**
- 📅 Monthly playlists (e.g., "AJFindsDec25")
- 🎸 Genre-split monthly playlists:
  - `HipHopFindsDec25` - Hip Hop / Rap tracks
  - `DanceFindsDec25` - Dance / Electronic / EDM tracks
  - `OtherFindsDec25` - All other genres (Pop, Rock, Indie, R&B, etc.)
- 🎵 Optional: Master genre playlists (all-time, broken down by broad genre)
- 🎧 **NEW:** Most Listened monthly playlists (e.g., "AJTopDec25") - requires streaming history export

**Note:** Every track in a month is assigned to exactly one of HipHop, Dance, or Other, so the three genre playlists together add up to that month's playlist.

## ⚙️ Configuration

**Edit the values below to customize your playlist names!**

In [24]:
# ============================================================================
# 🎨 CUSTOMIZE YOUR PLAYLIST NAMES
# ============================================================================

# Your name/prefix for playlists
OWNER_NAME = "AJ"
PREFIX = "Finds"  # e.g., "AJFinds"

# ============================================================================
# 📅 MONTHLY PLAYLIST SETTINGS
# ============================================================================

# Monthly playlist naming template
# Available variables:
#   {owner}  - your name (e.g., "AJ")
#   {prefix} - prefix (e.g., "Finds") 
#   {mon}    - short month name (e.g., "Dec")
#   {year}   - 2-digit year (e.g., "25")
#
# Examples:
#   "{owner}{prefix} - {mon}{year}"     → "AJFinds - Dec25"
#   "{owner} {mon}'{year}"              → "AJ Dec'25"
#   "New Music {mon}{year}"             → "New Music Dec25"
MONTHLY_NAME_TEMPLATE = "{owner}{prefix}{mon}{year}"

# ============================================================================
# 🎸 GENRE-SPLIT MONTHLY PLAYLISTS (NEW!)
# ============================================================================

# Enable genre-split monthly playlists
ENABLE_GENRE_SPLIT = True

# Genres to split into: HipHop, Dance, and Other (everything else)
# Note: Sum of all three = total monthly playlist
SPLIT_GENRES = ["HipHop", "Dance", "Other"]

# Genre monthly naming template
# Available variables: {owner}, {prefix}, {genre}, {mon}, {year}
#
# Examples:
#   "{owner}{prefix} {genre} - {mon}{year}"  → "AJFinds HipHop - Dec25"
#   "{genre} {mon}'{year}"                   → "HipHop Dec'25"
GENRE_MONTHLY_TEMPLATE = "{genre}{prefix}{mon}{year}"

# ============================================================================
# 🎛️ OTHER SETTINGS
# ============================================================================

# ⚠️ Set to False when ready to actually create playlists!
DRY_RUN = True

# Also create master genre playlists (all-time, not monthly)?
CREATE_MASTER_GENRE_PLAYLISTS = True

# Master genre playlist naming template (if enabled)
GENRE_NAME_TEMPLATE = "{owner}am{genre}"

# Max genre playlists to create
MAX_GENRE_PLAYLISTS = 19

# Minimum tracks needed to create a genre playlist
MIN_TRACKS_FOR_GENRE = 20


# ============================================================================
# 🔄 INCREMENTAL UPDATE SETTINGS (prevents duplicates on re-run)
# ============================================================================

# Only process recent months (set to 0 to process all)
# e.g., ONLY_RECENT_MONTHS = 3 will only process last 3 months
ONLY_RECENT_MONTHS = 0

# Skip creating playlists that already exist (just update tracks)
# When True: existing playlists get new tracks added, but won't be recreated
SKIP_EXISTING_PLAYLISTS = False

# Only process current month (for daily automation)
# When True: only creates/updates playlist for the current month
CURRENT_MONTH_ONLY = False

# ============================================================================
print("✅ Configuration loaded!")
print(f"   Owner: {OWNER_NAME}")
print(f"   Prefix: {PREFIX}")
print(f"   Monthly: {MONTHLY_NAME_TEMPLATE}")
print(f"   Genre split enabled: {ENABLE_GENRE_SPLIT}")
print(f"   Genre monthly: {GENRE_MONTHLY_TEMPLATE}")
print(f"   Dry run: {DRY_RUN}")
print(f"   Recent months only: {ONLY_RECENT_MONTHS if ONLY_RECENT_MONTHS else 'All'}")
print(f"   Current month only: {CURRENT_MONTH_ONLY}")

✅ Configuration loaded!
   Owner: AJ
   Prefix: Finds
   Monthly: {owner}{prefix}{mon}{year}
   Genre split enabled: True
   Genre monthly: {genre}{prefix}{mon}{year}
   Dry run: True
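
The naming templates are ordinary `str.format` strings, so a name can be previewed before anything is sent to Spotify. The sketch below is illustrative only: `preview_name` and `MONTH_ABBR` are hypothetical names that do not appear in the notebook, and the month key is assumed to have the `YYYY-MM` form used in later cells.

```python
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def preview_name(template, month_key, owner="AJ", prefix="Finds", genre=""):
    """Expand a naming template for a 'YYYY-MM' month key (hypothetical helper)."""
    full_year, month_num = month_key.split("-")
    return template.format(
        owner=owner,
        prefix=prefix,
        genre=genre,                          # unused placeholders are simply ignored
        mon=MONTH_ABBR[int(month_num) - 1],   # "Dec" for "12"
        year=full_year[2:],                   # last two digits, e.g. "25"
    )

print(preview_name("{owner}{prefix}{mon}{year}", "2025-12"))
print(preview_name("{genre}{prefix}{mon}{year}", "2025-12", genre="HipHop"))
```

Because `str.format` ignores keyword arguments a template does not use, one helper can serve both the monthly and the genre-monthly template.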


## 1️⃣ Setup

In [25]:
# Install dependencies
%pip install -q pandas pyarrow tqdm spotipy

# Add project to path
import sys
from pathlib import Path
PROJECT_ROOT = Path("..").resolve()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))
print(f"✅ Project root: {PROJECT_ROOT}")

Note: you may need to restart the kernel to use updated packages.
✅ Project root: /Users/aryamaan/Desktop/Projects/spotifyframes_repo


In [26]:
import os
from pathlib import Path
import pandas as pd
from tqdm.auto import tqdm
import json
from collections import Counter

DATA_DIR = Path("..") / "data"

print(f"üìÅ Data directory: {DATA_DIR.resolve()}")

📁 Data directory: /Users/aryamaan/Desktop/Projects/spotifyframes_repo/data


In [27]:
# Connect to Spotify (required for creating playlists)
try:
    sf
    print("✅ Found existing SpotifyFrames connection.")
except NameError:
    try:
        from spotifyframes import SpotifyFrames
        from spotifyframes.catalog import CacheConfig
        
        # Set credentials (update these!)
        os.environ.setdefault("SPOTIPY_CLIENT_ID", "YOUR_CLIENT_ID")
        os.environ.setdefault("SPOTIPY_CLIENT_SECRET", "YOUR_CLIENT_SECRET")
        os.environ.setdefault("SPOTIPY_REDIRECT_URI", "http://127.0.0.1:8888/callback")
        
        DATA_DIR.mkdir(exist_ok=True)
        sf = SpotifyFrames.from_env(progress=True, cache=CacheConfig(dir=DATA_DIR))
        print("✅ Connected to Spotify!")
    except Exception as e:
        print(f"⚠️ Could not connect to Spotify: {e}")
        print("   Playlist creation will be disabled, but you can still preview.")
        sf = None

✅ Found existing SpotifyFrames connection.


## 2️⃣ Load Data (Owned Playlists Only)

In [28]:
# Load data from parquet
try:
    library
    print("✅ Found existing `library` in memory.")
except NameError:
    playlists_all = pd.read_parquet(DATA_DIR / "playlists.parquet")
    playlist_tracks_all = pd.read_parquet(DATA_DIR / "playlist_tracks.parquet")
    
    # Filter to owned playlists only
    playlists = playlists_all[playlists_all["is_owned"] == True].copy()
    owned_ids = set(playlists["playlist_id"])
    
    library = playlist_tracks_all[playlist_tracks_all["playlist_id"].isin(owned_ids)].copy()
    library = library.merge(playlists[["playlist_id", "name"]], on="playlist_id", how="left")
    library = library.rename(columns={"name": "playlist_name"})
    
    print("✅ Loaded library (OWNED PLAYLISTS ONLY)")
    print(f"   • {len(playlists)} owned playlists")
    print(f"   • {len(library):,} playlist-track links")

try:
    tracks
    print("✅ Found existing `tracks` in memory.")
except NameError:
    tracks = pd.read_parquet(DATA_DIR / "tracks.parquet")
    print(f"✅ Loaded {len(tracks):,} tracks")

try:
    artists
    print("✅ Found existing `artists` in memory.")
except NameError:
    artists = pd.read_parquet(DATA_DIR / "artists.parquet")
    print(f"✅ Loaded {len(artists):,} artists")

✅ Found existing `library` in memory.
✅ Found existing `tracks` in memory.
✅ Found existing `artists` in memory.


## 3️⃣ Extract Liked Songs

In [29]:
# Helper functions
def _pick_col(df, candidates):
    for c in candidates:
        if c in df.columns:
            return c
    return None

def _chunked(seq, n=100):
    for i in range(0, len(seq), n):
        yield seq[i:i+n]

def _to_uri(x):
    x = str(x)
    if x.startswith("spotify:track:"):
        return x
    if len(x) >= 20 and ":" not in x:
        return f"spotify:track:{x}"
    return x
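
Spotify's add-items endpoint accepts at most 100 items per request, which is why `_chunked` defaults to `n=100`. The sketch below repeats the two helpers under public names so it can run on its own; the track IDs are made-up placeholders, not real Spotify IDs.

```python
def chunked(seq, n=100):
    """Yield consecutive slices of at most n items."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

def to_uri(x):
    """Turn a bare track ID into a spotify:track: URI; leave other values untouched."""
    x = str(x)
    if x.startswith("spotify:track:"):
        return x
    if len(x) >= 20 and ":" not in x:
        return f"spotify:track:{x}"
    return x

fake_ids = [f"{i:022d}" for i in range(250)]   # placeholder 22-character IDs
uris = [to_uri(t) for t in fake_ids]
batch_sizes = [len(batch) for batch in chunked(uris)]
print(batch_sizes)  # [100, 100, 50]
```

Values that are already URIs, or that are too short to look like an ID, pass through `to_uri` unchanged.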

In [30]:
from spotifyframes import LIKED_SONGS_PLAYLIST_ID

# Find columns
playlist_id_col = _pick_col(library, ["playlist_id", "playlistId"])
track_id_col = _pick_col(library, ["track_id", "trackId", "uri"])
added_at_col = _pick_col(library, ["added_at", "playlist_added_at", "track_added_at", "liked_at"])

if not all([playlist_id_col, track_id_col, added_at_col]):
    raise KeyError(f"Missing columns. Found: {list(library.columns)}")

# Filter to Liked Songs
liked = library[library[playlist_id_col].astype(str).str.contains(str(LIKED_SONGS_PLAYLIST_ID))].copy()

if liked.empty:
    name_col = _pick_col(library, ["playlist_name", "playlistName"])
    if name_col:
        # na=False keeps playlists with a missing name from breaking the boolean mask
        liked = library[library[name_col].str.lower().str.contains("liked", na=False)].copy()

if liked.empty:
    raise ValueError("❌ No Liked Songs found! Make sure to run sync with include_liked_songs=True")

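# Parse timestamps; rows whose timestamp cannot be parsed are dropped so they
# do not end up grouped under a bogus "NaT" month
liked[added_at_col] = pd.to_datetime(liked[added_at_col], errors="coerce", utc=True)
liked = liked[liked[added_at_col].notna()].copy()
liked["_uri"] = liked[track_id_col].map(_to_uri)
liked = liked.sort_values(added_at_col)
# Drop the timezone before converting to a monthly period (avoids a pandas UserWarning)
liked["month"] = liked[added_at_col].dt.tz_localize(None).dt.to_period("M").astype(str)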

# Build month -> tracks mapping
month_to_tracks = {}
for m, g in liked.groupby("month", sort=True):
    uris = g["_uri"].dropna().tolist()
    # dict.fromkeys drops repeats while keeping first-seen (chronological) order
    month_to_tracks[m] = list(dict.fromkeys(uris))

liked_uris = [u for m in sorted(month_to_tracks) for u in month_to_tracks[m]]

print(f"✅ Found {len(liked):,} liked songs")
print(f"📅 Spanning {len(month_to_tracks)} months")
print("\n📊 Recent months:")
for m in list(month_to_tracks.keys())[-5:]:
    print(f"   {m}: {len(month_to_tracks[m])} tracks")

✅ Found 5,077 liked songs
📅 Spanning 51 months

📊 Recent months:
   2025-08: 153 tracks
   2025-09: 69 tracks
   2025-10: 89 tracks
   2025-11: 228 tracks
   2025-12: 43 tracks
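
The grouping step can be sketched without pandas. The version below swaps in the standard library for the DataFrame operations: it parses ISO timestamps, buckets by UTC month, and keeps each URI once per month in the order it was added. `bucket_by_month` and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from datetime import datetime, timezone

def bucket_by_month(rows):
    """Group (uri, iso_timestamp) pairs into {'YYYY-MM': [uri, ...]}, oldest first."""
    parsed = []
    for uri, stamp in rows:
        ts = datetime.fromisoformat(stamp.replace("Z", "+00:00")).astimezone(timezone.utc)
        parsed.append((ts, uri))
    buckets = defaultdict(dict)   # dict keys keep insertion order and drop repeats
    for ts, uri in sorted(parsed):
        buckets[ts.strftime("%Y-%m")][uri] = None
    return {month: list(uris) for month, uris in buckets.items()}

rows = [
    ("spotify:track:B", "2025-12-02T10:00:00Z"),
    ("spotify:track:A", "2025-11-30T23:59:59Z"),
    ("spotify:track:B", "2025-12-05T08:00:00Z"),   # repeat within December
]
print(bucket_by_month(rows))
```

A track liked one second before midnight UTC on 30 November lands in `2025-11`, which may differ from the month shown in a local-time client.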




## 4️⃣ Create Monthly Playlists

In [31]:
# Month name mapping
MONTH_NAMES = {
    "01": "Jan", "02": "Feb", "03": "Mar", "04": "Apr",
    "05": "May", "06": "Jun", "07": "Jul", "08": "Aug",
    "09": "Sep", "10": "Oct", "11": "Nov", "12": "Dec"
}

class MonthlySplitter:
    """Creates monthly playlists with configurable naming."""
    
    def __init__(self, sf, owner_name, prefix, name_template):
        self.sp = sf.sp
        self.user_id = self.sp.current_user()["id"]
        self.owner_name = owner_name
        self.prefix = prefix
        self.name_template = name_template

    def _get_existing(self):
        mapping = {}
        offset = 0
        while True:
            page = self.sp.current_user_playlists(limit=50, offset=offset)
            for item in page.get("items", []):
                mapping[item["name"]] = item["id"]
            if not page.get("next"):
                break
            offset += 50
        return mapping

    def _get_playlist_tracks(self, pid):
        uris = set()
        offset = 0
        while True:
            page = self.sp.playlist_items(pid, fields="items(track(uri)),next", limit=100, offset=offset)
            for it in page.get("items", []):
                # "track" can be null for removed or unavailable items
                if (it.get("track") or {}).get("uri"):
                    uris.add(it["track"]["uri"])
            if not page.get("next"):
                break
            offset += 100
        return uris

    def _format_name(self, month_str):
        """Format playlist name. month_str is like '2025-12'."""
        parts = month_str.split("-")
        full_year = parts[0] if len(parts) >= 1 else ""
        month_num = parts[1] if len(parts) >= 2 else ""
        
        # Get short month name and 2-digit year
        mon = MONTH_NAMES.get(month_num, month_num)
        year = full_year[2:] if len(full_year) == 4 else full_year
        
        return self.name_template.format(
            owner=self.owner_name,
            prefix=self.prefix,
            month=month_str,  # Full format like "2025-12"
            mon=mon,          # Short name like "Dec"
            year=year         # 2-digit year like "25"
        )

    def run(self, month_to_tracks, dry_run=True):
        existing = self._get_existing()
        print(f"üë§ User: {self.user_id}")
        print(f"üìã Template: {self.name_template}")
        print(f"   Example: {self._format_name('2025-12')}")
        print()
        
        for month, uris in tqdm(sorted(month_to_tracks.items()), desc="Months"):
            if not uris:
                continue
            
            name = self._format_name(month)
            
            if dry_run:
                print(f"[DRY RUN] {name} → {len(uris)} tracks")
                continue
            
            if name in existing:
                pid = existing[name]
            else:
                pl = self.sp.user_playlist_create(
                    self.user_id, name, public=False,
                    description=f"Liked songs from {month}"
                )
                pid = pl["id"]
            
            already = self._get_playlist_tracks(pid)
            to_add = [u for u in uris if u not in already]
            
            for chunk in _chunked(to_add, 100):
                self.sp.playlist_add_items(pid, chunk)
        
        print("\n✅ Monthly playlists done!")

In [32]:
# Create monthly playlists
from datetime import datetime

def filter_months(month_dict, only_recent=0, current_only=False):
    """Filter months based on settings."""
    months = sorted(month_dict.keys())
    
    if current_only:
        # Only current month
        current = datetime.now().strftime("%Y-%m")
        return {m: month_dict[m] for m in months if m == current}
    
    if only_recent > 0:
        # Only last N months
        months = months[-only_recent:]
    
    return {m: month_dict[m] for m in months}

# Apply filters
filtered_months = filter_months(
    month_to_tracks,
    only_recent=ONLY_RECENT_MONTHS,
    current_only=CURRENT_MONTH_ONLY
)

print(f"üìÖ Processing {len(filtered_months)} months (out of {len(month_to_tracks)} total)")
if ONLY_RECENT_MONTHS:
    print(f"   (Limited to last {ONLY_RECENT_MONTHS} months)")
if CURRENT_MONTH_ONLY:
    print(f"   (Current month only mode)")
print()

if sf is None:
    print("⚠️ Spotify not connected. Showing preview only...")
    print(f"\n📋 Would create {len(filtered_months)} monthly playlists:")
    for month, uris in list(filtered_months.items())[-5:]:
        parts = month.split("-")
        mon = MONTH_NAMES.get(parts[1], parts[1])  # reuse the mapping from the previous cell
        year = parts[0][2:] if len(parts[0]) == 4 else parts[0]
        name = MONTHLY_NAME_TEMPLATE.format(owner=OWNER_NAME, prefix=PREFIX, mon=mon, year=year)
        print(f"   {name} → {len(uris)} tracks")
else:
    splitter = MonthlySplitter(sf, OWNER_NAME, PREFIX, MONTHLY_NAME_TEMPLATE)
    splitter.run(filtered_months, dry_run=DRY_RUN)


👤 User: 31iol2qamank24owygxo7kpq533y
📋 Template: {owner}{prefix}{mon}{year}
   Example: AJFindsDec25




[DRY RUN] AJFindsSep21 → 1 tracks
[DRY RUN] AJFindsOct21 → 1030 tracks
[DRY RUN] AJFindsNov21 → 149 tracks
[DRY RUN] AJFindsDec21 → 130 tracks
[DRY RUN] AJFindsJan22 → 129 tracks
[DRY RUN] AJFindsFeb22 → 5 tracks
[DRY RUN] AJFindsMar22 → 72 tracks
[DRY RUN] AJFindsApr22 → 53 tracks
[DRY RUN] AJFindsMay22 → 72 tracks
[DRY RUN] AJFindsJun22 → 106 tracks
[DRY RUN] AJFindsJul22 → 45 tracks
[DRY RUN] AJFindsAug22 → 185 tracks
[DRY RUN] AJFindsSep22 → 21 tracks
[DRY RUN] AJFindsOct22 → 43 tracks
[DRY RUN] AJFindsNov22 → 6 tracks
[DRY RUN] AJFindsDec22 → 413 tracks
[DRY RUN] AJFindsJan23 → 206 tracks
[DRY RUN] AJFindsFeb23 → 192 tracks
[DRY RUN] AJFindsMar23 → 318 tracks
[DRY RUN] AJFindsApr23 → 125 tracks
[DRY RUN] AJFindsMay23 → 77 tracks
[DRY RUN] AJFindsJun23 → 30 tracks
[DRY RUN] AJFindsJul23 → 46 tracks
[DRY RUN] AJFindsAug23 → 93 tracks
[DRY RUN] AJFindsSep23 → 48 tracks
[DRY RUN] AJFindsOct23 → 26 tracks
[DRY RUN] AJFindsNov23 → 36
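
Re-running `MonthlySplitter.run` is intended to be safe because only URIs missing from the playlist are added. The sketch below isolates that planning step from the API calls so it can be checked offline. `plan_additions` is a hypothetical helper, not part of the notebook, and the URIs are placeholders.

```python
def plan_additions(wanted, already_present, batch_size=100):
    """Return batches of URIs to add, skipping ones already in the playlist."""
    present = set(already_present)
    # dict.fromkeys removes repeats in `wanted` while preserving order
    to_add = [u for u in dict.fromkeys(wanted) if u not in present]
    return [to_add[i:i + batch_size] for i in range(0, len(to_add), batch_size)]

wanted = [f"spotify:track:{i}" for i in range(230)]
first_run = plan_additions(wanted, already_present=[])
second_run = plan_additions(wanted, already_present=wanted)
print([len(b) for b in first_run])   # [100, 100, 30]
print(second_run)                    # []
```

A second run over an unchanged library produces no batches, so no add requests would be issued.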

## 5️⃣ Create Genre-Split Monthly Playlists

Split each month's liked songs into **HipHop**, **Dance**, and **Other** playlists.

Every track falls into exactly one of the three, so their track counts add up to the monthly playlist's total.

Example output:
- `HipHopFindsDec25` - Hip hop/rap tracks added in December 2025
- `DanceFindsDec25` - Dance/Electronic/EDM tracks added in December 2025
- `OtherFindsDec25` - All other genres (Pop, Rock, Indie, R&B, etc.) added in December 2025

In [33]:
# Genre mapping for split (HipHop and Dance - everything else goes to Other)
GENRE_SPLIT_RULES = {
    "HipHop": ["hip hop", "rap", "trap", "drill", "grime", "crunk", "phonk", 
               "boom bap", "dirty south", "gangsta", "uk drill", "melodic rap",
               "conscious hip hop", "underground hip hop", "southern hip hop"],
    "Dance": ["electronic", "edm", "house", "techno", "trance", "dubstep", 
              "drum and bass", "ambient", "garage", "deep house", "minimal",
              "synthwave", "future bass", "electro", "dance", "electronica",
              "uk garage", "breakbeat", "hardstyle", "progressive house"]
}
# Other genres (rock, pop, indie, r&b, etc.) will go to "Other" category

def get_split_genre(genre_list, include_other=True):
    """Map artist genres to HipHop, Dance, or Other.
    
    Args:
        genre_list: List of genre strings from artist
        include_other: If True, return "Other" for unmatched; else return None
    
    Returns:
        "HipHop" - for hip hop, rap, trap, drill, etc.
        "Dance" - for electronic, EDM, house, techno, etc.
        "Other" - for everything else (rock, pop, indie, r&b, etc.)
    """
    if not genre_list:
        return "Other" if include_other else None
    combined = " ".join(genre_list).lower()
    for genre_name, keywords in GENRE_SPLIT_RULES.items():
        if any(kw in combined for kw in keywords):
            return genre_name
    # Track has genres but doesn't match HipHop or Dance -> goes to Other
    return "Other" if include_other else None

class GenreMonthlySplitter:
    """Creates genre-split monthly playlists (e.g., AJFinds HipHop - Dec25)."""
    
    def __init__(self, sf, owner_name, prefix, template, split_genres):
        self.sp = sf.sp
        self.user_id = self.sp.current_user()["id"]
        self.owner_name = owner_name
        self.prefix = prefix
        self.template = template
        self.split_genres = split_genres

    def _get_existing(self):
        mapping = {}
        offset = 0
        while True:
            page = self.sp.current_user_playlists(limit=50, offset=offset)
            for item in page.get("items", []):
                mapping[item["name"]] = item["id"]
            if not page.get("next"):
                break
            offset += 50
        return mapping

    def _get_playlist_tracks(self, pid):
        uris = set()
        offset = 0
        while True:
            page = self.sp.playlist_items(pid, fields="items(track(uri)),next", limit=100, offset=offset)
            for it in page.get("items", []):
                # "track" can be null for removed or unavailable items
                if (it.get("track") or {}).get("uri"):
                    uris.add(it["track"]["uri"])
            if not page.get("next"):
                break
            offset += 100
        return uris

    def _format_name(self, month_str, genre):
        parts = month_str.split("-")
        full_year = parts[0] if len(parts) >= 1 else ""
        month_num = parts[1] if len(parts) >= 2 else ""
        mon = MONTH_NAMES.get(month_num, month_num)
        year = full_year[2:] if len(full_year) == 4 else full_year
        
        return self.template.format(
            owner=self.owner_name,
            prefix=self.prefix,
            genre=genre,
            mon=mon,
            year=year
        )

    def run(self, month_to_tracks, track_to_genre_map, dry_run=True):
        existing = self._get_existing()
        print(f"üë§ User: {self.user_id}")
        print(f"üìã Template: {self.template}")
        print(f"   Example: {self._format_name('2025-12', 'HipHop')}")
        print(f"   Genres: {self.split_genres}")
        print()
        
        for month, uris in tqdm(sorted(month_to_tracks.items()), desc="Months"):
            if not uris:
                continue
            
            for genre in self.split_genres:
                # Filter to tracks matching this genre
                genre_uris = [u for u in uris if track_to_genre_map.get(u) == genre]
                
                if not genre_uris:
                    continue
                
                name = self._format_name(month, genre)
                
                if dry_run:
                    print(f"[DRY RUN] {name} → {len(genre_uris)} tracks")
                    continue
                
                if name in existing:
                    pid = existing[name]
                else:
                    pl = self.sp.user_playlist_create(
                        self.user_id, name, public=False,
                        description=f"{genre} tracks from {month}"
                    )
                    pid = pl["id"]
                
                already = self._get_playlist_tracks(pid)
                to_add = [u for u in genre_uris if u not in already]
                
                for chunk in _chunked(to_add, 100):
                    self.sp.playlist_add_items(pid, chunk)
        
        print("\n✅ Genre-split monthly playlists done!")

In [34]:
if ENABLE_GENRE_SPLIT:
    print("üîÑ Building track-to-genre mapping...")
    
    # Load track artists
    track_artists = pd.read_parquet(DATA_DIR / "track_artists.parquet")
    
    # Get artist genres
    artist_genres_map = artists.set_index("artist_id")["genres"].to_dict()
    
    # Build track -> split genre mapping
    track_to_split_genre = {}
    liked_track_ids = set(liked["track_id"])
    
    for _, row in track_artists[track_artists["track_id"].isin(liked_track_ids)].iterrows():
        tid = row["track_id"]
        aid = row["artist_id"]
        
        if tid in track_to_split_genre:
            continue  # Already assigned
        
        artist_genres = artist_genres_map.get(aid, [])
        if isinstance(artist_genres, str):
            try:
                import ast
                artist_genres = ast.literal_eval(artist_genres)
            except:
                artist_genres = [artist_genres]
        
        import numpy as np
        if isinstance(artist_genres, np.ndarray):
            artist_genres = list(artist_genres)
        
        split_genre = get_split_genre(artist_genres if artist_genres else [])
        if split_genre:
            # Map track_id to URI
            track_uri = f"spotify:track:{tid}"
            track_to_split_genre[track_uri] = split_genre
    
    print(f"   HipHop tracks: {sum(1 for g in track_to_split_genre.values() if g == 'HipHop')}")
    print(f"   Dance tracks: {sum(1 for g in track_to_split_genre.values() if g == 'Dance')}")
    print(f"   Other tracks: {sum(1 for g in track_to_split_genre.values() if g == 'Other')}")
    
    # Preview or run the genre-split splitter
    if sf is None:
        print("\n⚠️ Spotify not connected. Showing preview only...")
        for month in list(filtered_months.keys())[-3:]:
            for genre in SPLIT_GENRES:
                parts = month.split("-")
                mon = MONTH_NAMES.get(parts[1], parts[1])
                year = parts[0][2:] if len(parts[0]) == 4 else parts[0]
                name = GENRE_MONTHLY_TEMPLATE.format(owner=OWNER_NAME, prefix=PREFIX, genre=genre, mon=mon, year=year)
                genre_uris = [u for u in filtered_months[month] if track_to_split_genre.get(u) == genre]
                if genre_uris:
                    print(f"   {name} → {len(genre_uris)} tracks")
    else:
        genre_splitter = GenreMonthlySplitter(
            sf, OWNER_NAME, PREFIX, GENRE_MONTHLY_TEMPLATE, SPLIT_GENRES
        )
        genre_splitter.run(filtered_months, track_to_split_genre, dry_run=DRY_RUN)
else:
    print("⏭️ Genre-split monthly playlists disabled. Set ENABLE_GENRE_SPLIT = True to enable.")

🔄 Building track-to-genre mapping...
   HipHop tracks: 1270
   Dance tracks: 285
   Other tracks: 3522
👤 User: 31iol2qamank24owygxo7kpq533y
📋 Template: {genre}{prefix}{mon}{year}
   Example: HipHopFindsDec25
   Genres: ['HipHop', 'Dance', 'Other']




[DRY RUN] OtherFindsSep21 → 1 tracks
[DRY RUN] HipHopFindsOct21 → 415 tracks
[DRY RUN] DanceFindsOct21 → 21 tracks
[DRY RUN] OtherFindsOct21 → 594 tracks
[DRY RUN] HipHopFindsNov21 → 17 tracks
[DRY RUN] DanceFindsNov21 → 7 tracks
[DRY RUN] OtherFindsNov21 → 125 tracks
[DRY RUN] HipHopFindsDec21 → 43 tracks
[DRY RUN] DanceFindsDec21 → 3 tracks
[DRY RUN] OtherFindsDec21 → 84 tracks
[DRY RUN] HipHopFindsJan22 → 23 tracks
[DRY RUN] DanceFindsJan22 → 19 tracks
[DRY RUN] OtherFindsJan22 → 87 tracks
[DRY RUN] HipHopFindsFeb22 → 2 tracks
[DRY RUN] OtherFindsFeb22 → 3 tracks
[DRY RUN] HipHopFindsMar22 → 7 tracks
[DRY RUN] DanceFindsMar22 → 4 tracks
[DRY RUN] OtherFindsMar22 → 61 tracks
[DRY RUN] HipHopFindsApr22 → 11 tracks
[DRY RUN] DanceFindsApr22 → 1 tracks
[DRY RUN] OtherFindsApr22 → 41 tracks
[DRY RUN] HipHopFindsMay22 → 27 tracks
[DRY RUN] DanceFindsMay22 → 1 tracks
[DRY RUN] OtherFindsMay22 → 44 tracks
[DRY RUN] HipHopFindsJun22 → 38 trac
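
`get_split_genre` matches keywords as substrings of the joined genre string, so a tag such as "garage rock" or "dance pop" counts as Dance. The sketch below contrasts that behaviour with a stricter variant that requires a keyword to equal a whole tag. `RULES` is a trimmed copy of `GENRE_SPLIT_RULES`, and both function names are hypothetical. The stricter variant is shown only as a comparison, not as the notebook's method.

```python
RULES = {  # trimmed copy of GENRE_SPLIT_RULES, for illustration only
    "HipHop": ["hip hop", "rap", "trap", "drill"],
    "Dance": ["electronic", "edm", "house", "garage", "dance"],
}

def split_substring(genres):
    """Substring matching, as in get_split_genre: the first rule with any hit wins."""
    combined = " ".join(genres).lower()
    for name, keywords in RULES.items():
        if any(kw in combined for kw in keywords):
            return name
    return "Other"

def split_whole_tag(genres):
    """Stricter variant: a keyword must equal an entire genre tag."""
    tags = {g.lower() for g in genres}
    for name, keywords in RULES.items():
        if tags & set(keywords):
            return name
    return "Other"

for tags in (["garage rock"], ["dance pop"], ["uk garage", "house"], ["trap", "pop"]):
    print(tags, split_substring(tags), split_whole_tag(tags))
```

Substring matching is more forgiving with Spotify's long-tail tags (for example "uk garage" alone would be "Other" under whole-tag matching), at the cost of pulling some rock and pop tags into Dance.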

## 6️⃣ Create Master Genre Playlists (Optional)

Creates all-time genre playlists (not monthly).

**Note:** Set `CREATE_MASTER_GENRE_PLAYLISTS = True` in config to enable.

In [35]:
# Broad genre mapping
GENRE_RULES = [
    # Hip-Hop / Rap
    (["hip hop", "rap", "trap", "drill", "grime", "crunk", "bounce", "gangsta", 
      "boom bap", "dirty south", "phonk", "chopped and screwed"], "Hip-Hop"),
    
    # R&B / Soul
    (["r&b", "rnb", "soul", "neo soul", "funk", "quiet storm", "new jack swing", 
      "contemporary r&b", "urban contemporary", "motown", "disco"], "R&B/Soul"),
    
    # Electronic / Dance
    (["electronic", "edm", "house", "techno", "trance", "dubstep", "drum and bass",
      "ambient", "idm", "downtempo", "garage", "breakbeat", "hardstyle",
      "electro", "synthwave", "future bass", "deep house", "minimal"], "Electronic"),
    
    # Rock
    (["rock", "alternative", "grunge", "punk", "emo", "post-punk", "new wave",
      "shoegaze", "psychedelic", "prog", "classic rock", "hard rock", "garage rock"], "Rock"),
    
    # Metal
    (["metal", "heavy metal", "death metal", "black metal", "thrash", "metalcore",
      "nu metal", "doom", "power metal"], "Metal"),
    
    # Indie
    (["indie", "indie rock", "indie pop", "indie folk", "bedroom", "lo-fi", "lofi",
      "dream pop", "art pop", "chamber pop"], "Indie"),
    
    # Pop
    (["pop", "dance pop", "synth pop", "electropop", "teen pop", "bubblegum",
      "adult contemporary"], "Pop"),
    
    # Latin
    (["latin", "reggaeton", "salsa", "bachata", "merengue", "cumbia", "latin pop",
      "urbano latino", "dembow", "latin trap", "bossa nova"], "Latin"),
    
    # World / International
    (["afrobeat", "afrobeats", "afropop", "k-pop", "kpop", "j-pop", "reggae",
      "dancehall", "dub", "ska", "world", "african", "caribbean", "bollywood"], "World"),
    
    # Jazz
    (["jazz", "smooth jazz", "bebop", "swing", "big band", "fusion", "acid jazz"], "Jazz"),
    
    # Classical
    (["classical", "orchestra", "symphony", "opera", "baroque", "romantic",
      "contemporary classical", "neo-classical", "piano"], "Classical"),
    
    # Country / Folk
    (["country", "folk", "americana", "bluegrass", "singer-songwriter", "acoustic",
      "outlaw country", "alt-country", "celtic"], "Country/Folk"),
]

def get_broad_genre(genre_list):
    """Map artist genres to a single broad category."""
    if not genre_list:
        return None
    combined = " ".join(genre_list).lower()
    for keywords, category in GENRE_RULES:
        if any(kw in combined for kw in keywords):
            return category
    return None

print("✅ Genre mapping loaded")
print(f"   Categories: {[r[1] for r in GENRE_RULES]}")

✅ Genre mapping loaded
   Categories: ['Hip-Hop', 'R&B/Soul', 'Electronic', 'Rock', 'Metal', 'Indie', 'Pop', 'Latin', 'World', 'Jazz', 'Classical', 'Country/Folk']
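
Because `get_broad_genre` returns the first rule that matches, the order of `GENRE_RULES` decides how overlapping tags are classified. The sketch below uses a trimmed copy of the rules in the same relative order; `broad_genre` and `latin_first` are hypothetical names introduced only for this illustration.

```python
RULES = [  # trimmed copy of GENRE_RULES, same relative order
    (["hip hop", "rap", "trap"], "Hip-Hop"),
    (["electronic", "house", "garage"], "Electronic"),
    (["rock", "alternative", "punk"], "Rock"),
    (["indie", "indie rock", "indie pop"], "Indie"),
    (["pop", "dance pop"], "Pop"),
    (["latin", "reggaeton", "latin trap"], "Latin"),
]

def broad_genre(genres, rules=RULES):
    """Return the category of the first rule with a substring hit, else None."""
    combined = " ".join(genres).lower()
    for keywords, category in rules:
        if any(kw in combined for kw in keywords):
            return category
    return None

print(broad_genre(["indie rock"]))   # Rock: the Rock rule is checked before Indie
print(broad_genre(["latin trap"]))   # Hip-Hop: "trap" is hit before the Latin rule
print(broad_genre(["indie pop"]))    # Indie
print(broad_genre(["polka"]))        # None

# Moving a rule to the front changes the outcome for overlapping tags
latin_first = [RULES[5]] + RULES[:5]
print(broad_genre(["latin trap"], rules=latin_first))   # Latin
```

Specific keywords listed in a later rule, such as "indie rock" or "latin trap", are never reached when an earlier rule already matches a shorter substring of the same tag.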


In [36]:
class GenrePlaylistBuilder:
    """Creates master genre playlists (all-time, not monthly) from liked songs."""
    
    def __init__(self, sf, owner_name, name_template, track_artists_df, artists_df):
        self.sp = sf.sp
        self.user_id = self.sp.current_user()["id"]
        self.owner_name = owner_name
        self.name_template = name_template
        self.track_artists_df = track_artists_df
        self.artists_df = artists_df

    def _get_existing(self):
        mapping = {}
        offset = 0
        while True:
            page = self.sp.current_user_playlists(limit=50, offset=offset)
            for item in page.get("items", []):
                mapping[item["name"]] = item["id"]
            if not page.get("next"):
                break
            offset += 50
        return mapping

    def _get_playlist_tracks(self, pid):
        uris = set()
        offset = 0
        while True:
            page = self.sp.playlist_items(pid, fields="items(track(uri)),next", limit=100, offset=offset)
            for it in page.get("items", []):
                # "track" can be null for removed or unavailable items
                if (it.get("track") or {}).get("uri"):
                    uris.add(it["track"]["uri"])
            if not page.get("next"):
                break
            offset += 100
        return uris

    def build(self, liked_track_ids, liked_uris, max_genres=10, min_tracks=20, dry_run=True):
        print("üîÑ Analyzing track genres for master playlists...")
        
        # Get artist genres from artists_df
        artist_genres_map = self.artists_df.set_index("artist_id")["genres"].to_dict()
        
        # Build track -> broad genre mapping using track_artists
        track_to_genre = {}
        liked_set = set(liked_track_ids)
        
        for _, row in self.track_artists_df[self.track_artists_df["track_id"].isin(liked_set)].iterrows():
            tid = row["track_id"]
            aid = row["artist_id"]
            
            if tid in track_to_genre:
                continue  # Already assigned
            
            artist_genres = artist_genres_map.get(aid, [])
            if isinstance(artist_genres, str):
                try:
                    import ast
                    artist_genres = ast.literal_eval(artist_genres)
                except (ValueError, SyntaxError):
                    # Not a Python literal; treat the raw string as a single genre
                    artist_genres = [artist_genres]
            
            import numpy as np
            if isinstance(artist_genres, np.ndarray):
                artist_genres = list(artist_genres)
            
            broad_genre = get_broad_genre(artist_genres if artist_genres else [])
            if broad_genre:
                track_to_genre[tid] = broad_genre
        
        # Build URI -> genre mapping
        uri_to_genre = {f"spotify:track:{tid}": g for tid, g in track_to_genre.items()}
        
        # Count and select genres
        genre_counts = Counter([g for g in uri_to_genre.values() if g])
        selected = [g for g, n in genre_counts.most_common(max_genres) if n >= min_tracks]
        
        print(f"\nüìä Genre distribution:")
        for g, n in genre_counts.most_common(15):
            marker = "‚úì" if g in selected else " "
            print(f"   {marker} {g}: {n} tracks")
        
        print(f"\nüéØ Creating {len(selected)} master genre playlists...\n")
        
        existing = self._get_existing()
        
        for genre in selected:
            uris = [u for u in liked_uris if uri_to_genre.get(u) == genre]
            if not uris:
                continue
            
            name = self.name_template.format(owner=self.owner_name, genre=genre)
            
            if dry_run:
                print(f"[DRY RUN] {name} → {len(uris)} tracks")
                continue
            
            if name in existing:
                pid = existing[name]
            else:
                pl = self.sp.user_playlist_create(
                    self.user_id, name, public=False,
                    description=f"All liked songs - {genre}"
                )
                pid = pl["id"]
            
            already = self._get_playlist_tracks(pid)
            to_add = [u for u in uris if u not in already]
            
            for chunk in _chunked(to_add, 100):
                self.sp.playlist_add_items(pid, chunk)
        
        print("\n✅ Master genre playlists done!")

In [37]:
# Create master genre playlists (all-time)
if CREATE_MASTER_GENRE_PLAYLISTS:
    if sf is None:
        print("⚠️ Spotify not connected. Cannot create master genre playlists.")
    else:
        # Load track_artists if not already loaded
        try:
            track_artists
        except NameError:
            track_artists = pd.read_parquet(DATA_DIR / "track_artists.parquet")
        
        builder = GenrePlaylistBuilder(
            sf, OWNER_NAME, GENRE_NAME_TEMPLATE,
            track_artists_df=track_artists,
            artists_df=artists
        )
        builder.build(
            liked_track_ids=set(liked["track_id"]),
            liked_uris=liked_uris,
            max_genres=MAX_GENRE_PLAYLISTS,
            min_tracks=MIN_TRACKS_FOR_GENRE,
            dry_run=DRY_RUN
        )
else:
    print("⏭️ Master genre playlists disabled. Set CREATE_MASTER_GENRE_PLAYLISTS = True to enable.")

🔄 Analyzing track genres for master playlists...

📊 Genre distribution:
   ✓ Hip-Hop: 1552 tracks
   ✓ R&B/Soul: 458 tracks
   ✓ Electronic: 447 tracks
   ✓ Indie: 231 tracks
   ✓ Pop: 228 tracks
   ✓ Rock: 154 tracks
   ✓ Latin: 34 tracks
     Jazz: 16 tracks
     World: 12 tracks
     Classical: 7 tracks
     Country/Folk: 6 tracks
     Metal: 3 tracks

🎯 Creating 7 master genre playlists...

[DRY RUN] AJamHip-Hop → 1552 tracks
[DRY RUN] AJamR&B/Soul → 458 tracks
[DRY RUN] AJamElectronic → 447 tracks
[DRY RUN] AJamIndie → 231 tracks
[DRY RUN] AJamPop → 228 tracks
[DRY RUN] AJamRock → 154 tracks
[DRY RUN] AJamLatin → 34 tracks

✅ Master genre playlists done!

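The selection rule behind the checkmarks above can be sketched in isolation. This is a minimal illustration, assuming the `build` defaults of `max_genres=10` and `min_tracks=20`; the helper name `select_genres` is hypothetical and the counts are copied from the dry-run output.

```python
from collections import Counter

# Counts copied from the dry-run output above
genre_counts = Counter({
    "Hip-Hop": 1552, "R&B/Soul": 458, "Electronic": 447, "Indie": 231,
    "Pop": 228, "Rock": 154, "Latin": 34, "Jazz": 16, "World": 12,
    "Classical": 7, "Country/Folk": 6, "Metal": 3,
})

def select_genres(counts, max_genres=10, min_tracks=20):
    """Keep the most common genres that also meet the minimum track count.

    Hypothetical helper mirroring the list comprehension used in build().
    """
    return [g for g, n in counts.most_common(max_genres) if n >= min_tracks]

selected = select_genres(genre_counts)
print(selected)
```

Both limits apply together: a genre must rank within the top `max_genres` and have at least `min_tracks` tracks, which is why Jazz (16 tracks) is skipped even though it ranks eighth.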

## 7️⃣ Create "Most Listened" Monthly Playlists

Create playlists of your **most played songs each month** based on your Spotify streaming history.

**Requirements:**
1. Request your Spotify data export at: https://www.spotify.com/account/privacy/
2. Download the "Extended streaming history" once Spotify emails it (this can take up to 30 days)
3. Place the JSON files in the `data/streaming_history/` folder

**Output:**
- `AJTopDec25` - Your most played songs in December 2025
- Configurable: top N tracks per month, minimum play threshold

In [None]:
# ============================================================================
# 🎧 MOST LISTENED MONTHLY PLAYLISTS CONFIG
# ============================================================================

# Enable this feature
ENABLE_TOP_PLAYED = True

# Top N tracks per month
TOP_N_PER_MONTH = 25

# Minimum plays to include a track
MIN_PLAYS = 3

# Minimum seconds played to count as a "play" (skip short plays)
MIN_SECONDS_PLAYED = 30

# Naming template for top played playlists
# Available: {owner}, {prefix}, {mon}, {year}, {n} (top N)
TOP_PLAYED_TEMPLATE = "{owner}Top{mon}{year}"

# Path to streaming history JSON files
STREAMING_HISTORY_DIR = DATA_DIR / "streaming_history"

print("✅ Top Played config loaded")
print(f"   Top {TOP_N_PER_MONTH} tracks per month")
print(f"   Min plays: {MIN_PLAYS}")
print(f"   Template: {TOP_PLAYED_TEMPLATE}")

In [None]:
import json  # used by load_streaming_history; files are found via Path.glob

def load_streaming_history(history_dir):
    """Load Spotify streaming history from JSON files.
    
    Spotify exports files like:
    - StreamingHistory0.json, StreamingHistory1.json, ... (basic)
    - endsong_0.json, endsong_1.json, ... (extended)
    """
    all_streams = []
    
    if not history_dir.exists():
        return None
    
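    # Try extended format first (more detailed). Newer exports may name these
    # files Streaming_History_Audio_*.json instead of endsong_*.json.
    extended_files = list(history_dir.glob("endsong*.json")) + list(
        history_dir.glob("Streaming_History_Audio*.json")
    )
    basic_files = list(history_dir.glob("StreamingHistory*.json"))
    
    files = extended_files if extended_files else basic_files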
    
    if not files:
        return None
    
    for f in files:
        with open(f, 'r', encoding='utf-8') as fp:
            try:
                data = json.load(fp)
                all_streams.extend(data)
            except json.JSONDecodeError:
                print(f"⚠️ Could not parse {f.name}")
    
    if not all_streams:
        return None
    
    # Convert to DataFrame
    df = pd.DataFrame(all_streams)
    
    # Handle both extended and basic formats
    if 'ts' in df.columns:
        # Extended format
        df['end_time'] = pd.to_datetime(df['ts'], utc=True)
        df['ms_played'] = df.get('ms_played', 0)
        df['track_uri'] = df.get('spotify_track_uri', '')
        df['track_name'] = df.get('master_metadata_track_name', '')
        df['artist_name'] = df.get('master_metadata_album_artist_name', '')
    else:
        # Basic format
        df['end_time'] = pd.to_datetime(df['endTime'], utc=True)
        df['ms_played'] = df.get('msPlayed', 0)
        df['track_name'] = df.get('trackName', '')
        df['artist_name'] = df.get('artistName', '')
        df['track_uri'] = ''  # Basic format doesn't have URIs
    
    return df


class TopPlayedMonthlyBuilder:
    """Creates monthly playlists of most played songs."""
    
    def __init__(self, sf, owner_name, template, tracks_df=None):
        self.sp = sf.sp
        self.user_id = self.sp.current_user()["id"]
        self.owner_name = owner_name
        self.template = template
        self.tracks_df = tracks_df
    
    def _get_existing(self):
        mapping = {}
        offset = 0
        while True:
            page = self.sp.current_user_playlists(limit=50, offset=offset)
            for item in page.get("items", []):
                mapping[item["name"]] = item["id"]
            if not page.get("next"):
                break
            offset += 50
        return mapping
    
    def _get_playlist_tracks(self, pid):
        uris = set()
        offset = 0
        while True:
            page = self.sp.playlist_items(pid, fields="items(track(uri)),next", limit=100, offset=offset)
            for it in page.get("items", []):
                if it.get("track", {}).get("uri"):
                    uris.add(it["track"]["uri"])
            if not page.get("next"):
                break
            offset += 100
        return uris
    
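    def _search_track(self, track_name, artist_name):
        """Search for a track URI by name and artist; return None if not found."""
        try:
            q = f'track:{track_name} artist:{artist_name}'
            results = self.sp.search(q, type='track', limit=1)
            items = results.get('tracks', {}).get('items', [])
            if items:
                return items[0]['uri']
        except Exception as e:
            # Best-effort lookup: report the failure and skip this track
            print(f"⚠️ Search failed for {artist_name} - {track_name}: {e}")
        return None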
    
    def _format_name(self, month_str, top_n):
        parts = month_str.split("-")
        full_year = parts[0] if len(parts) >= 1 else ""
        month_num = parts[1] if len(parts) >= 2 else ""
        mon = MONTH_NAMES.get(month_num, month_num)
        year = full_year[2:] if len(full_year) == 4 else full_year
        
        return self.template.format(
            owner=self.owner_name,
            prefix=PREFIX,
            mon=mon,
            year=year,
            n=top_n
        )
    
    def build(self, streams_df, top_n=25, min_plays=3, min_seconds=30, dry_run=True):
        """Build top played monthly playlists."""
        print(f"🎧 Building Top {top_n} Played monthly playlists...")
        print(f"   Min plays: {min_plays}, Min seconds: {min_seconds}\n")
        
        # Filter to valid plays
        valid = streams_df[
            (streams_df['ms_played'] >= min_seconds * 1000) &
            (streams_df['track_name'].notna()) &
            (streams_df['track_name'] != '')
        ].copy()
        
        # Add month column
        valid['month'] = valid['end_time'].dt.to_period('M').astype(str)
        
        # Group by month and track
        monthly_counts = valid.groupby(['month', 'track_name', 'artist_name', 'track_uri']).size().reset_index(name='plays')
        
        # Filter to min plays
        monthly_counts = monthly_counts[monthly_counts['plays'] >= min_plays]
        
        # Get existing playlists
        existing = self._get_existing() if not dry_run else {}
        
        months = sorted(monthly_counts['month'].unique())
        print(f"📅 Found {len(months)} months with streaming data\n")
        
        for month in tqdm(months, desc="Building playlists"):
            month_data = monthly_counts[monthly_counts['month'] == month]
            top_tracks = month_data.nlargest(top_n, 'plays')
            
            if len(top_tracks) == 0:
                continue
            
            name = self._format_name(month, top_n)
            
            # Get URIs for tracks
            uris = []
            for _, row in top_tracks.iterrows():
                uri = row['track_uri']
                
                # If no URI, search for it
                if not uri or not str(uri).startswith('spotify:track:'):
                    if not dry_run:
                        uri = self._search_track(row['track_name'], row['artist_name'])
                    else:
                        uri = None
                
                if uri:
                    uris.append(uri)
            
            if dry_run:
                print(f"[DRY RUN] {name} → {len(top_tracks)} tracks")
                # Show top 3 for preview
                for _, row in top_tracks.head(3).iterrows():
                    print(f"           #{row['plays']} plays: {row['artist_name']} - {row['track_name'][:30]}")
                continue
            
            if not uris:
                continue
            
            # Create or get playlist
            if name in existing:
                pid = existing[name]
            else:
                pl = self.sp.user_playlist_create(
                    self.user_id, name, public=False,
                    description=f"Top {top_n} most played songs in {month}"
                )
                pid = pl["id"]
            
            # Add tracks
            already = self._get_playlist_tracks(pid)
            to_add = [u for u in uris if u not in already]
            
            for chunk in _chunked(to_add, 100):
                self.sp.playlist_add_items(pid, chunk)
        
        print("\n✅ Top Played monthly playlists done!")

In [None]:
# Run the Top Played monthly playlist builder
if ENABLE_TOP_PLAYED:
    print(f"📂 Looking for streaming history in: {STREAMING_HISTORY_DIR}")
    
    streaming_df = load_streaming_history(STREAMING_HISTORY_DIR)
    
    if streaming_df is None:
        print("\n⚠️ No streaming history found!")
        print("\n📋 To enable this feature:")
        print("   1. Go to https://www.spotify.com/account/privacy/")
        print("   2. Request 'Extended streaming history'")
        print("   3. Wait for email (up to 30 days)")
        print("   4. Download and extract to data/streaming_history/")
        print("\n   Expected files:")
        print("   - endsong_0.json, endsong_1.json, ... (extended)")
        print("   - OR StreamingHistory0.json, ... (basic)")
    else:
        print(f"\n✅ Loaded {len(streaming_df):,} streaming records")
        print(f"   Date range: {streaming_df['end_time'].min()} to {streaming_df['end_time'].max()}")
        
        if sf is None:
            print("\n⚠️ Spotify not connected. Showing preview only...")
            # Just show stats (copy so that adding a column does not write to a view)
            valid = streaming_df[streaming_df['ms_played'] >= MIN_SECONDS_PLAYED * 1000].copy()
            valid['month'] = valid['end_time'].dt.to_period('M').astype(str)
            print("\n📊 Preview of recent months:")
            for month in sorted(valid['month'].unique())[-5:]:
                count = len(valid[valid['month'] == month])
                print(f"   {month}: {count} valid plays")
        else:
            builder = TopPlayedMonthlyBuilder(
                sf, OWNER_NAME, TOP_PLAYED_TEMPLATE,
                tracks_df=tracks if 'tracks' in dir() else None
            )
            builder.build(
                streaming_df,
                top_n=TOP_N_PER_MONTH,
                min_plays=MIN_PLAYS,
                min_seconds=MIN_SECONDS_PLAYED,
                dry_run=DRY_RUN
            )
else:
    print("⏭️ Top Played monthly playlists disabled. Set ENABLE_TOP_PLAYED = True to enable.")

---

## ✅ Ready to Create?

If the dry run looks good, go back to the **Configuration** cell and set:
```python
DRY_RUN = False
```

Then re-run the cells to actually create the playlists!

**Tip:** You can also customize the naming templates:
- Monthly: `"{year} - {mon}"` → "25 - Dec" (`{year}` is the 2-digit year and `{mon}` is the short month name)
- Genre: `"🎸 {owner}'s {genre}"` → "🎸 AJ's Rock"
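
To preview what a template produces before running the notebook, a small standalone sketch like the following may help. The `MONTH_ABBR` table is a hypothetical stand-in for the notebook's `MONTH_NAMES`, and `format_playlist_name` is an illustrative helper, not part of the notebook.

```python
# Hypothetical stand-in for the notebook's MONTH_NAMES lookup
_ABBRS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MONTH_ABBR = {f"{i:02d}": name for i, name in enumerate(_ABBRS, start=1)}

def format_playlist_name(template, month_str, owner="AJ", prefix="Finds"):
    """Fill a naming template from a 'YYYY-MM' month string (illustrative helper)."""
    full_year, month_num = month_str.split("-")
    return template.format(
        owner=owner,
        prefix=prefix,
        mon=MONTH_ABBR.get(month_num, month_num),  # fall back to the raw number
        year=full_year[2:],                        # 2-digit year
    )

print(format_playlist_name("{owner}{prefix}{mon}{year}", "2025-12"))
print(format_playlist_name("{owner}Top{mon}{year}", "2025-12"))
print(format_playlist_name("{year} - {mon}", "2025-12"))
```

Because `str.format` ignores keyword arguments a template does not use, a template can reference any subset of the available variables.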