# 🗑️ Identify Redundant Playlists & Reorganization

Analyze your playlists to identify **redundant playlists** that can be safely deleted or merged without losing information.

**What this notebook does:**
- 🔍 Finds playlists with high track overlap (duplicates/near-duplicates)
- 📊 Identifies playlists that are subsets of other playlists
- 🎯 Suggests playlists safe to delete
- 📋 Proposes a reorganized library structure
- 💡 Recommends consolidation strategies

**Prerequisites:** Run `01_sync_data.ipynb` first to download your library.


## 1️⃣ Setup


In [15]:
# Install dependencies
%pip install -q pandas pyarrow tqdm

# Add project to path
import sys
from pathlib import Path
PROJECT_ROOT = Path("..").resolve()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print(f"✅ Project root: {PROJECT_ROOT}")


Note: you may need to restart the kernel to use updated packages.
✅ Project root: /Users/aryamaan/Desktop/Projects/spotim8


In [16]:
import pandas as pd
import numpy as np
from pathlib import Path
from collections import defaultdict, Counter
from typing import Dict, List, Set, Tuple
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')

from spotim8.analysis import LibraryAnalyzer, PlaylistSimilarityEngine

# Data directory
DATA_DIR = PROJECT_ROOT / "data"
print(f"📁 Data directory: {DATA_DIR.resolve()}")


📁 Data directory: /Users/aryamaan/Desktop/Projects/spotim8/data


## 2️⃣ Load Library Data


In [17]:
# Load all data (owned playlists only for analysis)
analyzer = LibraryAnalyzer(DATA_DIR).load()

# Get owned playlists
playlists = analyzer.playlists_all[analyzer.playlists_all['is_owned'] == True].copy()
playlist_tracks = analyzer.playlist_tracks_all[
    analyzer.playlist_tracks_all['playlist_id'].isin(playlists['playlist_id'])
].copy()

print(f"✅ Loaded {len(playlists)} owned playlists")
print(f"📊 Total playlist-track links: {len(playlist_tracks):,}")

# Exclude Liked Songs from redundancy analysis (it's the master playlist)
liked_id = analyzer.liked_songs_id
if liked_id:
    playlists = playlists[playlists['playlist_id'] != liked_id].copy()
    print(f"   (Excluded Liked Songs from analysis)")

print(f"\n📋 Analyzing {len(playlists)} playlists for redundancy...")


✅ Loaded 650 playlists, 5,280 tracks
✅ Loaded 221 owned playlists
📊 Total playlist-track links: 41,889
   (Excluded Liked Songs from analysis)

📋 Analyzing 220 playlists for redundancy...


## 3️⃣ Build Playlist Track Sets


In [18]:
# Build track sets for each playlist
playlist_track_sets: Dict[str, Set[str]] = {}
playlist_info = {}

for pid in tqdm(playlists['playlist_id'], desc="Building track sets"):
    tracks = set(playlist_tracks[playlist_tracks['playlist_id'] == pid]['track_id'].unique())
    playlist_track_sets[pid] = tracks
    
    info = playlists[playlists['playlist_id'] == pid].iloc[0]
    playlist_info[pid] = {
        'name': info.get('name', 'Unknown'),
        'track_count': len(tracks),
        'is_liked_songs': info.get('is_liked_songs', False),
    }

print(f"✅ Built track sets for {len(playlist_track_sets)} playlists")
print(f"📊 Total unique tracks across all playlists: {len(set().union(*playlist_track_sets.values())):,}")


Building track sets: 100%|██████████| 220/220 [00:00<00:00, 356.17it/s]

✅ Built track sets for 218 playlists
📊 Total unique tracks across all playlists: 5,278





## 4️⃣ Find Redundant Playlists

We'll identify redundancy using multiple criteria:
1. **Exact duplicates** - Identical track sets
2. **Subsets** - Every track in one playlist also appears in another
3. **High overlap** - Jaccard similarity above 90%
4. **Near-duplicates** - Jaccard similarity between 80% and 90%
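
As a toy illustration of how these criteria relate, the sketch below labels a pair of track sets using the same thresholds. The `classify_pair` helper and the example track IDs are hypothetical and not part of `spotim8`; unlike the analysis cell, it returns only the first matching label.

```python
from typing import Set


def classify_pair(a: Set[str], b: Set[str]) -> str:
    """Label a pair of non-empty track sets (hypothetical helper for illustration)."""
    if a == b:
        return "exact duplicate"
    if a <= b or b <= a:
        return "subset"
    # Jaccard similarity: shared tracks divided by all distinct tracks
    jaccard = len(a & b) / len(a | b)
    if jaccard > 0.9:
        return "high overlap"
    if jaccard > 0.8:
        return "near-duplicate"
    return "distinct"


finds = {"t1", "t2", "t3", "t4", "t5"}
print(classify_pair(finds, set(finds)))    # identical sets
print(classify_pair({"t1", "t2"}, finds))  # fully contained in the larger set
print(classify_pair(finds, {"t1", "t9"}))  # little in common
```

In the notebook itself a subset pair is also scored by Jaccard similarity, so the same pair can show up in both the subset list and the near-duplicate list.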


In [19]:
def jaccard_similarity(set1: Set, set2: Set) -> float:
    """Calculate Jaccard similarity (intersection / union)."""
    if not set1 and not set2:
        return 1.0
    if not set1 or not set2:
        return 0.0
    intersection = len(set1 & set2)
    union = len(set1 | set2)
    return intersection / union if union > 0 else 0.0

def overlap_ratio(set1: Set, set2: Set) -> Tuple[float, float]:
    """Calculate overlap ratios in both directions.
    
    Returns:
        (ratio of set1 in set2, ratio of set2 in set1)
    """
    if not set1 or not set2:
        return (0.0, 0.0)
    intersection = len(set1 & set2)
    return (intersection / len(set1), intersection / len(set2))

# Find redundant playlists
redundant_groups = []
exact_duplicates = []
subsets = []  # (subset_id, superset_id, subset_size, superset_size)
high_overlap = []  # Jaccard similarity > 0.9
near_duplicates = []  # 0.8 < Jaccard similarity <= 0.9

playlist_ids = list(playlist_track_sets.keys())

print("🔍 Analyzing playlist pairs...")
for i in tqdm(range(len(playlist_ids)), desc="Comparing playlists"):
    pid1 = playlist_ids[i]
    set1 = playlist_track_sets[pid1]
    
    if not set1:  # Skip empty playlists
        continue
    
    for j in range(i + 1, len(playlist_ids)):
        pid2 = playlist_ids[j]
        set2 = playlist_track_sets[pid2]
        
        if not set2:  # Skip empty playlists
            continue
        
        # Check for exact duplicates
        if set1 == set2:
            exact_duplicates.append((pid1, pid2))
            continue
        
        # Check for subsets
        if set1.issubset(set2):
            subsets.append((pid1, pid2, len(set1), len(set2)))
        elif set2.issubset(set1):
            subsets.append((pid2, pid1, len(set2), len(set1)))
        
        # Calculate similarity
        jaccard = jaccard_similarity(set1, set2)
        overlap1, overlap2 = overlap_ratio(set1, set2)
        
        if jaccard > 0.9:
            high_overlap.append((pid1, pid2, jaccard, overlap1, overlap2))
        elif jaccard > 0.8:
            near_duplicates.append((pid1, pid2, jaccard, overlap1, overlap2))

print(f"\n✅ Analysis complete!")
print(f"   Exact duplicates: {len(exact_duplicates)}")
print(f"   Subsets: {len(subsets)}")
print(f"   High overlap (>90%): {len(high_overlap)}")
print(f"   Near duplicates (80-90%): {len(near_duplicates)}")


🔍 Analyzing playlist pairs...


Comparing playlists: 100%|██████████| 218/218 [00:00<00:00, 653.13it/s]


✅ Analysis complete!
   Exact duplicates: 2
   Subsets: 138
   High overlap (>90%): 0
   Near duplicates (80-90%): 2





In [20]:
# Display exact duplicates
if exact_duplicates:
    print("=" * 80)
    print("🔄 EXACT DUPLICATES (Same tracks, can delete one)")
    print("=" * 80)
    for pid1, pid2 in exact_duplicates:
        info1 = playlist_info[pid1]
        info2 = playlist_info[pid2]
        print(f"\n📋 {info1['name']} ({info1['track_count']} tracks)")
        print(f"   ⚡ Duplicate of: {info2['name']} ({info2['track_count']} tracks)")
        print(f"   💡 Recommendation: Delete one (keep the one with better name)")
else:
    print("✅ No exact duplicates found")


🔄 EXACT DUPLICATES (Same tracks, can delete one)

📋 OtherFindsDec24 (1 tracks)
   ⚡ Duplicate of: AJFindsDec24 (1 tracks)
   💡 Recommendation: Delete one (keep the one with better name)

📋 OtherFindsOct24 (3 tracks)
   ⚡ Duplicate of: AJFindsOct24 (3 tracks)
   💡 Recommendation: Delete one (keep the one with better name)


In [21]:
# Display subsets (playlists fully contained in others)
if subsets:
    print("\n" + "=" * 80)
    print("📦 SUBSETS (Fully contained in another playlist - SAFE TO DELETE)")
    print("=" * 80)
    
    subset_df = []
    for subset_pid, superset_pid, subset_size, superset_size in subsets:
        subset_info = playlist_info[subset_pid]
        superset_info = playlist_info[superset_pid]
        subset_df.append({
            'Subset Playlist': subset_info['name'],
            'Subset Tracks': subset_size,
            'Contained In': superset_info['name'],
            'Superset Tracks': superset_size,
            'Coverage': f"{subset_size}/{superset_size} ({100*subset_size/superset_size:.1f}%)"
        })
    
    df = pd.DataFrame(subset_df)
    df = df.sort_values('Subset Tracks', ascending=False)
    print(f"\n📊 Found {len(df)} subset relationships (a playlist can appear more than once):\n")
    print(df.to_string(index=False))
    
    # Identify unique subset playlists (safe to delete)
    safe_to_delete = {subset_pid for subset_pid, _, _, _ in subsets}
    print(f"\n✅ {len(safe_to_delete)} playlists can be safely deleted (all tracks are in other playlists)")
else:
    print("\n✅ No subset playlists found")



📦 SUBSETS (Fully contained in another playlist - SAFE TO DELETE)

📊 Found 138 subset relationships (a playlist can appear more than once):

 Subset Playlist  Subset Tracks        Contained In  Superset Tracks        Coverage
 OtherFindsNov25            200        AJFindsNov25              228 200/228 (87.7%)
 OtherFindsOct25             71        AJFindsOct25               89   71/89 (79.8%)
 OtherFindsAug25             67        AJFindsAug25              153  67/153 (43.8%)
HipHopFindsJun24             59         AJamHip-Hop             1557  59/1557 (3.8%)
HipHopFindsJun24             59        AJFindsJun24              113  59/113 (52.2%)
 DanceFindsAug25             52      AJamElectronic              481  52/481 (10.8%)
 DanceFindsAug25             52        AJFindsAug25              153  52/153 (34.0%)
 OtherFindsMay25             48        AJFindsMay25               81   48/81 (59.3%)
 OtherFindsFeb24             48        AJFindsFeb24               88   48/88 (54.5%)
 OtherFindsJun24          

In [22]:
# Display high overlap playlists
if high_overlap:
    print("\n" + "=" * 80)
    print("🎯 HIGH OVERLAP (>90% similarity - Likely redundant)")
    print("=" * 80)
    
    overlap_df = []
    for pid1, pid2, jaccard, overlap1, overlap2 in high_overlap:
        info1 = playlist_info[pid1]
        info2 = playlist_info[pid2]
        overlap_df.append({
            'Playlist 1': info1['name'],
            'Tracks 1': info1['track_count'],
            'Playlist 2': info2['name'],
            'Tracks 2': info2['track_count'],
            'Similarity': f"{jaccard*100:.1f}%",
            'P1 in P2': f"{overlap1*100:.1f}%",
            'P2 in P1': f"{overlap2*100:.1f}%",
        })
    
    df = pd.DataFrame(overlap_df)
    # 'Similarity' holds formatted strings, so sort on the numeric value
    df = df.sort_values('Similarity', ascending=False, key=lambda s: s.str.rstrip('%').astype(float))
    print(f"\n📊 Found {len(df)} playlist pairs with >90% similarity:\n")
    print(df.to_string(index=False))
    
    print("\n💡 Recommendations:")
    print("   - If one playlist is much smaller, consider merging into the larger one")
    print("   - If playlists serve different purposes, keep both but remove duplicate tracks")
else:
    print("\n✅ No high-overlap playlists found")



✅ No high-overlap playlists found


In [23]:
# Display near-duplicates
if near_duplicates:
    print("\n" + "=" * 80)
    print("🔗 NEAR-DUPLICATES (80-90% similarity - Review for consolidation)")
    print("=" * 80)
    
    near_df = []
    for pid1, pid2, jaccard, overlap1, overlap2 in sorted(near_duplicates, key=lambda t: t[2], reverse=True)[:20]:  # Top 20 by similarity
        info1 = playlist_info[pid1]
        info2 = playlist_info[pid2]
        near_df.append({
            'Playlist 1': info1['name'],
            'Tracks 1': info1['track_count'],
            'Playlist 2': info2['name'],
            'Tracks 2': info2['track_count'],
            'Similarity': f"{jaccard*100:.1f}%",
            'P1 in P2': f"{overlap1*100:.1f}%",
            'P2 in P1': f"{overlap2*100:.1f}%",
        })
    
    df = pd.DataFrame(near_df)
    # 'Similarity' holds formatted strings, so sort on the numeric value
    df = df.sort_values('Similarity', ascending=False, key=lambda s: s.str.rstrip('%').astype(float))
    print(f"\n📊 Top {len(df)} near-duplicate pairs (showing first 20):\n")
    print(df.to_string(index=False))
    
    if len(near_duplicates) > 20:
        print(f"\n   ... and {len(near_duplicates) - 20} more pairs")
    
    print("\n💡 Recommendations:")
    print("   - Review these pairs manually")
    print("   - Consider merging if they serve the same purpose")
    print("   - Keep separate if they have distinct purposes")
else:
    print("\n✅ No near-duplicate playlists found")



🔗 NEAR-DUPLICATES (80-90% similarity - Review for consolidation)

📊 Top 2 near-duplicate pairs (showing first 20):

     Playlist 1  Tracks 1   Playlist 2  Tracks 2 Similarity P1 in P2 P2 in P1
OtherFindsNov25       200 AJFindsNov25       228      87.7%   100.0%    87.7%
OtherFindsMar24         5 AJFindsMar24         6      83.3%   100.0%    83.3%

💡 Recommendations:
   - Review these pairs manually
   - Consider merging if they serve the same purpose
   - Keep separate if they have distinct purposes


## 6️⃣ Comprehensive Redundancy Analysis

Now let's identify ALL playlists that can be safely deleted or consolidated.


In [24]:
# Build comprehensive list of playlists safe to delete
safe_to_delete = set()
consolidation_suggestions = []

# 1. Exact duplicates - keep one playlist per pair, delete the other
for pid1, pid2 in exact_duplicates:
    # Keep the playlist with the longer name (ties keep the first one)
    info1 = playlist_info[pid1]
    info2 = playlist_info[pid2]
    if len(info1['name']) >= len(info2['name']):
        safe_to_delete.add(pid2)
        consolidation_suggestions.append({
            'action': 'delete',
            'playlist_id': pid2,
            'playlist_name': info2['name'],
            'reason': f'Exact duplicate of "{info1["name"]}"',
            'tracks_lost': 0,
            'alternative': info1['name']
        })
    else:
        safe_to_delete.add(pid1)
        consolidation_suggestions.append({
            'action': 'delete',
            'playlist_id': pid1,
            'playlist_name': info1['name'],
            'reason': f'Exact duplicate of "{info2["name"]}"',
            'tracks_lost': 0,
            'alternative': info2['name']
        })

# 2. Subsets - safe to delete (all tracks are in superset)
for subset_pid, superset_pid, subset_size, superset_size in subsets:
    if subset_pid not in safe_to_delete:
        subset_info = playlist_info[subset_pid]
        superset_info = playlist_info[superset_pid]
        safe_to_delete.add(subset_pid)
        consolidation_suggestions.append({
            'action': 'delete',
            'playlist_id': subset_pid,
            'playlist_name': subset_info['name'],
            'reason': f'All {subset_size} tracks are in "{superset_info["name"]}" ({superset_size} tracks)',
            'tracks_lost': 0,
            'alternative': superset_info['name']
        })

# 3. High overlap - suggest merging
for pid1, pid2, jaccard, overlap1, overlap2 in high_overlap:
    info1 = playlist_info[pid1]
    info2 = playlist_info[pid2]
    
    # Determine which to keep (prefer larger or better name)
    if info1['track_count'] > info2['track_count']:
        keep_pid, delete_pid = pid1, pid2
        keep_info, delete_info = info1, info2
        missing_tracks = len(playlist_track_sets[pid2] - playlist_track_sets[pid1])
    elif info2['track_count'] > info1['track_count']:
        keep_pid, delete_pid = pid2, pid1
        keep_info, delete_info = info2, info1
        missing_tracks = len(playlist_track_sets[pid1] - playlist_track_sets[pid2])
    else:
        # Same size, keep the one with longer name (usually more descriptive)
        if len(info1['name']) >= len(info2['name']):
            keep_pid, delete_pid = pid1, pid2
            keep_info, delete_info = info1, info2
            missing_tracks = len(playlist_track_sets[pid2] - playlist_track_sets[pid1])
        else:
            keep_pid, delete_pid = pid2, pid1
            keep_info, delete_info = info2, info1
            missing_tracks = len(playlist_track_sets[pid1] - playlist_track_sets[pid2])
    
    if delete_pid not in safe_to_delete:
        safe_to_delete.add(delete_pid)
        consolidation_suggestions.append({
            'action': 'merge',
            'playlist_id': delete_pid,
            'playlist_name': delete_info['name'],
            'reason': f'{jaccard*100:.1f}% similar to "{keep_info["name"]}"',
            'tracks_lost': missing_tracks,
            'alternative': f'Merge into "{keep_info["name"]}" (add {missing_tracks} missing tracks)'
        })

print(f"✅ Identified {len(safe_to_delete)} playlists that can be safely deleted/merged")
print(f"📊 Total consolidation suggestions: {len(consolidation_suggestions)}")


✅ Identified 66 playlists that can be safely deleted/merged
📊 Total consolidation suggestions: 66


In [25]:
# Display comprehensive deletion/consolidation recommendations
if consolidation_suggestions:
    print("\n" + "=" * 80)
    print("📋 CONSOLIDATION RECOMMENDATIONS")
    print("=" * 80)
    
    df = pd.DataFrame(consolidation_suggestions)
    df = df.sort_values(['tracks_lost', 'playlist_name'])
    
    # Separate by action type
    delete_actions = df[df['action'] == 'delete']
    merge_actions = df[df['action'] == 'merge']
    
    if len(delete_actions) > 0:
        print(f"\n🗑️  SAFE TO DELETE ({len(delete_actions)} playlists - 0 tracks lost):")
        print("-" * 80)
        for _, row in delete_actions.iterrows():
            print(f"   • {row['playlist_name']}")
            print(f"     → {row['reason']}")
            print(f"     → Keep: {row['alternative']}")
            print()
    
    if len(merge_actions) > 0:
        print(f"\n🔀 MERGE RECOMMENDATIONS ({len(merge_actions)} playlists):")
        print("-" * 80)
        for _, row in merge_actions.iterrows():
            print(f"   • {row['playlist_name']}")
            print(f"     → {row['reason']}")
            if row['tracks_lost'] > 0:
                print(f"     → ⚠️  {row['tracks_lost']} unique tracks must be added first: {row['alternative']}")
            else:
                print(f"     → ✅ No tracks lost - safe to merge")
            print()
    
    # Summary statistics
    total_tracks_lost = df['tracks_lost'].sum()
    zero_loss = len(df[df['tracks_lost'] == 0])
    
    print("\n" + "=" * 80)
    print("📊 SUMMARY")
    print("=" * 80)
    print(f"   Total playlists recommended for deletion/merge: {len(df)}")
    print(f"   Playlists with zero track loss: {zero_loss}")
    print(f"   Total unique tracks that would need to be added: {total_tracks_lost}")
    print(f"   Current total playlists: {len(playlists)}")
    print(f"   After consolidation: {len(playlists) - len(df)} playlists")
    print(f"   Reduction: {len(df)} playlists ({100*len(df)/len(playlists):.1f}%)")
else:
    print("\n✅ No consolidation recommendations - your library is well organized!")



📋 CONSOLIDATION RECOMMENDATIONS

🗑️  SAFE TO DELETE (66 playlists - 0 tracks lost):
--------------------------------------------------------------------------------
   • AJFindsDec24
     → Exact duplicate of "OtherFindsDec24"
     → Keep: OtherFindsDec24

   • AJFindsOct24
     → Exact duplicate of "OtherFindsOct24"
     → Keep: OtherFindsOct24

   • DanceFindsApr24
     → All 1 tracks are in "AJamPop" (356 tracks)
     → Keep: AJamPop

   • DanceFindsApr25
     → All 19 tracks are in "AJamElectronic" (481 tracks)
     → Keep: AJamElectronic

   • DanceFindsAug24
     → All 4 tracks are in "AJamElectronic" (481 tracks)
     → Keep: AJamElectronic

   • DanceFindsAug25
     → All 52 tracks are in "AJamElectronic" (481 tracks)
     → Keep: AJamElectronic

   • DanceFindsDec25
     → All 27 tracks are in "AJamElectronic" (481 tracks)
     → Keep: AJamElectronic

   • DanceFindsFeb24
     → All 10 tracks are in "AJFindsFeb24" (88 trac

## 7️⃣ Detailed Track-Level Analysis

For merge recommendations, let's see exactly which tracks would need to be added.


In [26]:
# For merge actions, show which tracks need to be added
merge_details = []

for suggestion in consolidation_suggestions:
    if suggestion['action'] == 'merge' and suggestion['tracks_lost'] > 0:
        delete_pid = suggestion['playlist_id']
        delete_tracks = playlist_track_sets[delete_pid]
        
        # Find the playlist to merge into
        keep_name = suggestion['alternative'].split('"')[1] if '"' in suggestion['alternative'] else suggestion['alternative']
        keep_pid = None
        for pid, info in playlist_info.items():
            if info['name'] == keep_name:
                keep_pid = pid
                break
        
        if keep_pid:
            keep_tracks = playlist_track_sets[keep_pid]
            missing_tracks = delete_tracks - keep_tracks
            
            if missing_tracks:
                # Get track names
                missing_track_ids = list(missing_tracks)[:10]  # Show first 10
                tracks_df = analyzer.tracks_all[analyzer.tracks_all['track_id'].isin(missing_track_ids)]
                track_names = tracks_df[['name']].values.flatten().tolist() if len(tracks_df) > 0 else []
                
                merge_details.append({
                    'Delete': suggestion['playlist_name'],
                    'Merge Into': keep_name,
                    'Missing Tracks': len(missing_tracks),
                    'Sample Tracks': ', '.join(track_names[:5]) if track_names else 'N/A'
                })

if merge_details:
    print("📋 Detailed Merge Analysis (tracks that need to be added):")
    print("=" * 80)
    df = pd.DataFrame(merge_details)
    for _, row in df.iterrows():
        print(f"\n🗑️  Delete: {row['Delete']}")
        print(f"   → Merge into: {row['Merge Into']}")
        print(f"   → Add {row['Missing Tracks']} tracks")
        if row['Sample Tracks'] != 'N/A':
            print(f"   → Sample: {row['Sample Tracks']}...")
else:
    print("✅ All merge recommendations have zero track loss!")


✅ All merge recommendations have zero track loss!


## 8️⃣ Reorganization Strategy

Based on the analysis, here's a suggested reorganization of your library.


In [27]:
# Build reorganization plan
reorganization_plan = {
    'delete': [],
    'merge': [],
    'keep': []
}

# Categorize all playlists
all_playlist_ids = set(playlist_track_sets.keys())
to_delete_ids = safe_to_delete
to_keep_ids = all_playlist_ids - to_delete_ids

# Build merge groups
merge_groups = defaultdict(list)
for suggestion in consolidation_suggestions:
    if suggestion['action'] == 'merge':
        keep_name = suggestion['alternative'].split('"')[1] if '"' in suggestion['alternative'] else suggestion['alternative']
        merge_groups[keep_name].append(suggestion['playlist_name'])

# Organize recommendations
for suggestion in consolidation_suggestions:
    if suggestion['action'] == 'delete':
        reorganization_plan['delete'].append({
            'name': suggestion['playlist_name'],
            'reason': suggestion['reason'],
            'alternative': suggestion['alternative']
        })
    elif suggestion['action'] == 'merge':
        reorganization_plan['merge'].append({
            'name': suggestion['playlist_name'],
            'reason': suggestion['reason'],
            'merge_into': suggestion['alternative'],
            'tracks_to_add': suggestion['tracks_lost']
        })

# Keep all others
for pid in to_keep_ids:
    info = playlist_info[pid]
    reorganization_plan['keep'].append({
        'name': info['name'],
        'track_count': info['track_count']
    })

print("=" * 80)
print("📋 REORGANIZATION PLAN")
print("=" * 80)

print(f"\n🗑️  DELETE ({len(reorganization_plan['delete'])} playlists):")
print("-" * 80)
for item in reorganization_plan['delete']:
    print(f"   • {item['name']}")
    print(f"     → {item['reason']}")
    print(f"     → Keep: {item['alternative']}")
    print()

print(f"\n🔀 MERGE ({len(reorganization_plan['merge'])} playlists):")
print("-" * 80)
for item in reorganization_plan['merge']:
    print(f"   • {item['name']}")
    print(f"     → {item['reason']}")
    if item['tracks_to_add'] > 0:
        print(f"     → ⚠️  Add {item['tracks_to_add']} tracks to: {item['merge_into']}")
    else:
        print(f"     → ✅ {item['merge_into']}")
    print()

print(f"\n✅ KEEP ({len(reorganization_plan['keep'])} playlists):")
print("-" * 80)
# Sort by track count
keep_sorted = sorted(reorganization_plan['keep'], key=lambda x: x['track_count'], reverse=True)
for item in keep_sorted[:20]:  # Show top 20
    print(f"   • {item['name']} ({item['track_count']} tracks)")
if len(keep_sorted) > 20:
    print(f"   ... and {len(keep_sorted) - 20} more playlists")

print("\n" + "=" * 80)
print("📊 REORGANIZATION SUMMARY")
print("=" * 80)
print(f"   Current playlists: {len(playlists)}")
print(f"   Delete: {len(reorganization_plan['delete'])}")
print(f"   Merge: {len(reorganization_plan['merge'])}")
print(f"   Keep: {len(reorganization_plan['keep'])}")
# Playlists that receive merged tracks are not new playlists, so the final count is just the kept set
print(f"   Final count: {len(reorganization_plan['keep'])}")
print(f"   Reduction: {len(reorganization_plan['delete']) + len(reorganization_plan['merge'])} playlists")


📋 REORGANIZATION PLAN

🗑️  DELETE (66 playlists):
--------------------------------------------------------------------------------
   • AJFindsDec24
     → Exact duplicate of "OtherFindsDec24"
     → Keep: OtherFindsDec24

   • AJFindsOct24
     → Exact duplicate of "OtherFindsOct24"
     → Keep: OtherFindsOct24

   • Pepper 🫑 
     → All 3 tracks are in "AJFinds23" (1233 tracks)
     → Keep: AJFinds23

   • DanceFindsApr24
     → All 1 tracks are in "AJamPop" (356 tracks)
     → Keep: AJamPop

   • OtherFindsApr24
     → All 1 tracks are in "AJamR&B/Soul" (479 tracks)
     → Keep: AJamR&B/Soul

   • DanceFindsNov25
     → All 19 tracks are in "AJamElectronic" (481 tracks)
     → Keep: AJamElectronic

   • DanceFindsOct25
     → All 1 tracks are in "AJamElectronic" (481 tracks)
     → Keep: AJamElectronic

   • DanceFindsSep25
     → All 24 tracks are in "AJamElectronic" (481 tracks)
     → Keep: AJamElectronic

   • DanceFin

In [28]:
# Export recommendations to CSV
export_df = pd.DataFrame(consolidation_suggestions)

# Add track counts using playlist_id
export_df['track_count'] = export_df['playlist_id'].apply(
    lambda pid: playlist_info.get(pid, {}).get('track_count', 0)
)

# Reorder columns
export_df = export_df[['playlist_name', 'track_count', 'action', 'reason', 'tracks_lost', 'alternative']]

# Save to CSV
output_file = DATA_DIR / 'playlist_consolidation_recommendations.csv'
export_df.to_csv(output_file, index=False)

print(f"✅ Recommendations exported to: {output_file}")
print(f"\n📊 Summary:")
print(f"   Total recommendations: {len(export_df)}")
print(f"   Safe deletions (0 tracks lost): {len(export_df[export_df['tracks_lost'] == 0])}")
print(f"   Merge recommendations: {len(export_df[export_df['action'] == 'merge'])}")
print(f"\n💡 Next steps:")
print(f"   1. Review the CSV file: {output_file}")
print(f"   2. Manually verify recommendations")
print(f"   3. Delete/merge playlists in Spotify")
print(f"   4. Re-run sync to update your library")


✅ Recommendations exported to: /Users/aryamaan/Desktop/Projects/spotim8/data/playlist_consolidation_recommendations.csv

📊 Summary:
   Total recommendations: 66
   Safe deletions (0 tracks lost): 66
   Merge recommendations: 0

💡 Next steps:
   1. Review the CSV file: /Users/aryamaan/Desktop/Projects/spotim8/data/playlist_consolidation_recommendations.csv
   2. Manually verify recommendations
   3. Delete/merge playlists in Spotify
   4. Re-run sync to update your library


## ✅ Done!

**Summary:**
- ✅ Identified redundant playlists using multiple similarity metrics
- ✅ Found playlists safe to delete (zero track loss)
- ✅ Suggested merge strategies for high-overlap playlists
- ✅ Created reorganization plan
- ✅ Exported recommendations to CSV

**Next Steps:**
1. Review the recommendations in the CSV file
2. Manually verify each suggestion
3. Delete/merge playlists in Spotify
4. Re-run `01_sync_data.ipynb` to update your library
5. Re-run this notebook to verify cleanup

**Note:** Always verify recommendations manually before deleting playlists!
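
One way to support that manual check is to confirm that every track in a playlist marked for deletion still appears in at least one playlist that is kept. The sketch below does this on toy data. `find_lost_tracks` is a hypothetical helper, not part of `spotim8`; in this notebook it could be called with `playlist_track_sets` and `safe_to_delete`.

```python
from typing import Dict, Set


def find_lost_tracks(track_sets: Dict[str, Set[str]], to_delete: Set[str]) -> Set[str]:
    """Return tracks that would vanish from all remaining playlists (hypothetical helper)."""
    kept: Set[str] = set()
    removed: Set[str] = set()
    for pid, tracks in track_sets.items():
        if pid in to_delete:
            removed |= tracks
        else:
            kept |= tracks
    # Anything removed that no kept playlist still holds is genuinely lost
    return removed - kept


# Toy data: playlist IDs and track IDs are made up for illustration
toy_sets = {
    "monthly": {"t1", "t2"},
    "genre": {"t1", "t2", "t3"},
    "oneoff": {"t4"},
}
print(find_lost_tracks(toy_sets, {"monthly"}))  # empty: "genre" still holds t1 and t2
print(find_lost_tracks(toy_sets, {"oneoff"}))   # t4 is lost: no other playlist contains it
```

An empty result means the deletions lose nothing among the analyzed playlists. Liked Songs was excluded from the analysis, so a track flagged here may still exist there.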
