# üóëÔ∏è Identify Redundant Playlists & Aggressive Reorganization

Analyze your **user-created playlists** to identify **redundant playlists** with **AGGRESSIVE thresholds** to maximize consolidation without losing information.

**What this notebook does (AGGRESSIVE MODE):**
- üîç Finds playlists with high track overlap (>50% similarity - lowered thresholds)
- üìä Identifies playlists that are subsets of other playlists
- üéØ Suggests playlists safe to delete (zero track loss)
- üìã Proposes aggressive reorganization strategies
- üí° Recommends consolidation merges (add missing tracks, zero loss after merge)
- üì¶ Identifies groups of small playlists that can merge into larger ones
- üîÄ **FORCES suggestions** for maximum playlist reduction without information loss
- ‚úÖ **EXCLUDES auto-generated "AJ" playlists** - these are managed by the sync script

**Key Changes (Aggressive Mode):**
- **Auto-generated exclusion**: All playlists starting with "AJ" prefix are excluded from analysis
- Lowered similarity thresholds: >70% (was >90%) for high overlap, >50% (was >80%) for near-duplicates
- Size-based merge candidates: Small playlists (3x+ smaller) with >50% overlap ‚Üí merge into larger
- Group consolidations: Multiple small playlists can merge into a single larger playlist
- **Consolidation strategies**: New strategies for similar playlists (40-50% similarity) with merge/combine/review recommendations
- **Zero information loss**: All suggestions preserve all tracks via merge operations

**Prerequisites:** 
- Run `01_sync_data.ipynb` to download your library
- Run `04_analyze_listening_history.ipynb` (optional) to enable listening-based redundancy detection


## 1Ô∏è‚É£ Setup


In [1]:
# Install dependencies
%pip install -q pandas pyarrow tqdm

# Setup project - this adds project root to path
from pathlib import Path
from notebook_helpers import setup_project

PROJECT_ROOT = setup_project(Path("../..").resolve())
DATA_DIR = PROJECT_ROOT / "data"


Note: you may need to restart the kernel to use updated packages.
‚úÖ Project root: /Users/aryamaan/Desktop/Projects/SPOTIM8


In [2]:
import pandas as pd
import numpy as np
from pathlib import Path
from collections import defaultdict, Counter
from typing import Dict, List, Set, Tuple
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')

from spotim8.analysis import LibraryAnalyzer, PlaylistSimilarityEngine

# Data directory
DATA_DIR = PROJECT_ROOT / "data"
print(f"üìÅ Data directory: {DATA_DIR.resolve()}")


üìÅ Data directory: /Users/aryamaan/Desktop/Projects/SPOTIM8/data


## 2Ô∏è‚É£ Load Library Data


In [3]:
# Identify redundant playlists using helper functions
from notebook_helpers import identify_redundant_playlists

# This function handles all the logic: loading data, filtering auto-generated playlists,
# building track sets, and analyzing redundancy
redundancy_results = identify_redundant_playlists(DATA_DIR, exclude_auto_generated=True)

# Extract results for use in subsequent cells
playlist_track_sets = redundancy_results['playlist_track_sets']
playlist_info = redundancy_results['playlist_info']
exact_duplicates = redundancy_results['exact_duplicates']
subsets = redundancy_results['subsets']
high_overlap = redundancy_results['high_overlap']
near_duplicates = redundancy_results['near_duplicates']
merge_candidates = redundancy_results['merge_candidates']
similar_playlists = redundancy_results['similar_playlists']
excluded_count = redundancy_results['excluded_count']

print(f"\n‚úÖ Redundancy analysis complete!")
print(f"   Found {len(exact_duplicates)} exact duplicates")
print(f"   Found {len(subsets)} subset relationships")
print(f"   Found {len(high_overlap)} high overlap pairs")
print(f"   Found {len(near_duplicates)} near-duplicate pairs")
print(f"   Found {len(merge_candidates)} merge candidates")
print(f"   Found {len(similar_playlists)} similar playlist pairs")


‚úÖ Loaded 560 playlists, 5,548 tracks
üîç Analyzing 64 playlists...


Comparing playlists:   0%|          | 0/64 [00:00<?, ?it/s]


‚úÖ Redundancy analysis complete!
   Found 0 exact duplicates
   Found 0 subset relationships
   Found 0 high overlap pairs
   Found 2 near-duplicate pairs
   Found 149 merge candidates
   Found 28 similar playlist pairs


## 3Ô∏è‚É£ Redundancy Analysis Results

The redundancy analysis has been completed using helper functions. Results are available below.


In [4]:
# Track sets and playlist info are already built by identify_redundant_playlists()
# Display summary
print(f"‚úÖ Track sets built for {len(playlist_track_sets)} user-created playlists")
print(f"   (Excluded {excluded_count} auto-generated 'AJ' playlists)")
print(f"üìä Total unique tracks across all analyzed playlists: {len(set().union(*playlist_track_sets.values())):,}")

# Display sample playlist info
print(f"\nüìã Sample playlist info (first 5):")
for i, (pid, info) in enumerate(list(playlist_info.items())[:5]):
    print(f"   {i+1}. {info['name']}: {info['track_count']} tracks")


‚úÖ Track sets built for 64 user-created playlists
   (Excluded 46 auto-generated 'AJ' playlists)
üìä Total unique tracks across all analyzed playlists: 3,804

üìã Sample playlist info (first 5):
   1. OtherFinds25: 281 tracks
   2. Jan26: 92 tracks
   3. Trapsoul: 57 tracks
   4. DarkThots (ALLdTIME): 48 tracks
   5. Pop ig dont deep it: 225 tracks


## 4Ô∏è‚É£ Find Redundant Playlists

We'll identify redundancy using multiple criteria with **aggressive thresholds** to maximize consolidation without loss:
1. **Exact duplicates** - Same tracks
2. **Subsets** - All tracks in one playlist are in another
3. **High overlap** - Very similar track sets (>70% overlap - lowered from 90%)
4. **Near-duplicates** - High similarity with moderate differences (>50% overlap)
5. **Consolidation candidates** - Playlists that can be merged into larger playlists
6. **Group merges** - Multiple small playlists that can be merged together


In [5]:
# Analysis is already complete from identify_redundant_playlists() in Cell 5
# Results are available in the variables extracted from redundancy_results
# This cell provides summary statistics

print("=" * 80)
print("üìä REDUNDANCY ANALYSIS SUMMARY")
print("=" * 80)
print(f"\n‚úÖ Analysis complete!")
print(f"   Exact duplicates: {len(exact_duplicates)}")
print(f"   Subsets: {len(subsets)}")
print(f"   High overlap (>70%): {len(high_overlap)}")
print(f"   Near duplicates (50-70%): {len(near_duplicates)}")
print(f"   Merge candidates (size-based): {len(merge_candidates)}")
print(f"   Similar playlists (40-50%): {len(similar_playlists)}")
print(f"\n   Total user-created playlists analyzed: {len(playlist_info)}")
print(f"   Auto-generated 'AJ' playlists excluded: {excluded_count}")

# Note: The actual analysis was performed by identify_redundant_playlists() helper function
# All similarity calculations and comparisons are handled by the helper module


üìä REDUNDANCY ANALYSIS SUMMARY

‚úÖ Analysis complete!
   Exact duplicates: 0
   Subsets: 0
   High overlap (>70%): 0
   Near duplicates (50-70%): 2
   Merge candidates (size-based): 149
   Similar playlists (40-50%): 28

   Total user-created playlists analyzed: 64
   Auto-generated 'AJ' playlists excluded: 46


In [6]:
# Display exact duplicates
if exact_duplicates:
    print("=" * 80)
    print("üîÑ EXACT DUPLICATES (Same tracks, can delete one)")
    print("=" * 80)
    for pid1, pid2 in exact_duplicates:
        info1 = playlist_info[pid1]
        info2 = playlist_info[pid2]
        print(f"\nüìã {info1['name']} ({info1['track_count']} tracks)")
        print(f"   ‚ö° Duplicate of: {info2['name']} ({info2['track_count']} tracks)")
        print(f"   üí° Recommendation: Delete one (keep the one with better name)")
else:
    print("‚úÖ No exact duplicates found")


‚úÖ No exact duplicates found


In [7]:
# Display subsets (playlists fully contained in others)
if subsets:
    print("\n" + "=" * 80)
    print("üì¶ SUBSETS (Fully contained in another playlist - SAFE TO DELETE)")
    print("=" * 80)
    
    subset_df = []
    for subset_pid, superset_pid, subset_size, superset_size in subsets:
        subset_info = playlist_info[subset_pid]
        superset_info = playlist_info[superset_pid]
        subset_df.append({
            'Subset Playlist': subset_info['name'],
            'Subset Tracks': subset_size,
            'Contained In': superset_info['name'],
            'Superset Tracks': superset_size,
            'Coverage': f"{subset_size}/{superset_size} ({100*subset_size/superset_size:.1f}%)"
        })
    
    df = pd.DataFrame(subset_df)
    df = df.sort_values('Subset Tracks', ascending=False)
    print(f"\nüìä Found {len(df)} playlists that are subsets of others:\n")
    print(df.to_string(index=False))
    
    # Identify unique subset playlists (safe to delete)
    safe_to_delete = set([subset_pid for subset_pid, _, _, _ in subsets])
    print(f"\n‚úÖ {len(safe_to_delete)} playlists can be safely deleted (all tracks are in other playlists)")
else:
    print("\n‚úÖ No subset playlists found")



‚úÖ No subset playlists found


In [8]:
# Display high overlap playlists
if high_overlap:
    print("\n" + "=" * 80)
    print("üéØ HIGH OVERLAP (>70% similarity - Strong merge candidates)")
    print("=" * 80)
    
    overlap_df = []
    for pid1, pid2, jaccard, overlap1, overlap2 in high_overlap:
        info1 = playlist_info[pid1]
        info2 = playlist_info[pid2]
        missing1 = len(playlist_track_sets[pid1] - playlist_track_sets[pid2])
        missing2 = len(playlist_track_sets[pid2] - playlist_track_sets[pid1])
        overlap_df.append({
            'Playlist 1': info1['name'],
            'Tracks 1': info1['track_count'],
            'Playlist 2': info2['name'],
            'Tracks 2': info2['track_count'],
            'Similarity': f"{jaccard*100:.1f}%",
            'P1‚ÜíP2 Missing': missing1,
            'P2‚ÜíP1 Missing': missing2,
            'P1 in P2': f"{overlap1*100:.1f}%",
            'P2 in P1': f"{overlap2*100:.1f}%",
        })
    
    df = pd.DataFrame(overlap_df)
    df = df.sort_values('Similarity', ascending=False)
    print(f"\nüìä Found {len(df)} playlist pairs with >70% similarity:\n")
    print(df.to_string(index=False))
    
    print("\nüí° Aggressive Recommendations:")
    print("   - MERGE: If one playlist is smaller, merge it into the larger one")
    print("   - CONSOLIDATE: Add missing tracks from smaller to larger playlist")
    print("   - DELETE: After merging, delete the smaller playlist (no track loss)")
else:
    print("\n‚úÖ No high-overlap playlists found")



‚úÖ No high-overlap playlists found


In [9]:
# Display near-duplicates
if near_duplicates:
    print("\n" + "=" * 80)
    print("üîó NEAR-DUPLICATES (50-70% similarity - Consolidation candidates)")
    print("=" * 80)
    
    near_df = []
    for pid1, pid2, jaccard, overlap1, overlap2 in near_duplicates[:30]:  # Show top 30
        info1 = playlist_info[pid1]
        info2 = playlist_info[pid2]
        missing1 = len(playlist_track_sets[pid1] - playlist_track_sets[pid2])
        missing2 = len(playlist_track_sets[pid2] - playlist_track_sets[pid1])
        near_df.append({
            'Playlist 1': info1['name'],
            'Tracks 1': info1['track_count'],
            'Playlist 2': info2['name'],
            'Tracks 2': info2['track_count'],
            'Similarity': f"{jaccard*100:.1f}%",
            'P1‚ÜíP2 Missing': missing1,
            'P2‚ÜíP1 Missing': missing2,
            'P1 in P2': f"{overlap1*100:.1f}%",
            'P2 in P1': f"{overlap2*100:.1f}%",
        })
    
    df = pd.DataFrame(near_df)
    df = df.sort_values('Similarity', ascending=False)
    print(f"\nüìä Top {len(df)} near-duplicate pairs (showing first 30):\n")
    print(df.to_string(index=False))
    
    if len(near_duplicates) > 30:
        print(f"\n   ... and {len(near_duplicates) - 30} more pairs")
    
    print("\nüí° Aggressive Consolidation Recommendations:")
    print("   - MERGE: If playlists serve similar purpose, merge by adding missing tracks")
    print("   - CONSOLIDATE: Merge smaller into larger if overlap >50% and size difference >2x")
    print("   - DELETE: After merging, delete merged playlist (zero track loss)")
else:
    print("\n‚úÖ No near-duplicate playlists found")

# Display merge candidates (size-based)
if merge_candidates:
    print("\n" + "=" * 80)
    print("üì¶ MERGE CANDIDATES (Small playlists that can merge into larger ones)")
    print("=" * 80)
    
    merge_df = []
    for small_pid, large_pid, small_overlap, large_overlap, small_size, large_size in merge_candidates[:30]:
        small_info = playlist_info[small_pid]
        large_info = playlist_info[large_pid]
        missing_tracks = len(playlist_track_sets[small_pid] - playlist_track_sets[large_pid])
        merge_df.append({
            'Small Playlist': small_info['name'],
            'Small Tracks': small_size,
            'Large Playlist': large_info['name'],
            'Large Tracks': large_size,
            'Overlap': f"{small_overlap*100:.1f}%",
            'Tracks to Add': missing_tracks,
            'Size Ratio': f"{large_size/small_size:.1f}x"
        })
    
    df = pd.DataFrame(merge_df)
    df = df.sort_values('Tracks to Add')
    print(f"\nüìä Top {len(df)} merge candidates (showing first 30):\n")
    print(df.to_string(index=False))
    
    if len(merge_candidates) > 30:
        print(f"\n   ... and {len(merge_candidates) - 30} more candidates")
    
    print("\nüí° Aggressive Recommendations:")
    print("   - MERGE: Add missing tracks from small playlist to large playlist")
    print("   - DELETE: After merging, delete the small playlist (zero track loss)")
    print("   - CONSOLIDATE: Reduces playlist count without losing any tracks")
else:
    print("\n‚úÖ No size-based merge candidates found")



üîó NEAR-DUPLICATES (50-70% similarity - Consolidation candidates)

üìä Top 2 near-duplicate pairs (showing first 30):

    Playlist 1  Tracks 1       Playlist 2  Tracks 2 Similarity  P1‚ÜíP2 Missing  P2‚ÜíP1 Missing P1 in P2 P2 in P1
ChillSunHop V2       882 TrapNeverDies3.0      1140      53.6%            176            434    80.0%    61.9%
 IcedLemonadeüçã       503      RvRChrls3 ü¶¶       561      52.7%            136            194    73.0%    65.4%

üí° Aggressive Consolidation Recommendations:
   - MERGE: If playlists serve similar purpose, merge by adding missing tracks
   - CONSOLIDATE: Merge smaller into larger if overlap >50% and size difference >2x
   - DELETE: After merging, delete merged playlist (zero track loss)

üì¶ MERGE CANDIDATES (Small playlists that can merge into larger ones)

üìä Top 30 merge candidates (showing first 30):

      Small Playlist  Small Tracks   Large Playlist  Large Tracks Overlap  Tracks to Add Size Ratio
            Trapsoul          

## 6Ô∏è‚É£ Comprehensive Redundancy Analysis

Now let's identify ALL user-created playlists that can be safely deleted or consolidated.

**NOTE:** Auto-generated playlists starting with "AJ" prefix are excluded from analysis - they're managed by the sync script.


In [10]:
# Build comprehensive consolidation suggestions using helper functions
from notebook_helpers import build_consolidation_suggestions, is_auto_generated_playlist

# Use helper function to build consolidation suggestions (excludes auto-generated playlists)
consolidation_results = build_consolidation_suggestions(redundancy_results, exclude_auto_generated=True)

# Extract results
safe_to_delete = consolidation_results['safe_to_delete']
consolidation_suggestions = consolidation_results['consolidation_suggestions']

print(f"‚úÖ Built consolidation suggestions!")
print(f"   Safe to delete: {len(safe_to_delete)} playlists")
print(f"   Total suggestions: {len(consolidation_suggestions)}")


‚úÖ Built consolidation suggestions!
   Safe to delete: 23 playlists
   Total suggestions: 23


In [11]:
# Display comprehensive deletion/consolidation recommendations
if consolidation_suggestions:
    print("\n" + "=" * 80)
    print("üìã CONSOLIDATION RECOMMENDATIONS (AGGRESSIVE - Zero Track Loss)")
    print("=" * 80)
    
    df = pd.DataFrame(consolidation_suggestions)
    df = df.sort_values(['tracks_lost', 'playlist_name'])
    
    # Separate by action type
    delete_actions = df[df['action'] == 'delete']
    merge_actions = df[df['action'] == 'merge']
    
    if len(delete_actions) > 0:
        print(f"\nüóëÔ∏è  SAFE TO DELETE ({len(delete_actions)} playlists - 0 tracks lost):")
        print("-" * 80)
        for _, row in delete_actions.iterrows():
            print(f"   ‚Ä¢ {row['playlist_name']}")
            print(f"     ‚Üí {row['reason']}")
            print(f"     ‚Üí Keep: {row['alternative']}")
            print()
    
    if len(merge_actions) > 0:
        print(f"\nüîÄ MERGE RECOMMENDATIONS ({len(merge_actions)} playlists - Zero track loss after adding missing tracks):")
        print("-" * 80)
        zero_loss_merges = merge_actions[merge_actions['tracks_lost'] == 0]
        tracks_to_add_merges = merge_actions[merge_actions['tracks_lost'] > 0]
        
        if len(zero_loss_merges) > 0:
            print(f"\n   ‚úÖ Perfect merges (0 tracks to add - {len(zero_loss_merges)} playlists):")
            for _, row in zero_loss_merges.head(10).iterrows():
                print(f"      ‚Ä¢ {row['playlist_name']}")
                print(f"        ‚Üí {row['reason']}")
                print(f"        ‚Üí {row['alternative']}")
            if len(zero_loss_merges) > 10:
                print(f"      ... and {len(zero_loss_merges) - 10} more perfect merges")
        
        if len(tracks_to_add_merges) > 0:
            print(f"\n   üîÑ Consolidation merges (add missing tracks - {len(tracks_to_add_merges)} playlists):")
            for _, row in tracks_to_add_merges.head(15).iterrows():
                print(f"      ‚Ä¢ {row['playlist_name']}")
                print(f"        ‚Üí {row['reason']}")
                print(f"        ‚Üí Add {row['tracks_lost']} tracks: {row['alternative']}")
            if len(tracks_to_add_merges) > 15:
                print(f"      ... and {len(tracks_to_add_merges) - 15} more consolidation merges")
        print()
    
    # Summary statistics
    total_tracks_to_add = df['tracks_lost'].sum()
    zero_loss = len(df[df['tracks_lost'] == 0])
    
    print("\n" + "=" * 80)
    print("üìä AGGRESSIVE CONSOLIDATION SUMMARY")
    print("=" * 80)
    print(f"   Total playlists recommended for deletion/merge: {len(df)}")
    print(f"   Perfect merges (0 tracks to add): {zero_loss}")
    print(f"   Consolidation merges (add missing tracks): {len(df) - zero_loss}")
    print(f"   Total unique tracks to add (zero loss after merge): {total_tracks_to_add}")
    print(f"   Current total playlists: {len(playlist_info)}")
    print(f"   After consolidation: {len(playlist_info) - len(df)} playlists")
    print(f"   Reduction: {len(df)} playlists ({100*len(df)/len(playlist_info):.1f}%)")
    print(f"   ‚úÖ ALL SUGGESTIONS: Zero information loss (tracks preserved via merge)")
else:
    print("\n‚úÖ No consolidation recommendations - your library is well organized!")



üìã CONSOLIDATION RECOMMENDATIONS (AGGRESSIVE - Zero Track Loss)

üîÄ MERGE RECOMMENDATIONS (23 playlists - Zero track loss after adding missing tracks):
--------------------------------------------------------------------------------

   üîÑ Consolidation merges (add missing tracks - 23 playlists):
      ‚Ä¢ Jazzyüé∑
        ‚Üí Small playlist (19 tracks) with 73.7% overlap in larger "IcedLemonadeüçã" (503 tracks)
        ‚Üí Add 5 tracks: Merge into "IcedLemonadeüçã" (add 5 missing tracks, zero loss)
      ‚Ä¢ Loungin
        ‚Üí Small playlist (11 tracks) with 54.5% overlap in larger "Pop ig dont deep it" (225 tracks)
        ‚Üí Add 5 tracks: Merge into "Pop ig dont deep it" (add 5 missing tracks, zero loss)
      ‚Ä¢  STFU
        ‚Üí Small playlist (24 tracks) with 66.7% overlap in larger "SandTrap üèùÔ∏èüáπüá≠" (986 tracks)
        ‚Üí Add 8 tracks: Merge into "SandTrap üèùÔ∏èüáπüá≠" (add 8 missing tracks, zero loss)
      ‚Ä¢ Trapsoul
        ‚Üí Small playlist (57

## 6Ô∏è‚É£ Find Group Consolidation Opportunities

Find groups of small playlists that can be consolidated together into larger playlists.


In [12]:
# Find groups of small playlists that can be consolidated into larger playlists
# Strategy: Identify multiple small playlists that together would fit well into a larger playlist

print("üîç Finding group consolidation opportunities...")
print("   (Multiple small playlists that can merge into a single larger playlist)\n")

# Build a map of which playlists can merge into which
merge_targets = {}  # target_pid -> list of (source_pid, missing_tracks)
for suggestion in consolidation_suggestions:
    if suggestion['action'] == 'merge' and suggestion['tracks_lost'] >= 0:
        # Extract target playlist name from alternative
        alt = suggestion['alternative']
        if 'Merge into "' in alt:
            target_name = alt.split('Merge into "')[1].split('"')[0]
        else:
            target_name = alt
        
        # Find target playlist ID
        target_pid = None
        for pid, info in playlist_info.items():
            if info['name'] == target_name:
                target_pid = pid
                break
        
        if target_pid:
            if target_pid not in merge_targets:
                merge_targets[target_pid] = []
            merge_targets[target_pid].append((suggestion['playlist_id'], suggestion['tracks_lost']))

# Find groups: multiple small playlists targeting the same larger playlist
# EXCLUDE auto-generated playlists as targets
group_consolidations = []
for target_pid, sources in merge_targets.items():
    # Skip if target is auto-generated - don't suggest merging into auto-generated playlists
    if is_auto_generated_playlist(target_pid):
        continue
    
    # Filter out auto-generated playlists from sources
    valid_sources = [(pid, missing) for pid, missing in sources if not is_auto_generated_playlist(pid)]
    
    if len(valid_sources) >= 2:  # At least 2 valid playlists can merge into this target
        target_info = playlist_info[target_pid]
        total_tracks_to_add = sum(missing for _, missing in valid_sources)
        source_names = [playlist_info[pid]['name'] for pid, _ in valid_sources]
        source_tracks = [playlist_info[pid]['track_count'] for pid, _ in valid_sources]
        
        group_consolidations.append({
            'target_name': target_info['name'],
            'target_tracks': target_info['track_count'],
            'source_count': len(valid_sources),
            'source_names': source_names,
            'source_tracks': source_tracks,
            'total_tracks_to_add': total_tracks_to_add,
            'target_pid': target_pid,
            'source_pids': [pid for pid, _ in valid_sources]
        })

if group_consolidations:
    print(f"‚úÖ Found {len(group_consolidations)} group consolidation opportunities!\n")
    
    # Sort by number of sources (most consolidation first)
    group_consolidations.sort(key=lambda x: x['source_count'], reverse=True)
    
    print("=" * 80)
    print("üì¶ GROUP CONSOLIDATION OPPORTUNITIES")
    print("=" * 80)
    print("   (Multiple small playlists can merge into a single larger playlist)\n")
    
    for i, group in enumerate(group_consolidations[:20], 1):  # Show top 20
        print(f"{i}. Target: {group['target_name']} ({group['target_tracks']} tracks)")
        print(f"   ‚Üí Can consolidate {group['source_count']} playlists into this one:")
        for j, (name, tracks) in enumerate(zip(group['source_names'], group['source_tracks']), 1):
            print(f"      {j}. {name} ({tracks} tracks)")
        print(f"   ‚Üí Total tracks to add: {group['total_tracks_to_add']} (zero loss)")
        print(f"   ‚Üí Reduction: {group['source_count']} playlists ‚Üí 1 playlist")
        print()
    
    if len(group_consolidations) > 20:
        print(f"   ... and {len(group_consolidations) - 20} more group opportunities\n")
    
    # Summary
    total_groups = len(group_consolidations)
    total_sources = sum(g['source_count'] for g in group_consolidations)
    print(f"üìä Group Consolidation Summary:")
    print(f"   Total groups: {total_groups}")
    print(f"   Total playlists that can be consolidated: {total_sources}")
    print(f"   Reduction: {total_sources} playlists ‚Üí {total_groups} playlists")
    print(f"   ‚úÖ Zero information loss - all tracks preserved")
else:
    print("‚úÖ No group consolidation opportunities found")
    print("   (Individual merges are more efficient)")


üîç Finding group consolidation opportunities...
   (Multiple small playlists that can merge into a single larger playlist)

‚úÖ Found 5 group consolidation opportunities!

üì¶ GROUP CONSOLIDATION OPPORTUNITIES
   (Multiple small playlists can merge into a single larger playlist)

1. Target: HiüçÄ (765 tracks)
   ‚Üí Can consolidate 5 playlists into this one:
      1. Pop ig dont deep it (225 tracks)
      2. üê±skillz  (219 tracks)
      3. HapPi ‚ò∫Ô∏è (224 tracks)
      4. Musi(c)ngs (234 tracks)
      5. Four 2060 Nine V2 (104 tracks)
   ‚Üí Total tracks to add: 341 (zero loss)
   ‚Üí Reduction: 5 playlists ‚Üí 1 playlist

2. Target: Jams üçì (491 tracks)
   ‚Üí Can consolidate 4 playlists into this one:
      1. DarkThots (ALLdTIME) (48 tracks)
      2. hrDnc üö® (99 tracks)
      3. Aura üßøü¶ãüåêü™¨ (154 tracks)
      4. Bnce (62 tracks)
   ‚Üí Total tracks to add: 172 (zero loss)
   ‚Üí Reduction: 4 playlists ‚Üí 1 playlist

3. Target: InMyRoom üíΩ (366 tracks)
   ‚Ü

## 7Ô∏è‚É£ Consolidation Strategies for Similar Playlists

Find similar user-created playlists (>40% similarity) that could be consolidated together for better organization.


In [13]:
# Consolidation strategies for similar playlists (40-50% similarity)
# Use helper function to build consolidation strategies
from notebook_helpers import build_consolidation_strategies

print("üîç Finding consolidation strategies for similar playlists...")
print("   (Playlists with 40-50% similarity - good candidates for manual review)\n")

# Build consolidation strategies using helper function
strategies_results = build_consolidation_strategies(redundancy_results, consolidation_results)
similar_consolidation_candidates = strategies_results['similar_consolidation_candidates']
    
if similar_consolidation_candidates:
    print(f"‚úÖ Found {len(similar_consolidation_candidates)} similar playlist pairs for consolidation strategies!\n")
    
    # Sort by similarity (highest first)
    similar_consolidation_candidates.sort(key=lambda x: x['similarity'], reverse=True)
    
    # Group by strategy
    merge_into_larger = [c for c in similar_consolidation_candidates if c['strategy'] == 'merge_into_larger']
    combine = [c for c in similar_consolidation_candidates if c['strategy'] == 'combine']
    review = [c for c in similar_consolidation_candidates if c['strategy'] == 'review']
    
    print("=" * 80)
    print("üîÑ CONSOLIDATION STRATEGIES FOR SIMILAR PLAYLISTS")
    print("=" * 80)
    print("   (User-created playlists only - auto-generated 'AJ' playlists excluded)\n")
    
    if merge_into_larger:
        print(f"üìå MERGE INTO LARGER (High/Medium Confidence - {len(merge_into_larger)} pairs):")
        print("-" * 80)
        for i, candidate in enumerate(merge_into_larger[:20], 1):  # Show top 20
            print(f"{i}. {candidate['recommended_action']}")
            print(f"   ‚Ä¢ Similarity: {candidate['similarity']:.1f}%")
            print(f"   ‚Ä¢ Overlap: {max(candidate['overlap1'], candidate['overlap2']):.1f}%")
            print(f"   ‚Ä¢ Tracks to add: {candidate['tracks_to_add']} (zero loss)")
            print(f"   ‚Ä¢ Confidence: {candidate['confidence']}")
            print()
        if len(merge_into_larger) > 20:
            print(f"   ... and {len(merge_into_larger) - 20} more merge opportunities\n")
    
    if combine:
        print(f"\nüì¶ COMBINE PLAYLISTS (Similar sizes - {len(combine)} pairs):")
        print("-" * 80)
        for i, candidate in enumerate(combine[:15], 1):  # Show top 15
            print(f"{i}. {candidate['recommended_action']}")
            print(f"   ‚Ä¢ '{candidate['playlist1']}' ({candidate['tracks1']} tracks) + '{candidate['playlist2']}' ({candidate['tracks2']} tracks)")
            print(f"   ‚Ä¢ Similarity: {candidate['similarity']:.1f}%")
            if candidate['unique_combined']:
                print(f"   ‚Ä¢ Total unique tracks if combined: ~{candidate['unique_combined']} (zero loss)")
            else:
                overlap_tracks = int(candidate['tracks1'] * candidate['overlap1'] / 100)
                unique_combined = candidate['tracks1'] + candidate['tracks2'] - overlap_tracks
                print(f"   ‚Ä¢ Total unique tracks if combined: ~{unique_combined} (zero loss)")
            print()
        if len(combine) > 15:
            print(f"   ... and {len(combine) - 15} more combine opportunities\n")
    
    if review:
        print(f"\nüîç REVIEW FOR CONSOLIDATION (Lower similarity - {len(review)} pairs):")
        print("-" * 80)
        for i, candidate in enumerate(review[:15], 1):  # Show top 15
            print(f"{i}. {candidate['playlist1']} ({candidate['tracks1']} tracks) ‚Üî {candidate['playlist2']} ({candidate['tracks2']} tracks)")
            print(f"   ‚Ä¢ Similarity: {candidate['similarity']:.1f}%")
            print(f"   ‚Ä¢ Overlap: P1‚ÜíP2: {candidate['overlap1']:.1f}%, P2‚ÜíP1: {candidate['overlap2']:.1f}%")
            print(f"   ‚Ä¢ Recommendation: Review manually - may serve different purposes")
            print()
        if len(review) > 15:
            print(f"   ... and {len(review) - 15} more pairs to review\n")
    
    # Summary
    print("=" * 80)
    print("üìä CONSOLIDATION STRATEGIES SUMMARY")
    print("=" * 80)
    print(f"   Total similar pairs found: {len(similar_consolidation_candidates)}")
    print(f"   High/Medium confidence merges: {len(merge_into_larger)}")
    print(f"   Combine opportunities: {len(combine)}")
    print(f"   Review candidates: {len(review)}")
    print(f"   ‚úÖ All strategies exclude auto-generated 'AJ' playlists")
    print(f"   üí° Review recommendations manually before consolidating")
else:
    print("‚úÖ No similar playlist pairs found for consolidation strategies")
    print("   (Only user-created playlists analyzed, excluding auto-generated 'AJ' playlists)")


üîç Finding consolidation strategies for similar playlists...
   (Playlists with 40-50% similarity - good candidates for manual review)

‚úÖ Found 27 similar playlist pairs for consolidation strategies!

üîÑ CONSOLIDATION STRATEGIES FOR SIMILAR PLAYLISTS
   (User-created playlists only - auto-generated 'AJ' playlists excluded)


üì¶ COMBINE PLAYLISTS (Similar sizes - 6 pairs):
--------------------------------------------------------------------------------
1. Create combined playlist with tracks from both 'IcedLemonadeüçã' and 'SummerChill üåûü•∂'
   ‚Ä¢ 'IcedLemonadeüçã' (503 tracks) + 'SummerChill üåûü•∂' (458 tracks)
   ‚Ä¢ Similarity: 49.5%
   ‚Ä¢ Total unique tracks if combined: ~958 (zero loss)

2. Create combined playlist with tracks from both 'WatsUrFlvr? V3' and '2Silk2Velvet'
   ‚Ä¢ 'WatsUrFlvr? V3' (616 tracks) + '2Silk2Velvet' (524 tracks)
   ‚Ä¢ Similarity: 49.0%
   ‚Ä¢ Total unique tracks if combined: ~1137 (zero loss)

3. Create combined playlist with tracks fro

## 7Ô∏è‚É£ Detailed Track-Level Analysis

For merge recommendations, let's see exactly which tracks would need to be added.


In [14]:
# For merge actions, show which tracks need to be added
# Load tracks data to get track names
from spotim8.analysis import LibraryAnalyzer

analyzer = LibraryAnalyzer(DATA_DIR).load()
tracks_df = analyzer.tracks_all

merge_details = []

for suggestion in consolidation_suggestions:
    if suggestion['action'] == 'merge' and suggestion['tracks_lost'] > 0:
        delete_pid = suggestion['playlist_id']
        delete_tracks = playlist_track_sets[delete_pid]
        
        # Find the playlist to merge into
        keep_name = suggestion['alternative'].split('"')[1] if '"' in suggestion['alternative'] else suggestion['alternative']
        keep_pid = None
        for pid, info in playlist_info.items():
            if info['name'] == keep_name:
                keep_pid = pid
                break
        
        if keep_pid:
            keep_tracks = playlist_track_sets[keep_pid]
            missing_tracks = delete_tracks - keep_tracks
            
            if missing_tracks:
                # Get track names
                missing_track_ids = list(missing_tracks)[:10]  # Show first 10
                track_names_df = tracks_df[tracks_df['track_id'].isin(missing_track_ids)]
                track_names = track_names_df['name'].tolist() if len(track_names_df) > 0 else []
                
                merge_details.append({
                    'Delete': suggestion['playlist_name'],
                    'Merge Into': keep_name,
                    'Missing Tracks': len(missing_tracks),
                    'Sample Tracks': ', '.join(track_names[:5]) if track_names else 'N/A'
                })

if merge_details:
    print("üìã Detailed Merge Analysis (tracks that need to be added):")
    print("=" * 80)
    df = pd.DataFrame(merge_details)
    for _, row in df.iterrows():
        print(f"\nüóëÔ∏è  Delete: {row['Delete']}")
        print(f"   ‚Üí Merge into: {row['Merge Into']}")
        print(f"   ‚Üí Add {row['Missing Tracks']} tracks")
        if row['Sample Tracks'] != 'N/A':
            print(f"   ‚Üí Sample: {row['Sample Tracks']}...")
else:
    print("‚úÖ All merge recommendations have zero track loss!")


‚úÖ Loaded 560 playlists, 5,548 tracks
üìã Detailed Merge Analysis (tracks that need to be added):

üóëÔ∏è  Delete: Jan26
   ‚Üí Merge into: InMyRoom üíΩ
   ‚Üí Add 26 tracks
   ‚Üí Sample: Give Up the Goods (Just Step) (feat. Big Noyd), light years (feat. In√©z), HYPNOSIS, The Less I Know The Better, If U Need It...

üóëÔ∏è  Delete: Trapsoul
   ‚Üí Merge into: InMyRoom üíΩ
   ‚Üí Add 9 tracks
   ‚Üí Sample: A$AP Forever (feat. Moby), Sk8 (with Ciara & EARTHGANG), Geneva (feat. Eli Sostre), Tailor Swif, ZZZ...

üóëÔ∏è  Delete: DarkThots (ALLdTIME)
   ‚Üí Merge into: Jams üçì
   ‚Üí Add 20 tracks
   ‚Üí Sample: Talk of the Town, HYPNOSIS, You Give Me A Feeling, OK OK, Arya...

üóëÔ∏è  Delete: Pop ig dont deep it
   ‚Üí Merge into: HiüçÄ
   ‚Üí Add 67 tracks
   ‚Üí Sample: The Lazy Song, If U Need It, dollaz n dollaz, The Days - NOTION Remix, Moves Like Jagger - Studio Recording From "The Voice" Performance...

üóëÔ∏è  Delete: Loungin
   ‚Üí Merge into: Pop ig dont deep it
   ‚

## 8Ô∏è‚É£ Aggressive Reorganization Strategy

Based on the aggressive analysis, here's a comprehensive reorganization plan to maximize playlist reduction without losing information.

**NOTE:** Auto-generated playlists starting with "AJ" prefix are excluded from all recommendations - they're managed by the sync script and should not be deleted or consolidated.


In [15]:
# Build reorganization plan
reorganization_plan = {
    'delete': [],
    'merge': [],
    'keep': []
}

# Categorize all playlists
all_playlist_ids = set(playlist_track_sets.keys())
to_delete_ids = safe_to_delete
to_keep_ids = all_playlist_ids - to_delete_ids

# Build merge groups
merge_groups = defaultdict(list)
for suggestion in consolidation_suggestions:
    if suggestion['action'] == 'merge':
        keep_name = suggestion['alternative'].split('"')[1] if '"' in suggestion['alternative'] else suggestion['alternative']
        merge_groups[keep_name].append(suggestion['playlist_name'])

# Organize recommendations
for suggestion in consolidation_suggestions:
    if suggestion['action'] == 'delete':
        reorganization_plan['delete'].append({
            'name': suggestion['playlist_name'],
            'reason': suggestion['reason'],
            'alternative': suggestion['alternative']
        })
    elif suggestion['action'] == 'merge':
        reorganization_plan['merge'].append({
            'name': suggestion['playlist_name'],
            'reason': suggestion['reason'],
            'merge_into': suggestion['alternative'],
            'tracks_to_add': suggestion['tracks_lost']
        })

# Keep all others (user-created playlists only - auto-generated already excluded)
for pid in to_keep_ids:
    # Double-check: exclude auto-generated playlists (shouldn't be here, but safety check)
    if pid not in playlist_info or is_auto_generated_playlist(pid):
        continue
    info = playlist_info[pid]
    reorganization_plan['keep'].append({
        'name': info['name'],
        'track_count': info['track_count']
    })

print("=" * 80)
print("üìã AGGRESSIVE REORGANIZATION PLAN (Zero Information Loss)")
print("=" * 80)

print(f"\nüóëÔ∏è  DELETE ({len(reorganization_plan['delete'])} playlists - 0 tracks lost):")
print("-" * 80)
for item in reorganization_plan['delete'][:30]:  # Show top 30
    print(f"   ‚Ä¢ {item['name']}")
    print(f"     ‚Üí {item['reason']}")
    print(f"     ‚Üí Keep: {item['alternative']}")
if len(reorganization_plan['delete']) > 30:
    print(f"\n   ... and {len(reorganization_plan['delete']) - 30} more safe deletions")
print()

print(f"\nüîÄ MERGE ({len(reorganization_plan['merge'])} playlists - Zero loss after merge):")
print("-" * 80)
# Group by target
merge_by_target = defaultdict(list)
for item in reorganization_plan['merge']:
    target = item['merge_into'].split('"')[1] if '"' in item['merge_into'] else item['merge_into']
    merge_by_target[target].append(item)

# Show top groups
for target, items in sorted(merge_by_target.items(), key=lambda x: len(x[1]), reverse=True)[:15]:
    print(f"\n   üì¶ Target: {target}")
    print(f"      ‚Üí Consolidate {len(items)} playlists into this one:")
    total_tracks_to_add = sum(item['tracks_to_add'] for item in items)
    for item in items[:5]:  # Show first 5 in group
        print(f"        ‚Ä¢ {item['name']} ({item['reason']})")
        if item['tracks_to_add'] > 0:
            print(f"          ‚Üí Add {item['tracks_to_add']} tracks")
    if len(items) > 5:
        print(f"        ... and {len(items) - 5} more playlists")
    if total_tracks_to_add > 0:
        print(f"      ‚Üí Total: Add {total_tracks_to_add} tracks (zero loss)")
    else:
        print(f"      ‚Üí Total: Perfect merge (0 tracks to add)")

if len(merge_by_target) > 15:
    print(f"\n   ... and {len(merge_by_target) - 15} more merge targets")
print()

print(f"\n‚úÖ KEEP ({len(reorganization_plan['keep'])} playlists after consolidation):")
print("-" * 80)
# Sort by track count
keep_sorted = sorted(reorganization_plan['keep'], key=lambda x: x['track_count'], reverse=True)
for item in keep_sorted[:25]:  # Show top 25
    print(f"   ‚Ä¢ {item['name']} ({item['track_count']} tracks)")
if len(keep_sorted) > 25:
    print(f"   ... and {len(keep_sorted) - 25} more playlists")

print("\n" + "=" * 80)
print("üìä AGGRESSIVE REORGANIZATION SUMMARY")
print("=" * 80)
total_delete = len(reorganization_plan['delete'])
total_merge = len(reorganization_plan['merge'])
total_keep = len(reorganization_plan['keep'])
total_tracks_to_add = sum(item['tracks_to_add'] for item in reorganization_plan['merge'])
final_count = total_keep

print(f"   Current user-created playlists: {len(playlist_info)}")
# Calculate excluded count if not already defined
try:
    excluded_display = f"{excluded_count} (managed by sync script)"
except NameError:
    # Count auto-generated playlists from playlist_info if excluded_count not available
    excluded_display = "count unavailable (check earlier cells)"
print(f"   Auto-generated 'AJ' playlists excluded: {excluded_display}")
print(f"   Safe deletions (0 tracks lost): {total_delete}")
print(f"   Merge consolidations (zero loss after merge): {total_merge}")
print(f"   User-created playlists to keep: {total_keep}")
print(f"   Final user-created playlist count: {final_count}")
print(f"   Total reduction: {total_delete + total_merge} playlists ({100*(total_delete + total_merge)/len(playlist_info):.1f}% of user-created)")
print(f"   Total tracks to add: {total_tracks_to_add} (zero information loss)")
print(f"   ‚úÖ ALL ACTIONS: Zero information loss - all tracks preserved via merge operations")
print(f"   ‚úÖ EXCLUSIONS: Auto-generated 'AJ' playlists excluded from all recommendations")


üìã AGGRESSIVE REORGANIZATION PLAN (Zero Information Loss)

üóëÔ∏è  DELETE (0 playlists - 0 tracks lost):
--------------------------------------------------------------------------------


üîÄ MERGE (23 playlists - Zero loss after merge):
--------------------------------------------------------------------------------

   üì¶ Target: HiüçÄ
      ‚Üí Consolidate 5 playlists into this one:
        ‚Ä¢ Pop ig dont deep it (Small playlist (225 tracks) with 70.2% overlap in larger "HiüçÄ" (765 tracks))
          ‚Üí Add 67 tracks
        ‚Ä¢ üê±skillz  (Small playlist (219 tracks) with 73.1% overlap in larger "HiüçÄ" (765 tracks))
          ‚Üí Add 59 tracks
        ‚Ä¢ HapPi ‚ò∫Ô∏è (Small playlist (224 tracks) with 68.8% overlap in larger "HiüçÄ" (765 tracks))
          ‚Üí Add 70 tracks
        ‚Ä¢ Musi(c)ngs (Small playlist (234 tracks) with 58.1% overlap in larger "HiüçÄ" (765 tracks))
          ‚Üí Add 98 tracks
        ‚Ä¢ Four 2060 Nine V2 (Small playlist (104 tracks) with 

In [16]:
# Export recommendations to CSV
export_df = pd.DataFrame(consolidation_suggestions)

# Add track counts using playlist_id
export_df['track_count'] = export_df['playlist_id'].apply(
    lambda pid: playlist_info.get(pid, {}).get('track_count', 0)
)

# Reorder columns
export_df = export_df[['playlist_name', 'track_count', 'action', 'reason', 'tracks_lost', 'alternative']]

# Save to CSV
output_file = DATA_DIR / 'playlist_consolidation_recommendations.csv'
export_df.to_csv(output_file, index=False)

print(f"‚úÖ Recommendations exported to: {output_file}")
print(f"\nüìä Summary:")
print(f"   Total recommendations: {len(export_df)}")
print(f"   Safe deletions (0 tracks lost): {len(export_df[export_df['tracks_lost'] == 0])}")
print(f"   Merge recommendations: {len(export_df[export_df['action'] == 'merge'])}")
print(f"\nüí° Next steps:")
print(f"   1. Review the CSV file: {output_file}")
print(f"   2. Manually verify recommendations")
print(f"   3. Delete/merge playlists in Spotify")
print(f"   4. Re-run sync to update your library")


‚úÖ Recommendations exported to: /Users/aryamaan/Desktop/Projects/SPOTIM8/data/playlist_consolidation_recommendations.csv

üìä Summary:
   Total recommendations: 23
   Safe deletions (0 tracks lost): 0
   Merge recommendations: 23

üí° Next steps:
   1. Review the CSV file: /Users/aryamaan/Desktop/Projects/SPOTIM8/data/playlist_consolidation_recommendations.csv
   2. Manually verify recommendations
   3. Delete/merge playlists in Spotify
   4. Re-run sync to update your library


In [17]:
from notebook_helpers import jaccard_similarity

# Load streaming history if available
from spotim8 import load_streaming_history

history_df = load_streaming_history(DATA_DIR)

if history_df is not None and len(history_df) > 0:
    print("="*80)
    print("üìä LISTENING-BASED REDUNDANCY ANALYSIS")
    print("="*80)
    
    # Build track usage map (which tracks are actually played)
    played_tracks = set()
    if 'track_id' in history_df.columns:
        played_tracks = set(history_df['track_id'].dropna().unique())
    else:
        # Match by artist + track name
        played_track_names = set(zip(
            history_df['artist_name'].str.lower().fillna(''),
            history_df['track_name'].str.lower().fillna('')
        ))
        # Match with library tracks
        # Load tracks and artists if not already loaded
        if 'tracks' not in locals() or 'track_artists' not in locals() or 'artists' not in locals():
            tracks = pd.read_parquet(DATA_DIR / "tracks.parquet")
            track_artists = pd.read_parquet(DATA_DIR / "track_artists.parquet")
            artists = pd.read_parquet(DATA_DIR / "artists.parquet")
        
        library_track_names = tracks.merge(
            track_artists[track_artists['position'] == 0],
            on='track_id'
        ).merge(artists[['artist_id', 'name']], on='artist_id')
        library_track_set = set(zip(
            library_track_names['name_x'].str.lower().fillna(''),
            library_track_names['name_y'].str.lower().fillna('')
        ))
        # Get track IDs for played tracks
        matched = library_track_names[
            library_track_names.apply(
                lambda row: (row['name_x'].lower(), row['name_y'].lower()) in played_track_names,
                axis=1
            )
        ]
        played_tracks = set(matched['track_id'].unique())
    
    # Analyze playlist usage
    print("\nüìä Playlist Usage Analysis:")
    print("   (Playlists where tracks are actually played vs. just saved)\n")
    
    unused_playlists = []
    low_usage_playlists = []
    
    for pid in playlist_track_sets.keys():
        playlist_tracks_set = playlist_track_sets[pid]
        if not playlist_tracks_set:
            continue
        
        # Count how many tracks from this playlist were actually played
        played_from_playlist = playlist_tracks_set & played_tracks
        usage_rate = len(played_from_playlist) / len(playlist_tracks_set) if playlist_tracks_set else 0
        
        info = playlist_info[pid]
        
        if usage_rate == 0:
            unused_playlists.append((pid, info['name'], len(playlist_tracks_set)))
        elif usage_rate < 0.1:  # Less than 10% usage
            low_usage_playlists.append((pid, info['name'], len(playlist_tracks_set), usage_rate * 100))
    
    if unused_playlists:
        print(f"üóëÔ∏è  UNUSED PLAYLISTS ({len(unused_playlists)} playlists - 0% tracks played):")
        for pid, name, track_count in sorted(unused_playlists, key=lambda x: x[2], reverse=True)[:10]:
            print(f"   ‚Ä¢ {name} ({track_count} tracks) - Never played from")
        if len(unused_playlists) > 10:
            print(f"   ... and {len(unused_playlists) - 10} more")
    
    if low_usage_playlists:
        print(f"\n‚ö†Ô∏è  LOW USAGE PLAYLISTS ({len(low_usage_playlists)} playlists - <10% tracks played):")
        for pid, name, track_count, usage_pct in sorted(low_usage_playlists, key=lambda x: x[3])[:10]:
            print(f"   ‚Ä¢ {name} ({track_count} tracks) - {usage_pct:.1f}% usage")
        if len(low_usage_playlists) > 10:
            print(f"   ... and {len(low_usage_playlists) - 10} more")
    
    if not unused_playlists and not low_usage_playlists:
        print("   ‚úÖ All playlists have good usage rates!")
    
    # Listening-weighted similarity
    print("\n\nüìä Listening-Weighted Similarity Analysis:")
    print("   (Playlists with similar listening patterns, even if track overlap is low)\n")
    
    # Build listening frequency per track
    track_listen_counts = {}
    if 'track_id' in history_df.columns:
        track_listen_counts = history_df.groupby('track_id').size().to_dict()
    else:
        # Count by artist + track name
        track_name_counts = history_df.groupby(['artist_name', 'track_name']).size()
        # Match to track IDs
        for (artist, track), count in track_name_counts.items():
            # Load tracks and artists if not already loaded
            if 'tracks' not in locals() or 'track_artists' not in locals() or 'artists' not in locals():
                tracks = pd.read_parquet(DATA_DIR / "tracks.parquet")
                track_artists = pd.read_parquet(DATA_DIR / "track_artists.parquet")
                artists = pd.read_parquet(DATA_DIR / "artists.parquet")
            
            matches = tracks.merge(
                track_artists[track_artists['position'] == 0],
                on='track_id'
            ).merge(artists[['artist_id', 'name']], on='artist_id')
            matches = matches[
                (matches['name_x'].str.lower() == track.lower()) &
                (matches['name_y'].str.lower() == artist.lower())
            ]
            if len(matches) > 0:
                track_listen_counts[matches.iloc[0]['track_id']] = count
    
    # Calculate listening-weighted similarity
    listening_redundant = []
    for i, (pid1, set1) in enumerate(playlist_track_sets.items()):
        if not set1:
            continue
        for j, (pid2, set2) in enumerate(playlist_track_sets.items()):
            if i >= j or not set2:
                continue
            
            # Get shared tracks
            shared = set1 & set2
            if not shared:
                continue
            
            # Calculate listening-weighted similarity
            shared_listens = sum(track_listen_counts.get(tid, 0) for tid in shared)
            total_listens_1 = sum(track_listen_counts.get(tid, 0) for tid in set1)
            total_listens_2 = sum(track_listen_counts.get(tid, 0) for tid in set2)
            
            if total_listens_1 == 0 or total_listens_2 == 0:
                continue
            
            # Similarity based on shared listening frequency
            listening_sim = shared_listens / max(total_listens_1, total_listens_2)
            
            # Also check traditional Jaccard
            jaccard = jaccard_similarity(set1, set2)
            
            # If listening similarity is high but Jaccard is low, they serve similar purpose
            if listening_sim > 0.5 and jaccard < 0.7:
                info1 = playlist_info[pid1]
                info2 = playlist_info[pid2]
                listening_redundant.append((pid1, pid2, info1['name'], info2['name'], jaccard, listening_sim))
    
    if listening_redundant:
        print(f"üîó FUNCTIONALLY REDUNDANT PLAYLISTS ({len(listening_redundant)} pairs):")
        print("   (Different tracks but similar listening patterns)\n")
        for pid1, pid2, name1, name2, jaccard, listening_sim in sorted(listening_redundant, key=lambda x: x[5], reverse=True)[:10]:
            print(f"   ‚Ä¢ {name1} ‚Üî {name2}")
            print(f"     Track overlap: {jaccard*100:.1f}%, Listening similarity: {listening_sim*100:.1f}%")
        if len(listening_redundant) > 10:
            print(f"   ... and {len(listening_redundant) - 10} more pairs")
    else:
        print("   ‚úÖ No functionally redundant playlists found based on listening patterns")
    
    print("\nüí° Recommendations:")
    print("   ‚Ä¢ Consider deleting unused playlists (0% usage)")
    print("   ‚Ä¢ Review low usage playlists - they may be outdated")
    print("   ‚Ä¢ Functionally redundant playlists might be merged even with low track overlap")
else:
    print("‚ö†Ô∏è  No streaming history data available")
    print("   Run the streaming history sync in 04_analyze_listening_history.ipynb to enable this analysis")


üìä LISTENING-BASED REDUNDANCY ANALYSIS

üìä Playlist Usage Analysis:
   (Playlists where tracks are actually played vs. just saved)

   ‚úÖ All playlists have good usage rates!


üìä Listening-Weighted Similarity Analysis:
   (Playlists with similar listening patterns, even if track overlap is low)

üîó FUNCTIONALLY REDUNDANT PLAYLISTS (126 pairs):
   (Different tracks but similar listening patterns)

   ‚Ä¢ ChillSunHop V2 ‚Üî TrapNeverDies3.0
     Track overlap: 53.6%, Listening similarity: 83.1%
   ‚Ä¢ IcedLemonadeüçã ‚Üî SummerChill üåûü•∂
     Track overlap: 49.5%, Listening similarity: 75.1%
   ‚Ä¢ Chill Dance ‚Üî Ghar2üè°
     Track overlap: 49.9%, Listening similarity: 75.0%
   ‚Ä¢ IcedLemonadeüçã ‚Üî RvRChrls3 ü¶¶
     Track overlap: 52.7%, Listening similarity: 74.5%
   ‚Ä¢ WatsUrFlvr? V3 ‚Üî RvRChrls3 ü¶¶
     Track overlap: 47.9%, Listening similarity: 74.4%
   ‚Ä¢ FeelDForce ‚ö°Ô∏è ‚Üî PWR4
     Track overlap: 43.7%, Listening similarity: 74.2%
   ‚Ä¢ Stylinü§ô

## üîÄ Generate Merge Commands (Using Merge Logic from Examples)

This cell identifies potential merges using the merge logic pattern:
1. Determine which playlist is older (by earliest track added_at timestamp)
2. Use the older playlist as the source (it will be renamed)
3. Merge tracks from newer playlists into the older one
4. Delete the newer playlists after merge

Based on the redundancy analysis, we'll identify merges from:
- Subsets (fully contained playlists)
- High overlap pairs (>70% similarity)
- Near-duplicates (50-70% similarity with size differences)
- Merge candidates (small playlists with high overlap in larger ones)


In [18]:
# Load playlist tracks to determine playlist age
playlist_tracks_df = pd.read_parquet(DATA_DIR / "playlist_tracks.parquet")

def get_playlist_earliest_timestamp(playlist_id: str) -> pd.Timestamp:
    """Get the earliest added_at timestamp for a playlist."""
    pl_tracks = playlist_tracks_df[playlist_tracks_df['playlist_id'] == playlist_id].copy()
    if len(pl_tracks) == 0:
        return pd.Timestamp.max  # If no tracks, consider it newest
    pl_tracks['added_at'] = pd.to_datetime(pl_tracks['added_at'], errors='coerce', utc=True)
    earliest = pl_tracks['added_at'].min()
    if pd.isna(earliest):
        return pd.Timestamp.max  # If no valid timestamps, consider it newest
    return earliest

# Build merge suggestions using the merge logic pattern
merge_suggestions = []

# 1. Subsets - merge into the superset (which is older by definition if it contains all tracks)
print("=" * 80)
print("üîÄ MERGE SUGGESTIONS (Using Merge Logic Pattern)")
print("=" * 80)
print("\n1Ô∏è‚É£ SUBSETS (merge subset into superset):\n")
for subset_pid, superset_pid, subset_size, superset_size in subsets:
    subset_info = playlist_info[subset_pid]
    superset_info = playlist_info[superset_pid]
    
    # Determine which is older
    subset_earliest = get_playlist_earliest_timestamp(subset_pid)
    superset_earliest = get_playlist_earliest_timestamp(superset_pid)
    
    if subset_earliest <= superset_earliest:
        older_name = subset_info['name']
        older_id = subset_pid
        newer_name = superset_info['name']
        newer_id = superset_pid
        new_name = superset_info['name']  # Keep the superset name
    else:
        older_name = superset_info['name']
        older_id = superset_pid
        newer_name = subset_info['name']
        newer_id = subset_pid
        new_name = superset_info['name']  # Keep the superset name
    
    merge_suggestions.append({
        'type': 'subset',
        'older_playlist': older_name,
        'newer_playlist': newer_name,
        'new_name': new_name,
        'older_id': older_id,
        'newer_id': newer_id,
        'tracks_in_older': playlist_info[older_id]['track_count'],
        'tracks_in_newer': playlist_info[newer_id]['track_count'],
        'command': f'python scripts/merge_to_new_playlist.py "{older_name}" "{newer_name}" "{new_name}"'
    })
    
    print(f"   ‚Ä¢ {subset_info['name']} ‚Üí {superset_info['name']}")
    print(f"     Command: python scripts/merge_to_new_playlist.py \"{subset_info['name']}\" \"{superset_info['name']}\" \"{superset_info['name']}\"")

# 2. High overlap pairs - merge smaller into larger (determine older for target)
print("\n2Ô∏è‚É£ HIGH OVERLAP PAIRS (>70% similarity):\n")
for pid1, pid2, jaccard, overlap1, overlap2 in high_overlap:
    info1 = playlist_info[pid1]
    info2 = playlist_info[pid2]
    
    # Determine which is older
    pid1_earliest = get_playlist_earliest_timestamp(pid1)
    pid2_earliest = get_playlist_earliest_timestamp(pid2)
    
    if pid1_earliest <= pid2_earliest:
        older_name = info1['name']
        older_id = pid1
        newer_name = info2['name']
        newer_id = pid2
    else:
        older_name = info2['name']
        older_id = pid2
        newer_name = info1['name']
        newer_id = pid1
    
    # Use the larger playlist name as the new name
    if info1['track_count'] >= info2['track_count']:
        new_name = info1['name']
    else:
        new_name = info2['name']
    
    merge_suggestions.append({
        'type': 'high_overlap',
        'older_playlist': older_name,
        'newer_playlist': newer_name,
        'new_name': new_name,
        'older_id': older_id,
        'newer_id': newer_id,
        'similarity': f"{jaccard*100:.1f}%",
        'command': f'python scripts/merge_to_new_playlist.py "{older_name}" "{newer_name}" "{new_name}"'
    })
    
    print(f"   ‚Ä¢ {info1['name']} + {info2['name']} (similarity: {jaccard*100:.1f}%)")
    print(f"     Command: python scripts/merge_to_new_playlist.py \"{older_name}\" \"{newer_name}\" \"{new_name}\"")

# 3. Top merge candidates - merge small into large
print("\n3Ô∏è‚É£ TOP MERGE CANDIDATES (small ‚Üí large, showing top 10):\n")
for small_pid, large_pid, small_overlap, large_overlap, small_size, large_size in merge_candidates[:10]:
    small_info = playlist_info[small_pid]
    large_info = playlist_info[large_pid]
    
    # Determine which is older
    small_earliest = get_playlist_earliest_timestamp(small_pid)
    large_earliest = get_playlist_earliest_timestamp(large_pid)
    
    if small_earliest <= large_earliest:
        older_name = small_info['name']
        older_id = small_pid
        newer_name = large_info['name']
        newer_id = large_pid
    else:
        older_name = large_info['name']
        older_id = large_pid
        newer_name = small_info['name']
        newer_id = small_pid
    
    # Use the larger playlist name as the new name
    new_name = large_info['name']
    
    missing_tracks = len(playlist_track_sets[small_pid] - playlist_track_sets[large_pid])
    
    merge_suggestions.append({
        'type': 'merge_candidate',
        'older_playlist': older_name,
        'newer_playlist': newer_name,
        'new_name': new_name,
        'older_id': older_id,
        'newer_id': newer_id,
        'overlap': f"{small_overlap*100:.1f}%",
        'tracks_to_add': missing_tracks,
        'command': f'python scripts/merge_to_new_playlist.py "{older_name}" "{newer_name}" "{new_name}"'
    })
    
    print(f"   ‚Ä¢ {small_info['name']} ‚Üí {large_info['name']} (overlap: {small_overlap*100:.1f}%, add {missing_tracks} tracks)")
    print(f"     Command: python scripts/merge_to_new_playlist.py \"{older_name}\" \"{newer_name}\" \"{new_name}\"")

print(f"\n‚úÖ Generated {len(merge_suggestions)} merge suggestions")
print(f"   (Subsets: {len([m for m in merge_suggestions if m['type'] == 'subset'])})")
print(f"   (High overlap: {len([m for m in merge_suggestions if m['type'] == 'high_overlap'])})")
print(f"   (Merge candidates shown: {len([m for m in merge_suggestions if m['type'] == 'merge_candidate'])})")
print(f"\nüí° Note: Commands use the older playlist as source (following merge logic pattern)")
print(f"   All suggestions preserve tracks - zero information loss")


üîÄ MERGE SUGGESTIONS (Using Merge Logic Pattern)

1Ô∏è‚É£ SUBSETS (merge subset into superset):


2Ô∏è‚É£ HIGH OVERLAP PAIRS (>70% similarity):


3Ô∏è‚É£ TOP MERGE CANDIDATES (small ‚Üí large, showing top 10):

   ‚Ä¢ Jan26 ‚Üí InMyRoom üíΩ (overlap: 71.7%, add 26 tracks)
     Command: python scripts/merge_to_new_playlist.py "InMyRoom üíΩ" "Jan26" "InMyRoom üíΩ"
   ‚Ä¢ Jan26 ‚Üí Jams üçì (overlap: 71.7%, add 26 tracks)
     Command: python scripts/merge_to_new_playlist.py "Jams üçì" "Jan26" "Jams üçì"
   ‚Ä¢ Jan26 ‚Üí Ride üö¥‚Äç‚ôÇÔ∏è (overlap: 73.9%, add 24 tracks)
     Command: python scripts/merge_to_new_playlist.py "Ride üö¥‚Äç‚ôÇÔ∏è" "Jan26" "Ride üö¥‚Äç‚ôÇÔ∏è"
   ‚Ä¢ Jan26 ‚Üí HiüçÄ (overlap: 60.9%, add 36 tracks)
     Command: python scripts/merge_to_new_playlist.py "HiüçÄ" "Jan26" "HiüçÄ"
   ‚Ä¢ Jan26 ‚Üí Stylinü§ô (overlap: 73.9%, add 24 tracks)
     Command: python scripts/merge_to_new_playlist.py "Stylinü§ô" "Jan26" "Stylinü§ô"
   ‚Ä¢ Jan26 ‚Üí GLyD3 (overl

## ‚úÖ Done!


## ‚úÖ Done! (Aggressive Consolidation Complete)

**Summary (AGGRESSIVE MODE):**
- ‚úÖ Identified redundant playlists using **aggressive thresholds** (>50% similarity)
- ‚úÖ Found playlists safe to delete (zero track loss)
- ‚úÖ **FORCED suggestions** for high-overlap playlists (>70% similarity)
- ‚úÖ Suggested merge strategies for near-duplicates (>50% similarity with size difference)
- ‚úÖ Identified size-based merge candidates (3x+ smaller with >50% overlap)
- ‚úÖ Found group consolidation opportunities (multiple small playlists ‚Üí one larger)
- ‚úÖ Created aggressive reorganization plan
- ‚úÖ Exported recommendations to CSV

**Key Metrics:**
- **Lowered thresholds** for maximum consolidation without loss
- **Zero information loss**: All suggestions preserve tracks via merge operations
- **Forced suggestions**: Aggressive approach finds more consolidation opportunities

**Next Steps:**
1. Review the recommendations in the CSV file (`playlist_consolidation_recommendations.csv`)
2. **For deletions**: Delete playlists - zero track loss (all tracks in another playlist)
3. **For merges**: 
   - Add missing tracks from smaller playlist to larger playlist
   - After merging tracks, delete the smaller playlist
   - **Zero loss**: All tracks preserved via merge operations
4. Review group consolidation opportunities for batch merges
5. Re-run `01_sync_data.ipynb` to update your library
6. Re-run this notebook to verify cleanup

**Workflow:**
- `01_sync_data.ipynb` ‚Üí `02_analyze_library.ipynb` ‚Üí `03_playlist_analysis.ipynb` ‚Üí `04_analyze_listening_history.ipynb` ‚Üí `05_liked_songs_monthly_playlists.ipynb` ‚Üí `06_identify_redundant_playlists.ipynb`

**Important Notes:**
- ‚úÖ **All suggestions preserve information** - zero track loss via merge operations
- ‚ö†Ô∏è  Always verify recommendations manually before deleting playlists
- üí° Merge operations require adding missing tracks first, then deleting merged playlist
- üìä Aggressive thresholds ensure maximum playlist reduction while preserving all tracks
