# 🔄 Spotify Library Sync

This notebook downloads your Spotify library and saves it locally for offline analysis.

**What it does:**
- ✅ Fetches all your playlists (owned only)
- ✅ Fetches your Liked Songs (❤️ master playlist)
- ✅ Downloads track and artist metadata
- ✅ Saves everything to `data/` as parquet files
- ✅ Incremental updates (only fetches changes)
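
One common way to implement "only fetches changes" is to compare each playlist's `snapshot_id` (a version identifier the Spotify Web API returns for every playlist, which changes when the playlist is edited) with the value saved at the last sync. Whether spotim8 does exactly this isn't shown here, so treat this as an illustration under that assumption; `changed_playlists`, `removed_playlists` and the dict-of-snapshot-IDs storage format are hypothetical.

```python
def changed_playlists(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return IDs of playlists that are new or whose snapshot_id differs.

    Both arguments map playlist ID -> snapshot_id (hypothetical storage format).
    """
    return sorted(pid for pid, snap in current.items() if previous.get(pid) != snap)


def removed_playlists(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return IDs that existed at the last sync but are gone now."""
    return sorted(set(previous) - set(current))


previous = {"pl_a": "snap1", "pl_b": "snap7"}
current = {"pl_a": "snap1", "pl_b": "snap8", "pl_c": "snap1"}
print(changed_playlists(previous, current))  # ['pl_b', 'pl_c']
print(removed_playlists(previous, current))  # []
```

Liked Songs is not a regular playlist and has no `snapshot_id`, so a real implementation needs some other change signal for it.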

**Run this first!** Then use `02_analyze_library.ipynb` for analysis.

**💡 Tip:** For automated daily syncs, use `src/scripts/automation/sync.py` instead (configured via cron job). See `README.md` for details.

## 1️⃣ Setup

Install dependencies and configure credentials.

In [1]:
# Install dependencies (run once)
%pip install -q pandas spotipy pyarrow tqdm python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [None]:
# Setup project - this adds project root to path
# From src/notebooks/, go up 2 levels to reach project root
from pathlib import Path
from notebook_helpers import setup_project

PROJECT_ROOT = setup_project(Path("../..").resolve())



✅ Project root: /Users/aryamaan/Desktop/Projects/SPOTIM8/spotim8
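
`setup_project` comes from the project's own `notebook_helpers` module, whose source isn't shown. Going by the cell comment, it puts the project root on the import path. A minimal sketch of that idea, with the hypothetical name `add_project_root`:

```python
import sys
from pathlib import Path


def add_project_root(root: Path) -> Path:
    """Resolve `root` and prepend it to sys.path so project modules can be imported."""
    root = root.resolve()
    root_str = str(root)
    if root_str not in sys.path:  # avoid duplicate entries when the cell is re-run
        sys.path.insert(0, root_str)
    return root


# From src/notebooks/, two levels up is the project root
project_root = add_project_root(Path("../.."))
```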


In [3]:
# Load and verify credentials from .env file
from notebook_helpers import setup_credentials

credentials_ok = setup_credentials(PROJECT_ROOT)
if not credentials_ok:
    print("⚠️  Please set up credentials before continuing!")

⚠️  No .env file found at /Users/aryamaan/Desktop/Projects/SPOTIM8/spotim8/.env
   Create one with SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET, SPOTIPY_REDIRECT_URI
⚠️  Please set up credentials before continuing!
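
The warning names the three variables the helper expects. A minimal stdlib-only sketch of such a check, assuming a plain `KEY=VALUE` `.env` format (python-dotenv, installed above, also handles quoting and other edge cases); `missing_credentials` is a hypothetical name, not the helper's actual code:

```python
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")


def missing_credentials(env_path: Path) -> list[str]:
    """Return the required keys that are absent or empty in a simple .env file."""
    if not env_path.exists():
        return list(REQUIRED_KEYS)
    values = {}
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")  # drop simple quotes
    return [key for key in REQUIRED_KEYS if not values.get(key)]


with tempfile.TemporaryDirectory() as tmp:
    env = Path(tmp) / ".env"
    env.write_text("SPOTIPY_CLIENT_ID=abc\nSPOTIPY_CLIENT_SECRET=\n")
    print(missing_credentials(env))  # ['SPOTIPY_CLIENT_SECRET', 'SPOTIPY_REDIRECT_URI']
```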


## 2️⃣ Connect to Spotify

This will open a browser window for authentication on first run.
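
`setup_spotify_client` is another project helper whose internals aren't shown. With spotipy used directly, the browser login is handled by `SpotifyOAuth`, which reads `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET` and `SPOTIPY_REDIRECT_URI` from the environment when they aren't passed explicitly. The scope list is an assumption about what reading owned playlists and Liked Songs requires; `make_client` and the token-cache path are hypothetical:

```python
from pathlib import Path

# Assumed scopes: Liked Songs plus private and collaborative playlists
SCOPES = ("user-library-read", "playlist-read-private", "playlist-read-collaborative")


def make_client(token_cache: Path):
    """Build an authenticated spotipy client; opens a browser on first login."""
    import spotipy  # imported lazily so this cell only needs spotipy when called
    from spotipy.oauth2 import SpotifyOAuth

    auth = SpotifyOAuth(scope=" ".join(SCOPES), cache_path=str(token_cache))
    return spotipy.Spotify(auth_manager=auth)


# sp = make_client(Path(".cache-spotify"))  # hypothetical token-cache location
```

The token cache is what makes later runs skip the browser step: once a token is stored, spotipy refreshes it without asking you to log in again.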

In [4]:
# Initialize Spotify client with caching enabled
from notebook_helpers import setup_spotify_client

sf = setup_spotify_client(PROJECT_ROOT, progress=True, cache_ttl=3600)
DATA_DIR = PROJECT_ROOT / "data"

📦 API response cache enabled: /Users/aryamaan/Desktop/Projects/SPOTIM8/src/data/.api_cache (TTL: 3600s)
✅ Connected to Spotify!
📁 Data will be saved to: /Users/aryamaan/Desktop/Projects/SPOTIM8/src/data
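
The log shows API responses cached under `.api_cache` with a TTL of 3600 s (one hour), matching `cache_ttl=3600`. How the helper stores them isn't shown. A minimal sketch of a time-to-live disk cache keyed by a hash of the request, where `TTLDiskCache` and the example key string are hypothetical:

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


class TTLDiskCache:
    """Store JSON-serializable responses on disk; entries expire after `ttl` seconds."""

    def __init__(self, directory: Path, ttl: float = 3600):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.ttl = ttl

    def _path(self, key: str) -> Path:
        # Hash the request key so any string maps to a safe file name
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def get(self, key: str):
        """Return the cached value, or None if missing or older than ttl."""
        path = self._path(key)
        if not path.exists():
            return None
        entry = json.loads(path.read_text())
        if time.time() - entry["stored_at"] > self.ttl:
            return None  # expired: the caller should refetch
        return entry["value"]

    def put(self, key: str, value) -> None:
        """Save a value together with the time it was stored."""
        entry = {"stored_at": time.time(), "value": value}
        self._path(key).write_text(json.dumps(entry))


with tempfile.TemporaryDirectory() as tmp:
    cache = TTLDiskCache(Path(tmp), ttl=3600)
    cache.put("GET /me/playlists?offset=0", {"items": []})
    print(cache.get("GET /me/playlists?offset=0"))  # {'items': []}
```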


## 3️⃣ Sync Your Library

This fetches your playlists and tracks. First run may take a few minutes.

In [None]:
# Sync library (incremental - only fetches changes)
stats = sf.sync(
    owned_only=True,           # Only your playlists, not followed ones
    include_liked_songs=True   # Include Liked Songs as master playlist
)

print(f"\nüìä Sync complete!")

🔄 Starting library sync...
📝 106 playlist(s) changed: ❤️ Liked Songs, AJFndsJan26, AJFndsDec25, AJFndsNov25, Anthems ...
❤️  Fetching Liked Songs (your master playlist)...


Fetching Liked Songs:  73%|███████▎  | 3900/5341 [01:05<00:25, 55.68track/s]

KeyboardInterrupt: 


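The progress bar advances in pages: the Web API's saved-tracks endpoint (spotipy's `current_user_saved_tracks`) returns at most 50 items per request, so 5,341 Liked Songs take ceil(5341 / 50) = 107 requests. A minimal sketch of offset-based paging, assuming a `fetch(limit, offset)` callable that returns a Spotify-style paging object; `iter_paged` is a hypothetical helper, and the fake endpoint stands in for the network:

```python
from typing import Callable, Iterator


def iter_paged(fetch: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every item from an offset-paged endpoint.

    `fetch(limit, offset)` must return a dict with an "items" list and a
    "next" field that is None on the last page (Spotify paging-object shape).
    """
    offset = 0
    while True:
        page = fetch(limit, offset)
        yield from page["items"]
        if not page.get("next"):  # no further pages
            break
        offset += limit


# Fake endpoint with 120 items, so three requests of 50 are needed
fake_library = [{"id": i} for i in range(120)]


def fake_fetch(limit: int, offset: int) -> dict:
    chunk = fake_library[offset:offset + limit]
    has_more = offset + limit < len(fake_library)
    return {"items": chunk, "next": "next-page-url" if has_more else None}


items = list(iter_paged(fake_fetch))
print(len(items))  # 120
```

With a real client, the argument would be something like `lambda limit, offset: sp.current_user_saved_tracks(limit=limit, offset=offset)`, where `sp` is a `spotipy.Spotify` instance.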
## 4️⃣ Build Full Data Tables

Now let's build all the detailed tables (tracks, artists, etc.) from the synced data.
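
Per the code comment, `library_wide()` joins the normalized tables into one wide table. spotim8's actual column names and join keys aren't shown, so `playlist_id`, `track_id` and `artist_id` are assumptions. A toy pandas sketch of this kind of join, using `toy_` names so it doesn't overwrite the real tables:

```python
import pandas as pd

# Toy normalized tables; column names are assumptions, not spotim8's schema
toy_playlists = pd.DataFrame({"playlist_id": ["p1", "p2"], "playlist_name": ["Chill", "Gym"]})
toy_playlist_tracks = pd.DataFrame(
    {"playlist_id": ["p1", "p1", "p2"], "track_id": ["t1", "t2", "t1"]}
)
toy_tracks = pd.DataFrame({"track_id": ["t1", "t2"], "track_name": ["Track A", "Track B"]})
toy_track_artists = pd.DataFrame(
    {"track_id": ["t1", "t2", "t2"], "artist_id": ["a1", "a2", "a3"]}
)

# Start from the playlist-track link table and attach details with left joins
toy_wide = (
    toy_playlist_tracks
    .merge(toy_playlists, on="playlist_id", how="left")
    .merge(toy_tracks, on="track_id", how="left")
    .merge(toy_track_artists, on="track_id", how="left")  # one row per artist
)
print(len(toy_playlist_tracks), len(toy_wide))  # 3 4
```

Because `track_artists` is a many-to-many link, joining it repeats a playlist-track row once per artist, so a wide table built this way can have more rows than the link table it starts from.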

In [None]:
# Fetch all data tables (uses cache if available)
print("📥 Building data tables...\n")

playlists = sf.playlists()
print(f"✅ Playlists: {len(playlists):,}")

playlist_tracks = sf.playlist_tracks()
print(f"✅ Playlist-track links: {len(playlist_tracks):,}")

tracks = sf.tracks()
print(f"✅ Unique tracks: {len(tracks):,}")

track_artists = sf.track_artists()
print(f"✅ Track-artist links: {len(track_artists):,}")

artists = sf.artists()
print(f"✅ Artists: {len(artists):,}")

# Build the wide table (everything joined)
library = sf.library_wide()
print(f"✅ Library wide table: {len(library):,} rows")

📥 Building data tables...

✅ Playlists: 560
✅ Playlist-track links: 75,210
✅ Unique tracks: 5,666
✅ Track-artist links: 9,081
✅ Artists: 2,854
✅ Library wide table: 75,961 rows


## 5️⃣ View Your Data

In [None]:
# Show status summary
sf.print_status()

        SPOTIM8 DATA STATUS
📁 Cache directory: /Users/aryamaan/Desktop/Projects/src/data
👤 User: 31iol2qamank24owygxo7kpq533y
🕐 Last sync: 2026-01-11T10:00:46.560484+00:00

📊 Cached data:
   • Playlists: 560
   • Playlist tracks: 75,210
   • Unique tracks: 5,666
   • Track-artist links: 9,081
   • Artists: 2,854


## 6️⃣ Preview Your Data

In [None]:
# Preview playlists
print("📂 Your Playlists:")
playlists[["name", "track_count"]].head(15)

📂 Your Playlists:

|   | name | track_count |
|---|------|-------------|
| 0 | ❤️ Liked Songs | 5268 |
| 1 | OtherFinds25 | 281 |
| 2 | AJDiscovery24 | 100 |
| 3 | AJTop24 | 100 |
| 4 | AJDiscovery23 | 100 |
| 5 | AJTop23 | 100 |
| 6 | AJDiscovery22 | 100 |
| 7 | AJTop22 | 100 |
| 8 | AJDiscovery21 | 100 |
| 9 | AJTop21 | 100 |


In [None]:
# Preview tracks
print("🎵 Sample Tracks:")
tracks[["name", "album_name", "popularity", "duration_ms"]].head(10)

🎵 Sample Tracks:

|   | name | album_name | popularity | duration_ms |
|---|------|------------|------------|-------------|
| 0 | Night Drive | Night Drive | 64 | 220000 |
| 1 | don't u know? | Missing in Action (The Return) | 57 | 176004 |
| 2 | High No More | High No More | 73 | 198845 |
| 3 | Back On 74 | Volcano | 80 | 209482 |
| 4 | Afraid To Feel | Afraid To Feel | 77 | 177524 |
| 5 | DARE (feat. Shaun Ryder & Roses Gabor) | Demon Days | 83 | 244999 |
| 6 | Casio | For Ever | 70 | 234369 |
| 7 | Electric Feel | Oracular Spectacular | 79 | 229640 |
| 8 | Move Your Feet | D-D-Don't Don't Stop The Beat | 72 | 181826 |
| 9 | What I Might Do | What I Might Do (Radio Edit) | 53 | 195737 |


In [None]:
# Preview artists
print("🎤 Top Artists (by followers):")
artists.nlargest(10, "followers")[["name", "genres", "popularity", "followers"]]

🎤 Top Artists (by followers):

|   | name | genres | popularity | followers |
|---|------|--------|------------|-----------|
| 1144 | Arijit Singh | [hindi pop, bollywood, desi, bangla pop] | 93 | 170571717 |
| 226 | Taylor Swift | [] | 100 | 149663352 |
| 759 | Ed Sheeran | [soft pop] | 90 | 124275845 |
| 95 | Billie Eilish | [] | 93 | 121794604 |
| 550 | The Weeknd | [] | 96 | 116875799 |
| 178 | Ariana Grande | [pop] | 95 | 109014155 |
| 656 | Eminem | [rap, hip hop] | 91 | 106545048 |
| 1070 | Bad Bunny | [reggaeton, trap latino, urbano latino, latin] | 99 | 106389089 |
| 44 | Drake | [rap] | 98 | 105877620 |
| 756 | Justin Bieber | [] | 94 | 86335475 |

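Several artists in the preview have an empty `genres` list, while others have one or more genres. To count genres across the library, `Series.explode` puts each list element on its own row (empty lists become NaN). A sketch on toy rows shaped like the preview:

```python
import pandas as pd

# Toy rows shaped like the artists preview (genres is a list per artist)
toy_artists = pd.DataFrame({
    "name": ["Artist A", "Artist B", "Artist C"],
    "genres": [["pop", "dance pop"], [], ["pop"]],
})

genre_counts = (
    toy_artists["genres"]
    .explode()   # one row per (artist, genre); an empty list becomes NaN
    .dropna()    # drop artists with no genres
    .value_counts()
)
print(genre_counts)
```

On the real data this would be `artists["genres"].explode().dropna().value_counts()`, assuming the column holds lists (or arrays, which `explode` also handles) as the preview suggests.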

## 7️⃣ Check Saved Files

In [None]:
# List saved library data files
print(f"📁 Library data files in {DATA_DIR}:\n")

# Files written by the export-data sync script, not by the Spotify API sync
EXPORT_FILES = {
    "streaming_history.parquet", "search_queries.parquet",
    "follow_data.parquet", "library_snapshot.parquet",
    "playback_errors.parquet", "playback_retries.parquet",
    "webapi_events.parquet",
}

# Only show library data files (exclude export data files)
library_files = [
    f for f in sorted(DATA_DIR.glob("*.parquet"))
    if f.name not in EXPORT_FILES
]

if library_files:
    print("📚 Library Data (from Spotify API):")
    for f in library_files:
        size_kb = f.stat().st_size / 1024
        print(f"   {f.name:30} {size_kb:>8.1f} KB")
    print(f"\n✅ {len(library_files)} library data file(s) saved")
else:
    print("⚠️  No library data files found")
    print("   Run the sync in section 3️⃣ to download your library")

📁 Files in /Users/aryamaan/Desktop/Projects/src/data:

📚 Library Data (from API):
   artists.parquet                   215.6 KB
   library_wide.parquet             2752.5 KB
   playlist_tracks.parquet           781.0 KB
   playlists.parquet                 110.5 KB
   track_artists.parquet             234.0 KB
   tracks.parquet                    657.8 KB

📥 Export Data (from Spotify exports):
   follow_data.parquet                 2.0 KB
   library_snapshot.parquet          268.1 KB
   playback_errors.parquet            15.3 KB
   playback_retries.parquet           14.6 KB
   search_queries.parquet             21.9 KB
   streaming_history.parquet        2586.9 KB
   webapi_events.parquet             103.6 KB
   wrapped_data.json                   3.6 KB


---

## ✅ Done!

Your library is now synced and saved locally. Next steps:

1. **Analyze**: Open `02_analyze_library.ipynb` for visualizations
2. **Playlist Analysis**: Open `03_playlist_analysis.ipynb` for genre clustering
3. **Listening History**: Open `04_analyze_listening_history.ipynb` to analyze your actual listening patterns
4. **Create Playlists**: Open `05_liked_songs_monthly_playlists.ipynb` to create automated playlists
5. **Find Redundancy**: Open `06_identify_redundant_playlists.ipynb` to clean up your library

**Re-sync**: Run this notebook again anytime to fetch new changes. The data is cached and syncs are incremental, so later runs only fetch what changed and are much faster than the first.
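
To decide whether a re-sync is due, the "Last sync" value in the status summary is an ISO 8601 timestamp with a UTC offset, which `datetime.fromisoformat` parses directly. A minimal sketch; `needs_resync` and the 24-hour threshold are hypothetical choices, and the timestamp must carry an offset (as in the status output) so both sides of the subtraction are timezone-aware:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_resync(
    last_sync_iso: str,
    max_age: timedelta = timedelta(hours=24),
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the last sync is older than max_age."""
    last_sync = datetime.fromisoformat(last_sync_iso)  # offset-aware datetime
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_sync > max_age


last = "2026-01-11T10:00:46.560484+00:00"  # "Last sync" from the status summary
check_time = datetime(2026, 1, 12, 12, 0, tzinfo=timezone.utc)
print(needs_resync(last, now=check_time))  # True (about 26 hours later)
```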

**💡 Note**: For export data (streaming history, search queries, etc.), use the sync script: `src/scripts/automation/sync.py`