# 🔄 Spotify Library Sync

This notebook downloads your Spotify library and saves it locally for offline analysis.

**What it does:**
- ✅ Fetches all your playlists (owned only)
- ✅ Fetches your Liked Songs (❤️ master playlist)
- ✅ Downloads track and artist metadata
- ✅ Saves everything to `../data/` as parquet files
- ✅ Incremental updates (only fetches changes)

**Run this first!** Then use `02_analyze_library.ipynb` for analysis.

**💡 Tip:** For automated daily syncs, use `scripts/sync.py` instead (configured via cron job). See `README.md` for details.

## 1️⃣ Setup

Install dependencies and configure credentials.

In [34]:
# Install dependencies (run once)
%pip install -q pandas spotipy pyarrow tqdm python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [35]:
# Add project to path
import sys
from pathlib import Path

PROJECT_ROOT = Path("..").resolve()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print(f"✅ Project root: {PROJECT_ROOT}")

✅ Project root: /Users/aryamaan/Desktop/Projects/spotim8


In [36]:
import os
from dotenv import load_dotenv

# Load credentials from ../.env file
env_path = PROJECT_ROOT / ".env"
if env_path.exists():
    load_dotenv(env_path)
    print(f"✅ Loaded credentials from {env_path}")
else:
    print(f"⚠️  No .env file found at {env_path}")
    print("   Create one with SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET, SPOTIPY_REDIRECT_URI")

# Verify credentials are set
client_id = os.environ.get("SPOTIPY_CLIENT_ID", "")
if client_id and client_id != "YOUR_CLIENT_ID":
    print(f"   Client ID: {client_id[:8]}...")
else:
    print("   ❌ SPOTIPY_CLIENT_ID not set!")

✅ Loaded credentials from /Users/aryamaan/Desktop/Projects/spotim8/.env
   Client ID: 8263fcc5...
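
The cell above only verifies `SPOTIPY_CLIENT_ID`. As a minimal sketch, the same check could be extended to all three variables named in the warning message. The helper name `missing_credentials` and the rule that placeholder values start with `YOUR_` are assumptions made for illustration, not part of spotim8 or spotipy.

```python
import os

# The three variables named in the warning message above.
REQUIRED_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")


def missing_credentials(env=None):
    """Return the required variable names that are unset, empty, or placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # Assumption: template values look like "YOUR_CLIENT_ID", as in the check above.
        if not value or value.startswith("YOUR_"):
            missing.append(name)
    return missing


missing = missing_credentials()
if missing:
    print("Missing credentials: " + ", ".join(missing))
else:
    print("All Spotify credentials are set")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without touching the real environment.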


## 2️⃣ Connect to Spotify

This will open a browser window for authentication on first run.

In [37]:
from spotim8 import Spotim8, set_response_cache
from spotim8.catalog import CacheConfig

# Data directory (stores downloaded data)
DATA_DIR = PROJECT_ROOT / "data"
DATA_DIR.mkdir(exist_ok=True)

# Enable API response caching to avoid rate limits
# Cached responses are reused for 1 hour
API_CACHE_DIR = DATA_DIR / ".api_cache"
set_response_cache(API_CACHE_DIR, ttl=3600)

# Initialize client with caching
sf = Spotim8.from_env(
    progress=True,
    cache=CacheConfig(dir=DATA_DIR)
)

print("✅ Connected to Spotify!")
print(f"📁 Data will be saved to: {DATA_DIR}")

📦 API response cache enabled: /Users/aryamaan/Desktop/Projects/spotim8/data/.api_cache (TTL: 3600s)
✅ Connected to Spotify!
📁 Data will be saved to: /Users/aryamaan/Desktop/Projects/spotim8/data
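
To illustrate what a response cache with a one-hour TTL does, here is a minimal sketch of a file-based cache. It is not how `set_response_cache` is implemented, which this notebook does not show; the class `TTLFileCache`, its methods, and the JSON file layout are all hypothetical.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


class TTLFileCache:
    """Hypothetical file cache: one JSON file per key, ignored after `ttl` seconds."""

    def __init__(self, cache_dir, ttl=3600):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.ttl = ttl

    def _path(self, key):
        # Hash the key so arbitrary request strings map to safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def set(self, key, value, now=None):
        """Store a JSON-serializable value together with its storage time."""
        now = time.time() if now is None else now
        entry = {"stored_at": now, "value": value}
        self._path(key).write_text(json.dumps(entry), encoding="utf-8")

    def get(self, key, now=None):
        """Return the cached value, or None if it is missing or older than the TTL."""
        path = self._path(key)
        if not path.exists():
            return None
        entry = json.loads(path.read_text(encoding="utf-8"))
        now = time.time() if now is None else now
        if now - entry["stored_at"] > self.ttl:
            return None
        return entry["value"]


# Usage with explicit timestamps so the behaviour is deterministic.
cache = TTLFileCache(tempfile.mkdtemp(), ttl=3600)
cache.set("GET /v1/me/playlists", {"total": 3}, now=0)
fresh = cache.get("GET /v1/me/playlists", now=1800)   # within the TTL
stale = cache.get("GET /v1/me/playlists", now=7200)   # past the TTL
print(fresh, stale)
```

Within the TTL a repeated request is answered from disk, which is what keeps repeated notebook runs from hitting the API rate limits.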


## 3️⃣ Sync Your Library

This fetches your playlists and tracks. First run may take a few minutes.

In [38]:
# Sync library (incremental - only fetches changes)
stats = sf.sync(
    owned_only=True,           # Only your playlists, not followed ones
    include_liked_songs=True   # Include Liked Songs as master playlist
)

print("\n📊 Sync complete!")

🔄 Starting library sync...
✅ All playlists up to date!
✅ Sync complete! Checked 202 playlists, updated 0, added 0 track entries

📊 Sync complete!
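
One common way to implement an incremental sync is to compare each playlist's `snapshot_id`, a field the Spotify Web API returns on playlist objects and changes whenever the playlist is modified. The sketch below shows that idea only. Whether `sf.sync()` works this way is an assumption, and the function name `playlists_needing_refresh` and the sample IDs are made up for illustration.

```python
def playlists_needing_refresh(cached_snapshots, remote_playlists):
    """Return IDs of playlists that are new or whose snapshot_id has changed.

    cached_snapshots: dict mapping playlist ID to the snapshot_id seen at the last sync.
    remote_playlists: list of dicts with "id" and "snapshot_id" keys from the API.
    """
    stale = []
    for playlist in remote_playlists:
        playlist_id = playlist["id"]
        # A missing entry (new playlist) or a different snapshot both need a refetch.
        if cached_snapshots.get(playlist_id) != playlist["snapshot_id"]:
            stale.append(playlist_id)
    return stale


# Illustrative data: p1 is unchanged, p2 was edited, p3 is new.
cached = {"p1": "snapA", "p2": "snapB"}
remote = [
    {"id": "p1", "snapshot_id": "snapA"},
    {"id": "p2", "snapshot_id": "snapC"},
    {"id": "p3", "snapshot_id": "snapD"},
]
to_refresh = playlists_needing_refresh(cached, remote)
print(to_refresh)  # ['p2', 'p3']
```

When nothing has changed the list is empty, which corresponds to the "updated 0" case in the output above.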


## 4️⃣ Build Full Data Tables

Now build the detailed tables (tracks, artists, etc.). The first cell is optional: it syncs data from Spotify export folders (Account Data, Extended History, Technical Logs) and does nothing if none of those folders exist.


In [None]:
# Sync export data from Spotify export folders
from spotim8.streaming_history import sync_all_export_data
from pathlib import Path

PROJECT_ROOT = Path("..").resolve()
account_data_dir = PROJECT_ROOT / "Spotify Account Data"
extended_history_dir = PROJECT_ROOT / "Spotify Extended Streaming History"
technical_log_dir = PROJECT_ROOT / "Spotify Technical Log Information"

if any([account_data_dir.exists(), extended_history_dir.exists(), technical_log_dir.exists()]):
    print("🔄 Syncing export data...")
    # Path("/tmp") is passed as a placeholder for any export folder that is missing
    results = sync_all_export_data(
        account_data_dir=account_data_dir if account_data_dir.exists() else Path("/tmp"),
        extended_history_dir=extended_history_dir if extended_history_dir.exists() else Path("/tmp"),
        technical_log_dir=technical_log_dir if technical_log_dir.exists() else Path("/tmp"),
        data_dir=DATA_DIR,
        force=False
    )
    
    print("\n✅ Export data sync complete!")
else:
    print("ℹ️  No export folders found")
    print("   To enable export data sync, place folders in project root:")
    print("   - Spotify Account Data/")
    print("   - Spotify Extended Streaming History/")
    print("   - Spotify Technical Log Information/")


In [39]:
# Fetch all data tables (uses cache if available)
print("📥 Building data tables...\n")

playlists = sf.playlists()
print(f"✅ Playlists: {len(playlists):,}")

playlist_tracks = sf.playlist_tracks()
print(f"✅ Playlist-track links: {len(playlist_tracks):,}")

tracks = sf.tracks()
print(f"✅ Unique tracks: {len(tracks):,}")

track_artists = sf.track_artists()
print(f"✅ Track-artist links: {len(track_artists):,}")

artists = sf.artists()
print(f"✅ Artists: {len(artists):,}")

# Build the wide table (everything joined)
library = sf.library_wide()
print(f"✅ Library wide table: {len(library):,} rows")

📥 Building data tables...

✅ Playlists: 633
✅ Playlist-track links: 61,414
✅ Unique tracks: 5,367
✅ Track-artist links: 8,655
✅ Artists: 2,674
✅ Library wide table: 61,519 rows
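
To make "everything joined" concrete, the sketch below joins a link table to its two lookup tables with pandas. It only illustrates the general pattern. The column names (`playlist_id`, `track_id`, `playlist_name`, `track_name`) and the toy rows are assumptions, and `sf.library_wide()` may join different columns or more tables.

```python
import pandas as pd

# Toy stand-ins for the real tables; names are prefixed with demo_ so they
# do not overwrite the notebook's own variables.
demo_playlists = pd.DataFrame(
    {"playlist_id": ["p1", "p2"], "playlist_name": ["Jazz", "Metal"]}
)
demo_tracks = pd.DataFrame(
    {"track_id": ["t1", "t2"], "track_name": ["Casio", "Electric Feel"]}
)
demo_playlist_tracks = pd.DataFrame(
    {"playlist_id": ["p1", "p1", "p2"], "track_id": ["t1", "t2", "t2"]}
)

# Start from the link table so there is one row per playlist-track pair,
# then attach playlist and track attributes with left joins.
demo_wide = (
    demo_playlist_tracks
    .merge(demo_playlists, on="playlist_id", how="left")
    .merge(demo_tracks, on="track_id", how="left")
)
print(demo_wide)
```

Because the join starts from the link table, a track that appears in several playlists appears in several rows of the wide table.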


## 5️⃣ View Your Data

In [40]:
# Show status summary
sf.print_status()


        SPOTIM8 DATA STATUS
📁 Cache directory: /Users/aryamaan/Desktop/Projects/spotim8/data
👤 User: 31iol2qamank24owygxo7kpq533y
🕐 Last sync: 2026-01-04T01:21:10.690751+00:00

📊 Cached data:
   • Playlists: 633
   • Playlist tracks: 61,414
   • Unique tracks: 5,367
   • Track-artist links: 8,655
   • Artists: 2,674



In [41]:
# Preview playlists
print("📂 Your Playlists:")
playlists[["name", "track_count"]].head(15)

📂 Your Playlists:


| | name | track_count |
|---|---|---|
| 0 | ❤️ Liked Songs | 5209 |
| 1 | Jan26 | 44 |
| 2 | AJamBlues | 33 |
| 3 | AJamClassical | 37 |
| 4 | AJamCountry/Folk | 41 |
| 5 | AJamMetal | 52 |
| 6 | AJamJazz | 259 |
| 7 | AJamWorld | 314 |
| 8 | AJFindsOther24 | 245 |
| 9 | AJFindsDance24 | 43 |


In [42]:
# Preview tracks
print("🎵 Sample Tracks:")
tracks[["name", "album_name", "popularity", "duration_ms"]].head(10)

🎵 Sample Tracks:


| | name | album_name | popularity | duration_ms |
|---|---|---|---|---|
| 0 | Night Drive | Night Drive | 64 | 220000 |
| 1 | don't u know? | Missing in Action (The Return) | 57 | 176004 |
| 2 | High No More | High No More | 73 | 198845 |
| 3 | Back On 74 | Volcano | 80 | 209482 |
| 4 | Afraid To Feel | Afraid To Feel | 77 | 177524 |
| 5 | DARE (feat. Shaun Ryder & Roses Gabor) | Demon Days | 82 | 244999 |
| 6 | Casio | For Ever | 69 | 234369 |
| 7 | Electric Feel | Oracular Spectacular | 79 | 229640 |
| 8 | Move Your Feet | D-D-Don't Don't Stop The Beat | 71 | 181826 |
| 9 | What I Might Do | What I Might Do (Radio Edit) | 52 | 195737 |


In [43]:
# Preview artists
print("🎤 Top Artists (by followers):")
artists.nlargest(10, "followers")[["name", "genres", "popularity", "followers"]]

🎤 Top Artists (by followers):


| | name | genres | popularity | followers |
|---|---|---|---|---|
| 1144 | Arijit Singh | [hindi pop, bollywood, desi, bangla pop] | 93 | 169787475 |
| 226 | Taylor Swift | [] | 100 | 149145451 |
| 759 | Ed Sheeran | [soft pop] | 90 | 124121799 |
| 95 | Billie Eilish | [] | 93 | 121491242 |
| 550 | The Weeknd | [] | 96 | 116486101 |
| 178 | Ariana Grande | [pop] | 95 | 108828987 |
| 656 | Eminem | [rap, hip hop] | 91 | 106354657 |
| 1070 | Bad Bunny | [reggaeton, trap latino, urbano latino, latin] | 99 | 105927613 |
| 44 | Drake | [rap] | 98 | 105568311 |
| 756 | Justin Bieber | [] | 94 | 86161562 |


## 6️⃣ Check Saved Files

In [None]:
# List saved files
print(f"📁 Files in {DATA_DIR}:\n")

# Files written by the export-data sync; every other parquet file comes from the API sync
EXPORT_PARQUET = {
    'streaming_history.parquet', 'search_queries.parquet',
    'follow_data.parquet', 'library_snapshot.parquet',
    'playback_errors.parquet', 'playback_retries.parquet',
    'webapi_events.parquet',
}
EXPORT_FILES = EXPORT_PARQUET | {'wrapped_data.json'}

# Library data (from API sync)
print("📚 Library Data (from API):")
for f in sorted(DATA_DIR.glob("*.parquet")):
    if f.name not in EXPORT_PARQUET:
        size_kb = f.stat().st_size / 1024
        print(f"   {f.name:30} {size_kb:>8.1f} KB")

# Export data (from Spotify exports)
print("\n📥 Export Data (from Spotify exports):")
export_files = []
for pattern in ['*.parquet', '*.json']:
    for f in sorted(DATA_DIR.glob(pattern)):
        if f.name in EXPORT_FILES:
            export_files.append(f)

if export_files:
    for f in export_files:
        size_kb = f.stat().st_size / 1024
        print(f"   {f.name:30} {size_kb:>8.1f} KB")
else:
    print("   ⚠️  No export data found")
    print("   Run sync to load export data from:")
    print("   - Spotify Account Data/")
    print("   - Spotify Extended Streaming History/")
    print("   - Spotify Technical Log Information/")

📁 Files in /Users/aryamaan/Desktop/Projects/spotim8/data:

   artists.parquet                   202.5 KB
   library_wide.parquet             2369.0 KB
   playlist_tracks.parquet           722.6 KB
   playlists.parquet                 112.8 KB
   track_artists.parquet             221.3 KB
   tracks.parquet                    622.6 KB


---

## ✅ Done!

Your library is now saved locally. Next steps:

1. **Analyze**: Open `02_analyze_library.ipynb` for visualizations
2. **Playlist Analysis**: Open `03_playlist_analysis.ipynb` for genre clustering
3. **Listening History**: Open `04_analyze_listening_history.ipynb` to analyze your actual listening patterns
4. **Create Playlists**: Open `05_liked_songs_monthly_playlists.ipynb` to create automated playlists
5. **Find Redundancy**: Open `06_identify_redundant_playlists.ipynb` to clean up your library

**Re-sync**: Run this notebook again anytime to fetch new changes. The data is cached, so future runs are fast!