# Spotify Playlist Sorter (with songdata.io)

This notebook implements a `SpotifyPlaylistSorter` class to sort music playlists based on:
- Harmonic mixing (Camelot keys)
- BPM (Tempo)
- Energy levels

**How it works:**
- Uses **songdata.io** for track analysis data (Key, BPM, Energy) instead of local audio analysis.
- Scrapes data from songdata.io using BeautifulSoup.
- The user **selects the starting (anchor) track** manually.

## Prerequisites
1.  Run `spotify_auth.ipynb` first to set up authentication and get playlist ids.
2.  Have your playlist ID ready.

## Features
- Loads playlist metadata from Spotify.
- **Scrapes track data** from songdata.io to get Key, BPM, and Energy.
- Builds a Camelot wheel neighbor map.
- Allows the user to select the starting anchor track.
- Sorts the playlist by finding the best transitions based on Key, BPM, and Energy.
- Compares the original and sorted order.
- Updates the playlist order on Spotify (with confirmation).

In [5]:
import os
import base64
import json
import requests
from dotenv import load_dotenv
import spotipy
import pandas as pd
import numpy as np
from typing import List, Dict, Optional
import time
import logging
from bs4 import BeautifulSoup

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

# Load environment variables
load_dotenv()

# Get Spotify credentials
CLIENT_ID = os.getenv('SPOTIFY_CLIENT_ID')
CLIENT_SECRET = os.getenv('SPOTIFY_CLIENT_SECRET')
REDIRECT_URI = 'http://127.0.0.1:8888/callback' # Or your configured URI

# Set up authentication scope
SCOPE = "playlist-modify-public playlist-modify-private playlist-read-private playlist-read-collaborative"

# Try to load cached token if it exists
try:
    with open('.spotify_cache', 'r') as f:
        token_info = json.load(f)
        access_token = token_info['access_token']
        logging.info("Using cached token")
        # Initialize Spotify client with cached token
        sp = spotipy.Spotify(auth=access_token)

        # Test if token is still valid
        try:
            sp.current_user()
            logging.info("Token is valid")
        except spotipy.exceptions.SpotifyException as e:
            if e.http_status == 401: # Token expired
                 logging.warning("Cached token expired, attempting refresh...")
                 # If refresh token exists, try to refresh
                 if 'refresh_token' in token_info:
                     refresh_token = token_info['refresh_token']
                     # Refresh the token
                     token_url = 'https://accounts.spotify.com/api/token'
                     auth_header = base64.b64encode(f"{CLIENT_ID}:{CLIENT_SECRET}".encode()).decode()

                     refresh_data = {
                         'grant_type': 'refresh_token',
                         'refresh_token': refresh_token
                     }

                     headers = {
                         'Authorization': f'Basic {auth_header}',
                         'Content-Type': 'application/x-www-form-urlencoded'
                     }

                     refresh_response = requests.post(token_url, data=refresh_data, headers=headers)

                     if refresh_response.status_code == 200:
                         new_token_info = refresh_response.json()
                         # Preserve the refresh token if not returned by Spotify
                         if 'refresh_token' not in new_token_info:
                             new_token_info['refresh_token'] = refresh_token

                         # Update cache file
                         with open('.spotify_cache', 'w') as f_write:
                             json.dump(new_token_info, f_write)

                         access_token = new_token_info['access_token']
                         sp = spotipy.Spotify(auth=access_token)
                         logging.info("Token refreshed successfully")
                     else:
                         logging.error(f"Failed to refresh token ({refresh_response.status_code}): {refresh_response.text}")
                         raise Exception("Failed to refresh token, need to re-authenticate")
                 else:
                     raise Exception("Cached token expired and no refresh token available, need to re-authenticate")
            else: # Other Spotify API error
                 raise e # Re-raise other errors

except (FileNotFoundError):
    logging.error("Authentication required: .spotify_cache file not found.")
    logging.error("Please run spotify_auth.ipynb first to authenticate with Spotify.")
    raise Exception("Authentication required. Run spotify_auth.ipynb first.")
except Exception as e:
    logging.error(f"An error occurred during authentication setup: {str(e)}")
    raise Exception(f"Authentication failed: {str(e)}")

INFO: Using cached token
INFO: Token is valid


In [6]:
class SpotifyPlaylistSorter:
    def __init__(self, playlist_id: str):
        self.playlist_id = playlist_id
        self.sp = sp # Assumes 'sp' is the authenticated spotipy client
        self.tracks_data = None
        self.camelot_map = self._build_camelot_map()
        self.playlist_name = None
        self.original_track_order = None

    def _build_camelot_map(self) -> Dict[str, List[str]]:
        """Build a map of compatible Camelot keys."""
        camelot_map = {}
        numbers = range(1, 13)
        letters = ['A', 'B']

        for num in numbers:
            for letter in letters:
                key = f"{num}{letter}"
                neighbors = []

                # Same number, different letter (switching between minor/major)
                other_letter = 'B' if letter == 'A' else 'A'
                neighbors.append(f"{num}{other_letter}")

                # Same letter, adjacent numbers
                prev_num = 12 if num == 1 else num - 1
                next_num = 1 if num == 12 else num + 1
                neighbors.extend([f"{prev_num}{letter}", f"{next_num}{letter}"])

                camelot_map[key] = neighbors

        return camelot_map

    def _scrape_songdata_io(self) -> Optional[pd.DataFrame]:
        """Scrape track data from songdata.io for the playlist."""
        url = f"https://songdata.io/playlist/{self.playlist_id}"
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        }
        logging.info(f"Attempting to scrape data from: {url}")

        try:
            response = requests.get(url, headers=headers, timeout=30)
            response.raise_for_status()
        except requests.exceptions.RequestException as e:
            logging.error(f"Failed to fetch data from songdata.io: {e}")
            return None

        soup = BeautifulSoup(response.content, 'html.parser')

        # Find the table - adjust selector if songdata.io changes structure
        table = soup.find('table', id='table_chart')
        if not table:
            logging.error("Could not find the track table (id='table_chart') on the page.")
            logging.error("The website structure might have changed.")
            # Try finding by class as a fallback
            table = soup.find('table', class_='table')
            if not table:
                 logging.error("Could not find the track table by class either.")
                 return None
            else:
                 logging.warning("Found table using class='table' as fallback.")

        table_body = table.find('tbody', id='table_body')
        if not table_body:
            # Fallback if tbody doesn't have the specific ID
            table_body = table.find('tbody')
            if not table_body:
                logging.error("Could not find the table body (tbody) within the table.")
                return None
            else:
                logging.warning("Found tbody without specific ID.")

        tracks = []
        rows = table_body.find_all('tr', class_='table_object')

        if not rows:
             logging.error("Found table body, but no rows with class='table_object'.")
             return None

        logging.info(f"Found {len(rows)} potential track rows in the table.")

        for row in rows:
            try:
                # Extract data based on common class names (inspect songdata.io for accuracy)
                track_name_tag = row.find('td', class_='table_name')
                track_name = track_name_tag.find('a').text.strip() if track_name_tag and track_name_tag.find('a') else None

                artist_tag = row.find('td', class_='table_artist')
                artist = artist_tag.text.strip() if artist_tag else None

                key_tag = row.find('td', class_='table_key')
                key = key_tag.text.strip() if key_tag else None

                camelot_tag = row.find('td', class_='table_camelot')
                camelot = camelot_tag.text.strip() if camelot_tag else None

                bpm_tag = row.find('td', class_='table_bpm')
                bpm = bpm_tag.text.strip() if bpm_tag else None

                energy_tag = row.find('td', class_='table_energy')
                energy = energy_tag.text.strip() if energy_tag else None

                # Popularity is often in 'table_data' but might need specific identification
                all_data_tags = row.find_all('td', class_='table_data')
                popularity = None
                if len(all_data_tags) > 5:
                    release_date_tag = row.find('td', class_='table_data', string=lambda t: t and ('-' in t or '/' in t))
                    if release_date_tag:
                        prev_sibling = release_date_tag.find_previous_sibling('td', class_='table_data')
                        if prev_sibling:
                            popularity = prev_sibling.text.strip()

                # Spotify ID is usually in a data-src attribute
                spotify_link_cell = row.find('td', id='spotify_obj')
                spotify_id = spotify_link_cell['data-src'].strip() if spotify_link_cell and 'data-src' in spotify_link_cell.attrs else None

                if not all([track_name, artist, camelot, bpm, energy, spotify_id]):
                     logging.warning(f"Skipping row due to missing essential data (Name, Artist, Camelot, BPM, Energy, ID): {track_name}, {artist}")
                     continue

                tracks.append({
                    'id': spotify_id,
                    'Track': track_name,
                    'Artist': artist,
                    'Key': key,
                    'Camelot': camelot,
                    'BPM': bpm,
                    'Energy': energy,
                    'Popularity': popularity
                })
            except Exception as e:
                logging.warning(f"Error parsing a row: {e}. Row content: {row.text[:100]}...")
                continue

        if not tracks:
            logging.error("No tracks successfully parsed from the table.")
            return None

        df = pd.DataFrame(tracks)

        # --- Data Cleaning and Type Conversion ---
        try:
            # Convert relevant columns to numeric, coercing errors to NaN
            df['BPM'] = pd.to_numeric(df['BPM'], errors='coerce')
            # Energy from songdata.io might be 1-10 scale or 0-1. Let's assume 0-1 for now.
            raw_energy = pd.to_numeric(df['Energy'], errors='coerce')
            if raw_energy.max() > 1.0:
                 logging.warning("Detected Energy values > 1. Assuming 1-10 scale and normalizing to 0-1.")
                 df['Energy'] = raw_energy / 10.0
            else:
                 df['Energy'] = raw_energy

            df['Popularity'] = pd.to_numeric(df['Popularity'], errors='coerce')

            # Validate Camelot format (e.g., '1A', '12B')
            df['Camelot'] = df['Camelot'].str.upper()
            valid_camelot_mask = df['Camelot'].str.match(r'^[1-9]A$|^1[0-2]A$|^[1-9]B$|^1[0-2]B$', na=False)
            invalid_camelot = df[~valid_camelot_mask]['Camelot'].unique()
            if len(invalid_camelot) > 0:
                logging.warning(f"Found potentially invalid Camelot keys: {invalid_camelot}. Replacing with NaN.")
                df.loc[~valid_camelot_mask, 'Camelot'] = np.nan

        except Exception as e:
            logging.error(f"Error during data type conversion: {e}")

        logging.info(f"Successfully scraped and parsed {len(df)} tracks.")
        return df

    def load_playlist(self):
        """Load playlist name from Spotify and track data by scraping songdata.io."""
        logging.info(f"Loading playlist metadata for: {self.playlist_id}")
        try:
            # Get playlist name from Spotify (more reliable than scraping)
            playlist_info = self.sp.playlist(self.playlist_id, fields="name")
            self.playlist_name = playlist_info['name']
            logging.info(f"Playlist Name (from Spotify): '{self.playlist_name}'")
        except Exception as e:
            logging.warning(f"Failed to get playlist name from Spotify: {e}. Will proceed without it.")
            self.playlist_name = f"Playlist {self.playlist_id}"

        # Scrape track data from songdata.io
        scraped_data = self._scrape_songdata_io()

        if scraped_data is None or scraped_data.empty:
            logging.error("Failed to scrape data from songdata.io. Cannot proceed.")
            self.tracks_data = pd.DataFrame()
            self.original_track_order = []
            return None
        else:
            self.tracks_data = scraped_data
            # Store original order based on scraped table
            self.original_track_order = self.tracks_data['id'].tolist()
            logging.info(f"Using original track order based on songdata.io table ({len(self.original_track_order)} tracks).")

            # Ensure required columns exist even if scraping missed some
            for col in ['id', 'Track', 'Artist', 'Camelot', 'BPM', 'Energy', 'Popularity']:
                 if col not in self.tracks_data.columns:
                     self.tracks_data[col] = np.nan

            # Drop rows where essential sorting keys are missing AFTER scraping
            initial_count = len(self.tracks_data)
            self.tracks_data.dropna(subset=['id', 'Camelot', 'BPM', 'Energy'], inplace=True)
            dropped_count = initial_count - len(self.tracks_data)
            if dropped_count > 0:
                logging.warning(f"Dropped {dropped_count} tracks due to missing essential data (ID, Camelot, BPM, or Energy) after scraping.")

            if self.tracks_data.empty:
                 logging.error("No tracks remaining after dropping those with missing essential data.")
                 return None

        return self.tracks_data

    def calculate_transition_score(self, track1: pd.Series, track2: pd.Series) -> float:
        """Calculate transition score between two tracks using scraped data."""
        # --- Key Compatibility ---
        key1 = track1.get('Camelot')
        key2 = track2.get('Camelot')

        if pd.isna(key1) or pd.isna(key2) or key1 not in self.camelot_map:
            key_compatible = False
            key_multiplier = 0.5
            if key1 not in self.camelot_map and not pd.isna(key1):
                 logging.debug(f"Key {key1} not in camelot map for score calc.")
        else:
            key_compatible = key2 in self.camelot_map[key1]
            key_multiplier = 1.5 if key_compatible else 0.5

        # --- BPM Difference Score ---
        bpm1 = track1.get('BPM')
        bpm2 = track2.get('BPM')
        if pd.isna(bpm1) or pd.isna(bpm2):
            bpm_score = 0.0
            bpm_diff = float('inf')
        else:
            bpm_diff = abs(bpm1 - bpm2)
            bpm_score = max(0, 1 - (bpm_diff / 20.0))

        # --- Energy Transition Score ---
        energy1 = track1.get('Energy')
        energy2 = track2.get('Energy')
        if pd.isna(energy1) or pd.isna(energy2):
             energy_score = 0.5
        else:
            energy_diff = energy2 - energy1
            if energy_diff >= -0.1:
                energy_score = max(0, 1 - abs(energy_diff) * 0.5)
            else:
                energy_score = max(0, 1 - abs(energy_diff) * 1.5)

        # --- Combine Scores ---
        base_score = (bpm_score * 0.6) + (energy_score * 0.4)
        final_score = base_score * key_multiplier

        # --- Optional Bonuses ---
        if key_compatible and key1 == key2:
             final_score *= 1.1

        if bpm_diff <= 3:
             final_score *= 1.1

        return final_score

    def sort_playlist(self, start_track_id: str) -> List[str]:
        """Sort the playlist using transition scores, starting from anchor."""
        if self.tracks_data is None or self.tracks_data.empty:
            logging.error("Track data is not loaded or is empty. Cannot sort.")
            return []

        sortable_tracks = self.tracks_data.copy()

        if start_track_id not in sortable_tracks['id'].values:
             logging.error(f"Start track ID '{start_track_id}' not found in the loaded & filtered tracks.")
             if self.original_track_order and start_track_id in self.original_track_order:
                  logging.warning("Anchor track was present initially but filtered out due to missing data. Cannot use as anchor.")
             return []

        logging.info(f"Starting sort with anchor track ID: {start_track_id}")
        current_id = start_track_id
        sorted_ids = [current_id]
        remaining_tracks = sortable_tracks.set_index('id', drop=False)
        remaining_tracks = remaining_tracks.drop(current_id)

        while not remaining_tracks.empty:
            current_track_data = sortable_tracks[sortable_tracks['id'] == current_id]
            if current_track_data.empty:
                 logging.error(f"Could not find data for current track ID: {current_id}. Stopping sort.")
                 break
            current_track = current_track_data.iloc[0]

            scores = remaining_tracks.apply(
                lambda x: self.calculate_transition_score(current_track, x),
                axis=1
            )

            if scores.empty or scores.isna().all():
                logging.warning(f"Could not calculate valid scores from {current_track.get('Track', current_id)}. Stopping sort.")
                break

            next_track_idx = scores.idxmax()
            next_track = remaining_tracks.loc[next_track_idx]

            sorted_ids.append(next_track['id'])
            remaining_tracks = remaining_tracks.drop(next_track_idx)

            current_id = next_track['id']
            logging.debug(f"Added: {next_track.get('Track', current_id)} (Score: {scores.loc[next_track_idx]:.2f})")

        original_ids_set = set(self.original_track_order) if self.original_track_order else set()
        sorted_ids_set = set(sorted_ids)
        initial_sortable_ids = set(sortable_tracks['id'])
        missing_from_sort = list(initial_sortable_ids - sorted_ids_set)

        if missing_from_sort:
             logging.warning(f"Sort finished, but {len(missing_from_sort)} tracks that had data were not placed.")
             missing_tracks_ordered = [tid for tid in self.original_track_order if tid in missing_from_sort]
             logging.info(f"Appending {len(missing_tracks_ordered)} tracks that were not placed during sorting.")
             sorted_ids.extend(missing_tracks_ordered)
        elif len(sorted_ids) < len(initial_sortable_ids):
             logging.warning(f"Sorting ended with {len(sorted_ids)} tracks, but started with {len(initial_sortable_ids)} sortable tracks.")

        logging.info(f"Playlist sorting complete. Final track count: {len(sorted_ids)}")
        return sorted_ids

    def compare_playlists(self, sorted_ids: List[str]):
        """Compare original (scraped order) and sorted playlist."""
        if self.tracks_data is None or self.tracks_data.empty or not self.original_track_order:
            logging.error("Cannot compare playlists: Data not loaded or original order missing.")
            return pd.DataFrame(), pd.DataFrame()

        compare_df = self.tracks_data.copy()
        valid_original_ids = [tid for tid in self.original_track_order if tid in compare_df['id'].values]
        valid_sorted_ids = [tid for tid in sorted_ids if tid in compare_df['id'].values]

        if not valid_original_ids or not valid_sorted_ids:
             logging.error("No valid track data found for comparison after filtering.")
             return pd.DataFrame(), pd.DataFrame()

        original_df = compare_df.set_index('id').loc[valid_original_ids].reset_index()
        sorted_df = compare_df.set_index('id').loc[valid_sorted_ids].reset_index()

        original_df['Position'] = range(1, len(original_df) + 1)
        sorted_df['Position'] = range(1, len(sorted_df) + 1)

        position_map = {id_val: pos for pos, id_val in enumerate(valid_original_ids, 1)}
        sorted_df['Original Position'] = sorted_df['id'].map(position_map)
        sorted_df['Original Position'] = sorted_df['Original Position'].fillna('N/A')

        sorted_df['Position'] = pd.to_numeric(sorted_df['Position'], errors='coerce')
        sorted_df['Original Position'] = pd.to_numeric(sorted_df['Original Position'], errors='coerce')

        sorted_df['Position Change'] = sorted_df.apply(
            lambda row: row['Original Position'] - row['Position'] if pd.notna(row['Original Position']) and pd.notna(row['Position']) else np.nan,
            axis=1
        )

        return original_df, sorted_df

    def _get_track_uris(self, track_ids: List[str]) -> Dict[str, str]:
        """Fetch Spotify URIs for a list of track IDs."""
        uri_map = {}
        if not track_ids:
            return uri_map

        for i in range(0, len(track_ids), 50):
            batch_ids = track_ids[i:i+50]
            try:
                results = self.sp.tracks(tracks=batch_ids)
                for track in results['tracks']:
                    if track and track['id'] and track['uri']:
                        uri_map[track['id']] = track['uri']
                    elif track and track['id']:
                         logging.warning(f"Could not find URI for track ID: {track['id']}")
            except Exception as e:
                logging.error(f"Failed to fetch track details batch (starting index {i}): {e}")
            time.sleep(0.5)

        return uri_map

    def update_spotify_playlist(self, sorted_ids: List[str]):
        """Update the Spotify playlist with the new track order."""
        if not sorted_ids:
             logging.error("No sorted track IDs provided to update playlist.")
             return
        if self.tracks_data is None or self.tracks_data.empty:
             logging.error("No track data available to map IDs to URIs.")
             return

        logging.info(f"Fetching URIs for {len(sorted_ids)} sorted tracks...")
        uri_map = self._get_track_uris(sorted_ids)

        track_uris = [uri_map[track_id] for track_id in sorted_ids if track_id in uri_map]

        if not track_uris:
            logging.error("No valid track URIs could be fetched for the sorted IDs. Cannot update playlist.")
            return

        if len(track_uris) != len(sorted_ids):
             logging.warning(f"Could only find URIs for {len(track_uris)} out of {len(sorted_ids)} tracks. Playlist will be updated with available tracks.")

        logging.info(f"Updating Spotify playlist '{self.playlist_name}' with {len(track_uris)} tracks.")

        try:
            self.sp.playlist_replace_items(self.playlist_id, track_uris[:100])
            logging.info(f"Replaced/set first {min(len(track_uris), 100)} tracks.")

            for i in range(100, len(track_uris), 100):
                batch = track_uris[i:i+100]
                self.sp.playlist_add_items(self.playlist_id, batch)
                logging.info(f"Added batch of {len(batch)} tracks (starting index {i}).")
                time.sleep(1)

            logging.info(f"Successfully updated playlist '{self.playlist_name}' order on Spotify!")

        except Exception as e:
            logging.error(f"Failed to update Spotify playlist: {e}")
            logging.error("Check API permissions (scope), rate limits, and playlist ownership.")

    def print_transition_analysis(self, sorted_ids: List[str]):
        """Print analysis of the transitions in the sorted playlist."""
        if self.tracks_data is None or self.tracks_data.empty:
             logging.warning("No track data to analyze transitions.")
             return

        print("\nTransition Analysis:")
        print("-" * 80)

        if len(sorted_ids) < 2:
             print("Not enough tracks for transition analysis.")
             return

        track_map = self.tracks_data.set_index('id').to_dict('index')

        total_score = 0
        valid_transitions = 0

        for i in range(len(sorted_ids) - 1):
            track1_id = sorted_ids[i]
            track2_id = sorted_ids[i+1]

            track1_data = track_map.get(track1_id)
            track2_data = track_map.get(track2_id)

            if not track1_data or not track2_data:
                 print(f"\nSkipping transition {i+1}: Track data missing in final map for {track1_id or 'N/A'} or {track2_id or 'N/A'}")
                 continue

            track1 = pd.Series(track1_data, name=track1_id)
            track2 = pd.Series(track2_data, name=track2_id)

            has_essential_data = not pd.isna(track1.get('Camelot')) and not pd.isna(track1.get('BPM')) and not pd.isna(track1.get('Energy')) and \
                                 not pd.isna(track2.get('Camelot')) and not pd.isna(track2.get('BPM')) and not pd.isna(track2.get('Energy'))

            print(f"\n{i+1}. {track1.get('Track', track1_id)} → {track2.get('Track', track2_id)}")

            if has_essential_data:
                score = self.calculate_transition_score(track1, track2)
                key_compatible = False
                key1_camelot = track1.get('Camelot')
                key2_camelot = track2.get('Camelot')
                if key1_camelot and key1_camelot in self.camelot_map:
                     key_compatible = key2_camelot in self.camelot_map[key1_camelot]

                bpm1 = track1.get('BPM', np.nan)
                bpm2 = track2.get('BPM', np.nan)
                energy1 = track1.get('Energy', np.nan)
                energy2 = track2.get('Energy', np.nan)

                print(f"   Camelot: {key1_camelot or 'N/A'} → {key2_camelot or 'N/A'} ({'✓ Compatible' if key_compatible else '✗ Incompatible'}) ({'Perfect Match' if key1_camelot == key2_camelot and key1_camelot else ''})")
                print(f"   BPM:     {int(bpm1) if pd.notna(bpm1) else 'N/A'} → {int(bpm2) if pd.notna(bpm2) else 'N/A'} (Δ{abs(int(bpm1) - int(bpm2)) if pd.notna(bpm1) and pd.notna(bpm2) else 'N/A'})")
                energy1_str = f"{energy1:.2f}" if pd.notna(energy1) else 'N/A',
                energy2_str = f"{energy2:.2f}" if pd.notna(energy2) else 'N/A',
                energy_diff = f"{energy2 - energy1:.2f}" if pd.notna(energy1) and pd.notna(energy2) else 'N/A',
                print(f"   Energy:  {energy1_str} → {energy2_str} (Δ{energy_diff})")
                print(f"   Score:   {score:.2f}")
                total_score += score
                valid_transitions += 1
            else:
                 print(f"   Camelot: {track1.get('Camelot', 'N/A')} → {track2.get('Camelot', 'N/A')}")
                 print(f"   BPM:     {track1.get('BPM', 'N/A')} → {track2.get('BPM', 'N/A')}")
                 print(f"   Energy:  {track1.get('Energy', 'N/A')} → {track2.get('Energy', 'N/A')}")
                 print("   (Skipping score: Missing essential data for one or both tracks)")

        if valid_transitions > 0:
             average_score = total_score / valid_transitions
             print("-" * 80)
             print(f"Average Transition Score (for {valid_transitions} scored transitions): {average_score:.2f}")
        else:
             print("-" * 80)
             print("No valid transitions could be scored (check scraped data).")
        print("-" * 80)

In [7]:
# --- Configuration ---
# Enter your playlist ID here
PLAYLIST_ID = "5yivZ4ZtnHZv2lyTsnSt0d" # <--- Use your actual Playlist ID

# --- Create Sorter Instance ---
sorter = SpotifyPlaylistSorter(PLAYLIST_ID)

# --- Load Playlist Data (now uses scraping) ---
print("Loading playlist name from Spotify and scraping track data from songdata.io...")
print("\nDisclaimer: This process relies on scraping songdata.io.")
print("If the website structure changes, this script may fail to retrieve data.")

tracks_df = sorter.load_playlist()

if tracks_df is None or tracks_df.empty:
    print("\nFailed to load or scrape track data. Exiting.")
    print("Please check: Playlist ID, Spotify authentication, internet connection, and potential changes to songdata.io.")
    # exit() # Uncomment to stop execution on failure
else:
    print(f"\nLoaded data for {len(tracks_df)} tracks for playlist: {sorter.playlist_name}")
    print("\nSample tracks (with scraped features):")
    # Display relevant columns, handling potential missing ones
    display_cols = ['Track', 'Artist', 'Camelot', 'BPM', 'Energy', 'Popularity']
    display_df = tracks_df[display_cols].dropna(subset=display_cols)
    print(display_df.head(10))

INFO: Loading playlist metadata for: 5yivZ4ZtnHZv2lyTsnSt0d


Loading playlist name from Spotify and scraping track data from songdata.io...

Disclaimer: This process relies on scraping songdata.io.
If the website structure changes, this script may fail to retrieve data.


INFO: Playlist Name (from Spotify): 'Simply Bollywood (2000 - 2025)'
INFO: Attempting to scrape data from: https://songdata.io/playlist/5yivZ4ZtnHZv2lyTsnSt0d
INFO: Found 321 potential track rows in the table.
INFO: Successfully scraped and parsed 321 tracks.
INFO: Using original track order based on songdata.io table (321 tracks).



Loaded data for 321 tracks for playlist: Simply Bollywood (2000 - 2025)

Sample tracks (with scraped features):
                               Track  \
0     Javeda Zindagi-Tose Naina Lage   
1            Yaariyaan - Male Vocals   
2                        Zaroori Tha   
3  Ek Dil Ek Jaan (From "Padmaavat")   
4                             Sajdaa   
5                               Ravi   
6                        Jiyein Kyun   
7                      Shukran Allah   
8     Ghodey Pe Sawaar (From "Qala")   
9                      Labb Par Aaye   

                                              Artist Camelot  BPM  Energy  \
0                          Kshitij Tarey, Shilpa Rao     12A  130     0.6   
1                                       Mohan Kannan      9B  130     0.8   
2                               Rahat Fateh Ali Khan      1A  114     0.7   
3               Shivam Pathak, Sanjay Leela Bhansali     10B   98     0.7   
4  Shankar-Ehsaan-Loy, Rahat Fateh Ali Khan, Shan...      4B 

In [8]:
# --- Ask User for Anchor Track ---
print("\n" + "="*30 + " Select Anchor Track " + "="*30)
print("Please choose the first track for your sorted playlist.")
print("Available Tracks (from scraped data):")

# Display tracks with index for easier selection
if not tracks_df.empty:
    # Create a numbered list of tracks
    print("\nTrack List:")
    print("-" * 100)
    print(f"{'#':<4} {'Track':<50} {'Artist':<40} {'ID':<30}")
    print("-" * 100)
    for idx, row in tracks_df.iterrows():
        print(f"{idx+1:<4} {row['Track'][:47]:<50} {row['Artist'][:37]:<40} {row['id']}")
    print("-" * 100)
else:
    print("(No tracks available after loading/filtering)")
    exit() # Exit if no tracks are available to select from

anchor_track_id = None
while True:
    try:
        selection = input("\nEnter the track number (1-{}) or ID of the track you want to start with: ".format(len(tracks_df))).strip()
        
        # Check if user entered a number
        if selection.isdigit():
            track_num = int(selection)
            if 1 <= track_num <= len(tracks_df):
                anchor_track_id = tracks_df.iloc[track_num-1]['id']
                break
            else:
                print(f"Please enter a number between 1 and {len(tracks_df)}.")
        # Check if user entered a track ID
        elif selection in tracks_df['id'].values:
            anchor_track_id = selection
            break
        else:
            print("Invalid selection. Please enter either a track number or a valid track ID.")
    except ValueError:
        print("Please enter a valid number or track ID.")

selected_track = tracks_df[tracks_df['id'] == anchor_track_id].iloc[0]
print(f"\nSelected '{selected_track['Track']}' by {selected_track['Artist']} as the anchor track.")
print("="*80)

# --- Sort the Playlist ---
print("\nSorting playlist based on transitions...")
sorted_ids = sorter.sort_playlist(start_track_id=anchor_track_id)

if not sorted_ids:
     print("Sorting failed. Please check logs for errors.")
     # exit()
else:
    # --- Print Transition Analysis ---
    sorter.print_transition_analysis(sorted_ids)


Please choose the first track for your sorted playlist.
Available Tracks (from scraped data):

Track List:
----------------------------------------------------------------------------------------------------
#    Track                                              Artist                                   ID                            
----------------------------------------------------------------------------------------------------
1    Javeda Zindagi-Tose Naina Lage                     Kshitij Tarey, Shilpa Rao                04FnSzoogJD1iQbghug23K
2    Yaariyaan - Male Vocals                            Mohan Kannan                             0EiZCuVOP3E9MobSidssSC
3    Zaroori Tha                                        Rahat Fateh Ali Khan                     0JChw6k59cZxegh0SGceE1
4    Ek Dil Ek Jaan (From "Padmaavat")                  Shivam Pathak, Sanjay Leela Bhansali     0M9UxL5CebtmlijPMH6KfW
5    Sajdaa                                             Shankar-Ehsaan-Loy, Rahat 

INFO: Starting sort with anchor track ID: 5HXYyYih9EhgJzvldwtvRp



Selected 'Iktara' by Amit Trivedi, Kavita Seth, Amitabh Bhattacharya as the anchor track.

Sorting playlist based on transitions...


INFO: Playlist sorting complete. Final track count: 321



Transition Analysis:
--------------------------------------------------------------------------------

1. Iktara → Teri Deewani
   Camelot: 9B → 9A (✓ Compatible) ()
   BPM:     80 → 80 (Δ0)
   Energy:  ('0.50',) → ('0.60',) (Δ('0.10',))
   Score:   1.62

2. Teri Deewani → Nazm Nazm
   Camelot: 9A → 9B (✓ Compatible) ()
   BPM:     80 → 82 (Δ2)
   Energy:  ('0.60',) → ('0.60',) (Δ('0.00',))
   Score:   1.55

3. Nazm Nazm → Piya Aaye Na
   Camelot: 9B → 9A (✓ Compatible) ()
   BPM:     82 → 82 (Δ0)
   Energy:  ('0.60',) → ('0.60',) (Δ('0.00',))
   Score:   1.65

4. Piya Aaye Na → Sunn Raha Hai (Female)
   Camelot: 9A → 8A (✓ Compatible) ()
   BPM:     82 → 82 (Δ0)
   Energy:  ('0.60',) → ('0.60',) (Δ('0.00',))
   Score:   1.65

5. Sunn Raha Hai (Female) → Kuchh Toh Hua Hai
   Camelot: 8A → 9A (✓ Compatible) ()
   BPM:     82 → 83 (Δ1)
   Energy:  ('0.60',) → ('0.60',) (Δ('0.00',))
   Score:   1.60

6. Kuchh Toh Hua Hai → Dekho Na
   Camelot: 9A → 9B (✓ Compatible) ()
   BPM:     83 → 8

In [9]:
# --- Compare and Display Changes ---
print("\nComparing original (scraped order) and sorted playlists...")
original_df, sorted_df = sorter.compare_playlists(sorted_ids)

if not original_df.empty and not sorted_df.empty:
    # Display original playlist
    print(f"\n\nORIGINAL PLAYLIST ORDER (Scraped from songdata.io): {sorter.playlist_name}")
    print("-" * 90)
    cols_orig = ['Position', 'Track', 'Artist', 'Camelot', 'BPM', 'Energy']
    print(original_df[[c for c in cols_orig if c in original_df.columns]].to_string(index=False, max_colwidth=40))
    print("-" * 90)

    # Display sorted playlist with position changes
    print(f"\n\nPROPOSED SORTED PLAYLIST: {sorter.playlist_name}")
    print("-" * 100)
    cols_sorted = ['Position', 'Track', 'Artist', 'Camelot', 'BPM', 'Energy', 'Original Position', 'Position Change']
    # Format Position Change nicely
    if 'Position Change' in sorted_df.columns:
         sorted_df['Position Change Str'] = sorted_df['Position Change'].apply(
             lambda x: f"{'+' if pd.notna(x) and x > 0 else ''}{int(x)}" if pd.notna(x) else 'N/A'
         )
         # Ensure Original Position is displayed before the formatted change string
         cols_sorted_display = ['Position', 'Track', 'Artist', 'Camelot', 'BPM', 'Energy', 'Original Position', 'Position Change Str']
    else:
         cols_sorted_display = ['Position', 'Track', 'Artist', 'Camelot', 'BPM', 'Energy', 'Original Position']

    print(sorted_df[[c for c in cols_sorted_display if c in sorted_df.columns]].to_string(index=False, max_colwidth=40))
    print("-" * 100)

    # Highlight significant changes
    if 'Position Change' in sorted_df.columns and pd.api.types.is_numeric_dtype(sorted_df['Position Change']):
        big_movers = sorted_df.dropna(subset=['Position Change'])
        big_movers = big_movers[abs(big_movers['Position Change']) > 10] # Example threshold

        if not big_movers.empty:
            print("\n\nTracks with significant position changes (>10 positions):")
            print("-" * 80)
            for _, row in big_movers.iterrows():
                change = int(row['Position Change'])
                direction = "up" if change > 0 else "down"
                orig_pos = int(row['Original Position']) if pd.notna(row['Original Position']) else 'N/A'
                print(f"{row.get('Track', 'N/A')} by {row.get('Artist', 'N/A')}: Moved {direction} {abs(change)} positions (Original: {orig_pos}, New: {row.get('Position', 'N/A')})")
    else:
        print("\nCould not calculate significant position changes (Position Change data missing or non-numeric).")

else:
    print("\nCould not generate comparison dataframes.")


Comparing original (scraped order) and sorted playlists...


ORIGINAL PLAYLIST ORDER (Scraped from songdata.io): Simply Bollywood (2000 - 2025)
------------------------------------------------------------------------------------------
 Position                                    Track                                   Artist Camelot  BPM  Energy
        1           Javeda Zindagi-Tose Naina Lage                Kshitij Tarey, Shilpa Rao     12A  130     0.6
        2                  Yaariyaan - Male Vocals                             Mohan Kannan      9B  130     0.8
        3                              Zaroori Tha                     Rahat Fateh Ali Khan      1A  114     0.7
        4        Ek Dil Ek Jaan (From "Padmaavat")     Shivam Pathak, Sanjay Leela Bhansali     10B   98     0.7
        5                                   Sajdaa Shankar-Ehsaan-Loy, Rahat Fateh Ali K...      4B   80     0.7
        6                                     Ravi                               Sajja

In [10]:
# --- Confirmation and Update ---
print("\n" + "="*30 + " Update Spotify Playlist " + "="*30)
# Ask for user confirmation before updating the playlist
confirmation = input(f"\nReady to update '{sorter.playlist_name}' on Spotify with the sorted order? \nThis will REPLACE the current playlist content. (yes/no): ").strip().lower()

if confirmation in ['yes', 'y']:
    # Update the playlist on Spotify
    sorter.update_spotify_playlist(sorted_ids)
    # Confirmation message is now inside the update function on success/failure
else:
    print("\nUpdate canceled. Your Spotify playlist remains unchanged.")

# Handle case where sorting failed earlier
if 'sorted_ids' not in locals() or not sorted_ids:
     print("\nSkipping comparison and Spotify update: Playlist was not sorted successfully.")




INFO: Fetching URIs for 321 sorted tracks...
INFO: Updating Spotify playlist 'Simply Bollywood (2000 - 2025)' with 321 tracks.
INFO: Replaced/set first 100 tracks.
INFO: Added batch of 100 tracks (starting index 100).
INFO: Added batch of 100 tracks (starting index 200).
INFO: Added batch of 21 tracks (starting index 300).
INFO: Successfully updated playlist 'Simply Bollywood (2000 - 2025)' order on Spotify!
