# AI DJ: Sequential Playlist Generation with Intelligent Track Transitions

**Course:** CSE 158/258 - Web Mining and Recommender Systems

**Assignment:** 2

---

## Project Overview

This project implements an intelligent DJ system that generates sequential playlists by:
1. **Sequential Recommendation:** Predicting the next song given playlist history (FPMC)
2. **Transition Quality:** Assessing musical compatibility between consecutive tracks (XGBoost)
3. **Audio Generation:** Creating smooth crossfades based on learned transition quality (Spleeter)

**Dataset:** Spotify Million Playlist Dataset

**Key Models:**
- Factorized Personalized Markov Chains (FPMC)
- XGBoost Regression for transition quality
- Spleeter (pretrained) for audio source separation

In [None]:
# Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import json
from pathlib import Path
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

# Set random seed for reproducibility
np.random.seed(42)

---

# Section 1: Predictive Tasks and Evaluation

## 1.1 Task Definition

We formulate two complementary predictive tasks:

### Task 1A: Next Track Prediction
**Input:** Playlist history $s_1, s_2, \ldots, s_t$

**Output:** Next song $s_{t+1}$

**Objective:** Rank the actual next song $s_{t+1}$ above other candidate tracks (the ranking criterion that BPR optimizes)

### Task 1B: Transition Quality Regression
**Input:** Audio features of consecutive tracks $(s_i, s_j)$

**Output:** Smoothness score $Q(s_i, s_j) \in [0, 1]$

**Objective:** Learn a compatibility function that captures the transitions human playlist curators actually choose

## 1.2 Evaluation Metrics

### For Task 1A (Sequential Recommendation):
- **Hit@K:** Fraction of test cases where the true next song appears in top-K predictions
- **AUC:** Probability that the true next song is ranked above a randomly sampled non-next track (area under the ROC curve of the ranking)

### For Task 1B (Transition Quality):
- **MSE:** Mean Squared Error
- **MAE:** Mean Absolute Error
- **R²:** Coefficient of determination
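
A minimal sketch of these metrics, assuming each test case provides a score for every candidate track plus the index of the true next track. The AUC is the per-case fraction of negatives scored strictly below the true track (ties count as misses); the function names are our own.

```python
import numpy as np

def hit_at_k(scores: np.ndarray, true_idx: int, k: int) -> float:
    """1.0 if the true next track is among the k highest-scoring candidates."""
    top_k = np.argsort(-scores)[:k]
    return float(true_idx in top_k)

def ranking_auc(scores: np.ndarray, true_idx: int) -> float:
    """Fraction of negative candidates scored strictly below the true track."""
    negatives = np.delete(scores, true_idx)
    return float(np.mean(negatives < scores[true_idx]))

def regression_metrics(y_true, y_pred) -> dict:
    """MSE, MAE and R^2 for transition-quality predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "mse": float(np.mean(residuals ** 2)),
        "mae": float(np.mean(np.abs(residuals))),
        "r2": float(1.0 - np.sum(residuals ** 2) / ss_tot),
    }

# Toy example: 5 candidate tracks, the true next track has index 2
scores = np.array([0.1, 0.4, 0.9, 0.3, 0.2])
print(hit_at_k(scores, true_idx=2, k=1), ranking_auc(scores, true_idx=2))
print(regression_metrics([0.2, 0.5, 0.9], [0.25, 0.45, 0.8]))
```

Averaging `hit_at_k` and `ranking_auc` over all test transitions gives the reported Hit@K and AUC.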

## 1.3 Baselines

### Task 1A Baselines:
1. **Random:** Uniform random selection from catalog
2. **Popularity:** Always recommend most popular tracks
3. **First-Order Markov Chain:** $P(s_j \mid s_i)$ estimated from counts of consecutive track pairs
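
The popularity and first-order Markov baselines can be sketched with plain counters, assuming playlists are lists of track IDs. Ranking successors by raw count gives the same order as ranking by $P(s_j \mid s_i)$, since both share the denominator for a fixed $s_i$. Falling back to popularity for unseen previous tracks is our own choice, not something the plan specifies.

```python
from collections import Counter, defaultdict

def fit_popularity(playlists):
    """Count how often each track appears across training playlists."""
    return Counter(track for playlist in playlists for track in playlist)

def fit_markov(playlists):
    """Count consecutive (previous -> next) track pairs."""
    transitions = defaultdict(Counter)
    for playlist in playlists:
        for prev, nxt in zip(playlist, playlist[1:]):
            transitions[prev][nxt] += 1
    return transitions

def recommend_popular(popularity, k=10):
    """Popularity baseline: the same k most frequent tracks for every query."""
    return [track for track, _ in popularity.most_common(k)]

def recommend_markov(transitions, popularity, prev_track, k=10):
    """Top-k successors of prev_track by transition count; pads with popular
    tracks when prev_track has too few observed successors (e.g. cold start)."""
    ranked = [t for t, _ in transitions.get(prev_track, Counter()).most_common(k)]
    for track, _ in popularity.most_common():
        if len(ranked) >= k:
            break
        if track not in ranked:
            ranked.append(track)
    return ranked
```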

### Task 1B Baselines:
1. **Mean Baseline:** Predict average smoothness score
2. **Linear Regression:** Using all 13 audio transition features
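
Both regression baselines fit in a few lines of NumPy. This sketch assumes `X_train` is an `(n, 13)` array of transition features and `y_train` the smoothness targets; ordinary least squares via `np.linalg.lstsq` stands in for any linear-regression library.

```python
import numpy as np

def fit_mean_baseline(y_train):
    """Predict the training-set mean smoothness for every transition."""
    mean = float(np.mean(y_train))
    return lambda X: np.full(len(X), mean)

def fit_linear_baseline(X_train, y_train):
    """Ordinary least squares on the transition features, with an intercept."""
    X1 = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(X1, y_train, rcond=None)
    return lambda X: np.column_stack([np.ones(len(X)), X]) @ coef
```

Linear predictions are not bounded to $[0, 1]$; clipping them with `np.clip` before computing MSE is one simple option.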

## 1.4 Validity Assessment

We assess model validity through:
- **Train/Val/Test Split:** 70/15/15 at playlist level (no data leakage)
- **Statistical Significance Testing:** Paired t-tests for model comparisons
- **Qualitative Analysis:** Manual inspection of generated playlists
- **Feature Importance:** XGBoost feature importance to validate learned patterns
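
The playlist-level split and the paired comparison can be sketched as follows, assuming a list of playlist IDs and per-test-case metric values (e.g. per-transition Hit@10) for two models evaluated on the same cases. The hand-computed statistic matches `scipy.stats.ttest_rel`, which additionally returns the p-value.

```python
import numpy as np

def split_playlists(playlist_ids, seed=42, frac=(0.70, 0.15, 0.15)):
    """Assign whole playlists to train/val/test so no playlist spans two splits."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.asarray(playlist_ids))
    n_train = round(frac[0] * len(ids))
    n_val = round(frac[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

def paired_t_statistic(metric_a, metric_b):
    """t statistic of the mean per-case difference between two models
    scored on the same test cases (same value as scipy.stats.ttest_rel)."""
    d = np.asarray(metric_a, dtype=float) - np.asarray(metric_b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
```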

In [None]:
# TODO: Implement evaluation metrics
# Will be completed in Section 4

---

# Section 2: Exploratory Data Analysis

## 2.1 Dataset Context

**Source:** Spotify Million Playlist Dataset (AIcrowd Challenge)

**Collection Method:** User-generated playlists from the Spotify platform

**Size:** 1M playlists, ~2M unique tracks

**Our Sample:** 100K playlists with 5-50 tracks each
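
A sketch of how this sample could be drawn, assuming the standard MPD slice layout: files named `mpd.slice.<start>-<end>.json`, each a JSON object whose `playlists` list holds entries with a `pid` and a `tracks` list of objects carrying `track_uri`. The function name and defaults are hypothetical. It takes the first qualifying playlists in slice order, which is deterministic but not a uniform random sample.

```python
import json
from pathlib import Path

def load_playlists(data_dir, max_playlists=100_000, min_len=5, max_len=50):
    """Return {pid: [track_uri, ...]} for playlists with min_len..max_len tracks."""
    # Sort slices numerically by their start index, e.g. mpd.slice.1000-1999.json -> 1000
    paths = sorted(Path(data_dir).glob("mpd.slice.*.json"),
                   key=lambda p: int(p.stem.split(".")[2].split("-")[0]))
    playlists = {}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            slice_data = json.load(f)
        for pl in slice_data["playlists"]:
            tracks = [t["track_uri"] for t in pl["tracks"]]
            if min_len <= len(tracks) <= max_len:
                playlists[pl["pid"]] = tracks
                if len(playlists) >= max_playlists:
                    return playlists
    return playlists
```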

## 2.2 Data Loading and Preprocessing

In [None]:
# Load raw playlist data
# TODO: Implement data loading from JSON files

data_dir = Path('../data/raw')
print(f"Data directory: {data_dir}")
print("Ready to load Spotify Million Playlist Dataset")

## 2.3 Exploratory Analysis

We perform the following analyses to understand the data and motivate our modeling choices:

In [None]:
# Analysis 1: Basic Statistics
# TODO: Show playlist length distribution, unique tracks, etc.

In [None]:
# Analysis 2: BPM Transition Distribution
# TODO: Histogram of BPM differences between consecutive tracks (do curators favor small tempo changes?)

In [None]:
# Analysis 3: Key Transition Heatmap (Circle of Fifths)
# TODO: Show harmonic mixing patterns

In [None]:
# Analysis 4: Energy Flow Over Playlist Position
# TODO: Show typical playlist energy arcs

In [None]:
# Analysis 5: Cold Start Analysis
# TODO: Show distribution of song frequencies (motivation for content-based features)
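
For Analysis 5, a minimal sketch of the frequency summary, assuming `playlists` is an iterable of track lists (e.g. the values of a `{pid: tracks}` dict). `track_frequency_summary` and `rare_threshold` are hypothetical names. A large `rare_fraction` means many tracks have too little history for FPMC to learn good embeddings, which is the motivation for content-based features.

```python
from collections import Counter

def track_frequency_summary(playlists, rare_threshold=1):
    """Share of unique tracks that appear in at most `rare_threshold` playlists."""
    # Count each track at most once per playlist (playlist frequency)
    counts = Counter(track for tracks in playlists for track in set(tracks))
    n_rare = sum(1 for c in counts.values() if c <= rare_threshold)
    return {"unique_tracks": len(counts), "rare_fraction": n_rare / len(counts)}
```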

## 2.4 Spotify API Feature Enrichment

In [None]:
# Fetch audio features from Spotify API
# TODO: Implement Spotify API integration with caching
# Features: BPM, key, mode, energy, valence, danceability, acousticness, etc.
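
A possible caching layer, sketched around a hypothetical `fetch_batch(ids) -> list[dict | None]` callable that wraps whichever Spotify client is used and returns results in input order. The batch size of 100 reflects the documented per-request limit of the Web API audio-features endpoint, which takes bare track IDs rather than the `spotify:track:` URIs stored in the MPD. Spotify restricted this endpoint for newly registered apps in November 2024, which makes a persistent cache more valuable.

```python
import json
from pathlib import Path

def get_audio_features(track_ids, fetch_batch, cache_path="audio_features_cache.json",
                       batch_size=100):
    """Return {track_id: features}, calling fetch_batch only for uncached IDs.

    fetch_batch is a hypothetical callable: list of IDs -> list of feature
    dicts (or None for unknown tracks), in the same order as its input.
    """
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    missing = [tid for tid in dict.fromkeys(track_ids) if tid not in cache]
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        for tid, feats in zip(batch, fetch_batch(batch)):
            cache[tid] = feats  # None is cached too, so failed IDs are not refetched
        cache_file.write_text(json.dumps(cache))  # persist after every batch
    return {tid: cache[tid] for tid in track_ids}
```

Writing the cache after each batch means an interrupted run resumes without repeating completed requests.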

---

# Section 3: Modeling

## 3.1 Task Formulation

### Task 1A: Sequential Recommendation (FPMC)

We model playlist generation as a sequential recommendation problem using Factorized Personalized Markov Chains.

**Mathematical Formulation:**

$$\hat{y}_{u,i,j} = \langle V_u^U, V_i^I \rangle + \langle V_j^{LI}, V_i^{IL} \rangle$$

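Where:
- $\hat{y}_{u,i,j}$: Score for song $i$ following song $j$ in playlist $u$
- $V_u^U$: User (playlist) embedding
- $V_i^I$: Candidate next song embedding (playlist-song term)
- $V_j^{LI}$: Previous song embedding (as transition context)
- $V_i^{IL}$: Candidate next song embedding (transition term)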

**Optimization:** Bayesian Personalized Ranking (BPR)
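
Written out for playlists, the BPR criterion of Rendle et al. (2010) compares each observed transition with tracks that did not follow. In our notation, $\mathcal{D}$ is the set of observed triples $(u, j, i)$ (song $i$ follows song $j$ in playlist $u$), $i'$ ranges over tracks other than $i$, $\sigma$ is the logistic function, and $\lambda$ is an L2 penalty on $\Theta = \{V^U, V^I, V^{LI}, V^{IL}\}$. In practice the inner sum is approximated by sampling one $i'$ per SGD step.

$$\max_{\Theta} \sum_{(u,j,i) \in \mathcal{D}} \sum_{i' \neq i} \ln \sigma\left(\hat{y}_{u,i,j} - \hat{y}_{u,i',j}\right) - \lambda \lVert \Theta \rVert^2$$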

### Task 1B: Transition Quality (XGBoost Regression)

We learn a compatibility function $Q: (s_i, s_j) \rightarrow [0, 1]$ using gradient boosting.

**Input Features (13 dimensions):**
- BPM difference
- Key distance (circle of fifths)
- Energy difference
- Valence difference
- Danceability difference
- Acousticness difference
- Instrumentalness difference
- Loudness difference
- Speechiness difference
- Liveness difference
- Mode match (binary)
- Harmonic compatibility (binary)
- BPM ratio (for doubling/halving detection)

**Target:** Smoothness score computed from ground truth transitions
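
A sketch of the 13-feature vector, in the order listed above. Assumptions: each track is a dict with Spotify audio-feature field names (`tempo`, `key` as a pitch class 0-11 or -1 when undetected, `mode` with 1 for major, plus the eight continuous features). "Harmonic compatibility" is implemented here as a Camelot-wheel-style rule of our own choosing: same mode and key signatures at most one fifth apart, or relative major/minor. Missing keys become `NaN`, which XGBoost treats as missing values.

```python
import numpy as np

# Continuous features compared by absolute difference (Spotify field names)
DIFF_FEATURES = ["energy", "valence", "danceability", "acousticness",
                 "instrumentalness", "loudness", "speechiness", "liveness"]

def fifths_distance(pc_a, pc_b):
    """Steps between two pitch classes around the circle of fifths (0-6)."""
    d = abs((7 * pc_a) % 12 - (7 * pc_b) % 12)
    return min(d, 12 - d)

def transition_features(a, b):
    """13-dimensional feature vector for the transition from track a to track b."""
    bpm_diff = abs(a["tempo"] - b["tempo"])
    slow, fast = sorted([a["tempo"], b["tempo"]])
    bpm_ratio = fast / slow if slow > 0 else np.nan  # ~2.0 flags doubling/halving
    if a["key"] < 0 or b["key"] < 0:  # -1 means no key was detected
        key_dist, harmonic = np.nan, 0.0
    else:
        key_dist = fifths_distance(a["key"], b["key"])
        # Assumed Camelot-style rule: compare relative-major key signatures
        sig_a = a["key"] if a["mode"] == 1 else (a["key"] + 3) % 12
        sig_b = b["key"] if b["mode"] == 1 else (b["key"] + 3) % 12
        sig_dist = fifths_distance(sig_a, sig_b)
        same_mode = a["mode"] == b["mode"]
        harmonic = float((same_mode and sig_dist <= 1) or (not same_mode and sig_dist == 0))
    diffs = [abs(a[f] - b[f]) for f in DIFF_FEATURES]
    mode_match = float(a["mode"] == b["mode"])
    return np.array([bpm_diff, key_dist, *diffs, mode_match, harmonic, bpm_ratio], dtype=float)
```

For example, a 120 BPM track in C major into a 60 BPM track in A minor gives a key distance of 3, harmonic compatibility 1 (relative keys) and a BPM ratio of 2.0.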

## 3.2 Model Architecture and Implementation

### 3.2.1 Baseline Models

In [None]:
# TODO: Implement baseline models
# - Random baseline
# - Popularity baseline
# - First-order Markov Chain
# - Mean baseline (regression)
# - Linear regression (regression)

### 3.2.2 FPMC Model

In [None]:
# TODO: Implement FPMC from scratch
# LightFM offers BPR/WARP matrix factorization but no Markov transition term,
# so it can serve only as a non-sequential baseline, not as FPMC itself
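
Since LightFM has no transition term, here is a from-scratch sketch of FPMC trained with BPR-style SGD, using the score from Section 3.1. Assumptions: playlists are lists of integer track indices, playlist `u` is `playlists[u]`, one negative is sampled uniformly per observed transition, and `dim`, `lr`, `reg` and `epochs` are placeholder values, not tuned ones.

```python
import numpy as np

class FPMC:
    """Score <VU[u], VI[i]> + <VLI[j], VIL[i]> for next track i after track j in playlist u."""

    def __init__(self, n_playlists, n_tracks, dim=16, lr=0.05, reg=1e-3, seed=42):
        rng = np.random.default_rng(seed)
        self.VU = rng.normal(0, 0.1, (n_playlists, dim))  # V^U: playlist embeddings
        self.VI = rng.normal(0, 0.1, (n_tracks, dim))     # V^I: tracks, playlist term
        self.VLI = rng.normal(0, 0.1, (n_tracks, dim))    # V^{LI}: previous-track context
        self.VIL = rng.normal(0, 0.1, (n_tracks, dim))    # V^{IL}: tracks, transition term
        self.lr, self.reg, self.rng, self.n_tracks = lr, reg, rng, n_tracks

    def score(self, u, j, items):
        """Scores of candidate next track(s) `items` after track j in playlist u."""
        return self.VI[items] @ self.VU[u] + self.VIL[items] @ self.VLI[j]

    def sgd_step(self, u, j, i):
        """One BPR update: push observed next track i above a random negative."""
        neg = self.rng.integers(self.n_tracks)
        if neg == i:
            return
        diff = self.score(u, j, i) - self.score(u, j, neg)
        g = np.exp(-np.logaddexp(0.0, diff))  # sigma(-diff), gradient of ln sigma(diff)
        # Copy old values so every update uses pre-step parameters
        vu, vlj = self.VU[u].copy(), self.VLI[j].copy()
        vi_pos, vi_neg = self.VI[i].copy(), self.VI[neg].copy()
        vil_pos, vil_neg = self.VIL[i].copy(), self.VIL[neg].copy()
        lr, reg = self.lr, self.reg
        self.VU[u] += lr * (g * (vi_pos - vi_neg) - reg * vu)
        self.VI[i] += lr * (g * vu - reg * vi_pos)
        self.VI[neg] += lr * (-g * vu - reg * vi_neg)
        self.VLI[j] += lr * (g * (vil_pos - vil_neg) - reg * vlj)
        self.VIL[i] += lr * (g * vlj - reg * vil_pos)
        self.VIL[neg] += lr * (-g * vlj - reg * vil_neg)

    def fit(self, playlists, epochs=20):
        """Run SGD over all (playlist, previous track, next track) triples."""
        triples = [(u, j, i) for u, pl in enumerate(playlists) for j, i in zip(pl, pl[1:])]
        for _ in range(epochs):
            for idx in self.rng.permutation(len(triples)):
                self.sgd_step(*triples[idx])
        return self
```

At inference, `model.score(u, last_track, np.arange(n_tracks))` ranks every catalog track as the next song for playlist `u`.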

### 3.2.3 XGBoost Regression Model

In [None]:
# TODO: Implement XGBoost for transition quality prediction

### 3.2.4 Hybrid System

In [None]:
# TODO: Combine FPMC + XGBoost
# Score = alpha * P_seq + beta * Q_trans
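
One way to realize this scoring rule, sketched under stated assumptions: `P_seq` is a softmax of FPMC scores over the candidate set, `Q_trans` is the predicted quality of the transition from the current track to each candidate, and `beta = 1 - alpha` so only one weight needs tuning. `hybrid_rank` is a hypothetical name. Because softmax probabilities over a large catalog become tiny next to $Q \in [0, 1]$, this is best applied to a short FPMC shortlist (e.g. the top 100 candidates).

```python
import numpy as np

def hybrid_rank(fpmc_scores, q_trans, alpha=0.5, k=10):
    """Rank candidates by alpha * P_seq + (1 - alpha) * Q_trans.

    fpmc_scores: raw FPMC scores for each candidate next track.
    q_trans: predicted transition quality in [0, 1] for each candidate.
    Returns the indices of the top-k candidates and their hybrid scores.
    """
    z = np.asarray(fpmc_scores, dtype=float)
    p_seq = np.exp(z - z.max())  # subtract max for numerical stability
    p_seq /= p_seq.sum()         # softmax over the candidate set
    combined = alpha * p_seq + (1.0 - alpha) * np.asarray(q_trans, dtype=float)
    top = np.argsort(-combined)[:k]
    return top, combined[top]
```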

## 3.3 Advantages and Disadvantages

### FPMC:
**Advantages:**
- Factorizes first-order transitions, so it can score track pairs never observed consecutively (unlike a count-based Markov chain)
- Personalized to playlist style
- Efficient training with BPR

**Disadvantages:**
- Cold start problem for rare songs
- Requires sufficient playlist history
- Ignores audio features entirely (purely collaborative)

### XGBoost Regression:
**Advantages:**
- Leverages rich audio features
- Handles cold-start tracks, since it needs only audio features rather than interaction history
- Interpretable feature importance

**Disadvantages:**
- Scores only pairs of consecutive tracks; no playlist-level context
- Requires labeled transition quality scores
- Limited by the quality of the hand-engineered features

### Hybrid:
**Advantages:**
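- Combines collaborative (sequential) and content-based (audio) signals
- Can favor tracks that are both likely in context and smooth to transition into, and falls back on audio features for cold-start tracks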

**Disadvantages:**
- Additional hyperparameters (mixing weights alpha and beta)
- Increased complexity

---

# Section 4: Evaluation

## 4.1 Evaluation Protocol

We evaluate on a held-out test set of playlists (15% of data).

### Metrics Justification:

**Hit@K:** Appropriate for recommendation tasks where presenting a ranked list is realistic. We report K=5, 10, 20.

**AUC:** Measures ranking quality over all candidates without choosing a cutoff K.

**MSE/MAE:** Standard regression metrics for transition quality.

**R²:** Interpretable measure of explained variance.

## 4.2 Baseline Comparisons

In [None]:
# TODO: Evaluate all models on test set
# Create comparison tables

## 4.3 Statistical Significance Testing

In [None]:
# TODO: Paired t-tests between models

## 4.4 Feature Importance Analysis

In [None]:
# TODO: XGBoost feature importance plot

## 4.5 Qualitative Evaluation: Demo Playlists

In [None]:
# TODO: Generate example playlists
# - Morning Workout
# - Evening Chill
# - Failure case analysis

## 4.6 Audio Mixing Demo (Spleeter)

**Note:** We use Spleeter, a pretrained model by Deezer Research (Hennequin et al., 2020), for audio source separation. This is NOT our contribution; we use it only to demonstrate intelligent crossfading guided by our learned transition quality scores.

In [None]:
# TODO: Implement Spleeter-based audio mixing
# - Separate stems
# - Apply crossfade based on transition quality
# - Generate output audio
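
The plan does not fix how transition quality maps to a crossfade, so the following is a hypothetical sketch: an equal-power crossfade whose length grows linearly with $Q$, on the assumption that compatible tracks can overlap longer. The `min_s`/`max_s` bounds are invented placeholders. It accepts mono `(samples,)` or multichannel `(samples, channels)` arrays, so it could be applied to full mixes or separately to individual stems from Spleeter.

```python
import numpy as np

def quality_crossfade(track_a, track_b, q, sr=44100, min_s=2.0, max_s=12.0):
    """Equal-power crossfade from track_a into track_b.

    Crossfade length interpolates linearly from min_s (q = 0) to max_s (q = 1);
    this mapping is an assumption, not a learned quantity.
    """
    q = float(np.clip(q, 0.0, 1.0))
    n = int(sr * (min_s + q * (max_s - min_s)))
    n = min(n, len(track_a), len(track_b))  # cannot overlap more than either track
    t = np.linspace(0.0, np.pi / 2, n)
    fade_out, fade_in = np.cos(t), np.sin(t)  # cos^2 + sin^2 = 1 keeps power constant
    if track_a.ndim == 2:  # broadcast gains over channels for (samples, channels) audio
        fade_out, fade_in = fade_out[:, None], fade_in[:, None]
    overlap = track_a[len(track_a) - n:] * fade_out + track_b[:n] * fade_in
    return np.concatenate([track_a[:len(track_a) - n], overlap, track_b[n:]])
```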

---

# Section 5: Discussion of Related Work

## 5.1 Sequential Recommendation

- **Rendle et al. (2010):** Introduced FPMC for next-basket recommendation, combining matrix factorization with Markov chains. Our work applies this to music playlists.
- **Chen et al. (2012):** Modeled playlists as Markov chains over songs embedded in a Euclidean space (Logistic Markov Embedding).
- **Jannach et al. (2015):** Survey of session-based recommendation systems.

## 5.2 Music Recommendation

- **Van den Oord et al. (2013):** Deep content-based music recommendation, using CNNs on mel-spectrograms to predict collaborative-filtering latent factors.
- **Anderson et al. (2020):** Studied how algorithmic recommendations affect the diversity of listening on Spotify.
- **Schedl et al. (2018):** Comprehensive survey of music recommendation systems.

## 5.3 Compatibility Modeling

- **McAuley et al. (2015):** Learning visual compatibility in fashion recommendation; we adapt this idea to audio features.
- **He & McAuley (2016):** VBPR for visual features in recommendation.

## 5.4 Audio Source Separation

- **Hennequin et al. (2020):** Spleeter, a pretrained model for audio source separation. We use this for our demo.

## 5.5 Our Contribution

**Novel Aspects:**
1. Hybrid system combining sequential recommendation (FPMC) with explicit transition quality modeling (XGBoost)
2. Application to playlist generation with smooth transitions
3. End-to-end demo from recommendation to audio generation

**Comparison to Prior Work:**
- Many music recommenders score individual songs without modeling the audio compatibility of transitions between them
- Our system explicitly models the sequential nature and audio compatibility
- We extend beyond prediction to actual audio generation guided by learned quality scores

## 5.6 Results Comparison

TODO: Compare our Hit@K and AUC results to reported benchmarks on Spotify dataset (if available)

---

# Conclusion

This project implements an intelligent DJ system that:
1. Generates sequential playlists using FPMC
2. Assesses transition quality using XGBoost on audio features
3. Creates smooth audio mixes using learned quality scores

**Key Findings:**
- TODO: Summarize main results
- TODO: Discuss limitations and future work

**Future Directions:**
- Real-time inference optimization
- User feedback loop for personalization
- Advanced audio effects beyond crossfading
- Scaling to full million-playlist dataset

---

# References

1. Rendle, S., Freudenthaler, C., & Schmidt-Thieme, L. (2010). Factorizing personalized Markov chains for next-basket recommendation. WWW 2010.

2. McAuley, J., Targett, C., Shi, Q., & Van Den Hengel, A. (2015). Image-based recommendations on styles and substitutes. SIGIR 2015.

3. Hennequin, R., Khlif, A., Voituret, F., & Moussallam, M. (2020). Spleeter: a fast and efficient music source separation tool with pre-trained models. Journal of Open Source Software 5(50), 2154.

4. Chen, S., Moore, J. L., Turnbull, D., & Joachims, T. (2012). Playlist prediction via metric embedding. KDD 2012.

5. Van den Oord, A., Dieleman, S., & Schrauwen, B. (2013). Deep content-based music recommendation. NIPS 2013.

6. Anderson, A., Maystre, L., Anderson, I., Mehrotra, R., & Lalmas, M. (2020). Algorithmic effects on the diversity of consumption on Spotify. WWW 2020.

7. Spotify Million Playlist Dataset: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge