# Automatic Playlist Generation:
# A Content-Based Music Sequence Recommender System

## 1. Concept

#### A. Podcast-like Playlists:
- Categorical Tags (Genre, Era/Year, Label, Producers)
- Qualitative Tags (Danceability, BPM, Key, Vocal/Instrumental)

#### B. Mixtape-like Playlists:  
- Audio Features
- Feature Similarity Measures   
    - Harmony
    - Rhythm
    - Sound
    - Instrumentation
    - Mood/Sentiment
    - Dynamic
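
A feature similarity measure of the kind listed above could be sketched as follows. This is a minimal illustration using cosine similarity, which is only one possible choice. The feature vectors and their dimensions are made-up placeholders, not values extracted from real tracks.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical, pre-scaled features per track, e.g. [harmony, rhythm, sound, mood].
track_a = [0.8, 0.1, 0.5, 0.3]
track_b = [0.7, 0.2, 0.4, 0.3]
track_c = [0.1, 0.9, 0.2, 0.8]

sim_ab = cosine_similarity(track_a, track_b)
sim_ac = cosine_similarity(track_a, track_c)
print(f"a~b: {sim_ab:.3f}, a~c: {sim_ac:.3f}")
```

In practice each feature group (harmony, rhythm, ...) would probably get its own vector and its own similarity score, so the groups can be weighted separately.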

#### C. DJ-Mixes:
- Start/Intro & End/Outro Features of Songs
- Beat Matching Features
- Song to Song Transition Features
- Story Telling Features over whole Song Sequence    
- Coherence Measures for Transitions & Sequences
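
As a rough illustration of a beat matching feature, the sketch below computes how far the incoming track would have to be sped up or slowed down to match the tempo of the outgoing track, also allowing half-time and double-time matches. The function names and the 6 % tolerance are assumptions made for this example, not part of the project.

```python
def tempo_shift_percent(bpm_from, bpm_to):
    """Tempo change (in %) applied to the incoming track (`bpm_to`) so that it
    matches the outgoing track (`bpm_from`), or its half or double tempo."""
    candidates = [bpm_from, bpm_from * 2.0, bpm_from / 2.0]
    # Pick the target tempo that needs the smallest adjustment.
    target = min(candidates, key=lambda tempo: abs(tempo - bpm_to))
    return (target / bpm_to - 1.0) * 100.0


def is_mixable(bpm_from, bpm_to, max_shift_percent=6.0):
    """True if the required tempo change stays within the assumed tolerance."""
    return abs(tempo_shift_percent(bpm_from, bpm_to)) <= max_shift_percent


print(is_mixable(124.0, 126.0))  # small adjustment
print(is_mixable(120.0, 140.0))  # large adjustment
print(is_mixable(70.0, 140.0))   # double-time match
```

A transition score could combine this with other features, such as key compatibility or the energy at the outro and intro.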

## 2. Recommender Systems Overview: State of the Art Approaches

#### A. Two General Approaches:

- **Collaborative filtering:** matrix factorization, alternating least squares (ALS)
- **Content-based approaches:** the input is music information (derived from songs and/or 
    existing playlists) extracted through Music Information Retrieval (MIR) processes


#### B. What are the Recommendations / the generated Playlists based on?

- emotion / mood
- genre
- user taste
- user similarity
- popularity


#### C. More recent Approaches / Deep Learning Approaches

- Sequence-aware music recommendation:
    - Next track recommendations
    - Automatic playlist continuation (APC)
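
To make the idea of playlist continuation concrete, here is a deliberately simple, non-learned baseline: it greedily appends the unused track whose features are closest to the current last track. The track names, the two-dimensional features and the function `continue_playlist` are hypothetical. Sequence-aware deep learning models would replace this nearest-neighbour step.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def continue_playlist(seed, features, length):
    """Greedily extend `seed` until it holds `length` tracks, always picking
    the unused track closest to the current last track."""
    if not seed:
        raise ValueError("seed must contain at least one track")
    playlist = list(seed)
    remaining = set(features) - set(playlist)
    while remaining and len(playlist) < length:
        last = features[playlist[-1]]
        # sorted() keeps tie-breaking deterministic.
        nxt = min(sorted(remaining), key=lambda name: euclidean(last, features[name]))
        playlist.append(nxt)
        remaining.remove(nxt)
    return playlist


# Hypothetical 2-D features per track.
features = {
    "track_a": [0.0, 0.0],
    "track_b": [0.1, 0.0],
    "track_c": [0.9, 1.0],
    "track_d": [0.3, 0.1],
}
playlist = continue_playlist(["track_a"], features, length=4)
print(playlist)
```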

## 3. Possible Datasets, Models & Feature Selection

#### A. Datasets:

[**Melon Music Dataset**](https://github.com/MTG/melon-music-dataset)  
[last.fm Dataset](https://zenodo.org/record/6090214)  
[MTG Barcelona Datasets & Software](https://www.upf.edu/web/mtg/software-datasets)  
[Kaggle: Spotify Tracks Dataset](https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset?datasetId=2570056&sortBy=voteCount)  
[Kaggle: Spotify Playlists Dataset](https://www.kaggle.com/datasets/andrewmvd/spotify-playlists?datasetId=1720572&sortBy=voteCount)

#### B. Python Audio Analysis (MIR) Packages: 

[**Essentia (ML Application ready)**](https://essentia.upf.edu/)  
[Essentia citing papers](https://essentia.upf.edu/research_papers.html)  
[**Librosa (lightweight analysis)**](https://librosa.org/doc/main/feature.html)


#### C. YouTube Tutorials:

[Spotify Playlist Generation](https://www.youtube.com/watch?v=3vvvjdmBoyc&list=PL-wATfeyAMNrTEgZyfF66TwejA0MRX7x1&index=2)  
[Librosa Music Analysis](https://www.youtube.com/watch?v=MhOdbtPhbLU)

## 4. Content-Based Recommendation

**Reasoning:** *Cold-start problem for metadata-based recommendation systems using only user-generated metadata*  
**Solution:** *Find underlying features of audio/music by MIR*  
**High-level Features:** *genre, mood, instrument(s), vocals, gender of singing voice, lyrics, ...*  
**Low-level Features:** *MFCC, ZCR, Spectral Coefficients, mixability*
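
To show what a low-level feature looks like, the sketch below computes the zero-crossing rate (ZCR) by hand on synthetic sine waves. It is an illustration only. On real audio, `librosa.feature.zero_crossing_rate` would be the usual route, and it returns one rate per frame rather than a single number for the whole signal.

```python
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(samples) - 1)


SR = 22050  # sample rate in Hz, matching librosa's default


def sine(freq_hz, seconds=1.0):
    """Synthetic test tone (stands in for real audio in this sketch)."""
    n = int(SR * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SR) for i in range(n)]


zcr_low = zero_crossing_rate(sine(110.0))
zcr_high = zero_crossing_rate(sine(1760.0))
print(f"110 Hz: {zcr_low:.4f}, 1760 Hz: {zcr_high:.4f}")
```

A pure tone at frequency f crosses zero about 2f times per second, so the ZCR is roughly 2f / SR. Noisy or percussive material tends to score higher than tonal material.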



## 5. Strategy

1. Obtain mixes data
2. Get songs for each mix
3. Analyze song / sequence data for the content-based recommendation system
4. Produce playlists
5. Compare to baseline model
6. (Optional) Produce DJ mix with transitions

Map the songs of each mix to Spotify tracks.

Extract an item-feature matrix for the mixes.

## CODE

In [112]:
import os
import numpy as np
import pandas as pd

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import time
import math


import librosa
import librosa.display

from IPython.display import Audio
import ipywidgets as widgets

#### Librosa: MIR Library

In [113]:
audio_folder = '../audio/'

In [114]:
files = os.listdir(audio_folder)

In [116]:
# Keep only the audio files (str.endswith accepts a tuple of suffixes)
audio_files = [file for file in files if file.endswith(('.mp3', '.wav'))]

In [122]:
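audio_files_loaded = dict()
for audio_file in audio_files:
    # splitext drops only the extension, so dots inside a title are kept
    name = os.path.splitext(audio_file)[0]
    # librosa.load resamples to 22050 Hz mono by default
    y, sr = librosa.load(os.path.join(audio_folder, audio_file))
    audio_files_loaded[name] = [y, sr]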

[src/libmpg123/id3.c:process_comment():584] error: No comment text / valid description?


In [123]:
audio_files_loaded

{"Gary Newman - are 'friends' electric": [array([ 0., -0.,  0., ..., -0.,  0., -0.], dtype=float32),
  22050],
 'Peaches- Fuck The Pain Away': [array([ 0., -0.,  0., ..., -0.,  0., -0.], dtype=float32),
  22050],
 'Die Goldenen Zitronen - Wenn ich ein Turnschuh wär': [array([ 1.3478850e-03,  2.3140463e-03,  2.5934768e-03, ...,
         -6.2827181e-05, -1.8949751e-04, -4.2081982e-04], dtype=float32),
  22050],
 'Lui Mafuta - 1928': [array([ 1.3289474e-05,  3.4368720e-06, -4.8658635e-06, ...,
          2.0529916e-07, -8.8673016e-07,  4.2070878e-06], dtype=float32),
  22050],
 'Charlotte Adigéry & Bolis Pupul - Paténipat': [array([-1.0906756e-03, -1.2390316e-03, -1.4695525e-04, ...,
         -2.0798245e-07, -1.4607306e-07, -2.1995569e-07], dtype=float32),
  22050],
 'Charlotte Adigéry & Bolis Pupul - 1,618': [array([-3.8951635e-05, -2.4765730e-05, -2.4795532e-05, ...,
          0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
  22050],
 'Madonna - Erotica': [array([ 0.,

In [None]:
# Load a single track; the cells below use y, sr and song_name
file = '../audio/Marie Davidson - Work It.mp3'
song_file = os.path.basename(file)
song_name, song_ext = os.path.splitext(song_file)
y, sr = librosa.load(file)

#### The Mel Spectrogram

In [None]:
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)

In [None]:
log_S = librosa.power_to_db(S, ref=np.max)

In [None]:
plt.figure(figsize=(16,4))
librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
plt.title('Mel power spectrogram for: {}'.format(song_name))
plt.colorbar(format='%+02.0f dB')
plt.show()

#### The Chromagram

In [None]:
y_harmonic, y_percussive = librosa.effects.hpss(y)

In [None]:
C = librosa.feature.chroma_cqt(y=y_harmonic, sr=sr)

In [None]:
plt.figure(figsize=(16,4))
librosa.display.specshow(C, sr=sr, x_axis='time', y_axis='chroma', vmin=0, vmax=1)
plt.title('Chromagram for: {}'.format(song_name))
plt.colorbar()
plt.show()
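
A chromagram can also feed a rough key estimate, which ties back to the "Key" tag from the concept section. The sketch below correlates a 12-bin mean chroma vector with the Krumhansl-Kessler key profiles rotated to every tonic. It is one common heuristic, not the method used in this project. It assumes the vector is ordered C, C#, ..., B, as in the rows of `C` above, where it could be obtained with `C.mean(axis=1)`. The input here is synthetic.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler key profiles, tonic at index 0.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_key(mean_chroma):
    """Return the best-correlating key label, e.g. 'G major'."""
    best_score, best_key = None, None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so its tonic weight lands on `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(mean_chroma, rotated)
            if best_score is None or score > best_score:
                best_score, best_key = score, f"{PITCH_CLASSES[tonic]} {mode}"
    return best_key


# Synthetic chroma vectors shaped like the profiles themselves.
g_major_like = [MAJOR[(pc - 7) % 12] for pc in range(12)]
a_minor_like = [MINOR[(pc - 9) % 12] for pc in range(12)]
print(estimate_key(g_major_like), "|", estimate_key(a_minor_like))
```

On real recordings the estimate is often confused between relative major and minor keys, so it is better treated as a soft feature than as a ground-truth label.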

#### mixesDb Extractions