#### Playlist Generation

Now that we have these insights into what fuels a successful playlist, we can use them to create our own 'hopefully viral' playlist. We will do so in a four-step process:

1) We will ask the playlist curator for a genre from which to create a playlist, and we will subset our song data to only the songs from that genre.

2) We will give each individual song in this arsenal a 'score' for how well it matches the attributes our final models identified as important.

3) We will collect the highest-scoring songs into a dataframe to serve as a pool to draw from.

4) We will randomly select songs from this pool, keeping in mind some of the genre variables we introduced and their performances.

Our scoring metric is driven, as mentioned, by the attributes from our final models. Those attributes are the average song popularity, the top song popularity, the percentage of explicit songs, the average artist followers, the number of songs, and numerous interaction terms between genres and artists. When we make playlists for one genre at a time, however, only a few of these factors remain relevant, because each interaction term applies only within its own genre. For demonstration purposes we will create an R&B playlist (where the only valid interaction term is the one with The Weeknd), a Jazz playlist (no interactions), and a Pop playlist (including an interaction with Sia). The number-of-songs variable also becomes less crucial here, since we want to generate playlists of a consistent length. We'll use 50 songs, since that is where the benefit tops out and it is a customary length for Spotify-curated playlists.
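The four steps above can be sketched as a single function. This is only a minimal illustration on toy data: `build_playlist`, its parameters, the toy rows, and the toy scoring rule are all hypothetical, and the cells below remain the code we actually ran.

```python
import pandas as pd

def build_playlist(songs, genre, score_fn, n_songs=50, pool_quantile=0.75, seed=None):
    """Hypothetical helper: filter by genre, score, keep the top pool, sample."""
    # Step 1: keep only songs whose artist genres mention the requested genre.
    in_genre = songs[songs["artist_genres"].str.contains(genre, regex=False, na=False)]
    # Step 2: keep one row per song name, then score each remaining row.
    unique = in_genre.drop_duplicates(subset="song_name").copy()
    unique["score"] = unique.apply(score_fn, axis=1).astype(float)
    # Step 3: the pool is every song at or above the chosen score quantile.
    pool = unique[unique["score"] >= unique["score"].quantile(pool_quantile)]
    # Step 4: draw the playlist at random from the pool (capped at the pool size).
    return pool.sample(n=min(n_songs, len(pool)), random_state=seed)

# Toy data (made up for illustration; not taken from total_info.csv).
toy = pd.DataFrame({
    "song_name": ["a", "b", "c", "d", "a"],
    "artist_genres": ["['r&b']", "['indie r&b']", "['jazz']", "['r&b', 'pop']", "['r&b']"],
    "popularity": [90, 40, 70, 60, 90],
    "artist_followers": [1000, 500, 800, 2000, 1000],
})

def toy_score(row):
    # Placeholder scoring rule, standing in for the real scoring function.
    return row["popularity"] * row["artist_followers"]

playlist = build_playlist(toy, "r&b", toy_score, n_songs=2, pool_quantile=0.5, seed=0)
print(playlist[["song_name", "score"]])
```

With these toy values the pool contains songs "a" and "d", so both end up in the two-song playlist.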

In [2]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.api import OLS
import sklearn.metrics as metrics
from sklearn.model_selection import cross_val_score
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import KFold
from sklearn.linear_model import RidgeCV
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score
import plotly
plotly.tools.set_credentials_file(username='cohenk2', api_key='dF6eJ0G0zN5JH3ifD1sH')
import plotly.plotly as py
import plotly.figure_factory as ff
import scipy.stats as stats
import pylab 
import warnings
warnings.filterwarnings('ignore')
import seaborn as sns; sns.set(color_codes=True)
sns.set(style="whitegrid")
%matplotlib inline



In [3]:
df = pd.read_csv("total_info.csv", encoding = 'latin')
del df['Unnamed: 0']

desired_genre = 'r&b'
df = df[df['artist_genres'].str.contains(desired_genre)]
df.head()

Unnamed: 0,playlist_id,playlist_name,followers,song_name,number_of_artists,artist_name,artist_id,popularity,track_number,explicit,duration_ms,available_markets,delete,artist_popularity,artist_followers,artist_genres
1686,37i9dQZF1DXcBWIGoYBM5M,Today's Top Hits,18123888.0,Plain Jane,1,A$AP Ferg,5dHt1vcEm9qb8fCyLcB3HL,90,8,1,173600,"['AD', 'AR', 'AT', 'AU', 'BE', 'BG', 'BO', 'BR...",,87,821565,"['dwn trap', 'hip hop', 'indie r&b', 'pop rap'..."
1687,37i9dQZF1DX0XUsuxWHRQd,RapCaviar,8318573.0,Plain Jane,1,A$AP Ferg,5dHt1vcEm9qb8fCyLcB3HL,90,8,1,173600,"['AD', 'AR', 'AT', 'AU', 'BE', 'BG', 'BO', 'BR...",,87,821565,"['dwn trap', 'hip hop', 'indie r&b', 'pop rap'..."
1688,37i9dQZF1DWVstK6FYh8Nw,This Is: Future,267999.0,New Level REMIX,4,A$AP Ferg,5dHt1vcEm9qb8fCyLcB3HL,58,1,1,251194,"['AD', 'AR', 'AT', 'AU', 'BE', 'BG', 'BO', 'BR...",,87,821565,"['dwn trap', 'hip hop', 'indie r&b', 'pop rap'..."
1689,37i9dQZF1DX2A29LI7xHn1,Signed XOXO,1302273.0,Plain Jane,1,A$AP Ferg,5dHt1vcEm9qb8fCyLcB3HL,90,8,1,173600,"['AD', 'AR', 'AT', 'AU', 'BE', 'BG', 'BO', 'BR...",,87,821565,"['dwn trap', 'hip hop', 'indie r&b', 'pop rap'..."
1690,37i9dQZF1DWY4xHQp97fN6,Get Turnt,3263165.0,Plain Jane,1,A$AP Ferg,5dHt1vcEm9qb8fCyLcB3HL,90,8,1,173600,"['AD', 'AR', 'AT', 'AU', 'BE', 'BG', 'BO', 'BR...",,87,821565,"['dwn trap', 'hip hop', 'indie r&b', 'pop rap'..."
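Note that `artist_genres` is stored as a single string, so `str.contains` performs a substring match. The sketch below illustrates the consequence with a small made-up frame: the A$AP Ferg genre string uses only the four genres visible in the truncated output above, and the other two rows are hypothetical. The `regex=False` and `na=False` arguments are optional additions here, not part of the original cell.

```python
import pandas as pd

artists = pd.DataFrame({
    "artist_name": ["A$AP Ferg", "Hypothetical Jazz Trio", "Unknown"],
    "artist_genres": [
        "['dwn trap', 'hip hop', 'indie r&b', 'pop rap']",  # visible part of the real entry
        "['vocal jazz']",                                    # made-up row
        None,                                                # made-up row with no genre string
    ],
})

def matches(genre):
    # regex=False treats the genre as literal text; na=False drops missing entries.
    mask = artists["artist_genres"].str.contains(genre, regex=False, na=False)
    return list(artists.loc[mask, "artist_name"])

print(matches("r&b"))   # matched through 'indie r&b'
print(matches("pop"))   # matched through 'pop rap'
print(matches("jazz"))  # matched through 'vocal jazz'
```

Because the match is on substrings, an artist tagged 'pop rap' also lands in the pop subset, which is worth keeping in mind when reading the generated playlists.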


In [4]:
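def song_score(artist, song_pop, explicit, artist_followers):
    # Base score: weighted song popularity times weighted artist followers.
    score = 1.2 * song_pop * .05 * artist_followers
    # Genre-specific artist interactions (desired_genre is set in the cell above).
    if "The Weeknd" in artist and desired_genre == 'r&b':
        score = score * 1.02
    if "Sia" in artist and desired_genre == 'pop':
        score = score * 1.04
    # Bonus for very popular songs.
    if song_pop > 95:
        score = score * 1.09
    # Small bonus for explicit songs.
    if explicit == 1:
        score = score * 1.007
    return score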

As mentioned, we can now score our R&B songs, weighting the variables according to the outputs of our random forest model and including the bonus for songs by The Weeknd. Below is a small subset of what this scoring looks like.

In [5]:
unique_songs = df['song_name'].unique()
columns = ['song_name','artist_name','score']
score_frame = pd.DataFrame(index=range(0,len(unique_songs)), columns=columns)
for idx,song in enumerate(unique_songs):
    # Assign with .loc[row, column]; chained indexing may write to a temporary copy.
    score_frame.loc[idx, 'song_name'] = song
    try:
        # A song can appear in several playlists, so use its first row.
        first_row = df.loc[df['song_name'] == song].iloc[0]
        score_frame.loc[idx, 'artist_name'] = first_row['artist_name']
        score_frame.loc[idx, 'score'] = song_score(first_row['artist_name'], first_row['popularity'],
                                                   first_row['explicit'], first_row['artist_followers'])
    except Exception:
        # Songs that cannot be scored (e.g. missing values) get a score of 0.
        score_frame.loc[idx, 'score'] = 0
score_frame.head()

Unnamed: 0,song_name,artist_name,score
0,Plain Jane,A$AP Ferg,4467510.0
1,New Level REMIX,A$AP Ferg,2879060.0
2,Work REMIX,A$AP Ferg,3822200.0
3,Rubber Band Man,A$AP Ferg,3673280.0
4,Olympian,A$AP Ferg,3127250.0
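As a sanity check, the first two scores can be reproduced by hand from the values shown in the `df.head()` output (popularity 90 and 58, 821565 artist followers, explicit). The sketch below restates the scoring rule with the genre passed in explicitly; the name `score_with_genre` and its `genre` parameter are our own additions for illustration.

```python
def score_with_genre(artist, song_pop, explicit, artist_followers, genre):
    # Same weights as song_score, but with the genre as an explicit argument.
    score = 1.2 * song_pop * 0.05 * artist_followers
    if "The Weeknd" in artist and genre == "r&b":
        score *= 1.02
    if "Sia" in artist and genre == "pop":
        score *= 1.04
    if song_pop > 95:
        score *= 1.09
    if explicit == 1:
        score *= 1.007
    return score

# Plain Jane: 1.2 * 90 * 0.05 * 821565 * 1.007
plain_jane = score_with_genre("A$AP Ferg", 90, 1, 821565, "r&b")
# New Level REMIX: 1.2 * 58 * 0.05 * 821565 * 1.007
new_level = score_with_genre("A$AP Ferg", 58, 1, 821565, "r&b")

# Round to six significant digits to compare with the table above.
print(float(f"{plain_jane:.6g}"))
print(float(f"{new_level:.6g}"))
```

Rounded to six significant digits, these give 4467510.0 and 2879060.0, matching the first two rows of the table.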


Our R&B playlist then is as follows:

In [6]:
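# The score column was created empty (object dtype), so convert it to float first.
score_frame['score'] = score_frame['score'].astype(float)
# sort_values returns a new frame, so assign the result back.
score_frame = score_frame.sort_values('score', ascending = False)
threshold = score_frame['score'].quantile(q = 0.75)
top_songs = score_frame[score_frame['score'] >= threshold]
curated_playlist = top_songs.sample(50)
curated_playlist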

Unnamed: 0,song_name,artist_name,score
38,Don't Wake Me Up,Chris Brown,19794700.0
1384,"I Be Puttin' On - feat. Wiz Khalifa, French Mo...",Wale,2209300.0
552,Distance And Time,Alicia Keys,9519840.0
202,I Luv This Shit - Remix,August Alsina,2613060.0
979,No Limit,Usher,16949400.0
279,Make Me Like You,Gwen Stefani,4118860.0
602,Sin City,John Legend,6999780.0
615,#thatPOWER,will.i.am,5153730.0
1038,Beautiful,Christina Aguilera,8386350.0
1488,The First Noel,Mary J. Blige,2182670.0
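Because `sample(50)` is called without a seed, rerunning the cell produces a different playlist each time, and it raises an error if the pool holds fewer than 50 songs. A minimal sketch of one way to handle both, using a made-up pool; `draw_playlist` and its `seed` argument are hypothetical names, not part of the notebook:

```python
import pandas as pd

# Made-up pool of eight songs, standing in for top_songs.
pool = pd.DataFrame({
    "song_name": [f"song_{i}" for i in range(8)],
    "score": [float(i) for i in range(8)],
})

def draw_playlist(pool, n_songs=50, seed=None):
    # Cap n so sample() does not raise when the pool is smaller than requested.
    n = min(n_songs, len(pool))
    # A fixed random_state makes the draw repeatable.
    return pool.sample(n=n, random_state=seed)

first = draw_playlist(pool, n_songs=5, seed=42)
second = draw_playlist(pool, n_songs=5, seed=42)
print(first.equals(second))                  # True: same seed, same draw
print(len(draw_playlist(pool, n_songs=50)))  # 8: capped at the pool size
```

Leaving `seed=None` keeps the original behavior of a fresh random playlist on every run.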


We can repeat the same process for Jazz and Pop. Our Jazz playlist is as follows:

In [7]:
df = pd.read_csv("total_info.csv", encoding = 'latin')
del df['Unnamed: 0']

desired_genre = 'jazz'
df = df[df['artist_genres'].str.contains(desired_genre)]
df.head()

unique_songs = df['song_name'].unique()
columns = ['song_name','artist_name','score']
score_frame = pd.DataFrame(index=range(0,len(unique_songs)), columns=columns)
for idx,song in enumerate(unique_songs):
    # Assign with .loc[row, column]; chained indexing may write to a temporary copy.
    score_frame.loc[idx, 'song_name'] = song
    try:
        # A song can appear in several playlists, so use its first row.
        first_row = df.loc[df['song_name'] == song].iloc[0]
        score_frame.loc[idx, 'artist_name'] = first_row['artist_name']
        score_frame.loc[idx, 'score'] = song_score(first_row['artist_name'], first_row['popularity'],
                                                   first_row['explicit'], first_row['artist_followers'])
    except Exception:
        # Songs that cannot be scored (e.g. missing values) get a score of 0.
        score_frame.loc[idx, 'score'] = 0

# Convert the object-dtype score column to float, then sort (sort_values returns a new frame).
score_frame['score'] = score_frame['score'].astype(float)
score_frame = score_frame.sort_values('score', ascending = False)
threshold = score_frame['score'].quantile(q = 0.75)
top_songs = score_frame[score_frame['score'] >= threshold]
curated_playlist = top_songs.sample(50)
curated_playlist

Unnamed: 0,song_name,artist_name,score
2327,The Red One,John Scofield,136927.0
2414,This Land Is Your Land,Sharon Jones & The Dap-Kings,348156.0
73,Let It Be,Aretha Franklin,2340950.0
150,The Bare Necessities,Tony Bennett,469550.0
234,White Christmas,Bing Crosby,138672.0
2613,Perhaps,Oscar D'León,301563.0
1987,The Fool,Fleshgod Apocalypse,94817.2
279,Work Song,Nina Simone,1904080.0
365,Eyes Of Man,Chuck Berry,918435.0
228,Chatanooga Choo Choo,Glenn Miller,221286.0


Our Pop playlist is as follows:

In [8]:
df = pd.read_csv("total_info.csv", encoding = 'latin')
del df['Unnamed: 0']

desired_genre = 'pop'
df = df[df['artist_genres'].str.contains(desired_genre)]
df.head()

unique_songs = df['song_name'].unique()
columns = ['song_name','artist_name','score']
score_frame = pd.DataFrame(index=range(0,len(unique_songs)), columns=columns)
for idx,song in enumerate(unique_songs):
    # Assign with .loc[row, column]; chained indexing may write to a temporary copy.
    score_frame.loc[idx, 'song_name'] = song
    try:
        # A song can appear in several playlists, so use its first row.
        first_row = df.loc[df['song_name'] == song].iloc[0]
        score_frame.loc[idx, 'artist_name'] = first_row['artist_name']
        score_frame.loc[idx, 'score'] = song_score(first_row['artist_name'], first_row['popularity'],
                                                   first_row['explicit'], first_row['artist_followers'])
    except Exception:
        # Songs that cannot be scored (e.g. missing values) get a score of 0.
        score_frame.loc[idx, 'score'] = 0

# Convert the object-dtype score column to float, then sort (sort_values returns a new frame).
score_frame['score'] = score_frame['score'].astype(float)
score_frame = score_frame.sort_values('score', ascending = False)
threshold = score_frame['score'].quantile(q = 0.75)
top_songs = score_frame[score_frame['score'] >= threshold]
curated_playlist = top_songs.sample(50)
curated_playlist

Unnamed: 0,song_name,artist_name,score
6155,Welcome to the Family,Avenged Sevenfold,7082210.0
963,Nothing Without You,The Weeknd,33328800.0
7219,Unbreakable,Of Mice & Men,2038970.0
5134,Dirty Laundry,Nickelback,7090690.0
8850,U + Me (Love Lesson),Mary J. Blige,3101690.0
7117,Painting Flowers,All Time Low,3442350.0
3324,"Young, Wild & Free (feat. Bruno Mars) - feat. ...",Snoop Dogg,11267200.0
599,New Flame,Chris Brown,21831700.0
8326,Ain't Nobody Takin My Baby,Russ,4255020.0
3266,Untitled,blink-182,7586410.0
