# Analyse artist relations in recently played Spotify tracks

email: egemen.sahin.23@dartmouth.edu

---

In [None]:
import spotipy
from spotipy.oauth2 import SpotifyOAuth
import pandas as pd

## 1. Data Gathering and Pre-processing
---

### 1. Initializing Spotify API's environment variables

Find your app's client ID and client secret in the [Spotify Developer Dashboard](https://developer.spotify.com/dashboard/applications/). The redirect URI must match one registered in the app's settings.

The Spotify object *sp* handles all calls to the Spotify API.

In [None]:
SPOTIPY_CLIENT_ID = 'your-client-id'
SPOTIPY_CLIENT_SECRET = 'your-client-secret'
SPOTIPY_REDIRECT_URI = 'http://localhost/'
SCOPE = "user-read-recently-played"

sp = spotipy.Spotify(
    auth_manager=SpotifyOAuth(
        client_id=SPOTIPY_CLIENT_ID,
        client_secret=SPOTIPY_CLIENT_SECRET,
        redirect_uri=SPOTIPY_REDIRECT_URI,
        scope=SCOPE,
    ),
    requests_timeout=5,
)
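
Hard-coding the client secret in a notebook makes it easy to leak. As a minimal sketch, the credentials can instead be read from environment variables. The helper `load_spotify_credentials` below is hypothetical (it is not part of spotipy); the names `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET` and `SPOTIPY_REDIRECT_URI` are the variables spotipy itself falls back to when the corresponding arguments are omitted.

```python
import os

REQUIRED_VARS = (
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "SPOTIPY_REDIRECT_URI",
)


def load_spotify_credentials(env=None):
    """Return the Spotify credentials found in `env` (defaults to os.environ).

    Hypothetical helper for illustration; raises KeyError if a variable is missing.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real environment
example_env = {
    "SPOTIPY_CLIENT_ID": "your-client-id",
    "SPOTIPY_CLIENT_SECRET": "your-client-secret",
    "SPOTIPY_REDIRECT_URI": "http://localhost/",
}
credentials = load_spotify_credentials(example_env)
```

The returned values can then be passed to `SpotifyOAuth` in place of the hard-coded strings.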

### 2. Retrieving the $N$ recently played tracks

The endpoint returns at most 50 tracks per request, so $N \leq 50$ (see the [Spotify documentation](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-recently-played)).

In [None]:
N = 50

recent_tracks = sp.current_user_recently_played(limit=N)

# Preview the data
recent_tracks['items'][0]

### 3. Extracting data about the song's artist's name

`recent_tracks` is already ordered by Spotify from most to least recently played, so the `played_at` value of each song is not needed.

Set $i = 0$ to keep only the first listed artist of the track's album, treated here as the main artist.

In [None]:
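i = 0  # index of the artist to keep (0 = first listed album artist)
data = []

# Each transaction starts as a one-item list holding the artist's name
for song in recent_tracks['items']:
    data.append([song['track']['album']['artists'][i]['name']])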
    
# Preview the data
data
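
To make the indexing above concrete, the sketch below runs the same extraction on a hand-written stand-in for `recent_tracks`. The structure is trimmed to the fields used here and all names and timestamps are invented; a real response contains many more fields. It also illustrates that the first album artist and the first track artist can differ, for example on a compilation album.

```python
# Hand-written stand-in for the API response (assumption: trimmed, invented values)
mock_recent_tracks = {
    "items": [
        {
            "played_at": "2022-01-01T10:05:00Z",
            "track": {
                "name": "Song Two",
                "artists": [{"name": "Artist B"}],
                "album": {"artists": [{"name": "Various Artists"}]},
            },
        },
        {
            "played_at": "2022-01-01T10:00:00Z",
            "track": {
                "name": "Song One",
                "artists": [{"name": "Artist A"}, {"name": "Artist C"}],
                "album": {"artists": [{"name": "Artist A"}]},
            },
        },
    ]
}


def first_album_artist(item):
    """Name of the first artist listed on the track's album (as in the cell above)."""
    return item["track"]["album"]["artists"][0]["name"]


def first_track_artist(item):
    """Name of the first artist listed on the track itself."""
    return item["track"]["artists"][0]["name"]


album_artists = [first_album_artist(item) for item in mock_recent_tracks["items"]]
track_artists = [first_track_artist(item) for item in mock_recent_tracks["items"]]
```

Which of the two fields better represents the "main artist" depends on the listening history, so this is a choice rather than a fixed rule.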

### 4. Appending the artist of the next song to each song's artist

In the association rules used later, the antecedent and the consequent must be different items, so a transition from Artist A to Artist A cannot be expressed directly. To avoid this issue, use the placeholder *same_artist = SAME ARTIST* as the second item whenever both artists are equal. This retains information about same-artist transitions (Artist A &rarr; Artist A).

The most recently played song has no next song, so remove it from the dataset.

In [None]:
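same_artist = 'SAME ARTIST'

# data is ordered most recent first, so data[i-1] was played right after data[i]
for i in range(1, len(data)):
    if data[i][0] == data[i-1][0]:
        data[i] = [data[i][0], same_artist]
    else:
        data[i] = [data[i][0], data[i-1][0]]

# The most recent song has no next song, so drop it
if data and len(data[0]) == 1:
    del data[0]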

# Preview the data
data
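
The pairing logic can be checked on a toy history. The sketch below is a hypothetical, side-effect-free rewrite of the loop above (the function name `pair_with_next_artist` is not from the original notebook); it takes a list of artist names ordered most recent first and returns the same `[artist, next artist]` pairs.

```python
def pair_with_next_artist(artists, same_artist="SAME ARTIST"):
    """Pair each artist with the artist played right after it.

    `artists` is ordered most recent first, so the song played after
    artists[k] is artists[k - 1]. The most recent entry has no successor
    and therefore produces no pair.
    """
    pairs = []
    for current, following in zip(artists[1:], artists[:-1]):
        if current == following:
            pairs.append([current, same_artist])
        else:
            pairs.append([current, following])
    return pairs


# Listening order was A, B, B, C; the API would return it reversed
toy_history = ["C", "B", "B", "A"]
toy_pairs = pair_with_next_artist(toy_history)
# toy_pairs == [["B", "C"], ["B", "SAME ARTIST"], ["A", "B"]]
```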

### 5. Encoding the data for each artist transition

In [None]:
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_array = te.fit(data).transform(data)

# Preview the data
te_array
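
For intuition, the encoding step amounts to one-hot encoding each transaction over the set of all items. The pure-Python sketch below mimics that idea on toy data; it is an illustration only, not mlxtend's implementation, and the function name `one_hot_encode` is made up.

```python
def one_hot_encode(transactions):
    """Return (columns, rows) where rows[r][c] is True if columns[c] is in transaction r."""
    columns = sorted({item for transaction in transactions for item in transaction})
    rows = [
        [column in set(transaction) for column in columns]
        for transaction in transactions
    ]
    return columns, rows


toy_transactions = [["B", "C"], ["B", "SAME ARTIST"], ["A", "B"]]
columns, rows = one_hot_encode(toy_transactions)
# columns == ["A", "B", "C", "SAME ARTIST"]
# rows[0] == [False, True, True, False]
```

Because each row only records which items are present, the order of the two items inside a transaction is not kept.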

### 6. Converting the encoded data into a pandas DataFrame

Retain information about artist name through *columns = te.columns_*.

In [None]:
df = pd.DataFrame(te_array, columns = te.columns_)

# Preview the data
df

## 2. Data Analysis
---

### 1. Extracting the frequent artist transitions

Return the transitions that have a minimum support of *m_s = 0.01*.

In [None]:
from mlxtend.frequent_patterns import apriori

m_s = 0.01

frequent_transitions = apriori(df, min_support = m_s, use_colnames = True)

# Preview the data
frequent_transitions
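
The support of an itemset is the fraction of transactions that contain it. With $N = 50$ there are at most 49 transitions, so an itemset seen only once already has a support of $1/49 \approx 0.02$, which is above *m_s = 0.01*; in practice this threshold keeps every observed itemset. The sketch below computes support by hand on toy data (the function name `support` is illustrative, not an mlxtend call).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= set(transaction))
    return hits / len(transactions)


toy_transactions = [["B", "C"], ["B", "SAME ARTIST"], ["A", "B"]]

support_b = support({"B"}, toy_transactions)        # B appears in all 3 transactions
support_ab = support({"A", "B"}, toy_transactions)  # A and B appear together once

# Support of a single occurrence among 49 transitions
single_occurrence_support = 1 / 49
```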

### 2. Finding the association rules (Artist A &rarr; Artist B)

Return the list of artist transitions with a minimum confidence of *m_c = 0.25*, and a minimum lift of *m_l = 1*.

In [None]:
from mlxtend.frequent_patterns import association_rules

m_c = 0.25
m_l = 1

rules = association_rules(frequent_transitions, metric = 'confidence', min_threshold = m_c)
rules = rules[rules['lift'] >= m_l]
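
As a reminder of what the two thresholds measure: the confidence of A &rarr; B is support(A and B) / support(A), and lift is that confidence divided by support(B). The sketch below computes both by hand on made-up transactions; the function names are illustrative and not part of mlxtend.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Made-up transactions for illustration
toy_transactions = [["A", "B"], ["A", "B"], ["A", "C"], ["B", "C"]]

conf_ab = confidence({"A"}, {"B"}, toy_transactions)  # (2/4) / (3/4) = 2/3
lift_ab = lift({"A"}, {"B"}, toy_transactions)        # (2/3) / (3/4) = 8/9
```

In this toy case the rule passes the confidence threshold of 0.25 but is dropped by the lift filter, because B follows A less often than B's overall frequency would suggest.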

### 3. Tidying up and formatting the dataframe

Exclude rules where *antecedent = same_artist*: which artist follows a run of the same artist is not relevant to the task at hand.

Where the consequent is the *same_artist* placeholder, replace it with the antecedent artist's name.

In [None]:
# Drop rules that start from the placeholder, then keep the relevant columns
rules = rules[rules['antecedents'] != {same_artist}]
rules = rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].copy()

# Replace the placeholder consequent with the antecedent artist
m = rules['consequents'] == {same_artist}
rules.loc[m, ['consequents']] = (rules.loc[m, ['antecedents']].values)

# Formatting: turn each frozenset into a readable, consistently ordered string
rules["antecedents"] = rules["antecedents"].apply(lambda x: ', '.join(sorted(x))).astype(str)
rules["consequents"] = rules["consequents"].apply(lambda x: ', '.join(sorted(x))).astype(str)
rules.rename(columns = {"antecedents" : "artist", "consequents": "next artist"}, inplace = True)

# Preview the data
rules

### 4. Exporting the data to CSV

Convert the data into a CSV file for visualization in Tableau.

In [None]:
rules.to_csv('rules.csv')

# Shortcomings

1. Small sample: Spotify's API only returns the 50 most recently played songs, which yields at most 49 transitions. A single transition therefore already has a support of about 0.02, so the rules rest on very few observations and offer limited insight.

2. Direction is lost: the analysis finds associations between artists, but each transaction is encoded as an unordered set. The rules A &rarr; B and B &rarr; A are derived from the same itemset {A, B}, so they cannot tell listening to artist B after artist A apart from the reverse. A workaround is to give the current and next artists distinct labels (similar to the *same_artist* method in 1.4).
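
The labelling workaround from point 2 could look like the sketch below. It is hypothetical and was not part of the analysis above: the function name `label_transitions` and the `current:` / `next:` prefixes are assumptions chosen for illustration.

```python
def label_transitions(artists_most_recent_first):
    """Build [current, next] transactions whose items encode their role.

    The input is ordered most recent first, as returned by the API, so the
    song played after position k is the one at position k - 1.
    """
    older = artists_most_recent_first[1:]
    newer = artists_most_recent_first[:-1]
    return [
        [f"current: {current}", f"next: {following}"]
        for current, following in zip(older, newer)
    ]


# Listening order was A, B, B, C
labelled = label_transitions(["C", "B", "B", "A"])
# labelled == [
#     ["current: B", "next: C"],
#     ["current: B", "next: B"],
#     ["current: A", "next: B"],
# ]
```

With role-specific labels a same-artist transition no longer needs a placeholder, since `current: B` and `next: B` are distinct items. Rules whose antecedent carries the `next:` label would still need to be filtered out afterwards.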

# Possible Uses

1. Recommendation algorithm: a new Spotify feature that recommends the artists of recently played songs to the user in a new 'radio' mode. It differs from current algorithms in its focus on a user's short-term artist trends.