# Music to My Ears

**Author:** [Vu Brown](https://www.linkedin.com/in/austin-brown-b5211384/)

## Overview
***
This project develops a collaborative filtering recommendation system for musical tracks by applying a multilabel binarizer to one million user playlists. This produces a large-scale utility matrix (1,000,000 by 2,262,292) of playlists and tracks, which is factorized with Non-negative Matrix Factorization (NMF); cosine similarity between the resulting playlist and track factors then drives the recommendations.

## Business Objective
***
This project explores unsupervised learning in an attempt to develop a musical track recommendation system based solely on user playlists. Inspiration for this project came from Spotify's weekly generated playlist, Discover Weekly, which consists of 30 tracks that the user has never heard before but that are tailored to the user's personal music preferences. Although I do not expect to create a recommendation system better than Discover Weekly, I would like to explore a unique way such a system could be built.

## The Data
***
The data used in this project was sourced from:
* A data package provided from [AIcrowd](https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files) consisting of 1,000,000 playlists.
* API calls to the [Spotify Web API](https://developer.spotify.com/dashboard/applications) (Spotify for Developers).

## Preprocessing
***
The goal of this section is to create a utility matrix consisting of playlists (rows) and the tracks included in those playlists (columns). Since I'm unable to obtain actual Spotify user data, I'm assuming that each playlist represents an individual user's musical preferences. Under that assumption, a utility matrix built as described lets me develop a collaborative filtering recommendation system from users' musical preferences.

The data package provided by AIcrowd is split into 1,000 separate JSON files, each containing 1,000 playlists, for a total of 1,000,000 playlists. The package also includes a useful text file, `stats.txt`, with a basic summary of the dataset. It was particularly useful in telling me to expect 2,262,292 unique tracks. Given this information, I expect the dimensions of the final utility matrix to be 1,000,000 by 2,262,292.

In [None]:
# Import entire modules
import json
import numpy as np
import pandas as pd
import sys
 
# Import specific functions from modules
from sklearn.preprocessing import MultiLabelBinarizer
from scipy.sparse import save_npz

**IMPORTANT NOTICE**

* The data package from AIcrowd is much too large to upload to GitHub. You will have to navigate to the link above, download the data package (ZIP file - 5.39GB) to the project folder on your local computer, and then extract the contents from the ZIP file there.
* This section of the notebook may require more than 8GB of RAM to run successfully. 

### Utility Matrix Set-Up

The first objective is to create a `for` loop that iterates through each JSON file to ultimately create a DataFrame, `final_df`, with 1 million rows representing playlists and one column that consists of lists of tracks pertaining to each playlist.

The first step in building the `for` loop is to read from the JSON files. Each JSON file name has two identifying features: an initial playlist number, `initial_num`, which is a multiple of 1,000, and a final playlist number ending in 999, `final_num`. Adding an `incrementer` of 1,000 to both numbers on each iteration of the `for` loop lets us read each JSON file in turn.
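The file names can also be computed directly from a file index rather than by carrying two counters. A minimal sketch, assuming the `mpd.slice.{start}-{end}.json` pattern used by the loop in this section; `slice_file_names` is a hypothetical helper, not part of the dataset tooling:

```python
def slice_file_names(num_files, playlists_per_file=1000):
    # Each slice file covers 1,000 consecutive playlists, starting at a
    # multiple of 1,000 and ending at that number + 999 (hypothetical helper)
    names = []
    for file_index in range(num_files):
        initial_num = file_index * playlists_per_file
        final_num = initial_num + playlists_per_file - 1
        names.append(f"mpd.slice.{initial_num}-{final_num}.json")
    return names

names = slice_file_names(1000)
print(names[0])   # mpd.slice.0-999.json
print(names[1])   # mpd.slice.1000-1999.json
print(names[-1])  # mpd.slice.999000-999999.json
```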

Next, we have to pull the tracks from each playlist. This requires a nested `for` loop that populates a temporary list, `data`, with 1,000 lists, one per playlist. Each list contains every track in that playlist, and each track carries the identifying information shown below:
   * `track_name` - the name of the track
   * `track_uri` - the Spotify URI of the track
   * `album_name` - the name of the track's album
   * `album_uri` - the Spotify URI of the album
   * `artist_name` - the name of the track's primary artist
   * `artist_uri` - the Spotify URI of track's primary artist
   * `duration_ms` - the duration of the track in milliseconds
   * `pos` - the position of the track in the playlist (zero-based)

Once `data` has been fully populated with 1,000 lists from the nested `for` loop, I convert `data` to a temporary DataFrame, `df`, with dimensions 1,000 by 1. I then use the single column in `df` to create a new column that represents each playlist as a list of tracks with only one identifying feature per track, as opposed to all of the identifying features listed above. I chose `track_uri` rather than `track_name` as that identifying feature so I could pull additional track data from Spotify's API later if needed.

Since memory consumption is an issue with this dataset, using a single identifying feature for a track minimized this problem tremendously. As you will find, I also took additional measures throughout this section of the notebook to reduce memory consumption as best as I could.

The final step to the `for` loop before iterating to the next JSON file is to concatenate `df` with `final_df`, essentially adding the list of tracks for each playlist from the currently open JSON file to the final DataFrame.
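Calling `pd.concat` inside the loop copies the growing DataFrame on every iteration, which becomes slow as it approaches 1,000,000 rows. One common alternative is to collect the per-file DataFrames in a list and concatenate them once at the end. A minimal sketch on a made-up two-playlist slice (`mock_slice` is a stand-in; real slices hold 1,000 playlists and each track carries more fields):

```python
import pandas as pd

# Hypothetical stand-in for one MPD slice file
mock_slice = {
    "playlists": [
        {"tracks": [{"track_uri": "spotify:track:A", "pos": 0},
                    {"track_uri": "spotify:track:B", "pos": 1}]},
        {"tracks": [{"track_uri": "spotify:track:B", "pos": 0}]},
    ]
}

frames = []
for _ in range(2):  # pretend we read two slice files
    # Keep only the `track_uri` of each track, one list per playlist
    uris = [[t["track_uri"] for t in p["tracks"]] for p in mock_slice["playlists"]]
    frames.append(pd.Series(uris, name="track_uris").to_frame())

# Concatenate once at the end and renumber the rows 0..n-1
final_df = pd.concat(frames, ignore_index=True)
print(final_df.shape)  # (4, 1)
```

With `ignore_index=True`, the separate index-reset step is no longer needed.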

**NOTICE**: This block of code will take a while to run.

In [None]:
# DO NOT CHANGE THESE VALUES!!!
initial_num = 0
final_num = 999
incrementer = 1000

# If memory is a limitation, reduce the `num_files` as needed.
num_files = 1000

# Declaring empty DataFrame, `final_df`
# This DataFrame will consist of the full amount of playlists (aka 1,000,000)
# and one column that consists of lists of tracks pertaining to each playlist.
final_df = pd.DataFrame()

# The following `for` loop is used to iterate through each JSON file and populate `final_df`
for file_index in range(0, num_files):
    # `print` function shows the progress of the `for` loop
    print(file_index)
    
    # Declaring empty list, `data`
    data = []
    
    # Opening the JSON file; the `with` block closes it automatically
    with open(f'./spotify_million_playlist_dataset/data/mpd.slice.{initial_num}-{final_num}.json') as f:
        # Creating a dictionary, `d`, from the JSON data contained in `f`
        d = json.load(f)
    
    # The following `for` loop is used to populate `data` with 1,000 lists.
    # Each list pertains to one playlist and consists of every track in the playlist.
    # Additionally, each track has identifying information pertaining to it.
    for playlist in d['playlists']:
        data.append(playlist['tracks'])
    
    # Converting `data` from a list to a DataFrame with dimensions 1,000 by 1
    df = pd.DataFrame(pd.Series(data))
    df.rename(columns={0: "tracks"}, inplace=True)
    
    # Creating an additional column within `df` that will include lists of tracks with one identifying feature.
    # The primary identifying feature chosen: `track_uri`
    df['track_uris'] = df['tracks'].map(lambda x: [track['track_uri'] for track in x])
    
    # Dropping first column that is no longer needed and consequently reduces memory consumption
    df.drop(columns='tracks', inplace=True)
    
    # Concatenating `final_df` with `df`
    final_df = pd.concat([final_df, df])
    
    # Incrementing initial and final playlist numbers in order to select next JSON file
    initial_num += incrementer
    final_num += incrementer

# Reducing memory used by the following variables
d = {}
data = []
df = pd.DataFrame()

In [None]:
# Output Expectation: (1000000, 1)
final_df.shape

In [None]:
# Reset indices to `final_df`
final_df.reset_index(drop=True, inplace=True)

In [None]:
# Here's a good visual representation of the DataFrame in its current state
final_df

### Multilabel Binarizer

`final_df` looks perfect so far! Just one more step to create the final utility matrix we desire. This will require the use of a multilabel binarizer to create a Compressed Sparse Row (CSR) matrix that will act as our final utility matrix.

A multilabel binarizer is well suited to what we want to accomplish. It creates one column for every unique track in the entire dataset and, for each playlist (row), marks a track's column with a 1 (yes) if the track appears in that playlist or a 0 (no) if it does not. Additionally, scikit-learn's `MultiLabelBinarizer` can output a CSR matrix when its `sparse_output` parameter is set to `True`. Since almost all of the elements in our utility matrix will be zero-valued, a CSR matrix, which stores only the nonzero entries, is an ideal output data type for reducing memory consumption.
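To see what `MultiLabelBinarizer` produces, here is a sketch on three made-up playlists. The toy URIs are illustrative only; the real input is the `track_uris` column of `final_df`:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Three toy "playlists", each a list of track URIs
playlists = [
    ["spotify:track:A", "spotify:track:B"],
    ["spotify:track:B", "spotify:track:C"],
    ["spotify:track:A"],
]

mlb = MultiLabelBinarizer(sparse_output=True)
U_toy = mlb.fit_transform(playlists)  # scipy CSR matrix, rows = playlists

# `classes_` holds the column labels (sorted track URIs)
print(list(mlb.classes_))  # ['spotify:track:A', 'spotify:track:B', 'spotify:track:C']
print(U_toy.shape)         # (3, 3)
print(U_toy.toarray())
# [[1 1 0]
#  [0 1 1]
#  [1 0 0]]
```

The column order of the matrix follows `mlb.classes_`, which is why the notebook saves `classes_` alongside the matrix.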

**NOTICE**: This block of code will take a while to run.

In [None]:
mlb = MultiLabelBinarizer(sparse_output=True)

U = mlb.fit_transform(final_df.pop('track_uris'))

U

Notice that the dimensions of `U` are exactly what we had hoped for: 1,000,000 by 2,262,292!

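Note that `sys.getsizeof` reports only 48 bytes here. That figure is the size of the Python object wrapping the matrix, not of the arrays that store its nonzero values and indices, so it greatly understates the CSR matrix's real memory footprint.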
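`sys.getsizeof` only counts the Python wrapper object of a SciPy sparse matrix. To measure the actual footprint, sum the `nbytes` of the three NumPy arrays behind a CSR matrix: `data`, `indices`, and `indptr`. A minimal sketch on a small random matrix standing in for `U`:

```python
import sys

import numpy as np
from scipy.sparse import csr_matrix

# Small random binary stand-in for `U`: 1,000 x 5,000 with ~1% nonzeros
rng = np.random.default_rng(0)
U_small = csr_matrix((rng.random((1000, 5000)) < 0.01).astype(np.int64))

# Size of the Python wrapper object only
wrapper_bytes = sys.getsizeof(U_small)

# The payload lives in three NumPy arrays
csr_bytes = U_small.data.nbytes + U_small.indices.nbytes + U_small.indptr.nbytes

print(wrapper_bytes, csr_bytes)
```

Running the same sum on `U` gives a realistic number to compare against the DataFrame version later on.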

In [None]:
# Output Expectation: 48 (bytes); this counts only the Python wrapper object, not the matrix data
sys.getsizeof(U)

To avoid rerunning the lengthy block of code above, I'm going to save the CSR matrix as an NPZ file and the lists of tracks and playlists as NPY files in a folder called `tmp`, so I can simply load them in the modeling section later. This will save lots of time down the line.

In [None]:
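import os

# Create the `tmp` folder if it does not already exist
os.makedirs('./tmp', exist_ok=True)
save_npz('./tmp/U.npz', U)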

In [None]:
tracks = mlb.classes_
np.save('./tmp/tracks.npy', tracks)

In [None]:
playlists = np.asarray(final_df.index)
np.save('./tmp/playlists.npy', playlists)

Although we have a utility matrix that looks promising, let's still convert the CSR matrix to a DataFrame in an effort to visually verify that the CSR matrix is a correct representation of the data.

**NOTICE**: This block of code will take a while to run.

In [None]:
final_df = pd.DataFrame.sparse.from_spmatrix(U, index=playlists, columns=tracks)

In [None]:
# Output Expectation: (1000000, 2262292)
final_df.shape

In [None]:
# Here's a good visual representation of the utility matrix in its final state
final_df

In [None]:
# Output Expectation: 6 occurrences of this specific song
final_df['spotify:track:0002yNGLtYSYtc0X6ZnFvp'].value_counts()

The data in the DataFrame version of the utility matrix appears to be a correct representation of the dataset, which implies that the CSR matrix is as well. It's now time to move on to the modeling section!

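Side Note: the memory used by the DataFrame version of our utility matrix can be measured below. For a fair comparison, compare it with the CSR matrix's true footprint (the combined `nbytes` of `U.data`, `U.indices`, and `U.indptr`) rather than with the 48 bytes reported by `sys.getsizeof(U)`, which counts only the Python wrapper object.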

In [None]:
# # NOTICE: This block of code will take a while to run.
# # Output Expectation: 531718224 (bytes)
# sys.getsizeof(final_df)

## Modeling
***
The goal of this section is to create a musical track recommendation system by using the utility matrix created from the Preprocessing section of this notebook. Non-negative Matrix Factorization (NMF) and cosine similarity will be employed by the system to generate the track recommendations.

### Spotify Web API
In order for all code in this notebook to execute properly, you will need your own unique Client ID and Client Secret. Here is how to create an app on the Spotify for Developers Dashboard to obtain them:
   * Create a Spotify profile or sign in with your Spotify credentials [here](https://developer.spotify.com/dashboard/applications).
   * On the Spotify for Developers Dashboard, click the "Create An App" button and fill in/agree to all items.
   * Your unique Client ID and Client Secret will then be displayed within your newly created app.

In [None]:
# Import entire modules
import matplotlib.pyplot as plt
import spotipy
 
# Import specific functions from modules
from scipy.sparse import load_npz
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity
from spotipy.oauth2 import SpotifyClientCredentials

####################################################################################
######### Provide your unique Client ID and Client Secret as strings below #########
####################################################################################
SPOTIPY_CLIENT_ID = ''
SPOTIPY_CLIENT_SECRET = ''
####################################################################################

# Establish Client Credentials Flow
auth_manager = SpotifyClientCredentials(SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET)
sp = spotipy.Spotify(auth_manager=auth_manager)

### Load Preprocessed Data

Load in the utility matrix, list of track URIs, and list of playlists:

In [None]:
# Output Expectation: 1000000x2262292 sparse matrix
U = load_npz('./tmp/U.npz')

U

Notice that only 65,464,776 elements are stored in the utility matrix, whereas `stats.txt` reports a total of 66,346,428 tracks across all playlists, a difference of 881,652 (~1.3%). A likely cause is that `MultiLabelBinarizer` records only whether a track is present in a playlist, so a track that appears more than once in the same playlist is stored as a single 1. The difference is not majorly significant, but I found it interesting enough to point out and perhaps investigate further at a later time.
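`MultiLabelBinarizer` marks presence only, so repeated labels within one sample collapse into a single nonzero entry. A toy check with a made-up playlist that lists the same track twice:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# One toy playlist that contains track A twice
playlists = [["spotify:track:A", "spotify:track:B", "spotify:track:A"]]

total_entries = sum(len(p) for p in playlists)  # raw track entries
U_toy = MultiLabelBinarizer(sparse_output=True).fit_transform(playlists)

print(total_entries, U_toy.nnz)  # 3 2
```

Counting duplicate URIs per playlist in the raw JSON files would be one way to confirm whether duplicates account for the full 881,652 difference.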

In [None]:
# Output Expectation for length of `tracks`: 2262292
tracks = np.load('./tmp/tracks.npy', allow_pickle=True)

display(len(tracks))
display(tracks)

In [None]:
# Output Expectation for length of `playlists`: 1000000
playlists = np.load('./tmp/playlists.npy', allow_pickle=True)

display(len(playlists))
display(playlists)

### Non-negative Matrix Factorization (NMF)

Since there are so many tracks and playlists in the utility matrix, a dimensionality reduction algorithm is needed, which is where NMF comes in. NMF approximately factorizes a matrix with non-negative elements (in our case, the utility matrix) into the product of two smaller matrices, `H` and `W`, whose elements are also non-negative. Because every element is non-negative, each playlist and each track is described as an additive mix of hidden features, which makes the factorized matrices easier to interpret.

<center> $$U \approx HW$$ </center>

The shared inner dimension of the two factorized matrices is much smaller than either dimension of the utility matrix, and it is in this dimension that interesting hidden features can be identified. The size of this dimension (`n_components`) can be tuned, and this is where most of my experimentation took place. I tried values of `n_components` from 2 to 10 and noticed that the algorithm tended to associate each hidden feature with a major genre. When the number of hidden features was too small or too large, however, the features were less distinct: looking up the actual genres of the top tracks in each hidden feature showed the same genres overlapping across features. The algorithm seemed to separate tracks by genre best with 4 hidden features, which is why `n_components` is set to 4 below.
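Inspecting genres, as described above, was the selection criterion used here. A complementary quantitative signal is scikit-learn's `reconstruction_err_` attribute, which typically shrinks as `n_components` grows, so one looks for the point where adding components stops helping much. A sketch on a small random binary matrix standing in for `U` (the matrix and the `init`/`max_iter` settings are assumptions chosen to keep it fast):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import NMF

# Small random binary playlist-by-track matrix as a stand-in for `U`
rng = np.random.default_rng(1)
U_small = csr_matrix((rng.random((200, 300)) < 0.05).astype(float))

errors = {}
for k in range(2, 11):
    model = NMF(n_components=k, init="nndsvda", random_state=1, max_iter=300)
    model.fit(U_small)
    # Frobenius norm of the difference between U_small and its approximation
    errors[k] = model.reconstruction_err_

for k, err in errors.items():
    print(k, round(err, 3))
```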

**NOTICE**: This block of code will take a while to run.

In [None]:
n_components = 4

model = NMF(n_components, verbose=10, random_state=1)
H = model.fit_transform(U)
W = model.components_

np.save(f'./tmp/W_{n_components}.npy', W)
np.save(f'./tmp/H_{n_components}.npy', H)

In [None]:
W = np.load(f'./tmp/W_{n_components}.npy', allow_pickle=True)
H = np.load(f'./tmp/H_{n_components}.npy', allow_pickle=True)

In [None]:
# Output Expectation for `W.shape`: (4, 2262292)
# Output Expectation for `H.shape`: (1000000, 4)
display(W.shape)
display(H.shape)

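The utility matrix has now been factorized into two matrices, `W` and `H`. `W` represents hidden features/genres vs. tracks, whereas `H` represents playlists vs. hidden features/genres, so their product `H @ W` approximates `U`. Note that these names are the reverse of scikit-learn's convention, where the output of `fit_transform` is called `W` and `components_` is called `H`.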
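The shapes and the order of multiplication are easy to check on a toy utility matrix. A sketch using the same naming as this notebook (`H_toy` from `fit_transform`, `W_toy` from `components_`); the 5 x 6 matrix is made up:

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy utility matrix: 5 playlists x 6 tracks
U_toy = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0],
], dtype=float)

model = NMF(n_components=2, init="nndsvda", random_state=1, max_iter=500)
H_toy = model.fit_transform(U_toy)  # playlists x hidden features
W_toy = model.components_           # hidden features x tracks

# Playlist factors times track factors approximates the utility matrix
U_approx = H_toy @ W_toy
print(H_toy.shape, W_toy.shape, U_approx.shape)  # (5, 2) (2, 6) (5, 6)
```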

The following function, `interpret_track`, inserts line breaks into long track names and between artist names in order to clean up the text displayed by the plotting function below, `plot_top_tracks`.

In [None]:
def interpret_track(track_name, track_artists):
    
    track_name_counter = 0
    updated_track_name = ''
    
    track_name_list = track_name.split(' ')
    for word in track_name_list:
        track_name_counter += len(word)
        if track_name_counter <= 20:
            updated_track_name += word + " "
        else:
            # Start a new line and count the word that begins it
            track_name_counter = len(word)
            updated_track_name += "\n" + word + " "
            
    
    interpretable_track = ''
    
    if len(track_artists) == 1:
        interpretable_track = updated_track_name + "\nby " + track_artists[0]
    elif len(track_artists) == 2:
        interpretable_track = updated_track_name + "\nby " + track_artists[0] + "\nand " + track_artists[1]
    else:
        for index in range(len(track_artists)):
            if index == 0:
                interpretable_track = updated_track_name + "\nby " + track_artists[index] + ","
            elif (index > 0) and (index < len(track_artists) - 1):
                interpretable_track += "\n" + track_artists[index] + ","
            else:
                interpretable_track += "\nand " + track_artists[index]
                
    return interpretable_track

The following function, `plot_top_tracks`, takes the factorized matrix, `W`, which consists of hidden features associated with tracks, and plots the top tracks in each hidden feature. This function was adapted from a Jupyter Notebook found within Praveen Gowtham's GitHub repository located [here](https://github.com/admveen/NMF_tutorial/blob/master/CVID19_Analysis.ipynb).
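The top-track selection in `plot_top_tracks` relies on a NumPy slicing idiom: `argsort` sorts ascending, and stepping backwards from the end picks the largest values in descending order. A small illustration with made-up weights:

```python
import numpy as np

values = np.array([0.2, 0.9, 0.1, 0.5, 0.7])  # made-up hidden-feature weights
num_top = 3

# Walk backwards from the last (largest) position, taking `num_top` items
top_index = values.argsort()[: -num_top - 1 : -1]

print(top_index)          # [1 4 3]
print(values[top_index])  # [0.9 0.7 0.5]
```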

In [None]:
def plot_top_tracks(n_components, W, tracks, num_top_tracks, title):
    
    colors = ['seagreen', 'chocolate', 'darkblue', 'firebrick']
    
    fig, axes = plt.subplots(1, n_components, figsize=(30, 15), sharex=True)
    axes = axes.flatten()
    
    for index, values in enumerate(W):
        top_tracks_index = values.argsort()[: -num_top_tracks - 1 : -1]
        top_tracks = [tracks[i] for i in top_tracks_index]
                
        # Pull track names and artist names using Spotify Web API
        interpretable_top_tracks = []
        for top_track in top_tracks:
            track = sp.track(top_track)
            track_name = track['name']
            track_artists = [artist['name'] for artist in track['artists']]
            interpretable_top_tracks.append(interpret_track(track_name, track_artists))
        weights = values[top_tracks_index]

        ax = axes[index]
        ax.barh(interpretable_top_tracks, weights, height=0.6, color=colors[index])
        ax.set_title(f"Genre {index + 1}", fontdict={"fontsize": 20})
        ax.invert_yaxis()
        ax.tick_params(axis="both", which="major", labelsize=15)
        for border_line in "top right".split():
            ax.spines[border_line].set_visible(False)

    # The figure title only needs to be set once, after all subplots are drawn
    fig.suptitle(title, fontsize=25)
    plt.subplots_adjust(top=0.90, bottom=0.05, wspace=0.90, hspace=0.3)
    plt.savefig('./images/feature_importance.jpg', dpi=300, transparent=True)
    plt.show()

In [None]:
num_top_tracks = 10

plot_top_tracks(n_components, W, tracks, num_top_tracks, "Genres in NMF Model")

Notice that Genre 1 is mainly associated with Hip-Hop/Rap, Genre 2 with Pop/EDM, Genre 3 with Rock, and Genre 4 with Country.

In [None]:
genres = ['Hip-Hop/Rap', 'Pop/EDM', 'Rock', 'Country']
colors = ['seagreen', 'chocolate', 'darkblue', 'firebrick']

### Selecting a Playlist

Now, let's pull a specific playlist that we would like to recommend tracks for. I personally enjoy playlist 253,699, which consists of EDM music.

In [None]:
playlist_0 = 253699

The following function, `track_uris_from_playlist`, returns a list of track URIs and a list of track indices associated with the tracks in `playlist_0`. The parameters of the function are the playlist number, `playlist_0`, and the list of all track URIs, `tracks`. The track URIs are useful for pulling track names and artist names from the Spotify API, which is what the next function is designed for.
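`track_uris_from_playlist` converts the playlist's row to a dense array with `toarray()`, which allocates all 2,262,292 columns just to locate the nonzero entries. A CSR matrix already stores each row's column indices in `indices`, delimited by `indptr`, so they can be read directly. A sketch on a toy matrix; `row_column_indices` is a hypothetical helper, and its result matches `np.where(row == 1)` only because the utility matrix is binary:

```python
import numpy as np
from scipy.sparse import csr_matrix

U_toy = csr_matrix(np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]))

def row_column_indices(U, row):
    # In CSR format, the column indices of row `row` are stored in
    # U.indices between U.indptr[row] and U.indptr[row + 1]
    start, end = U.indptr[row], U.indptr[row + 1]
    return np.sort(U.indices[start:end])

print(row_column_indices(U_toy, 2))  # [0 1 3]
```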

In [None]:
def track_uris_from_playlist(playlist_0, tracks):
    
    np_playlist_0 = U[playlist_0].toarray()
    playlist_track_indices = np.where(np_playlist_0[0] == 1)[0]
    tracks_in_playlist_0 = [tracks[i] for i in playlist_track_indices]
    
    return (tracks_in_playlist_0, playlist_track_indices)

The following function, `interpretable_tracks_in_playlist`, returns a list of tracks from `playlist_0` with their names and artists by using the Spotify API.
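`interpretable_tracks_in_playlist` calls `sp.track` once per track, so every track costs one HTTP request. spotipy also provides `Spotify.tracks`, which wraps the Spotify Web API's "Get Several Tracks" endpoint (up to 50 IDs per request). A sketch of batched lookups, assuming an `sp` client like the one created earlier; `batched` and `fetch_track_metadata` are hypothetical helper names:

```python
def batched(items, batch_size=50):
    # Split `items` into consecutive chunks of at most `batch_size`
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def fetch_track_metadata(sp, track_uris, batch_size=50):
    # One request per chunk instead of one per track; the response of
    # `sp.tracks` is a dict whose 'tracks' key holds the track objects
    results = []
    for chunk in batched(list(track_uris), batch_size):
        results.extend(sp.tracks(chunk)['tracks'])
    return results
```

Each returned track object has the same `name` and `artists` fields that the notebook already reads from `sp.track`, so the results could be passed to `interpret_track` unchanged.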

In [None]:
def interpretable_tracks_in_playlist(playlist_0, tracks):

    interpretable_tracks_in_playlist_0 = []
    
    tracks_in_playlist_0 = track_uris_from_playlist(playlist_0, tracks)[0]
    for track_uri in tracks_in_playlist_0:
        track = sp.track(track_uri)
        track_name = track['name']
        track_artists = [track['artists'][index]['name'] for index in range(len(track['artists']))]
        interpretable_tracks_in_playlist_0.append(interpret_track(track_name, track_artists))

    return interpretable_tracks_in_playlist_0

Let's now take a look at the tracks in our playlist.

In [None]:
for index, track in enumerate(interpretable_tracks_in_playlist(playlist_0, tracks)):
    print('Track', index+1, ':')
    print(track)
    print("\n")

They sure are EDM tracks.

The following function, `playlist_verification`, verifies that all of the tracks in `playlist_0` are present. It does this by comparing the tracks in `playlist_0` to the tracks found in the original JSON source file. This is useful because the utility matrix stores slightly fewer entries than the raw dataset. Since the utility matrix records each track at most once per playlist, the comparison is made on the sets of unique track URIs. This function assures you that the playlist you wish to generate recommendations for is accurate when compared to its original source.

In [None]:
def playlist_verification(playlist_0, tracks):
    
    # Unique track URIs for `playlist_0` according to the utility matrix
    tracks_in_playlist_0 = set(track_uris_from_playlist(playlist_0, tracks)[0])
    
    # Each slice file holds 1,000 playlists: locate the file and the playlist's
    # position within it with integer arithmetic (works for any playlist number)
    start = (playlist_0 // 1000) * 1000
    offset = playlist_0 - start
    
    with open(f'./spotify_million_playlist_dataset/data/mpd.slice.{start}-{start + 999}.json') as f:
        d = json.load(f)
    
    # Unique track URIs for the same playlist according to the JSON source file
    tracks_in_playlist_file = {track['track_uri'] for track in d['playlists'][offset]['tracks']}
    
    return tracks_in_playlist_0 == tracks_in_playlist_file

In [None]:
# Output Expectation: True
# If False, I recommend choosing a different playlist.
playlist_verification(playlist_0, tracks)

Great! Now that we have verified our playlist and seen firsthand that it contains EDM tracks, let's look at the hidden feature/genre distribution of the playlist to see whether our model placed these tracks in the correct genre. This requires accessing `H`, the factorized matrix that consists of hidden features associated with playlists.

In [None]:
H_0 = H[playlist_0]

The following function, `plot_playlist_genre_distribution`, plots the genre distribution within the playlist.
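The distribution is simply each hidden-feature weight divided by the playlist's total weight, expressed as a percentage and then sorted in ascending order for the horizontal bars. A sketch with made-up weights (`H_0_toy` is illustrative, not a value from the model):

```python
import numpy as np

genres = ['Hip-Hop/Rap', 'Pop/EDM', 'Rock', 'Country']
H_0_toy = np.array([0.01, 0.12, 0.03, 0.04])  # made-up hidden-feature weights

# Share of each hidden feature in the playlist's total weight, in percent
percent = H_0_toy / H_0_toy.sum() * 100

# Ascending order, matching the bar order used by the plotting function
order = np.argsort(percent)
for i in order:
    print(f"{genres[i]}: {percent[i]:.1f}%")
```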

In [None]:
def plot_playlist_genre_distribution(H_0, genres, colors):
    
    H_0_percent_distribution = [value/np.sum(H_0) for value in H_0]

    playlist_genre_distribution = []
    playlist_colors = []
    
    for index in np.array(H_0_percent_distribution).argsort():
        playlist_genre_distribution.append(genres[index])
        playlist_colors.append(colors[index])
    
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.barh(y=playlist_genre_distribution, width=np.sort(H_0_percent_distribution)*100, height=0.6, color=playlist_colors)
    ax.set_title("Genre Distribution (%) Within Playlist", fontdict={"fontsize": 14})
    ax.set_xticks([num for num in range(0, 110, 10)])
    for border_line in "top right".split():
        ax.spines[border_line].set_visible(False)
    
    plt.show()

In [None]:
plot_playlist_genre_distribution(H_0, genres, colors)

This confirms playlist 253,699 does consist of mainly Pop/EDM tracks.

### Cosine Similarity
Now that we have selected our playlist and understand the genre distribution within the playlist, it is time to perform pair-wise cosine similarity between our playlist and all tracks. In doing this, we can find tracks that are most similar to our playlist and ultimately recommend those.

Cosine similarity measures the cosine of the angle between two vectors, computed as the dot product of the two vectors divided by the product of their magnitudes:

<center> $$ \cos(\theta_{track}) = \frac{H_0 \cdot W^T_{track}}{\lVert H_0 \rVert \, \lVert W^T_{track} \rVert} $$ </center>

Notice in the equation that we need to transpose the factorized matrix `W`. Each column of `W` holds one track's hidden-feature weights, so transposing it turns every track into a row vector of the same length as `H_0`, avoiding any dimension incompatibility when computing pairwise cosine similarity between our playlist and each track. After transposing `W`, cosine similarity can be computed and the cosine values sorted from highest to lowest (because all elements are non-negative, the values range from 0 to 1 in our case). The indices of the sorted cosine values correspond directly to the indices of `tracks`, which holds all of the Spotify track URIs.

From there, we can add tracks to a recommendation list, as long as the track is not already in the playlist and has a popularity score above 5 on Spotify's 0 to 100 popularity scale. I included the popularity check because many of the recommended tracks were, subjectively, not good, and most of these poor-sounding tracks had a Spotify popularity score between 0 and 5, which is why I chose 5 as the cutoff. This is precisely what the function `track_recommendations` performs.
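The ranking and playlist-exclusion steps can also be expressed without a Python loop: compute all cosine similarities at once, set the scores of tracks already in the playlist to negative infinity, and take the highest remaining scores. A sketch on made-up factors; `rank_candidates` is a hypothetical helper, and the Spotify popularity filter is left out because it requires API calls:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy factors: 3 hidden features x 6 tracks (each column is a track vector)
W_toy = np.array([
    [0.9, 0.8, 0.0, 0.1, 0.0, 0.7],
    [0.0, 0.1, 0.9, 0.8, 0.0, 0.2],
    [0.1, 0.0, 0.0, 0.1, 0.9, 0.0],
])
H_0_toy = np.array([1.0, 0.1, 0.0])  # a playlist leaning on hidden feature 1
in_playlist = np.array([0, 1])       # tracks already in the playlist

def rank_candidates(H_0, W, in_playlist, k):
    # Cosine similarity of the playlist vector against every track (rows of W.T)
    scores = cosine_similarity(H_0.reshape(1, -1), W.T)[0]
    # Exclude tracks already in the playlist before ranking
    scores[in_playlist] = -np.inf
    # Indices of the k highest scores, best first
    return np.argsort(scores)[::-1][:k]

print(rank_candidates(H_0_toy, W_toy, in_playlist, 2))  # [5 3]
```

In practice one would take a somewhat larger `k` than needed and then apply the popularity filter to that shortlist.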

In [None]:
def track_recommendations(num_recommendations, W, H_0, playlist_0, tracks):
    
    # Transpose `W` so each row is one track's hidden-feature vector
    W_T = np.transpose(W)
    
    # Perform cosine similarity between `H_0` and every row of `W_T`
    cos_sim = cosine_similarity([H_0], W_T)
    
    # Sort values in `cos_sim` from highest to lowest,
    # and obtain indices which correlate to track indices
    ranked_track_indices = cos_sim[0].argsort()[::-1]
    
    # Look up the tracks already in `playlist_0` once, as a set for fast membership checks
    playlist_track_indices = set(track_uris_from_playlist(playlist_0, tracks)[1])
    
    final_recommendation_uris = []
    interpretable_final_recommendations = []
    
    for track_index in ranked_track_indices:
        # Stop once `num_recommendations` tracks have been collected
        if len(final_recommendation_uris) == num_recommendations:
            break
        
        # Skip tracks that are already in `playlist_0`
        if track_index in playlist_track_indices:
            continue
        
        # Fetch the track once and reuse the response for both the
        # popularity filter and the human-readable name
        track_uri = tracks[track_index]
        track = sp.track(track_uri)
        
        # Keep only tracks with a Spotify popularity score above 5 (scale 0-100)
        if track['popularity'] > 5:
            track_artists = [artist['name'] for artist in track['artists']]
            final_recommendation_uris.append(track_uri)
            interpretable_final_recommendations.append(interpret_track(track['name'], track_artists))

    return final_recommendation_uris, interpretable_final_recommendations

## Results of Model
***

In [None]:
num_recommendations = 10

recommendation_uris, recommendations = track_recommendations(num_recommendations, W, H_0, playlist_0, tracks)
for index, track in enumerate(recommendations):
    print('Track Recommendation', index+1, ':')
    print(recommendation_uris[index])
    print(track)
    print("\n")

Interestingly, of the top 10 recommended tracks, tracks 1, 3, 4, 5, and 9 were EDM-related, and tracks 2, 5, 7, 8, and 10 were Spanish-language tracks. This isn't ideal, but of the 5 EDM tracks I subjectively like 3 (tracks 1, 3, and 4), and I had never heard of any of the artists before, which is great for discovering new artists. Unfortunately, the Spotify API does not provide a track language attribute, which would be ideal for identifying tracks that are not in English and excluding them from the recommendation list.

I encourage you to try out other playlists and see what the model recommends. Simply change the `playlist_0` and `num_recommendations` variables and run the two blocks of code below to generate track recommendations.

In [None]:
playlist_0 = 869717

H_0 = H[playlist_0]

print("Playlist Verified:", playlist_verification(playlist_0, tracks))

plot_playlist_genre_distribution(H_0, genres, colors)

# # Uncomment the code below if you wish to see the names of the tracks and their artists
# for index, track in enumerate(interpretable_tracks_in_playlist(playlist_0, tracks)):
#     print('Track', index+1, ':')
#     print(track)
#     print("\n")

In [None]:
num_recommendations = 10

recommendation_uris, recommendations = track_recommendations(num_recommendations, W, H_0, playlist_0, tracks)
for index, track in enumerate(recommendations):
    print('Track Recommendation', index+1, ':')
    print(recommendation_uris[index])
    print(track)
    print("\n")

## Conclusion
***
Although the model is not great, as the track recommendations are a little wonky, consider what data was used: playlists. Not a single characteristic of a track was actually utilized. No audio features, no track names, no artist names, no release info, etc. Just playlists. This was a limitation I knew coming into this project, but, considering that simple fact, this model works surprisingly well in that it can recommend at least a few tracks that are similar to those within a given playlist.

## Next Steps
***
When time permits, I would like to introduce track characteristics into this model and see how that would affect the model's performance. Spotify's Web API has just this capability as well when utilizing the function `audio_features` on a track. In addition, it could be useful to explore supervised learning techniques and build a classification model using Spotify's track popularity metric to predict top hits. Perhaps the combination of the two models could yield fascinating results.

## Appendix
***
The following function, `plot_top_playlists`, takes the transpose of the factorized matrix `H` (hidden features vs. playlists) and plots the top playlists in each hidden feature. This function can be particularly useful for identifying playlists that strongly lean towards a specific hidden feature. This function was adapted from a Jupyter Notebook found within Praveen Gowtham's GitHub repository located [here](https://github.com/admveen/NMF_tutorial/blob/master/CVID19_Analysis.ipynb).

In [None]:
def plot_top_playlists(n_components, H_T, playlists, num_top_playlists, title):
    
    colors = ['seagreen', 'chocolate', 'darkblue', 'firebrick']
    
    fig, axes = plt.subplots(1, n_components, figsize=(20, 10), sharex=True)
    axes = axes.flatten()
    
    for index, values in enumerate(H_T):
        top_playlists_index = values.argsort()[: -num_top_playlists - 1 : -1]
        top_playlists = [str(playlists[i]) for i in top_playlists_index]
        weights = values[top_playlists_index]
        
        ax = axes[index]
        ax.barh(top_playlists, weights, height=0.6, color=colors[index])
        ax.set_title(f"Genre {index + 1}", fontdict={"fontsize": 20})
        ax.invert_yaxis()
        ax.tick_params(axis="both", which="major", labelsize=15)
        for border_line in "top right".split():
            ax.spines[border_line].set_visible(False)

    # The figure title only needs to be set once, after all subplots are drawn
    fig.suptitle(title, fontsize=25)
    plt.subplots_adjust(top=0.90, bottom=0.05, wspace=0.90, hspace=0.3)
    plt.show()

H_T = np.transpose(H)
plot_top_playlists(n_components, H_T, playlists, 10, "Genres in NMF Model")