___________
# **Music Data Analysis: Spotify API with Pandas, Altair, and NetworkX**

This notebook illustrates several ways to work with the Spotify API using Spotipy, a Python package designed to enable user-friendly (ish) interactions with Spotify's music metadata. In Part I of this notebook, we will use Spotipy and Pandas to **set up a DataFrame containing a collection of songs (tracks)** found by a playlist ID. Then, we will investigate ways to **visually represent and compare** this collection using Altair (Part II) and explore the basics of **network graph visualization** using Pyvis and NetworkX.

You can learn more about these resources here:
* [Spotify API](https://developer.spotify.com/documentation/web-api/)
* [Spotipy](https://spotipy.readthedocs.io/en/master/#)
* [Pandas](https://pandas.pydata.org/)
* [Altair](https://altair-viz.github.io/)
* [Pyvis](https://pyvis.readthedocs.io/en/latest/)
* [NetworkX](https://networkx.org/)

### Brief Introduction: Spotify, APIs, Spotify API

As many of you know, **Spotify** is a music streaming service founded in 2006 and launched in 2008. The service has about 182 million subscribers and hosts more than 70 million tracks. In 2014, Spotify released the **Spotify API**, a web-based interface that allows anyone with a Spotify account to search, analyze, and manipulate Spotify's music metadata. More generally, **an API** (application programming interface) is a piece of software that enables two or more programs to talk to each other. You can learn more about APIs [here](https://en.wikipedia.org/wiki/API).

By going through this notebook, you'll be able to request Spotify API access for your personal notebook and perform all sorts of analyses on the tracks, users, artists, albums, and playlists that interest you. While some of the material covered in this notebook is very basic, some elements might seem quite puzzling. Please don't hesitate to reach out and ask questions.
______

## **Part 1: Setting up**
### Step 1.1: Importing Python Libraries

In [1]:
import pandas as pd
import numpy as np
import random
import altair as alt
import requests
import inspect
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import networkx as nx
import networkx.algorithms.community as nx_comm
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pyvis
from pyvis import network as net
from itertools import combinations
from community import community_louvain
from copy import deepcopy
import plotly.graph_objects as go
import plotly.offline as pyo

### Step 1.2: Providing User Credentials


In order to utilize the functionality of Spotify's API, you'll need to establish a connection between the local endpoint (your laptop) and the API (cloud). To do that, you'll need to create a **web client** (read more [here](https://en.wikipedia.org/wiki/Client_(computing))).

A web client typically requires authentication parameters **(a client ID, or key, and a client secret)**. The Spotify API uses the OAuth 2.0 authorization scheme. As we don't want to trouble you with setting up your own credentials, we have created one common set of login credentials for this course. You can learn more about authentication [here](https://en.wikipedia.org/wiki/OAuth).

Please find the credentials below:

In [2]:
# storing the credentials:
CLIENT_ID = "116bae2a86fd4737862816c5f45d4c36"
CLIENT_SECRET = "4f4a732d83d04cfa94acc26d2b77169f"
my_username = "sx47r9lq4dwrjx1r0ct9f9m09"

# instantiating the client
# source: Max Hilsdorf (https://towardsdatascience.com/how-to-create-large-music-datasets-using-spotipy-40e7242cc6a6)
client_credentials_manager = SpotifyClientCredentials(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

At this point, you should be able to access the API! We can now move on to retrieving and analyzing music metadata.
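
If you later register your own Spotify app, you may prefer not to paste its client ID and secret into the notebook. The sketch below is one possible approach and not part of the original setup: it reads the two values from the environment variables `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET`, which are the names Spotipy itself looks for. The helper name `load_spotify_credentials` is hypothetical.

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) read from environment-style variables.

    `env` defaults to os.environ; passing a plain dict is handy for testing.
    The function name is our own invention, not part of Spotipy.
    """
    env = os.environ if env is None else env
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
    values = [env.get(name) for name in names]

    # Report every missing variable at once instead of failing on the first
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return values[0], values[1]
```

The returned pair can then be passed to `SpotifyClientCredentials(client_id=..., client_secret=...)` in the same way as the hard-coded values in the cell above.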

----------
## **Part 2: Analyzing Playlists**

### Step 2.1: Obtaining Data

We can **get the tracks in a playlist** using the *sp.playlist_items(playlist_id)* method and turning the result into a Pandas DataFrame. The only parameter this call needs is the **playlist ID**; the older *sp.user_playlist_tracks(username, playlist)* method additionally takes a **user ID**. Both IDs can be easily found on the Spotify website or in the Spotify app. Just look in the URL bar and copy the IDs as strings.

In this notebook, we are using the following data:
* "sx47r9lq4dwrjx1r0ct9f9m09": Oleh's **Spotify User ID**. Typically, a Spotify ID is formatted somewhat nicer (e.g. "barackobama"), but Oleh somehow messed his up...
* "7KfWEjHxpcOIkqvDqMW5RV": the **Playlist ID** for one of Oleh's playlists.

Both playlist ID and User ID **can be found in a web browser** when accessing the User's or Playlist's webpage.

* for example, Oleh's Spotify User page can be found at: "https://open.spotify.com/user/sx47r9lq4dwrjx1r0ct9f9m09", and you can see that the User ID is what follows after "...user/", meaning "sx47r9lq4dwrjx1r0ct9f9m09"
* for example, Spotify's featured Pop Mix playlist can be found at: "https://open.spotify.com/playlist/37i9dQZF1EQncLwOalG3K7", and you can find the Playlist ID after "...playlist/", meaning "37i9dQZF1EQncLwOalG3K7"
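
The copy-and-paste step described above can also be done in code. The helper below is a small sketch (the name `spotify_id_from_url` is made up for this notebook) that uses only Python's standard library to pull out whatever follows "user/" or "playlist/" in a Spotify URL. It assumes URLs shaped like the two examples above.

```python
from urllib.parse import urlparse


def spotify_id_from_url(url, kind):
    """Return the ID that follows '/<kind>/' in an open.spotify.com URL.

    `kind` is the path segment before the ID, e.g. "user" or "playlist".
    """
    # urlparse separates the path from any query string such as "?si=..."
    parts = [part for part in urlparse(url).path.split("/") if part]
    if kind not in parts or parts.index(kind) + 1 >= len(parts):
        raise ValueError(f"No '{kind}' ID found in {url!r}")
    return parts[parts.index(kind) + 1]


print(spotify_id_from_url("https://open.spotify.com/user/sx47r9lq4dwrjx1r0ct9f9m09", "user"))
# sx47r9lq4dwrjx1r0ct9f9m09
print(spotify_id_from_url("https://open.spotify.com/playlist/37i9dQZF1EQncLwOalG3K7", "playlist"))
# 37i9dQZF1EQncLwOalG3K7
```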

In [19]:
full_40 = pd.DataFrame(sp.playlist_items("7cmryeqK3ftJ4L6IzobBax"))
sample_10_A = pd.DataFrame(sp.playlist_items("1NppEwvZhkjeG3ZTYoOwVM"))
sample_10_B = pd.DataFrame(sp.playlist_items("2hfOGugGPsjfPTYKlZojom"))



We can take a look at an **individual track** here:

As you can see, each track has **a large number of recorded audio features**. These are typically generated by Spotify and cover various musical aspects, ranging from Loudness to Liveness, from Danceability to Duration, and from Tempo to Time Signature. The feature values are of different **data types**: "key" is an **Integer**, "energy" is a **Float**, "id" is a **String**, and "mode" is a **Boolean** represented as Integer. As you work your way through this notebook, you will discover many options to count, bin, sort, graph, and connect variables and values of different types.
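
Because "key" and "mode" are stored as numbers, it can help to translate them into musical terms. The sketch below assumes Spotify's documented encoding: "key" follows standard pitch-class notation (0 = C, 1 = C#/Db, and so on, with -1 meaning no key was detected), and "mode" is 1 for major and 0 for minor. The function name `describe_key` is our own.

```python
# Pitch classes in the order assumed for Spotify's integer "key" values
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def describe_key(key, mode):
    """Turn the integer "key" and "mode" features into a label like 'A minor'."""
    key = int(key)
    # Anything outside 0..11 (including -1, "no key detected") is reported as unknown
    if key < 0 or key >= len(PITCH_CLASSES):
        return "unknown"
    mode_name = "major" if int(mode) == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {mode_name}"


print(describe_key(0, 1))   # C major
print(describe_key(9, 0))   # A minor
print(describe_key(-1, 1))  # unknown
```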

Consider the function below (courtesy of Max Hilsdorf), which can help us **loop through the items of a playlist and get every track's [audio] features of interest**:

In [4]:
# This function is based on Max Hilsdorf's article
# Source: https://towardsdatascience.com/how-to-create-large-music-datasets-using-spotipy-40e7242cc6a6
def get_audio_features_df(playlist):

    # Columns of the resulting DataFrame: four metadata fields followed by the audio features
    playlist_features_list = ["artist", "album", "track_name", "track_id","danceability","energy","key","loudness","mode", "speechiness","instrumentalness","liveness","valence","tempo", "duration_ms","time_signature"]

    # Collect one dict per track, then build the DataFrame once at the end
    # (this avoids repeatedly concatenating onto an empty DataFrame)
    rows = []

    # Loop through every track in the playlist and extract its features
    for track in playlist["items"]:
        # Create empty dict
        playlist_features = {}
        # Get metadata
        playlist_features["artist"] = track["track"]["album"]["artists"][0]["name"]
        playlist_features["album"] = track["track"]["album"]["name"]
        playlist_features["track_name"] = track["track"]["name"]
        playlist_features["track_id"] = track["track"]["id"]

        # Get audio features; the result is a one-element list whose entry
        # is None when Spotify has no features for the track, so skip those
        audio_features = sp.audio_features(playlist_features["track_id"])[0]
        if audio_features is None:
            continue
        for feature in playlist_features_list[4:]:
            playlist_features[feature] = audio_features[feature]

        rows.append(playlist_features)

    return pd.DataFrame(rows, columns = playlist_features_list)
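
The function above sends one `sp.audio_features()` request per track. As a possible optimization (not part of Max Hilsdorf's original code), several track IDs can be sent in a single request, since the Spotify endpoint accepts up to 100 IDs at a time. The sketch below uses hypothetical helper names (`chunked`, `fetch_audio_features`) and takes the client as a parameter, so the `sp` object from Step 1.2 can be passed in.

```python
def chunked(seq, size):
    """Yield consecutive slices of `seq`, each with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_audio_features(client, track_ids, batch_size=100):
    """Return {track_id: audio_features} using one request per batch of IDs.

    `client` is any object with an audio_features(list_of_ids) method that
    returns one entry per ID, in order (the Spotipy client behaves this way).
    """
    features = {}
    for batch in chunked(list(track_ids), batch_size):
        # Pair each ID with its entry; an entry may be None if no features exist
        for track_id, entry in zip(batch, client.audio_features(batch)):
            features[track_id] = entry
    return features
```

With the notebook's client, this would be called as `fetch_audio_features(sp, list_of_track_ids)`.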

Note: the **playlist parameter** (that is passed in to the get_audio_features_df() function) should be a **DataFrame whose "items" column holds the track objects**. In our case, we have three such collections stored in **full_40**, **sample_10_A**, and **sample_10_B**, which we got from calling sp.playlist_items() on a playlist ID and storing the result as a Pandas DataFrame.

Hence, we run the get_audio_features_df() function on one of these collections, **sample_10_B**, to obtain the **audio features DataFrame** for its tracks.

In [12]:
audio_features_df = get_audio_features_df(sample_10_B)
len(audio_features_df)

10

As the row count above shows, our new DataFrame contains **Spotify's audio features for every track in the provided playlist**, one row per track.
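
One caveat: `sp.playlist_items()` returns a single page of results per call, so "every track" holds only for playlists that fit in one page, as our small example playlists should. For longer playlists, the remaining pages can be followed with Spotipy's `sp.next()` method. The helper below is a sketch with a hypothetical name (`fetch_all_playlist_items`); it takes the client as a parameter so that `sp` can be passed in.

```python
def fetch_all_playlist_items(client, playlist_id):
    """Collect the items of a playlist across all result pages.

    `client` is any object offering playlist_items(playlist_id) and next(page),
    where each page is a dict with an "items" list and a "next" entry that is
    empty on the last page (the shape of Spotipy's paged results).
    """
    page = client.playlist_items(playlist_id)
    items = list(page["items"])
    # Keep following the "next" link until the last page is reached
    while page["next"]:
        page = client.next(page)
        items.extend(page["items"])
    return items
```

Since get_audio_features_df() only reads `playlist["items"]`, the combined list can be handed over as `get_audio_features_df({"items": all_items})`.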

### Step 2.2: Charting Data with Radar Plot

```python
import plotly.graph_objects as go
import plotly.offline as pyo

length = len(audio_features_df)
input_data = audio_features_df.sample(length).copy()
feature_columns = ["danceability", "energy", "speechiness", "liveness", "danceability"]

def createRadarElement(row, feature_cols):
    return go.Scatterpolar(
        r = row[feature_cols].values.tolist(),
        theta = feature_cols,
        mode = 'lines',
        name = row['track_name'])

data = list(input_data.apply(createRadarElement, axis=1, args=(feature_columns, )))
fig = go.Figure(data)
fig.show()
```
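
Note that "danceability" appears both first and last in feature_columns. As far as we can tell, Plotly draws each radar outline as a line through the points in the order given and does not connect the last point back to the first, so repeating the first feature is what closes the shape. A tiny helper (hypothetical name `close_loop`) can do this automatically:

```python
def close_loop(features):
    """Return a copy of `features` with the first entry repeated at the end.

    If the list is empty or already ends with its first entry, it is
    returned unchanged (as a new list).
    """
    features = list(features)
    if features and features[0] != features[-1]:
        features.append(features[0])
    return features


print(close_loop(["danceability", "energy", "speechiness", "liveness"]))
# ['danceability', 'energy', 'speechiness', 'liveness', 'danceability']
```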



### Import Libraries and Define Functions

In [15]:
import plotly.graph_objects as go
import plotly.offline as pyo

# Each feature is one axis of the radar; the first one is repeated at the end to close the outline
feature_columns = ["danceability", "energy", "speechiness", "liveness", "instrumentalness", "valence", "danceability"]

def createRadarElement(row, feature_cols):
    return go.Scatterpolar(
        r = row[feature_cols].values.tolist(), 
        theta = feature_cols, 
        mode = 'lines', 
        name = row['track_name'])

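def radar_for_lists(list_of_lists):
    # list_of_lists is a list of playlist IDs; one radar chart is drawn per playlist
    for item in list_of_lists:
        # Fetch the playlist and compute the audio features of its tracks
        this_list = pd.DataFrame(sp.playlist_items(item))
        audio_features_df = get_audio_features_df(this_list)
        # Shuffle the rows so the traces are drawn in random order
        length = len(audio_features_df)
        input_data = audio_features_df.sample(length).copy()

        # Build one Scatterpolar trace per track and show them in a single figure
        data = list(input_data.apply(createRadarElement, axis=1, args=(feature_columns, )))
        fig = go.Figure(data)
        fig.show()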

In [25]:
list_of_lists = ["1NppEwvZhkjeG3ZTYoOwVM",
                "2hfOGugGPsjfPTYKlZojom",
                "6QWlhkFUKDxhW91gfMisNl",
                "1d4mKuQzM62pmSqrQKU9EX",
                "5NuMif6wvlJeJv4dMuX5eh",
                "4sWTtloXKghMS1cz48f2qI"]

radar_for_lists(list_of_lists)