# Analyzing artist songs

We are going to analyze an artist's songs with the Spotify API, in this case Taylor Swift's.


## Get credentials from Spotify API

Go to your Spotify Dashboard at https://developer.spotify.com/dashboard/ and create a new application, then grab the Client ID and Client Secret Key.

In [None]:
# FILL IN THESE VALUES WITH YOUR OWN KEYS
client_id = ""
client_secret = ""

# Make sure to add this on "Edit Settings" in your Dashboard
redirect_uri = "http://localhost:9000"

After creating the web app, go to “Edit Settings” and add a redirect URI. We aren’t hosting anything, so a localhost URL is enough; just make sure the port doesn’t clash with anything you already have running locally, such as Jupyter Notebook.
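If you would rather not paste keys into the notebook, one option is to read them from environment variables. The sketch below is only an illustration: `load_spotify_credentials` is a hypothetical helper, and the variable names are the ones Spotipy itself looks for by default. The demo uses a stand-in dictionary with placeholder values so it runs without touching your real environment.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Read the three Spotify settings from a mapping, failing early if any is missing."""
    # These names match the environment variables Spotipy reads by default.
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(env[name] for name in names)


# Stand-in mapping with placeholder values (not real keys).
example_env = {
    "SPOTIPY_CLIENT_ID": "your-client-id",
    "SPOTIPY_CLIENT_SECRET": "your-client-secret",
    "SPOTIPY_REDIRECT_URI": "http://localhost:9000",
}
demo_id, demo_secret, demo_uri = load_spotify_credentials(example_env)
print(demo_uri)
```

In a real session you would call `load_spotify_credentials()` with no argument after exporting the three variables in your shell.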


# Connect with spotipy library

Next we will use the Spotipy library (a wrapper around the official REST API) to talk to the Spotify API directly from Python:

In [None]:
!pip install spotipy

In [None]:
import spotipy

In [None]:
from spotipy.oauth2 import SpotifyOAuth



Now we will connect, but first we need to define the scope of our app. Do we want the app to be able to edit or create playlists, or just read general information from Spotify? You can find a list of scopes here: https://developer.spotify.com/documentation/general/guides/authorization/scopes/
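If the app needs more than one permission, the scopes are sent as a single space-separated string. Here is a minimal sketch of building that string; `build_scope` is a hypothetical helper, and `playlist-read-private` is just an example of a second scope you might request.

```python
def build_scope(*scopes):
    """Join scope names into one space-separated string, dropping repeats but keeping order."""
    unique = []
    for scope_name in scopes:
        if scope_name not in unique:
            unique.append(scope_name)
    return " ".join(unique)


combined_scope = build_scope(
    "user-read-recently-played",
    "playlist-read-private",
    "user-read-recently-played",  # repeated on purpose to show de-duplication
)
print(combined_scope)
```

Requesting only the scopes you actually use keeps the consent screen short, which is why this notebook sticks to a single one.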


In [None]:


# Connect with the API keys created earlier
scope = "user-read-recently-played"
sp = spotipy.Spotify(
    auth_manager=SpotifyOAuth(
        scope=scope,
        client_id=client_id,
        client_secret=client_secret,
        redirect_uri=redirect_uri,
    )
)



## Test API

In [None]:
results = sp.current_user_recently_played()
type(results)


In [None]:

for idx, item in enumerate(results['items']):
    track = item['track']
    track_name = track['name']
    # Only the first artist is used; 'artists' is a list, so collaborators are dropped
    artist_name = track['artists'][0]['name']

    print(f"{idx}.) {track_name} by {artist_name}")
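Since the artists field is a list, collaborations can be printed in full by joining every name. This is a small sketch on a made-up item shaped like the entries in `results['items']`; `format_artists` is a hypothetical helper and the track and artist names are placeholders.

```python
def format_artists(track):
    """Return all artist names on a track as one comma-separated string."""
    return ", ".join(artist["name"] for artist in track["artists"])


# Placeholder item mirroring the structure used in the loop above.
sample_item = {
    "track": {
        "name": "Example Track",
        "artists": [{"name": "Artist A"}, {"name": "Artist B"}],
    }
}

sample_track = sample_item["track"]
line = f"{sample_track['name']} by {format_artists(sample_track)}"
print(line)
```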



# Taylor Swift Albums

In [None]:
taylor_swift = sp.artist('06HL4z0CvFAxyc27GXpf02')
taylor_swift

In [None]:
taylor_albums = sp.artist_albums(taylor_swift['id'], limit=50)

In [None]:
for album in taylor_albums['items']:
    print(f"Album: {album['name']} -- ID: {album['id']}")
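A single request returns at most one page of albums, so an artist with a long catalogue may need several. The sketch below assumes the response follows Spotify's paging format (an `items` list plus a `next` field) and uses Spotipy's `next()` method to walk the pages. `collect_all_items` and `FakeClient` are hypothetical names; the fake client only exists so the example runs without network access.

```python
def collect_all_items(sp, first_page):
    """Follow 'next' links from a paged response and return every item."""
    items = list(first_page["items"])
    page = first_page
    while page.get("next"):
        page = sp.next(page)  # with a real client this fetches the following page
        items.extend(page["items"])
    return items


class FakeClient:
    """Stand-in for spotipy.Spotify that serves canned pages."""

    def __init__(self, pages):
        self._pages = pages

    def next(self, page):
        return self._pages[page["next"]]


# Placeholder pages; real responses carry full album objects and URL-valued 'next' fields.
later_pages = {"page-2": {"items": [{"name": "Album C"}], "next": None}}
first_page = {"items": [{"name": "Album A"}, {"name": "Album B"}], "next": "page-2"}

all_albums = collect_all_items(FakeClient(later_pages), first_page)
album_titles = [album["name"] for album in all_albums]
print(album_titles)
```

With a real client, the call would look like `collect_all_items(sp, taylor_albums)`.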

# Cleaning the data

In [None]:
album_names = []
albums = []


for album in taylor_albums['items']:
    album_name = album['name']
    album_id = album['id']
    # Hacky de-duplication: keep the first album for each 3-character name prefix
    # and skip remix, karaoke and live releases (the keyword checks are case-sensitive)
    if album_name[:3] not in album_names and "remix" not in album_name and "Karaoke" not in album_name and "Live" not in album_name:
        album_names.append(album_name[:3])
        albums.append((album_name, album_id))
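The same idea can be written as a small function that is a little easier to tweak. This is a sketch, not a drop-in replacement: `filter_albums` and `EXCLUDED_KEYWORDS` are hypothetical names, and unlike the cell above it compares in lowercase, so it may keep a slightly different set of albums. The album IDs and the "Example Album" entries are placeholders.

```python
EXCLUDED_KEYWORDS = ("remix", "karaoke", "live")


def filter_albums(albums, prefix_len=3):
    """Keep the first album per name prefix and skip releases matching excluded keywords."""
    seen_prefixes = set()
    kept = []
    for name, album_id in albums:
        lowered = name.lower()
        # Substring match: a word such as "alive" would also be excluded by "live".
        if any(keyword in lowered for keyword in EXCLUDED_KEYWORDS):
            continue
        prefix = lowered[:prefix_len]
        if prefix in seen_prefixes:
            continue
        seen_prefixes.add(prefix)
        kept.append((name, album_id))
    return kept


sample_albums = [
    ("Midnights (3am Edition)", "id-1"),
    ("Midnights", "id-2"),
    ("Red (Taylor's Version)", "id-3"),
    ("Red", "id-4"),
    ("Example Album (REMIX)", "id-5"),
    ("Example Album - Karaoke", "id-6"),
]
cleaned = filter_albums(sample_albums)
print(cleaned)
```

The prefix trick is still crude, since two unrelated albums that share their first three letters would collapse into one, so it is worth eyeballing the result either way.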


In [None]:
albums

In [None]:
final_album_list = albums[:-5]
final_album_list

# Songs from Taylor's Albums

In [None]:
taylor_song_ids = []


for album_name, album_id in final_album_list:
    tracks = sp.album_tracks(album_id)['items']
    for track in tracks:
        track_name = track['name']
        track_id = track['id']
        taylor_song_ids.append((album_name, album_id, track_name,track_id))

print(taylor_song_ids[0], len(taylor_song_ids))

# Song features

In [None]:
import pandas as pd

In [None]:


# Test out one song:
features = sp.audio_features(tracks=['4g2c7NoTWAOSYDy44l9nub'])



In [None]:
features

In [None]:
pd.Series(features[0])

In [None]:
df = pd.DataFrame()

# This will take a while because we request one track at a time instead of batching
for album_name,album_id,track_name, track_id in taylor_song_ids:
    features = sp.audio_features(tracks=[track_id])[0]
    features['album_name'] = album_name
    features['album_id'] = album_id
    features['track_name'] = track_name
    ser = pd.Series(features)
    df = pd.concat([df,ser],axis=1)
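For reference, here is roughly what the batched version could look like. It is a sketch under a few assumptions: the audio features endpoint accepts up to 100 track IDs per request, `sp.audio_features` returns one entry per ID (possibly `None` for tracks without features), and `chunked`, `fetch_audio_features` and `FakeClient` are hypothetical names. The fake client stands in for Spotipy so the example runs offline.

```python
def chunked(ids, size=100):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_audio_features(sp, track_ids, batch_size=100):
    """Request audio features in batches, skipping tracks that return None."""
    features = []
    for batch in chunked(track_ids, batch_size):
        features.extend(f for f in sp.audio_features(tracks=batch) if f is not None)
    return features


class FakeClient:
    """Stand-in for spotipy.Spotify that counts how many requests were made."""

    def __init__(self):
        self.calls = 0

    def audio_features(self, tracks):
        self.calls += 1
        # Placeholder values; the real API returns measured features per track.
        return [{"id": track_id, "danceability": 0.5} for track_id in tracks]


demo_ids = [f"track-{i}" for i in range(250)]
client = FakeClient()
demo_features = fetch_audio_features(client, demo_ids)
print(len(demo_features), client.calls)
```

With 250 placeholder IDs the loop makes three requests instead of 250. The resulting list of dictionaries can also be handed straight to `pd.DataFrame(...)`, which avoids concatenating inside the loop.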



In [None]:
# Transpose so each song is a row (concatenating with axis=1 put the songs in columns)
df = df.transpose().set_index('id')
df.head()[['danceability', 'energy', 'key', 'acousticness','album_name','track_name']]


# Visualizing the songs

In [None]:
df.columns

In [None]:
data_features = df.drop(['type','uri','track_href','analysis_url','album_name','album_id','track_name'],axis=1)

In [None]:
data_features.describe()[['danceability','energy','key','loudness','mode','speechiness','acousticness']]

In [None]:
# Convert every column to float in one step
data_features = data_features.astype(float)

In [None]:
data_features.describe()[['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness','acousticness']]


# Plotting acoustic music quality

If you’re a true “Swiftie”, you know some albums sound far more acoustic than others. Let’s explore the features to see whether the data reflects that:


In [None]:
!pip install matplotlib seaborn

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
plt.figure(dpi=200)
sns.scatterplot(x='loudness',y='acousticness',data=data_features, hue=df['album_name'])
plt.legend(loc=(1.05,0.2))

If you are familiar with Taylor’s albums, it makes sense that folklore and evermore songs sit higher on the “acousticness” feature and lower on the “loudness” feature.

It looks like “reputation” is one of the more distinct albums. Let’s explore Taylor’s songwriting range by finding her most distinct songs.

# Calculating Taylor’s Songwriting Range

Obviously there is a lot of variance in Taylor’s songwriting across albums, but which songs are the most different? We can calculate the Euclidean distance between every pair of songs and then find the largest distance.

In [None]:
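# Import the submodule explicitly; a bare "import scipy" may not load scipy.spatial on older versions
import scipy.spatial.distance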

In [None]:
distances = scipy.spatial.distance.cdist(data_features, data_features, metric='euclidean')


In [None]:
distances.shape

Here we can see a heatmap of the distance between each pair of songs. Obviously, each song has 0 distance to itself (the dark diagonal line you see). But what about this crazy range we see around the 48-50 id mark?

In [None]:
plt.figure(dpi=200)
sns.heatmap(distances)

In [None]:
df[distances==distances.max()][['album_name','track_name']]
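If you want the two row positions explicitly rather than a boolean mask, the farthest pair can also be found by checking every pair directly. The sketch below uses only the standard library on toy 2-D points (not real song features); `farthest_pair` is a hypothetical helper.

```python
import math
from itertools import combinations


def farthest_pair(vectors):
    """Return (i, j, distance) for the two vectors with the largest Euclidean distance."""
    i, j = max(
        combinations(range(len(vectors)), 2),
        key=lambda pair: math.dist(vectors[pair[0]], vectors[pair[1]]),
    )
    return i, j, math.dist(vectors[i], vectors[j])


# Toy points for illustration only.
toy_points = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
first, second, gap = farthest_pair(toy_points)
print(first, second, gap)
```

On the real matrix, `np.unravel_index(distances.argmax(), distances.shape)` gives the same pair of positions, and `df.iloc[[i, j]][['album_name','track_name']]` would then show the two songs.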

Looks like it’s “Glitch” from Midnights and “All Too Well (10 Minute Version) (Taylor’s Version)” from the album Red. So what’s the next step? Give both songs a quick listen, and you’ll notice they are actually not extremely different, except for their length. Could duration have been a major feature? Let’s try dropping some data features and running this analysis again.

In [None]:
no_time = data_features.drop(['duration_ms','time_signature'], axis=1)
distances = scipy.spatial.distance.cdist(no_time, no_time, metric='euclidean')

In [None]:
distances.max()

In [None]:
df[distances==distances.max()][['album_name','track_name']]
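To see why a column like `duration_ms` can dominate a Euclidean distance, compare raw and rescaled values on a toy example. Everything here is made up for illustration: the three "songs" are placeholder (danceability, duration_ms) pairs, and `min_max_scale` is a hypothetical helper that squeezes each column into the 0 to 1 range. Rescaling is an alternative to dropping columns, not what the cells above do.

```python
import math


def min_max_scale(rows):
    """Rescale each column to the 0-1 range (constant columns become 0.0)."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(value - low) / (high - low) if high > low else 0.0
         for value, low, high in zip(row, lows, highs)]
        for row in rows
    ]


# Placeholder (danceability, duration_ms) pairs, not real track data.
toy_songs = [[0.20, 200_000], [0.90, 201_000], [0.25, 600_000]]

# Raw distances: the millisecond column swamps the danceability difference.
raw_01 = math.dist(toy_songs[0], toy_songs[1])
raw_02 = math.dist(toy_songs[0], toy_songs[2])

# After rescaling, both columns contribute on a comparable scale.
scaled_songs = min_max_scale(toy_songs)
scaled_01 = math.dist(scaled_songs[0], scaled_songs[1])
scaled_02 = math.dist(scaled_songs[0], scaled_songs[2])

print(raw_01, raw_02)
print(scaled_01, scaled_02)
```

In the raw numbers, songs 0 and 2 look hundreds of times farther apart than songs 0 and 1 purely because of length, while after rescaling the two distances are nearly equal.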



Interesting. My conclusion after listening to these songs? Given these features, it’s probably not useful to take such a mathematical approach to these songs; it’s better to just listen and enjoy them!

Hope you found this interesting. Check out our other blog posts for more fun analysis!
