<blockquote style="border: 2px solid #666; padding: 10px; background-color: #acc;"><b>Reading Data Collected From Spotify's API</b></blockquote> 

In this project, we use data collected from the [Spotify API](https://developer.spotify.com/) with the [Spotipy](https://spotipy.readthedocs.io/en/latest/#) Python library. We retrieved Spotify's top playlists and stored each playlist as a CSV file.
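
The collection script itself is not reproduced in this notebook. As a rough sketch of what that step could look like, the snippet below flattens one playlist into rows whose column names match the CSV files read later. The helper names `track_to_row` and `fetch_playlist_rows` are hypothetical, the mapping of Spotify's `mode` and `valence` fields onto `audio_mode` and `audio_valence` is an assumption, and `sp` is assumed to be an authenticated `spotipy.Spotify` client.

```python
import pandas as pd


def track_to_row(item, features):
    """Flatten one playlist item and its audio features into a flat dict.

    `item` is one entry of the playlist's "items" list and `features` is the
    matching audio-features dict. Column names mirror this project's CSVs.
    """
    track = item["track"]
    return {
        "song_name": track["name"],
        "song_popularity": track["popularity"],
        "date_added_to_playlist": item["added_at"],
        "song_duration_ms": track["duration_ms"],
        "artist_name": track["artists"][0]["name"],  # first credited artist only
        "album_names": track["album"]["name"],
        "album_release_date": track["album"]["release_date"],
        "acousticness": features["acousticness"],
        "danceability": features["danceability"],
        "energy": features["energy"],
        "instrumentalness": features["instrumentalness"],
        "key": features["key"],
        "liveness": features["liveness"],
        "loudness": features["loudness"],
        "audio_mode": features["mode"],  # assumed column mapping
        "speechiness": features["speechiness"],
        "tempo": features["tempo"],
        "time_signature": features["time_signature"],
        "audio_valence": features["valence"],  # assumed column mapping
    }


def fetch_playlist_rows(sp, playlist_id):
    """Return a dataframe with one row per track of the playlist."""
    results = sp.playlist_items(playlist_id)
    items = list(results["items"])
    while results["next"]:  # follow pagination links until exhausted
        results = sp.next(results)
        items.extend(results["items"])

    # Skip entries that carry no track id (e.g. removed or local tracks)
    items = [it for it in items if it.get("track") and it["track"].get("id")]

    rows = []
    for start in range(0, len(items), 100):  # request features in batches of 100
        batch = items[start:start + 100]
        feats = sp.audio_features([it["track"]["id"] for it in batch])
        rows += [track_to_row(it, f) for it, f in zip(batch, feats) if f]
    return pd.DataFrame(rows)
```

With real credentials, `sp` would typically be created with `spotipy.Spotify(auth_manager=SpotifyClientCredentials())`, and each resulting dataframe could be saved with `.to_csv()`. Saving with the default index would also explain the `Unnamed: 0` column that appears when the files are read back.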

In [1]:
import os
from os import listdir
from os.path import isfile, join

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D
from pandas import DataFrame
from scipy.cluster.hierarchy import dendrogram, fcluster, leaves_list, linkage
from sklearn.cluster import KMeans

#List every file in the playlists folder (one file per playlist)
playlists = [f for f in listdir('./playlists/') if isfile(join('./playlists/', f))]
for playlist in playlists:
    print(playlist)

00s Rock Anthems
100% LatinX
2000s Smash Hits
50 Latin Classics
60s Rock Anthems
70s & 80s Acoustic
70s Rock Anthems
80s Hard Rock
80s Love Songs
80s Rock Anthems
80s Smash Hits
90's Hip-Hop Don't Stop
90s Acoustic
90s Pop Rock Essentials
90s Rock Anthems
A Perfect Day
A1 Hip-Hop
Abuela's Mix
Acoustic Covers
Acoustic Hits
Acoustic Hits_ Oldies but Goodies
Adrenaline Workout
African Heat
Afropop.csv
All A Cappella
All Aussie Hip-Hop
All Out 00s
All Out 50s
All Out 60s
All Out 70s
All Out 80s
All Out 90s
All The Feels
Alternative 00s
Alternative 10s
Alternative 60s
Alternative 70s
Alternative 80s
Alternative 90s
Alternative Hip Hop
Anti Pop
Are & Be
Autumn Acoustic
Autumn Leaves
Bachata Classics
Beast Mode
Beats & Rhymes
Beats n' Bars
Bedroom Pop
Big 3
Big Gains Workout
Black History Salute
Bodega Sounds
Body & Soul
Boleros
Boogaloo Essentials
Born in the USA
Born To Run 150 BPM
Cali Fire
Calm Vibes
Canciones del Recuerdo
Cardio
Certified Gold
Chicano Fly Zone
Chill Instrumental Beats
Ch

<blockquote style="border: 2px solid #666; padding: 10px; background-color: #acc;"><b>Concatenate Playlist Data</b></blockquote> 

In [2]:
path = r'C:\Users\edalr\Desktop\school\ANLT212\project2\spotify_clustering\playlists'

#Read an initial file to obtain column headers
initial_read = os.path.join(path, 'Boleros')
col_names = pd.read_csv(initial_read, index_col = None, header = 0)
col_names["playlist"] = ""
col_names = list(col_names.columns.values)

#Create empty dataframe with saved column headers
songs_df = DataFrame(columns = col_names)

#Iterate through the folder titled "playlists", which contains
#one csv file per playlist pulled from Spotify's API.
#Store all songs in a single dataframe, along with
#the name of the playlist each song came from.
for filename in os.listdir(path):
    p = os.path.join(path, filename)
    playlist = pd.read_csv(p, index_col = None, header = 0)
    #Add a "playlist" column that saves the name of the
    #playlist each song originated from
    playlist["playlist"] = filename
    #Append this playlist to the songs collected so far
    songs_df = pd.concat([songs_df, playlist])
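
Growing `songs_df` with `pd.concat` inside the loop re-copies the accumulated rows on every pass. A common alternative, sketched here under the assumption that every file in the folder is a playlist CSV with the same columns, is to read all the files into a list and concatenate once. The function name `load_playlists` is hypothetical, and the demo writes two tiny made-up files to a temporary folder rather than touching the real data.

```python
import os
import tempfile

import pandas as pd


def load_playlists(folder):
    """Read every file in `folder` and stack them into one dataframe."""
    frames = []
    for filename in sorted(os.listdir(folder)):
        frame = pd.read_csv(os.path.join(folder, filename), index_col=None, header=0)
        frame["playlist"] = filename  # remember which playlist each song came from
        frames.append(frame)
    # A single concat at the end; each file keeps its own 0..n-1 row index
    return pd.concat(frames)


# Demo with two throwaway files (made-up songs, not project data)
with tempfile.TemporaryDirectory() as demo_dir:
    pd.DataFrame({"song_name": ["a", "b"], "tempo": [120.0, 98.5]}).to_csv(
        os.path.join(demo_dir, "Playlist One"))
    pd.DataFrame({"song_name": ["c"], "tempo": [140.0]}).to_csv(
        os.path.join(demo_dir, "Playlist Two"))
    demo_df = load_playlists(demo_dir)

print(demo_df)
```

Concatenating once also avoids seeding the result with an empty dataframe, which can leave numeric columns stored as `object`.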

<blockquote style="border: 2px solid #666; padding: 10px; background-color: #acc;"><b>Print Data</b></blockquote> 

In [3]:
print("Column names: \n")
print(songs_df.columns.values)

Column names: 

['Unnamed: 0' 'song_name' 'song_popularity' 'date_added_to_playlist'
 'song_duration_ms' 'artist_name' 'album_names' 'album_release_date'
 'acousticness' 'danceability' 'energy' 'instrumentalness' 'key'
 'liveness' 'loudness' 'audio_mode' 'speechiness' 'tempo' 'time_signature'
 'audio_valence' 'playlist']


In [4]:
print(songs_df.head())

  Unnamed: 0                   song_name song_popularity  \
0          0  Boulevard of Broken Dreams              73   
1          1                  In The End              66   
2          2           Seven Nation Army              76   
3          3                  By The Way              74   
4          4           How You Remind Me              56   

  date_added_to_playlist song_duration_ms            artist_name  \
0   2018-09-29T13:24:32Z           262333              Green Day   
1   2018-09-29T13:24:32Z           216933            Linkin Park   
2   2018-09-29T13:24:32Z           231733      The White Stripes   
3   2018-09-29T13:24:32Z           216933  Red Hot Chili Peppers   
4   2018-09-29T13:24:32Z           223826             Nickelback   

                          album_names album_release_date  acousticness  \
0  Greatest Hits: God's Favorite Band         2017-11-17      0.005520   
1                       Hybrid Theory         2000-10-24      0.010300   
2       

<blockquote style="border: 2px solid #666; padding: 10px; background-color: #acc;"><b>Separate Numerical and String Data</b></blockquote> 

In [5]:
#Use the song name as the index
songs_df = songs_df.set_index('song_name')


#Store string data that contains artist, album, and playlist info in a separate
#data frame
song_info = songs_df[["artist_name", "album_names", "playlist"]]

#Drop features that aren't needed for clustering, including the leftover
#"Unnamed: 0" row counter
songs_df = songs_df.drop(["Unnamed: 0", "date_added_to_playlist", "artist_name", "album_names", "album_release_date", "playlist"], axis = 1)



Our two separate dataframes are now `song_info` and `songs_df`: `songs_df` contains the numerical data for each attribute, and `song_info` contains the artist, album, and playlist associated with each song. We wrote both dataframes to CSV files that we use throughout the project. To save each one to its own CSV file, use the `.to_csv()` method.
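
As an illustration of that last step, the sketch below writes two small stand-in dataframes to CSV and reads them back. The file names `song_info.csv` and `songs_df.csv` are assumptions (the notebook does not state the names actually used), and the single row is taken from the output shown in this section.

```python
import os
import tempfile

import pandas as pd

# Stand-ins for the real dataframes, indexed by song name as in the cell above
index = pd.Index(["Boulevard of Broken Dreams"], name="song_name")
song_info = pd.DataFrame(
    {
        "artist_name": ["Green Day"],
        "album_names": ["Greatest Hits: God's Favorite Band"],
        "playlist": ["00s Rock Anthems"],
    },
    index=index,
)
songs_df = pd.DataFrame(
    {"song_popularity": [73], "song_duration_ms": [262333], "acousticness": [0.00552]},
    index=index,
)

out_dir = tempfile.mkdtemp()  # temporary folder; point this at your project folder
info_path = os.path.join(out_dir, "song_info.csv")  # assumed file name
songs_path = os.path.join(out_dir, "songs_df.csv")  # assumed file name

# The index (song_name) is written as the first column by default
song_info.to_csv(info_path)
songs_df.to_csv(songs_path)

# Reading back with index_col restores song_name as the index
info_back = pd.read_csv(info_path, index_col="song_name")
songs_back = pd.read_csv(songs_path, index_col="song_name")
```

Passing `index_col="song_name"` when reading avoids ending up with the song names as an ordinary column and a fresh integer index.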

In [6]:
print(song_info.head())

                                      artist_name  \
song_name                                           
Boulevard of Broken Dreams              Green Day   
In The End                            Linkin Park   
Seven Nation Army               The White Stripes   
By The Way                  Red Hot Chili Peppers   
How You Remind Me                      Nickelback   

                                                   album_names  \
song_name                                                        
Boulevard of Broken Dreams  Greatest Hits: God's Favorite Band   
In The End                                       Hybrid Theory   
Seven Nation Army                                     Elephant   
By The Way                         By The Way (Deluxe Version)   
How You Remind Me                               Silver Side Up   

                                    playlist  
song_name                                     
Boulevard of Broken Dreams  00s Rock Anthems  
In The End              

In [7]:
print(songs_df.head())

                           song_popularity song_duration_ms  acousticness  \
song_name                                                                   
Boulevard of Broken Dreams              73           262333      0.005520   
In The End                              66           216933      0.010300   
Seven Nation Army                       76           231733      0.008170   
By The Way                              74           216933      0.026400   
How You Remind Me                       56           223826      0.000954   

                            danceability  energy  instrumentalness key  \
song_name                                                                
Boulevard of Broken Dreams         0.496   0.682          0.000029   8   
In The End                         0.542   0.853          0.000000   3   
Seven Nation Army                  0.737   0.463          0.447000   0   
By The Way                         0.451   0.970          0.003550   0   
How You Remind M