# IACS Capstone Project
## Team Spotify 1: Playlist Prediction
### Midterm 1: Data Exploration
#### Omar Abboud, Sonu Mehta, Laura Ware
This document presents a preliminary exploration of data from the Spotify API, accessed through the Python module "spotipy," which provides convenient access to Spotify API data given a company-issued API key.

In [72]:
import numpy as np
import pandas as pd
import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials
import sys

### Authentication

In [4]:
client_credentials_manager = SpotifyClientCredentials(client_id='df846cfd28e745178054587b3484f91c', client_secret='e3d39fc92a954e028ff1490288f3fe5c')
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
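
Hard-coded credentials are easy to leak when a notebook is shared. The following is a minimal sketch of one alternative, assuming the credentials are stored in environment variables. The helper name `load_spotify_credentials` is hypothetical; the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` follow the convention spotipy uses for its own defaults.

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    required = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]


# Possible usage with the client built above (needs spotipy and valid credentials):
# client_id, client_secret = load_spotify_credentials()
# manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
# sp = spotipy.Spotify(client_credentials_manager=manager)
```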

### Playlist DataFrame Generation
The code below builds a DataFrame of all playlists owned by the user ID "spotify," in other words, those created by the company for wide distribution and recommendation. For each playlist we extract its name, its total number of tracks, its follower count, and the mean popularity of its tracks.

In [7]:
names = []
total_tracks = []
followers = []
popularity_means = []

playlists = sp.user_playlists('spotify', limit=50, offset=0)

while playlists:
    for playlist in playlists['items']:
        try:
            # Request only the follower count and the per-track fields we need.
            metadata = sp.user_playlist('spotify', playlist_id=playlist['id'], fields='followers.total,tracks.items(added_at, track.popularity, track.name)')
            items = metadata['tracks']['items']
            popularities = np.array([item['track']['popularity'] for item in items], dtype=float)
            row = (playlist['name'], playlist['tracks']['total'], metadata['followers']['total'], popularities.mean())
        except Exception:
            # Skip playlists whose metadata is missing or incomplete.
            continue
        # Append only after every value has been extracted, so the four lists stay aligned.
        names.append(row[0])
        total_tracks.append(row[1])
        followers.append(row[2])
        popularity_means.append(row[3])
    # Follow the paging link until no further page is available.
    playlists = sp.next(playlists) if playlists['next'] else None
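
The paging loop above follows a common pattern: process one page of results, then request the next page until the `next` field is empty. A minimal sketch of that pattern as a generator is shown here. The function name `iter_items` and the `fetch_next` parameter are hypothetical, and the stub pages only mimic the `items`/`next` shape of the responses used above.

```python
def iter_items(first_page, fetch_next):
    """Yield every item across a chain of pages shaped like {'items': [...], 'next': ...}."""
    page = first_page
    while page:
        for item in page['items']:
            yield item
        # `next` is empty on the last page, which ends the loop.
        page = fetch_next(page) if page['next'] else None


# Stub pages standing in for API responses (illustrative data only).
pages = {
    'page-2': {'items': [{'name': 'c'}], 'next': None},
}
first = {'items': [{'name': 'a'}, {'name': 'b'}], 'next': 'page-2'}
names_seen = [item['name'] for item in iter_items(first, lambda page: pages[page['next']])]
print(names_seen)  # ['a', 'b', 'c']
```

With spotipy, the same generator could presumably be driven by `iter_items(sp.user_playlists('spotify'), sp.next)`.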



In [9]:
playlist_data = pd.DataFrame({
    'names': names,
    'total_tracks': total_tracks,
    'followers': followers,
    'mean_popularity': popularity_means,
})

In [11]:
playlist_data.head()

|   | followers | mean_popularity | names | total_tracks |
|---|---|---|---|---|
| 0 | 13468811 | 79.82 | Today's Top Hits | 50 |
| 1 | 5705788 | 72.72 | Rap Caviar | 50 |
| 2 | 3887591 | 60.5 | electroNOW | 52 |
| 3 | 2450946 | 51.569444 | Afternoon Acoustic | 72 |
| 4 | 2575182 | 57.87 | Peaceful Piano | 149 |
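
The `mean_popularity` column is the arithmetic mean of the `popularity` field over a playlist's tracks. A minimal sketch of that calculation on a response-shaped dictionary follows. The function name `mean_track_popularity` is hypothetical, the sample values are made up, and skipping entries that lack a track object is an assumption about how incomplete entries could be handled, not something the notebook's code does.

```python
from statistics import mean


def mean_track_popularity(metadata):
    """Return the mean track popularity of a playlist, or None if no track has a score."""
    scores = [
        item['track']['popularity']
        for item in metadata['tracks']['items']
        # Assumption: entries whose track object is missing are skipped.
        if item.get('track') is not None
    ]
    return mean(scores) if scores else None


# Made-up metadata in the shape returned by the `fields` filter used above.
example = {
    'followers': {'total': 100},
    'tracks': {'items': [
        {'track': {'popularity': 80}},
        {'track': {'popularity': 60}},
        {'track': None},
    ]},
}
print(mean_track_popularity(example))
```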


### Track DataFrame Generation
In the code below, we take a closer look at one sample playlist, "Today's Top Hits." We obtain the playlist's tracks and make an API call for more detailed information about each track, in particular its audio features. We also add a "sequence" column that records each track's position in the playlist.

In [79]:
# Fetch the playlist and keep the track object from each playlist item.
playlist_sample = sp.user_playlist('spotify', '5FJXhjdILmRA2z5bvz4nzf')['tracks']['items']
list_of_tracks = [item['track'] for item in playlist_sample]
sample = pd.DataFrame(list_of_tracks)[['id', 'name', 'external_ids', 'artists', 'duration_ms', 'explicit', 'track_number', 'popularity']]

# Request the audio features of all tracks in a single API call.
features = sp.audio_features(tracks=sample['id'].tolist())
features_df = pd.DataFrame(features)

# Copy the audio features of interest; rows align because both frames share the default index.
feature_columns = ['acousticness', 'danceability', 'energy', 'instrumentalness', 'key', 'liveness', 'loudness', 'speechiness', 'tempo', 'time_signature', 'valence']
for column in feature_columns:
    sample[column] = features_df[column]

# Record each track's 1-based position in the playlist.
sample['sequence'] = sample.index + 1
sample.head()

|   | id | name | external_ids | artists | duration_ms | explicit | track_number | popularity | acousticness | danceability | energy | instrumentalness | key | liveness | loudness | speechiness | tempo | time_signature | valence | sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1dNIEtp7AY3oDAKCGg2XkH | Something Just Like This | {u'isrc': u'USQX91700278'} | [{u'name': u'The Chainsmokers', u'external_url... | 247626 | False | 1 | 0 | 0.0306 | 0.607 | 0.649 | 2.5e-05 | 11 | 0.174 | -6.695 | 0.0362 | 102.996 | 4 | 0.47 | 1 |
| 1 | 12GEpg2XOPyqk03JZEZnJs | It Ain’t Me (with Selena Gomez) | {u'isrc': u'SEBGA1700015'} | [{u'name': u'Kygo', u'external_urls': {u'spoti... | 220780 | False | 1 | 76 | 0.0905 | 0.648 | 0.532 | 0.0 | 0 | 0.0831 | -6.597 | 0.0746 | 99.983 | 4 | 0.497 | 2 |
| 2 | 6AeQlMyRzvSl1nkFztZyKl | Issues | {u'isrc': u'USUM71615691'} | [{u'name': u'Julia Michaels', u'external_urls'... | 176346 | False | 1 | 82 | 0.416 | 0.704 | 0.423 | 0.0 | 8 | 0.0607 | -6.792 | 0.0862 | 113.962 | 4 | 0.45 | 3 |
| 3 | 0FE9t6xYkqWXU2ahLh6D8X | Shape of You | {u'isrc': u'GBAHS1600463'} | [{u'name': u'Ed Sheeran', u'external_urls': {u... | 233712 | False | 1 | 100 | 0.581 | 0.825 | 0.652 | 0.0 | 1 | 0.0931 | -3.183 | 0.0802 | 95.977 | 4 | 0.933 | 4 |
| 4 | 3ebXMykcMXOcLeJ9xZ17XH | Scared To Be Lonely | {u'isrc': u'NLM5S1600025'} | [{u'name': u'Martin Garrix', u'external_urls':... | 220883 | False | 1 | 91 | 0.0895 | 0.584 | 0.54 | 0.0 | 1 | 0.261 | -7.786 | 0.0576 | 137.972 | 4 | 0.19 | 5 |
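
The code above pairs each track with its audio features by row position. A minimal sketch of the same step keyed on track ID instead is shown here, which would stay correct if a feature record were missing. The function name `build_track_rows` is hypothetical, the sample values are made up, and the assumption that the feature list may contain `None` entries should be checked against the API's actual behavior.

```python
def build_track_rows(tracks, features, feature_names):
    """Combine track metadata with audio features, matching records on track ID."""
    # Index feature records by ID, ignoring missing (None) entries.
    features_by_id = {f['id']: f for f in features if f is not None}
    rows = []
    for position, track in enumerate(tracks, start=1):
        row = {'id': track['id'], 'name': track['name'], 'sequence': position}
        track_features = features_by_id.get(track['id'], {})
        for name in feature_names:
            # Tracks without a feature record get None for every feature.
            row[name] = track_features.get(name)
        rows.append(row)
    return rows


# Made-up records in the shape of the track and audio-feature objects used above.
tracks = [{'id': 't1', 'name': 'First'}, {'id': 't2', 'name': 'Second'}]
features = [{'id': 't1', 'tempo': 120.0, 'valence': 0.5}, None]
rows = build_track_rows(tracks, features, ['tempo', 'valence'])
print(rows[1])  # {'id': 't2', 'name': 'Second', 'sequence': 2, 'tempo': None, 'valence': None}
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)`.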


In [81]:
sample.to_csv('sample.csv',encoding='utf-8')