# Getting Tracks from Each Playlist Using the Spotify API with the Spotipy Library

From the previous notebook, I now have a dataset with each playlist's name, ID, and number of tracks.
<br> In this notebook, I will iterate through the playlists and pull each track along with some basic information about it.
In the next notebook, I will iterate through each track, get its audio features (duration, loudness, bounciness, etc.), and do some cleanup.
<p> I am still attempting to match the songs to the Skip Prediction dataset based on each track's audio features. </p>


### Imports & Credentials

In [1]:
import pandas as pd   

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials, SpotifyOAuth
import spotipy.util as util

In [2]:
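# USER_ID, CLIENT_ID, CLIENT_SECRET and REDIRECT_URI are placeholders for my own values (not shown here)
token = util.prompt_for_user_token(USER_ID, # my user id on Spotify
                                   client_id=CLIENT_ID, # my 'app' information
                                   client_secret=CLIENT_SECRET,
                                   redirect_uri=REDIRECT_URI) # getting the authorization token
sp = spotipy.Spotify(auth=token) # Spotipy client that uses the token for every request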
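
To keep the credentials out of the notebook itself, they could be read from environment variables instead. The sketch below is one possible way to do that, not what this notebook actually ran. `load_spotify_credentials` is a hypothetical helper, and the variable names are the ones Spotipy documents for its own credential lookup.

```python
import os

# Environment variable names that Spotipy documents for its credential lookup.
REQUIRED_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")


def load_spotify_credentials(env=os.environ):
    """Return the app credentials as a dict, or raise if any are missing.

    Hypothetical helper for illustration; it is not part of Spotipy.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "client_id": env["SPOTIPY_CLIENT_ID"],
        "client_secret": env["SPOTIPY_CLIENT_SECRET"],
        "redirect_uri": env["SPOTIPY_REDIRECT_URI"],
    }
```

The keys of the returned dictionary match the keyword arguments passed to `util.prompt_for_user_token` above, so the dictionary could be unpacked into that call with `**`.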


In [3]:
df = pd.read_csv('./data/playlists.csv')
df.shape

(1437, 3)

In [4]:
df.head(15)

Unnamed: 0,playlist_name,playlist_id,total_tracks
0,Today's Top Hits,37i9dQZF1DXcBWIGoYBM5M,50
1,RapCaviar,37i9dQZF1DX0XUsuxWHRQd,50
2,Hot Country,37i9dQZF1DX1lVhptIYRda,54
3,¡Viva Latino!,37i9dQZF1DX10zKzsJ2jva,50
4,New Music Friday,37i9dQZF1DX4JAvHpjipBk,100
5,mint,37i9dQZF1DX4dyzvuaRJ0n,65
6,Are & Be,37i9dQZF1DX4SBhb3fqCJd,42
7,Rock Classics,37i9dQZF1DWXRqgorJj26U,145
8,Peaceful Piano,37i9dQZF1DX4sWSpwq3LiO,284
9,Rock This,37i9dQZF1DXcF6B6QPhFDv,50


### Cleanup
Before running the loops that get the track information, I need to remove any playlists that are empty. I checked on Spotify to confirm that these playlists really are empty, rather than pulled incorrectly.

In [5]:
df.loc[df['total_tracks'] == 0]

Unnamed: 0,playlist_name,playlist_id,total_tracks
471,Carpool Karaoke,0NaXa68Xyo4wmMiun5Lqgm,0
612,Mollie King,0dVRyMERU2g9M4tWEfaD5B,0
639,Nelson Mandela Tribute,7iUOaMP7iKI22rBNflBwjX,0


In [6]:
df = df[df['total_tracks'] != 0]
df.shape

(1434, 3)

In [7]:
print(f"In the end I should have close to {df['total_tracks'].sum()} tracks") 

In the end I should have close to 103661 tracks
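
Since the total above is only an estimate, it can help to see which playlists account for any difference once the pull is done. The sketch below is one way that comparison could look, assuming the pull also records how many items each playlist returned. `find_count_mismatches` and the sample counts are made up for illustration.

```python
def find_count_mismatches(expected_counts, pulled_counts):
    """Return {playlist_id: (expected, pulled)} for playlists whose counts differ.

    Hypothetical helper; a playlist absent from `pulled_counts` counts as 0 pulled.
    """
    mismatches = {}
    for playlist_id, expected in expected_counts.items():
        pulled = pulled_counts.get(playlist_id, 0)
        if pulled != expected:
            mismatches[playlist_id] = (expected, pulled)
    return mismatches


# Made-up counts, only to show the shape of the inputs and the result.
expected_counts = {"playlist-a": 50, "playlist-b": 54, "playlist-c": 100}
pulled_counts = {"playlist-a": 50, "playlist-b": 53}
print(find_count_mismatches(expected_counts, pulled_counts))
```

With the dataframe in this notebook, the expected counts could be built with `dict(zip(df['playlist_id'], df['total_tracks']))`.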


### Getting Track Info Loops
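Looping over the playlists to get the track id, name, popularity, etc. for each track. This is broken up into 2 loops: the first pulls the raw track items from every playlist, and the second extracts the fields I need from each item.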

In [8]:
all_tracks = [] # empty list to add all new track info pulled

In [9]:
for playlist in df.values: # for each playlist (row) in the df dataframe
    results = sp.user_playlist_tracks('spotify', playlist[1]) # call the Spotify API through Spotipy and get the first page of tracks (playlist[1] is the playlist id)
    tracks = results['items'] # the resulting list (max 100 per call)
    while results['next']: # while the response points to a next page:
        results = sp.next(results) # pull the next set of tracks from this playlist (some playlists had 500+ tracks)
        tracks.extend(results['items']) # extend the tracks list as more pulls are being made from this playlist
    all_tracks.extend(tracks) # add this playlist's tracks to the all_tracks list
    # print(f"Retrieved all {len(tracks)} tracks from {playlist[0]}") # print how many tracks were pulled from which playlist

With the print statement enabled, the output would look like:
<br> Retrieved all 50 tracks from Today's Top Hits
<br> Retrieved all 50 tracks from RapCaviar
<br> Retrieved all 54 tracks from Hot Country
<br> Retrieved all 50 tracks from ¡Viva Latino!
<br> Retrieved all 100 tracks from New Music Friday
<br> Retrieved all 65 tracks from mint
<br> Retrieved all 42 tracks from Are & Be
<br> Retrieved all 145 tracks from Rock Classics
<br> Retrieved all 284 tracks from Peaceful Piano
<br> Retrieved all 50 tracks from Rock This
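
The paging pattern in the loop above (keep following `next` until it is empty) can be tried without calling the API. The sketch below is a minimal illustration: `fetch_all_items` is a hypothetical helper, and `FakeClient` is a stand-in that serves made-up pages through a `next(page)` method, which is the only part of the Spotipy client the loop relies on here.

```python
def fetch_all_items(client, first_page):
    """Collect the 'items' of every page, following 'next' until it is empty.

    `client` is any object with a `next(page)` method that returns the
    following page as a dict. Hypothetical helper for illustration.
    """
    items = list(first_page["items"])
    page = first_page
    while page.get("next"):
        page = client.next(page)
        items.extend(page["items"])
    return items


class FakeClient:
    """Stand-in client that serves pre-built pages, used only to run the helper offline."""

    def __init__(self, pages):
        self._pages = pages

    def next(self, page):
        # Look up the following page by the value stored under 'next'.
        return self._pages[page["next"]]


# Made-up pages: three pages chained together through their 'next' values.
pages = {
    "page-2": {"items": ["c", "d"], "next": "page-3"},
    "page-3": {"items": ["e"], "next": None},
}
first_page = {"items": ["a", "b"], "next": "page-2"}
all_items = fetch_all_items(FakeClient(pages), first_page)
print(all_items)
```

Copying the first page's items into a new list keeps the helper from modifying the response it was given.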

In [10]:
all_tracks[:1] # get an idea of what each pull looks like

[{'added_at': '2020-10-02T04:00:51Z',
  'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/'},
   'href': 'https://api.spotify.com/v1/users/',
   'id': '',
   'type': 'user',
   'uri': 'spotify:user:'},
  'is_local': False,
  'primary_color': None,
  'track': {'album': {'album_type': 'single',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/7n2wHs1TKAczGzO7Dd2rGr'},
      'href': 'https://api.spotify.com/v1/artists/7n2wHs1TKAczGzO7Dd2rGr',
      'id': '7n2wHs1TKAczGzO7Dd2rGr',
      'name': 'Shawn Mendes',
      'type': 'artist',
      'uri': 'spotify:artist:7n2wHs1TKAczGzO7Dd2rGr'}],
    'available_markets': ['AD',
     'AE',
     'AL',
     'AR',
     'AT',
     'AU',
     'BA',
     'BE',
     'BG',
     'BH',
     'BO',
     'BR',
     'BY',
     'CA',
     'CH',
     'CL',
     'CO',
     'CR',
     'CY',
     'CZ',
     'DE',
     'DK',
     'DO',
     'DZ',
     'EC',
     'EE',
     'EG',
     'ES',
     'FI',
     'FR',
    

In [11]:
# empty lists to store track info.
track_id = []
title = []
popularity = []
explicit = []
release_date = []
artists = []

In [12]:
for track in all_tracks: # for each item in the created track list
    # I used try/except because some values were missing and the loop would break. This way, wherever there was a missing value,
    # the except clause fills the whole row with 0. Later, when I look at it, I may be able to manually fill in these values IF any other
    # useful information was gathered. All values are read before anything is appended, so a failure part-way through
    # cannot leave the lists with different lengths.
    try:
        info = track['track']
        row = (info['id'], # track unique id
               info['name'], # track title
               info['popularity'], # popularity of the track
               info['explicit'], # whether the track is flagged as explicit (boolean)
               info['album']['release_date'], # the date the album was released
               info['artists'][0]['name']) # the first artist name
    except (KeyError, IndexError, TypeError): # missing key, empty artist list, or no track data at all
        row = (0, 0, 0, 0, 0, 0) # making the values 0 where nothing was available to pull
    track_id.append(row[0])
    title.append(row[1])
    popularity.append(row[2])
    explicit.append(row[3])
    release_date.append(row[4])
    artists.append(row[5])
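
An alternative to parallel lists is to build one dictionary per track. The sketch below is not the approach used in this notebook: it fills missing values with `None` rather than 0, so that a track whose popularity really is 0 can still be told apart from one with no data. `extract_track_fields` is a hypothetical helper, the key names follow the loop above, and the sample items are made up.

```python
def extract_track_fields(item):
    """Return one flat row for a playlist item, with None where a value is missing.

    Hypothetical helper for illustration.
    """
    track = item.get("track") or {}
    album = track.get("album") or {}
    artists = track.get("artists") or [{}]
    return {
        "artists": artists[0].get("name"),  # first artist only
        "title": track.get("name"),
        "track_id": track.get("id"),
        "popularity": track.get("popularity"),
        "explicit": track.get("explicit"),
        "release_date": album.get("release_date"),
    }


# Made-up items: one complete, one with no track data at all.
sample_items = [
    {"track": {"id": "abc", "name": "Song", "popularity": 50, "explicit": False,
               "album": {"release_date": "2020-01-01"},
               "artists": [{"name": "Artist"}]}},
    {"track": None},
]
rows = [extract_track_fields(item) for item in sample_items]
print(rows)
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)`, which keeps every row aligned by construction.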

   


### Create DataFrame and Save
This is where I'll create a dataframe of the values gathered so far and save it as a checkpoint.

In [13]:
# Dataframe with columns of pulled values
tracks = pd.DataFrame(
    {'artists' : artists,
     'title': title,
     'track_id': track_id,
     'popularity' : popularity,
     'explicit': explicit,
     'release_date' : release_date
    })

In [14]:
tracks.head()

Unnamed: 0,artists,title,track_id,popularity,explicit,release_date
0,Shawn Mendes,Wonder,5KCbr5ndeby4y4ggthdiAb,0,False,2020-10-02
1,Justin Bieber,Holy (feat. Chance The Rapper),5u1n1kITHCxxp8twBcZxWy,92,False,2020-09-18
2,24kGoldn,Mood (feat. Iann Dior),3tjFYV6RSFtuktYl3ZtYcq,100,True,2020-07-24
3,Internet Money,Lemonade,02kDW379Yfd5PzW5A6vuGt,93,True,2020-08-14
4,BLACKPINK,Bet You Wanna (feat. Cardi B),1hPkiovjTqiJAJen4uyNRg,0,False,2020-10-02


In [15]:
# tracks.to_csv('./data/all_tracks.csv', index= False) # saving to CSV as checkpoint

The data acquisition notebooks are broken up because the requests to the Spotify API can be heavy at times, and because the plan for each notebook is easier to follow this way.
<br><center> See you in the next notebook for track features and cleanup.</center>