# Adding Spotify API Integration
- This notebook looks up each entry of rand_samp, a list containing a random sample of song names, through the Spotify API.
- Variables are extracted and the data is cleaned in preparation for EDA and model building.

## Retrieving rand_samp and viewing it

In [19]:
# Retrieving rand_samp
%store -r rand_samp

print(rand_samp[0:20])

['Lekhara - Khara', 'The Seed - Edit', 'Lady Marian (2003 - Remaster)', 'Empires Lost to Time', 'Digane - Raphael Afro Vibe Remix Extended', 'How He Works (feat. Nico Vega) - Coflo Remix', 'Dia Perfeito', 'A State of Trance (ASOT 1183) - New Scoops!', 'Flame In My Heart', 'By the Sea - Live', 'Laugh Track (feat. Phoebe Bridgers)', "Ok, Let's Roll", 'Canção pra Tua Glória', "You're Gonna Cry", 'Oh Eesa - Club Mix', "L'œuf", 'leave me in the dark', 'One Day Star', 'アダムの林檎 (LPバージョン)', 'Two Zips']


- After glimpsing the first 20 songs, it is clear that some cleaning is necessary.
1. The first song is '', and should be removed
2. There are multiple occurrences of 'no song found', which should be removed

### Cleaning rand_samp

In [20]:
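# Removing instances of '' and 'no song found'
# (new lists are built, because removing from a list while looping over it skips elements)
invalid = ('', 'no song found')

rm = [song for song in rand_samp if song in invalid]
rand_samp = [song for song in rand_samp if song not in invalid]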
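
A general caution when filtering a list in Python: calling `remove` on a list while a `for` loop is iterating over it shifts the remaining elements left, so the element right after each removal is never inspected. The sketch below illustrates this with a made-up `sample` list (not the real `rand_samp`) and compares it with a comprehension-based filter. The function names are hypothetical and exist only for this illustration.

```python
def remove_in_loop(songs, invalid):
    """Filter by removing during iteration; shown only to demonstrate the pitfall."""
    songs = list(songs)  # work on a copy so the caller's list is untouched
    for song in songs:
        if song in invalid:
            songs.remove(song)  # shifts later elements left, so the next one is skipped
    return songs


def filter_with_comprehension(songs, invalid):
    """Build a new list that keeps only the valid entries."""
    return [song for song in songs if song not in invalid]


invalid = {'', 'no song found'}
sample = ['', 'no song found', 'no song found', 'Dia Perfeito', 'no song found']

looped = remove_in_loop(sample, invalid)
filtered = filter_with_comprehension(sample, invalid)

print(looped)    # still contains a 'no song found' entry
print(filtered)  # contains only 'Dia Perfeito'
```

Consecutive invalid entries are where the in-loop version goes wrong, which matters here because 'no song found' occurs multiple times.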

## Spotify API integration
- In the code below, requests are sent to the Spotify API to retrieve each song's informational and audio feature variables

### Get Audio Features Function
- First, the get_audio_features function, below, must be defined.

In [39]:
import time

import requests


def get_audio_features(track_id, access_token):
    """Return the Spotify audio features for a track as a dict."""
    features_url = f"https://api.spotify.com/v1/audio-features/{track_id}"
    features_headers = {
        "Authorization": f"Bearer {access_token}"
    }

    # Request audio features
    response = requests.get(features_url, headers=features_headers)

    # If the request is rate limited, wait for the time given in Retry-After, then retry
    if response.status_code == 429:  # Too Many Requests
        retry_after = int(response.headers.get("Retry-After", 1))
        print(f"Rate limited. Retrying after {retry_after} seconds.")
        time.sleep(retry_after)
        return get_audio_features(track_id, access_token)  # Recursive call after waiting

    return response.json()
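
The retry in get_audio_features is recursive and has no limit on attempts, so a persistently rate-limited request would keep retrying. A minimal sketch of a bounded, iterative alternative follows. `call_with_retry`, `max_retries`, `send`, and `FakeResponse` are hypothetical names introduced for illustration; `send` stands in for a `requests.get` call and only needs to return an object with `status_code`, `headers`, and `json()`.

```python
import time


def call_with_retry(send, max_retries=3, sleep=time.sleep):
    """Call send() and retry on HTTP 429, at most max_retries times."""
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response.json()
        if attempt < max_retries:
            # Wait for the number of seconds given in Retry-After (default 1)
            sleep(int(response.headers.get("Retry-After", 1)))
    raise RuntimeError(f"Still rate limited after {max_retries} retries")


class FakeResponse:
    """Stand-in for a requests response, for demonstration only."""

    def __init__(self, status_code, headers=None, payload=None):
        self.status_code = status_code
        self.headers = headers or {}
        self._payload = payload or {}

    def json(self):
        return self._payload


# Simulate one rate-limited reply followed by a successful one
replies = iter([
    FakeResponse(429, {"Retry-After": "2"}),
    FakeResponse(200, payload={"tempo": 120.0}),
])
waits = []  # records the requested sleep durations instead of sleeping
result = call_with_retry(lambda: next(replies), sleep=waits.append)
```

With `requests`, `send` could be something like `lambda: requests.get(features_url, headers=features_headers)`.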

### Make Spotify requests for all songs in rand_samp

In [65]:
from config import SpotifyCredentials
import pandas as pd
import requests
import time


# POST request for an access token (client credentials grant)
auth_response = requests.post(SpotifyCredentials.AUTH_URL, {
    'grant_type': 'client_credentials',
    'client_id': SpotifyCredentials.CLIENT_ID,
    'client_secret': SpotifyCredentials.CLIENT_SECRET,
})

# Convert the response to JSON                                                                                 
auth_response_data = auth_response.json()


# Save the access token                                                                                        
access_token = auth_response_data['access_token']

# Save the headers
headers = {
    'Authorization': 'Bearer {token}'.format(token=access_token)
}

# Establish the data
data = []

# Iterate through the songs and request variables
for song in rand_samp:

    # Get the search parameters
    search_params = {
        "q": song,
        "type": "track",
        "limit": 1 
    }
    
    # Make the request
    r = requests.get(SpotifyCredentials.SEARCH_API, headers=headers, params=search_params)
    search_results = r.json()
    
    # If search results appear
    if search_results['tracks']['items']:
        
        # Get the track information variables
        track = search_results['tracks']['items'][0]
        track_id = track['id']
        track_name = track['name']
        artist_name = track['artists'][0]['name']
        album_name = track['album']['name']
        popularity = track['popularity']
        
        # Get audio feature variables
        resp = get_audio_features(track_id, access_token)
        danceability = resp['danceability']
        energy = resp['energy']
        key = resp['key']
        loudness = resp['loudness']
        mode = resp['mode']
        speechiness = resp['speechiness']
        acousticness = resp['acousticness']
        instrumentalness = resp['instrumentalness']
        liveness = resp['liveness']
        valence = resp['valence']
        tempo = resp['tempo']
        duration_ms = resp['duration_ms']
        time_signature = resp['time_signature']

        # Append the variables to the data
        data.append([track_name,
                    artist_name,
                    album_name,
                    popularity,
                    danceability,
                    energy,
                    key,
                    loudness,
                    mode,
                    speechiness,
                    acousticness,
                    instrumentalness,
                    liveness,
                    valence,
                    tempo,
                    duration_ms,
                    time_signature])
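
The loop above indexes the audio-features reply directly, so a reply that lacks those keys (an error payload, for example) would raise a `KeyError` and stop the run. One way to guard against that, sketched under the assumption that skipping such songs is acceptable, is to build each row in a helper that returns `None` when a field is missing. `FEATURE_KEYS` and `build_row` are hypothetical names, and the sample dictionaries hold made-up values rather than real API output.

```python
FEATURE_KEYS = [
    'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
    'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo',
    'duration_ms', 'time_signature',
]


def build_row(track, features):
    """Return one row in the notebook's column order, or None if features are incomplete."""
    if any(key not in features for key in FEATURE_KEYS):
        return None
    info = [
        track['name'],
        track['artists'][0]['name'],
        track['album']['name'],
        track['popularity'],
    ]
    return info + [features[key] for key in FEATURE_KEYS]


# Made-up inputs shaped like the fields the loop reads
track = {
    'name': 'Example Song',
    'artists': [{'name': 'Example Artist'}],
    'album': {'name': 'Example Album'},
    'popularity': 50,
}
features = {key: 0 for key in FEATURE_KEYS}

row = build_row(track, features)                         # complete row of 17 values
skipped = build_row(track, {'error': {'status': 404}})   # None, so the song is skipped
```

In the loop, a row would then be appended to `data` only when `build_row` returns a list.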

## Creating and Cleaning the Main DataFrame

In [66]:
import pandas as pd

# Create headers for the data
data_columns = ['Track Name', 
                'Artist Name', 
                'Album Name', 
                'Popularity', 
                'Danceability', 
                'Energy', 
                'Key', 
                'Loudness',
                'Mode',
                'Speechiness',
                'Acousticness',
                'Insturmentalness',
                'Liveness',
                'Valence',
                'Tempo',
                'Duration MS',
                'Time Signature'
               ]

# Create the dataframe
df = pd.DataFrame(data, columns=data_columns)

# Display the DataFrame
display(df)
print(f"Number of rows in total dataset: {len(df)}")

| | Track Name | Artist Name | Album Name | Popularity | Danceability | Energy | Key | Loudness | Mode | Speechiness | Acousticness | Insturmentalness | Liveness | Valence | Tempo | Duration MS | Time Signature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hawthorn | Ishara | Hawthorn | 45 | 0.209 | 0.0236 | 3 | -25.440 | 1 | 0.0398 | 0.992000 | 0.9710 | 0.1110 | 0.0862 | 68.913 | 153913 | 4 |
| 1 | The Seed - Edit | Tony Allen | Tomorrow Comes The Harvest | 26 | 0.791 | 0.7820 | 10 | -11.954 | 0 | 0.0865 | 0.001290 | 0.9010 | 0.0783 | 0.7230 | 111.462 | 240422 | 4 |
| 2 | Lady Marian (2003 - Remaster) | Clannad | Legend (2003 - Remaster) | 39 | 0.265 | 0.0227 | 0 | -22.602 | 1 | 0.0425 | 0.978000 | 0.8250 | 0.0807 | 0.1430 | 113.776 | 201867 | 3 |
| 3 | Empires Lost to Time | Dark Tranquillity | Moment | 12 | 0.455 | 0.9890 | 7 | -6.000 | 1 | 0.0682 | 0.000235 | 0.0534 | 0.0849 | 0.3430 | 94.989 | 249971 | 4 |
| 4 | Digane - Raphael Afro Vibe Remix Extended | Bob Sinclar | Digane (Raphael Afro Vibe Remix) | 32 | 0.762 | 0.9310 | 2 | -5.143 | 1 | 0.0436 | 0.013900 | 0.5180 | 0.0627 | 0.3860 | 123.994 | 300121 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 883 | Prófugos - Remasterizado 2007 | Soda Stereo | Signos (Remastered) | 64 | 0.551 | 0.8950 | 4 | -4.267 | 0 | 0.0391 | 0.002680 | 0.0402 | 0.2990 | 0.3590 | 127.221 | 317613 | 4 |
| 884 | Nymphetamine Fix | Cradle Of Filth | Nymphetamine Special Edition | 52 | 0.462 | 0.9050 | 0 | -3.825 | 0 | 0.0438 | 0.000660 | 0.0402 | 0.0839 | 0.2530 | 122.925 | 302360 | 3 |
| 885 | Éclosion | Tony Anderson | Nuit | 45 | 0.515 | 0.1770 | 9 | -28.931 | 0 | 0.0537 | 0.964000 | 0.9540 | 0.0999 | 0.4960 | 149.964 | 295200 | 4 |
| 886 | Rise Of The Antichrist | Bewitched | Rise Of The Antichrist | 13 | 0.252 | 0.9310 | 9 | -5.644 | 1 | 0.1080 | 0.000834 | 0.1620 | 0.2800 | 0.5320 | 95.016 | 225760 | 4 |


Number of rows in total dataset: 888


- The songs were pulled successfully! Of the 1000 songs, 888 were not '' or 'no song found' and made it into the DataFrame.
- The DataFrame is stored in Jupyter and saved as a CSV below

In [67]:
# Store variable
%store df

# Save to csv
df.to_csv('./data/song_data')

Stored 'df' (DataFrame)
