# Creating our own variables

**The features we read in were a good start, but we wanted more targeted features to help predict genre.**

**In the code below, we take the Spotify track IDs provided in the original data frame (the `id` column) and use them to gather more data from the Spotify API.**

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import requests
import pickle
pd.options.display.max_rows = 10
import time

**This is the data frame that contains the Spotify track ID for each song.**

In [3]:
info_df = pd.read_csv("https://raw.githubusercontent.com/RosebudAnwuri/TheArtandScienceofData/master/The%20Making%20of%20Great%20Music/data/music_df.csv")
info_df

Unnamed: 0,lyrics,num_syllables,pos,year,fog_index,flesch_index,num_words,num_lines,title,f_k_grade,...,tempo,duration_ms,time_signature,uri,analysis_url,artist_with_features,year_bin,image,cluster,Gender
0,"Mona Lisa, Mona Lisa, men have named you\nYou'...",189.0,0.199,1950,5.2,88.74,145,17,Mona Lisa,2.9,...,86.198,207573.0,3,spotify:track:3k5ycyXX5qsCjLd7R2vphp,https://api.spotify.com/v1/audio-analysis/3k5y...,,50s,https://i.scdn.co/image/a4c0918f13b67aa8d9f4ea...,String Lover,male
1,I wanna be Loved\nBy Andrews Sisters\n\nOooo-o...,270.9,0.224,1950,4.4,82.31,189,31,I Wanna Be Loved,3.3,...,170.869,198027.0,5,spotify:track:4UY81WrDU3jTROGaKuz4uZ,https://api.spotify.com/v1/audio-analysis/4UY8...,Gordon Jenkins,50s,https://i.scdn.co/image/42e4dc3ab9b190056a1ca1...,String Lover,Group
2,I was dancing with my darling to the Tennessee...,174.6,0.351,1950,5.2,88.74,138,16,Tennessee Waltz,2.9,...,86.335,182733.0,3,spotify:track:6DKt9vMnMN0HmlnK3EAHRQ,https://api.spotify.com/v1/audio-analysis/6DKt...,,50s,https://i.scdn.co/image/353b05113b1a140d64d83d...,String Lover,female
3,Each time I hold someone new\nMy arms grow col...,135.9,0.231,1950,4.4,99.23,117,18,I'll Never Be Free,0.9,...,82.184,158000.0,3,spotify:track:0KnD456yC5JuweN932Ems3,https://api.spotify.com/v1/audio-analysis/0KnD...,Kay Starr,50s,https://i.scdn.co/image/4bd427bb9181914d0fa448...,String Lover,male
4,"Unfortunately, we are not licensed to display ...",46.8,0.079,1950,6.0,69.79,32,3,All My Love,6.0,...,123.314,190933.0,4,spotify:track:05sXHTLqIpwywbpui1JT4o,https://api.spotify.com/v1/audio-analysis/05sX...,,50s,https://i.scdn.co/image/353b05113b1a140d64d83d...,String Lover,female
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4023,(I guess right now you've got the last laugh)\...,570.6,0.120,2015,5.6,96.18,469,51,Here,2.1,...,123.909,199453.0,4,spotify:track:664gdARxaClFsoF5SXKOws,https://api.spotify.com/v1/audio-analysis/664g...,,10s,https://i.scdn.co/image/46677af3f55e432727e7bf...,Poetic,female
4024,My face above the water\nMy feet can't touch t...,169.2,0.184,2015,4.0,91.78,126,22,Waves,1.7,...,119.993,208133.0,4,spotify:track:5Sf3GyLEAzJXxZ5mbCPXTu,https://api.spotify.com/v1/audio-analysis/5Sf3...,Robin Schulz,10s,https://i.scdn.co/image/261cf047c334ad684d0c8e...,String Lover,male
4025,You know from the moment she turned around\nSh...,217.8,0.067,2015,4.8,106.67,193,26,She Knows,0.1,...,139.988,214726.0,4,spotify:track:0XETcdHr7EkjfoZFSj6Asv,https://api.spotify.com/v1/audio-analysis/0XET...,Juicy J,10s,https://i.scdn.co/image/d48ffaffc20a3c76f79d81...,Poetic,male
4026,Going out tonight\nChanges into something red\...,399.6,0.026,2015,4.0,83.32,296,52,Night Changes,2.9,...,120.001,226600.0,4,spotify:track:5O2P9iiztwhomNh8xkR9lJ,https://api.spotify.com/v1/audio-analysis/5O2P...,,10s,https://i.scdn.co/image/5bb443424a1ad71603c43d...,String Lover,Group


In [5]:
import sys
from collections import OrderedDict

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# With no arguments, the credentials are read from the SPOTIPY_CLIENT_ID and
# SPOTIPY_CLIENT_SECRET environment variables instead of being hard-coded here.
client_credentials_manager = SpotifyClientCredentials()
spotify = spotipy.Spotify(client_credentials_manager=client_credentials_manager)


In [6]:
import requests
import json

**Here is where we gathered the number of segments and the number of sections per song.**

**A segment is defined as a portion of a song that contains roughly consistent sound throughout its duration.**

**A section corresponds to one of the larger parts of a song, such as a chorus, verse, bridge, or solo.**
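
**To make the two definitions concrete, here is a minimal sketch of the shape of an audio-analysis response, trimmed to a handful of fields. The values are invented for illustration and `mock_analysis` is a hypothetical name; a real response holds hundreds of segments and many more fields per entry.**

```python
# Hypothetical, hand-written stand-in for an audio-analysis response.
# Field names follow Spotify's audio-analysis format; the values are invented.
mock_analysis = {
    "sections": [
        {"start": 0.0, "duration": 20.0, "key": 0},
        {"start": 20.0, "duration": 25.0, "key": 7},
    ],
    "segments": [
        {"start": 0.0, "duration": 0.25, "loudness_start": -60.0,
         "loudness_max": -20.0, "loudness_max_time": 0.05, "loudness_end": 0.0},
        {"start": 0.25, "duration": 0.5, "loudness_start": -25.0,
         "loudness_max": -12.0, "loudness_max_time": 0.1, "loudness_end": 0.0},
        {"start": 0.75, "duration": 0.25, "loudness_start": -18.0,
         "loudness_max": -9.0, "loudness_max_time": 0.02, "loudness_end": 0.0},
    ],
}

# The two counts gathered per song are simply the lengths of these lists.
num_sections = len(mock_analysis["sections"])
num_segments = len(mock_analysis["segments"])
print(num_sections, num_segments)
```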

In [9]:
rows = []

oldIds = info_df["id"]
for identifier in oldIds:
    try:
        # Fetch the full audio analysis for this track
        analysis = spotify.audio_analysis(identifier)
        segments_len = len(analysis["segments"])
        sections_len = len(analysis["sections"])
        new_row = {"num_segments": segments_len, "num_sections": sections_len}
    except Exception:
        # If the request fails or the analysis is incomplete, record zeros
        new_row = {"num_segments": 0, "num_sections": 0}
    rows.append(new_row)
    time.sleep(0.1)  # brief pause between requests
df_lens = pd.DataFrame(rows)

In [10]:
with open("lengths_df.pkl", "wb") as f:
    pickle.dump(df_lens, f)
df_lens

Unnamed: 0,num_sections,num_segments
0,6,598
1,9,551
2,9,524
3,7,412
4,12,702
...,...,...
4023,10,790
4024,13,836
4025,8,768
4026,12,711


**For each song we averaged several per-segment fields: `loudness_start_avg` (dB), `loudness_max_avg` (dB), `loudness_end_avg` (dB), `loudness_max_time_avg` (seconds) and `duration_avg` (seconds).**
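
**The per-song averaging can be sketched on a tiny hand-made list of segments. The segment values below are invented, and `mock_segments` and `segment_averages` are hypothetical names. The sketch relies on `pd.json_normalize`, the top-level pandas function available since pandas 1.0.**

```python
import pandas as pd

# Invented segment records holding only the fields that get averaged.
mock_segments = [
    {"duration": 0.25, "loudness_start": -30.0, "loudness_max": -10.0,
     "loudness_max_time": 0.125, "loudness_end": 0.0},
    {"duration": 0.75, "loudness_start": -20.0, "loudness_max": -6.0,
     "loudness_max_time": 0.375, "loudness_end": 0.0},
]

# One row per segment, one column per field.
segment_df = pd.json_normalize(mock_segments)

# Average every column at once and append "_avg" to each label.
segment_averages = segment_df.mean().add_suffix("_avg").to_dict()
print(segment_averages)
```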

In [11]:
audio_analysis_list = []
oldIds = info_df["id"]
for identifier in oldIds:
    try:
        features = spotify.audio_analysis(identifier)

        # One row per segment; pd.json_normalize replaces the deprecated
        # pandas.io.json.json_normalize import
        segment_df = pd.json_normalize(features["segments"])
        loudness_start_avg = segment_df["loudness_start"].mean()
        loudness_max_time_avg = segment_df["loudness_max_time"].mean()
        loudness_max_avg = segment_df["loudness_max"].mean()
        loudness_end_avg = segment_df["loudness_end"].mean()
        duration_avg = segment_df["duration"].mean()

        new_row = {"loudness_start_avg": loudness_start_avg,
                   "loudness_max_time_avg": loudness_max_time_avg,
                   "loudness_max_avg": loudness_max_avg,
                   "loudness_end_avg": loudness_end_avg,
                   "duration_avg": duration_avg}
    except Exception:
        # Fall back to zeros when the analysis is unavailable
        new_row = {"loudness_start_avg": 0,
                   "loudness_max_time_avg": 0,
                   "loudness_max_avg": 0,
                   "loudness_end_avg": 0,
                   "duration_avg": 0}
    audio_analysis_list.append(new_row)
    time.sleep(0.1)  # brief pause between requests

df_audio_analysis = pd.DataFrame(audio_analysis_list)

In [12]:
df_audio_analysis


Unnamed: 0,duration_avg,loudness_end_avg,loudness_max_avg,loudness_max_time_avg,loudness_start_avg
0,0.347113,-60.000,-17.186855,0.083278,-22.742452
1,0.359395,-60.000,-17.411563,0.099854,-26.226034
2,0.348728,-59.680,-17.758643,0.076399,-24.168794
3,0.383495,-60.000,-14.676978,0.084908,-25.330629
4,0.271985,-60.000,-13.451605,0.076851,-23.094671
...,...,...,...,...,...
4023,0.252473,-60.000,-4.439994,0.072672,-10.863947
4024,0.248963,-60.000,-10.917855,0.047444,-21.661817
4025,0.279591,-42.228,-5.696760,0.064101,-14.621098
4026,0.318706,-60.000,-10.272871,0.072335,-17.766540


In [13]:
with open("audio_analysis_df.pkl", "wb") as f:
    pickle.dump(df_audio_analysis, f)

**Finally, we calculated the number of key changes in each song, counted as the number of times the estimated key differs from one section to the next.**
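
**The counting rule can be isolated in a small helper. This is a sketch rather than the notebook's own function: `count_key_changes` is a hypothetical name, and it assumes its input is the list of per-section `key` values in playback order.**

```python
def count_key_changes(keys):
    """Count positions where a section's key differs from the previous one."""
    # Pair each key with its successor; an empty or one-item list yields no pairs.
    return sum(1 for prev, curr in zip(keys, keys[1:]) if prev != curr)


# Keys are pitch-class integers (0 = C, 1 = C sharp, ..., 11 = B).
print(count_key_changes([0, 0, 7, 7, 0]))  # → 2
print(count_key_changes([5]))              # → 0
```

**Returning to an earlier key counts as a change, so a song that moves from C to G and back to C registers two key changes.**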

In [14]:
key_analysis_list = []
oldIds = info_df["id"]
for identifier in oldIds:
    try:
        features = spotify.audio_analysis(identifier)

        # One row per section; each section carries an estimated key
        section_df = pd.json_normalize(features["sections"])
        keyCount = 0

        key_list = list(section_df["key"])

        # Count how often the key differs from the previous section's key
        tempKey = key_list[0]
        for key in key_list:
            if key != tempKey:
                tempKey = key
                keyCount += 1

        new_row = {"key_changes": keyCount}
    except Exception:
        # Fall back to zero when the analysis is unavailable
        new_row = {"key_changes": 0}
    key_analysis_list.append(new_row)

    time.sleep(0.1)  # brief pause between requests

df_key_analysis = pd.DataFrame(key_analysis_list)

In [15]:
df_key_analysis

Unnamed: 0,key_changes
0,1
1,8
2,4
3,2
4,5
...,...
4023,3
4024,11
4025,7
4026,5


In [16]:
with open("key_analysis_df.pkl", "wb") as f:
    pickle.dump(df_key_analysis, f)
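
**Each of the three loops in this notebook requests the audio analysis again for every track, so every song is fetched three times. One possible alternative, sketched here under assumptions, is to fetch each analysis once and derive all features from it. The names `extract_features` and `collect_features` are hypothetical, `fetch` stands in for `spotify.audio_analysis`, and failed lookups are recorded as `NaN` rather than 0 so they cannot be mistaken for real values, which differs from the cells above.**

```python
import math
import time

import pandas as pd

SEGMENT_FIELDS = ["loudness_start", "loudness_max_time", "loudness_max",
                  "loudness_end", "duration"]
FEATURE_NAMES = (["num_segments", "num_sections", "key_changes"]
                 + [f"{field}_avg" for field in SEGMENT_FIELDS])


def extract_features(analysis):
    """Derive every feature used in this notebook from one analysis dict."""
    segments = analysis["segments"]
    keys = [section["key"] for section in analysis["sections"]]
    row = {
        "num_segments": len(segments),
        "num_sections": len(keys),
        # Number of adjacent section pairs whose keys differ.
        "key_changes": sum(a != b for a, b in zip(keys, keys[1:])),
    }
    for field in SEGMENT_FIELDS:
        values = [segment[field] for segment in segments]
        row[f"{field}_avg"] = sum(values) / len(values)
    return row


def collect_features(track_ids, fetch, pause=0.1):
    """Call fetch(track_id) once per track and build a single DataFrame."""
    rows = []
    for track_id in track_ids:
        try:
            row = extract_features(fetch(track_id))
        except Exception:
            # Assumption: mark failed lookups as missing rather than as zeros.
            row = dict.fromkeys(FEATURE_NAMES, math.nan)
        rows.append(row)
        time.sleep(pause)  # brief pause between requests
    return pd.DataFrame(rows, index=list(track_ids), columns=FEATURE_NAMES)
```

**With a live client this could be called as `collect_features(info_df["id"], spotify.audio_analysis)`, which makes one request per track instead of three.**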