## Spotipy installation and testing guidelines:

Quick setup:

https://medium.com/@maxtingle/getting-started-with-spotifys-api-spotipy-197c3dc6353b

Overall spotipy documentation:

https://spotipy.readthedocs.io/en/2.13.0/#

1. Install spotipy:

In [None]:
!pip install spotipy

2. Import client ID:

Create the project from the Spotify developer dashboard: https://developer.spotify.com/dashboard/login

Create a Spotify account, sign up for a developer account, and create an application to obtain a client ID and client secret.

In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
cid = '49789426e6c04428a2befbd3c2d79b02'
secret = 'd18032252a0d4634aa8ca21356894e79'
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
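
Hardcoding the client ID and secret in a notebook exposes them to anyone who can read the file. Below is a minimal sketch of one alternative, assuming the credentials are exported as environment variables. The helper name `load_spotify_credentials` is hypothetical; `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the variable names spotipy itself looks up when no credentials are passed.

```python
import os

def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables."""
    env = os.environ if env is None else env
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]

# Example with a stand-in mapping (placeholder values, not real credentials).
cid_example, secret_example = load_spotify_credentials(
    {"SPOTIPY_CLIENT_ID": "my-client-id", "SPOTIPY_CLIENT_SECRET": "my-client-secret"}
)
```

In the notebook, calling `load_spotify_credentials()` with no argument would read the real environment, and the returned pair could be passed to `SpotifyClientCredentials` in place of the literal strings.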

3. Retrieve sample data:

Spotify methods to access Artist, Track, and Album data endpoints all require you to know their individual Spotify ID. I started with the search endpoint, exemplified below, because it does not require a Spotify ID.
The following code collects 1,000 Track IDs and their associated track name, artist name, and popularity score.


In [2]:
artist_name = []
track_name = []
popularity = []
track_id = []
# Request 20 pages of 50 tracks each; the offset selects the page
for offset in range(0, 1000, 50):
    track_results = sp.search(q='year:2018', type='track', limit=50, offset=offset)
    for t in track_results['tracks']['items']:
        artist_name.append(t['artists'][0]['name'])
        track_name.append(t['name'])
        track_id.append(t['id'])
        popularity.append(t['popularity'])
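
The paging logic can also be wrapped in a function that collects one record per track. This is a sketch under assumptions: `collect_tracks` and `fake_search` are hypothetical names, and `fake_search` only imitates the shape of the `sp.search` response used above so that the example runs without network access.

```python
def collect_tracks(search, query, total=1000, page_size=50):
    """Page through search results and return one dict per track.

    `search` is any callable accepting the same keyword arguments as sp.search.
    """
    rows = []
    for offset in range(0, total, page_size):
        page = search(q=query, type='track', limit=page_size, offset=offset)
        items = page['tracks']['items']
        if not items:
            break  # no more results, so stop instead of requesting empty pages
        for t in items:
            rows.append({
                'artist_name': t['artists'][0]['name'],
                'track_name': t['name'],
                'track_id': t['id'],
                'popularity': t['popularity'],
            })
    return rows

# Stand-in for sp.search: a made-up catalog of 120 tracks.
def fake_search(q, type, limit, offset):
    catalog = [{'artists': [{'name': f'Artist {n}'}], 'name': f'Track {n}',
                'id': f'id{n}', 'popularity': n} for n in range(120)]
    return {'tracks': {'items': catalog[offset:offset + limit]}}

rows = collect_tracks(fake_search, 'year:2018', total=1000, page_size=50)
```

With the real client, `collect_tracks(sp.search, 'year:2018')` would make the same requests as the loop above, and `pd.DataFrame(rows)` would yield the four columns directly.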

4. Load data in for analysis

Requires pandas installation

In [None]:
!pip install pandas

In [3]:
import pandas as pd

track_dataframe = pd.DataFrame({'artist_name' : artist_name, 'track_name' : track_name, 'track_id' : track_id, 'popularity' : popularity})
print(track_dataframe.shape)
track_dataframe.head()

(1000, 4)


| | artist_name | track_name | track_id | popularity |
|---|---|---|---|---|
| 0 | Morgan Wallen | Whiskey Glasses | 6foY66mWZN0pSRjZ408c00 | 79 |
| 1 | Morgan Wallen | Chasin' You | 5MwynWK9s4hlyKHqhkNn4A | 79 |
| 2 | Juice WRLD | All Girls Are The Same | 4VXIryQMWpIdGgYR4TrjT1 | 84 |
| 3 | Lil Baby | Drip Too Hard (Lil Baby & Gunna) | 78QR3Wp35dqAhFEc2qAGjE | 82 |
| 4 | The Weeknd | I Was Never There | 1cKHdTo9u0ZymJdPGSh6nq | 88 |


A deployed application may additionally need long-term authorization through the authorization code flow:

- https://developer.spotify.com/documentation/general/guides/authorization/
- https://developer.spotify.com/documentation/general/guides/authorization/code-flow/

***Note that this is more for an application pipeline. For the purposes of collecting labels, this may not be necessary***

Getting audio features API: https://developer.spotify.com/documentation/web-api/reference/#/operations/get-audio-features

In [None]:
import requests

# Step 1: Obtain an access token
auth_url = 'https://accounts.spotify.com/api/token'

auth_response = requests.post(
    auth_url,
    data={
        'grant_type': 'client_credentials',
        'client_id': cid,
        'client_secret': secret
    }
)

auth_response_data = auth_response.json()
access_token = auth_response_data['access_token']

print(access_token)

headers = {
    'Authorization': f'Bearer {access_token}'
}

# Step 2: Get the user's ID
user_id = 'by6gio94n7zv678vo8bvip3nf'

# Step 3: Fetch the user's playlists
playlists_url = f'https://api.spotify.com/v1/users/{user_id}/playlists'
playlists_response = requests.get(playlists_url, headers=headers)
playlists_data = playlists_response.json()

print(playlists_data.keys())

# Step 4: Iterate through each playlist and fetch the songs
for playlist in playlists_data['items']:
    playlist_id = playlist['id']
    playlist_name = playlist['name']
    print(f'Playlist: {playlist_name}')
    
    tracks_url = f'https://api.spotify.com/v1/playlists/{playlist_id}/tracks'
    tracks_response = requests.get(tracks_url, headers=headers)
    tracks_data = tracks_response.json()

    for track in tracks_data['items']:
        track_name = track['track']['name']
        track_artist = track['track']['artists'][0]['name']
        print(f'\t{track_name} - {track_artist}')
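
Spotify returns playlists and playlist tracks as paging objects with `items` and `next` fields, so a single request sees only the first page. Below is a minimal sketch of following `next` until it is empty. The names `iter_paged_items` and `fake_pages` are hypothetical, and the fake pages stand in for API responses so the example runs offline.

```python
def iter_paged_items(first_url, get_json):
    """Yield every item across pages by following each page's `next` URL.

    `get_json` is a callable that takes a URL and returns the decoded JSON body.
    """
    url = first_url
    while url:
        page = get_json(url)
        yield from page['items']
        url = page.get('next')  # None on the last page ends the loop

# Stand-in pages keyed by a fake URL.
fake_pages = {
    'page1': {'items': ['a', 'b'], 'next': 'page2'},
    'page2': {'items': ['c'], 'next': None},
}
all_items = list(iter_paged_items('page1', fake_pages.__getitem__))
```

Against the real API, `get_json` could be `lambda url: requests.get(url, headers=headers).json()`, with `playlists_url` or `tracks_url` as the first URL.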

#### Label collection for sample user

For each feature we want to predict, create a pair of dataframes: one for tracks above the feature's high threshold and one for tracks below its low threshold. Also collect one dataframe holding every track involved, and print the total number of collected labels. We make sure each desired track can be requested before adding it to our final CSV file. Repeating this process for multiple users makes up our entire data collection process. If we can grab the audio itself, we can also extract tempo and similar features using librosa.
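
The labeling rule can be sketched as a single function applied to one audio-features record, so the features need to be fetched only once per track. The threshold values are copied from the collection cell in this section. The names `THRESHOLDS`, `label_track`, and `sample_features` are hypothetical, and the sample values are illustrative rather than real API output.

```python
# (high, low) threshold pairs per feature, copied from the collection cell.
THRESHOLDS = {
    'danceability': (0.85, 0.30),
    'instrumentalness': (0.75, 0.15),
    'speechiness': (0.60, 0.10),
    'acousticness': (0.70, 0.10),
    'energy': (0.60, 0.30),
}

def label_track(audio_features, thresholds=THRESHOLDS):
    """Map each feature to 'high', 'low', or None (between the two thresholds)."""
    labels = {}
    for feature, (high, low) in thresholds.items():
        value = audio_features[feature]
        if value > high:
            labels[feature] = 'high'
        elif value < low:
            labels[feature] = 'low'
        else:
            labels[feature] = None  # ambiguous middle range, left unlabeled
    return labels

# Illustrative values only.
sample_features = {
    'danceability': 0.9,
    'instrumentalness': 0.01,
    'speechiness': 0.2,
    'acousticness': 0.05,
    'energy': 0.5,
}
sample_labels = label_track(sample_features)
```

Tracks in the middle range get no label for that feature, which matches the collection cell: they are stored only in the all-tracks dataframe.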

In [6]:
import requests

# Step 1: Obtain an access token
auth_url = 'https://accounts.spotify.com/api/token'

auth_response = requests.post(
    auth_url,
    data={
        'grant_type': 'client_credentials',
        'client_id': cid,
        'client_secret': secret
    }
)

auth_response_data = auth_response.json()
access_token = auth_response_data['access_token']

print(access_token)

headers = {
    'Authorization': f'Bearer {access_token}'
}

# Step 2: Get the user's ID
user_id = 'by6gio94n7zv678vo8bvip3nf'

# Step 3: Fetch the user's playlists
playlists_url = f'https://api.spotify.com/v1/users/{user_id}/playlists'
playlists_response = requests.get(playlists_url, headers=headers)
playlists_data = playlists_response.json()

print(playlists_data.keys())

# Set a threshold for desired features
danceability_threshold = 0.85
non_danceability_threshold = 0.30

instrumentalness_threshold = 0.75
non_instrumentalness_threshold = 0.15

speechiness_threshold = 0.60
non_speechiness_threshold = 0.10

acousticness_threshold = 0.70
non_acousticness_threshold = 0.10

energy_threshold = 0.60
non_energy_threshold = 0.30


# Step 4: Iterate through each playlist and fetch the songs
total_danceable = 0
total_non_danceable = 0

total_instrumental = 0
total_non_instrumental = 0

total_speechy = 0
total_non_speechy = 0

total_acoustic = 0
total_non_acoustic = 0

total_energetic = 0
total_non_energetic = 0

total = 0

# Create an empty DataFrame with the appropriate columns
danceable_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'danceability', 'preview_url'])
non_danceable_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'danceability', 'preview_url'])

instrumental_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'instrumentalness', 'preview_url'])
non_instrumental_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'instrumentalness', 'preview_url'])

speechy_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'speechiness', 'preview_url'])
non_speechy_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'speechiness', 'preview_url'])

acoustic_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'acousticness', 'preview_url'])
non_acoustic_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'acousticness', 'preview_url'])

energetic_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'energy', 'preview_url'])
non_energetic_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'energy', 'preview_url'])

all_tracks_df = pd.DataFrame(columns=['track_id', 'track_name', 'track_artist', 'danceability', 'instrumentalness', 'speechiness', 'acousticness', 'energy', 'valence', 'tempo', 'loudness', 'liveness', 'key', 'mode', 'time_signature', 'preview_url'])


# k counts the tracks seen so far; no new playlist is started once 200 tracks have been seen
k = 0
for playlist in playlists_data['items']:
    if k < 200:
        playlist_id = playlist['id']
        playlist_name = playlist['name']
        print(f'Playlist: {playlist_name}')
        
        tracks_url = f'https://api.spotify.com/v1/playlists/{playlist_id}/tracks'
        tracks_response = requests.get(tracks_url, headers=headers)
        tracks_data = tracks_response.json()

        for track in tracks_data['items']:
            k += 1
            track_id = track['track']['id']
            track_name = track['track']['name']
            track_artist = track['track']['artists'][0]['name']
            preview_url = track['track']['preview_url']
            
            if preview_url and track_id not in all_tracks_df['track_id'].values:
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()
                
                # add to all 
                track_df = pd.DataFrame({
                    'track_id': [track_id],
                    'track_name': [track_name],
                    'track_artist': [track_artist],
                    'danceability': [audio_features_data['danceability']],
                    'instrumentalness': [audio_features_data['instrumentalness']],
                    'speechiness': [audio_features_data['speechiness']],
                    'acousticness': [audio_features_data['acousticness']],
                    'energy': [audio_features_data['energy']],
                    'valence': [audio_features_data['valence']],
                    'tempo': [audio_features_data['tempo']],
                    'loudness': [audio_features_data['loudness']],
                    'liveness': [audio_features_data['liveness']],
                    'key': [audio_features_data['key']],
                    'mode': [audio_features_data['mode']],
                    'time_signature': [audio_features_data['time_signature']],
                    'preview_url' : [preview_url]
                })
                all_tracks_df = pd.concat([all_tracks_df, track_df], ignore_index=True)
                total = total + 1
            
            
            # Check if the track is already in the DataFrame
            if preview_url and track_id not in danceable_tracks_df['track_id'].values:
                
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()

                # Check if the track's danceability is greater than the threshold
                if audio_features_data['danceability'] > danceability_threshold:
                    print(f'\t{track_name} - {track_artist} (Danceability: {audio_features_data["danceability"]})')
                    # Append the new track to the DataFrame
                    track_df = pd.DataFrame({
                        'track_id': [track_id],
                        'track_name': [track_name],
                        'track_artist': [track_artist],
                        'danceability': [audio_features_data['danceability']],
                        'preview_url' : [preview_url]
                    })
                    
                    # Concatenate the new DataFrame with the existing DataFrame
                    danceable_tracks_df = pd.concat([danceable_tracks_df, track_df], ignore_index=True)
                    
                    total_danceable = total_danceable + 1
                    
            # Check if the track is already in the DataFrame
            if preview_url and track_id not in non_danceable_tracks_df['track_id'].values:
            
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()

                # Check if the track's danceability is less than the threshold
                if audio_features_data['danceability'] < non_danceability_threshold:
                    print(f'\t{track_name} - {track_artist} (Danceability: {audio_features_data["danceability"]})')
                    # Append the new track to the DataFrame
                    track_df = pd.DataFrame({
                        'track_id': [track_id],
                        'track_name': [track_name],
                        'track_artist': [track_artist],
                        'danceability': [audio_features_data['danceability']],
                        'preview_url' : [preview_url]
                    })
                    
                    # Concatenate the new DataFrame with the existing DataFrame
                    non_danceable_tracks_df = pd.concat([non_danceable_tracks_df, track_df], ignore_index=True)
                    
                    total_non_danceable = total_non_danceable + 1
                    
            # Check if the track is already in the DataFrame
            if preview_url and track_id not in instrumental_tracks_df['track_id'].values:
            
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()

                # Check if the track's instrumentalness is greater than the threshold
                if audio_features_data['instrumentalness'] > instrumentalness_threshold:
                    print(f'\t{track_name} - {track_artist} (Instrumentalness: {audio_features_data["instrumentalness"]})')
                    # Append the new track to the DataFrame
                    track_df = pd.DataFrame({
                        'track_id': [track_id],
                        'track_name': [track_name],
                        'track_artist': [track_artist],
                        'instrumentalness': [audio_features_data['instrumentalness']],
                        'preview_url' : [preview_url]
                    })
                    
                    # Concatenate the new DataFrame with the existing DataFrame
                    instrumental_tracks_df = pd.concat([instrumental_tracks_df, track_df], ignore_index=True)
                    
                    total_instrumental = total_instrumental + 1
                    
            # Check if the track is already in the DataFrame
            if preview_url and track_id not in non_instrumental_tracks_df['track_id'].values:
            
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()

                # Check if the track's instrumentalness is less than the threshold
                if audio_features_data['instrumentalness'] < non_instrumentalness_threshold:
                    print(f'\t{track_name} - {track_artist} (Instrumentalness: {audio_features_data["instrumentalness"]})')
                    # Append the new track to the DataFrame
                    track_df = pd.DataFrame({
                        'track_id': [track_id],
                        'track_name': [track_name],
                        'track_artist': [track_artist],
                        'instrumentalness': [audio_features_data['instrumentalness']],
                        'preview_url' : [preview_url]
                    })
                    
                    # Concatenate the new DataFrame with the existing DataFrame
                    non_instrumental_tracks_df = pd.concat([non_instrumental_tracks_df, track_df], ignore_index=True)
                    
                    total_non_instrumental = total_non_instrumental + 1
                    
            # Check if the track is already in the DataFrame
            if preview_url and track_id not in speechy_tracks_df['track_id'].values:
            
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()

                # Check if the track's speechiness is greater than the threshold
                if audio_features_data['speechiness'] > speechiness_threshold:
                    print(f'\t{track_name} - {track_artist} (Speechiness: {audio_features_data["speechiness"]})')
                    # Append the new track to the DataFrame
                    track_df = pd.DataFrame({
                        'track_id': [track_id],
                        'track_name': [track_name],
                        'track_artist': [track_artist],
                        'speechiness': [audio_features_data['speechiness']],
                        'preview_url' : [preview_url]
                    })
                    
                    # Concatenate the new DataFrame with the existing DataFrame
                    speechy_tracks_df = pd.concat([speechy_tracks_df, track_df], ignore_index=True)
                    
                    total_speechy = total_speechy + 1
                    
            if preview_url and track_id not in non_speechy_tracks_df['track_id'].values:
                
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()
                
                # Check if the track's speechiness is less than the threshold
                if audio_features_data['speechiness'] < non_speechiness_threshold:
                    print(f'\t{track_name} - {track_artist} (Speechiness: {audio_features_data["speechiness"]})')
                    # Append the new track to the DataFrame
                    track_df = pd.DataFrame({
                        'track_id': [track_id],
                        'track_name': [track_name],
                        'track_artist': [track_artist],
                        'speechiness': [audio_features_data['speechiness']],
                        'preview_url' : [preview_url]
                    })
                    
                    # Concatenate the new DataFrame with the existing DataFrame
                    non_speechy_tracks_df = pd.concat([non_speechy_tracks_df, track_df], ignore_index=True)
                    
                    total_non_speechy = total_non_speechy + 1
                    
            if preview_url and track_id not in acoustic_tracks_df['track_id'].values:
                
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()
                
                # Check if the track's acousticness is greater than the threshold
                if audio_features_data['acousticness'] > acousticness_threshold:
                    print(f'\t{track_name} - {track_artist} (Acousticness: {audio_features_data["acousticness"]})')
                    # Append the new track to the DataFrame
                    track_df = pd.DataFrame({
                        'track_id': [track_id],
                        'track_name': [track_name],
                        'track_artist': [track_artist],
                        'acousticness': [audio_features_data['acousticness']],
                        'preview_url' : [preview_url]
                    })
                    
                    # Concatenate the new DataFrame with the existing DataFrame
                    acoustic_tracks_df = pd.concat([acoustic_tracks_df, track_df], ignore_index=True)
                    
                    total_acoustic = total_acoustic + 1
                    
            if preview_url and track_id not in non_acoustic_tracks_df['track_id'].values:
                
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()
                
                # Check if the track's acousticness is less than the threshold
                if audio_features_data['acousticness'] < non_acousticness_threshold:
                    print(f'\t{track_name} - {track_artist} (Acousticness: {audio_features_data["acousticness"]})')
                    # Append the new track to the DataFrame
                    track_df = pd.DataFrame({
                        'track_id': [track_id],
                        'track_name': [track_name],
                        'track_artist': [track_artist],
                        'acousticness': [audio_features_data['acousticness']],
                        'preview_url' : [preview_url]
                    })
                    
                    # Concatenate the new DataFrame with the existing DataFrame
                    non_acoustic_tracks_df = pd.concat([non_acoustic_tracks_df, track_df], ignore_index=True)
                    
                    total_non_acoustic = total_non_acoustic + 1
                    
            if preview_url and track_id not in energetic_tracks_df['track_id'].values:
                
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()
                
                # Check if the track's energy is greater than the threshold
                if audio_features_data['energy'] > energy_threshold:
                    print(f'\t{track_name} - {track_artist} (Energy: {audio_features_data["energy"]})')
                    # Append the new track to the DataFrame
                    track_df = pd.DataFrame({
                        'track_id': [track_id],
                        'track_name': [track_name],
                        'track_artist': [track_artist],
                        'energy': [audio_features_data['energy']],
                        'preview_url' : [preview_url]
                    })
                    
                    # Concatenate the new DataFrame with the existing DataFrame
                    energetic_tracks_df = pd.concat([energetic_tracks_df, track_df], ignore_index=True)
                    
                    total_energetic = total_energetic + 1
                    
                    
            if preview_url and track_id not in non_energetic_tracks_df['track_id'].values:
                
                # Fetch track's audio features
                audio_features_url = f'https://api.spotify.com/v1/audio-features/{track_id}'
                audio_features_response = requests.get(audio_features_url, headers=headers)
                audio_features_data = audio_features_response.json()
                
                # Check if the track's energy is less than the threshold
                if audio_features_data['energy'] < non_energy_threshold:
                    print(f'\t{track_name} - {track_artist} (Energy: {audio_features_data["energy"]})')
                    # Append the new track to the DataFrame
                    track_df = pd.DataFrame({
                        'track_id': [track_id],
                        'track_name': [track_name],
                        'track_artist': [track_artist],
                        'energy': [audio_features_data['energy']],
                        'preview_url' : [preview_url]
                    })
                    
                    # Concatenate the new DataFrame with the existing DataFrame
                    non_energetic_tracks_df = pd.concat([non_energetic_tracks_df, track_df], ignore_index=True)
                    
                    total_non_energetic = total_non_energetic + 1
                    
dataframes = {
    'Danceable Tracks': danceable_tracks_df,
    'Non-danceable Tracks': non_danceable_tracks_df,
    'Instrumental Tracks': instrumental_tracks_df,
    'Non-instrumental Tracks': non_instrumental_tracks_df,
    'Speechy Tracks': speechy_tracks_df,
    'Non-speechy Tracks': non_speechy_tracks_df,
    'Acoustic Tracks': acoustic_tracks_df,
    'Non-acoustic Tracks': non_acoustic_tracks_df,
    'Energetic Tracks': energetic_tracks_df,
    'Non-energetic Tracks': non_energetic_tracks_df,
    'All Tracks': all_tracks_df
}
           
print(f'Total tracks: {total}')
print(f'Total danceable tracks: {total_danceable}')
print(f'Total non-danceable tracks: {total_non_danceable}')
print(f'Total speechy tracks: {total_speechy}')
print(f'Total non-speechy tracks: {total_non_speechy}')
print(f'Total acoustic tracks: {total_acoustic}')
print(f'Total non-acoustic tracks: {total_non_acoustic}')
print(f'Total energetic tracks: {total_energetic}')
print(f'Total non-energetic tracks: {total_non_energetic}')
print(f'Total instrumental tracks: {total_instrumental}')
print(f'Total non-instrumental tracks: {total_non_instrumental}')

# # Display the DataFrame with danceable tracks
# print(danceable_tracks_df)

BQBp3n5lhAlRtb5VeCTDC5y3_ONWLp8XsdC5WHrlwV-q942ZdzXZsl6F6PVjgIDmKLAQORioeOujcA3xwwYt3bsILp65gn8vUhxW2R_aMYS5yUhAgWyt
dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])
Playlist: Mastery
	I Got Summer On My Mind - Jay Dunham (Speechiness: 0.0631)
	I Got Summer On My Mind - Jay Dunham (Acousticness: 0.00148)
	I Got Summer On My Mind - Jay Dunham (Energy: 0.835)
	Serve a Boat - Shoreline Mafia (Instrumentalness: 0)
	Serve a Boat - Shoreline Mafia (Acousticness: 0.0312)
	Serve a Boat - Shoreline Mafia (Energy: 0.636)
	Fruit Punch - Jay Faded (Instrumentalness: 0.0218)
	Fruit Punch - Jay Faded (Speechiness: 0.0806)
	Skank N Flex (with Scrufizzer) - Wax Motif (Danceability: 0.969)
	Skank N Flex (with Scrufizzer) - Wax Motif (Instrumentalness: 0.00593)
	Skank N Flex (with Scrufizzer) - Wax Motif (Acousticness: 0.00416)
	Skank N Flex (with Scrufizzer) - Wax Motif (Energy: 0.87)
	Bang Bang - Rita Ora (Instrumentalness: 0.000263)
	Bang Bang - Rita Ora (Speechiness: 0.06
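
The collection cell requests the same track's audio features up to eleven times: once for the all-tracks dataframe and once per label dataframe. The Spotify Web API also offers a batch form of the audio-features endpoint that accepts a comma-separated `ids` parameter with up to 100 IDs per request. Below is a sketch of building those batch URLs. The helper names `chunked` and `audio_features_urls` are hypothetical, and the IDs are placeholders.

```python
def chunked(ids, size=100):
    """Split a list of track IDs into consecutive chunks of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def audio_features_urls(track_ids, size=100):
    """Build one batch audio-features URL per chunk of track IDs."""
    base = 'https://api.spotify.com/v1/audio-features'
    return [f"{base}?ids={','.join(chunk)}" for chunk in chunked(track_ids, size)]

example_ids = [f'id{n}' for n in range(250)]  # placeholder IDs, not real tracks
urls = audio_features_urls(example_ids)
```

Each URL can be requested with the same `headers` as above. The batch response wraps the per-track records in an `audio_features` list, so one request replaces up to 100 single-track calls.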

Now convert all the track dataframes to csv files:

In [None]:
def save_dataframes_to_csv(dataframes):
    for df_name, df in dataframes.items():
        filename = df_name.replace(" ", "_").lower() + "_sample.csv"
        df.to_csv(filename, index=False)
        print(f"Saved {df_name} to {filename}")

save_dataframes_to_csv(dataframes)


#### Sanitize the dataframes (No longer needed)

Check whether audio for each song in each dataframe is retrievable by searching for it in the iTunes store.
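
Titles such as `Drip Too Hard (Lil Baby & Gunna)` contain characters that are meaningful in a query string, so the search term needs percent-encoding before it goes into the URL. Below is a minimal sketch using only the standard library. The helper names `itunes_search_url` and `has_match` are hypothetical; the query parameters and the `resultCount` field are the ones used in the cell that follows.

```python
from urllib.parse import urlencode

def itunes_search_url(song_title, artist_name):
    """Build an iTunes Search API URL with the query safely percent-encoded."""
    params = {
        'term': f'{song_title} {artist_name}',
        'media': 'music',
        'entity': 'song',
        'limit': 1,
    }
    return 'https://itunes.apple.com/search?' + urlencode(params)

def has_match(results):
    """True when the decoded search response reports at least one result."""
    return results.get('resultCount', 0) > 0

url = itunes_search_url('Drip Too Hard (Lil Baby & Gunna)', 'Lil Baby')
```

A track would then be kept only when `has_match(response.json())` is true. Passing the same dictionary to `requests.get(..., params=...)` has the same encoding effect.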

In [None]:
from requests.exceptions import RequestException

dataframes = {
    'Danceable Tracks': danceable_tracks_df,
    'Non-danceable Tracks': non_danceable_tracks_df,
    'Instrumental Tracks': instrumental_tracks_df,
    'Non-instrumental Tracks': non_instrumental_tracks_df,
    'Speechy Tracks': speechy_tracks_df,
    'Non-speechy Tracks': non_speechy_tracks_df,
    'Acoustic Tracks': acoustic_tracks_df,
    'Non-acoustic Tracks': non_acoustic_tracks_df,
    'Energetic Tracks': energetic_tracks_df,
    'Non-energetic Tracks': non_energetic_tracks_df,
    'All Tracks': all_tracks_df
}

for df_name, df in dataframes.items():
    print(f"{df_name}: {df.shape[0]} rows")

# Search for the songs in the iTunes store
for index, track in all_tracks_df.iterrows():
    song_title = track['track_name']
    artist_name = track['track_artist']

    # Pass the query via params so requests percent-encodes titles containing '&' or '+'
    search_url = "https://itunes.apple.com/search"
    search_params = {'term': f"{song_title} {artist_name}", 'media': 'music', 'entity': 'song', 'limit': 1}

    try:
        response = requests.get(search_url, params=search_params)
        response.raise_for_status()
        results = response.json()

        # Drop the track when the search returns no results
        if results['resultCount'] == 0:
            all_tracks_df.drop(index, inplace=True)
    except (RequestException, ValueError):
        print(f"Error occurred while processing {song_title} by {artist_name}. Skipping...")
        all_tracks_df.drop(index, inplace=True)
                   
# do this for all dataframes
for index, track in danceable_tracks_df.iterrows():
    song_title = track['track_name']
    artist_name = track['track_artist']

    # Pass the query via params so requests percent-encodes titles containing '&' or '+'
    search_url = "https://itunes.apple.com/search"
    search_params = {'term': f"{song_title} {artist_name}", 'media': 'music', 'entity': 'song', 'limit': 1}

    try:
        response = requests.get(search_url, params=search_params)
        response.raise_for_status()
        results = response.json()

        # Drop the track from this dataframe when the search returns no results
        if results['resultCount'] == 0:
            danceable_tracks_df.drop(index, inplace=True)
    except (RequestException, ValueError):
        print(f"Error occurred while processing {song_title} by {artist_name}. Skipping...")
        danceable_tracks_df.drop(index, inplace=True)
            
for index, track in non_danceable_tracks_df.iterrows():
    song_title = track['track_name']
    artist_name = track['track_artist']

    # Pass the query via params so requests percent-encodes titles containing '&' or '+'
    search_url = "https://itunes.apple.com/search"
    search_params = {'term': f"{song_title} {artist_name}", 'media': 'music', 'entity': 'song', 'limit': 1}

    try:
        response = requests.get(search_url, params=search_params)
        response.raise_for_status()
        results = response.json()

        # Drop the track from this dataframe when the search returns no results
        if results['resultCount'] == 0:
            non_danceable_tracks_df.drop(index, inplace=True)
    except (RequestException, ValueError):
        print(f"Error occurred while processing {song_title} by {artist_name}. Skipping...")
        non_danceable_tracks_df.drop(index, inplace=True)
            
for index, track in speechy_tracks_df.iterrows():
    song_title = track['track_name']
    artist_name = track['track_artist']

    # Pass the query via params so requests percent-encodes titles containing '&' or '+'
    search_url = "https://itunes.apple.com/search"
    search_params = {'term': f"{song_title} {artist_name}", 'media': 'music', 'entity': 'song', 'limit': 1}

    try:
        response = requests.get(search_url, params=search_params)
        response.raise_for_status()
        results = response.json()

        # Drop the track from this dataframe when the search returns no results
        if results['resultCount'] == 0:
            speechy_tracks_df.drop(index, inplace=True)
    except (RequestException, ValueError):
        print(f"Error occurred while processing {song_title} by {artist_name}. Skipping...")
        speechy_tracks_df.drop(index, inplace=True)
            
for index, track in non_speechy_tracks_df.iterrows():
    if index:
        # song_title = "Imagine"
        # artist_name = "John Lennon"
        song_title = track['track_name']
        artist_name = track['track_artist']
        track_id = track['track_id']
        
        search_url = f"https://itunes.apple.com/search?term={song_title}+{artist_name}&media=music&entity=song&limit=1"


        try:
            response = requests.get(search_url)
            response.raise_for_status()
            results = response.json()

            # Check if the response is valid
            if response.status_code == 200:
                # Drop the track if iTunes returned no results
                if results['resultCount'] == 0:
                    non_speechy_tracks_df.drop(index, inplace=True)
        except (RequestException, ValueError):
            print(f"Error occurred while processing {song_title} by {artist_name}. Skipping...")
            non_speechy_tracks_df.drop(index, inplace=True)
                
for index, track in acoustic_tracks_df.iterrows():
    if index:
        # song_title = "Imagine"
        # artist_name = "John Lennon"
        song_title = track['track_name']
        artist_name = track['track_artist']
        track_id = track['track_id']
        
        search_url = f"https://itunes.apple.com/search?term={song_title}+{artist_name}&media=music&entity=song&limit=1"

        try:
            response = requests.get(search_url)
            response.raise_for_status()
            results = response.json()

            # Check if the response is valid
            if response.status_code == 200:
                # Drop the track if iTunes returned no results
                if results['resultCount'] == 0:
                    acoustic_tracks_df.drop(index, inplace=True)
        except (RequestException, ValueError):
            print(f"Error occurred while processing {song_title} by {artist_name}. Skipping...")
            acoustic_tracks_df.drop(index, inplace=True)
            
for index, track in non_acoustic_tracks_df.iterrows():
    if index:
        # song_title = "Imagine"
        # artist_name = "John Lennon"
        song_title = track['track_name']
        artist_name = track['track_artist']
        track_id = track['track_id']
        
        search_url = f"https://itunes.apple.com/search?term={song_title}+{artist_name}&media=music&entity=song&limit=1"

        try:
            response = requests.get(search_url)
            response.raise_for_status()
            results = response.json()

            # Check if the response is valid
            if response.status_code == 200:
                # Drop the track if iTunes returned no results
                if results['resultCount'] == 0:
                    non_acoustic_tracks_df.drop(index, inplace=True)
        except (RequestException, ValueError):
            print(f"Error occurred while processing {song_title} by {artist_name}. Skipping...")
            non_acoustic_tracks_df.drop(index, inplace=True)
            
for index, track in instrumental_tracks_df.iterrows():
    if index:
        # song_title = "Imagine"
        # artist_name = "John Lennon"
        song_title = track['track_name']
        artist_name = track['track_artist']
        track_id = track['track_id']
        
        search_url = f"https://itunes.apple.com/search?term={song_title}+{artist_name}&media=music&entity=song&limit=1"

        try:
            response = requests.get(search_url)
            response.raise_for_status()
            results = response.json()

            # Check if the response is valid
            if response.status_code == 200:
                # Drop the track if iTunes returned no results
                if results['resultCount'] == 0:
                    instrumental_tracks_df.drop(index, inplace=True)
        except (RequestException, ValueError):
            print(f"Error occurred while processing {song_title} by {artist_name}. Skipping...")
            instrumental_tracks_df.drop(index, inplace=True)
            
for index, track in non_instrumental_tracks_df.iterrows():
    if index:
        # song_title = "Imagine"
        # artist_name = "John Lennon"
        song_title = track['track_name']
        artist_name = track['track_artist']
        track_id = track['track_id']
        
        search_url = f"https://itunes.apple.com/search?term={song_title}+{artist_name}&media=music&entity=song&limit=1"

        try:
            response = requests.get(search_url)
            response.raise_for_status()
            results = response.json()

            # Check if the response is valid
            if response.status_code == 200:
                # Drop the track if iTunes returned no results
                if results['resultCount'] == 0:
                    non_instrumental_tracks_df.drop(index, inplace=True)
        except (RequestException, ValueError):
            print(f"Error occurred while processing {song_title} by {artist_name}. Skipping...")
            non_instrumental_tracks_df.drop(index, inplace=True)
            

for df_name, df in dataframes.items():
    print(f"{df_name}: {df.shape[0]} rows")


We are getting errors from the iTunes store when we send more than 20 requests per minute. Let's try adding a counter and a timer. iTunes also seems to perform poorly for this kind of lookup, so we will acquire the clips through the Spotify API using spotipy instead.
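
If the iTunes lookups are kept, the counter-and-timer idea could be sketched roughly as follows. `RateLimiter` and `build_itunes_search_url` are hypothetical helpers that do not appear elsewhere in this notebook, and the defaults of 20 calls per 60 seconds simply mirror the limit described above rather than a documented value. The limiter is a plain fixed-window counter: after every `max_calls` calls it pauses until `period` seconds have passed since the first call of that batch. The URL builder uses `urlencode` so that titles containing spaces or `&` do not break the query string.

```python
import time
from urllib.parse import urlencode


def build_itunes_search_url(song_title, artist_name):
    """Build the iTunes search URL with title and artist percent-encoded."""
    params = {
        "term": f"{song_title} {artist_name}",
        "media": "music",
        "entity": "song",
        "limit": 1,
    }
    return "https://itunes.apple.com/search?" + urlencode(params)


class RateLimiter:
    """Counter plus timer: pause after every max_calls calls (hypothetical helper)."""

    def __init__(self, max_calls=20, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.counter = 0
        self.window_start = clock()

    def wait(self):
        """Call right before each request; blocks once the current batch is full."""
        if self.counter >= self.max_calls:
            elapsed = self.clock() - self.window_start
            if elapsed < self.period:
                self.sleep(self.period - elapsed)
            # Start a new batch
            self.window_start = self.clock()
            self.counter = 0
        self.counter += 1
```

In the loops above, this would mean creating one `limiter = RateLimiter()` up front and calling `limiter.wait()` immediately before each `requests.get(...)`.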

In [None]:
def get_preview_url(song_title, artist_name):
    query = f"{song_title} artist:{artist_name}"
    results = sp.search(query, type="track", limit=1)

    if results["tracks"]["total"] > 0:
        track = results["tracks"]["items"][0]
        return track["preview_url"]
    else:
        return None

In [None]:
import time
from requests.exceptions import RequestException

dataframes = {
    'Danceable Tracks': danceable_tracks_df,
    'Non-danceable Tracks': non_danceable_tracks_df,
    'Instrumental Tracks': instrumental_tracks_df,
    'Non-instrumental Tracks': non_instrumental_tracks_df,
    'Speechy Tracks': speechy_tracks_df,
    'Non-speechy Tracks': non_speechy_tracks_df,
    'Acoustic Tracks': acoustic_tracks_df,
    'Non-acoustic Tracks': non_acoustic_tracks_df,
    'Energetic Tracks': energetic_tracks_df,
    'Non-energetic Tracks': non_energetic_tracks_df,
    'All Tracks': all_tracks_df
}

# Function to handle requests and rate limiting
def process_track_dataframe(df):
    counter = 0
    start_time = time.time()
    
    for index, track in df.iterrows():
        if counter == 20:
            elapsed_time = time.time() - start_time
            if elapsed_time < 30:
                time.sleep(30 - elapsed_time)
            start_time = time.time()
            counter = 0

        song_title = track['track_name']
        artist_name = track['track_artist']
        
        try:
            # Look the track up on Spotify; get_preview_url returns None
            # when the best match has no preview clip
            preview_url = get_preview_url(song_title, artist_name)
            if preview_url is None:
                df.drop(index, inplace=True)
        except (RequestException, spotipy.SpotifyException):
            print(f"Error occurred while processing {song_title} by {artist_name}. Skipping...")
            df.drop(index, inplace=True)
        # Count every lookup, successful or not, towards the rate limit
        counter += 1

# Process all DataFrames
for df_name, df in dataframes.items():
    print(f"Processing {df_name}...")
    process_track_dataframe(df)
    print(f"{df_name}: {df.shape[0]} rows")


In [None]:
import requests
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def get_preview_url(song_title, artist_name):
    query = f"{song_title} artist:{artist_name}"
    results = sp.search(query, type="track", limit=1)

    if results["tracks"]["total"] > 0:
        track = results["tracks"]["items"][0]
        return track["preview_url"]
    else:
        return None

song_title = "Imagine"
artist_name = "John Lennon"
preview_url = get_preview_url(song_title, artist_name)

if preview_url:
    print(f"Preview URL for {song_title} by {artist_name}: {preview_url}")

    # Download the preview clip; only attempted when a URL was found
    response = requests.get(preview_url)
    response.raise_for_status()
    with open("preview.mp3", "wb") as f:
        f.write(response.content)

    # Load the audio file and compute the spectrogram
    y, sr = librosa.load("preview.mp3")
    S = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

    # Plot the spectrogram
    plt.figure(figsize=(10, 4))
    librosa.display.specshow(S, sr=sr, x_axis='time', y_axis='log')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Spectrogram')
    plt.tight_layout()
    plt.show()
else:
    print(f"No preview available for {song_title} by {artist_name}")