# Practice: Spotify's API

This notebook contains a set of exercises that will guide you through the different steps of this practice exercise. Devise solutions that are code-based, i.e. not hard-coded or manually computed. 

<div class="alert alert-success">The aim of this exercise is to create and save a dataset containing information about every song in a given playlist by requesting data from Spotify's API.</div>

## Getting client credentials

Spotify's API uses an Authentication scheme called OAuth. Hence, before starting to make requests, you need to get your client credentials to the Spotify API. 

To do so, you need to have a Spotify account (free or paid). If you don't have one yet, please create a free account before moving on. Once you do, head over to Spotify for Developers, open your [Dashboard](https://developer.spotify.com/dashboard/) and log in with your account. 

Click on “CREATE APP”. Fill in the name and description for your project. Under "Redirect URI", enter "http://localhost:8888/callback" (don't worry about this, as it won't be used). Finally, in the tick boxes, select "Web API". 

Once your App has been created, you should see the app name in the Dashboard. Click on it, then on "Settings", and then on "View Client Secret". The values shown for “Client ID” and “Client Secret” correspond to your client credentials.

<div class="alert alert-info">Create two new variables, <b>client_id</b> and <b>client_secret</b>, that store your ID and Key, respectively</div>

In [10]:
client_id = "422d134b6fa3479a867c23b9dcdc995a"
client_secret = "ee08e1fe67884eeba72f65ef60e0b6a7"



These are the keys that you will use throughout the remainder of the assignment.



Great! We are good to go. Next step is getting an access token.

## Getting an access token / key

In order to access the various endpoints of the Spotify API, we need to pass an access token. 

To get one, we need to send a ```POST``` request with our client credentials. This request will create a token resource on the server, and the server will respond with it. We can build this ```POST``` request using the ```requests``` library. In other words, the first call to the Spotify API is the one that gets the access token. 

<div class="alert alert-info">Run the following cell to build your POST request</div>

In [11]:
pip install requests

Note: you may need to restart the kernel to use updated packages.


In [12]:
import requests

# URL for token resource
auth_url = 'https://accounts.spotify.com/api/token'

# request body
params = {'grant_type': 'client_credentials',
          'client_id': client_id,
          'client_secret': client_secret}

# POST the request
response =  requests.post(auth_url, data=params)
response.json()

{'access_token': 'BQAkSqOkzor-YyRgphfWpSJZ9yuhsH1J1aGT7Tw5NPWqBdn9hHJWGSEDWz1NXVp3IbrnBjtvwg-cmq1fyCz6s8a3iGEvlbP53h4CyRON2V4DWXwZb8M',
 'token_type': 'Bearer',
 'expires_in': 3600}

Take a look at the response you obtained. The access_token you just retrieved will expire after one hour.
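
Because the token is short-lived, it can help to record when it will stop being valid. The following is a minimal sketch rather than part of the assignment: the helper names `token_expiry` and `is_expired` are hypothetical, the example payload is a shortened stand-in, and the only thing assumed about the response is the `expires_in` field shown above.

```python
import time

def token_expiry(auth_payload, issued_at):
    """Return the Unix timestamp at which the token stops being valid."""
    return issued_at + auth_payload["expires_in"]

def is_expired(expires_at, now=None, margin=60):
    """True if the token has expired or will expire within `margin` seconds."""
    if now is None:
        now = time.time()
    return now >= expires_at - margin

# Stand-in payload with the same shape as the response above (token shortened).
example_payload = {"access_token": "BQ...", "token_type": "Bearer", "expires_in": 3600}
expires_at = token_expiry(example_payload, issued_at=1_000_000)
print(expires_at)                             # 1003600
print(is_expired(expires_at, now=1_000_100))  # False
print(is_expired(expires_at, now=1_003_590))  # True
```

In a notebook, `issued_at` could be `time.time()` taken right after the POST request, and a new token could be requested whenever `is_expired` returns `True`.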

<div class="alert alert-info">Retrieve your token from <b>auth_response</b> and save it in a new variable called <b>access_token</b>. </div>

<div class="alert alert-warning">Make sure you define the <i>access_token</i> variable such that it will be updated each time your code is run from scratch, i.e. make sure it hasn't expired by the time your code is graded.</div>

In [13]:
auth_response = response.json()
access_token = auth_response['access_token']

print("Access Token:", access_token)

Access Token: BQAkSqOkzor-YyRgphfWpSJZ9yuhsH1J1aGT7Tw5NPWqBdn9hHJWGSEDWz1NXVp3IbrnBjtvwg-cmq1fyCz6s8a3iGEvlbP53h4CyRON2V4DWXwZb8M


As above, the access token will be used throughout the remainder of the assignment.



This token is your golden ticket to access Spotify's API. A copy of this string is now stored on the server, so that every time you make a request to the API the server will check that the token you provide matches the one it has in store.

<img src="https://www.dropbox.com/s/hgb02k4h1mtdv22/header.png?raw=1" width="500">

Similar to the Cohere API, Spotify's API expects you to include your access token in the request header. Headers are not used elsewhere in this course; the only exception is APIs like this one, which require the authentication details to be sent in the header. For this reason, the header has already been formatted for you. 

<div class="alert alert-info">Run the following cell to save the header in a new variable so that you can use it later on.</div>

In [14]:
headers = {'Authorization': 'Bearer {token}'.format(token=access_token)}

## Poking around

Spotify's API provides numerous endpoints to access things like album listings, artist information, playlists, even Spotify-generated audio analysis of individual tracks, which includes their time signature or measurements such as their “danceability” or "loudness". You can take a look at all the information available by reading the [Docs](https://developer.spotify.com/documentation/web-api/reference/). In this assignment you will use several of these endpoints.

In order to get a feel for how the API works, we will begin by making a ```GET``` request to the ```audio-features``` endpoint to extract data for a specific track. In particular, let's retrieve all the information for Radiohead's song *Creep*. 

The first thing you need is to identify the appropriate URL or path to direct your request to. The URLs for all Spotify API endpoints follow the same structure: each one is the concatenation ```base_url + endpoint```, where ```base_url``` is the base URL for the API. Sometimes, you will also need to provide some additional information as part of the URL. In the case of ```audio-features```, however, the ```base_url``` and the ```endpoint``` name are enough.
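
As an illustration of this URL structure, here is a small sketch that joins a base URL and any number of path segments. The helper name `build_url` and the segment `some_playlist_id` are hypothetical; only the base URL comes from this notebook.

```python
def build_url(base, *parts):
    """Join a base URL and path segments with single slashes."""
    segments = [base.rstrip("/")] + [str(part).strip("/") for part in parts]
    return "/".join(segments)

example_base = "https://api.spotify.com/v1"
print(build_url(example_base, "audio-features"))
# https://api.spotify.com/v1/audio-features
print(build_url(example_base, "playlists", "some_playlist_id", "tracks"))
# https://api.spotify.com/v1/playlists/some_playlist_id/tracks
```

Stripping slashes on both sides avoids doubled or missing separators when a segment is copied with a leading `/`.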

The ```base_url``` is defined below:

In [15]:
base_url = 'https://api.spotify.com/v1'

<div class="alert alert-info">Define the url for the audio-features endpoint by following the instructions above. Store it in a variable called <b>audio_features_endpoint</b>.</div>

In [16]:
audio_features_endpoint = f"{base_url}/audio-features"

print(audio_features_endpoint)

https://api.spotify.com/v1/audio-features


The next thing we need is to fill in the request parameters. If you check the documentation you'll see that the ```audio-features``` endpoint takes the following query parameters.

<img src="https://www.dropbox.com/s/s4zs6wlue0u16cu/body.png?raw=1" width="500">

Hence, the final thing you need to extract data about Radiohead's Creep song is to locate its ```id```. This is its unique identifier. Spotify has unique ids for tracks, for artists, for albums, for playlists, etc.

![Creep](https://www.dropbox.com/s/kufj6ww2yn069gb/creep.png?raw=1)

You can get the ```id``` for any song by going to Spotify, looking for the song, clicking the “…” by the song name, then “Share” and then “Copy Spotify URI”. 

<div class="alert alert-warning">Note that this procedure also works for retrieving ids for artists, albums or any other resource type. Alternatively, you can also retrieve the id for a song or a playlist by clicking "..." by the name, then "Share" and then "Copy Link". This will return a full link to the resource. The id corresponds to the string right after the last slash sign / and before the ? sign</div>

This URI should be a string that includes something like **spotify:track:**, followed by an alphanumeric sequence. This sequence is the ID you are looking for.
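
Both ways of locating the id can be expressed in a few lines. This is a sketch under the assumptions described above (a URI of the form `spotify:<type>:<id>`, or a link whose id sits between the last slash and the `?`); the function name `extract_spotify_id` and the query string `?si=abc123` are made up for illustration.

```python
def extract_spotify_id(reference):
    """Return the id from a Spotify URI or from a share link."""
    if reference.startswith("spotify:"):
        # URI form: the id is the last colon-separated field
        return reference.split(":")[-1]
    # Link form: keep what follows the last slash, then drop any query string
    return reference.rsplit("/", 1)[-1].split("?", 1)[0]

print(extract_spotify_id("spotify:track:70LcF31zb1H0PyJoS1Sx1r"))
# 70LcF31zb1H0PyJoS1Sx1r
print(extract_spotify_id("https://open.spotify.com/track/70LcF31zb1H0PyJoS1Sx1r?si=abc123"))
# 70LcF31zb1H0PyJoS1Sx1r
```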

<div class="alert alert-info">Create a new variable called <b>track_id</b> that stores the ID for Radiohead's song Creep.</div>

In [17]:
track_id = '70LcF31zb1H0PyJoS1Sx1r'

print(f"Track ID: {track_id}")

Track ID: 70LcF31zb1H0PyJoS1Sx1r


Now that you have the id, let's format the URL of the request and send the actual GET request to retrieve the data:

In [18]:
# Reuse the endpoint, track id, and headers defined in the cells above,
# so that base_url keeps pointing at the API root
request_url = f"{audio_features_endpoint}/{track_id}"
response = requests.get(request_url, headers=headers)
print(response.json())

{'danceability': 0.515, 'energy': 0.43, 'key': 7, 'loudness': -9.935, 'mode': 1, 'speechiness': 0.0372, 'acousticness': 0.0097, 'instrumentalness': 0.000133, 'liveness': 0.129, 'valence': 0.104, 'tempo': 91.844, 'type': 'audio_features', 'id': '70LcF31zb1H0PyJoS1Sx1r', 'uri': 'spotify:track:70LcF31zb1H0PyJoS1Sx1r', 'track_href': 'https://api.spotify.com/v1/tracks/70LcF31zb1H0PyJoS1Sx1r', 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/70LcF31zb1H0PyJoS1Sx1r', 'duration_ms': 238640, 'time_signature': 4}


Now that everything is ready, you can send the actual GET request to retrieve the data.

<div class="alert alert-info">Remember to pass the <i>url</i>, the <i>headers</i> and the <i>params</i> dictionary as arguments to the <i>get</i> function. Convert the response to <i>json</i> format and store it in a new variable called <b>creep</b>.</div>

In [19]:
# Send the GET request to the audio-features endpoint for this track
# and keep the parsed JSON in `creep`
request_url = f"{audio_features_endpoint}/{track_id}"
response = requests.get(request_url, headers=headers)
creep = response.json()

print(creep)

{'danceability': 0.515, 'energy': 0.43, 'key': 7, 'loudness': -9.935, 'mode': 1, 'speechiness': 0.0372, 'acousticness': 0.0097, 'instrumentalness': 0.000133, 'liveness': 0.129, 'valence': 0.104, 'tempo': 91.844, 'type': 'audio_features', 'id': '70LcF31zb1H0PyJoS1Sx1r', 'uri': 'spotify:track:70LcF31zb1H0PyJoS1Sx1r', 'track_href': 'https://api.spotify.com/v1/tracks/70LcF31zb1H0PyJoS1Sx1r', 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/70LcF31zb1H0PyJoS1Sx1r', 'duration_ms': 238640, 'time_signature': 4}



<div class="alert alert-warning">Often Spotify has several instances of a track in its catalogue, each available in a different set of markets. This commonly happens when the album the track is on has been released multiple times under different licenses in different markets. These tracks are linked together so that when a user tries to play a track that isn’t available in their own market, the Spotify mobile, desktop, and web players try to play another instance of the track that is available in the user’s market. Even if this issue won't prevent you from being able to play the song on your device, it might result in different ids for the different markets. Note that the test below is written taking the values for the Spanish market into account.</div>

<div class="alert alert-warning">Note that you can specify the market for this and for future queries through the <i>market</i> parameter.</div>
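
To make the effect of the <i>market</i> parameter concrete, the sketch below shows the URL that results from adding it as a query parameter. The helper `with_query` is hypothetical and uses only the standard library; the track URL is the `track_href` from the response above, and `ES` is assumed to be the code for the Spanish market.

```python
from urllib.parse import urlencode

def with_query(url, **query):
    """Append query parameters to a URL, skipping any whose value is None."""
    filtered = {key: value for key, value in query.items() if value is not None}
    return f"{url}?{urlencode(filtered)}" if filtered else url

track_url = "https://api.spotify.com/v1/tracks/70LcF31zb1H0PyJoS1Sx1r"
print(with_query(track_url, market="ES"))
# https://api.spotify.com/v1/tracks/70LcF31zb1H0PyJoS1Sx1r?market=ES
print(with_query(track_url, market=None))
# https://api.spotify.com/v1/tracks/70LcF31zb1H0PyJoS1Sx1r
```

With the `requests` library the same query string is produced by passing `params={'market': 'ES'}` to `requests.get`, so the URL does not need to be assembled by hand.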

Congrats! You just made your first successful request to Spotify's API! Feel free to take a look at the information included in the response. Pay special attention to the way in which information is presented. Once you are done, let's move on to some actual work!

## Getting data from a playlist

In the following exercise you will build a dataset containing data about different songs. You can either use a playlist of your own, or use the one mentioned in the documentation.


<div class="alert alert-info">Create a variable called <b>playlist_id</b> that stores the id of your playlist of choice.</div>

<div class="alert alert-warning">Your <i>playlist_id</i> should contain only alphanumeric characters. This means that characters such as ? $ % & / ! should not be included.</div>

In [20]:
playlist_id = "3MOTj7GejiZWcRj0tubuIz"

In the following you are going to extract information about the different tracks included in this playlist. Hence, let's make sure that the variable is properly created and that it refers to an actual id.

The next step will be making a request to get full details of the tracks included in your chosen playlist. Remember that you can take a look at all the information available at the different endpoints in Spotify's API by reading the [Docs](https://developer.spotify.com/documentation/web-api/reference/). Locate the right endpoint for your query and read the Docs to find out how to build your request. 

<div class="alert alert-info"><b>Exercise 1 </b>Write the code to retrieve all the items from your chosen playlist from the <i>tracks</i> endpoint. When making your request don't use any of the optional arguments. Direct your request to the right endpoint instead. Store the <i>raw</i> response in a new variable called <i>playlist_response</i>.
</div>

<div class="alert alert-warning">To complete this exercise you must use the requests library</div>

### Explanation for Exercise 1

#### Step 1: Import the necessary library
First, I imported the requests library, which is necessary for making HTTP requests to external APIs like Spotify’s.

#### Step 2: Define playlist ID and build request URL
I set the playlist_id to specify the playlist I want to retrieve, and used the Spotify API’s base URL (https://api.spotify.com/v1/playlists). Then, I combined the base URL, playlist ID, and /tracks endpoint to form the full request URL.

#### Step 3: Set up the authorization header
Since the API request requires authentication, I retrieved an access token from a previous authorization step and added it to the headers using the Bearer authentication method. This allows Spotify to verify the request is from an authorized user.

#### Step 4: Send GET request and retrieve data
I used requests.get() to send a GET request to the Spotify API, passing the request URL and headers. The API returns a JSON response, which I stored in the playlist_response variable and printed to display the raw data of the playlist's tracks.

In [21]:
import requests

playlist_id = "3MOTj7GejiZWcRj0tubuIz"
base_url = "https://api.spotify.com/v1/playlists"
request_url = f"{base_url}/{playlist_id}/tracks"
access_token = auth_response['access_token']
headers = {
    "Authorization": f"Bearer {access_token}"
}
playlist_response = requests.get(request_url, headers=headers)
print(playlist_response.json())

{'href': 'https://api.spotify.com/v1/playlists/3MOTj7GejiZWcRj0tubuIz/tracks?offset=0&limit=100', 'items': [{'added_at': '2020-03-23T14:35:16Z', 'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/hanhen'}, 'href': 'https://api.spotify.com/v1/users/hanhen', 'id': 'hanhen', 'type': 'user', 'uri': 'spotify:user:hanhen'}, 'is_local': False, 'primary_color': None, 'track': {'preview_url': 'https://p.scdn.co/mp3-preview/88c2919932a103ca121ae37f04d086462ee26c22?cid=422d134b6fa3479a867c23b9dcdc995a', 'available_markets': ['AR', 'AU', 'BE', 'BO', 'BR', 'BG', 'CA', 'CL', 'CO', 'CR', 'CY', 'CZ', 'DK', 'DO', 'EC', 'EE', 'SV', 'FI', 'FR', 'GR', 'GT', 'HN', 'HK', 'HU', 'IS', 'IT', 'LV', 'LT', 'LU', 'MY', 'MT', 'MX', 'NL', 'NZ', 'NI', 'NO', 'PA', 'PY', 'PE', 'PH', 'PL', 'PT', 'SG', 'SK', 'ES', 'SE', 'CH', 'TW', 'TR', 'UY', 'AD', 'LI', 'MC', 'ID', 'TH', 'VN', 'RO', 'IL', 'ZA', 'SA', 'AE', 'BH', 'QA', 'OM', 'KW', 'EG', 'MA', 'DZ', 'TN', 'LB', 'JO', 'PS', 'IN', 'BY', 'KZ', 'MD', 'U

<div class="alert alert-info"><b>Exercise 2 </b>Convert the response to JSON format and store the result in a new variable called <i>playlist</i>.</div>

### Explanation for Exercise 2

#### Step 1: Convert response to JSON format
The first step is to convert the response obtained in the previous exercise (playlist_response) into a JSON format. This is done by calling the .json() method on the playlist_response object. This ensures that the data can be handled as a structured JSON format for further analysis.

#### Step 2: Store the JSON response in a variable
Next, I assigned the JSON-formatted response to a new variable called playlist. This variable will now store the parsed JSON data of the playlist tracks.

#### Step 3: Print the result
Finally, I used the print() function to output the content of the playlist variable, allowing us to visually inspect the JSON data. This confirms the successful conversion of the API response into a JSON format.

In [22]:
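# Call .json() on the response object itself; wrapping the call in quotes
# would store the literal text instead of the parsed data
playlist = playlist_response.json()
print(playlist)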


The following cells contain the tests that will grade your code. **Don't modify them**. Simply leave them as they are.

Take your time to familiarize yourself with the data and how they are presented. Note that, by default, Spotify's API only returns information about a maximum of 100 tracks in a playlist. If your playlist of choice has more than 100 tracks, you'll retrieve the data only for the first 100 of them.
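
For longer playlists, the remaining tracks can be collected by requesting further pages. The sketch below is not required by the exercises. It assumes the response is a paging object whose `next` field holds the URL of the following page (or is empty on the last page), which should be checked against the Docs. The function name `collect_all_items` and the fake pages are hypothetical.

```python
def collect_all_items(first_page, fetch_page):
    """Gather the items of every page by following the `next` links.

    `fetch_page` is a caller-supplied function that takes a URL and
    returns the parsed JSON page as a dictionary.
    """
    items = list(first_page["items"])
    next_url = first_page.get("next")
    while next_url:
        page = fetch_page(next_url)
        items.extend(page["items"])
        next_url = page.get("next")
    return items

# Offline demonstration with two fake pages ("page-2" is a placeholder URL).
fake_pages = {"page-2": {"items": ["c", "d"], "next": None}}
first = {"items": ["a", "b"], "next": "page-2"}
print(collect_all_items(first, fake_pages.__getitem__))  # ['a', 'b', 'c', 'd']
```

Against the real API, `fetch_page` could be something like `lambda url: requests.get(url, headers=headers).json()`.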

<div class="alert alert-danger">Throughout the following exercises you may come across data that are missing. If so, encode these data using a <b>None</b> (NoneType), unless otherwise stated. </div>
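
One way to follow this convention is to read nested fields through a helper that returns `None` whenever a key or index is absent. This is only a sketch: `safe_get` is a hypothetical helper, and the example item is a hand-made dictionary rather than real API output.

```python
def safe_get(data, *keys):
    """Walk nested dicts and lists by key or index; return None if a step is missing."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return None
    return current

example_item = {"track": {"name": "Tears", "artists": [], "album": None}}
print(safe_get(example_item, "track", "name"))                # Tears
print(safe_get(example_item, "track", "artists", 0, "name"))  # None
print(safe_get(example_item, "track", "album", "name"))       # None
```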

## Retrieving basic track information

In what follows, you are going to retrieve specific data for each of the tracks contained in your playlist.

<div class="alert alert-info"><b>Exercise 3 </b>Write the code to extract the title (in string form), the name of the album (in string form), the name of the artist (in string form), the duration (in integer form), the track number (in integer form), the popularity (in integer form), and the id (in string form) for all the tracks included in your chosen playlist. Store these data in separate lists called <i>title</i>, <i>album</i>, <i>artist</i>, <i>duration</i>, <i>track_number</i>, <i>track_popularity</i>, and <i>track_id</i>, respectively. In those cases where more than one value is available for these items, retain only the first. In all cases, entries should appear in the same order as they are presented in the playlist.<br></div>

<div class="alert alert-warning">Make sure you correctly set all variable names. They have to be written <b>exactly</b> as given in the instructions.</div>

### Explanation for Exercise 3

#### Step 1: Set up the API call and define empty lists
I began by defining the necessary components to make the API call, including the playlist ID, base URL, and authorization headers. After making the request and parsing the JSON response, I created empty lists to store specific data such as title, album, artist, duration, track_number, track_popularity, and track_id.

#### Step 2: Loop through playlist items and extract data
I then looped through each item in the playlist's JSON response. For each track, I extracted relevant data like the track’s name, album, artist, duration, track number, popularity, and ID. Since some fields, like the artist, may have multiple entries, I used the first value ([0]) when needed to ensure consistent extraction.

#### Step 3: Store and print the extracted data
The extracted data was appended to the respective lists, ensuring that each track’s data was stored in the same order as they appeared in the playlist. Lastly, I printed each list to verify that the data had been correctly retrieved and stored in the desired format.

In [23]:
import requests

playlist_id = "3MOTj7GejiZWcRj0tubuIz"
base_url = "https://api.spotify.com/v1"
access_token = auth_response['access_token']

headers = {
    "Authorization": f"Bearer {access_token}"
}

playlist_tracks_url = f"{base_url}/playlists/{playlist_id}/tracks"
playlist_response = requests.get(playlist_tracks_url, headers=headers)
playlist = playlist_response.json()

title = []
album = []
artist = []
duration = []
track_number = []
track_popularity = []
track_id = []

for item in playlist['items']:
    track = item['track']
    
    title.append(track['name'])
    album.append(track['album']['name'])
    artist.append(track['artists'][0]['name'])  
    duration.append(track['duration_ms'])
    track_number.append(track['track_number'])
    track_popularity.append(track['popularity'])
    track_id.append(track['id'])

print(f"Titles: {title}")
print(f"Albums: {album}")
print(f"Artists: {artist}")
print(f"Durations (ms): {duration}")
print(f"Track Numbers: {track_number}")
print(f"Track Popularity: {track_popularity}")
print(f"Track IDs: {track_id}")

Titles: ['I Am The Resurrection', 'Love Spreads', 'Breaking Into Heaven', 'Fools Gold - Remastered 2009', 'I Wanna Be Adored', 'Tears', 'Driving South', 'Waterfall - Remastered 2009', 'Sally Cinnamon - Single Mix', 'Made of Stone', 'Daybreak', 'Ten Storey Love Song', 'This Is the One']
Albums: ['Stone Roses', 'Second Coming', 'Second Coming', 'The Stone Roses', 'Stone Roses', 'Second Coming', 'Second Coming', 'The Stone Roses', 'Sally Cinnamon', 'Stone Roses', 'Second Coming', 'Second Coming', 'Stone Roses']
Artists: ['The Stone Roses', 'The Stone Roses', 'The Stone Roses', 'The Stone Roses', 'The Stone Roses', 'The Stone Roses', 'The Stone Roses', 'The Stone Roses', 'The Stone Roses', 'The Stone Roses', 'The Stone Roses', 'The Stone Roses', 'The Stone Roses']
Durations (ms): [494133, 347093, 679933, 594453, 292040, 410400, 309866, 278826, 173000, 254493, 393640, 269866, 299106]
Track Numbers: [11, 12, 1, 12, 1, 10, 2, 3, 1, 8, 4, 3, 10]
Track Popularity: [32, 55, 38, 56, 35, 34, 41, 5

<div class="alert alert-info"><b>Exercise 4 </b>Write the code to extract the number of available markets (in integer form) for all the tracks included in your chosen playlist. Store these data in a separate list called <i>n_available_markets</i>. As above, entries in this list should appear in the same order as they are presented in the playlist.</div>

### Explanation for Exercise 4

#### Step 1: Check the API response status and initialize an empty list
I started by verifying that the API request was successful by checking if the status_code is 200. Once confirmed, I parsed the response into JSON format and initialized an empty list called n_available_markets, which will store the number of available markets for each track.

#### Step 2: Loop through the playlist items and extract available markets
I looped through the playlist's items, and for each track, I calculated the number of available markets using the len() function on the available_markets field. This gives the total number of markets where the track is available.

#### Step 3: Append and print the data
For each track, I appended the count of available markets to the n_available_markets list. Finally, I printed the list to confirm that it captured the number of available markets for all tracks in the same order as they appeared in the playlist. If the API call failed, an error message with the status code was printed.

In [24]:
if playlist_response.status_code == 200:
    playlist = playlist_response.json()

    n_available_markets = []

    for item in playlist['items']:
        track = item['track']
        
        n_markets = len(track['available_markets'])
        n_available_markets.append(n_markets)

    print(n_available_markets)
else:
    print(f"Failed to retrieve playlist tracks. Status code: {playlist_response.status_code}")
    print(playlist_response.text)

[178, 184, 184, 185, 178, 184, 184, 185, 185, 178, 184, 184, 178]


Each track in the playlist is associated with an album. Let's retrieve the corresponding release date.

<div class="alert alert-info"><b>Exercise 5 </b>Write the code to extract the release year (in int form) and month (in string form) for all the tracks included in your chosen playlist. Store these data in separate lists called <i>release_year</i> and <i>release_month</i>, respectively. Entries should appear in these lists in the same order as they are presented in the playlist.</div>

<div class="alert alert-warning">Make sure you correctly set the variable names. They have to be written <b>exactly</b> as given in the instructions.</div>

<div class="alert alert-warning">Months should be stored with their full names and capitalized: use <i>September</i> for 09, <i>January</i> for 01, etc.</div>

### Explanation for Exercise 5

#### Step 1: Import datetime and initialize lists
I imported the datetime module to help format the month into a full name later on. Then, I initialized two empty lists: release_year to store the release year as an integer, and release_month to store the release month as a full string (e.g., "January").

#### Step 2: Loop through the playlist items and extract release date
I looped through the playlist's items to access the release_date for each track, which is in the format YYYY-MM-DD. I split the date string using the - delimiter to separate the year and month.

#### Step 3: Append the release year and month
For each track:

- I extracted the year from the first part of the split date string and converted it into an integer before appending it to release_year.
- I checked if the month exists (some tracks may not have full dates). If a month is available, I used the datetime library to convert the month from numeric format to its full name and appended it to release_month. If the month is unavailable, I appended "Unknown" to indicate that no month information was present.

#### Step 4: Print the results
Finally, I printed both release_year and release_month lists to ensure the correct data was extracted and stored in the same order as the tracks in the playlist.

In [25]:
from datetime import datetime

release_year = []
release_month = []

for item in playlist['items']:
    album_release_date = item['track']['album']['release_date']
    
    # Split the date and check its length
    date_parts = album_release_date.split('-')
    
    year = date_parts[0]
    release_year.append(int(year))  # Always append the year

    if len(date_parts) > 1:
        month = date_parts[1]
        month_name = datetime.strptime(month, '%m').strftime('%B')
        release_month.append(month_name)
    else:
        release_month.append('Unknown')  # If no month is available

print(f"Release Years: {release_year}")
print(f"Release Months: {release_month}")

Release Years: [1989, 1994, 1994, 1989, 1989, 1994, 1994, 1989, 1987, 1989, 1994, 1994, 1989]
Release Months: ['May', 'December', 'December', 'May', 'May', 'December', 'December', 'May', 'Unknown', 'May', 'December', 'December', 'May']
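
The date handling above can also be packaged as a small function. The sketch below is a variant, not the code that produced the output above: it records a missing month as `None`, in line with the missing-data convention stated earlier, and it uses `calendar.month_name`, which yields English names under the default locale. The function name `parse_release_date` is hypothetical.

```python
import calendar

def parse_release_date(release_date):
    """Split 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into (year, month_name).

    The month name is None when the date only carries a year.
    """
    parts = release_date.split("-")
    year = int(parts[0])
    month_name = calendar.month_name[int(parts[1])] if len(parts) > 1 else None
    return year, month_name

print(parse_release_date("1989-05"))  # (1989, 'May')
print(parse_release_date("1987"))     # (1987, None)
```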


## Retrieving additional track information

In what follows, you are going to retrieve additional data for each of the tracks contained in your playlist.

<div class="alert alert-info"><b>Exercise 6 </b>Write the code to extract data about the danceability, energy, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence and tempo for all the tracks included in your chosen playlist. Store these data in separate lists called <i>danceability</i> (in float form), <i>energy</i>, <i>loudness</i> (in float form), <i>mode</i> (in int form), <i>speechiness</i> (in float form), <i>acousticness</i> (in float form), <i>instrumentalness</i> (in int and float form), <i>liveness</i> (in float form), <i>valence</i> (in float form) and <i>tempo</i> (in float form), respectively. In those cases where more than one value is available for these items, retain only the first. In all cases, entries should appear in the same order as they are presented in the playlist.</div>

### Explanation for Exercise 6

#### Step 1: Initialize empty lists
I started by initializing empty lists for each audio feature: danceability, energy, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, and tempo. These will store the corresponding feature values for each track.

#### Step 2: Loop through playlist items and fetch audio features
For each track in the playlist, I retrieved the track_id and constructed the URL for Spotify's audio features API. Using this URL, I sent a GET request to retrieve the audio features for the specific track. The API response was parsed as JSON.

#### Step 3: Append the audio features to the corresponding lists
Once the audio features were obtained for each track, I extracted the values for each feature and appended them to the respective lists. This ensured that the data was collected in the same order as the tracks in the playlist.

#### Step 4: Print the extracted data
Finally, I printed the values from each list to confirm that the correct audio features were retrieved and stored. The lists now contain data on each track's danceability, energy, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, and tempo, in the same order as they appeared in the playlist.

In [26]:
danceability = []
energy = []
loudness = []
mode = []
speechiness = []
acousticness = []
instrumentalness = []
liveness = []
valence = []
tempo = []

for item in playlist['items']:
    # Use a separate name so the track_id list built in Exercise 3 is not overwritten
    current_track_id = item['track']['id']
    
    audio_features_url = f"{base_url}/audio-features/{current_track_id}"
    audio_features_response = requests.get(audio_features_url, headers=headers)
    audio_features = audio_features_response.json()

    danceability.append(audio_features['danceability'])
    energy.append(audio_features['energy'])
    loudness.append(audio_features['loudness'])
    mode.append(audio_features['mode'])
    speechiness.append(audio_features['speechiness'])
    acousticness.append(audio_features['acousticness'])
    instrumentalness.append(audio_features['instrumentalness'])
    liveness.append(audio_features['liveness'])
    valence.append(audio_features['valence'])
    tempo.append(audio_features['tempo'])

print(f"Danceability: {danceability}")
print(f"Energy: {energy}")
print(f"Loudness: {loudness}")
print(f"Mode: {mode}")
print(f"Speechiness: {speechiness}")
print(f"Acousticness: {acousticness}")
print(f"Instrumentalness: {instrumentalness}")
print(f"Liveness: {liveness}")
print(f"Valence: {valence}")
print(f"Tempo: {tempo}")

Danceability: [0.357, 0.435, 0.269, 0.731, 0.522, 0.232, 0.472, 0.491, 0.317, 0.296, 0.542, 0.403, 0.252]
Energy: [0.73, 0.88, 0.894, 0.875, 0.72, 0.809, 0.947, 0.686, 0.596, 0.847, 0.929, 0.806, 0.614]
Loudness: [-11.412, -8.451, -7.957, -6.393, -15.271, -7.743, -6.128, -8.22, -12.664, -11.349, -7.694, -6.924, -13.765]
Mode: [1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
Speechiness: [0.0349, 0.0448, 0.141, 0.0457, 0.0271, 0.04, 0.0774, 0.0289, 0.036, 0.041, 0.04, 0.0381, 0.0488]
Acousticness: [0.00613, 0.00265, 0.00116, 0.000367, 0.0166, 0.00175, 9.43e-06, 0.0236, 0.00417, 0.167, 0.00169, 0.00349, 0.00352]
Instrumentalness: [0.0908, 0.0014, 0.0144, 0.858, 0.157, 0.038, 0.0155, 0.263, 0, 0.0887, 0.17, 0.0042, 0.00838]
Liveness: [0.082, 0.436, 0.098, 0.289, 0.0922, 0.195, 0.364, 0.451, 0.252, 0.311, 0.607, 0.248, 0.689]
Valence: [0.841, 0.593, 0.201, 0.164, 0.606, 0.298, 0.75, 0.437, 0.298, 0.591, 0.937, 0.243, 0.377]
Tempo: [129.069, 91.116, 92.168, 112.78, 112.568, 171.03, 98.784, 103.715, 
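
The loop above sends one request per track. Spotify's reference also describes a variant of the audio-features endpoint that takes several ids at once through a comma-separated `ids` query parameter. The sketch below only prepares the parameters for such batched requests. The helper names are hypothetical, and the batch size of 100 is an assumption to be checked against the Docs.

```python
def chunked(values, size):
    """Yield consecutive slices of `values` with at most `size` elements."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def batch_params(ids, size=100):
    """Build one query-parameter dictionary per batch of ids."""
    return [{"ids": ",".join(batch)} for batch in chunked(ids, size)]

example_ids = ["id1", "id2", "id3", "id4", "id5"]  # placeholder ids
print(batch_params(example_ids, size=2))
# [{'ids': 'id1,id2'}, {'ids': 'id3,id4'}, {'ids': 'id5'}]
```

Each dictionary could then be passed as `params=` to `requests.get` together with the headers. The shape of the batched response should be checked in the Docs before reading values from it.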

## Retrieving artist information

In what follows, you are going to retrieve data about the artists for each of the tracks contained in your playlist.

<div class="alert alert-info"><b>Exercise 7 </b>Write the code to extract data about the total number of followers (in int form), the first listed genre (in string form) and the popularity (in int form) for the artists of all the tracks included in your chosen playlist. Store these data in separate lists called <i>artist_followers</i>, <i>genre</i> and <i>artist_popularity</i>. In cases where no genre is given, fill in the corresponding entry using a None.</div>

<div class="alert alert-warning">Note that the artist information has to be extracted from the tracks themselves.</div>

### Explanation for Exercise 7

#### Step 1: Initialize empty lists
I started by initializing three empty lists: artist_followers, genre, and artist_popularity. These will store the number of followers, the genre, and the popularity of the artists associated with each track in the playlist.

#### Step 2: Loop through playlist items and fetch artist information
For each track, I extracted the artist's ID from the track['artists'][0]['id'] field. Using the artist ID, I constructed the API request URL to retrieve artist information via Spotify's artist API. I then sent the GET request and parsed the response as JSON.

#### Step 3: Extract and append artist data
For each artist:

- I extracted the total number of followers from artist_data['followers']['total'] and appended it to artist_followers.
- If the artist has a genre listed, I extracted the first genre and appended it to genre. If no genre is available, I appended None to indicate the missing data.
- I extracted the artist's popularity from artist_data['popularity'] and appended it to artist_popularity.

#### Step 4: Print the results
Finally, I printed each list (artist_followers, genre, and artist_popularity) to verify the correct information was retrieved and stored for each artist in the playlist, in the same order as the tracks.

In [27]:
artist_followers = []
genre = []
artist_popularity = []

for item in playlist['items']:
    track = item['track']
    artist_id = track['artists'][0]['id']  
    
    artist_url = f"{base_url}/artists/{artist_id}"
    artist_response = requests.get(artist_url, headers=headers)
    artist_data = artist_response.json()
    
    followers_count = artist_data['followers']['total']
    artist_followers.append(followers_count)
    
    if artist_data['genres']:
        genre.append(artist_data['genres'][0])  
    else:
        genre.append(None)
    
    popularity = artist_data['popularity']
    artist_popularity.append(popularity)

print(f"Artist Followers: {artist_followers}")
print(f"Genres: {genre}")
print(f"Artist Popularity: {artist_popularity}")

Artist Followers: [1597082, 1597082, 1597082, 1597082, 1597082, 1597082, 1597082, 1597082, 1597082, 1597082, 1597082, 1597082, 1597082]
Genres: ['britpop', 'britpop', 'britpop', 'britpop', 'britpop', 'britpop', 'britpop', 'britpop', 'britpop', 'britpop', 'britpop', 'britpop', 'britpop']
Artist Popularity: [62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62]
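The three output lists above repeat the same values for every track, which suggests that each iteration requests the same artist again. One possible way to avoid repeated requests is to cache the response per artist ID. The sketch below is only an illustration: `artist_fields`, `make_cached_fetcher` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real `requests.get(...).json()` call so the example runs without network access.

```python
from typing import Callable, Dict, Optional, Tuple


def artist_fields(artist_data: dict) -> Tuple[int, Optional[str], int]:
    """Return (followers, first genre or None, popularity) from an artist object."""
    followers = int(artist_data["followers"]["total"])
    genres = artist_data.get("genres") or []
    first_genre = genres[0] if genres else None
    popularity = int(artist_data["popularity"])
    return followers, first_genre, popularity


def make_cached_fetcher(fetch_artist: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap a fetch function so each artist ID is requested at most once."""
    cache: Dict[str, dict] = {}

    def get_artist(artist_id: str) -> dict:
        if artist_id not in cache:
            cache[artist_id] = fetch_artist(artist_id)
        return cache[artist_id]

    return get_artist


# Hypothetical stand-in for the real API call; it records how often it is used.
calls = []


def fake_fetch(artist_id: str) -> dict:
    calls.append(artist_id)
    return {"followers": {"total": 1597082}, "genres": ["britpop"], "popularity": 62}


get_artist = make_cached_fetcher(fake_fetch)
rows = [artist_fields(get_artist("1lYT0A0LV5DUfxr6doRP3d")) for _ in range(13)]
print(rows[0], "requests sent:", len(calls))
```

With thirteen tracks by the same artist, the wrapped fetcher calls the underlying function once and serves the remaining twelve lookups from the cache.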


Beyond the data above, there is more information we can retrieve about each artist. To do so, let's first retrieve the list of distinct artists for the different tracks in your playlist.

<div class="alert alert-info"><b>Exercise 8 </b>Write the code to identify the list of unique artist ids that correspond to the different tracks in your chosen playlist. Store these data in a list called <i>unique_artist_ids</i>.</div>

<div class="alert alert-warning">Retrieve the artist ids that correspond directly to the tracks.</div>

### Explanation for Exercise 8

#### Step 1: Initialize a set to store unique artist IDs
I started by creating an empty set called unique_artist_ids. A set is ideal in this case because it automatically removes any duplicates, ensuring we only keep unique artist IDs.

#### Step 2: Loop through playlist items and extract artist IDs
For each track in the playlist, I looped through the list of artists associated with the track and retrieved the id for each artist. I then added the artist ID to the unique_artist_ids set.

#### Step 3: Convert the set to a list and print the result
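Once all the unique artist IDs were collected, I converted the set into a list so that it is easier to index and iterate over later. Note that a set does not preserve insertion order, so the resulting list is not guaranteed to follow the order in which the artists appear in the playlist. Finally, I printed the unique_artist_ids list to display the result.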

In [28]:
unique_artist_ids = set()

for item in playlist['items']:
    track = item['track']
    for artist in track['artists']:
        unique_artist_ids.add(artist['id'])  

unique_artist_ids = list(unique_artist_ids)

print(f"Unique Artist IDs: {unique_artist_ids}")

Unique Artist IDs: ['1lYT0A0LV5DUfxr6doRP3d']
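If the order in which artists first appear matters, one option is to deduplicate with `dict.fromkeys`, which keeps first-seen order in Python 3.7 and later. The sketch below uses a hypothetical `sample_items` list with the same shape as `playlist['items']`; the IDs are made up.

```python
# Hypothetical playlist fragment with the same shape as playlist['items'].
sample_items = [
    {"track": {"artists": [{"id": "a1"}, {"id": "b2"}]}},
    {"track": {"artists": [{"id": "a1"}]}},
    {"track": {"artists": [{"id": "c3"}, {"id": "b2"}]}},
]

# Flatten every artist ID in playlist order, duplicates included.
all_ids = [
    artist["id"]
    for item in sample_items
    for artist in item["track"]["artists"]
]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_in_order = list(dict.fromkeys(all_ids))
print(unique_in_order)  # ['a1', 'b2', 'c3']
```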


We are now interested in retrieving catalog information about each artist’s top tracks. This information is provided by Spotify's API on a country basis. Here, we will retrieve the information corresponding to Spain, whose *ISO 3166-1 alpha-2* code is **ES**. The information we are looking for is available from the ```top-tracks``` endpoint for ```artists```. Requests to this endpoint return up to 10 of the most popular tracks for a given artist id.

<div class="alert alert-info"><b>Exercise 9 </b>Write the code to retrieve the 10 top tracks for each of the unique artists in your chosen playlist. Store these data in a dictionary called <i>top_tracks</i>. The keys of this dictionary should correspond to the unique artist ids stored in the list <i>unique_artist_ids</i>. The values of this dictionary should include the names of the 10 most popular songs in a list.</div>

<div class="alert alert-warning">In cases where the number of tracks included in the top-tracks endpoint is below 10, simply retrieve the provided list.</div>


### Explanation for Exercise 9

#### Step 1: Initialize unique_artist_ids and extract unique artists
I started by initializing an empty list unique_artist_ids. For each track in the playlist, I extracted the first artist’s ID and appended it to the list only if it wasn’t already there. This ensures we have a list of unique artist IDs corresponding to the tracks in the playlist.

#### Step 2: Initialize an empty dictionary for top tracks
Next, I created an empty dictionary called top_tracks. The keys will be the unique artist IDs, and the values will be the lists of top tracks for each artist.

#### Step 3: Retrieve the top 10 tracks for each artist
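For each artist in unique_artist_ids, I constructed the API URL for fetching the top tracks (/artists/{artist_id}/top-tracks) together with a market query parameter. I then sent a GET request and parsed the response. Note that the request in the code passes market=US, whereas the exercise statement asks for Spain (ES).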

- I extracted the track names from the tracks field in the API response and limited the list to the first 10 tracks. If there are fewer than 10 tracks, it simply retrieves the available ones. I then stored these top tracks in the top_tracks dictionary, with the artist ID as the key and the list of top tracks as the value.

#### Step 4: Print the top tracks dictionary
Finally, I printed the top_tracks dictionary, which now contains the 10 most popular tracks (or fewer if not available) for each unique artist in the playlist.

In [29]:
unique_artist_ids = []

for item in playlist['items']:
    track = item['track']
    artist_id = track['artists'][0]['id']
    if artist_id not in unique_artist_ids:
        unique_artist_ids.append(artist_id)

top_tracks = {}

for artist_id in unique_artist_ids:
    top_tracks_url = f"{base_url}/artists/{artist_id}/top-tracks?market=US"
    top_tracks_response = requests.get(top_tracks_url, headers=headers)
    top_tracks_data = top_tracks_response.json()

    track_names = [track['name'] for track in top_tracks_data['tracks']]

    top_tracks[artist_id] = track_names[:10]  

print(f"Top Tracks: {top_tracks}")

Top Tracks: {'1lYT0A0LV5DUfxr6doRP3d': ['I Wanna Be Adored - Remastered 2009', 'She Bangs the Drums - Remastered 2009', 'Waterfall - Remastered 2009', 'I Am the Resurrection - Remastered 2009', 'Made of Stone - Remastered 2009', 'Fools Gold - Remastered 2009', 'Love Spreads', 'Sally Cinnamon - Single Mix', 'This Is the One - Remastered 2009', 'Mersey Paradise - Remastered']}
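Since the exercise asks for the Spanish market, the request URL could be built with the market as an explicit parameter. The following is a minimal sketch, not the notebook's own code: `BASE_URL` is an assumed value for the notebook's `base_url`, `top_tracks_url` and `top_track_names` are hypothetical helpers, and `sample_payload` is made-up data shaped like the endpoint's response.

```python
from urllib.parse import urlencode

# Assumed value of the notebook's base_url variable.
BASE_URL = "https://api.spotify.com/v1"


def top_tracks_url(artist_id: str, market: str = "ES") -> str:
    """Build the top-tracks URL for one artist and one market code."""
    query = urlencode({"market": market})
    return f"{BASE_URL}/artists/{artist_id}/top-tracks?{query}"


def top_track_names(payload: dict, limit: int = 10) -> list:
    """Return at most `limit` track names from a top-tracks response."""
    return [track["name"] for track in payload.get("tracks", [])][:limit]


url = top_tracks_url("1lYT0A0LV5DUfxr6doRP3d")

# Made-up response with fewer than 10 tracks: the whole list is returned.
sample_payload = {"tracks": [{"name": f"Song {i}"} for i in range(3)]}

print(url)
print(top_track_names(sample_payload))
```

Slicing with `[:limit]` never raises when fewer than ten tracks are available, which covers the case described in the warning box above.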


We can now use this information to identify those songs in your chosen playlist that correspond to each artist's top tracks.

<div class="alert alert-info"><b>Exercise 10 </b>Write the code to check whether the different tracks in your chosen playlist are included among the corresponding artist's top tracks. Store the results in a list called <i>is_top</i>. This list should include the boolean value <i>True</i> whenever the considered track is among the top tracks for that artist and <i>False</i> otherwise.</div>

<div class="alert alert-warning">When comparing track names, consider only <b>exact</b> matches.</div>


### Explanation for Exercise 10

#### Step 1: Initialize the is_top list
I initialized an empty list called is_top, which will store a boolean value (True or False) for each track in the playlist. This value will indicate whether the track is one of the artist’s top tracks.

#### Step 2: Loop through the playlist and fetch top tracks
For each track in the playlist, I extracted the track_name and the first artist_id. I then constructed the API request URL to fetch the top tracks of the artist.

After sending a GET request to Spotify’s API, I retrieved and parsed the top tracks data into a list of track names (top_track_names).

#### Step 3: Check if the track is a top track
I compared the track_name from the playlist with the list of top_track_names for that artist. If the track_name matched one of the top tracks exactly, I appended True to the is_top list; otherwise, I appended False.

#### Step 4: Print the results
Finally, I printed the is_top list, which contains a boolean value for each track in the playlist. This indicates whether the track is among the artist's top tracks (True) or not (False).

In [30]:
is_top = []

for item in playlist['items']:
    track = item['track']
    track_name = track['name']
    artist_id = track['artists'][0]['id']
    
    top_tracks_url = f"{base_url}/artists/{artist_id}/top-tracks?market=US"
    top_tracks_response = requests.get(top_tracks_url, headers=headers)
    # Use a separate name so the top_tracks dictionary from Exercise 9 is not overwritten
    top_tracks_data = top_tracks_response.json()

    top_track_names = [top_track['name'] for top_track in top_tracks_data['tracks']]
    is_top.append(track_name in top_track_names)

print(f"Is Top Track: {is_top}")

Is Top Track: [False, True, False, True, False, False, False, True, True, False, False, False, False]
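The same check can also be done without any further requests by looking each track up in a dictionary of top tracks keyed by artist ID, such as the one built in Exercise 9. The sketch below assumes that shape; `top_tracks_by_artist` and `sample_items` are hypothetical, made-up data.

```python
# Hypothetical dictionary: artist ID -> list of top track names.
top_tracks_by_artist = {
    "artist_a": ["Song One", "Song Two"],
    "artist_b": ["Other Song"],
}

# Hypothetical playlist fragment with the same shape as playlist['items'].
sample_items = [
    {"track": {"name": "Song One", "artists": [{"id": "artist_a"}]}},
    # Differs only in letter case, so it is not an exact match.
    {"track": {"name": "song one", "artists": [{"id": "artist_a"}]}},
    # The artist is missing from the dictionary, so the lookup falls back to [].
    {"track": {"name": "Other Song", "artists": [{"id": "artist_c"}]}},
]

is_top_sample = [
    item["track"]["name"]
    in top_tracks_by_artist.get(item["track"]["artists"][0]["id"], [])
    for item in sample_items
]
print(is_top_sample)  # [True, False, False]
```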


A very interesting feature in Spotify is the song recommender.

When creating a playlist from scratch, Spotify offers help by suggesting additional tracks that may be related to the ones already added. Spotify's API also provides a specific endpoint for `recommendations`. We will use this endpoint to retrieve one recommendation for each song in our original playlist.

<div class="alert alert-info"><b>Exercise 11 </b>Write the code to retrieve the id, title and artist from the list of recommendations for each track in your playlist. Store these data in new lists called <i>recommendation_id</i>, <i>recommendation_title</i> and <i>recommendation_artist</i>.</div>

<div class="alert alert-warning">Retrieve recommendations based solely on each track.</div>

<div class="alert alert-warning">Retrieve information only about the first track in the recommendations list.</div>

<div class="alert alert-warning">When more than one artist is associated with the recommended track, retrieve information only about the first.</div>

### Explanation for Exercise 11

#### Step 1: Initialize empty lists for recommendations
I began by creating three empty lists: recommendation_id, recommendation_title, and recommendation_artist. These will store the ID, title, and artist name for the first recommendation based on each track in the playlist.

#### Step 2: Loop through playlist items and fetch recommendations
For each track, I retrieved the track_id and constructed the URL for the Spotify recommendations API, passing the track_id as a seed to generate recommendations. I sent a GET request and parsed the response into JSON format.

#### Step 3: Extract the first recommendation
If there were any recommendations returned in the tracks list, I extracted the first one. I then appended the ID, title, and the first artist's name for this recommended track to the respective lists. If no recommendations were found, I appended None to indicate missing data.

#### Step 4: Print the results
Finally, I printed the three lists — recommendation_id, recommendation_title, and recommendation_artist — to verify that the recommendation data was successfully retrieved and stored for each track in the playlist.

In [31]:
recommendation_id = []
recommendation_title = []
recommendation_artist = []

for item in playlist['items']:
    track = item['track']
    track_id = track['id']
    
    recommendations_url = f"{base_url}/recommendations?seed_tracks={track_id}"
    recommendations_response = requests.get(recommendations_url, headers=headers)
    recommendations = recommendations_response.json()

    if recommendations['tracks']:
        recommended_track = recommendations['tracks'][0]
        recommendation_id.append(recommended_track['id'])
        recommendation_title.append(recommended_track['name'])
        recommendation_artist.append(recommended_track['artists'][0]['name'])
    else:
        recommendation_id.append(None)
        recommendation_title.append(None)
        recommendation_artist.append(None)

print("Recommendation IDs:", recommendation_id)
print("Recommendation Titles:", recommendation_title)
print("Recommendation Artists:", recommendation_artist)

Recommendation IDs: ['5cSXn9ozPu3jNT2lxWLOCT', '5O7CmHUOwvs8iWB4Kz1Kjk', '4NKR77RM92DlVYmcLn79HW', '4rNgFyEBbLSDQmOa3MZCNQ', '4bhgZD7mCo8Zn9V4fERTrp', '7DJoIiy2rwQmgq49ATq62v', '7LHnmjSJ86NcEMiBp19suL', '4H6Je7swqSr48vbJJHaXZb', '6JWddKPdqvDc2WkPEi9grC', '6XQho2zysCi4peYRDVake9', '0iRlskmkFMSyeIiJAAO0er', '781iJBEAU4lkGaREackCQt', '6XQho2zysCi4peYRDVake9']
Recommendation Titles: ['Pumping on Your Stereo', 'Golden Retriever', 'Line Up', 'Beat Surrender', 'Soul Love', 'God Only Knows', 'Highly Evolved', 'Standing Here - Remastered', 'Have A Nice Day', 'AKA... Broken Arrow', 'How Do You Sleep', 'Way Out', 'AKA... Broken Arrow']
Recommendation Artists: ['Supergrass', 'Super Furry Animals', 'Elastica', 'The Jam', 'Beady Eye', 'James', 'The Vines', 'The Stone Roses', 'Stereophonics', "Noel Gallagher's High Flying Birds", 'The Stone Roses', "The La's", "Noel Gallagher's High Flying Birds"]
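The extraction step can be isolated in a small function, which makes the "first track, first artist" rule easy to check without calling the API. This is a sketch under assumptions: `first_recommendation` is a hypothetical helper, and `sample_payload` is made-up data that reuses the first ID, title and artist from the output above and adds an invented second artist.

```python
def first_recommendation(payload: dict):
    """Return (id, title, first artist name) of the first recommended track.

    Returns (None, None, None) when the response holds no tracks.
    """
    tracks = payload.get("tracks") or []
    if not tracks:
        return None, None, None
    first = tracks[0]
    artists = first.get("artists") or []
    first_artist = artists[0]["name"] if artists else None
    return first["id"], first["name"], first_artist


# Made-up response: only the first track and its first artist should be used.
sample_payload = {
    "tracks": [
        {
            "id": "5cSXn9ozPu3jNT2lxWLOCT",
            "name": "Pumping on Your Stereo",
            "artists": [{"name": "Supergrass"}, {"name": "Hypothetical Guest"}],
        },
        {"id": "ignored", "name": "Ignored", "artists": [{"name": "Ignored"}]},
    ]
}

print(first_recommendation(sample_payload))
print(first_recommendation({"tracks": []}))
```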


## Bonus exercise

You can complete the exercises below to obtain extra points. These points will be added to your score directly. The maximum score in this assignment is a 10.

<div class="alert alert-danger"><b>Bonus 1 </b>Write the code to find the most popular song in your playlist. If several songs have the same popularity, choose the one for which the artist has the most followers. Save the name of the song to a new variable called <i>most_popular</i>. This variable should <b>only</b> contain the name of the most popular song in string form.</div>

### Explanation for Bonus 1

#### Step 1: Initialize variables for tracking
I initialized three variables: most_popular (the name of the most popular song, initially None), highest_popularity (the highest popularity score found so far) and most_followers (the follower count of the artist of the current best song, used to break ties). The last two start at -1, so the first track examined always replaces these placeholder values.

#### Step 2: Loop through playlist items and fetch artist data
For each track, I retrieved the track_name, track_popularity, and the artist_id. I then constructed a request URL to fetch the artist's data, specifically looking at the number of followers for the artist.

#### Step 3: Compare track popularity and update the most popular song
For each track:

- If the track_popularity is higher than the current highest_popularity, I updated highest_popularity, most_followers, and most_popular with the new track's data.
- If the popularity is tied, I compared the number of followers for the respective artists. If the artist has more followers than the current most_followers, I updated the values for most_popular and most_followers.

#### Step 4: Print the most popular song
Finally, I printed the value of most_popular, which contains the name of the song with the highest popularity. If there was a tie, the song by the artist with the most followers was chosen.
- The song chosen was "Waterfall - Remastered 2009" by The Stone Roses.

In [32]:
most_popular = None
highest_popularity = -1
most_followers = -1

for item in playlist['items']:
    track = item['track']
    track_name = track['name']
    track_popularity = track['popularity']
    artist_id = track['artists'][0]['id']

    artist_url = f"{base_url}/artists/{artist_id}"
    artist_response = requests.get(artist_url, headers=headers)
    artist = artist_response.json()
    # Use a scalar name so the artist_followers list from Exercise 7 is not overwritten
    followers_count = artist['followers']['total']

    if track_popularity > highest_popularity or (track_popularity == highest_popularity and followers_count > most_followers):
        highest_popularity = track_popularity
        most_followers = followers_count
        most_popular = track_name

print(f" The most popular song is: {most_popular}")

 The most popular song is: Waterfall - Remastered 2009
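The same selection rule (highest popularity first, most followers as tie-breaker) can be expressed with `max` and a tuple key, since tuples compare element by element. The sketch below uses a hypothetical `candidates` list of made-up (name, popularity, followers) values.

```python
# Made-up (track name, track popularity, artist followers) triples.
candidates = [
    ("Song A", 60, 1000),
    ("Song B", 72, 500),
    ("Song C", 72, 900),
]

# Compare by popularity first, then by followers when popularity is tied.
most_popular_sample = max(candidates, key=lambda c: (c[1], c[2]))[0]
print(most_popular_sample)  # Song C
```

"Song B" and "Song C" tie on popularity, so the follower count decides in favour of "Song C".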


You might have noticed that there are slight inconsistencies between the way track titles appear in the playlist and in the artist's top 10 tracks. For example, the fourth song in the provided playlist appears as <i>High And Dry</i> among the playlist songs but as <i>High and Dry</i> (with a lowercase "a" in the word "and") in Radiohead's top tracks. As a result, that song is not flagged as one of the artist's top tracks in the is_top list.

<div class="alert alert-danger"><b>Bonus 2 </b>Write the code to create a new list called <i>is_top_2</i>. This time make sure that you look for partial matches of title names too.</div>

### Explanation for Bonus 2

#### Step 1: Import necessary libraries and initialize the list
I imported the re library for regular expressions and SequenceMatcher from difflib to compare string similarities. I then initialized an empty list called is_top_2 to store the boolean values indicating whether a track is a top track based on exact or partial matches.

#### Step 2: Define helper functions
I created two helper functions:

- clean_track_name(name): This function removes any content within parentheses, strips leading/trailing whitespace, and converts the track name to lowercase. It helps standardize track names for comparison.
- is_partial_match(name1, name2, threshold=0.7): This function uses SequenceMatcher to compute the similarity ratio between two track names. If the similarity is above the threshold (set to 0.7), it returns True, indicating a partial match.

#### Step 3: Loop through the playlist and check for matches
For each track in the playlist:

- I extracted the track_name and artist_id. Then, I fetched the artist's top tracks using the top_tracks dictionary.
- I cleaned both the track_name and the top tracks for comparison.
- I first checked if there was an exact match of the cleaned track_name in the cleaned top tracks list.
- If no exact match was found, I checked for a partial match using the is_partial_match() function to compare the cleaned track name with each cleaned top track.
- Based on the result of either the exact or partial match, I appended True to is_top_2 for matches and False for non-matches.

#### Step 4: Print the result
Finally, I printed the is_top_2 list, which now contains True for tracks that have either an exact or partial match in the artist’s top tracks and False for tracks that do not.
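To see how the two helper functions described above behave on concrete titles, here is a small standalone sketch. `normalize` and `similarity` are hypothetical names that mirror the cleaning and `SequenceMatcher` steps; the titles come from the examples mentioned in this notebook.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Drop parenthesised text, trim whitespace and lowercase the title."""
    return re.sub(r"\(.*?\)", "", name).strip().lower()


def similarity(name1: str, name2: str) -> float:
    """Similarity ratio between two normalised titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(name1), normalize(name2)).ratio()


# A difference in letter case disappears after normalisation.
same_after_cleaning = normalize("High And Dry") == normalize("High and Dry")

# A long suffix lowers the ratio considerably.
suffix_ratio = similarity("Waterfall", "Waterfall - Remastered 2009")

print(same_after_cleaning)
print(suffix_ratio)
```

The case-only difference is resolved by normalisation alone, while the title with a remaster suffix scores below a 0.7 threshold, so a suffix of that length would need extra cleaning or a lower threshold to count as a partial match.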

In [33]:
import re

is_top_2 = []

def clean_track_name(name): 
    cleaned_name = re.sub(r'\(.*?\)', '', name).strip().lower()
    return cleaned_name

# Iterate over the playlist entries (not over the keys of a single track dictionary)
for item in playlist['items']:
    track = item['track']
    track_name = track['name']
    artist_id = track['artists'][0]['id']
    top_tracks_for_artist = top_tracks.get(artist_id, [])
    
    if track_name in top_tracks_for_artist:
        is_top_2.append(True)
    else:
        cleaned_track_name = clean_track_name(track_name)
        cleaned_top_tracks = [clean_track_name(tt) for tt in top_tracks_for_artist]
        
        is_top_partial = cleaned_track_name in cleaned_top_tracks
        is_top_2.append(is_top_partial)

In [34]:
print(is_top_2)


In [35]:
import re
from difflib import SequenceMatcher

is_top_2 = []

def clean_track_name(name):
    cleaned_name = re.sub(r'\(.*?\)', '', name).strip().lower()
    return cleaned_name

def is_partial_match(name1, name2, threshold=0.7):
    """Returns True if the similarity ratio between name1 and name2 is above the threshold."""
    return SequenceMatcher(None, name1, name2).ratio() > threshold

# Iterate over the playlist entries (not over the keys of a single track dictionary)
for item in playlist['items']:
    track_name = item['track']['name']
    artist_id = item['track']['artists'][0]['id']
    top_tracks_for_artist = top_tracks.get(artist_id, [])

    cleaned_track_name = clean_track_name(track_name)
    cleaned_top_tracks = [clean_track_name(tt) for tt in top_tracks_for_artist]

    is_exact_match = cleaned_track_name in cleaned_top_tracks
    is_partial_match_found = any(is_partial_match(cleaned_track_name, tt) for tt in cleaned_top_tracks)

    is_top_2.append(is_exact_match or is_partial_match_found)

print(is_top_2)