# Working with Spotify albums
This notebook gets the data for an album through a Spotify client object and runs basic operations on the album, such as retrieving its track list.

## Import statements

In [1]:
# Enable import from my_spotify.py. Based on https://stackoverflow.com/a/35273613/1899061.
# Putting my_spotify.py in the root directory (without this cell) used to work, but then suddenly stopped working.
import os
import sys
my_spotify_module_path = os.path.abspath(os.path.join('..'))
if my_spotify_module_path not in sys.path:
    sys.path.append(my_spotify_module_path)

In [3]:
# import os
#
# from pathlib import Path
# from dotenv import load_dotenv
#
# import spotipy
# from spotipy.oauth2 import SpotifyClientCredentials
#
# # Based on https://github.com/ipython/ipynb; the reported compiler errors don't seem to matter
# import ipynb
# # from ipynb.fs.full.spotify_authentication import get_spotify_object

import pandas as pd

from my_spotify.my_spotify import get_spotify_object

# from my_spotify import get_spotify_object
# import my_spotify
# %run "M:\\Vladan\\Courses\\P3\\Python\\Python projects\\Jupyter projects\\spotipy-getting-started\\spotify_authentication.ipynb"

## Define albums to work with
A Spotify album can be identified by its ID, its URI, or its URL. The differences are explained in the [spotipy documentation](https://spotipy.readthedocs.io/en/2.22.1/#ids-uris-and-urls). Spotify URLs are used in this notebook. See also [this article](https://towardsdatascience.com/extracting-song-data-from-the-spotify-api-using-python-b1e79388d50), section *Extracting Tracks From a Playlist*.

To get the URL of an album, go to Spotify, open an album of your choice, click the '.&nbsp;.&nbsp;.' under the album title, and then select *Share > Copy Album Link*. The album ID is the path segment that follows `/album/` in that URL.
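
Because all three identifier forms carry the same ID, one form can be derived from another with plain string handling. The sketch below is only an illustration: `parse_spotify_url()` is a hypothetical helper (it is not part of spotipy or of *my_spotify*), and it assumes the `open.spotify.com/<type>/<id>` layout seen in the links used in this notebook.

```python
from urllib.parse import urlparse


def parse_spotify_url(url: str) -> tuple:
    """Return (resource_type, resource_id, uri) for an open.spotify.com URL.

    Hypothetical helper; assumes the last two path segments are <type>/<id>.
    """
    # Drop empty segments; the query string (?si=...) is not part of the path
    path_parts = [part for part in urlparse(url).path.split('/') if part]
    if len(path_parts) < 2:
        raise ValueError(f'Not a recognizable Spotify URL: {url}')
    resource_type, resource_id = path_parts[-2], path_parts[-1]
    return resource_type, resource_id, f'spotify:{resource_type}:{resource_id}'


print(parse_spotify_url(
    'https://open.spotify.com/album/1pBBIxK5yURfbv8Xd5lta1?si=7kvHosStSz-GsAdMGgvpnw'))
# ('album', '1pBBIxK5yURfbv8Xd5lta1', 'spotify:album:1pBBIxK5yURfbv8Xd5lta1')
```

The first element of the result also tells an album link from a playlist link, which matters because album-specific calls are not meant for playlists.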

In [4]:
ANTHOLOGY_1 = 'https://open.spotify.com/album/1pBBIxK5yURfbv8Xd5lta1?si=7kvHosStSz-GsAdMGgvpnw'
ANTHOLOGY_2 = 'https://open.spotify.com/album/3Lf8VA23cMAl5imbABTZoo?si=2epR2z-MQiKVLQBeItcVmg'
ANTHOLOGY_3 = 'https://open.spotify.com/album/4l0xO28Y37MBBXQEcBIbXQ?si=TIk8Z7rjTxyLVSgHZNUjqQ'
LIVE_AT_THE_BBC = 'https://open.spotify.com/album/2EowTulHWqSY6QZfTDf5vW?si=dku4AlpiSM6_pc5jWCcJ5Q'
ON_AIR_LIVE_AT_THE_BBC_VOL_2 = 'https://open.spotify.com/album/4On0Hf7VJC1jz5gXY2cU8p?si=anEpTqzLSfOABJEq-I2gnQ'
BOOTLEG_RECORDINGS_1963 = 'https://open.spotify.com/playlist/2a1FL67XYHT9Ael1sRqk7E?si=06d4acb817bf46f1'         # playlist, not an album!
LET_IT_BE_NAKED_FLY_ON_THE_WALL = 'https://open.spotify.com/playlist/0ITHPrUrItehOsn7FHUd2V?si=6a1481a1a9a94fbd' # playlist, not an album!
RARITIES = 'https://open.spotify.com/playlist/5fnevdIUs5MTBpvFr7FY82?si=0e8866e19da24606'                        # playlist, not an album!

Note that various non-song tracks had to be removed from many of these albums and playlists: radio hosts' intros, studio chats, interviews, short demo versions of songs that appear elsewhere, Beatles songs performed by other artists (like those at the end of *The Beatles - Bootleg Recordings 1963*), and so on. In practice, the most efficient way to do this was manually, directly on Spotify.

This is how the dataset of all The Beatles songs not included on the official albums and relevant compilations was created:
* an auxiliary playlist, *The Beatles Rarities*, was created on Spotify by adding all *albums* from the above list
* the auxiliary playlist was cleaned manually
* the playlists from the above list were treated individually
* in the end, once all the datasets had been completed and merged with the starting one, the duplicates were eliminated manually (keeping only the official or the most elaborate version of each song).

The cleaned version of the auxiliary *The Beatles Rarities* playlist contained 200+ songs, many of them duplicates. These were eliminated manually (in Excel) after merging the starting dataset with the one obtained from the auxiliary playlist.
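
A rough automatic pass could flag duplicate candidates before the manual clean-up. The following is a minimal sketch, assuming (as a simplification) that the text before the first `' - '` in a Spotify track title is the song's base title. `base_title()` and `find_duplicate_candidates()` are hypothetical helpers, and the second sample title is made up for illustration. Deciding which version to keep remains a manual step.

```python
from collections import defaultdict


def base_title(title: str) -> str:
    """Return the part of a track title before the first ' - ', lower-cased."""
    return title.split(' - ')[0].strip().lower()


def find_duplicate_candidates(titles: list) -> dict:
    """Group titles by base title and keep only the groups with 2+ versions."""
    groups = defaultdict(list)
    for title in titles:
        groups[base_title(title)].append(title)
    return {base: versions for base, versions in groups.items() if len(versions) > 1}


sample_titles = [
    'Yesterday - Anthology 2 Version',
    'Yesterday - Remastered 2009',       # illustrative title, not taken from this notebook
    'Real Love - Anthology 2 Version',
]
print(find_duplicate_candidates(sample_titles))
# {'yesterday': ['Yesterday - Anthology 2 Version', 'Yesterday - Remastered 2009']}
```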

## Get the Spotify object

In [5]:
# display(get_spotify_object('env/.env'))
spot = get_spotify_object('env/.env')

## Get all tracks from an album

In [6]:
tracks = spot.album_tracks(ANTHOLOGY_1)         # type(tracks): <class 'dict'>
for track in tracks['items']:
    print(track['name'])                        # type(track): <class 'dict'>

Free As A Bird - Anthology 1 Version
We Were Four Guys... That's All - Anthology 1 Version
That'll Be The Day - Anthology 1 Version
In Spite Of All The Danger - Anthology 1 Version
Sometimes I'd Borrow...Those Still Exist - Anthology 1 Version
Hallelujah I Love Her So - Anthology 1 Version
You'll Be Mine - Anthology 1 Version
Cayenne - Anthology 1 Version
First Of All... It Didn't Do A Thing Here - Anthology 1 Version
My Bonnie - Anthology 1 Version
Ain't She Sweet - Anthology 1 Version
Cry For A Shadow - Anthology 1 Version
Brian Was A Beautiful Guy...He Presented Us Well - Anthology 1 Version
I Secured Them... A Beatle Drink Even Then - Anthology 1 Version
Searchin' - Anthology 1 Version
Three Cool Cats - Anthology 1 Version
The Sheik Of Araby - Anthology 1 Version
Like Dreamers Do - Anthology 1 Version
Hello Little Girl - Anthology 1 Version
Well, The Recording Test... By My Artists - Anthology 1 Version
Besame Mucho - Anthology 1 Version
Love Me Do - Anthology 1 Version
How Do You 

In [184]:
# A more detailed version: get the URI of each track first, then fetch the full track object with spot.track(),
# which contains more fields (e.g. 'album', 'popularity') than the items returned by spot.album_tracks()
track_uris = [t['uri'] for t in spot.album_tracks(ANTHOLOGY_1)['items']]
for track_uri in track_uris:
    track = spot.track(track_uri)
    print(track.keys())                        # type(track): <class 'dict'>
    print(track['name'])                       # type(track): <class 'dict'>

dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri'])
Free As A Bird - Anthology 1 Version
dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri'])
We Were Four Guys... That's All - Anthology 1 Version
dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri'])
That'll Be The Day - Anthology 1 Version
dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_numbe
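
Calling `spot.track()` once per track means one request per song. spotipy also provides `tracks()`, which takes a list of IDs, URIs or URLs and returns a dict with a `'tracks'` list. The Web API limits such a request to 50 tracks, so the sketch below batches accordingly. `get_full_tracks()` is a hypothetical helper name, and `spot` is assumed to be a spotipy client like the one returned by `get_spotify_object()`.

```python
def get_full_tracks(spot, track_uris: list, batch_size: int = 50) -> list:
    """Fetch full track objects in batches instead of one request per track.

    Hypothetical helper; `spot` is assumed to expose spotipy's tracks() method.
    """
    full_tracks = []
    for start in range(0, len(track_uris), batch_size):
        batch = track_uris[start:start + batch_size]
        # tracks() returns {'tracks': [...]} with one full track object per requested URI
        full_tracks.extend(spot.tracks(batch)['tracks'])
    return full_tracks


# Possible usage (requires a real client, so it is left commented out):
# track_uris = [t['uri'] for t in spot.album_tracks(ANTHOLOGY_1)['items']]
# for track in get_full_tracks(spot, track_uris):
#     print(track['name'], track['popularity'])
```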

## Bundle it all together in the *get_all_tracks_from_album()*, *get_album_tracks_data()*, *get_album_tracks_audio_features()* and *get_album_tracks_df()* functions
Assumption: the *.env* file is already created as explained in *spotify_authentication.ipynb*, and its relative path is passed as an argument.

### *get_all_tracks_from_album()*

In [188]:
def get_all_tracks_from_album(album_id: str, env_file_path: str) -> list:
    spot = get_spotify_object(env_file_path)
    # all_tracks = spot.album_tracks(album_id)
    track_uris = [t['uri'] for t in spot.album_tracks(album_id)['items']]
    all_tracks = []
    for track_uri in track_uris:
        all_tracks.append(spot.track(track_uri))
    return all_tracks
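
`album_tracks()` returns a single page of results, so for an album with more tracks than the page size, reading only `['items']` of the first call would miss the rest. A minimal sketch of following the pages with spotipy's `next()` is shown here. `get_all_album_track_items()` is a hypothetical helper name, and `spot` is assumed to be a spotipy client.

```python
def get_all_album_track_items(spot, album_id: str) -> list:
    """Collect the track items from every page of an album's track listing.

    Hypothetical helper; `spot` is assumed to expose spotipy's
    album_tracks() and next() methods.
    """
    page = spot.album_tracks(album_id, limit=50)
    items = list(page['items'])
    # A paged result carries the URL of the following page in 'next' (None on the last page)
    while page.get('next'):
        page = spot.next(page)
        items.extend(page['items'])
    return items
```

The returned items have the same shape as `spot.album_tracks(album_id)['items']`, so the list of URIs can be built from them in the same way as in the function above.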

In [192]:
# Test get_all_tracks_from_album()
all_tracks = get_all_tracks_from_album(ANTHOLOGY_3, 'env/.env')

In [193]:
display(all_tracks)

[{'album': {'album_type': 'compilation',
   'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/3WrFJ7ztbogyGnTHbHJFl2'},
     'href': 'https://api.spotify.com/v1/artists/3WrFJ7ztbogyGnTHbHJFl2',
     'id': '3WrFJ7ztbogyGnTHbHJFl2',
     'name': 'The Beatles',
     'type': 'artist',
     'uri': 'spotify:artist:3WrFJ7ztbogyGnTHbHJFl2'}],
   'available_markets': ['AD',
    'AE',
    'AG',
    'AL',
    'AM',
    'AO',
    'AR',
    'AT',
    'AU',
    'AZ',
    'BA',
    'BB',
    'BD',
    'BE',
    'BF',
    'BG',
    'BH',
    'BI',
    'BJ',
    'BN',
    'BO',
    'BR',
    'BS',
    'BT',
    'BW',
    'BY',
    'BZ',
    'CA',
    'CD',
    'CG',
    'CH',
    'CI',
    'CL',
    'CM',
    'CO',
    'CR',
    'CV',
    'CW',
    'CY',
    'CZ',
    'DE',
    'DJ',
    'DK',
    'DM',
    'DO',
    'DZ',
    'EC',
    'EE',
    'EG',
    'ES',
    'ET',
    'FI',
    'FJ',
    'FM',
    'FR',
    'GA',
    'GB',
    'GD',
    'GE',
    'GH',
    'GM',
    'GN

In [177]:
# Inspect the raw (paged) result of spot.album_tracks(); it is a dict,
# unlike the list returned by get_all_tracks_from_album()
album_page = spot.album_tracks(ANTHOLOGY_2)
display(type(album_page))
display(album_page.keys())
display(len(album_page))
display(len(album_page['items']))
display(len(album_page['items'][0]))
display(type(album_page['items'][0]))
display(album_page['items'][0].keys())
display(album_page['items'][0]['uri'])
display(album_page['items'][0]['name'])
display(spot.track(album_page['items'][0]['uri']))

dict

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

7

45

14

dict

dict_keys(['artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_urls', 'href', 'id', 'is_local', 'name', 'preview_url', 'track_number', 'type', 'uri'])

'spotify:track:5frMgt4jqRGJk3yMKfqOyl'

'Real Love - Anthology 2 Version'

{'album': {'album_type': 'compilation',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/3WrFJ7ztbogyGnTHbHJFl2'},
    'href': 'https://api.spotify.com/v1/artists/3WrFJ7ztbogyGnTHbHJFl2',
    'id': '3WrFJ7ztbogyGnTHbHJFl2',
    'name': 'The Beatles',
    'type': 'artist',
    'uri': 'spotify:artist:3WrFJ7ztbogyGnTHbHJFl2'}],
  'available_markets': ['AD',
   'AE',
   'AG',
   'AL',
   'AM',
   'AO',
   'AR',
   'AT',
   'AU',
   'AZ',
   'BA',
   'BB',
   'BD',
   'BE',
   'BF',
   'BG',
   'BH',
   'BI',
   'BJ',
   'BN',
   'BO',
   'BR',
   'BS',
   'BT',
   'BW',
   'BY',
   'BZ',
   'CA',
   'CD',
   'CG',
   'CH',
   'CI',
   'CL',
   'CM',
   'CO',
   'CR',
   'CV',
   'CW',
   'CY',
   'CZ',
   'DE',
   'DJ',
   'DK',
   'DM',
   'DO',
   'DZ',
   'EC',
   'EE',
   'EG',
   'ES',
   'ET',
   'FI',
   'FJ',
   'FM',
   'FR',
   'GA',
   'GB',
   'GD',
   'GE',
   'GH',
   'GM',
   'GN',
   'GQ',
   'GR',
   'GT',
   'GW',
   'GY',
   'HK',
   'HN',
   

### *get_album_tracks_data()*

In [202]:
def get_album_tracks_data(album_id: str, env_file_path: str) -> list:
    tracks = get_all_tracks_from_album(album_id, env_file_path)
    tracks_data = [(t['uri'],
                    t['name'].split(' - Remastered')[0].split(' / Remastered')[0],
                    t['album']['name'].split(' (Remastered)')[0],
                    t['popularity'],
                    int(round(t['duration_ms'] / 1000, 0))) for t in tracks]
    return tracks_data
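
The chained `split()` calls above cut a title at `' - Remastered'` or `' / Remastered'`, so suffixes such as `' - Anthology 1 Version'` are left in place. The sketch below isolates that logic so it can be tried on sample strings. `clean_title()` and its optional `strip_anthology` flag are hypothetical additions, not part of the original function.

```python
import re

REMASTER_MARKERS = (' - Remastered', ' / Remastered')
# Matches a trailing ' - Anthology N Version' or ' / Anthology N Version'
ANTHOLOGY_SUFFIX = re.compile(r' [-/] Anthology \d Version$')


def clean_title(title: str, strip_anthology: bool = False) -> str:
    """Cut a title at the first remaster marker; optionally drop the Anthology suffix."""
    for marker in REMASTER_MARKERS:
        title = title.split(marker)[0]
    if strip_anthology:
        title = ANTHOLOGY_SUFFIX.sub('', title)
    return title


print(clean_title('Help! - Remastered 2009'))                         # Help!
print(clean_title('Free As A Bird - Anthology 1 Version'))            # unchanged
print(clean_title('Free As A Bird - Anthology 1 Version', True))      # Free As A Bird
```

Whether the Anthology suffix should be stripped depends on how the titles are used later. Keeping it makes alternative versions easy to tell apart when datasets are merged.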

In [201]:
# Test get_album_tracks_data()
display(get_album_tracks_data(ANTHOLOGY_1, 'env/.env'))

[('spotify:track:6hLY3Tz1Xt5kBuKNDTs4ib',
  'Free As A Bird - Anthology 1 Version',
  'Anthology 1',
  50,
  265),
 ('spotify:track:0jDbkso4fyDcabgMbNeWSk',
  "We Were Four Guys... That's All - Anthology 1 Version",
  'Anthology 1',
  0,
  12),
 ('spotify:track:5AxDzt8VmHreQaDatK6TD5',
  "That'll Be The Day - Anthology 1 Version",
  'Anthology 1',
  36,
  128),
 ('spotify:track:7zvMaTcCspbRMahT4DcjQG',
  'In Spite Of All The Danger - Anthology 1 Version',
  'Anthology 1',
  43,
  164),
 ('spotify:track:4c2liWgvHai8TW45FAwxG6',
  "Sometimes I'd Borrow...Those Still Exist - Anthology 1 Version",
  'Anthology 1',
  1,
  18),
 ('spotify:track:5aOteYP6rA7vvBA7TG4Zpx',
  'Hallelujah I Love Her So - Anthology 1 Version',
  'Anthology 1',
  28,
  73),
 ('spotify:track:175jiRVJ6N6ayJQ1FXim57',
  "You'll Be Mine - Anthology 1 Version",
  'Anthology 1',
  26,
  99),
 ('spotify:track:2VXAlfWBNhdiMEVUTGzGOb',
  'Cayenne - Anthology 1 Version',
  'Anthology 1',
  28,
  74),
 ('spotify:track:2PRmNoN1

### *get_album_tracks_audio_features()*

In [210]:
def get_album_tracks_audio_features(album_id: str, env_file_path: str) -> list:
    spot = get_spotify_object(env_file_path)
    tracks = get_all_tracks_from_album(album_id, env_file_path)
    uri_list = [t['uri'] for t in tracks]
    tracks_audio_features_dicts = [d for d in spot.audio_features(uri_list)]

    # tracks_audio_features = [(t['key'],
    #                           t['mode'],
    #                           t['tempo'],
    #                           t['time_signature'],
    #                           t['valence'],
    #                           t['danceability'],
    #                           t['energy'],
    #                           t['loudness'],
    #                           t['acousticness'],
    #                           t['instrumentalness'],
    #                           t['liveness'],
    #                           t['speechiness']) for t in tracks_audio_features_dicts]

    audio_features = ['key', 'mode', 'tempo', 'time_signature', 'valence', 'danceability',
                      'energy', 'loudness', 'acousticness', 'instrumentalness', 'liveness', 'speechiness']
    tracks_audio_features = [tuple(t[key] for key in audio_features) for t in tracks_audio_features_dicts]

    return tracks_audio_features
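
One caveat, stated as an assumption about the API rather than something observed in this notebook: the audio-features response can contain `None` in place of a feature dict when no analysis is available for a track, and `t[key]` would then raise a `TypeError`. A defensive version of the tuple-building step might look like the sketch below. `features_to_tuples()` is a hypothetical helper.

```python
AUDIO_FEATURES = ['key', 'mode', 'tempo', 'time_signature', 'valence', 'danceability',
                  'energy', 'loudness', 'acousticness', 'instrumentalness', 'liveness', 'speechiness']


def features_to_tuples(feature_dicts: list, keys: list = AUDIO_FEATURES) -> list:
    """Turn audio-feature dicts into tuples, using a tuple of Nones for missing entries."""
    placeholder = (None,) * len(keys)
    return [tuple(d[key] for key in keys) if d is not None else placeholder
            for d in feature_dicts]
```

Keeping one tuple per track, even when the features are missing, preserves the row count, so the result still lines up with the list of tracks it was derived from.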

In [211]:
# Test get_album_tracks_audio_features()
display(get_album_tracks_audio_features(ANTHOLOGY_2, 'env/.env'))

[(8, 1, 175.726, 4, 0.405, 0.375, 0.694, -7.334, 0.0458, 0.019, 0.257, 0.031),
 (9, 1, 69.1, 4, 0.376, 0.57, 0.302, -8.818, 0.802, 0, 0.15, 0.0251),
 (7, 1, 160.531, 4, 0.688, 0.473, 0.795, -6.991, 0.631, 7.6e-06, 0.388, 0.139),
 (7, 1, 86.285, 4, 0.619, 0.587, 0.275, -10.634, 0.324, 0, 0.218, 0.0431),
 (9, 1, 144.794, 4, 0.965, 0.301, 0.842, -6.261, 0.16, 0, 0.113, 0.0409),
 (9,
  1,
  124.225,
  4,
  0.523,
  0.433,
  0.813,
  -7.096,
  0.00446,
  0.00015,
  0.0546,
  0.0432),
 (10, 1, 95.383, 4, 0.325, 0.643, 0.118, -13.443, 0.86, 0, 0.0987, 0.0361),
 (0, 1, 114.859, 4, 0.656, 0.639, 0.612, -6.742, 0.397, 0, 0.0989, 0.0347),
 (7,
  1,
  91.703,
  4,
  0.731,
  0.463,
  0.889,
  -6.378,
  0.025,
  0.000461,
  0.673,
  0.0497),
 (9, 1, 123.02, 4, 0.604, 0.494, 0.915, -5.382, 0.52, 4.49e-06, 0.969, 0.0762),
 (9, 0, 93.975, 4, 0.43, 0.406, 0.605, -8.557, 0.882, 0, 0.964, 0.0589),
 (11, 0, 94.197, 4, 0.474, 0.432, 0.847, -5.588, 0.388, 0, 0.41, 0.0626),
 (9, 1, 178.077, 4, 0.653, 0.2, 0.
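
The first two numbers in each tuple are easier to read once decoded: Spotify reports `key` in pitch-class notation (0 = C, 1 = C#/Db, and so on, with -1 meaning that no key was detected) and `mode` as 1 for major and 0 for minor. A small sketch of that mapping follows; `describe_key()` is a hypothetical helper.

```python
PITCH_CLASSES = ['C', 'C#/Db', 'D', 'D#/Eb', 'E', 'F',
                 'F#/Gb', 'G', 'G#/Ab', 'A', 'A#/Bb', 'B']


def describe_key(key: int, mode: int) -> str:
    """Translate Spotify's numeric key and mode into a readable label."""
    if not 0 <= key < len(PITCH_CLASSES):
        return 'unknown'          # Spotify uses -1 when no key was detected
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


print(describe_key(8, 1))   # G#/Ab major
print(describe_key(9, 0))   # A minor
```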

### *get_album_tracks_df()*

In [214]:
def get_album_tracks_df(album_id: str, env_file_path: str) -> pd.DataFrame:
    COLUMNS = [
        'URI',
        'Title',
        'Album',
        'Popularity',
        'Duration',
        'Key',
        'Mode',
        'Tempo',
        'Time_signature',
        'Valence',
        'Danceability',
        'Energy',
        'Loudness',
        'Acousticness',
        'Instrumentalness',
        'Liveness',
        'Speechiness'
    ]

    tracks_data = get_album_tracks_data(album_id, env_file_path)
    tracks_audio_features = get_album_tracks_audio_features(album_id, env_file_path)
    tracks_data_and_audio_features = [(d + af) for d, af in zip(tracks_data, tracks_audio_features)]
    return pd.DataFrame(tracks_data_and_audio_features, columns=COLUMNS)
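
`zip()` stops at the shorter of its inputs, so if the two helper calls ever returned a different number of rows, the extra tracks would be dropped without warning. A minimal sketch of a stricter merge step is given below. `merge_track_rows()` is a hypothetical helper, and both inputs are assumed to be lists of tuples in the same track order.

```python
def merge_track_rows(tracks_data: list, tracks_audio_features: list) -> list:
    """Concatenate per-track tuples pairwise, refusing inputs of different lengths."""
    if len(tracks_data) != len(tracks_audio_features):
        raise ValueError(
            f'Row count mismatch: {len(tracks_data)} data rows vs '
            f'{len(tracks_audio_features)} audio-feature rows')
    return [data + features for data, features in zip(tracks_data, tracks_audio_features)]
```

Note also that `get_album_tracks_data()` and `get_album_tracks_audio_features()` each call `get_all_tracks_from_album()`, so the album's tracks are fetched twice per DataFrame. Fetching them once and passing the list to both steps would be a possible refinement.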

In [215]:
tracks_df = get_album_tracks_df(ANTHOLOGY_2, 'env/.env')
display(tracks_df)

| | URI | Title | Album | Popularity | Duration | Key | Mode | Tempo | Time_signature | Valence | Danceability | Energy | Loudness | Acousticness | Instrumentalness | Liveness | Speechiness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | spotify:track:5frMgt4jqRGJk3yMKfqOyl | Real Love - Anthology 2 Version | Anthology 2 | 51 | 234 | 8 | 1 | 175.726 | 4 | 0.405 | 0.375 | 0.694 | -7.334 | 0.0458 | 0.019 | 0.257 | 0.031 |
| 1 | spotify:track:4QejjzjGI7oysGZ9tDIAqF | Yes It Is - Anthology 2 Version | Anthology 2 | 32 | 110 | 9 | 1 | 69.1 | 4 | 0.376 | 0.57 | 0.302 | -8.818 | 0.802 | 0.0 | 0.15 | 0.0251 |
| 2 | spotify:track:7DHWYI1t2GDsNszCVy3VUk | I'm Down - Take 1 / Anthology 2 Version | Anthology 2 | 34 | 173 | 7 | 1 | 160.531 | 4 | 0.688 | 0.473 | 0.795 | -6.991 | 0.631 | 8e-06 | 0.388 | 0.139 |
| 3 | spotify:track:0uEW2HfyPuREnnMzFxmdbH | You've Got To Hide Your Love Away - Take 5 / A... | Anthology 2 | 33 | 165 | 7 | 1 | 86.285 | 4 | 0.619 | 0.587 | 0.275 | -10.634 | 0.324 | 0.0 | 0.218 | 0.0431 |
| 4 | spotify:track:4H01EkzG7KtRdcz5JP6xFa | If You've Got Trouble - Anthology 2 Version | Anthology 2 | 29 | 168 | 9 | 1 | 144.794 | 4 | 0.965 | 0.301 | 0.842 | -6.261 | 0.16 | 0.0 | 0.113 | 0.0409 |
| 5 | spotify:track:5845nYfGBbBomqEnf7pC7Q | That Means A Lot - Anthology 2 Version | Anthology 2 | 32 | 146 | 9 | 1 | 124.225 | 4 | 0.523 | 0.433 | 0.813 | -7.096 | 0.00446 | 0.00015 | 0.0546 | 0.0432 |
| 6 | spotify:track:5pAtqxDnVQFsZrc3SviBHt | Yesterday - Anthology 2 Version | Anthology 2 | 35 | 154 | 10 | 1 | 95.383 | 4 | 0.325 | 0.643 | 0.118 | -13.443 | 0.86 | 0.0 | 0.0987 | 0.0361 |
| 7 | spotify:track:2lC0VVD7xHu1IOIOVrLrj5 | It's Only Love - Anthology 2 Version | Anthology 2 | 32 | 118 | 0 | 1 | 114.859 | 4 | 0.656 | 0.639 | 0.612 | -6.742 | 0.397 | 0.0 | 0.0989 | 0.0347 |
| 8 | spotify:track:0xgduXfu5QpXzhLcS1B3b6 | I Feel Fine - Live From The ABC Theatre, Black... | Anthology 2 | 29 | 136 | 7 | 1 | 91.703 | 4 | 0.731 | 0.463 | 0.889 | -6.378 | 0.025 | 0.000461 | 0.673 | 0.0497 |
| 9 | spotify:track:1ORS4W9bT6v8v3Yy45KsDY | Ticket To Ride - Live From The ABC Theatre, Bl... | Anthology 2 | 28 | 165 | 9 | 1 | 123.02 | 4 | 0.604 | 0.494 | 0.915 | -5.382 | 0.52 | 4e-06 | 0.969 | 0.0762 |
