# Album sequencing

This project visualizes the popularity of songs based on where they occur within an album. Plenty of articles offer advice on how to sequence an album, but this is an attempt to see whether any overarching trends are visible across many of music's best and most popular albums.

My main hypothesis is that artists typically put their strongest songs in the first half and, in particular, put what they believe to be their most popular song in the third or fourth spot.

Beyond that, I am also curious whether sequencing differs along a few dimensions: the genre of the album, when the album was released, and where the singles appear.

This notebook pulls the data used for the analysis; you can find the actual write-up/analysis [here](#).

## Determine + pull the "top albums" to consider for analysis

For this, I will be using existing "top albums" lists to put together my universe of albums to analyze.

### [#1. Rolling Stone's 500 Greatest Albums of All Time](https://en.wikipedia.org/wiki/Rolling_Stone's_500_Greatest_Albums_of_All_Time)

I downloaded this list of albums from [this Kaggle dataset](https://www.kaggle.com/notgibs/500-greatest-albums-of-all-time-rolling-stone). 

In [64]:
import pandas as pd
rolling_stone_raw = pd.read_csv("../data/00_raw_album_lists/rolling_stones_top_500_albumlist.csv", 
                                encoding = "unicode_escape")[["Artist", "Album", "Year"]]
rolling_stone_raw["Source"] = "Rolling Stone 500 Greatest"
rolling_stone_raw.columns = [e.lower() for e in rolling_stone_raw.columns]
rolling_stone_raw.head()

Unnamed: 0,artist,album,year,source
0,The Beatles,Sgt. Pepper's Lonely Hearts Club Band,1967,Rolling Stone 500 Greatest
1,The Beach Boys,Pet Sounds,1966,Rolling Stone 500 Greatest
2,The Beatles,Revolver,1966,Rolling Stone 500 Greatest
3,Bob Dylan,Highway 61 Revisited,1965,Rolling Stone 500 Greatest
4,The Beatles,Rubber Soul,1965,Rolling Stone 500 Greatest


### [#2. Billboard's 200 Greatest Albums](https://www.billboard.com/charts/greatest-billboard-200-albums)

I'll scrape the album information for this list using BeautifulSoup. The list is technically behind a paywall, but you can get the underlying data when scraping :)

In [69]:
import requests
import bs4 as BeautifulSoup

# request the page
res = requests.get("https://www.billboard.com/charts/greatest-billboard-200-albums")
soup = BeautifulSoup.BeautifulSoup(res.text, "lxml")

# parse out the titles, and artists
titles_html = soup.find_all("span", {"class": "chart-list-item__title-text"})
artists_html = soup.find_all("div", {"class": "chart-list-item__artist"})  
titles = [title.text.lstrip().replace("\n", "") for title in titles_html]
artists = [artist.text.lstrip().replace("\n", "") for artist in artists_html]

# throw it all into a df
bbg200_raw = pd.DataFrame({"artist": artists, "album": titles, "source": "Billboard 200 Greatest"})
bbg200_raw.head()

Unnamed: 0,artist,album,source
0,Adele,21,Billboard 200 Greatest
1,Soundtrack,The Sound Of Music,Billboard 200 Greatest
2,Michael Jackson,Thriller,Billboard 200 Greatest
3,Taylor Swift,Fearless,Billboard 200 Greatest
4,Bruce Springsteen,Born In The U.S.A.,Billboard 200 Greatest


### [#3. Billboard's Year-End Album Charts for 2002 - 2020](https://www.billboard.com/charts/year-end/2020/top-billboard-200-albums)

This is Billboard's top 200 albums for each year. Between these 19 year-lists (200 albums each), plus the 200 albums from the "200 Greatest List," we should end up with around 4,000 albums (though including some duplicates) from Billboard.
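
As a quick sanity check on that estimate, the arithmetic can be spelled out. This is only an upper bound before de-duplication, and it assumes every Billboard page really yields 200 rows (the variable names are illustrative, not from the notebook):

```python
# Upper bound on Billboard albums before de-duplication.
year_end_years = range(2002, 2021)      # 2002 through 2020 inclusive
albums_per_list = 200                   # assumed size of each Billboard list

n_year_end = len(year_end_years) * albums_per_list   # 19 lists * 200 albums
n_greatest = albums_per_list                          # the "200 Greatest" list
billboard_upper_bound = n_year_end + n_greatest

print(len(year_end_years), billboard_upper_bound)     # prints: 19 4000
```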

In [71]:
bbye_dfs = [ ]

# for each year...
for year in range(2002, 2021):
    
    # request the page
    src = "Billboard Year-End %d" % year
    url = "https://www.billboard.com/charts/year-end/%d/top-billboard-200-albums" % year
    print("%s -- %s" % (src, url))
    res = requests.get(url)
    soup = BeautifulSoup.BeautifulSoup(res.text, "lxml")
    
    # parse out the titles, and artists
    titles_html = soup.find_all("div", {"class": "ye-chart-item__title"})
    artists_html = soup.find_all("div", {"class": "ye-chart-item__artist"})
    titles = [title.text.lstrip().replace("\n", "") for title in titles_html]
    artists = [artist.text.lstrip().replace("\n", "") for artist in artists_html]
    
    # throw it all into a df
    bbye_dfs.append(pd.DataFrame({"artist": artists, "album": titles, "source": src}))
    
# combine all yearly lists into one
bbye_raw = pd.concat(bbye_dfs)
bbye_raw.head()

Billboard Year-End 2002 -- https://www.billboard.com/charts/year-end/2002/top-billboard-200-albums
Billboard Year-End 2003 -- https://www.billboard.com/charts/year-end/2003/top-billboard-200-albums
Billboard Year-End 2004 -- https://www.billboard.com/charts/year-end/2004/top-billboard-200-albums
Billboard Year-End 2005 -- https://www.billboard.com/charts/year-end/2005/top-billboard-200-albums
Billboard Year-End 2006 -- https://www.billboard.com/charts/year-end/2006/top-billboard-200-albums
Billboard Year-End 2007 -- https://www.billboard.com/charts/year-end/2007/top-billboard-200-albums
Billboard Year-End 2008 -- https://www.billboard.com/charts/year-end/2008/top-billboard-200-albums
Billboard Year-End 2009 -- https://www.billboard.com/charts/year-end/2009/top-billboard-200-albums
Billboard Year-End 2010 -- https://www.billboard.com/charts/year-end/2010/top-billboard-200-albums
Billboard Year-End 2011 -- https://www.billboard.com/charts/year-end/2011/top-billboard-200-albums
Billboard 

Unnamed: 0,artist,album,source
0,Eminem,The Eminem Show,Billboard Year-End 2002
1,Creed,Weathered,Billboard Year-End 2002
2,Nelly,Nellyville,Billboard Year-End 2002
3,P!nk,M!ssundaztood,Billboard Year-End 2002
4,Linkin Park,[Hybrid Theory],Billboard Year-End 2002
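
The loop above makes 19 requests back to back and assumes every one succeeds. A minimal sketch of a retry wrapper follows; `fetch_with_retry` and its parameters are hypothetical helpers rather than part of the notebook, and the demo uses a stand-in fetcher instead of a real network call.

```python
import time

def fetch_with_retry(fetch, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on any exception up to `retries` attempts.

    `fetch` is any callable that returns the page text or raises on failure.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as err:  # deliberately broad for this sketch
            last_error = err
            if attempt < retries:
                sleep(delay * attempt)  # linear back-off between attempts
    raise RuntimeError("giving up on %s after %d attempts" % (url, retries)) from last_error


# Demo with a stand-in fetcher that fails twice and then succeeds.
calls = {"n": 0}

def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"

page = fetch_with_retry(flaky_fetch, "https://example.com/chart", sleep=lambda s: None)
```

In the scraping loop, `fetch` could be a small function that calls `requests.get(url)`, then `res.raise_for_status()`, and returns `res.text`.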


### #4. Metacritic Year-End Best Albums for 2009 - 2019

In [73]:
import re

# define the metacritic urls to pull from
urls = ["https://www.metacritic.com/feature/best-albums-of-2009", 
        "https://www.metacritic.com/feature/best-music-of-2010", 
        "https://www.metacritic.com/feature/best-albums-of-2011", 
        "https://www.metacritic.com/feature/best-albums-of-2012", 
        "https://www.metacritic.com/feature/best-albums-of-2013", 
        "https://www.metacritic.com/feature/best-albums-of-2014", 
        "https://www.metacritic.com/feature/best-albums-of-2015", 
        "https://www.metacritic.com/feature/best-albums-of-2016", 
        "https://www.metacritic.com/feature/best-albums-released-in-2017",
        "https://www.metacritic.com/feature/best-albums-released-in-2018",
        "https://www.metacritic.com/feature/best-albums-released-in-2019"]

# define src names for each source
srcs = ["Metacritic Year-End %d" % year for year in range(2009, 2020)]

# create empty list to store each list's df
mcye_dfs = [ ]

# for each list, parse out each item's title/artist and store in df
for i in range(len(urls)):
    url = urls[i]
    src = srcs[i]    
    print("%s -- %s" % (src, url))
        
    res = requests.get(url, headers = {'User-Agent': 'Chrome/32.0.1700.76 m'})
    soup = BeautifulSoup.BeautifulSoup(res.text, "lxml")
    
    if src > "Metacritic Year-End 2010":
        # 2011 onward: entries come from the h3.special headings plus the listtable rows
        titles_html = soup.select("h3.special > a")
        artists_html = soup.select("h3.special > strong")
        
        td_ix = 3 if src > "Metacritic Year-End 2014" else 2
        
        titles_html = titles_html + soup.find("table", {"class": "listtable"}).select("tr > td:nth-of-type(%d) > a" % td_ix)
        artists_html = artists_html + soup.find("table", {"class": "listtable"}).select("tr > td:nth-of-type(%d) > strong" % td_ix)
        
        titles = [title.text.lstrip().replace("\n", "") for title in titles_html]
        artists = [artist.text.lstrip().replace("\n", "") for artist in artists_html]
    else:
        titles_html = soup.select("td.title > a")
        artists_html = soup.select("td.title")
        
        titles = [title.text.lstrip().replace("\n", "") for title in titles_html]
        artists = [re.sub(r".*(\n +)?by ", "", text.text).lstrip().replace("\n", "") for text in artists_html]
    
    # confirm each list has 40 entries (2009 is the exception) and that titles/artists line up
    assert (40 == len(titles) or src == "Metacritic Year-End 2009") and len(titles) == len(artists)
    
    mcye_dfs.append(pd.DataFrame({"artist": artists, "album": titles, "source": src}))

# concat all lists together
mcye_raw = pd.concat(mcye_dfs)
mcye_raw.head()

Metacritic Year-End 2009 -- https://www.metacritic.com/feature/best-albums-of-2009
Metacritic Year-End 2010 -- https://www.metacritic.com/feature/best-music-of-2010
Metacritic Year-End 2011 -- https://www.metacritic.com/feature/best-albums-of-2011
Metacritic Year-End 2012 -- https://www.metacritic.com/feature/best-albums-of-2012
Metacritic Year-End 2013 -- https://www.metacritic.com/feature/best-albums-of-2013
Metacritic Year-End 2014 -- https://www.metacritic.com/feature/best-albums-of-2014
Metacritic Year-End 2015 -- https://www.metacritic.com/feature/best-albums-of-2015
Metacritic Year-End 2016 -- https://www.metacritic.com/feature/best-albums-of-2016
Metacritic Year-End 2017 -- https://www.metacritic.com/feature/best-albums-released-in-2017
Metacritic Year-End 2018 -- https://www.metacritic.com/feature/best-albums-released-in-2018
Metacritic Year-End 2019 -- https://www.metacritic.com/feature/best-albums-released-in-2019


Unnamed: 0,artist,album,source
0,Animal Collective,Merriweather Post Pavilion,Metacritic Year-End 2009
1,Raekwon,Only Built 4 Cuban Linx... Pt. II,Metacritic Year-End 2009
2,Baroness,Blue Record,Metacritic Year-End 2009
3,Sunn O))),Monoliths & Dimensions,Metacritic Year-End 2009
4,Brad Paisley,American Saturday Night,Metacritic Year-End 2009
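
The cell above chooses a parsing branch by comparing `src` strings, which works because every name shares the same prefix and a four-digit year. The same decision can be sketched on the integer year instead. `metacritic_layout` and its return labels are hypothetical; the year boundaries are read directly off the two comparisons in the cell.

```python
def metacritic_layout(year):
    """Return (layout, table_column) for a Metacritic year-end page.

    Mirrors the branches in the scraping cell:
      - up to 2010: titles and artists live in `td.title` cells
      - 2011-2014:  `h3.special` entries plus column 2 of the `listtable`
      - 2015 on:    `h3.special` entries plus column 3 of the `listtable`
    """
    if year <= 2010:
        return ("td_title", None)
    return ("h3_plus_table", 3 if year >= 2015 else 2)


# Layout chosen for each year covered by the notebook.
layouts = {year: metacritic_layout(year) for year in range(2009, 2020)}
```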


### [#5. Top 200 Albums on iTunes (05/31/2021)](http://www.popvortex.com/music/charts/itunes-top-400-albums.php)

I downloaded this list of albums manually from PopVortex.

In [106]:
itunes_raw = pd.read_csv("../data/00_raw_album_lists/popvortex_top_200_itunes_albumlist.csv")[["Artist", "Title"]]
itunes_raw["Source"] = "iTunes Top 200, 05/31/2021"
itunes_raw.columns = [e.lower() for e in itunes_raw.columns]
itunes_raw = itunes_raw.rename(columns = {"title": "album"})
itunes_raw.head()

Unnamed: 0,artist,album,source
0,DMX,Exodus,"iTunes Top 200, 05/31/2021"
1,Olivia Rodrigo,SOUR,"iTunes Top 200, 05/31/2021"
2,TOMORROW X TOGETHER,The Chaos Chapter : FREEZE,"iTunes Top 200, 05/31/2021"
3,Various Artists,Cruella (Original Motion Picture Soundtrack),"iTunes Top 200, 05/31/2021"
4,JOY,Hello - Special Album - EP,"iTunes Top 200, 05/31/2021"


### #6. Concept Albums

Since concept albums are largely structured around a narrative that does not necessarily follow a popularity-based sequence, I was curious to see whether notable concept albums do indeed differ in structure from other albums. For this, I referenced a few lists ([A](https://www.pastemagazine.com/music/the-18-best-concept-albums-of-the-21st-century-so-far/), [B](https://www.udiscovermusic.com/stories/best-concept-albums/), [C](https://www.nme.com/photos/23-of-the-maddest-and-most-memorable-concept-albums-1425767)) and pulled the albums manually.

In [105]:
concept_raw = pd.read_csv("../data/00_raw_album_lists/concept_albumlist.csv")
concept_raw.head()

Unnamed: 0,artist,album,year,source
0,Alice Cooper,School's Out,1972,Concept Albums
1,Arcade Fire,The Suburbs,2010,Concept Albums
2,Bon Iver,"For Emma, Forever Ago",2007,Concept Albums
3,David Bowie,The Rise and Fall of Ziggy Stardust And the Sp...,1972,Concept Albums
4,Drive-By Truckers,Southern Rock Opera,2001,Concept Albums


---

In [100]:
# aggregate only `source`: the numeric `year` column cannot be string-joined
combined_albumlist = pd.concat([rolling_stone_raw, bbg200_raw, 
                                bbye_raw, mcye_raw, itunes_raw, 
                                concept_raw]).groupby(["artist", "album"])[["source"]].agg(lambda col: ', '.join(col))
combined_albumlist = combined_albumlist.reset_index()
print("%d total albums" % len(combined_albumlist))
combined_albumlist.head()

3814 total albums


Unnamed: 0,artist,album,source
0,"""Weird Al"" Yankovic",Mandatory Fun,Billboard Year-End 2014
1,'N Sync,'N Sync,Billboard 200 Greatest
2,'N Sync,Celebrity,Billboard Year-End 2002
3,'N Sync,No Strings Attached,Billboard 200 Greatest
4,(Sandy) Alex G,House of Sugar,Metacritic Year-End 2019
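
Grouping on the raw `artist`/`album` strings treats spellings that differ only in spacing or capitalization as different albums; the search log further down shows pairs such as `David Bowie` vs. `David Bowie ` (trailing space) and `Best Of Bowie` vs. `Best of Bowie`. A minimal sketch of a normalized grouping key follows. `normalize_key` and `merge_sources` are hypothetical helpers, not part of the notebook, and applying something like this would change the 3,814 total reported above.

```python
def normalize_key(text):
    """Collapse runs of whitespace, trim the ends, and ignore case."""
    return " ".join(text.split()).casefold()


def merge_sources(rows):
    """Merge (artist, album, source) rows whose normalized keys match.

    Keeps the first spelling seen for display and joins sources with ", ".
    """
    merged = {}
    for artist, album, source in rows:
        key = (normalize_key(artist), normalize_key(album))
        if key not in merged:
            merged[key] = {"artist": artist.strip(), "album": album.strip(), "sources": []}
        if source not in merged[key]["sources"]:
            merged[key]["sources"].append(source)
    return [
        (entry["artist"], entry["album"], ", ".join(entry["sources"]))
        for entry in merged.values()
    ]


# Spellings taken from the log output; the source labels are made up for the demo.
rows = [
    ("David Bowie", "Best Of Bowie", "List A"),
    ("David Bowie", "Best of Bowie", "List B"),
    ("David Bowie ", "Blackstar", "List C"),
    ("David Bowie", "Blackstar", "List A"),
]
merged_rows = merge_sources(rows)
```

With pandas, the same idea could be applied by adding normalized key columns (for example `combined["artist"].map(normalize_key)`) and grouping on those instead of the raw strings.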


In [156]:
from datetime import date
combined_albumlist.to_csv("../data/01_combined_album_list_%s.csv" % 
                          date.today().strftime("%Y%m%d"), 
                          index = False)

In [157]:
combined_albumlist = pd.read_csv("../data/01_combined_album_list_%s.csv" % date.today().strftime("%Y%m%d"))
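
Because both cells build the filename from `date.today()`, the read only works on the same day the file was written. One way around that, sketched below, is to pick the most recent dated file. `latest_album_list` is a hypothetical helper, and it relies on the zero-padded `YYYYMMDD` suffix sorting in date order.

```python
from pathlib import Path
import tempfile

def latest_album_list(data_dir, prefix="01_combined_album_list_"):
    """Return the newest `<prefix>YYYYMMDD.csv` in data_dir, or None if absent."""
    candidates = sorted(Path(data_dir).glob(prefix + "*.csv"))
    # zero-padded YYYYMMDD stamps sort chronologically as plain strings
    return candidates[-1] if candidates else None


# Demo in a throwaway directory with two dated (empty) files.
with tempfile.TemporaryDirectory() as tmp:
    for stamp in ("20210531", "20210602"):
        (Path(tmp) / ("01_combined_album_list_%s.csv" % stamp)).touch()
    newest_name = latest_album_list(tmp).name
    missing = latest_album_list(tmp, prefix="no_such_prefix_")
```

In the notebook this would be used as `pd.read_csv(latest_album_list("../data"))`.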

Across these six sources, we end up with 3,814 unique artist/album combinations to analyze.

---

### Search for albums on Spotify + download track data for each

In [181]:
import os, json, spotipy
from json.decoder import JSONDecodeError
import spotipy.util as util

# load in api keys
api_keys = json.load(open("../data/api-keys.json"))

# init API for Spotify
os.environ["SPOTIPY_CLIENT_ID"]     = api_keys["spotipy_client_id"]
os.environ["SPOTIPY_CLIENT_SECRET"] = api_keys["spotipy_client_secret"]
os.environ["SPOTIPY_REDIRECT_URI"]  = api_keys["redirect_url"]
user_id = '129874447'

try:
    token = util.prompt_for_user_token(username = user_id)
except (AttributeError, JSONDecodeError):
    os.remove(f".cache-{user_id}")
    token = util.prompt_for_user_token(username = user_id)
sp = spotipy.Spotify(auth = token)
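
The auth cell assumes that `api-keys.json` exists and contains three specific keys. A small sketch of a loader that closes the file and fails with a clear message when a key is missing is shown below. `load_api_keys` is hypothetical; the key names are the ones the cell reads, and the values in the demo are placeholders.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("spotipy_client_id", "spotipy_client_secret", "redirect_url")

def load_api_keys(path):
    """Load the key file and check that every required key is present."""
    with open(path) as fh:          # closes the file handle when done
        keys = json.load(fh)
    missing = [k for k in REQUIRED_KEYS if k not in keys]
    if missing:
        raise KeyError("missing keys in %s: %s" % (path, ", ".join(missing)))
    return keys


# Demo with placeholder values written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    key_path = os.path.join(tmp, "api-keys.json")
    with open(key_path, "w") as fh:
        json.dump({"spotipy_client_id": "id",
                   "spotipy_client_secret": "secret",
                   "redirect_url": "http://localhost:8888/callback"}, fh)
    demo_keys = load_api_keys(key_path)
```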

In [117]:
row = combined_albumlist.loc[0]
row

artist        "Weird Al" Yankovic
album               Mandatory Fun
source    Billboard Year-End 2014
Name: 0, dtype: object

In [124]:
combined_albumlist[1:10]

Unnamed: 0,artist,album,source
1,'N Sync,'N Sync,Billboard 200 Greatest
2,'N Sync,Celebrity,Billboard Year-End 2002
3,'N Sync,No Strings Attached,Billboard 200 Greatest
4,(Sandy) Alex G,House of Sugar,Metacritic Year-End 2019
5,112,Pleasure & Pain,Billboard Year-End 2005
6,2 Chainz,B.O.A.T.S. II #METIME,Billboard Year-End 2013
7,2 Chainz,Based On A T.R.U. Story,"Billboard Year-End 2012, Billboard Year-End 2013"
8,2 Chainz,ColleGrove,Billboard Year-End 2016
9,2 Chainz,Pretty Girls Like Trap Music,"Billboard Year-End 2017, Billboard Year-End 2018"


In [137]:
# for each album, search for name + artist, put results into DF
albums = [ ]
for ix, row in combined_albumlist.iterrows():    
    q_artist = row['artist']
    q_album = row['album']    
    print("%d/%d: %s -- %s" % (ix + 1, len(combined_albumlist), q_artist, q_album))

    res = sp.search(q = "artist:%s album:%s" % (q_artist, q_album), 
                    type = "album", limit = 50)['albums']
    res_albums = res['items']
    ### could consider looping through all results, BUT the match should come up in the top 50
    # while res['next']:
    #     res = sp.next(res)['albums']
    #     res_albums.extend(res['items'])
        
    albums.extend([{'query_artist': q_artist,
                    'query_title': q_album,
                    'id': a['id'],
                    'name': a['name'],
                    'artist': '|'.join([artist['name'] for artist in a['artists']]),
                    'artist_id': '|'.join([artist['id'] for artist in a['artists']]),
                    'n_artists': len(a['artists']),
                    'type': a['type'],
                    'release_date': a['release_date'], 
                    'total_tracks': a['total_tracks']} for a in res_albums])
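
# Each query can return up to 50 candidate albums, so a later step has to decide
# which candidate (if any) is the album that was asked for. One possible approach,
# sketched here with the standard library's difflib, scores each candidate against
# its query fields. `best_match`, the 0.6 cutoff, and the equal weighting of artist
# and title are assumptions for illustration, not the notebook's method. The
# dictionaries use the same keys as the `albums` records built above.

```python
from difflib import SequenceMatcher

def _similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def best_match(candidates, cutoff=0.6):
    """Pick the candidate whose name/artist is closest to its own query fields.

    Returns None when no candidate scores at least `cutoff`.
    """
    best, best_score = None, cutoff
    for cand in candidates:
        score = (0.5 * _similarity(cand["query_title"], cand["name"])
                 + 0.5 * _similarity(cand["query_artist"], cand["artist"]))
        if score >= best_score:
            best, best_score = cand, score
    return best


# Demo records shaped like the `albums` entries; the ids and the tribute entry are made up.
candidates = [
    {"query_artist": "Pink Floyd", "query_title": "The Dark Side Of The Moon",
     "id": "a1", "name": "The Dark Side of the Moon", "artist": "Pink Floyd"},
    {"query_artist": "Pink Floyd", "query_title": "The Dark Side Of The Moon",
     "id": "a2", "name": "A Tribute to The Dark Side of the Moon", "artist": "Various Artists"},
]
chosen = best_match(candidates)
```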

1/3814: "Weird Al" Yankovic -- Mandatory Fun
2/3814: 'N Sync -- 'N Sync
3/3814: 'N Sync -- Celebrity
4/3814: 'N Sync -- No Strings Attached
5/3814: (Sandy) Alex G -- House of Sugar
6/3814: 112 -- Pleasure & Pain
7/3814: 2 Chainz -- B.O.A.T.S. II #METIME
8/3814: 2 Chainz -- Based On A T.R.U. Story
9/3814: 2 Chainz -- ColleGrove
10/3814: 2 Chainz -- Pretty Girls Like Trap Music
11/3814: 21 Savage -- I Am > I Was
12/3814: 21 Savage -- Issa Album
13/3814: 21 Savage & Metro Boomin -- Savage Mode
14/3814: 21 Savage & Metro Boomin -- Savage Mode II
16/3814: 2Pac -- Better Dayz
17/3814: 2Pac -- Greatest Hits
18/3814: 2Pac -- Loyal To The Game
19/3814: 2Pac -- Pac's Life
20/3814: 3 Doors Down -- 3 Doors Down
21/3814: 3 Doors Down -- Another 700 Miles (EP)
22/3814: 3 Doors Down -- Away From The Sun
23/3814: 3 Doors Down -- Seventeen Days
24/3814: 3OH!3 -- Want
25/3814: 5 Seconds Of Summer -- 5 Seconds Of Summer
26/3814: 5 Seconds Of Summer -- C A L M
27/3814: 5 Seconds Of Summer -- LIVESOS
28/38

197/3814: Ariana Grande -- Yours Truly
198/3814: Ariana Grande -- thank u, next
199/3814: Ariel Pink's Haunted Graffiti -- Before Today
200/3814: Ashanti -- Ashanti
201/3814: Ashanti -- Chapter II
202/3814: Ashanti -- Concrete Rose
203/3814: Ashanti -- The Declaration
204/3814: Ashlee Simpson -- Autobiography
205/3814: Ashlee Simpson -- I Am Me
206/3814: Ashley Tisdale -- Headstrong
207/3814: Asia -- Asia
208/3814: Atlas Sound -- Logos 
209/3814: Atlas Sound -- Parallax
210/3814: Audioslave -- Audioslave
211/3814: Audioslave -- Out Of Exile
212/3814: Audioslave -- Revelations
213/3814: August Alsina -- Testimony
214/3814: Ava Max -- Heaven & Hell
215/3814: Avant -- Director
216/3814: Avant -- Ecstasy
217/3814: Avant -- Private Room
218/3814: Avenged Sevenfold -- Avenged Sevenfold
219/3814: Avenged Sevenfold -- City Of Evil
220/3814: Avenged Sevenfold -- Hail To The King
221/3814: Avenged Sevenfold -- Nightmare
222/3814: Aventura -- The Last
223/3814: Avicii -- True
224/3814: Avril Lavi

389/3814: Bob Dylan -- Tempest
390/3814: Bob Dylan -- The Freewheelin' Bob Dylan
391/3814: Bob Dylan -- Time Out of Mind
392/3814: Bob Dylan -- Together Through Life
393/3814: Bob Dylan  -- Shadows in the Night
394/3814: Bob Dylan  -- Tempest
395/3814: Bob Dylan and the Band -- The Basement Tapes
396/3814: Bob Marley & The Wailers -- Exodus
397/3814: Bob Marley & The Wailers -- Kaya (Remastered)
398/3814: Bob Marley & The Wailers -- Legend (Deluxe Edition)
399/3814: Bob Marley & The Wailers -- Legend: The Best of Bob Marley and The Wailers
400/3814: Bob Marley & The Wailers -- Legend: The Best of Bob Marley and the Wailers (Remastered)
401/3814: Bob Marley & The Wailers -- Natty Dread
402/3814: Bob Marley & The Wailers -- Uprising (Remastered) [Bonus Track Version]
403/3814: Bob Marley And The Wailers -- Legend: The Best Of Bob Marley And The Wailers
404/3814: Bob Mould -- Patch the    Sky
405/3814: Bob Seger -- Face The Promise
406/3814: Bob Seger -- Ride Out
407/3814: Bob Seger & The

573/3814: Chance the Rapper  -- Coloring Book
574/3814: Charlie Puth -- Nine Track Mind
575/3814: Charlie Puth -- Voicenotes
576/3814: Charlie Wilson -- Just Charlie
577/3814: Charlie Wilson -- Love, Charlie
578/3814: Charlie Wilson -- Uncle Charlie
579/3814: Charlotte Church -- Enchantment
580/3814: Chase Rice -- Ignite The Night
581/3814: Chase Rice -- The Album
582/3814: Cheap Trick -- Cheap Trick at Budokan
583/3814: Cheap Trick -- In Color
584/3814: Cher -- Closer To The Truth
585/3814: Cher -- Dancing Queen
586/3814: Cher -- Living Proof
587/3814: Cher -- The Very Best Of Cher
588/3814: Cherish -- Unappreciated
589/3814: Chevelle -- Hats Off To The Bull
590/3814: Chevelle -- La Gargola
591/3814: Chevelle -- Wonder What's Next
592/3814: Chicago -- Chicago II
593/3814: Chicago -- Chicago V
594/3814: Chickenfoot -- Chickenfoot
595/3814: Chief Keef -- Finally Rich
596/3814: Childish Gambino -- "Awaken, My Love!"
597/3814: Childish Gambino -- Because The Internet
598/3814: Childish Ga

764/3814: David Archuleta -- David Archuleta
765/3814: David Banner -- Mississippi: The Album
766/3814: David Bowie -- Aladdin Sane
767/3814: David Bowie -- Best Of Bowie
768/3814: David Bowie -- Best of Bowie
769/3814: David Bowie -- Blackstar
770/3814: David Bowie -- Hunky Dory
771/3814: David Bowie -- Low
772/3814: David Bowie -- Station to Station
773/3814: David Bowie -- The Next Day
774/3814: David Bowie -- The Rise and Fall of Ziggy Stardust And the Spiders from Mars
775/3814: David Bowie -- The Rise and Fall of Ziggy Stardust and the Spiders From Mars
776/3814: David Bowie  -- Blackstar
777/3814: David Cook -- David Cook
778/3814: David Foster -- Hit Man: David Foster & Friends
779/3814: David Guetta -- Listen
780/3814: David Guetta -- Nothing But The Beat
781/3814: Daya -- Daya (EP)
782/3814: De La Soul -- 3 Feet High and Rising
783/3814: Deafheaven -- Ordinary Corrupt Human Love
784/3814: Deafheaven -- Sunbather
785/3814: Deafheaven  -- New Bermuda
786/3814: Dean Martin -- Di

956/3814: Eminem -- The Marshall Mathers LP 2
957/3814: Eminem -- The Slim Shady LP
958/3814: Enrique Iglesias -- 95/08
959/3814: Enrique Iglesias -- Escape
960/3814: Enrique Iglesias -- Euphoria
961/3814: Enrique Iglesias -- Sex And Love
962/3814: Enya -- A Day Without Rain
963/3814: Enya -- Amarantine
964/3814: Enya -- And Winter Came...
965/3814: Enya -- Dark Sky Island
966/3814: Eric B. & Rakim -- Paid in Full
967/3814: Eric Church -- Carolina
968/3814: Eric Church -- Caught In The Act: Live
969/3814: Eric Church -- Chief
970/3814: Eric Church -- Heart
971/3814: Eric Church -- Mr. Misunderstood
972/3814: Eric Church -- Soul
973/3814: Eric Church -- The Outsiders
974/3814: Eric Clapton -- 461 Ocean Boulevard
975/3814: Eric Clapton -- Me And Mr Johnson
976/3814: Eric Clapton -- Slowhand
977/3814: Eric Clapton -- The Best Of Eric Clapton: 20th Century Masters The Millennium Collection
978/3814: Eric Clapton -- Unplugged
979/3814: Eric Clapton & Friends -- The Breeze - An Appreciation 

1142/3814: George Michael -- Faith
1143/3814: George Strait -- 22 More Hits
1144/3814: George Strait -- 50 Number Ones
1145/3814: George Strait -- Cold Beer Conversation
1146/3814: George Strait -- For The Last Time: Live From The Astrodome
1147/3814: George Strait -- Here For A Good Time
1148/3814: George Strait -- Honkytonkville
1149/3814: George Strait -- It Just Comes Natural
1150/3814: George Strait -- Love Is Everything
1151/3814: George Strait -- Somewhere Down In Texas
1152/3814: George Strait -- The Cowboy Rides Away: Live From AT&T Stadium
1153/3814: George Strait -- The Road Less Traveled
1154/3814: George Strait -- Troubadour
1155/3814: George Strait -- Twang
1156/3814: Gerald Levert -- In My Songs
1157/3814: Ghostface Killah  -- Apollo Kids
1158/3814: Gil Scott-Heron and Jamie xx -- We're New Here
1159/3814: Gillian Welch -- The  Harrow and the Harvest 
1160/3814: Ginuwine -- The Senior
1161/3814: Gnarls Barkley -- St. Elsewhere
1162/3814: Godsmack -- 1000hp
1163/3814: God

1331/3814: Jackie Evancho -- O Holy Night (EP)
1332/3814: Jackie Wilson -- Mr. Excitement!
1333/3814: Jackson Browne -- For Everyman
1334/3814: Jackson Browne -- Late for the Sky
1335/3814: Jackson Browne -- The Pretender
1336/3814: Jadakiss -- Kiss Of Death
1337/3814: Jadakiss -- The Last Kiss
1338/3814: Jagged Edge -- Hard
1339/3814: Jaheim -- Another Round
1340/3814: Jaheim -- Ghetto Classics
1341/3814: Jaheim -- Still Ghetto
1342/3814: Jaheim -- The Makings Of A Man
1343/3814: Jaheim -- [Ghetto Love]
1344/3814: Jake Owen -- Barefoot Blue Jean Night
1345/3814: Jake Owen -- Days Of Gold
1346/3814: James Arthur -- Back From The Edge
1347/3814: James Bay -- Chaos And The Calm
1348/3814: James Blackshaw -- The Glass Bead Game 
1349/3814: James Blunt -- Back To Bedlam
1350/3814: James Brown -- In the Jungle Groove
1351/3814: James Brown -- Live at the Apollo, 1962
1352/3814: James Brown -- Star Time
1353/3814: James Otto -- SUNSET MAN
1354/3814: James Taylor -- Before This World
1355/381

1516/3814: Johnny Cash -- American V: A Hundred Highways
1517/3814: Johnny Cash -- At Folsom Prison
1518/3814: Johnny Cash -- Out Among The Stars
1519/3814: Johnny Cash -- The Essential Johnny Cash
1520/3814: Johnny Cash -- The Legend Of Johnny Cash
1521/3814: Joji -- BALLADS 1
1522/3814: Jon Bellion -- The Human Condition
1523/3814: Jon Hopkins -- Immunity
1524/3814: Jon Hopkins -- Singularity
1525/3814: Jon Pardi -- California Sunrise
1526/3814: Jonas Brothers -- A Little Bit Longer
1527/3814: Jonas Brothers -- Happiness Begins
1528/3814: Jonas Brothers -- Jonas Brothers
1529/3814: Jonas Brothers -- Lines, Vines And Trying Times
1530/3814: Joni Mitchell -- Blue
1531/3814: Joni Mitchell -- Court and Spark
1532/3814: Jordan Davis -- Buy Dirt
1533/3814: Jordin Sparks -- Jordin Sparks
1534/3814: Josh Groban -- All That Echoes
1535/3814: Josh Groban -- Awake
1536/3814: Josh Groban -- Closer
1537/3814: Josh Groban -- Illuminations
1538/3814: Josh Groban -- Josh Groban
1539/3814: Josh Groba

1710/3814: Kids See Ghosts -- Kids See Ghosts
1711/3814: Kidz Bop Kids -- Kidz Bop 10
1712/3814: Kidz Bop Kids -- Kidz Bop 11
1713/3814: Kidz Bop Kids -- Kidz Bop 13
1714/3814: Kidz Bop Kids -- Kidz Bop 15
1715/3814: Kidz Bop Kids -- Kidz Bop 17
1716/3814: Kidz Bop Kids -- Kidz Bop 18
1717/3814: Kidz Bop Kids -- Kidz Bop 19
1718/3814: Kidz Bop Kids -- Kidz Bop 20
1719/3814: Kidz Bop Kids -- Kidz Bop 21
1720/3814: Kidz Bop Kids -- Kidz Bop 22
1721/3814: Kidz Bop Kids -- Kidz Bop 23
1722/3814: Kidz Bop Kids -- Kidz Bop 24
1723/3814: Kidz Bop Kids -- Kidz Bop 25
1724/3814: Kidz Bop Kids -- Kidz Bop 26
1725/3814: Kidz Bop Kids -- Kidz Bop 27
1726/3814: Kidz Bop Kids -- Kidz Bop 28
1727/3814: Kidz Bop Kids -- Kidz Bop 3
1728/3814: Kidz Bop Kids -- Kidz Bop 30
1729/3814: Kidz Bop Kids -- Kidz Bop 31
1730/3814: Kidz Bop Kids -- Kidz Bop 7
1731/3814: Kidz Bop Kids -- Kidz Bop 9
1732/3814: Kiiara -- Low Kii Savage (EP)
1733/3814: Killer Mike -- R.A.P. Music
1734/3814: Kings Of Leon -- Come Arou

1898/3814: Linkin Park -- [Reanimation]
1899/3814: Lionel Richie -- Can't Slow Down
1900/3814: Lionel Richie -- The Best Of Lionel Richie: 20th Century Masters The Millennium Collection
1901/3814: Lionel Richie -- The Definitive Collection
1902/3814: Lionel Richie -- Tuskegee
1903/3814: Lisa Marie Presley -- To Whom It May Concern
1904/3814: Little Big Town -- Pain Killer
1905/3814: Little Big Town -- The Breaker
1906/3814: Little Big Town -- The Road To Here
1907/3814: Little Big Town -- Tornado
1908/3814: Little Richard -- Here's Little Richard
1909/3814: Little Simz  -- Grey Area
1910/3814: Little Walter -- The Best of Little Walter
1911/3814: Liz Phair -- Exile in Guyville
1912/3814: Lizzo -- Cuz I Love You
1913/3814: Lloyd -- Street Love
1914/3814: Lloyd Banks -- The Hunger For More
1915/3814: Logic -- Bobby Tarantino
1916/3814: Logic -- Bobby Tarantino II
1917/3814: Logic -- Confessions Of A Dangerous Mind
1918/3814: Logic -- Everybody
1919/3814: Logic -- No Pressure
1920/3814: L

2081/3814: Megan Thee Stallion -- Suga
2082/3814: Meghan Trainor -- Thank You
2083/3814: Meghan Trainor -- Title
2084/3814: Meghan Trainor -- Title (EP)
2085/3814: Melanie Fiona -- The Bridge
2086/3814: Melanie Martinez -- Cry Baby
2087/3814: Memphis Bleek -- M.A.D.E.
2088/3814: Men At Work -- Business As Usual
2089/3814: MercyMe -- All That Is Within Me
2090/3814: MercyMe -- Almost There
2091/3814: MercyMe -- I Can Only Imagine: The Very Best Of MercyMe
2092/3814: MercyMe -- The Generous Mr. Lovewell 
2093/3814: MercyMe -- Welcome To The New
2094/3814: MercyMe -- inhale (exhale)
2095/3814: Merle Haggard -- Down Every Road
2096/3814: Metallica -- Death Magnetic
2097/3814: Metallica -- Hardwired...To Self-Destruct
2098/3814: Metallica -- Master Of Puppets
2099/3814: Metallica -- Master of Puppets
2100/3814: Metallica -- Metallica
2101/3814: Metallica -- Metallica 
2102/3814: Metallica -- Metallica ("The Black Album")
2103/3814: Metallica -- Ride the Lightning (Remastered)
2104/3814: Met

2275/3814: Neil Young With Crazy Horse -- Americana
2276/3814: Neko Case -- Hell-On
2277/3814: Neko Case -- The  Worse Things Get, the Harder I Fight, the Harder I Fight, the More I Love You
2278/3814: Nelly -- Country Grammar
2279/3814: Nelly -- Da Derrty Versions - The Reinvention
2280/3814: Nelly -- Nellyville
2281/3814: Nelly -- Suit
2282/3814: Nelly -- Sweat
2283/3814: Nelly -- Sweatsuit
2284/3814: Nelly Furtado -- Loose
2285/3814: Nelly Furtado -- Whoa, Nelly!
2286/3814: New Found Glory -- Catalyst
2287/3814: New Found Glory -- Sticks And Stones
2288/3814: New Kids On The Block -- Hangin' Tough
2289/3814: New Kids On The Block -- The Block
2290/3814: New Order -- Substance 1987
2291/3814: New York Dolls -- New York Dolls
2292/3814: Niall Horan -- Flicker
2293/3814: Nicholas Britell -- Cruella (Original Score)
2294/3814: Nick Cave & The Bad Seeds -- Skeleton Tree
2295/3814: Nick Cave & the Bad Seeds  -- Ghosteen
2296/3814: Nick Drake -- Pink Moon
2297/3814: Nick Jonas -- Last Year

2465/3814: Phoenix -- Wolfgang Amadeus Phoenix 
2466/3814: Phosphorescent  -- Muchacho
2467/3814: Pink Floyd -- The Dark Side Of The Moon
2468/3814: Pink Floyd -- The Dark Side of the Moon
2469/3814: Pink Floyd -- The Endless River
2470/3814: Pink Floyd -- The Piper at the Gates of Dawn
2471/3814: Pink Floyd -- The Wall
2472/3814: Pink Floyd -- Wish You Were Here
2473/3814: Pissed Jeans -- Honeys
2474/3814: Pissed Jeans -- King Of Jeans 
2475/3814: Pistol Annies -- Annie Up
2476/3814: Pistol Annies -- Hell On Heels
2477/3814: Pitbull -- Global Warming
2478/3814: Pitbull -- Globalization
2479/3814: Pitbull -- Planet Pit
2480/3814: Pixies -- Doolittle
2481/3814: Pixies -- Surfer Rosa
2482/3814: Plain White T's -- Every Second Counts
2483/3814: Playboi Carti -- Die Lit
2484/3814: Playboi Carti -- Playboi Carti
2485/3814: Plies -- DEFINITION OF REAL
2486/3814: Plies -- Da REAList
2487/3814: Plies -- The Real Testament
2488/3814: PnB Rock -- GTTM: Goin Thru The Motions
2489/3814: Polo G -- 

2656/3814: Rod Stewart -- Soulbook
2657/3814: Rod Stewart -- Stardust... The Great American Songbook Vol. III
2658/3814: Rod Stewart -- Still The Same... Great Rock Classics Of Our Time
2659/3814: Rod Stewart -- Thanks For The Memory... The Great American Songbook Vol. IV
2660/3814: Rod Stewart -- The Very Best Of Rod Stewart
2661/3814: Rod Wave -- Ghetto Gospel
2662/3814: Rod Wave -- Pray 4 Love
2663/3814: Roddy Ricch -- Feed Tha Streets II
2664/3814: Roddy Ricch -- Please Excuse Me For Being Antisocial
2665/3814: Rodney Atkins -- If You're Going Through Hell
2666/3814: Rolling Blackouts Coastal Fever -- Hope Downs
2667/3814: Romeo Santos -- Formula: Vol. 1
2668/3814: Romeo Santos -- Formula: Vol. 2
2669/3814: Rosanne Cash -- The List 
2670/3814: Rosanne Cash  -- The River & The Thread
2671/3814: Roxy Music -- For Your Pleasure
2672/3814: Roxy Music -- Siren
2673/3814: Ruben Studdard -- I Need An Angel
2674/3814: Ruben Studdard -- Soulful
2675/3814: Run D.M.C. -- Raising Hell
2676/381

2842/3814: Soundtrack -- Burlesque
2843/3814: Soundtrack -- Camp Rock
2844/3814: Soundtrack -- Camp Rock 2: The Final Jam
2845/3814: Soundtrack -- Cars
2846/3814: Soundtrack -- Chicago
2847/3814: Soundtrack -- Country Strong
2848/3814: Soundtrack -- Coyote Ugly
2849/3814: Soundtrack -- Cradle 2 The Grave
2850/3814: Soundtrack -- Crazy Heart
2851/3814: Soundtrack -- Daredevil: The Album
2852/3814: Soundtrack -- Descendants
2853/3814: Soundtrack -- Descendants 2
2854/3814: Soundtrack -- Dirty Dancing
2855/3814: Soundtrack -- Disney's Lilo & Stitch
2856/3814: Soundtrack -- Doctor Zhivago
2857/3814: Soundtrack -- Dreamgirls
2858/3814: Soundtrack -- Empire: Original Soundtrack From Season 1
2859/3814: Soundtrack -- Enchanted
2860/3814: Soundtrack -- Fifty Shades Darker
2861/3814: Soundtrack -- Fifty Shades Freed
2862/3814: Soundtrack -- Fifty Shades Of Grey
2863/3814: Soundtrack -- Freaky Friday
2864/3814: Soundtrack -- Frozen
2865/3814: Soundtrack -- Frozen II
2866/3814: Soundtrack -- Froz

3023/3814: Supertramp -- Breakfast In America
3024/3814: Surfaces -- Where The Light Is
3025/3814: Susan Boyle -- Home For Christmas
3026/3814: Susan Boyle -- I Dreamed A Dream
3027/3814: Susan Boyle -- Someone To Watch Over Me
3028/3814: Susan Boyle -- Standing Ovation: The Greatest Songs From The Stage
3029/3814: Susan Boyle -- The Gift
3030/3814: Swans -- My Father Will Guide Me Up a Rope to the Sky
3031/3814: Swans -- The Seer
3032/3814: Swans -- To Be Kind
3033/3814: Switchfoot -- The Beautiful Letdown
3034/3814: System Of A Down -- Hypnotize
3035/3814: System Of A Down -- Mezmerize
3036/3814: System Of A Down -- Steal This Album!
3037/3814: System Of A Down -- Toxicity
3038/3814: T-Pain -- Epiphany
3039/3814: T-Pain -- Rappa Ternt Sanga
3040/3814: T-Pain -- THR33 RINGZ
3041/3814: T. Rex -- Electric Warrior
3042/3814: T.I. -- King
3043/3814: T.I. -- No Mercy
3044/3814: T.I. -- Paper Trail
3045/3814: T.I. -- Paperwork
3046/3814: T.I. -- T.I. Vs T.I.P.
3047/3814: T.I. -- Trap Muzik


3209/3814: The Killers -- Battle Born
3210/3814: The Killers -- Day & Age
3211/3814: The Killers -- Hot Fuss
3212/3814: The Killers -- Sam's Town
3213/3814: The Killers -- Sawdust
3214/3814: The Kinks -- Something Else by The Kinks
3215/3814: The Kinks -- The Kink Kronikles
3216/3814: The Kinks -- The Kinks Are The Village Green Preservation Society
3217/3814: The Knife -- Shaking the Habitual
3218/3814: The Lonely Island -- Incredibad
3219/3814: The Lonely Island -- Turtleneck & Chain
3220/3814: The Lumineers -- Cleopatra
3221/3814: The Lumineers -- The Lumineers
3222/3814: The Magnetic Fields -- 69 Love Songs
3223/3814: The Mamas and the Papas -- If You Can Believe Your Eyes and Ears
3224/3814: The Mars Volta -- Frances The Mute
3225/3814: The Men -- Open Your Heart
3226/3814: The Meters -- Look-Ka Py Py
3227/3814: The Meters -- Rejuvenation
3228/3814: The Modern Lovers -- The Modern Lovers
3229/3814: The Monkees -- More Of The Monkees
3230/3814: The Monkees -- The Monkees
3231/3814:

3391/3814: Tindersticks -- The    Waiting Room
3392/3814: Titus Andronicus -- The Monitor
3393/3814: Toby Keith -- 35 Biggest Hits
3394/3814: Toby Keith -- A Toby Keith Classic Christmas: Volumes One & Two
3395/3814: Toby Keith -- American Ride
3396/3814: Toby Keith -- Big Dog Daddy
3397/3814: Toby Keith -- Bullets In The Gun
3398/3814: Toby Keith -- Clancy's Tavern
3399/3814: Toby Keith -- Greatest Hits 2
3400/3814: Toby Keith -- Honkytonk University
3401/3814: Toby Keith -- Pull My Chain
3402/3814: Toby Keith -- Shock'n Y'All
3403/3814: Toby Keith -- That Don't Make Me A Bad Guy
3404/3814: Toby Keith -- Unleashed
3405/3814: Toby Keith -- White Trash With Money
3406/3814: Todd Rundgren -- Something/Anything?
3407/3814: Todd Snider -- The Excitement Plan 
3408/3814: Tom Petty -- Highway Companion
3409/3814: Tom Petty & The Heartbreakers -- Greatest Hits
3410/3814: Tom Petty And The Heartbreakers -- Greatest Hits
3411/3814: Tom Petty And The Heartbreakers -- Hypnotic Eye
3412/3814: Tom 

3565/3814: Various Artists -- Music From Baz Luhrmann's Film Moulin Rouge (Original Motion Picture Soundtrack)
3566/3814: Various Artists -- NOW #1's
3567/3814: Various Artists -- NOW 20
3568/3814: Various Artists -- NOW 21
3569/3814: Various Artists -- NOW 22
3570/3814: Various Artists -- NOW 23
3571/3814: Various Artists -- NOW 24
3572/3814: Various Artists -- NOW 25
3573/3814: Various Artists -- NOW 26
3574/3814: Various Artists -- NOW 27
3575/3814: Various Artists -- NOW 28
3576/3814: Various Artists -- NOW 29
3577/3814: Various Artists -- NOW 30
3578/3814: Various Artists -- NOW 31
3579/3814: Various Artists -- NOW 32
3580/3814: Various Artists -- NOW 33
3581/3814: Various Artists -- NOW 34
3582/3814: Various Artists -- NOW 35
3583/3814: Various Artists -- NOW 36
3584/3814: Various Artists -- NOW 37
3585/3814: Various Artists -- NOW 38
3586/3814: Various Artists -- NOW 39
3587/3814: Various Artists -- NOW 40
3588/3814: Various Artists -- NOW 41
3589/3814: Various Artists -- NOW 42

3744/3814: YNW Melly -- I Am You
3745/3814: YNW Melly -- Melly vs. Melvin
3746/3814: YNW Melly -- We All Shine
3747/3814: Yeah Yeah Yeahs -- It's Blitz 
3748/3814: Yeah Yeah Yeahs -- It's Blitz!
3749/3814: Yellowcard -- Ocean Avenue
3750/3814: Ying Yang Twins -- Me & My Brother
3751/3814: Ying Yang Twins -- U.S.A.: United State Of Atlanta
3752/3814: Yo Gotti -- I Am
3753/3814: Yo Gotti -- The Art Of Hustle
3754/3814: Yo La Tengo -- Fade
3755/3814: Yo-Yo Ma -- Yo-Yo Ma & Friends: Songs Of Joy & Peace
3756/3814: Yoko Ono Plastic Ono Band -- Between My Head and the Sky 
3757/3814: Yolanda Adams -- Believe
3758/3814: Young Buck -- Buck The World
3759/3814: Young Buck -- Straight Outta Ca$hville
3760/3814: Young Fathers -- Cocoa Sugar
3761/3814: Young Fathers  -- White Men Are Black Men Too
3762/3814: Young Jeezy -- Let's Get It: Thug Motivation 101
3763/3814: Young Jeezy -- TM:103: Hustlerz Ambition
3764/3814: Young Jeezy -- The Inspiration
3765/3814: Young Jeezy -- The Recession
3766/3814

In [142]:
spotify_albums_df = pd.DataFrame(albums)
print("%d distinct ids pulled for %d original albums" % (len(spotify_albums_df), 
                                                         len(combined_albumlist)))
spotify_albums_df.head()

8410 distinct ids pulled for 3814 original albums


Unnamed: 0,query_artist,query_title,id,name,artist,artist_id,n_artists,type,release_date,total_tracks
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,Mandatory Fun,"""Weird Al"" Yankovic",1bDWGdIC2hardyt55nlQgG,1,album,2014-07-15,12
1,'N Sync,'N Sync,20RMokVwJ2wjQ0s8FOdOFC,No Strings Attached,*NSYNC,6Ff53KvcvAj5U7Z1vojB5o,1,album,2000-03-21,12
2,'N Sync,'N Sync,0CADmCXbIx4F9m6TBwLtFd,'N Sync,*NSYNC,6Ff53KvcvAj5U7Z1vojB5o,1,album,1997-05-26,13
3,'N Sync,'N Sync,7K5qlneuWF1CcY6ERzwkLB,'N Sync UK Version,*NSYNC,6Ff53KvcvAj5U7Z1vojB5o,1,album,1997,14
4,'N Sync,'N Sync,3bhFoH4PFnY4ifK4981U8X,The Essential *NSYNC,*NSYNC,6Ff53KvcvAj5U7Z1vojB5o,1,album,2014-07-29,34


In [147]:
# for each identified album id, pull in its popularity so that, when a query
# matched multiple albums, we can (probably) take the most popular one
spotify_album_ids = list(set(spotify_albums_df['id'].tolist()))
spotify_albums_pop = []
for ix in range(0, len(spotify_album_ids), 10):
    print("ids %d-%d (out of %d)" % (ix, ix + 9, len(spotify_album_ids)))
    res = sp.albums(spotify_album_ids[ix:ix+10])
    assert len(res['albums']) == 10 or ix + 10 > len(spotify_album_ids)
    spotify_albums_pop.extend([{'id': a['id'], 
                                'popularity': a['popularity']} for a in res['albums']])

ids 0-9 (out of 8410)
ids 10-19 (out of 8410)
ids 20-29 (out of 8410)
ids 30-39 (out of 8410)
ids 40-49 (out of 8410)
ids 50-59 (out of 8410)
ids 60-69 (out of 8410)
ids 70-79 (out of 8410)
ids 80-89 (out of 8410)
ids 90-99 (out of 8410)
ids 100-109 (out of 8410)
ids 110-119 (out of 8410)
ids 120-129 (out of 8410)
ids 130-139 (out of 8410)
ids 140-149 (out of 8410)
ids 150-159 (out of 8410)
ids 160-169 (out of 8410)
ids 170-179 (out of 8410)
ids 180-189 (out of 8410)
ids 190-199 (out of 8410)
ids 200-209 (out of 8410)
ids 210-219 (out of 8410)
ids 220-229 (out of 8410)
ids 230-239 (out of 8410)
ids 240-249 (out of 8410)
ids 250-259 (out of 8410)
ids 260-269 (out of 8410)
ids 270-279 (out of 8410)
ids 280-289 (out of 8410)
ids 290-299 (out of 8410)
ids 300-309 (out of 8410)
ids 310-319 (out of 8410)
ids 320-329 (out of 8410)
ids 330-339 (out of 8410)
ids 340-349 (out of 8410)
ids 350-359 (out of 8410)
ids 360-369 (out of 8410)
ids 370-379 (out of 8410)
ids 380-389 (out of 8410)
ids 390-

ids 3020-3029 (out of 8410)
ids 3030-3039 (out of 8410)
ids 3040-3049 (out of 8410)
ids 3050-3059 (out of 8410)
ids 3060-3069 (out of 8410)
ids 3070-3079 (out of 8410)
ids 3080-3089 (out of 8410)
ids 3090-3099 (out of 8410)
ids 3100-3109 (out of 8410)
ids 3110-3119 (out of 8410)
ids 3120-3129 (out of 8410)
ids 3130-3139 (out of 8410)
ids 3140-3149 (out of 8410)
ids 3150-3159 (out of 8410)
ids 3160-3169 (out of 8410)
ids 3170-3179 (out of 8410)
ids 3180-3189 (out of 8410)
ids 3190-3199 (out of 8410)
ids 3200-3209 (out of 8410)
ids 3210-3219 (out of 8410)
ids 3220-3229 (out of 8410)
ids 3230-3239 (out of 8410)
ids 3240-3249 (out of 8410)
ids 3250-3259 (out of 8410)
ids 3260-3269 (out of 8410)
ids 3270-3279 (out of 8410)
ids 3280-3289 (out of 8410)
ids 3290-3299 (out of 8410)
ids 3300-3309 (out of 8410)
ids 3310-3319 (out of 8410)
ids 3320-3329 (out of 8410)
ids 3330-3339 (out of 8410)
ids 3340-3349 (out of 8410)
ids 3350-3359 (out of 8410)
ids 3360-3369 (out of 8410)
ids 3370-3379 (out o

ids 5950-5959 (out of 8410)
ids 5960-5969 (out of 8410)
ids 5970-5979 (out of 8410)
ids 5980-5989 (out of 8410)
ids 5990-5999 (out of 8410)
ids 6000-6009 (out of 8410)
ids 6010-6019 (out of 8410)
ids 6020-6029 (out of 8410)
ids 6030-6039 (out of 8410)
ids 6040-6049 (out of 8410)
ids 6050-6059 (out of 8410)
ids 6060-6069 (out of 8410)
ids 6070-6079 (out of 8410)
ids 6080-6089 (out of 8410)
ids 6090-6099 (out of 8410)
ids 6100-6109 (out of 8410)
ids 6110-6119 (out of 8410)
ids 6120-6129 (out of 8410)
ids 6130-6139 (out of 8410)
ids 6140-6149 (out of 8410)
ids 6150-6159 (out of 8410)
ids 6160-6169 (out of 8410)
ids 6170-6179 (out of 8410)
ids 6180-6189 (out of 8410)
ids 6190-6199 (out of 8410)
ids 6200-6209 (out of 8410)
ids 6210-6219 (out of 8410)
ids 6220-6229 (out of 8410)
ids 6230-6239 (out of 8410)
ids 6240-6249 (out of 8410)
ids 6250-6259 (out of 8410)
ids 6260-6269 (out of 8410)
ids 6270-6279 (out of 8410)
ids 6280-6289 (out of 8410)
ids 6290-6299 (out of 8410)
ids 6300-6309 (out o

In [149]:
spotify_albums_df_w_pop = spotify_albums_df.merge(pd.DataFrame(spotify_albums_pop), 
                                                  how = 'left', on = 'id')
spotify_albums_df_w_pop.head()

Unnamed: 0,query_artist,query_title,id,name,artist,artist_id,n_artists,type,release_date,total_tracks,popularity
0,"""Weird Al"" Yankovic",Mandatory Fun,36jlZKG1sNZQA2HbWdYveV,Mandatory Fun,"""Weird Al"" Yankovic",1bDWGdIC2hardyt55nlQgG,1,album,2014-07-15,12,51
1,'N Sync,'N Sync,20RMokVwJ2wjQ0s8FOdOFC,No Strings Attached,*NSYNC,6Ff53KvcvAj5U7Z1vojB5o,1,album,2000-03-21,12,71
2,'N Sync,'N Sync,0CADmCXbIx4F9m6TBwLtFd,'N Sync,*NSYNC,6Ff53KvcvAj5U7Z1vojB5o,1,album,1997-05-26,13,58
3,'N Sync,'N Sync,7K5qlneuWF1CcY6ERzwkLB,'N Sync UK Version,*NSYNC,6Ff53KvcvAj5U7Z1vojB5o,1,album,1997,14,63
4,'N Sync,'N Sync,3bhFoH4PFnY4ifK4981U8X,The Essential *NSYNC,*NSYNC,6Ff53KvcvAj5U7Z1vojB5o,1,album,2014-07-29,34,51
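
A minimal sketch of the "take the most popular when there are multiple" idea, assuming each `(query_artist, query_title)` pair should map to a single Spotify album and that `popularity` is never missing. The helper name `pick_most_popular` is hypothetical, and the example rows are a subset of the table above.

```python
import pandas as pd

def pick_most_popular(albums_with_pop: pd.DataFrame) -> pd.DataFrame:
    """Keep the highest-popularity Spotify match for each queried album."""
    keys = ["query_artist", "query_title"]
    # idxmax gives the index label of the max-popularity row within each group
    best_ix = albums_with_pop.groupby(keys)["popularity"].idxmax()
    return albums_with_pop.loc[best_ix].reset_index(drop=True)

# a few rows in the same shape as spotify_albums_df_w_pop
example = pd.DataFrame({
    "query_artist": ['"Weird Al" Yankovic', "'N Sync", "'N Sync"],
    "query_title": ["Mandatory Fun", "'N Sync", "'N Sync"],
    "name": ["Mandatory Fun", "No Strings Attached", "'N Sync"],
    "popularity": [51, 71, 58],
})
best = pick_most_popular(example)
print(best[["query_title", "name", "popularity"]])
```

Note that for the `'N Sync` query this keeps *No Strings Attached* (popularity 71) rather than the self-titled album (58), which is why popularity alone is only "probably" the right tie-breaker; comparing `name` against `query_title` first may be a safer rule.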


In [154]:
spotify_albums_df_w_pop.to_csv("../data/02_spotify_album_list_w_popularity_%s.csv" %
                               date.today().strftime("%Y%m%d"), 
                               index = False)

In [155]:
spotify_albums_df_w_pop = pd.read_csv("../data/02_spotify_album_list_w_popularity_%s.csv" % 
                                      date.today().strftime("%Y%m%d"))

In [187]:
_album_tracks = [ ]

for ix in range(0, len(spotify_album_ids), 10):
    print("pulling data for album ids %d-%d (of %d)" % (ix, ix + 9, len(spotify_album_ids)))
    
    # query 10 ids at a time
    ix_ids = spotify_album_ids[ix:ix+10]
    res = sp.albums(ix_ids)
    
    # parse out track information, tagging each track listing with its album_id
    for album, album_id in zip(res['albums'], ix_ids):
        album['tracks']['album_id'] = album_id
        _album_tracks.append(album['tracks'])
    
# since each album's track listing is limited to 50 results per page,
# go back and page through the rest for albums with >50 tracks
for ix, track_el in enumerate(_album_tracks):
    if track_el["total"] > 50:
        print("pulling extra tracks for ix %d -- n = %d" % (ix, track_el["total"]))
        page = track_el
        while page['next']:
            page = sp.next(page)
            track_el['items'].extend(page['items'])

pulling data for album ids 0-9 (of 7948)
pulling data for album ids 10-19 (of 7948)
pulling data for album ids 20-29 (of 7948)
pulling data for album ids 30-39 (of 7948)
pulling data for album ids 40-49 (of 7948)
pulling data for album ids 50-59 (of 7948)
pulling data for album ids 60-69 (of 7948)
pulling data for album ids 70-79 (of 7948)
pulling data for album ids 80-89 (of 7948)
pulling data for album ids 90-99 (of 7948)
pulling data for album ids 100-109 (of 7948)
pulling data for album ids 110-119 (of 7948)
pulling data for album ids 120-129 (of 7948)
pulling data for album ids 130-139 (of 7948)
pulling data for album ids 140-149 (of 7948)
pulling data for album ids 150-159 (of 7948)
pulling data for album ids 160-169 (of 7948)
pulling data for album ids 170-179 (of 7948)
pulling data for album ids 180-189 (of 7948)
pulling data for album ids 190-199 (of 7948)
pulling data for album ids 200-209 (of 7948)
pulling data for album ids 210-219 (of 7948)
pulling data for album ids 220-2

pulling data for album ids 1810-1819 (of 7948)
pulling data for album ids 1820-1829 (of 7948)
pulling data for album ids 1830-1839 (of 7948)
pulling data for album ids 1840-1849 (of 7948)
pulling data for album ids 1850-1859 (of 7948)
pulling data for album ids 1860-1869 (of 7948)
pulling data for album ids 1870-1879 (of 7948)
pulling data for album ids 1880-1889 (of 7948)
pulling data for album ids 1890-1899 (of 7948)
pulling data for album ids 1900-1909 (of 7948)
pulling data for album ids 1910-1919 (of 7948)
pulling data for album ids 1920-1929 (of 7948)
pulling data for album ids 1930-1939 (of 7948)
pulling data for album ids 1940-1949 (of 7948)
pulling data for album ids 1950-1959 (of 7948)
pulling data for album ids 1960-1969 (of 7948)
pulling data for album ids 1970-1979 (of 7948)
pulling data for album ids 1980-1989 (of 7948)
pulling data for album ids 1990-1999 (of 7948)
pulling data for album ids 2000-2009 (of 7948)
pulling data for album ids 2010-2019 (of 7948)
pulling data 

pulling data for album ids 3560-3569 (of 7948)
pulling data for album ids 3570-3579 (of 7948)
pulling data for album ids 3580-3589 (of 7948)
pulling data for album ids 3590-3599 (of 7948)
pulling data for album ids 3600-3609 (of 7948)
pulling data for album ids 3610-3619 (of 7948)
pulling data for album ids 3620-3629 (of 7948)
pulling data for album ids 3630-3639 (of 7948)
pulling data for album ids 3640-3649 (of 7948)
pulling data for album ids 3650-3659 (of 7948)
pulling data for album ids 3660-3669 (of 7948)
pulling data for album ids 3670-3679 (of 7948)
pulling data for album ids 3680-3689 (of 7948)
pulling data for album ids 3690-3699 (of 7948)
pulling data for album ids 3700-3709 (of 7948)
pulling data for album ids 3710-3719 (of 7948)
pulling data for album ids 3720-3729 (of 7948)
pulling data for album ids 3730-3739 (of 7948)
pulling data for album ids 3740-3749 (of 7948)
pulling data for album ids 3750-3759 (of 7948)
pulling data for album ids 3760-3769 (of 7948)
pulling data 

pulling data for album ids 5320-5329 (of 7948)
pulling data for album ids 5330-5339 (of 7948)
pulling data for album ids 5340-5349 (of 7948)
pulling data for album ids 5350-5359 (of 7948)
pulling data for album ids 5360-5369 (of 7948)
pulling data for album ids 5370-5379 (of 7948)
pulling data for album ids 5380-5389 (of 7948)
pulling data for album ids 5390-5399 (of 7948)
pulling data for album ids 5400-5409 (of 7948)
pulling data for album ids 5410-5419 (of 7948)
pulling data for album ids 5420-5429 (of 7948)
pulling data for album ids 5430-5439 (of 7948)
pulling data for album ids 5440-5449 (of 7948)
pulling data for album ids 5450-5459 (of 7948)
pulling data for album ids 5460-5469 (of 7948)
pulling data for album ids 5470-5479 (of 7948)
pulling data for album ids 5480-5489 (of 7948)
pulling data for album ids 5490-5499 (of 7948)
pulling data for album ids 5500-5509 (of 7948)
pulling data for album ids 5510-5519 (of 7948)
pulling data for album ids 5520-5529 (of 7948)
pulling data 

pulling data for album ids 7070-7079 (of 7948)
pulling data for album ids 7080-7089 (of 7948)
pulling data for album ids 7090-7099 (of 7948)
pulling data for album ids 7100-7109 (of 7948)
pulling data for album ids 7110-7119 (of 7948)
pulling data for album ids 7120-7129 (of 7948)
pulling data for album ids 7130-7139 (of 7948)
pulling data for album ids 7140-7149 (of 7948)
pulling data for album ids 7150-7159 (of 7948)
pulling data for album ids 7160-7169 (of 7948)
pulling data for album ids 7170-7179 (of 7948)
pulling data for album ids 7180-7189 (of 7948)
pulling data for album ids 7190-7199 (of 7948)
pulling data for album ids 7200-7209 (of 7948)
pulling data for album ids 7210-7219 (of 7948)
pulling data for album ids 7220-7229 (of 7948)
pulling data for album ids 7230-7239 (of 7948)
pulling data for album ids 7240-7249 (of 7948)
pulling data for album ids 7250-7259 (of 7948)
pulling data for album ids 7260-7269 (of 7948)
pulling data for album ids 7270-7279 (of 7948)
pulling data 

pulling extra tracks for ix 7296 -- n = 66
pulling extra tracks for ix 7335 -- n = 52
pulling extra tracks for ix 7342 -- n = 60
pulling extra tracks for ix 7378 -- n = 100
pulling extra tracks for ix 7414 -- n = 64
pulling extra tracks for ix 7463 -- n = 80
pulling extra tracks for ix 7513 -- n = 68
pulling extra tracks for ix 7541 -- n = 60
pulling extra tracks for ix 7638 -- n = 81
pulling extra tracks for ix 7905 -- n = 76
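
The "extra tracks" step above follows the `next` link of a paging object until it runs out. As a rough, self-contained sketch of that pattern (not the notebook's actual code), the helper below takes the first page plus any callable that returns the following page. `collect_all_items` and the fake `pages` data are hypothetical stand-ins so the example runs without API credentials.

```python
from typing import Callable, Optional

def collect_all_items(first_page: dict,
                      fetch_next: Callable[[dict], Optional[dict]]) -> list:
    """Follow 'next' links from a paging object and return every item."""
    items = list(first_page["items"])
    page = first_page
    while page.get("next"):
        page = fetch_next(page)
        if page is None:  # defensive: stop if the client returns nothing
            break
        items.extend(page["items"])
    return items

# fake three-page response standing in for the API (made-up data)
pages = {
    "page2": {"items": ["c", "d"], "next": "page3"},
    "page3": {"items": ["e"], "next": None},
}
first = {"items": ["a", "b"], "next": "page2", "total": 5}
all_items = collect_all_items(first, lambda p: pages[p["next"]])
print(len(all_items), "items collected; expected", first["total"])
```

In the notebook this would presumably be called as `collect_all_items(track_el, sp.next)`, since `sp.next` is the call already used above to fetch the following page.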


In [198]:
# flatten list of track lists
album_tracks = [t for ts in [ts['items'] for ts in _album_tracks] for t in ts]
print("%d total tracks from %d albums" % (len(album_tracks), len(_album_tracks)))

114188 total tracks from 7948 albums


In [194]:
# confirm that pulled as many tracks as expected
len(album_tracks) == sum([el["total"] for el in _album_tracks])

True

In [195]:
# get list of track_ids (need to pull information separately to get popularity)
track_ids = [t['id'] for t in album_tracks]

In [196]:
# pull in spotify track information
tracks = [ ]
for ix in range(0, len(track_ids), 50):
    print("pulling data for tracks ids %d-%d (out of %d)" % (ix, ix + 49, len(track_ids)))
    
    # query 50 ids at a time
    ix_ids = track_ids[ix:ix+50]
    res = sp.tracks(ix_ids)
    
    # parse out track information
    tracks.extend([{'track_id': t['id'],
                    'name': t['name'],
                    'artist': '|'.join([artist['name'] for artist in t['artists']]),
                    'artist_id': '|'.join([artist['id'] for artist in t['artists']]),
                    'album': t['album']['name'],
                    'album_id': t['album']['id'],
                    'popularity': t['popularity'],
                    'duration_ms': t['duration_ms'],
                    'preview_url': t['preview_url'],
                    'track_number': t['track_number'],
                    'type': t['type']} for t in res['tracks']])

tracks_df = pd.DataFrame(tracks)

pulling data for tracks ids 0-49
pulling data for tracks ids 50-99
pulling data for tracks ids 100-149
pulling data for tracks ids 150-199
pulling data for tracks ids 200-249
pulling data for tracks ids 250-299
pulling data for tracks ids 300-349
pulling data for tracks ids 350-399
pulling data for tracks ids 400-449
pulling data for tracks ids 450-499
pulling data for tracks ids 500-549
pulling data for tracks ids 550-599
pulling data for tracks ids 600-649
pulling data for tracks ids 650-699
pulling data for tracks ids 700-749
pulling data for tracks ids 750-799
pulling data for tracks ids 800-849
pulling data for tracks ids 850-899
pulling data for tracks ids 900-949
pulling data for tracks ids 950-999
pulling data for tracks ids 1000-1049
pulling data for tracks ids 1050-1099
pulling data for tracks ids 1100-1149
pulling data for tracks ids 1150-1199
pulling data for tracks ids 1200-1249
pulling data for tracks ids 1250-1299
pulling data for tracks ids 1300-1349
pulling data for tr

pulling data for tracks ids 10850-10899
pulling data for tracks ids 10900-10949
pulling data for tracks ids 10950-10999
pulling data for tracks ids 11000-11049
pulling data for tracks ids 11050-11099
pulling data for tracks ids 11100-11149
pulling data for tracks ids 11150-11199
pulling data for tracks ids 11200-11249
pulling data for tracks ids 11250-11299
pulling data for tracks ids 11300-11349
pulling data for tracks ids 11350-11399
pulling data for tracks ids 11400-11449
pulling data for tracks ids 11450-11499
pulling data for tracks ids 11500-11549
pulling data for tracks ids 11550-11599
pulling data for tracks ids 11600-11649
pulling data for tracks ids 11650-11699
pulling data for tracks ids 11700-11749
pulling data for tracks ids 11750-11799
pulling data for tracks ids 11800-11849
pulling data for tracks ids 11850-11899
pulling data for tracks ids 11900-11949
pulling data for tracks ids 11950-11999
pulling data for tracks ids 12000-12049
pulling data for tracks ids 12050-12099


pulling data for tracks ids 21150-21199
pulling data for tracks ids 21200-21249
pulling data for tracks ids 21250-21299
pulling data for tracks ids 21300-21349
pulling data for tracks ids 21350-21399
pulling data for tracks ids 21400-21449
pulling data for tracks ids 21450-21499
pulling data for tracks ids 21500-21549
pulling data for tracks ids 21550-21599
pulling data for tracks ids 21600-21649
pulling data for tracks ids 21650-21699
pulling data for tracks ids 21700-21749
pulling data for tracks ids 21750-21799
pulling data for tracks ids 21800-21849
pulling data for tracks ids 21850-21899
pulling data for tracks ids 21900-21949
pulling data for tracks ids 21950-21999
pulling data for tracks ids 22000-22049
pulling data for tracks ids 22050-22099
pulling data for tracks ids 22100-22149
pulling data for tracks ids 22150-22199
pulling data for tracks ids 22200-22249
pulling data for tracks ids 22250-22299
pulling data for tracks ids 22300-22349
pulling data for tracks ids 22350-22399


pulling data for tracks ids 31450-31499
pulling data for tracks ids 31500-31549
pulling data for tracks ids 31550-31599
pulling data for tracks ids 31600-31649
pulling data for tracks ids 31650-31699
pulling data for tracks ids 31700-31749
pulling data for tracks ids 31750-31799
pulling data for tracks ids 31800-31849
pulling data for tracks ids 31850-31899
pulling data for tracks ids 31900-31949
pulling data for tracks ids 31950-31999
pulling data for tracks ids 32000-32049
pulling data for tracks ids 32050-32099
pulling data for tracks ids 32100-32149
pulling data for tracks ids 32150-32199
pulling data for tracks ids 32200-32249
pulling data for tracks ids 32250-32299
pulling data for tracks ids 32300-32349
pulling data for tracks ids 32350-32399
pulling data for tracks ids 32400-32449
pulling data for tracks ids 32450-32499
pulling data for tracks ids 32500-32549
pulling data for tracks ids 32550-32599
pulling data for tracks ids 32600-32649
pulling data for tracks ids 32650-32699


pulling data for tracks ids 41700-41749
pulling data for tracks ids 41750-41799
pulling data for tracks ids 41800-41849
pulling data for tracks ids 41850-41899
pulling data for tracks ids 41900-41949
pulling data for tracks ids 41950-41999
pulling data for tracks ids 42000-42049
pulling data for tracks ids 42050-42099
pulling data for tracks ids 42100-42149
pulling data for tracks ids 42150-42199
pulling data for tracks ids 42200-42249
pulling data for tracks ids 42250-42299
pulling data for tracks ids 42300-42349
pulling data for tracks ids 42350-42399
pulling data for tracks ids 42400-42449
pulling data for tracks ids 42450-42499
pulling data for tracks ids 42500-42549
pulling data for tracks ids 42550-42599
pulling data for tracks ids 42600-42649
pulling data for tracks ids 42650-42699
pulling data for tracks ids 42700-42749
pulling data for tracks ids 42750-42799
pulling data for tracks ids 42800-42849
pulling data for tracks ids 42850-42899
pulling data for tracks ids 42900-42949


pulling data for tracks ids 52000-52049
pulling data for tracks ids 52050-52099
pulling data for tracks ids 52100-52149
pulling data for tracks ids 52150-52199
pulling data for tracks ids 52200-52249
pulling data for tracks ids 52250-52299
pulling data for tracks ids 52300-52349
pulling data for tracks ids 52350-52399
pulling data for tracks ids 52400-52449
pulling data for tracks ids 52450-52499
pulling data for tracks ids 52500-52549
pulling data for tracks ids 52550-52599
pulling data for tracks ids 52600-52649
pulling data for tracks ids 52650-52699
pulling data for tracks ids 52700-52749
pulling data for tracks ids 52750-52799
pulling data for tracks ids 52800-52849
pulling data for tracks ids 52850-52899
pulling data for tracks ids 52900-52949
pulling data for tracks ids 52950-52999
pulling data for tracks ids 53000-53049
pulling data for tracks ids 53050-53099
pulling data for tracks ids 53100-53149
pulling data for tracks ids 53150-53199
pulling data for tracks ids 53200-53249


pulling data for tracks ids 62250-62299
pulling data for tracks ids 62300-62349
pulling data for tracks ids 62350-62399
pulling data for tracks ids 62400-62449
pulling data for tracks ids 62450-62499
pulling data for tracks ids 62500-62549
pulling data for tracks ids 62550-62599
pulling data for tracks ids 62600-62649
pulling data for tracks ids 62650-62699
pulling data for tracks ids 62700-62749
pulling data for tracks ids 62750-62799
pulling data for tracks ids 62800-62849
pulling data for tracks ids 62850-62899
pulling data for tracks ids 62900-62949
pulling data for tracks ids 62950-62999
pulling data for tracks ids 63000-63049
pulling data for tracks ids 63050-63099
pulling data for tracks ids 63100-63149
pulling data for tracks ids 63150-63199
pulling data for tracks ids 63200-63249
pulling data for tracks ids 63250-63299
pulling data for tracks ids 63300-63349
pulling data for tracks ids 63350-63399
pulling data for tracks ids 63400-63449
pulling data for tracks ids 63450-63499


pulling data for tracks ids 72550-72599
pulling data for tracks ids 72600-72649
pulling data for tracks ids 72650-72699
pulling data for tracks ids 72700-72749
pulling data for tracks ids 72750-72799
pulling data for tracks ids 72800-72849
pulling data for tracks ids 72850-72899
pulling data for tracks ids 72900-72949
pulling data for tracks ids 72950-72999
pulling data for tracks ids 73000-73049
pulling data for tracks ids 73050-73099
pulling data for tracks ids 73100-73149
pulling data for tracks ids 73150-73199
pulling data for tracks ids 73200-73249
pulling data for tracks ids 73250-73299
pulling data for tracks ids 73300-73349
pulling data for tracks ids 73350-73399
pulling data for tracks ids 73400-73449
pulling data for tracks ids 73450-73499
pulling data for tracks ids 73500-73549
pulling data for tracks ids 73550-73599
pulling data for tracks ids 73600-73649
pulling data for tracks ids 73650-73699
pulling data for tracks ids 73700-73749
pulling data for tracks ids 73750-73799


pulling data for tracks ids 82800-82849
pulling data for tracks ids 82850-82899
pulling data for tracks ids 82900-82949
pulling data for tracks ids 82950-82999
pulling data for tracks ids 83000-83049
pulling data for tracks ids 83050-83099
pulling data for tracks ids 83100-83149
pulling data for tracks ids 83150-83199
pulling data for tracks ids 83200-83249
pulling data for tracks ids 83250-83299
pulling data for tracks ids 83300-83349
pulling data for tracks ids 83350-83399
pulling data for tracks ids 83400-83449
pulling data for tracks ids 83450-83499
pulling data for tracks ids 83500-83549
pulling data for tracks ids 83550-83599
pulling data for tracks ids 83600-83649
pulling data for tracks ids 83650-83699
pulling data for tracks ids 83700-83749
pulling data for tracks ids 83750-83799
pulling data for tracks ids 83800-83849
pulling data for tracks ids 83850-83899
pulling data for tracks ids 83900-83949
pulling data for tracks ids 83950-83999
pulling data for tracks ids 84000-84049


pulling data for tracks ids 93100-93149
pulling data for tracks ids 93150-93199
pulling data for tracks ids 93200-93249
pulling data for tracks ids 93250-93299
pulling data for tracks ids 93300-93349
pulling data for tracks ids 93350-93399
pulling data for tracks ids 93400-93449
pulling data for tracks ids 93450-93499
pulling data for tracks ids 93500-93549
pulling data for tracks ids 93550-93599
pulling data for tracks ids 93600-93649
pulling data for tracks ids 93650-93699
pulling data for tracks ids 93700-93749
pulling data for tracks ids 93750-93799
pulling data for tracks ids 93800-93849
pulling data for tracks ids 93850-93899
pulling data for tracks ids 93900-93949
pulling data for tracks ids 93950-93999
pulling data for tracks ids 94000-94049
pulling data for tracks ids 94050-94099
pulling data for tracks ids 94100-94149
pulling data for tracks ids 94150-94199
pulling data for tracks ids 94200-94249
pulling data for tracks ids 94250-94299
pulling data for tracks ids 94300-94349


pulling data for tracks ids 103250-103299
pulling data for tracks ids 103300-103349
pulling data for tracks ids 103350-103399
pulling data for tracks ids 103400-103449
pulling data for tracks ids 103450-103499
pulling data for tracks ids 103500-103549
pulling data for tracks ids 103550-103599
pulling data for tracks ids 103600-103649
pulling data for tracks ids 103650-103699
pulling data for tracks ids 103700-103749
pulling data for tracks ids 103750-103799
pulling data for tracks ids 103800-103849
pulling data for tracks ids 103850-103899
pulling data for tracks ids 103900-103949
pulling data for tracks ids 103950-103999
pulling data for tracks ids 104000-104049
pulling data for tracks ids 104050-104099
pulling data for tracks ids 104100-104149
pulling data for tracks ids 104150-104199
pulling data for tracks ids 104200-104249
pulling data for tracks ids 104250-104299
pulling data for tracks ids 104300-104349
pulling data for tracks ids 104350-104399
pulling data for tracks ids 104400

pulling data for tracks ids 113050-113099
pulling data for tracks ids 113100-113149
pulling data for tracks ids 113150-113199
pulling data for tracks ids 113200-113249
pulling data for tracks ids 113250-113299
pulling data for tracks ids 113300-113349
pulling data for tracks ids 113350-113399
pulling data for tracks ids 113400-113449
pulling data for tracks ids 113450-113499
pulling data for tracks ids 113500-113549
pulling data for tracks ids 113550-113599
pulling data for tracks ids 113600-113649
pulling data for tracks ids 113650-113699
pulling data for tracks ids 113700-113749
pulling data for tracks ids 113750-113799
pulling data for tracks ids 113800-113849
pulling data for tracks ids 113850-113899
pulling data for tracks ids 113900-113949
pulling data for tracks ids 113950-113999
pulling data for tracks ids 114000-114049
pulling data for tracks ids 114050-114099
pulling data for tracks ids 114100-114149
pulling data for tracks ids 114150-114199


In [197]:
# confirm that pulled as many tracks as expected
len(tracks_df) == len(album_tracks)

True
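
The album and track pulls earlier in this notebook both slice a long id list into fixed-size batches (10 album ids, 50 track ids) before each API call. A small sketch of that batching on its own, with a hypothetical helper name `chunked`; it also reports the true end index of the last, shorter batch, which the `ix + 9` / `ix + 49` progress messages overshoot.

```python
def chunked(ids, size):
    """Yield (start, end_inclusive, batch) for consecutive slices of ids."""
    for start in range(0, len(ids), size):
        batch = ids[start:start + size]
        yield start, start + len(batch) - 1, batch

# 23 made-up ids split into batches of 10
fake_ids = ["id%d" % i for i in range(23)]
batches = list(chunked(fake_ids, 10))
for start, end, batch in batches:
    print("ids %d-%d (out of %d)" % (start, end, len(fake_ids)))
```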

In [199]:
tracks_df.to_csv("../data/03_spotify_album_tracks_w_popularity_%s.csv" %
                 date.today().strftime("%Y%m%d"), 
                 index = False)

In [200]:
tracks_df = pd.read_csv("../data/03_spotify_album_tracks_w_popularity_%s.csv" % 
                        date.today().strftime("%Y%m%d"))
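
Because the snapshot file names embed `date.today()`, the read cells above only find a file written on the same day. One possible workaround, sketched here under the assumption that snapshots keep the `<prefix>_YYYYMMDD.csv` naming, is to load the most recent file matching the prefix. The helper `latest_snapshot` is hypothetical, and the demo uses empty placeholder files with made-up dates in a temporary directory.

```python
import glob
import os
import tempfile

def latest_snapshot(directory: str, prefix: str) -> str:
    """Return the path of the newest '<prefix>_YYYYMMDD.csv' in directory."""
    matches = sorted(glob.glob(os.path.join(directory, prefix + "_*.csv")))
    if not matches:
        raise FileNotFoundError("no snapshot found for prefix %r" % prefix)
    # YYYYMMDD stamps sort chronologically when sorted as strings
    return matches[-1]

prefix = "03_spotify_album_tracks_w_popularity"
with tempfile.TemporaryDirectory() as tmp:
    for stamp in ("20210103", "20201231"):  # made-up dates
        open(os.path.join(tmp, "%s_%s.csv" % (prefix, stamp)), "w").close()
    newest = os.path.basename(latest_snapshot(tmp, prefix))
print(newest)
```

With the real data directory this could be used as `pd.read_csv(latest_snapshot("../data", prefix))`.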