# Spotify Project

**Authors:** Brian Karstens and Christian Kleronomos  
**Last Modified:** 12-09-2024 (created: 11-17-2024)

**Description:** This project starts from a Kaggle dataset of the most popular songs of the 2010s and adds data from Spotify, using both the Spotify Web API and the Spotify song pages themselves. First, the .csv file is loaded into a DataFrame. For each song in the DataFrame, the Spotipy client looks up the track ID and the URL of the song's Spotify web page. The remainder of this notebook scrapes each song's URL to retrieve the number of streams the song has.

<br>

Import Libraries:

In [7]:
!pip install spotipy selenium webdriver-manager



In [1]:
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from selenium import webdriver
from selenium.webdriver.common.by import By 
from selenium.webdriver.chrome.service import Service 
from webdriver_manager.chrome import ChromeDriverManager 
import time
import random

In [29]:
songs = pd.read_csv("Spotify_2010_-_2019_Top_100_Songs.csv", encoding='UTF-8')
display(songs)

Unnamed: 0,title,artist,top_genre,year_released,added,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop,top_year,artist_type
0,STARSTRUKK (feat. Katy Perry),3OH!3,dance pop,2009,2022-02-17,140,81,61,-6,23,23,203,0,6,70,2010,Duo
1,My First Kiss (feat. Ke$ha),3OH!3,dance pop,2010,2022-02-17,138,89,68,-4,36,83,192,1,8,68,2010,Duo
2,I Need A Dollar,Aloe Blacc,pop soul,2010,2022-02-17,95,48,84,-7,9,96,243,20,3,72,2010,Solo
3,Airplanes (feat. Hayley Williams of Paramore),B.o.B,atl hip hop,2010,2022-02-17,93,87,66,-4,4,38,180,11,12,80,2010,Solo
4,Nothin' on You (feat. Bruno Mars),B.o.B,atl hip hop,2010,2022-02-17,104,85,69,-6,9,74,268,39,5,79,2010,Solo
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,SICKO MODE,Travis Scott,hip hop,2018,2020-06-22,155,73,83,-4,12,45,313,1,22,86,2019,Solo
996,EARFQUAKE,"Tyler, The Creator",hip hop,2019,2020-06-22,80,50,55,-9,80,41,190,23,7,85,2019,Solo
997,Boasty (feat. Idris Elba),Wiley,grime,2019,2020-06-22,103,77,89,-5,9,46,177,1,7,68,2019,Solo
998,Strike a Pose (feat. Aitch),Young T & Bugsey,afroswing,2019,2020-08-20,138,58,53,-6,10,59,214,1,10,67,2019,Duo


In [33]:
import os

# Read the API credentials from the environment instead of hardcoding them in the notebook
client_id = os.environ['SPOTIPY_CLIENT_ID']
client_secret = os.environ['SPOTIPY_CLIENT_SECRET']

client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

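def get_song_details(song_title, artist_name, year_released):
    """Search Spotify for one song and return its details, or None if nothing matches."""
    # field filters narrow the search to the given track, artist, and release year
    query = f"track:{song_title} artist:{artist_name} year:{year_released}"
    # keep only the top-ranked match
    result = sp.search(q=query, limit=1, type='track')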
    
    if result['tracks']['items']:
        track = result['tracks']['items'][0]
        return {
            'title': track['name'],
            'artist': track['artists'][0]['name'],
            'pop_today': track['popularity'],
            'track_id': track['id'],
            'url': track['external_urls']['spotify']
        }
    else:
        return None

song_details = []
for _, row in songs.iterrows():
    title = row['title']
    artist = row['artist']  
    year = row['year_released'] 
    details = get_song_details(title, artist, year)
    if details:
        song_details.append(details)

songs_df = pd.DataFrame(song_details)
display(songs_df)

Unnamed: 0,title,artist,pop_today,track_id,url
0,STARSTRUKK (feat. Katy Perry),3OH!3,62,1hBM2D1ULT3aeKuddSwPsK,https://open.spotify.com/track/1hBM2D1ULT3aeKu...
1,My First Kiss (feat. Ke$ha),3OH!3,59,17tDv8WA8IhqE8qzuQn707,https://open.spotify.com/track/17tDv8WA8IhqE8q...
2,I Need A Dollar,Aloe Blacc,67,3oUphdZVPyrsprZ8FgbmQS,https://open.spotify.com/track/3oUphdZVPyrsprZ...
3,Airplanes (feat. Hayley Williams of Paramore),B.o.B,73,6lV2MSQmRIkycDScNtrBXO,https://open.spotify.com/track/6lV2MSQmRIkycDS...
4,Nothin' on You (feat. Bruno Mars),B.o.B,76,59dLtGBS26x7kc0rHbaPrq,https://open.spotify.com/track/59dLtGBS26x7kc0...
...,...,...,...,...,...
985,SICKO MODE,Travis Scott,81,2xLMifQCjDGFmkHkpNLD9h,https://open.spotify.com/track/2xLMifQCjDGFmkH...
986,EARFQUAKE,"Tyler, The Creator",82,5hVghJ4KaYES3BFUATCYn0,https://open.spotify.com/track/5hVghJ4KaYES3BF...
987,Boasty (feat. Idris Elba),Wiley,59,5X5YDBavdU5RjYMlxqwlCm,https://open.spotify.com/track/5X5YDBavdU5RjYM...
988,Strike a Pose (feat. Aitch),Young T & Bugsey,51,23GvTfcGK454ppLsts3W44,https://open.spotify.com/track/23GvTfcGK454ppL...
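
The input table ends at index 998 while the API table ends at index 988, so roughly ten searches returned no match and were skipped. One possible way to recover some of them, sketched here as an assumption rather than something this notebook does, is to retry with a looser query that drops the featured-artist suffix and the year filter. `build_queries` is a hypothetical helper; its strings would be passed to `sp.search` one at a time until one of them returns a track.

```python
import re

# Hypothetical helper, not part of the original notebook.
# Matches a trailing "(feat. ...)" or "(with ...)" style suffix in a title.
_FEATURE_SUFFIX = re.compile(r"\s*[(\[](?:feat\.|with)\s[^)\]]*[)\]]", re.IGNORECASE)

def build_queries(title, artist, year):
    """Return search strings ordered from most to least specific."""
    strict = f"track:{title} artist:{artist} year:{year}"
    short_title = _FEATURE_SUFFIX.sub("", title).strip()
    relaxed = f"track:{short_title} artist:{artist}"
    return [strict, relaxed]

queries = build_queries("Airplanes (feat. Hayley Williams of Paramore)", "B.o.B", 2010)
for q in queries:
    print(q)
```

A looser query can also return the wrong recording (a remix or a cover), so any row found this way would need a check against the original title and artist.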


In [62]:
songs_df.to_csv('api_songs.csv', encoding = 'UTF-8', index = False)
songs.to_csv('tophits.csv', encoding='UTF-8', index = False)

In [3]:
tophits = pd.read_csv('tophits.csv', encoding='UTF-8')
apisongs = pd.read_csv('api_songs.csv', encoding= 'UTF-8')

display(tophits)
display(apisongs)

Unnamed: 0,title,artist,top_genre,year_released,added,bpm,nrgy,dnce,dB,live,val,dur,acous,spch,pop,top_year,artist_type
0,STARSTRUKK (feat. Katy Perry),3OH!3,dance pop,2009,2022-02-17,140,81,61,-6,23,23,203,0,6,70,2010,Duo
1,My First Kiss (feat. Ke$ha),3OH!3,dance pop,2010,2022-02-17,138,89,68,-4,36,83,192,1,8,68,2010,Duo
2,I Need A Dollar,Aloe Blacc,pop soul,2010,2022-02-17,95,48,84,-7,9,96,243,20,3,72,2010,Solo
3,Airplanes (feat. Hayley Williams of Paramore),B.o.B,atl hip hop,2010,2022-02-17,93,87,66,-4,4,38,180,11,12,80,2010,Solo
4,Nothin' on You (feat. Bruno Mars),B.o.B,atl hip hop,2010,2022-02-17,104,85,69,-6,9,74,268,39,5,79,2010,Solo
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,SICKO MODE,Travis Scott,hip hop,2018,2020-06-22,155,73,83,-4,12,45,313,1,22,86,2019,Solo
996,EARFQUAKE,"Tyler, The Creator",hip hop,2019,2020-06-22,80,50,55,-9,80,41,190,23,7,85,2019,Solo
997,Boasty (feat. Idris Elba),Wiley,grime,2019,2020-06-22,103,77,89,-5,9,46,177,1,7,68,2019,Solo
998,Strike a Pose (feat. Aitch),Young T & Bugsey,afroswing,2019,2020-08-20,138,58,53,-6,10,59,214,1,10,67,2019,Duo


Unnamed: 0,title,artist,pop_today,track_id,url
0,STARSTRUKK (feat. Katy Perry),3OH!3,62,1hBM2D1ULT3aeKuddSwPsK,https://open.spotify.com/track/1hBM2D1ULT3aeKu...
1,My First Kiss (feat. Ke$ha),3OH!3,59,17tDv8WA8IhqE8qzuQn707,https://open.spotify.com/track/17tDv8WA8IhqE8q...
2,I Need A Dollar,Aloe Blacc,67,3oUphdZVPyrsprZ8FgbmQS,https://open.spotify.com/track/3oUphdZVPyrsprZ...
3,Airplanes (feat. Hayley Williams of Paramore),B.o.B,73,6lV2MSQmRIkycDScNtrBXO,https://open.spotify.com/track/6lV2MSQmRIkycDS...
4,Nothin' on You (feat. Bruno Mars),B.o.B,76,59dLtGBS26x7kc0rHbaPrq,https://open.spotify.com/track/59dLtGBS26x7kc0...
...,...,...,...,...,...
985,SICKO MODE,Travis Scott,81,2xLMifQCjDGFmkHkpNLD9h,https://open.spotify.com/track/2xLMifQCjDGFmkH...
986,EARFQUAKE,"Tyler, The Creator",82,5hVghJ4KaYES3BFUATCYn0,https://open.spotify.com/track/5hVghJ4KaYES3BF...
987,Boasty (feat. Idris Elba),Wiley,59,5X5YDBavdU5RjYMlxqwlCm,https://open.spotify.com/track/5X5YDBavdU5RjYM...
988,Strike a Pose (feat. Aitch),Young T & Bugsey,51,23GvTfcGK454ppLsts3W44,https://open.spotify.com/track/23GvTfcGK454ppL...


In [5]:
# function to scroll from the top to the bottom of the web page
def random_scroll(browser, total_wait_time):
    # get the total height of the page
    total_height = browser.execute_script("return document.body.scrollHeight")
    
    # number of steps to scroll (you can adjust this number)
    scroll_steps = random.randint(2, 10) # randomize how many scroll steps we will use
    
    # calculate the height to scroll on each step
    scroll_increment = total_height // scroll_steps

    # calculate the total time available for scrolling each step
    time_per_step = total_wait_time / scroll_steps
    
    # random scrolling across time
    for step in range(scroll_steps):
        # scroll by the increment (dividing total height by number of steps)
        browser.execute_script(f"window.scrollBy(0, {scroll_increment});")
        
        # random wait time between scrolls to simulate varying speed
        random_wait = random.uniform(0.5 * time_per_step, 1.5 * time_per_step)  # randomize the wait within a range
        time.sleep(random_wait)
        
    # final scroll to make sure you are at the very bottom (in case it didn't exactly match)
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
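
To see what `random_scroll` does without opening a browser, the same arithmetic can be run on its own. The sketch below is a hypothetical helper (`plan_scroll` is not part of the notebook) that returns the scroll distance and sleep time for each step. Because each sleep is drawn from 0.5 to 1.5 times the per-step budget, the total time spent on a page falls between half and one and a half times `total_wait_time`. The page height of 4000 pixels is an arbitrary example value.

```python
import random

def plan_scroll(total_height, total_wait_time, rng=random):
    """Return the (pixels, seconds) pairs that random_scroll would step through."""
    scroll_steps = rng.randint(2, 10)                # same step range as random_scroll
    scroll_increment = total_height // scroll_steps  # pixels scrolled per step
    time_per_step = total_wait_time / scroll_steps   # average sleep per step
    return [
        (scroll_increment, rng.uniform(0.5 * time_per_step, 1.5 * time_per_step))
        for _ in range(scroll_steps)
    ]

# A seeded generator makes the plan repeatable while experimenting.
plan = plan_scroll(total_height=4000, total_wait_time=10, rng=random.Random(0))
total_sleep = sum(wait for _, wait in plan)
print(len(plan), round(total_sleep, 2))
```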

In [8]:
browser = webdriver.Chrome()

titles = []
artists = []
streams = []


for i in range(0,100):
    track_id = apisongs.iloc[i]['track_id']
    url = f"https://open.spotify.com/track/{track_id}"
    print(url)
    
    browser.get(url)
    browser.maximize_window()

    total_wait_time = random.uniform(7, 12)  
    random_scroll(browser, total_wait_time)

    title_elements = browser.find_elements(By.CSS_SELECTOR, 'h1.encore-text.encore-text-headline-large[data-encore-id="text"]')
    for title in title_elements:
        titles.append(title.text)

    artist_elements = browser.find_elements(By.CSS_SELECTOR, 'a[data-testid="creator-link"]')
    for artist_elem in artist_elements:
        artists.append(artist_elem.text)  

    streams_elements = browser.find_elements(By.CSS_SELECTOR, 'span[data-testid="playcount"]')
    for streams_elem in streams_elements:
        streams.append(streams_elem.text)  

    print(f"Song {track_id} done, {[i]}")

    time.sleep(random.randint(2,4))

browser.quit()  # end the WebDriver session, not just the current window

print("-"* 30)
print(f"# of Titles: {len(titles)}")  
print(f"# of Artists: {len(artists)}")  
print(f"# of Streams: {len(streams)}") 

https://open.spotify.com/track/1hBM2D1ULT3aeKuddSwPsK
Song 1hBM2D1ULT3aeKuddSwPsK done, [0]
https://open.spotify.com/track/17tDv8WA8IhqE8qzuQn707
Song 17tDv8WA8IhqE8qzuQn707 done, [1]
https://open.spotify.com/track/3oUphdZVPyrsprZ8FgbmQS
Song 3oUphdZVPyrsprZ8FgbmQS done, [2]
https://open.spotify.com/track/6lV2MSQmRIkycDScNtrBXO
Song 6lV2MSQmRIkycDScNtrBXO done, [3]
https://open.spotify.com/track/59dLtGBS26x7kc0rHbaPrq
Song 59dLtGBS26x7kc0rHbaPrq done, [4]
https://open.spotify.com/track/5uHYcK0nbEYgRaFTY5BqnP
Song 5uHYcK0nbEYgRaFTY5BqnP done, [5]
https://open.spotify.com/track/7vWFaMq63AwkFDhS2OAg5u
Song 7vWFaMq63AwkFDhS2OAg5u done, [6]
https://open.spotify.com/track/0oJMv049q8hEkes9w0L1J4
Song 0oJMv049q8hEkes9w0L1J4 done, [7]
https://open.spotify.com/track/161DnLWsx1i3u1JT05lzqU
Song 161DnLWsx1i3u1JT05lzqU done, [8]
https://open.spotify.com/track/7BqBn9nzAq8spo5e7cZ0dJ
Song 7BqBn9nzAq8spo5e7cZ0dJ done, [9]
https://open.spotify.com/track/61LtVmmkGr8P9I2tSPvdpf
Song 61LtVmmkGr8P9I2tSPvdp
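
The loop above appends every matched element to three independent lists. If a page ever yields more than one `creator-link` element, or no `playcount` element, the lists end up with different lengths, and `pd.DataFrame` raises a `ValueError` because all columns must have the same length. A minimal sketch of one way to guard against that, assuming one row per track is wanted: keep only the first match per selector and record `None` when nothing matched. `first_text` and `build_row` are hypothetical helpers, and the `SimpleNamespace` objects are made-up stand-ins for Selenium elements so that the example runs without a browser.

```python
from types import SimpleNamespace

def first_text(elements):
    """Return the stripped text of the first element, or None if there are none."""
    return elements[0].text.strip() if elements else None

def build_row(track_id, title_elements, artist_elements, stream_elements):
    """Build exactly one row per track so that all columns stay the same length."""
    return {
        "track_id": track_id,
        "title": first_text(title_elements),
        "artist": first_text(artist_elements),
        "streams": first_text(stream_elements),
    }

# Stand-ins for Selenium elements: anything with a .text attribute works here.
titles_found = [SimpleNamespace(text="Nothin' on You (feat. Bruno Mars)")]
artists_found = [SimpleNamespace(text="B.o.B"), SimpleNamespace(text="Bruno Mars")]
streams_found = []  # simulates a page where the play count did not load

row = build_row("59dLtGBS26x7kc0rHbaPrq", titles_found, artists_found, streams_found)
print(row)
```

Collecting such rows in a single list and passing it to `pd.DataFrame` also keeps the `track_id` next to each result, which makes a later join with `apisongs` straightforward.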

In [12]:
chunk = pd.DataFrame({'title': titles, 'artist': artists, 'streams': streams})
chunk.to_csv('chunk_raw.csv', encoding = 'UTF-8', index=False)
display(chunk)

Unnamed: 0,title,artist,streams
0,STARSTRUKK (feat. Katy Perry),3OH!3,160363103
1,My First Kiss (feat. Ke$ha),3OH!3,126408708
2,I Need A Dollar,Aloe Blacc,282228422
3,Airplanes (feat. Hayley Williams of Paramore),B.o.B,843380081
4,Nothin' on You (feat. Bruno Mars),B.o.B,684014150
...,...,...,...
95,Hey Daddy (Daddy's Home),USHER,285641365
96,No Hands (feat. Roscoe Dash & Wale),Waka Flocka Flame,715558832
97,We No Speak Americano (Edit),Yolanda Be Cool,158225136
98,BedRock,Young Money,368521660
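
The `streams` column holds whatever text the page element contained, so it needs converting to integers before any numeric analysis. A small sketch, assuming the text is a whole number that may carry thousands separators or surrounding whitespace (the exact page formatting is an assumption); `parse_streams` is a hypothetical helper.

```python
import re

def parse_streams(text):
    """Convert play-count text such as '160,363,103' to an int.

    Returns None when the text is missing or contains no digits.
    Assumes a whole number: every non-digit character is discarded.
    """
    if text is None:
        return None
    digits = re.sub(r"\D", "", str(text))
    return int(digits) if digits else None

print(parse_streams("160,363,103"))
print(parse_streams(""))
```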


In [26]:
browser = webdriver.Chrome()

titles1 = []
artists1 = []  
streams1 = []


for i in range(100,200):
    track_id = apisongs.iloc[i]['track_id']
    url = f"https://open.spotify.com/track/{track_id}"
    print(url)

    browser.get(url)

    total_wait_time = random.uniform(8, 12)  
    random_scroll(browser, total_wait_time)

    title_elements = browser.find_elements(By.CSS_SELECTOR, 'h1.encore-text.encore-text-headline-large[data-encore-id="text"]')
    for title in title_elements:
        titles1.append(title.text)

    artist_elements = browser.find_elements(By.CSS_SELECTOR, 'a[data-testid="creator-link"]')
    for artist_elem in artist_elements:
        artists1.append(artist_elem.text)  

    streams_elements = browser.find_elements(By.CSS_SELECTOR, 'span[data-testid="playcount"]')
    for streams_elem in streams_elements:
        streams1.append(streams_elem.text)  

    print(f"Song {track_id} done, {[i]}")

    time.sleep(random.randint(2,5))

browser.quit()  # end the WebDriver session, not just the current window

print("-"* 30)
print(f"# of Titles: {len(titles1)}") 
print(f"# of Artists: {len(artists1)}")  
print(f"# of Streams: {len(streams1)}")  

https://open.spotify.com/track/73CMRj62VK8nUS4ezD2wvi
Song 73CMRj62VK8nUS4ezD2wvi done, [100]
https://open.spotify.com/track/6HTprulGfeFVrLLvfi3t8a
Song 6HTprulGfeFVrLLvfi3t8a done, [101]
https://open.spotify.com/track/2z4U9d5OAA4YLNXoCgioxo
Song 2z4U9d5OAA4YLNXoCgioxo done, [102]
https://open.spotify.com/track/5zpDHEU12zATwLGvozxPw2
Song 5zpDHEU12zATwLGvozxPw2 done, [103]
https://open.spotify.com/track/1z6WtY7X4HQJvzxC4UgkSf
Song 1z6WtY7X4HQJvzxC4UgkSf done, [104]
https://open.spotify.com/track/3lBRNqXjPp2j3JMTCXDTNO
Song 3lBRNqXjPp2j3JMTCXDTNO done, [105]
https://open.spotify.com/track/4RL77hMWUq35NYnPLXBpih
Song 4RL77hMWUq35NYnPLXBpih done, [106]
https://open.spotify.com/track/3JA9Jsuxr4xgHXEawAdCp4
Song 3JA9Jsuxr4xgHXEawAdCp4 done, [107]
https://open.spotify.com/track/0gY2iq0xJPRoIB1PScKSw4
Song 0gY2iq0xJPRoIB1PScKSw4 done, [108]
https://open.spotify.com/track/35KiiILklye1JRRctaLUb4
Song 35KiiILklye1JRRctaLUb4 done, [109]
https://open.spotify.com/track/5cCAZS9VhLGEDV4NCfieeg
Song 5

In [28]:
chunk1 = pd.DataFrame({'title': titles1, 'artist': artists1, 'streams': streams1})
chunk1.to_csv('chunk1_raw.csv', encoding = 'UTF-8', index=False)
display(chunk1)

Unnamed: 0,title,artist,streams
0,Set Fire to the Rain,Adele,1819493514
1,Mr. Saxobeat - Radio Edit,Alexandra Stan,642966652
2,What the Hell,Avril Lavigne,430331054
3,Lighters,Bad Meets Evil,269488030
4,Love On Top,Beyoncé,679659501
...,...,...,...
95,More - RedOne Jimmy Joker Remix,USHER,207271291
96,No Hands (feat. Roscoe Dash & Wale),Waka Flocka Flame,717551513
97,Black and Yellow,Wiz Khalifa,684219442
98,Roll Up,Wiz Khalifa,167691228
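
The scraping cells in this notebook repeat the same loop over fixed blocks of rows: `range(0,100)`, `range(100,200)`, and so on up to `range(900,990)`. The block boundaries could be generated instead of typed by hand. `chunk_ranges` below is a hypothetical helper, shown only as a sketch of that idea.

```python
def chunk_ranges(total_rows, chunk_size):
    """Yield (start, stop) index pairs that cover range(total_rows) in order."""
    for start in range(0, total_rows, chunk_size):
        yield start, min(start + chunk_size, total_rows)

ranges = list(chunk_ranges(total_rows=990, chunk_size=100))
print(ranges[0], ranges[-1])  # (0, 100) (900, 990)
```

Each pair could then drive one scraping pass and one `chunk{n}_raw.csv` file, so an interrupted run only loses the block in progress.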


In [44]:
browser = webdriver.Chrome()

titles2 = []
artists2 = []  
streams2 = []


for i in range(200,300):
    track_id = apisongs.iloc[i]['track_id']
    url = f"https://open.spotify.com/track/{track_id}"
    print(url)
    
    browser.get(url)
    browser.maximize_window()

    total_wait_time = random.uniform(7, 12)  
    random_scroll(browser, total_wait_time)

    title_elements = browser.find_elements(By.CSS_SELECTOR, 'h1.encore-text.encore-text-headline-large[data-encore-id="text"]')
    for title in title_elements:
        titles2.append(title.text)

    artist_elements = browser.find_elements(By.CSS_SELECTOR, 'a[data-testid="creator-link"]')
    for artist_elem in artist_elements:
        artists2.append(artist_elem.text)  

    streams_elements = browser.find_elements(By.CSS_SELECTOR, 'span[data-testid="playcount"]')
    for streams_elem in streams_elements:
        streams2.append(streams_elem.text)  

    print(f"Song {track_id} done, {[i]}")

    time.sleep(random.randint(2,4))

browser.quit()  # end the WebDriver session, not just the current window

print("-"* 30)
print(f"# of Titles: {len(titles2)}")  
print(f"# of Artists: {len(artists2)}")  
print(f"# of Streams: {len(streams2)}") 

https://open.spotify.com/track/2A73XBDBQgmdXO8VsXPWIs
Song 2A73XBDBQgmdXO8VsXPWIs done, [200]
https://open.spotify.com/track/5pVk15sR3OgIeKBKqG9jWw
Song 5pVk15sR3OgIeKBKqG9jWw done, [201]
https://open.spotify.com/track/4sK96UnGx3NjBaqvfTG2dm
Song 4sK96UnGx3NjBaqvfTG2dm done, [202]
https://open.spotify.com/track/3n69hLUdIsSa1WlRmjMZlW
Song 3n69hLUdIsSa1WlRmjMZlW done, [203]
https://open.spotify.com/track/6c5QZx2v9753q26g1Fvo2F
Song 6c5QZx2v9753q26g1Fvo2F done, [204]
https://open.spotify.com/track/68rcszAg5pbVaXVvR7LFNh
Song 68rcszAg5pbVaXVvR7LFNh done, [205]
https://open.spotify.com/track/5UqCQaDshqbIk3pkhy4Pjg
Song 5UqCQaDshqbIk3pkhy4Pjg done, [206]
https://open.spotify.com/track/06h3McKzmxS8Bx58USHiMq
Song 06h3McKzmxS8Bx58USHiMq done, [207]
https://open.spotify.com/track/7mDKRYiqejoHzP7dQGxLys
Song 7mDKRYiqejoHzP7dQGxLys done, [208]
https://open.spotify.com/track/28GUjBGqZVcAV4PHSYzkj2
Song 28GUjBGqZVcAV4PHSYzkj2 done, [209]
https://open.spotify.com/track/3w3y8KPTfNeOKPiqUTakBh
Song 3

In [48]:
chunk2 = pd.DataFrame({'title': titles2, 'artist': artists2, 'streams': streams2})
chunk2.to_csv('chunk2_raw.csv', encoding = 'UTF-8', index=False)

In [13]:
browser = webdriver.Chrome()

titles3 = []
artists3 = []  
streams3 = []


for i in range(300,400):
    track_id = apisongs.iloc[i]['track_id']
    url = f"https://open.spotify.com/track/{track_id}"
    print(url)

    browser.get(url)
    browser.maximize_window()

    total_wait_time = random.uniform(9, 13)  
    random_scroll(browser, total_wait_time)

    title_elements = browser.find_elements(By.CSS_SELECTOR, 'h1.encore-text.encore-text-headline-large[data-encore-id="text"]')
    for title in title_elements:
        titles3.append(title.text)

    artist_elements = browser.find_elements(By.CSS_SELECTOR, 'a[data-testid="creator-link"]')
    for artist_elem in artist_elements:
        artists3.append(artist_elem.text)  

    streams_elements = browser.find_elements(By.CSS_SELECTOR, 'span[data-testid="playcount"]')
    for streams_elem in streams_elements:
        streams3.append(streams_elem.text)  

    print(f"Song {track_id} done, {[i]}")

    time.sleep(random.randint(5,7)) 

browser.quit()  # end the WebDriver session, not just the current window

print("-"* 30)
print(f"# of Titles: {len(titles3)}") 
print(f"# of Artists: {len(artists3)}")  
print(f"# of Streams: {len(streams3)}")  

https://open.spotify.com/track/086myS9r57YsLbJpU0TgK9
Song 086myS9r57YsLbJpU0TgK9 done, [300]
https://open.spotify.com/track/5FVd6KXrgO9B3JPmC8OPst
Song 5FVd6KXrgO9B3JPmC8OPst done, [301]
https://open.spotify.com/track/06EL94D0TA27Ik0Ke5usbj
Song 06EL94D0TA27Ik0Ke5usbj done, [302]
https://open.spotify.com/track/0nrRP2bk19rLc0orkWPQk2
Song 0nrRP2bk19rLc0orkWPQk2 done, [303]
https://open.spotify.com/track/591nHHHzZl1NLt9PMKpinM
Song 591nHHHzZl1NLt9PMKpinM done, [304]
https://open.spotify.com/track/0qwcGscxUHGZTgq0zcaqk1
Song 0qwcGscxUHGZTgq0zcaqk1 done, [305]
https://open.spotify.com/track/7ueP5u2qkdZbIPN2YA6LR0
Song 7ueP5u2qkdZbIPN2YA6LR0 done, [306]
https://open.spotify.com/track/01XFgRZfZI7oBagNf1Loml
Song 01XFgRZfZI7oBagNf1Loml done, [307]
https://open.spotify.com/track/3gbBpTdY8lnQwqxNCcf795
Song 3gbBpTdY8lnQwqxNCcf795 done, [308]
https://open.spotify.com/track/7BNDyzwDboNRR2wmd7GSew
Song 7BNDyzwDboNRR2wmd7GSew done, [309]
https://open.spotify.com/track/5yIiXdLRE85OBiQmCaUenq
Song 5

In [47]:
chunk3 = pd.DataFrame({'title': titles3, 'artist': artists3, 'streams': streams3})
chunk3.to_csv('chunk3_raw.csv', encoding = 'UTF-8', index=False)

In [27]:
browser = webdriver.Chrome()

titles4 = []
artists4 = []  
streams4 = []


for i in range(400,500):
    track_id = apisongs.iloc[i]['track_id']
    url = f"https://open.spotify.com/track/{track_id}"
    print(url)
    
    browser.get(url)
    browser.maximize_window()

    total_wait_time = random.uniform(10, 13)  
    random_scroll(browser, total_wait_time)

    title_elements = browser.find_elements(By.CSS_SELECTOR, 'h1.encore-text.encore-text-headline-large[data-encore-id="text"]')
    for title in title_elements:
        titles4.append(title.text)

    artist_elements = browser.find_elements(By.CSS_SELECTOR, 'a[data-testid="creator-link"]')
    for artist_elem in artist_elements:
        artists4.append(artist_elem.text)  

    streams_elements = browser.find_elements(By.CSS_SELECTOR, 'span[data-testid="playcount"]')
    for streams_elem in streams_elements:
        streams4.append(streams_elem.text)  
        
    print(f"Song {track_id} done, {[i]}")

    time.sleep(random.randint(6,7))

browser.quit()  # end the WebDriver session, not just the current window

print("-"* 30)
print(f"# of Titles: {len(titles4)}")  
print(f"# of Artists: {len(artists4)}")  
print(f"# of Streams: {len(streams4)}")  

https://open.spotify.com/track/5Hroj5K7vLpIG4FNCRIjbP
Song 5Hroj5K7vLpIG4FNCRIjbP done, [400]
https://open.spotify.com/track/5J4ZkQpzMUFojo1CtAZYpn
Song 5J4ZkQpzMUFojo1CtAZYpn done, [401]
https://open.spotify.com/track/7vS3Y0IKjde7Xg85LWIEdP
Song 7vS3Y0IKjde7Xg85LWIEdP done, [402]
https://open.spotify.com/track/12KUFSHFgT0XCoiSlvdQi4
Song 12KUFSHFgT0XCoiSlvdQi4 done, [403]
https://open.spotify.com/track/4lhqb6JvbHId48OUJGwymk
Song 4lhqb6JvbHId48OUJGwymk done, [404]
https://open.spotify.com/track/3DmW6y7wTEYHJZlLo1r6XJ
Song 3DmW6y7wTEYHJZlLo1r6XJ done, [405]
https://open.spotify.com/track/6jG2YzhxptolDzLHTGLt7S
Song 6jG2YzhxptolDzLHTGLt7S done, [406]
https://open.spotify.com/track/19gEmPjfqSZT0ulDRfjl0m
Song 19gEmPjfqSZT0ulDRfjl0m done, [407]
https://open.spotify.com/track/6YUTL4dYpB9xZO5qExPf05
Song 6YUTL4dYpB9xZO5qExPf05 done, [408]
https://open.spotify.com/track/07nH4ifBxUB4lZcsf44Brn
Song 07nH4ifBxUB4lZcsf44Brn done, [409]
https://open.spotify.com/track/4J7CKHCF3mdL4diUsmW8lq
Song 4

In [49]:
chunk4 = pd.DataFrame({'title': titles4, 'artist': artists4, 'streams': streams4})
chunk4.to_csv('chunk4_raw.csv', encoding = 'UTF-8', index=False)

In [31]:
browser = webdriver.Chrome()

titles5 = []
artists5 = [] 
streams5 = []


for i in range(500,600):
    track_id = apisongs.iloc[i]['track_id']
    url = f"https://open.spotify.com/track/{track_id}"
    print(url)

    browser.get(url)
    browser.maximize_window()

    total_wait_time = random.uniform(11, 14)  
    random_scroll(browser, total_wait_time)

    title_elements = browser.find_elements(By.CSS_SELECTOR, 'h1.encore-text.encore-text-headline-large[data-encore-id="text"]')
    for title in title_elements:
        titles5.append(title.text)

    artist_elements = browser.find_elements(By.CSS_SELECTOR, 'a[data-testid="creator-link"]')
    for artist_elem in artist_elements:
        artists5.append(artist_elem.text)  

    streams_elements = browser.find_elements(By.CSS_SELECTOR, 'span[data-testid="playcount"]')
    for streams_elem in streams_elements:
        streams5.append(streams_elem.text)  

    print(f"Song {track_id} done, {[i]}")

    time.sleep(random.randint(6,8))

browser.quit()  # end the WebDriver session, not just the current window

print("-"* 30)
print(f"# of Titles: {len(titles5)}")  
print(f"# of Artists: {len(artists5)}") 
print(f"# of Streams: {len(streams5)}")  

https://open.spotify.com/track/2GiJYvgVaD2HtM8GqD9EgQ
Song 2GiJYvgVaD2HtM8GqD9EgQ done, [500]
https://open.spotify.com/track/3pXF1nA74528Edde4of9CC
Song 3pXF1nA74528Edde4of9CC done, [501]
https://open.spotify.com/track/7MmG8p0F9N3C4AXdK6o6Eb
Song 7MmG8p0F9N3C4AXdK6o6Eb done, [502]
https://open.spotify.com/track/22mek4IiqubGD9ctzxc69s
Song 22mek4IiqubGD9ctzxc69s done, [503]
https://open.spotify.com/track/0k6DnZMLoEUH8NGD5zh2SE
Song 0k6DnZMLoEUH8NGD5zh2SE done, [504]
https://open.spotify.com/track/3uwnnTQcHM1rDqSfA4gQNz
Song 3uwnnTQcHM1rDqSfA4gQNz done, [505]
https://open.spotify.com/track/3DXXKDbbZKyAZfNb96ST3q
Song 3DXXKDbbZKyAZfNb96ST3q done, [506]
https://open.spotify.com/track/3MOECVkNshqHYTPt5DZcdN
Song 3MOECVkNshqHYTPt5DZcdN done, [507]
https://open.spotify.com/track/78EQ5LZGgviMU9k0zrqv1r
Song 78EQ5LZGgviMU9k0zrqv1r done, [508]
https://open.spotify.com/track/57kR5SniQIbsbVoIjjOUDa
Song 57kR5SniQIbsbVoIjjOUDa done, [509]
https://open.spotify.com/track/0wwPcA6wtMf6HUMpIRdeP7
Song 0

In [51]:
chunk5 = pd.DataFrame({'title': titles5, 'artist': artists5, 'streams': streams5})
chunk5.to_csv('chunk5_raw.csv', encoding = 'UTF-8', index=False)

In [35]:
browser = webdriver.Chrome()

titles6 = []
artists6 = []  
streams6 = []


for i in range(600,700):
    track_id = apisongs.iloc[i]['track_id']
    url = f"https://open.spotify.com/track/{track_id}"
    print(url)

    browser.get(url)
    browser.maximize_window()

    total_wait_time = random.uniform(11, 14)  
    random_scroll(browser, total_wait_time)

    title_elements = browser.find_elements(By.CSS_SELECTOR, 'h1.encore-text.encore-text-headline-large[data-encore-id="text"]')
    for title in title_elements:
        titles6.append(title.text)

    artist_elements = browser.find_elements(By.CSS_SELECTOR, 'a[data-testid="creator-link"]')
    for artist_elem in artist_elements:
        artists6.append(artist_elem.text) 

    streams_elements = browser.find_elements(By.CSS_SELECTOR, 'span[data-testid="playcount"]')
    for streams_elem in streams_elements:
        streams6.append(streams_elem.text)  

    print(f"Song {track_id} done, {[i]}")

    time.sleep(random.randint(6,8))

browser.quit()  # end the WebDriver session, not just the current window

print("-"* 30)
print(f"# of Titles: {len(titles6)}")  
print(f"# of Artists: {len(artists6)}")  
print(f"# of Streams: {len(streams6)}")  

https://open.spotify.com/track/46u5B2WN4wryYLZuMAOmI4
Song 46u5B2WN4wryYLZuMAOmI4 done, [600]
https://open.spotify.com/track/1oxOiOjsi7plNOZEhoPLPj
Song 1oxOiOjsi7plNOZEhoPLPj done, [601]
https://open.spotify.com/track/0lnIJmgcUpEpe4AZACjayW
Song 0lnIJmgcUpEpe4AZACjayW done, [602]
https://open.spotify.com/track/3pXF1nA74528Edde4of9CC
Song 3pXF1nA74528Edde4of9CC done, [603]
https://open.spotify.com/track/43PuMrRfbyyuz4QpZ3oAwN
Song 43PuMrRfbyyuz4QpZ3oAwN done, [604]
https://open.spotify.com/track/2BOqDYLOJBiMOXShCV1neZ
Song 2BOqDYLOJBiMOXShCV1neZ done, [605]
https://open.spotify.com/track/1vvNmPOiUuyCbgWmtc6yfm
Song 1vvNmPOiUuyCbgWmtc6yfm done, [606]
https://open.spotify.com/track/37FXw5QGFN7uwwsLy8uAc0
Song 37FXw5QGFN7uwwsLy8uAc0 done, [607]
https://open.spotify.com/track/7soJgKhQTO8hLP2JPRkL5O
Song 7soJgKhQTO8hLP2JPRkL5O done, [608]
https://open.spotify.com/track/6GBMbvX7sqyOxT5wWK4hgN
Song 6GBMbvX7sqyOxT5wWK4hgN done, [609]
https://open.spotify.com/track/4CGGIk81BvfCZiscwFP6t0
Song 4

In [53]:
chunk6 = pd.DataFrame({'title': titles6, 'artist': artists6, 'streams': streams6})
chunk6.to_csv('chunk6_raw.csv', encoding = 'UTF-8', index=False)

In [41]:
browser = webdriver.Chrome()

titles7 = []
artists7 = []  
streams7 = []


for i in range(700,800):
    track_id = apisongs.iloc[i]['track_id']
    url = f"https://open.spotify.com/track/{track_id}"
    print(url)

    browser.get(url)
    browser.maximize_window()

    total_wait_time = random.uniform(11, 14)  
    random_scroll(browser, total_wait_time)

    title_elements = browser.find_elements(By.CSS_SELECTOR, 'h1.encore-text.encore-text-headline-large[data-encore-id="text"]')
    for title in title_elements:
        titles7.append(title.text)

    artist_elements = browser.find_elements(By.CSS_SELECTOR, 'a[data-testid="creator-link"]')
    for artist_elem in artist_elements:
        artists7.append(artist_elem.text) 

    streams_elements = browser.find_elements(By.CSS_SELECTOR, 'span[data-testid="playcount"]')
    for streams_elem in streams_elements:
        streams7.append(streams_elem.text)  

    print(f"Song {track_id} done, {[i]}")

    time.sleep(random.randint(6,8))

browser.quit()  # end the WebDriver session, not just the current window

print("-"* 30)
print(f"# of Titles: {len(titles7)}")  
print(f"# of Artists: {len(artists7)}")  
print(f"# of Streams: {len(streams7)}")  

https://open.spotify.com/track/6X5OFBbrsHRsyO1zP7udgr
Song 6X5OFBbrsHRsyO1zP7udgr done, [700]
https://open.spotify.com/track/04sN26COy28wTXYj3dMoiZ
Song 04sN26COy28wTXYj3dMoiZ done, [701]
https://open.spotify.com/track/6Se3x9ANMLv0dCDsjGmEjh
Song 6Se3x9ANMLv0dCDsjGmEjh done, [702]
https://open.spotify.com/track/0KKkJNfGyhkQ5aFogxQAPU
Song 0KKkJNfGyhkQ5aFogxQAPU done, [703]
https://open.spotify.com/track/6b8Be6ljOzmkOmFslEb23P
Song 6b8Be6ljOzmkOmFslEb23P done, [704]
https://open.spotify.com/track/5bcTCxgc7xVfSaMV3RuVke
Song 5bcTCxgc7xVfSaMV3RuVke done, [705]
https://open.spotify.com/track/7tr2za8SQg2CI8EDgrdtNl
Song 7tr2za8SQg2CI8EDgrdtNl done, [706]
https://open.spotify.com/track/1rfofaqEpACxVEHIZBJe6W
Song 1rfofaqEpACxVEHIZBJe6W done, [707]
https://open.spotify.com/track/6KBYefIoo7KydImq1uUQlL
Song 6KBYefIoo7KydImq1uUQlL done, [708]
https://open.spotify.com/track/5lNuqFVMca4vPupY10cH0J
Song 5lNuqFVMca4vPupY10cH0J done, [709]
https://open.spotify.com/track/3vQ4T78TTMOjQXGfXVKQJo
Song 3

In [55]:
chunk7 = pd.DataFrame({'title': titles7, 'artist': artists7, 'streams': streams7})
chunk7.to_csv('chunk7_raw.csv', encoding = 'UTF-8', index=False)

In [25]:
browser = webdriver.Chrome()

titles8 = []
artists8 = []  
streams8 = []


for i in range(800,900):
    track_id = apisongs.iloc[i]['track_id']
    url = f"https://open.spotify.com/track/{track_id}"
    print(url)

    browser.get(url)
    browser.maximize_window()

    total_wait_time = random.uniform(15, 20)  
    random_scroll(browser, total_wait_time)

    title_elements = browser.find_elements(By.CSS_SELECTOR, 'h1.encore-text.encore-text-headline-large[data-encore-id="text"]')
    for title in title_elements:
        titles8.append(title.text)

    artist_elements = browser.find_elements(By.CSS_SELECTOR, 'a[data-testid="creator-link"]')
    for artist_elem in artist_elements:
        artists8.append(artist_elem.text)  

    streams_elements = browser.find_elements(By.CSS_SELECTOR, 'span[data-testid="playcount"]')
    for streams_elem in streams_elements:
        streams8.append(streams_elem.text)  

    print(f"Song {track_id} done, {[i]}")

    time.sleep(random.randint(6,8))

browser.quit()  # end the WebDriver session, not just the current window

print("-"* 30)
print(f"# of Titles: {len(titles8)}")  
print(f"# of Artists: {len(artists8)}")  
print(f"# of Streams: {len(streams8)}") 

https://open.spotify.com/track/0u2P5u6lvoDfwTYjAADbn4
Song 0u2P5u6lvoDfwTYjAADbn4 done, [800]
https://open.spotify.com/track/6HJ34Zyw6bg8yGm28AxLXf
Song 6HJ34Zyw6bg8yGm28AxLXf done, [801]
https://open.spotify.com/track/3Vo4wInECJQuz9BIBMOu8i
Song 3Vo4wInECJQuz9BIBMOu8i done, [802]
https://open.spotify.com/track/7ef4DlsgrMEH11cDZd32M6
Song 7ef4DlsgrMEH11cDZd32M6 done, [803]
https://open.spotify.com/track/4eWQlBRaTjPPUlzacqEeoQ
Song 4eWQlBRaTjPPUlzacqEeoQ done, [804]
https://open.spotify.com/track/58q2HKrzhC3ozto2nDdN4z
Song 58q2HKrzhC3ozto2nDdN4z done, [805]
https://open.spotify.com/track/6KBYefIoo7KydImq1uUQlL
Song 6KBYefIoo7KydImq1uUQlL done, [806]
https://open.spotify.com/track/2Yl4OmDby9iitgNWZPwxkd
Song 2Yl4OmDby9iitgNWZPwxkd done, [807]
https://open.spotify.com/track/7HdZY9UJTylIiNcSDFyUDc
Song 7HdZY9UJTylIiNcSDFyUDc done, [808]
https://open.spotify.com/track/6wmAHw1szh5RCKSRjiXhPe
Song 6wmAHw1szh5RCKSRjiXhPe done, [809]
https://open.spotify.com/track/1lsBTdE6MGsKeZCD6llNu7
Song 1

In [27]:
chunk8 = pd.DataFrame({'title': titles8, 'artist': artists8, 'streams': streams8})
chunk8.to_csv('chunk8_raw.csv', encoding = 'UTF-8', index=False)

In [40]:
browser = webdriver.Chrome()

titles9 = []
artists9 = []  
streams9 = []

for i in range(900,990):
    track_id = apisongs.iloc[i]['track_id']
    url = f"https://open.spotify.com/track/{track_id}"
    print(url)

    browser.get(url)
    browser.maximize_window()

    total_wait_time = random.uniform(11,14)  
    random_scroll(browser, total_wait_time)

    title_elements = browser.find_elements(By.CSS_SELECTOR, 'h1.encore-text.encore-text-headline-large[data-encore-id="text"]')
    for title in title_elements:
        titles9.append(title.text)

    artist_elements = browser.find_elements(By.CSS_SELECTOR, 'a[data-testid="creator-link"]')
    for artist_elem in artist_elements:
        artists9.append(artist_elem.text) 

    streams_elements = browser.find_elements(By.CSS_SELECTOR, 'span[data-testid="playcount"]')
    for streams_elem in streams_elements:
        streams9.append(streams_elem.text)  

    print(f"Song {track_id} done, {[i]}")

    time.sleep(random.randint(6,8))

browser.quit()  # end the WebDriver session, not just the current window

print("-"* 30)
print(f"# of Titles: {len(titles9)}") 
print(f"# of Artists: {len(artists9)}")  
print(f"# of Streams: {len(streams9)}")  

https://open.spotify.com/track/4kV4N9D1iKVxx1KLvtTpjS
Song 4kV4N9D1iKVxx1KLvtTpjS done, [900]
https://open.spotify.com/track/0Ryd8975WihbObpp5cPW1t
Song 0Ryd8975WihbObpp5cPW1t done, [901]
https://open.spotify.com/track/56JyMaElW79S7TDWh1Zw1m
Song 56JyMaElW79S7TDWh1Zw1m done, [902]
https://open.spotify.com/track/2TH65lNHgvLxCKXM3apjxI
Song 2TH65lNHgvLxCKXM3apjxI done, [903]
https://open.spotify.com/track/3Ueq2zboxwAbsvHrOjdEqz
Song 3Ueq2zboxwAbsvHrOjdEqz done, [904]
https://open.spotify.com/track/0u2P5u6lvoDfwTYjAADbn4
Song 0u2P5u6lvoDfwTYjAADbn4 done, [905]
https://open.spotify.com/track/2Fxmhks0bxGSBdJ92vM42m
Song 2Fxmhks0bxGSBdJ92vM42m done, [906]
https://open.spotify.com/track/43zdsphuZLzwA9k4DJhU0I
Song 43zdsphuZLzwA9k4DJhU0I done, [907]
https://open.spotify.com/track/4SSnFejRGlZikf02HLewEF
Song 4SSnFejRGlZikf02HLewEF done, [908]
https://open.spotify.com/track/6hvczQ05jc1yGlp9zhb95V
Song 6hvczQ05jc1yGlp9zhb95V done, [909]
https://open.spotify.com/track/01tA4XmJ4fGQNwti6b2hPm
Song 0

In [46]:
chunk9 = pd.DataFrame({'title': titles9, 'artist': artists9, 'streams': streams9})
display(chunk9)
chunk9.to_csv('chunk9_raw.csv', encoding = 'UTF-8', index=False)

Unnamed: 0,title,artist,streams
0,"break up with your girlfriend, i'm bored",Ariana Grande,1016355182
1,boyfriend (with Social House),Ariana Grande,877976080
2,On A Roll,Ashley O,119064618
3,Callaita,Bad Bunny,1534744486
4,MIA (feat. Drake),Bad Bunny,1374414877
...,...,...,...
85,SICKO MODE,Travis Scott,2260795689
86,EARFQUAKE,"Tyler, The Creator",1092188072
87,Boasty (feat. Idris Elba),Wiley,171625826
88,Strike a Pose (feat. Aitch),Young T & Bugsey,133345785


In [58]:
raw0 = pd.read_csv("chunk_raw.csv", encoding = 'utf-8')
raw1 = pd.read_csv("chunk1_raw.csv", encoding = 'utf-8')
raw2 = pd.read_csv("chunk2_raw.csv", encoding = 'utf-8')
raw3 = pd.read_csv("chunk3_raw.csv", encoding = 'utf-8')
raw4 = pd.read_csv("chunk4_raw.csv", encoding = 'utf-8')
raw5 = pd.read_csv("chunk5_raw.csv", encoding = 'utf-8')
raw6 = pd.read_csv("chunk6_raw.csv", encoding = 'utf-8')
raw7 = pd.read_csv("chunk7_raw.csv", encoding = 'utf-8')
raw8 = pd.read_csv("chunk8_raw.csv", encoding = 'utf-8')
raw9 = pd.read_csv("chunk9_raw.csv", encoding = 'utf-8')

spotify = pd.concat([raw0, raw1, raw2, raw3, raw4, raw5, raw6, raw7, raw8, raw9], ignore_index=True)
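# drop stray unnamed columns if a chunk file contains them; errors="ignore" skips any that are absent
spotify = spotify.drop(columns=["Unnamed: 3", "Unnamed: 4"], errors="ignore")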
display(spotify)

Unnamed: 0,title,artist,streams
0,STARSTRUKK (feat. Katy Perry),3OH!3,160363103
1,My First Kiss (feat. Ke$ha),3OH!3,126408708
2,I Need A Dollar,Aloe Blacc,282228422
3,Airplanes (feat. Hayley Williams of Paramore),B.o.B,843380081
4,Nothin' on You (feat. Bruno Mars),B.o.B,684014150
...,...,...,...
985,SICKO MODE,Travis Scott,2260795689
986,EARFQUAKE,"Tyler, The Creator",1092188072
987,Boasty (feat. Idris Elba),Wiley,171625826
988,Strike a Pose (feat. Aitch),Young T & Bugsey,133345785


In [60]:
tophits.to_csv('top1000songs_2010s_raw.csv', encoding='UTF-8', index= False)
apisongs.to_csv('apitopsongs_2010s_raw.csv', encoding = 'UTF-8', index = False)
spotify.to_csv('topsongstreams_raw.csv', encoding='UTF-8', index = False)
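
The scrape logs show some track IDs more than once: `3pXF1nA74528Edde4of9CC` at indices 501 and 603, `6KBYefIoo7KydImq1uUQlL` at 708 and 806, and `0u2P5u6lvoDfwTYjAADbn4` at 800 and 905. Each repeat costs another page load for a play count that was already collected. A sketch of one way to avoid that, assuming repeated IDs can share a single scraped result; `unique_in_order` is a hypothetical helper.

```python
def unique_in_order(track_ids):
    """Drop repeated IDs, keeping the first occurrence of each in its original position."""
    return list(dict.fromkeys(track_ids))

ids = [
    "3pXF1nA74528Edde4of9CC",
    "7MmG8p0F9N3C4AXdK6o6Eb",
    "3pXF1nA74528Edde4of9CC",  # repeat of the first ID
]
to_scrape = unique_in_order(ids)
print(to_scrape)
```

The scraped counts could then be joined back to `apisongs` on `track_id`, so every row, including the repeats, still receives its stream count.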