# This section of the code scrapes the weekly Spotify Viral Top 50 page and downloads the data as CSV files to a directory of your choice.

Be advised that this script requires ChromeDriver, available to download at this link (https://chromedriver.chromium.org/downloads). ChromeDriver is the executable that lets Selenium control the Chrome browser automatically, and its major version must match the version of Chrome installed on your machine.
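Before running anything, it can help to confirm that the driver is where the code expects it. The sketch below is a hypothetical helper; `locate_chromedriver` is not part of Selenium. It checks an explicit path first and otherwise searches `PATH` with `shutil.which`. Recent Selenium 4 releases (4.6 and later) can also find or download a matching driver through Selenium Manager, in which case no path is needed at all.

```python
import os
import shutil


def locate_chromedriver(explicit_path=None):
    """Return a usable ChromeDriver path or raise FileNotFoundError.

    Hypothetical helper: checks an explicit path first, then falls back
    to searching the PATH environment variable for 'chromedriver'.
    """
    if explicit_path is not None:
        if os.path.isfile(explicit_path):
            return explicit_path
        raise FileNotFoundError(f"No ChromeDriver found at {explicit_path!r}")
    # shutil.which returns None when the executable is not on PATH
    # (on Windows it also matches chromedriver.exe via PATHEXT)
    found = shutil.which("chromedriver")
    if found is None:
        raise FileNotFoundError("chromedriver is not on PATH; pass its full path instead")
    return found
```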

#### The weekly Spotify Viral Top 50 was chosen for this analysis because it features fresh new songs. The Top 200 charts are less useful here, because songs on them usually take months to change position. You can find the latest weekly Spotify Viral Top 50 page here (https://spotifycharts.com/viral/global/weekly).

In [None]:
# Set up the packages required for the webdriver to function
# The main package of interest here is selenium (this code targets the Selenium 4 API)

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import time

In [None]:
# Define a function that makes a headless Chrome instance "click" the "Download to CSV" link on one chart page

def headless_download(url, download_dir, chromedriver_dir):
    # ChromeOptions object
    op = Options()

    # Set the download directory path and suppress the download prompt
    op.add_experimental_option("prefs", {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing_for_trusted_sources_enabled": False,
        "safebrowsing.enabled": False
    })
    op.add_argument("--headless")
    op.add_argument("--window-size=1920,1080")  # Chrome expects width,height separated by a comma
    op.add_argument("--disable-notifications")
    op.add_argument("--no-sandbox")
    op.add_argument("--verbose")

    # Initialize the driver, pointing Selenium at your chromedriver executable
    # (Selenium 4 takes the path through a Service object; executable_path= was removed in later 4.x releases)
    driver = webdriver.Chrome(service=Service(chromedriver_dir), options=op)
    try:
        driver.implicitly_wait(0.4)
        driver.get(url)

        # Identify the element to click on (a CSS selector was chosen here);
        # find_element_by_css_selector() no longer exists in current Selenium 4
        m = driver.find_element(By.CSS_SELECTOR, '.header-csv')
        m.click()

        # Give the file some time to download before shutting the browser down
        time.sleep(5)
    finally:
        # quit() ends the whole browser session, even if an error occurred above
        driver.quit()

    print('Downloaded one Spotify Viral Top 50 CSV')
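The fixed `time.sleep(5)` above assumes every CSV arrives within five seconds. A more robust alternative polls the download directory until a new `.csv` file appears. This sketch assumes Chrome's usual behaviour of writing an in-progress download under a `.crdownload` suffix. `wait_for_new_csv` is a hypothetical helper, not part of Selenium.

```python
import os
import time


def wait_for_new_csv(download_dir, before, timeout=30.0, poll=0.5):
    """Poll download_dir until a new, fully written .csv file appears.

    Hypothetical helper. Assumes Chrome writes in-progress downloads with a
    '.crdownload' suffix. `before` is the set of file names present before
    the click. Returns the new file's path, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = set(os.listdir(download_dir))
        in_progress = any(n.endswith(".crdownload") for n in names)
        new_csvs = sorted(n for n in names - before if n.endswith(".csv"))
        if new_csvs and not in_progress:
            return os.path.join(download_dir, new_csvs[0])
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

To use it, record `before = set(os.listdir(download_dir))` just before `m.click()`. Then call `wait_for_new_csv(download_dir, before)` in place of `time.sleep(5)`. A `None` result means nothing arrived within the timeout.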

#### Once the web-scraping function has been created, it can be called repeatedly in a loop, not just for a single page at a time. Before building that loop, we need to generate the URLs automatically from the dates used by the weekly Spotify Viral Top 50 pages.

Be advised that Spotify releases the Viral Top 50 every week on Wednesday, so the start date for the automatic date generation must be a Wednesday; otherwise, the URLs will not work. Check the default start date before running: 2 January 2020 fell on a Thursday.
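Rather than picking the start date by hand, you can snap it to a Wednesday programmatically. This is a minimal sketch that assumes Wednesday is the required weekday, as stated above; `previous_wednesday` is a hypothetical helper. Python's `date.weekday()` numbers Monday as 0, so Wednesday is 2, and the modulo arithmetic walks back between 0 and 6 days.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday(): Monday == 0 ... Sunday == 6


def previous_wednesday(d):
    """Return d itself if it is a Wednesday, else the most recent Wednesday before it."""
    return d - timedelta(days=(d.weekday() - WEDNESDAY) % 7)
```

For example, `previous_wednesday(date(2020, 1, 2))` returns `date(2020, 1, 1)`. If the chart URLs turn out to use a different weekday, only the `WEDNESDAY` constant needs to change.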

In [None]:
from datetime import timedelta, date

In [None]:
# Define a generator that yields every date between two dates, inclusive (for the Spotify URLs, start on a Wednesday)

def daterange(date1, date2):
    for n in range((date2 - date1).days + 1):
        yield date1 + timedelta(days=n)

#### The function that produces the Wednesday of every week has default start and end dates covering 2020, the scope of this analysis, but it can be called with other dates.

In [None]:
# Define a function that takes the date range and keeps only every 7th date, i.e. the same weekday as start_dt in every week

def get_wednesday_every_week(start_dt=date(2020, 1, 2), end_dt=date(2020, 12, 31)):
    list_dates = []
    for dt in daterange(start_dt, end_dt):
        list_dates.append(dt.strftime("%Y-%m-%d"))
    list_dates = list_dates[::7]
    return list_dates
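Building a list of every day and slicing it with `[::7]` works, but the same dates can be produced by stepping forward one week at a time. Here is a sketch of that alternative; `weekly_dates` is a hypothetical name. It returns the same strings as `get_wednesday_every_week()` for the same arguments.

```python
from datetime import date, timedelta


def weekly_dates(start_dt=date(2020, 1, 2), end_dt=date(2020, 12, 31)):
    """Return 'YYYY-MM-DD' strings for start_dt and every 7th day after it, up to end_dt inclusive.

    Hypothetical alternative to get_wednesday_every_week(): it steps a week at a
    time instead of building a list of every day and slicing it.
    """
    dates = []
    current = start_dt
    while current <= end_dt:
        dates.append(current.isoformat())  # same format as strftime("%Y-%m-%d")
        current += timedelta(weeks=1)
    return dates
```

With the 2020 defaults, this yields 53 dates, from `2020-01-02` to `2020-12-31`.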

#### Based on the structure of the chart URLs, the generated dates are inserted into a list of URLs matching the format used by the weekly Spotify Viral Top 50 pages.

In [3]:
# Generate one chart URL per week from the dates produced above

def produce_spotify_urls():
    temp = get_wednesday_every_week()
    spotify_urls = []
    for wednesday in temp:
        spotify_urls.append('https://spotifycharts.com/viral/global/weekly/{0}--{0}'.format(wednesday))
        print('Will download Spotify Viral Top 50 data for the week of {}'.format(wednesday))
    return spotify_urls
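`produce_spotify_urls()` always uses the default 2020 range. A hypothetical variant that accepts the start and end dates as arguments could look like this. It reuses the exact URL pattern from the cell above; `build_viral_urls` and `BASE_URL` are illustrative names.

```python
from datetime import date, timedelta

# URL pattern copied from produce_spotify_urls(): the same date appears on both sides of '--'
BASE_URL = "https://spotifycharts.com/viral/global/weekly/{0}--{0}"


def build_viral_urls(start_dt, end_dt):
    """Hypothetical variant of produce_spotify_urls() that takes the date range as arguments."""
    urls = []
    current = start_dt
    while current <= end_dt:
        urls.append(BASE_URL.format(current.isoformat()))
        current += timedelta(weeks=1)
    return urls
```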

#### Create the final function, which takes the download directory and the path to the ChromeDriver executable and downloads the CSV from every Spotify Viral Top 50 URL into the specified directory.

In [None]:
# Final function that takes in the paths. Be sure to change these defaults when running on your machine!
# Raw strings (r"...") keep Windows backslashes from being read as escape sequences

def download_spotify(download_dir=r"C:\Users\cosmi\Downloads", chromedriver_dir="C:/Users/cosmi/Desktop/chromedriver.exe"):
    spotify_urls = produce_spotify_urls()
    for url in spotify_urls:
        headless_download(url, download_dir, chromedriver_dir)
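In `download_spotify()`, a single failed page load raises an exception and stops the whole loop. Over a full year of pages, that is costly. This sketch of a retry wrapper retries each URL a few times and collects the ones that never succeeded; `download_all` is a hypothetical helper.

```python
import time


def download_all(urls, download_one, retries=2, delay=2.0):
    """Call download_one(url) for each URL, retrying failures; return the URLs that never succeeded.

    Hypothetical helper. download_one is any callable taking a URL, e.g. a
    lambda wrapping headless_download(url, download_dir, chromedriver_dir).
    """
    failed = []
    for url in urls:
        for attempt in range(retries + 1):
            try:
                download_one(url)
                break  # success: move on to the next URL
            except Exception as exc:  # a page load can fail with several exception types
                print(f"Attempt {attempt + 1} failed for {url}: {exc}")
                if attempt < retries:
                    time.sleep(delay)
        else:
            failed.append(url)  # the loop ended without break: every attempt failed
    return failed
```

For example: `failed = download_all(produce_spotify_urls(), lambda url: headless_download(url, download_dir, chromedriver_dir))`. Catching the broad `Exception` is a deliberate choice here, since a page load can fail with several different Selenium exception types.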

In [4]:
# Run the full download
download_spotify()
# About 30 seconds for 3 weeks (~10 s per page, including the 5 s sleep), so all 53 weeks of 2020 take roughly 9 minutes

Will download Spotify Viral Top 50 data for the week of 2020-01-02
Will download Spotify Viral Top 50 data for the week of 2020-01-09
Will download Spotify Viral Top 50 data for the week of 2020-01-16
Will download Spotify Viral Top 50 data for the week of 2020-01-23
Will download Spotify Viral Top 50 data for the week of 2020-01-30
Will download Spotify Viral Top 50 data for the week of 2020-02-06
Will download Spotify Viral Top 50 data for the week of 2020-02-13
Will download Spotify Viral Top 50 data for the week of 2020-02-20
Will download Spotify Viral Top 50 data for the week of 2020-02-27
Will download Spotify Viral Top 50 data for the week of 2020-03-05
Will download Spotify Viral Top 50 data for the week of 2020-03-12
Will download Spotify Viral Top 50 data for the week of 2020-03-19
Will download Spotify Viral Top 50 data for the week of 2020-03-26
Will download Spotify Viral Top 50 data for the week of 2020-04-02
Will download Spotify Viral Top 50 data for the week of 2020-0