Analyzing Music Trends From the Past Year

Explanation of Motivation & Provide Resources

Motivation: each tutorial should be sufficiently motivated. If there is no motivation for the analysis, why would we ‘do data science’ on this topic?

Resources: tutorials should help the reader learn a skill, but they should also provide a launching pad for the reader to further develop that skill. The tutorial should link to additional resources wherever appropriate, so that a well-motivated reader can read further on techniques that have been used in the tutorial.

Phase 1: Data Collection, Curation, and Parsing

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import requests as rq
from bs4 import BeautifulSoup as bs
import datetime as dt
import re

# Helper Functions
def format_datetime(date):
    # Format a date as YYYY-MM-DD, the form used in the Billboard chart URLs below
    return date.strftime("%Y-%m-%d")

def format_artist(artists):
    form1 = re.sub("&amp;", "&", artists)
    form2 = re.sub("Featuring", "ft.", form1)
    return form2

def remove_tags(tag, string):
    # Raw strings keep "\s" as a regex escape instead of an invalid Python string escape
    tag1 = r"<" + tag + r".*?>\s*"
    tag2 = r"\s*</" + tag + r".*?>"
    return re.sub(tag2, "", re.sub(tag1, "", string))

def scrape_billboard(start_date, end_date, page):
    info_list = []
    date = start_date
    while date <= end_date:
        billboard_url = "https://www.billboard.com/"  + page + format_datetime(date) + "/"
        # Name the parser explicitly so BeautifulSoup does not pick one per environment
        soup = bs(rq.get(billboard_url).content, "html.parser")
        charts = soup.find_all("div", class_=re.compile("o-chart-results-list-row-container"))
        for entry in charts:
            rank = remove_tags("span", str(entry.find("span", class_=re.compile("c-label a-font-primary-bold-l"))))
            title = remove_tags("h3", str(entry.find("h3", class_=re.compile("c-title"))))
            artist = remove_tags("span", str(entry.find("span", class_=re.compile("c-label a-no-trucate"))))
            # Handle Multiple Artists
            artist = format_artist(artist)
            data = {"Week": date, "Rank": rank, "Title": title, "Artist": artist}
            info_list.append(data)
        date += dt.timedelta(days = 7)
    return pd.DataFrame(info_list)

def scrape_azlyrics():
    return 1

def scrape_hooktheory():
    # Get Chord and Melody Metrics as defined by Hook Theory
    url = "https://www.hooktheory.com/theorytab/view/mariah-carey/all-i-want-for-christmas-is-you"
    soup = bs(rq.get(url).content, "html.parser")
    # prettify is a method; call it to print the formatted HTML
    print(soup.prettify())
    return 1
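
The date loop inside scrape_billboard can be checked without touching the network. The sketch below is not part of the original notebook: weekly_chart_urls is a hypothetical helper that only builds the URLs the scraper would request, assuming the same base URL and weekly step used above.

```python
import datetime as dt

def weekly_chart_urls(start_date, end_date, page="charts/hot-100/"):
    """Return one chart URL per week from start_date through end_date (inclusive)."""
    # Hypothetical helper for illustration; mirrors the loop in scrape_billboard.
    urls = []
    date = start_date
    while date <= end_date:
        # For dt.date objects, isoformat() yields YYYY-MM-DD
        urls.append("https://www.billboard.com/" + page + date.isoformat() + "/")
        date += dt.timedelta(days=7)
    return urls

urls = weekly_chart_urls(dt.date(2022, 5, 14), dt.date(2022, 5, 28))
for url in urls:
    print(url)
```

Listing the URLs first makes it easy to confirm how many requests a date range will trigger before running the full scrape.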

In [None]:
df = scrape_billboard(dt.date(2022, 5, 14), dt.date.today(), "charts/hot-100/")
# Write without the index so the file reads back without an extra column to drop
df.to_csv("csv/billboard_data.csv", index=False)
billboard_data = pd.read_csv("csv/billboard_data.csv")
billboard_data.head(100)

In [142]:
# pip install spotipy
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import config

authentication = SpotifyClientCredentials(client_id = config.cid, client_secret = config.csecret)
sp = spotipy.Spotify(client_credentials_manager = authentication)

q = "track:{} artist:{}".format("Playing God", "Polyphia")
spotify_id = sp.search(q, type='track', limit=1)['tracks']['items'][0]['id']
print(spotify_id)

3nBGFgfRQ8ujSmu5cGlZIU
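
To run the same lookup for every scraped chart row, the query string has to be built from the Title and Artist columns. The sketch below is one possible approach rather than the notebook's method: lead_artist and build_track_query are hypothetical helpers, and keeping only the text before "ft." is an assumption about what matches best, not something verified here. The artist names in the usage lines are placeholders.

```python
import re

def lead_artist(artist):
    """Return the text before the 'ft.' marker that format_artist produces."""
    # Assumption: searching on the lead artist alone is enough to find the track.
    return re.split(r"\s+ft\.\s+", artist, maxsplit=1)[0].strip()

def build_track_query(title, artist):
    """Build a query in the same 'track:... artist:...' form used in the cell above."""
    return "track:{} artist:{}".format(title.strip(), lead_artist(artist))

print(build_track_query("Playing God", "Polyphia"))
print(build_track_query("Some Title", "Artist A ft. Artist B"))
```

The resulting string can be passed as q to sp.search in place of the hard-coded example. A search may return no items, so checking that the result list is non-empty before indexing [0] avoids an IndexError.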


Phase 2: Data Management and Representation

Phase 3: Exploratory Data Analysis

Phase 4: Hypothesis Testing

Phase 5: Communication of Insights Attained

Understanding: the reader of the tutorial should walk away with some new understanding of the topic at hand. If it’s not possible for a reader to state ‘what they learned’ from reading your tutorial, then why do the analysis?