
# Trends in Music Post 2000s: What Makes Popular Music Popular?

###### Last updated on May 15th, 2022 by Matthew Lynch

## “Pop” music vs “Popular” music?

When people use the term “*pop music*”, they could be referring either to popular music (music that is popular or trending) or to a more specific genre distinct from metal, jazz, rap, indie, and other established genres. What constitutes the pop genre is hard to pin down because the genre continuously evolves, branching out and [borrowing musical elements from other styles like rock, dance, Latin, and country music](https://en.wikipedia.org/wiki/Pop_music). Pop also tends to align with whatever is popular at a given time, and Wikipedia suggests *“that the term "pop music" may be used to describe a distinct genre, designed to appeal to all.”* Yet among the top hits of the last two decades, some tracks probably wouldn’t be considered emblematic of the “pop” genre and belong instead to a subgenre or a different style altogether.

The discussion of popular music is also complex because what counts as “good” music relies heavily on the cultural influences of the time and place. The #1 hit song in the United States in 2005 might not be what was popular in Spain or South Korea. To clarify, for this project we are analyzing **popular music in the United States from January 1st, 2000 to today** (March 16th, 2022) with the intent of determining whether certain combinations of musical qualities are more likely to create hit songs than others. While the process of creating music isn't necessarily so straightforward, knowing the most common song duration, tempo, key signature, or even volume among hit songs could provide a guideline for people who wish to create well-received music.


## Collecting Billboard Hot 100 Data

In [278]:
# Import Libraries
import pandas as pd
import numpy as np
import requests as rq
from bs4 import BeautifulSoup as bs
import datetime as dt
import re

# Helper Functions
def format_datetime(date):
    # Billboard chart URLs end in an ISO-style date such as 2000-01-01
    return date.strftime("%Y-%m-%d")

def format_name(name):
    # Unescape HTML ampersands and normalize "Featuring"/"feat." to "ft."
    amp = re.sub(r'&amp;', "&", name)
    ft = re.sub(r'(Featuring)|(featuring)|(feat\.?)', "ft.", amp)
    return ft

def remove_tags(tag, string):
    # Strip the opening and closing <tag ...> markup plus surrounding whitespace
    tag1 = r'<' + tag + r'.*?>\s*'
    tag2 = r'\s*</' + tag + r'.*?>'
    return re.sub(tag2, "", re.sub(tag1, "", string))

def scrape_billboard(start_date, end_date, page):
    info_list = []
    date = start_date
    # Request one chart page per week between start_date and end_date
    while date <= end_date:
        billboard_url = "https://www.billboard.com/" + page + format_datetime(date) + "/"
        soup = bs(rq.get(billboard_url).content, "html.parser")
        charts = soup.find_all("div", class_=re.compile('o-chart-results-list-row-container'))
        for entry in charts:
            rank = remove_tags("span", str(entry.find("span", class_=re.compile('c-label a-font-primary-bold-l'))))
            title = remove_tags("h3", str(entry.find("h3", class_=re.compile('c-title'))))
            artist = remove_tags("span", str(entry.find("span", class_=re.compile('c-label a-no-trucate'))))
            # Handle Multiple Artists
            title = format_name(title)
            artist = format_name(artist)
            search = entry.find_all("span", class_=re.compile(r'(c-label a-font-primary-m lrv-u-padding-tb-050@mobile-max)|(c-label a-font-primary-bold-l a-font-primary-m@mobile-max u-font-weight-normal@mobile-max)'))
            # The third matching span is read as the number of weeks on the chart
            weeks = remove_tags("span", str(search[2]))
            page_name = "Weeks_in_" + re.sub(r'charts/|/', "_", page).strip("_")
            data = {'Rank': rank, 'Title': title, 'Artist': artist, 'Week': date, page_name: weeks}
            info_list.append(data)
        date += dt.timedelta(days=7)
    return pd.DataFrame(info_list)

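def scrape_azlyrics():
    # Placeholder: lyric scraping is not implemented yet
    return 1

def scrape_hooktheory():
    # Get Chord and Melody Metrics as defined by Hook Theory
    # Work in progress: for now this only fetches one page and prints its HTML
    url = "https://www.hooktheory.com/theorytab/view/mariah-carey/all-i-want-for-christmas-is-you"
    soup = bs(rq.get(url).content, "html.parser")
    print(soup.prettify())
    return 1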
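
As a quick offline check of the two string helpers, the sketch below runs them on a hand-written snippet of markup. The sample HTML is an assumption made up for illustration (it is not copied from Billboard), and the helpers are re-declared so the block runs on its own.

```python
import re

def format_name(name):
    """Unescape '&amp;' and normalize 'Featuring'/'feat.' to 'ft.'."""
    name = name.replace("&amp;", "&")
    return re.sub(r"Featuring|featuring|feat\.?", "ft.", name)

def remove_tags(tag, string):
    """Strip the opening and closing <tag> markup and surrounding whitespace."""
    opening = r"<" + tag + r".*?>\s*"
    closing = r"\s*</" + tag + r".*?>"
    return re.sub(closing, "", re.sub(opening, "", string))

# Hypothetical markup in the general shape of a chart row's artist span
sample = '<span class="c-label a-no-trucate">\n\tSantana Featuring Rob Thomas\n</span>'

artist = format_name(remove_tags("span", sample))
print(artist)
```

Because the `feat\.?` pattern is not anchored to word boundaries, a name containing "feat" inside another word (for example "Defeated") would also be rewritten. Adding `\b` anchors would be one way to guard against that.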

In [279]:
billboard_data = scrape_billboard(dt.date(2000, 1, 1), dt.date.today(), "charts/hot-100/")
billboard_data.insert(5, "First_Week", billboard_data['Week'], False)
billboard_data.insert(6, "Last_Week", billboard_data['Week'], False)
billboard_data.drop(columns=['Week'], inplace=True)
billboard_data.to_csv("csv/billboard_data.csv")
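
The scraper walks the chart one week at a time, starting on January 1st, 2000 and adding seven days per request. A minimal sketch of that date stepping, without any network calls, is shown here; `weekly_chart_urls` is a hypothetical helper name used only for this illustration.

```python
import datetime as dt

BASE_URL = "https://www.billboard.com/charts/hot-100/"

def weekly_chart_urls(start_date, end_date):
    """Build one chart URL per week from start_date through end_date."""
    urls = []
    date = start_date
    while date <= end_date:
        urls.append(BASE_URL + date.isoformat() + "/")
        date += dt.timedelta(days=7)
    return urls

urls = weekly_chart_urls(dt.date(2000, 1, 1), dt.date(2000, 1, 29))
for url in urls:
    print(url)
```

January 1st, 2000 fell on a Saturday, so stepping seven days at a time keeps every requested date on a Saturday.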

In [280]:
billboard_data = pd.read_csv("csv/billboard_data.csv").iloc[:, 1:]
billboard_data.head(10)

Unnamed: 0,Rank,Title,Artist,Weeks_in_hot-100,First_Week,Last_Week
0,1,Smooth,Santana ft. Rob Thomas,23,2000-01-01,2000-01-01
1,2,Back At One,Brian McKnight,19,2000-01-01,2000-01-01
2,3,I Wanna Love You Forever,Jessica Simpson,12,2000-01-01,2000-01-01
3,4,My Love Is Your Love,Whitney Houston,18,2000-01-01,2000-01-01
4,5,I Knew I Loved You,Savage Garden,11,2000-01-01,2000-01-01
5,6,I Need To Know,Marc Anthony,17,2000-01-01,2000-01-01
6,7,Hot Boyz,"Missy ""Misdemeanor"" Elliott ft. NAS, EVE & Q-Tip",6,2000-01-01,2000-01-01
7,8,U Know What's Up,Donell Jones,15,2000-01-01,2000-01-01
8,9,Bring It All To Me,Blaque,11,2000-01-01,2000-01-01
9,10,Girl On TV,LFO,7,2000-01-01,2000-01-01


In [281]:
# Collapse the weekly chart rows into one row per song title
# Note: grouping on Title alone merges different songs that share a title
aggregation_functions = {'Rank': "min", 'Artist': "first", 'Weeks_in_hot-100': "max", 'First_Week': "min", 'Last_Week': "max"}
spotify_data = billboard_data.groupby('Title').aggregate(aggregation_functions).reset_index()
# Copy Rank into a leading Top_Rank column, then drop the original
spotify_data.insert(0, "Top_Rank", spotify_data['Rank'], False)
spotify_data.drop(columns=['Rank'], inplace=True)
#spotify_data.sort_values(by='Top_Rank').head(10)
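
To make the aggregation step concrete, here is a small sketch on made-up rows (the titles, artists, and dates are hypothetical, not chart data). It also compares grouping on `Title` alone with grouping on `Title` and `Artist` together, which is one possible way to keep two different songs with the same title apart.

```python
import pandas as pd

# Hypothetical weekly rows: two different songs share the title "Song A"
weekly = pd.DataFrame({
    "Rank": [3, 1, 2, 40],
    "Title": ["Song A", "Song A", "Song A", "Song A"],
    "Artist": ["Artist X", "Artist X", "Artist X", "Artist Y"],
    "Weeks_in_hot-100": [1, 2, 3, 1],
    "Week": ["2000-01-01", "2000-01-08", "2000-01-15", "2010-06-05"],
})

def summarize(frame, keys):
    """One row per group: best rank, weeks on chart, first and last week."""
    return frame.groupby(keys, as_index=False).agg(
        Top_Rank=("Rank", "min"),
        Weeks=("Weeks_in_hot-100", "max"),
        First_Week=("Week", "min"),
        Last_Week=("Week", "max"),
    )

by_title = summarize(weekly, ["Title"])
by_title_and_artist = summarize(weekly, ["Title", "Artist"])

print(by_title)
print(by_title_and_artist)
```

Grouping on the title alone yields a single row spanning 2000 to 2010, while grouping on both columns yields one row per song.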

In [282]:
# pip install spotipy
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import config

authentication = SpotifyClientCredentials(client_id=config.cid, client_secret=config.csecret)
sp = spotipy.Spotify(client_credentials_manager=authentication)

def get_audio_analysis(artist, title):
    # Returns [duration, loudness, tempo, tempo_conf, time_sig, time_sig_conf,
    #          key, key_conf, mode, mode_conf], or ten Nones if nothing is found
    no_result = [None] * 10
    q = "{} artist:{}".format(title, artist)
    result = sp.search(q, type='track', limit=1)['tracks']['items']
    if not result:
        # Retry with an explicit track: field filter
        q = "track:{} artist:{}".format(title, artist)
        result = sp.search(q, type='track', limit=1)['tracks']['items']
        if not result:
            return no_result
    spotify_id = result[0]['id']
    try:
        track = sp.audio_analysis(spotify_id)['track']
    except spotipy.client.SpotifyException:
        # Raised, for example, when Spotify has no analysis for the track (404)
        return no_result
    fields = ['duration', 'loudness', 'tempo', 'tempo_confidence',
              'time_signature', 'time_signature_confidence',
              'key', 'key_confidence', 'mode', 'mode_confidence']
    return [track[field] for field in fields]
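
The `key` and `mode` values returned above are integers, not note names. Spotify encodes the key in pitch-class notation (0 = C, 1 = C♯/D♭, and so on up to 11 = B, with -1 meaning no key was detected) and the mode as 1 for major and 0 for minor. The sketch below turns a pair into a readable label; `describe_key` is a hypothetical helper, not part of spotipy.

```python
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

def describe_key(key, mode):
    """Return a label such as 'A minor', or None when the key is unknown."""
    if key is None or mode is None or key < 0:
        return None
    quality = "major" if int(mode) == 1 else "minor"
    return "{} {}".format(PITCH_CLASSES[int(key)], quality)

print(describe_key(1.0, 1.0))
print(describe_key(9, 0))
print(describe_key(-1, 1))
```

The helper accepts floats because the key and mode columns are stored as floats once missing values are present. It handles `None` and -1 but does not handle NaN, which would need an extra check.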

In [284]:
duration_list = []
loudness_list = []
tempo_list = []
tempo_conf_list = []
time_sig_list = []
time_sig_conf_list = []
key_list = []
key_conf_list =[]
mode_list = []
mode_conf_list = []

for index, row in spotify_data.iterrows():
    # Split credits such as "A ft. B & C" into individual artist names
    string = re.sub(r'\(|\)', ", ", re.sub(r'\s+((ft\.)|&|X|x|(\+)|/)\s+', ", ", row['Artist'])).strip(', ')
    artist_list = [name.strip() for name in string.split(",") if name.strip()]
    values = [None] * 10
    # Try each credited artist in turn until Spotify returns an analysis
    for artist in artist_list:
        values = get_audio_analysis(artist, re.sub(r'\(.*\)', "", row['Title']))
        if values != [None] * 10:
            break
    duration_list.append(values[0])
    loudness_list.append(values[1])
    tempo_list.append(values[2])
    tempo_conf_list.append(values[3])
    time_sig_list.append(values[4])
    time_sig_conf_list.append(values[5])
    key_list.append(values[6])
    key_conf_list.append(values[7])
    mode_list.append(values[8])
    mode_conf_list.append(values[9])

spotify_data.insert(6, "Duration", duration_list, False)
spotify_data.insert(7, "Loudness", loudness_list, False)
spotify_data.insert(8, "Tempo", tempo_list, False)
spotify_data.insert(9, "Tempo_Confidence", tempo_conf_list, False)
spotify_data.insert(10, "Meter", time_sig_list, False)
spotify_data.insert(11, "Meter_Confidence", time_sig_conf_list, False)
spotify_data.insert(12, "Key", key_list, False)
spotify_data.insert(13, "Key_Confidence", key_conf_list, False)
spotify_data.insert(14, "Mode", mode_list, False)
spotify_data.insert(15, "Mode_Confidence", mode_conf_list, False)
spotify_data.head(10)

HTTP Error for GET to https://api.spotify.com/v1/audio-analysis/4LaGu95Ui2s4vprSQYWUAZ with Params: {} returned 404 due to analysis not found
HTTP Error for GET to https://api.spotify.com/v1/audio-analysis/6yuvC80FcnVJNvC0DbXN9e with Params: {} returned 404 due to analysis not found
HTTP Error for GET to https://api.spotify.com/v1/audio-analysis/3uh7YcFzAWHGg7spVzPfqP with Params: {} returned 404 due to analysis not found
HTTP Error for GET to https://api.spotify.com/v1/audio-analysis/5gfPJ45gpn3ThswDyeW0Qc with Params: {} returned 404 due to analysis not found
HTTP Error for GET to https://api.spotify.com/v1/audio-analysis/0BXTqB4It8UM09lCaIY3Jk with Params: {} returned 404 due to analysis not found
HTTP Error for GET to https://api.spotify.com/v1/audio-analysis/5OGkKx8jP0A5KSULEc6XYZ with Params: {} returned 404 due to analysis not found
HTTP Error for GET to https://api.spotify.com/v1/audio-analysis/6MFQeWtk7kxWGydnJB2y36 with Params: {} returned 404 due to analysis not found


Unnamed: 0,Top_Rank,Title,Artist,Weeks_in_hot-100,First_Week,Last_Week,Duration,Loudness,Tempo,Tempo_Confidence,Meter,Meter_Confidence,Key,Key_Confidence,Mode,Mode_Confidence
0,22,#1,Nelly,20,2001-10-20,2002-03-02,223.08,-6.358,116.935,0.841,4.0,1.0,1.0,0.459,1.0,0.517
1,15,#Beautiful,Mariah Carey ft. Miguel,16,2013-05-25,2013-09-07,199.94667,-5.333,107.03,0.001,4.0,1.0,4.0,0.228,1.0,0.334
2,16,#SELFIE,The Chainsmokers,11,2014-03-15,2014-05-24,183.74998,-3.262,127.956,0.835,4.0,0.923,0.0,0.508,1.0,0.467
3,17,#thatPOWER,will.i.am ft. Justin Bieber,16,2013-04-06,2013-07-20,279.50668,-6.096,127.999,0.796,4.0,1.0,6.0,0.465,0.0,0.504
4,71,$ave Dat Money,Lil Dicky ft. Fetty Wap & Rich Homie Quan,19,2015-10-10,2016-03-19,290.8357,-5.361,98.013,0.399,4.0,1.0,2.0,0.643,1.0,0.347
5,4,'03 Bonnie & Clyde,Jay-Z ft. Beyonce Knowles,23,2002-10-26,2003-03-29,205.56,-5.148,89.64,0.723,4.0,1.0,9.0,0.672,0.0,0.564
6,58,'Til Summer Comes Around,Keith Urban,16,2010-01-30,2010-05-15,331.46667,-7.608,127.907,0.21,4.0,1.0,9.0,0.494,0.0,0.505
7,18,'Til You Can't,Cody Johnson,30,2021-10-23,2022-05-14,224.21333,-4.865,160.087,0.387,4.0,0.996,1.0,0.494,1.0,0.689
8,39,'Tis The Damn Season,Taylor Swift,2,2020-12-26,2021-01-02,229.84,-8.193,145.916,0.362,4.0,0.72,5.0,0.693,1.0,0.585
9,7,(Hot S**t) Country Grammar,Nelly,34,2000-04-29,2000-12-16,291.78195,-6.49,101.875,0.966,4.0,1.0,7.0,0.597,1.0,0.51
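
The lookup loop searches Spotify one credited artist at a time, so each Billboard credit first has to be split into individual names. The sketch below isolates that splitting step using the same separators as the loop (`ft.`, `&`, `x`, `+`, `/`, and parentheses); `split_artists` is a hypothetical name introduced for this illustration.

```python
import re

def split_artists(credit):
    """Split a chart credit like 'A ft. B & C' into ['A', 'B', 'C']."""
    # Replace joining words and symbols (surrounded by spaces) with commas
    normalized = re.sub(r"\s+(ft\.|&|X|x|\+|/)\s+", ", ", credit)
    # Treat parentheses as separators too
    normalized = re.sub(r"\(|\)", ", ", normalized).strip(", ")
    return [name.strip() for name in normalized.split(",") if name.strip()]

print(split_artists("Santana ft. Rob Thomas"))
print(split_artists("Lil Dicky ft. Fetty Wap & Rich Homie Quan"))
print(split_artists('Missy "Misdemeanor" Elliott ft. NAS, EVE & Q-Tip'))
```

Separators are only matched when surrounded by whitespace, so names such as "will.i.am" or "Jay-Z" are left intact.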


In [287]:
# Count the songs that have at least one missing value
null_data = spotify_data[spotify_data.isnull().any(axis=1)]
print(len(null_data.index))
# Save the combined dataset and report the total number of songs
spotify_data.to_csv("csv/spotify_data_2000_01_01.csv")
print(len(spotify_data.index))

269
8486
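
The two numbers printed above are the count of songs with at least one missing value (269) and the total number of songs (8486). The sketch below shows the same row-wise null check on a made-up three-row table, then works out the missing share from the printed counts. The toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows: two of the three songs are missing at least one value
toy = pd.DataFrame({
    "Title": ["Song A", "Song B", "Song C"],
    "Tempo": [120.0, None, 98.5],
    "Key": [1.0, None, None],
})

has_null = toy.isnull().any(axis=1)   # True where any column in the row is null
complete = toy[~has_null]             # rows with every value present

# Share of songs with missing values, using the counts printed above
share_missing = 269 / 8486
print("{:.2%} of songs have a missing value".format(share_missing))
```

Roughly 3% of the songs lack at least one Spotify field, which leaves 8217 complete rows for the later phases.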


Phase 2: Data Management and Representation

Phase 3: Exploratory Data Analysis

Phase 4: Hypothesis Testing

Phase 5: Communication of Insights Attained
