# Downloading Billboard Hot 100

[Billboard Hot 100](https://www.billboard.com/charts/hot-100/): According to Billboard's own description, the Hot 100 ranks the week's most popular current songs across all genres by streaming activity from digital music sources tracked by Luminate, radio airplay audience impressions as measured by Luminate, and sales data as compiled by Luminate.

[Luminate](https://luminatedata.com/): Luminate describes itself as the entertainment industry's preeminent data and insights company, aiming to provide access to the most essential, objective, and trustworthy information across music, film, and television.

## Libraries

In [15]:
#!pip install billboard.py
#!pip install pandas
#!pip install swifter
#!pip install requests
#!pip install beautifulsoup4

In [18]:
#import billboard
import re
import pandas as pd
import requests
from bs4 import BeautifulSoup
import swifter
import time

## Inputs

In [3]:
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}


## Configuration

In [4]:
pd.set_option('display.max_colwidth', 500)

## Functions

In [22]:
def create_date_span(min_date, max_date):
    """Return a pandas Series of weekly dates (one per Monday, freq='W-MON')
    between min_date and max_date, inclusive."""
    dates = pd.Series(pd.date_range(min_date, max_date, freq='W-MON')).dt.date
    return dates

def get_billboard_url_for_date(date):
    """Given a date in the format YYYY-MM-DD,
    return the Billboard Hot 100 chart URL for that date."""
    billboard_url_w_date = f"https://www.billboard.com/charts/hot-100/{date}"
    return billboard_url_w_date

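def request_url(url):
    """Given a URL, request the page and return its content parsed as soup."""
    # A timeout keeps a stalled connection from blocking the whole run.
    response = requests.get(url, headers=HEADERS, timeout=30)
    # Naming the parser explicitly avoids BeautifulSoup's parser-guessing warning.
    soup = BeautifulSoup(response.content, 'html.parser')
    #time.sleep(0.1)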
    return soup

def finding_top100_elements_from_soup(soup, additional_string):
    """Given the soup of a Billboard Hot 100 page, extract each chart row as a
    PSV (pipe-separated values) string, with additional_string (for example the
    chart date) appended as the last field. Returns a list of strings."""
    row_containers_top100 = soup.find_all('div', class_="o-chart-results-list-row-container")
    saving_elements = []
    for element_row in row_containers_top100:
        text_from_element = element_row.find_all('ul')[0].text.replace('\n', '').replace('\t', '|' )
        if "|NEW|" in text_from_element:
            text_from_element = text_from_element.replace('|NEW|', '||')
        if "|RE-ENTRY|" in text_from_element:
            text_from_element = text_from_element.replace('|RE-ENTRY|', '||')
        text_from_element = text_from_element + f'||{additional_string}'

        # Removes duplicated pipes, keeps only one.
        output_string = re.sub(r'\|{2,}', '|', text_from_element)
        saving_elements.append(output_string)
    return saving_elements
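
The string clean-up inside `finding_top100_elements_from_soup` can be tried without any network access. The sketch below repeats the same steps on hand-written input; the helper name `clean_row` and the raw sample text are assumptions made for illustration, not markup copied from the Billboard page.

```python
import re

def clean_row(raw_text, chart_date):
    """Turn the raw text of one chart row into a pipe-separated string."""
    # Drop newlines and turn every tab into a pipe.
    text = raw_text.replace("\n", "").replace("\t", "|")
    # Blank out the NEW / RE-ENTRY badges so they do not become a field.
    for badge in ("|NEW|", "|RE-ENTRY|"):
        text = text.replace(badge, "||")
    # Append the chart date as the last field.
    text = f"{text}||{chart_date}"
    # Collapse any run of two or more pipes into a single pipe.
    return re.sub(r"\|{2,}", "|", text)

# Hypothetical raw text, shaped like whitespace-heavy scraped HTML.
raw = "\n\t1\n\t\tSmooth\n\tSantana Featuring Rob Thomas\t\t\t1\t1\t24\n"
cleaned = clean_row(raw, "2000-01-03")
print(cleaned)  # |1|Smooth|Santana Featuring Rob Thomas|1|1|24|2000-01-03

raw_new = "\t5\tNEW\tSong\tArtist\t-\t5\t1"
cleaned_new = clean_row(raw_new, "2000-01-03")
print(cleaned_new)  # |5|Song|Artist|-|5|1|2000-01-03
```

Because runs of pipes are collapsed, an empty field vanishes instead of staying as an empty column, so rows can end up with different field counts.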

In [23]:
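def get_info(date):
    """Given a date string (YYYY-MM-DD), download that week's chart and
    return its rows as a list of pipe-separated strings."""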
    url = get_billboard_url_for_date(date)
    soup = request_url(url)
    top100 = finding_top100_elements_from_soup(soup, date)
    return top100
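
As a quick check of how the date and URL helpers combine, the sketch below builds the weekly dates for a short range and the chart URL for each. The four-week range is an arbitrary choice for illustration, and no request is sent.

```python
import pandas as pd

# Weekly chart dates, one per Monday, built the same way as create_date_span.
dates = pd.Series(pd.date_range("2000-01-03", "2000-01-24", freq="W-MON")).dt.date

# One chart URL per date, built the same way as get_billboard_url_for_date.
urls = [f"https://www.billboard.com/charts/hot-100/{d}" for d in dates]

for url in urls:
    print(url)
```

With `freq='W-MON'`, a start date that is not a Monday is moved forward to the next Monday, so the first generated date can be later than `min_date`.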

## Download

In [27]:
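%%time
# `dates` is not defined in the cells shown here; this range is an assumption
# inferred from the first and last chart dates in the output further down.
dates = create_date_span('2000-01-03', '2024-12-30')
billboard_100_info = dates.astype('str').apply(get_info)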

CPU times: user 10min 16s, sys: 265 ms, total: 10min 16s
Wall time: 20min 51s
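
The download makes one request per weekly chart, so a single failed request can interrupt a run of roughly twenty minutes. One possible safeguard, sketched below, is to wrap each call in a retry loop with exponential backoff. The helper `call_with_retries` and its parameters are hypothetical and not part of the original notebook, and `flaky_get_info` is a stand-in that simulates two failures instead of contacting Billboard.

```python
import time

def call_with_retries(func, *args, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on an exception, wait and retry with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except Exception:
            if attempt == retries:
                raise  # Out of retries: let the last error propagate.
            sleep(base_delay * 2 ** attempt)  # Waits of 1s, 2s, 4s, ...

# Stand-in for get_info that fails twice before succeeding.
calls = {"count": 0}

def flaky_get_info(date):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return [f"|1|Some Song|Some Artist|{date}"]

# Record the waits instead of sleeping so the demo runs instantly.
waits = []
result = call_with_retries(flaky_get_info, "2000-01-03", sleep=waits.append)
print(result, waits)
```

In the notebook this could be applied as `dates.astype('str').apply(lambda d: call_with_retries(get_info, d))`, assuming `get_info` raises an exception when a request fails.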


## Save Data

In [28]:
pd.DataFrame(billboard_100_info).to_pickle("../data/billboard.pkl")

In [30]:
exploded_billboard_100 = billboard_100_info.explode()
exploded_billboard_100

0                                 |1|Smooth|Santana Featuring Rob Thomas|1|1|24|1|1|24|2000-01-03
0                                          |2|Back At One|Brian McKnight|2|2|20|2|2|20|2000-01-03
0                            |3|I Wanna Love You Forever|Jessica Simpson|3|3|13|3|3|13|2000-01-03
0                                |4|My Love Is Your Love|Whitney Houston|4|4|19|4|4|19|2000-01-03
0       |5|Hot Boyz|Missy "Misdemeanor" Elliott Featuring NAS, EVE & Q-Tip|7|5|7|7|5|7|2000-01-03
                                                  ...                                            
1304                         |96|Dos Dias|Tito Double P & Peso Pluma|88|51|16|88|51|16|2024-12-30
1304                                                 |97|25|Rod Wave|94|16|11|94|16|11|2024-12-30
1304                                         |98|Popular|Ariana Grande|73|53|5|73|53|5|2024-12-30
1304                  |99|Think I'm In Love With You|Chris Stapleton|89|49|34|89|49|34|2024-12-30
1304                
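
The pipe-separated strings can be split into columns for analysis. The sketch below does so for two rows copied from the output above. The column names are assumptions inferred from those sample values (rank, title, artist, then last-week position, peak position and weeks on chart, which appear twice), not names documented by Billboard.

```python
import pandas as pd

rows = pd.Series([
    "|1|Smooth|Santana Featuring Rob Thomas|1|1|24|1|1|24|2000-01-03",
    "|97|25|Rod Wave|94|16|11|94|16|11|2024-12-30",
])

# Assumed column names; the "_dup" columns repeat the three preceding values.
columns = [
    "rank", "title", "artist", "last_week", "peak", "weeks_on_chart",
    "last_week_dup", "peak_dup", "weeks_on_chart_dup", "chart_date",
]

# Strip the leading pipe, then split each string into one column per field.
table = rows.str.strip("|").str.split("|", expand=True)
table.columns = columns
table = table.drop(columns=[c for c in columns if c.endswith("_dup")])

# Convert the fields that are safe to type for these sample rows.
table["rank"] = table["rank"].astype(int)
table["chart_date"] = pd.to_datetime(table["chart_date"])
print(table)
```

Rows with a different number of fields would make the column assignment fail, so the full dataset may need a field-count check first.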