# Web Scraping

## Libraries

In [1]:
# main
import pandas as pd
import numpy as np

# web scraping
from bs4 import BeautifulSoup
import requests

## Web Scraping the "Billboard Hot 100"

### "Soup" Preparation

In these few lines of code, I prepare the "soup" used to extract the songs from the **Billboard Hot 100**.

In [2]:
url = "https://www.billboard.com/charts/hot-100"

In [3]:
response = requests.get(url)

In [4]:
response

<Response [200]>

In [5]:
soup = BeautifulSoup(response.content, "html.parser")
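The preparation steps above can also be wrapped in one helper that fails loudly when the request does not succeed, instead of relying on a visual check of `<Response [200]>`. This is only a minimal sketch: the function name `fetch_soup` and the 10-second default timeout are assumptions made for illustration, not part of the original notebook.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url: str, timeout: float = 10.0) -> BeautifulSoup:
    """Download a page and parse it (hypothetical helper, not from the notebook)."""
    # The timeout keeps the call from waiting forever on a stalled connection.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError on 4xx/5xx responses, so an error page is never parsed.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")
```

With this helper, `soup = fetch_soup(url)` would stand in for cells 3 to 5.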

### Scraping songs

First, let's test the selector by extracting the name of the first song. Once the output matches the first song shown on the website, I will loop over all the matches to obtain the rest.

In [6]:
soup.select("span.chart-element__information__song.text--truncate.color--primary")[0].get_text()

'Mood'

In [7]:
song = []

for s in soup.select("span.chart-element__information__song.text--truncate.color--primary"):
    song.append(s.get_text())

In [8]:
# Check that the length is 100, the number of songs the chart should have
len(song)

100
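To see how the selector used above behaves, here is a small sketch run on a hand-written HTML fragment rather than on the live page. The fragment is hypothetical and only imitates the class names from the notebook. Those class names depend on the page markup at the time of scraping, so they may need updating if the site changes.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the chart markup; this is not the real page.
sample_html = """
<ul>
  <li><span class="chart-element__information__song text--truncate color--primary"> Mood </span></li>
  <li><span class="chart-element__information__song text--truncate color--primary">Positions</span></li>
  <li><span class="chart-element__information__artist text--truncate color--secondary">Ariana Grande</span></li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# "span.a.b.c" matches <span> elements that carry all three classes a, b and c.
song_selector = "span.chart-element__information__song.text--truncate.color--primary"

# get_text(strip=True) also removes the surrounding whitespace.
sample_songs = [tag.get_text(strip=True) for tag in sample_soup.select(song_selector)]
print(sample_songs)  # ['Mood', 'Positions']
```

The artist `<span>` is not matched because it lacks the song classes, which is why the same approach needs a second selector for the artists.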

### Scraping artists

As with the songs, let's first find the selector that returns the first artist on the list. Once we know it is the correct one, we loop over all the matches as in the previous step.

In [9]:
soup.select("span.chart-element__information__artist.text--truncate.color--secondary")[0].get_text()

'24kGoldn Featuring iann dior'

In [10]:
artist = []

for a in soup.select("span.chart-element__information__artist.text--truncate.color--secondary"):
    artist.append(a.get_text())

In [11]:
# Check that the length is 100, the number of artists the chart should have
len(artist)

100
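Collecting songs and artists in two separate passes works as long as both lists come out with the same length and order. An alternative is to read both fields from each chart row, so that every song stays paired with its artist. The sketch below assumes a row container with the class `chart-row`. That class name is invented for this example and would need to be replaced with whatever element wraps one chart entry on the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the "chart-row" container class is an assumption.
rows_html = """
<ol>
  <li class="chart-row">
    <span class="chart-element__information__song">Mood</span>
    <span class="chart-element__information__artist">24kGoldn Featuring iann dior</span>
  </li>
  <li class="chart-row">
    <span class="chart-element__information__song">Positions</span>
    <span class="chart-element__information__artist">Ariana Grande</span>
  </li>
</ol>
"""

rows_soup = BeautifulSoup(rows_html, "html.parser")

pairs = []
for row in rows_soup.select("li.chart-row"):
    song_tag = row.select_one("span.chart-element__information__song")
    artist_tag = row.select_one("span.chart-element__information__artist")
    # Skip incomplete rows instead of letting the two fields drift out of step.
    if song_tag is None or artist_tag is None:
        continue
    pairs.append((song_tag.get_text(strip=True), artist_tag.get_text(strip=True)))

print(pairs)
```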

## Creating DataFrame

Once we have the **Song Name** and the **Artist**, we can create a `DataFrame` to inspect the data more easily and to store it as a *csv* for the next part of the project.

In [12]:
billboard_hot100 = pd.DataFrame({"song": song, "artist(s)": artist})

In [13]:
billboard_hot100

| | song | artist(s) |
|---|---|---|
| 0 | Mood | 24kGoldn Featuring iann dior |
| 1 | Positions | Ariana Grande |
| 2 | Blinding Lights | The Weeknd |
| 3 | Holy | Justin Bieber Featuring Chance The Rapper |
| 4 | Dynamite | BTS |
| ... | ... | ... |
| 95 | So Done | The Kid LAROI |
| 96 | Just The Way | Parmalee x Blanco Brown |
| 97 | Way Out | Jack Harlow Featuring Big Sean |
| 98 | M3tamorphosis | Playboi Carti Featuring Kid Cudi |
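As a small sketch of the same construction with a safety check, the two lists can be compared before the `DataFrame` is built, and the index can be shifted to start at 1 so that it matches the chart position. The short lists and the names `sample_song`, `sample_artist` and `sample_chart` are stand-ins for illustration. In the notebook they would be the scraped `song` and `artist` lists.

```python
import pandas as pd

# Short stand-in lists; in the notebook these would be `song` and `artist`.
sample_song = ["Mood", "Positions", "Blinding Lights"]
sample_artist = ["24kGoldn Featuring iann dior", "Ariana Grande", "The Weeknd"]

# Fail early with a clear message if the two scrapes returned different counts.
if len(sample_song) != len(sample_artist):
    raise ValueError("song and artist lists differ in length")

sample_chart = pd.DataFrame({"song": sample_song, "artist(s)": sample_artist})

# Optional: a 1-based index named "rank" mirrors the position on the chart.
sample_chart.index = pd.RangeIndex(start=1, stop=len(sample_chart) + 1, name="rank")
print(sample_chart)
```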


In [14]:
# Storing it as a CSV
billboard_hot100.to_csv("../data/billboard100.csv", index=False)
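A quick way to confirm that the file was written as expected is to read it back. The sketch below uses a temporary directory in place of `../data` so that it can run anywhere. The two-row `sample_chart` is a stand-in for `billboard_hot100`. The parent folder is created first, because `to_csv` does not create missing directories.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two-row stand-in for the full billboard_hot100 DataFrame.
sample_chart = pd.DataFrame(
    {
        "song": ["Mood", "Positions"],
        "artist(s)": ["24kGoldn Featuring iann dior", "Ariana Grande"],
    }
)

# A temporary directory stands in for "../data" in this sketch.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "data" / "billboard100.csv"
    # Create the target folder first; to_csv fails if it does not exist.
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    # index=False keeps the row index out of the file, as in the notebook.
    sample_chart.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

print(reloaded)
```

Because the index is not written, the reloaded frame has only the `song` and `artist(s)` columns and no extra `Unnamed: 0` column.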