![logo_ironhack_blue 7](https://user-images.githubusercontent.com/23629340/40541063-a07a0a8a-601a-11e8-91b5-2f13e4e6b441.png)

# Lab | Web Scraping Single Page

#### Instructions - Scraping popular songs

Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song, but the CTO believes that if the input song is currently in the top charts, the user will prefer a recommendation that is also popular at the moment.
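
The CTO's rule can be sketched as a small function. This is a minimal illustration, not part of the lab specification: it assumes the chart has already been scraped into a DataFrame with `song` and `artist` columns (as built later in this notebook), and the name `recommend_from_chart` and the random pick among the other charting songs are hypothetical choices.

```python
import random

import pandas as pd


def recommend_from_chart(song_title, chart, rng=None):
    """Return (song, artist) of another charting song, or None.

    Hypothetical helper: None means the input song is not on the chart,
    so the caller should fall back to a similarity-based recommendation.
    """
    if rng is None:
        rng = random.Random()
    # Case-insensitive, whitespace-tolerant match on song titles
    titles = chart["song"].str.strip().str.lower()
    wanted = song_title.strip().lower()
    if wanted not in set(titles):
        return None
    # Candidates: every charting song except the one the user entered
    others = chart[titles != wanted]
    if others.empty:
        return None
    pick = others.iloc[rng.randrange(len(others))]
    return pick["song"], pick["artist"]


# A few rows taken from the scraped chart shown further down
hot_songs = pd.DataFrame({
    "song": ["Save Your Tears", "Leave The Door Open", "Peaches"],
    "artist": ["The Weeknd & Ariana Grande",
               "Silk Sonic (Bruno Mars & Anderson .Paak)",
               "Justin Bieber Featuring Daniel Caesar & Giveon"],
})

print(recommend_from_chart("peaches", hot_songs, random.Random(0)))
print(recommend_from_chart("Some Unknown Song", hot_songs))  # prints None
```

Returning `None` rather than raising keeps the "not charting" case as a normal branch for whatever similarity-based recommender handles the rest.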

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: [https://www.billboard.com/charts/hot-100](https://www.billboard.com/charts/hot-100).

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

We first import BeautifulSoup to parse the HTML, requests to download the page we want to scrape, and pandas to store the results.

```python
from bs4 import BeautifulSoup
import requests
import pandas as pd
```

We then set the URL of the page we want to scrape and check the connection status (a status code of 200 means the request succeeded).

```python
url = "https://www.billboard.com/charts/hot-100"
response = requests.get(url)
response.status_code
```

```
200
```
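
Checking `status_code` by eye works in a notebook; in a reusable script it is safer to fail loudly. A minimal sketch, assuming you want a timeout and an explicit User-Agent header. The `fetch_page` name and the header value are hypothetical; some sites respond differently to the default `python-requests` agent string.

```python
import requests


def fetch_page(url, timeout=10.0):
    """Download `url` and raise requests.HTTPError on a 4xx/5xx status."""
    # Hypothetical User-Agent string; adjust or drop it as needed
    headers = {"User-Agent": "Mozilla/5.0 (compatible; ironhack-lab-scraper)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # no-op for 2xx, raises HTTPError for 4xx/5xx
    return response


# Usage (requires network access):
# response = fetch_page("https://www.billboard.com/charts/hot-100")
```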

Next, we parse the downloaded HTML with Python's built-in `html.parser`:

```python
soup = BeautifulSoup(response.content, 'html.parser')
```

We then look for the information in the page's HTML. Each chart entry is wrapped in a `<button>` element with the class `chart-element__wrapper`. Here is the first entry:

```python
soup.select("button.chart-element__wrapper")[0]
```

```html
<button class="chart-element__wrapper display--flex flex--grow sort--default">
<span class="chart-element__rank flex--column flex--xy-center flex--no-shrink">
<span class="chart-element__rank__number">1</span>
<span class="chart-element__trend chart-element__trend--rising color--up"><i class="fa fa-arrow-up"><span class="sr--only">Rising</span></i></span>
</span>
<span class="chart-element__information">
<span class="chart-element__information__song text--truncate color--primary">Save Your Tears</span>
<span class="chart-element__information__artist text--truncate color--secondary">The Weeknd &amp; Ariana Grande</span>
<span class="chart-element__information__delta color--secondary">
<span class="chart-element__information__delta__text text--default">+5</span>
<span class="chart-element__information__delta__text text--last">6 Last Week</span>
<span class="chart-element__information__delta__text text--peak">1 Peak Rank</span>
<span class="chart-element__information__delta__text text--we
```
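
Because the live page changes every week, and its markup can change too (so these class names may no longer match the current site), it helps to test selectors against a saved snippet. A minimal sketch using a trimmed copy of the first entry above; `select_one` returns the first match inside the entry, or `None` if nothing matches. The variable names are hypothetical and deliberately avoid overwriting the notebook's `soup`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first chart entry printed above
sample_html = """
<button class="chart-element__wrapper display--flex flex--grow sort--default">
  <span class="chart-element__rank flex--column flex--xy-center flex--no-shrink">
    <span class="chart-element__rank__number">1</span>
  </span>
  <span class="chart-element__information">
    <span class="chart-element__information__song text--truncate color--primary">Save Your Tears</span>
    <span class="chart-element__information__artist text--truncate color--secondary">The Weeknd &amp; Ariana Grande</span>
  </span>
</button>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_entry = sample_soup.select_one("button.chart-element__wrapper")

# Selecting relative to the entry keeps every field tied to the same chart row
record = {
    "rank": sample_entry.select_one("span.chart-element__rank__number").get_text(strip=True),
    "song": sample_entry.select_one("span.chart-element__information__song").get_text(strip=True),
    "artist": sample_entry.select_one("span.chart-element__information__artist").get_text(strip=True),
}
print(record)
# {'rank': '1', 'song': 'Save Your Tears', 'artist': 'The Weeknd & Ariana Grande'}
```

Note that the parser decodes `&amp;` to `&`, which is why the artist string matches the notebook output below.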

From this element we extract each data point: song, artist, rank, last week's rank, peak rank, and weeks on the chart.

```python
# song name
soup.select("span.chart-element__information span.chart-element__information__song")[0].text
```

```
'Save Your Tears'
```

```python
# artist name
soup.select("span.chart-element__information span.chart-element__information__artist")[0].text
```

```
'The Weeknd & Ariana Grande'
```

```python
# rank
soup.select("span.chart-element__rank__number")[0].text
```

```
'1'
```

```python
# last week rank
soup.select("button.chart-element__wrapper")[0].select("span.chart-element__metas span.chart-element__meta")[0].text
```

```
'6'
```

```python
# peak rank
soup.select("button.chart-element__wrapper")[0].select("span.chart-element__metas span.chart-element__meta")[1].text
```

```
'1'
```

```python
# weeks on chart
soup.select("button.chart-element__wrapper")[0].select("span.chart-element__metas span.chart-element__meta")[2].text
```

```
'20'
```
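
The three `chart-element__meta` cells are read by position, so it can help to wrap them in one helper that also converts them to integers. In the final table, a new entry shows `-` as its last-week rank; the sketch below maps that to `None`. The helper name `parse_metas` and the sample markup are hypothetical: the HTML is built only from the selectors used above, not copied from the page.

```python
from bs4 import BeautifulSoup


def parse_metas(entry):
    """Return (last_week, peak, weeks) for one chart entry as ints.

    Assumes the cells appear in this order, as in the cells above;
    a '-' (no rank last week) becomes None.
    """
    cells = entry.select("span.chart-element__metas span.chart-element__meta")
    values = [cell.get_text(strip=True) for cell in cells[:3]]
    return tuple(None if v == "-" else int(v) for v in values)


# Hypothetical markup that matches the selector used in the cells above
meta_html = """
<button class="chart-element__wrapper">
  <span class="chart-element__metas">
    <span class="chart-element__meta">-</span>
    <span class="chart-element__meta">97</span>
    <span class="chart-element__meta">1</span>
  </span>
</button>
"""
meta_entry = BeautifulSoup(meta_html, "html.parser").select_one("button.chart-element__wrapper")
print(parse_metas(meta_entry))  # (None, 97, 1)
```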

We use `tqdm` to display a progress bar while looping over the chart entries.

```python
from tqdm.notebook import tqdm
```

```python
rank = []
song = []
artist = []
last_week = []
peak = []
weeks = []

# Select every chart entry once, then read each field relative to its entry
entries = soup.select("button.chart-element__wrapper")

for entry in tqdm(entries):
    rank.append(entry.select_one("span.chart-element__rank__number").text)
    song.append(entry.select_one("span.chart-element__information span.chart-element__information__song").text)
    artist.append(entry.select_one("span.chart-element__information span.chart-element__information__artist").text)
    # last week rank, peak rank and weeks on chart, in that order
    metas = entry.select("span.chart-element__metas span.chart-element__meta")
    last_week.append(metas[0].text)
    peak.append(metas[1].text)
    weeks.append(metas[2].text)
```


We then combine the lists into a pandas DataFrame:

```python
chart = pd.DataFrame({
    'rank': rank,
    'song': song,
    'artist': artist,
    'last_week_rank': last_week,
    'peak_rank': peak,
    'weeks': weeks,
})
```
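
Six parallel lists work, but they silently drift out of sync if one selector misses a row. An alternative sketch (not the lab's required approach) builds one dictionary per entry and lets `pd.DataFrame` turn the list of dicts into columns. The sample records are copied from the first rows of the chart; `sample_chart` is a hypothetical name chosen so it does not overwrite `chart`.

```python
import pandas as pd

# One dict per chart entry; the keys become column names
records = [
    {"rank": "1", "song": "Save Your Tears", "artist": "The Weeknd & Ariana Grande",
     "last_week_rank": "6", "peak_rank": "1", "weeks": "20"},
    {"rank": "2", "song": "Leave The Door Open", "artist": "Silk Sonic (Bruno Mars & Anderson .Paak)",
     "last_week_rank": "2", "peak_rank": "1", "weeks": "8"},
]

sample_chart = pd.DataFrame(records)
print(sample_chart.columns.tolist())
# ['rank', 'song', 'artist', 'last_week_rank', 'peak_rank', 'weeks']
```

In the scraping loop, you would append one such dict per `entry` instead of appending to six separate lists.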

```python
chart
```

|     | rank | song | artist | last_week_rank | peak_rank | weeks |
|-----|------|------|--------|----------------|-----------|-------|
| 0   | 1    | Save Your Tears | The Weeknd & Ariana Grande | 6 | 1 | 20 |
| 1   | 2    | Leave The Door Open | Silk Sonic (Bruno Mars & Anderson .Paak) | 2 | 1 | 8 |
| 2   | 3    | Peaches | Justin Bieber Featuring Daniel Caesar & Giveon | 3 | 1 | 6 |
| 3   | 4    | Rapstar | Polo G | 1 | 1 | 3 |
| 4   | 5    | Levitating | Dua Lipa Featuring DaBaby | 5 | 5 | 30 |
| ... | ...  | ... | ... | ... | ... | ... |
| 95  | 96   | 4 Da Gang | 42 Dugg & Roddy Ricch | 100 | 67 | 4 |
| 96  | 97   | Blame It On You | Jason Aldean | - | 97 | 1 |
| 97  | 98   | Wasted On You | Morgan Wallen | 92 | 9 | 16 |
| 98  | 99   | Way Less Sad | AJR | - | 99 | 1 |
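
All columns come out of the scrape as strings, and new entries use `-` for last week's rank. Before sorting or comparing ranks, one option is `pd.to_numeric` with `errors="coerce"`, which turns `-` into `NaN`, followed by pandas' nullable `Int64` dtype. A sketch on a few rows from the table above; `sample` is a hypothetical name.

```python
import pandas as pd

sample = pd.DataFrame({
    "rank": ["96", "97", "98"],
    "song": ["4 Da Gang", "Blame It On You", "Wasted On You"],
    "last_week_rank": ["100", "-", "92"],
    "peak_rank": ["67", "97", "9"],
    "weeks": ["4", "1", "16"],
})

numeric_cols = ["rank", "last_week_rank", "peak_rank", "weeks"]
# errors="coerce" turns non-numeric strings such as "-" into NaN
sample[numeric_cols] = sample[numeric_cols].apply(pd.to_numeric, errors="coerce")

# "Int64" (capital I) is pandas' nullable integer dtype, so NaN becomes <NA>
sample[numeric_cols] = sample[numeric_cols].astype("Int64")
print(sample.dtypes)
```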


Finally, we save the DataFrame as a CSV file.

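```python
# index=False keeps the row index out of the file, so it is not read back as "Unnamed: 0"
chart.to_csv("../data/top-100-songs-chart.csv", index=False)
```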
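
By default, `to_csv` also writes the DataFrame index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read with `pd.read_csv`. A small in-memory sketch (using `io.StringIO` instead of the real file path) showing the difference between writing the index and passing `index=False`:

```python
import io

import pandas as pd

df = pd.DataFrame({"rank": [1, 2], "song": ["Save Your Tears", "Leave The Door Open"]})

with_index = io.StringIO()
df.to_csv(with_index)  # the index is written as an unnamed first column
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())
# ['Unnamed: 0', 'rank', 'song']

without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
print(pd.read_csv(without_index).columns.tolist())
# ['rank', 'song']
```

If a file was already saved with its index, `pd.read_csv(path, index_col=0)` reads that first column back as the index instead.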