<h1>Lab | Web Scraping Single Page</h1>

**Business goal:**
1) Check the case_study_gnod.md file.

2) Make sure you've understood the big picture of your project:
- the goal of the company (Gnod),
- their current product (Gnoosic),
- their strategy, and
- how your project fits into this context.

Re-read the business case and the e-mail from the CTO, look at the flowchart, and create an initial Trello board with the tasks you think you will need to complete.

**Instructions - Scraping popular songs**

Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song, but the CTO believes that if the input song is currently on the top charts, the user will prefer a recommendation that is also popular at the moment.
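The CTO's rule can be sketched as a small function. This is a minimal sketch under assumptions: `recommend_from_hot100` is a hypothetical name, songs are matched by title only (case-insensitive), and any other chart song counts as a valid "popular" pick. The similarity-based path for songs that are not on the chart is left out.

```python
import random

def recommend_from_hot100(user_song, hot_songs, rng=None):
    """Hypothetical rule: if the input song is on the chart, suggest another chart song.

    Returns None when the song is not on the chart, so a caller could fall back
    to a similarity-based recommender (not shown here).
    """
    rng = rng or random.Random()
    key = user_song.strip().casefold()
    titles = [s.strip().casefold() for s in hot_songs]
    if key not in titles:
        return None
    # Any chart song except the one the user entered
    candidates = [s for s, t in zip(hot_songs, titles) if t != key]
    return rng.choice(candidates) if candidates else None

hot = ["Anti-Hero", "Lift Me Up", "Unholy"]
print(recommend_from_hot100("unholy", hot, rng=random.Random(0)))
print(recommend_from_hot100("Yesterday", hot))  # None: not on the chart
```

Passing a seeded `random.Random` makes the pick reproducible, which helps when testing.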

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: https://www.billboard.com/charts/hot-100.

It's a good place to start!

**The goal of this lab is to create a function, scrape_hot100(),** that scrapes the current top 100 songs at https://www.billboard.com/charts/hot-100 together with their artists, puts this information into a pandas DataFrame, and saves the DataFrame as a CSV file in the current folder.
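The last two steps of the goal (DataFrame, then CSV) can be sketched independently of the scraping itself. The helper name `save_chart` and its `path` parameter are assumptions for illustration; the column names follow the ones used in the notebook's final function.

```python
import pandas as pd

def save_chart(song_names, song_artists, path="hot100.csv"):
    """Hypothetical helper: build a two-column DataFrame and write it to CSV."""
    if len(song_names) != len(song_artists):
        # Catch misaligned lists before anything is written
        raise ValueError(f"{len(song_names)} titles but {len(song_artists)} artists")
    chart = pd.DataFrame({"song_name": song_names, "song_artist": song_artists})
    chart.to_csv(path, index=False)  # index=False keeps row numbers out of the file
    return chart
```

Calling `save_chart(titles, artists)` writes `hot100.csv` to the current folder and returns the DataFrame, so the result can be inspected right away.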

In [1]:
# importing libraries

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

In [2]:
# find url and store it in a variable
url = "https://www.billboard.com/charts/hot-100/"

# download html with a get request
response = requests.get(url)

print(response.status_code) # 200 status code means OK!

200
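A status code of 200 is worth checking in code rather than by eye: if the request fails, the parser would quietly work on an error page. A minimal sketch, assuming a hypothetical `fetch_html` helper; the `User-Agent` string is an arbitrary example (not something Billboard is known to require) and the 10-second timeout is an assumed value.

```python
import requests

# Arbitrary example header (an assumption, not a documented requirement of the site)
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; hot100-lab/0.1)"}

def fetch_html(url, session=None, timeout=10):
    """Download a page and raise requests.HTTPError for 4xx/5xx responses."""
    session = session or requests.Session()
    response = session.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return response.text
```

Usage: `html = fetch_html("https://www.billboard.com/charts/hot-100/")`. Accepting a `session` argument also makes it easy to pass in a fake object when testing without network access.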


In [3]:
# parsing html (create the 'soup')
soup = BeautifulSoup(response.text, 'html.parser')
# print(soup.prettify())

In [4]:
num_iter = len(soup.select("li.o-chart-results-list__item > h3"))
num_iter

100
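`select()` takes a CSS selector: `li.o-chart-results-list__item` matches any `<li>` whose class list contains that class, whatever other classes it carries, and `>` restricts the match to `<h3>` elements that are direct children. The snippet below is simplified, hypothetical markup, not Billboard's real page.

```python
from bs4 import BeautifulSoup

demo_html = """
<ul>
  <li class="o-chart-results-list__item lrv-u-flex"><h3>Anti-Hero</h3><span>Taylor Swift</span></li>
  <li class="lrv-u-flex o-chart-results-list__item lrv-u-padding-l-050"><h3>Lift Me Up</h3><span>Rihanna</span></li>
  <li class="o-chart-results-list__item"><div><h3>Not a direct child</h3></div></li>
</ul>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")
titles = demo_soup.select("li.o-chart-results-list__item > h3")
print(len(titles))  # 2: the <h3> nested inside a <div> is skipped by ">"
```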

In [5]:
# print the song titles (the first <h3> in each chart entry)

for songs in soup.find_all("li",{"class":"o-chart-results-list__item // lrv-u-flex-grow-1 lrv-u-flex lrv-u-flex-direction-column lrv-u-justify-content-center lrv-u-border-b-1 u-border-b-0@mobile-max lrv-u-border-color-grey-light lrv-u-padding-l-050 lrv-u-padding-l-1@mobile-max"}):
    print(songs.find_all("h3")[0].get_text().replace("\t","").replace("\n",""))

Lift Me Up
Unholy
Bad Habit
As It Was
Lavender Haze
Midnight Rain
I Like You (A Happier Song)
Bejeweled
Super Freaky Girl
Shirt
Maroon
I Ain't Worried
You Proof
I'm Good (Blue)
Snow On The Beach
Karma
Vegas
Sunroof
You're On Your Own, Kid
Under The Influence
Wasted On You
In My Head
Vigilante Shit
Jimmy Cooks
Thriller
Die For You
Wait For U
Something In The Orange
Titi Me Pregunto
Cuff It
Question...?
About Damn Time
Late Night Talking
Tomorrow 2
The Kind Of Love We Make
She Had Me At Heads Carolina
Mastermind
Unstoppable
Hold Me Closer
Me Porto Bonito
Labyrinth
Thank God
5 Foot 9
Sweet Nothing
Golden Hour
Monster Mash
Just Wanna Rock
Ghostbusters
Fall In Love
The Astronaut
Half Of Me
Until I Found You
Rock And A Hard Place
California Breeze
Victoria’s Secret
Son Of A Sinner
What My World Spins Around
Would've, Could've, Should've
Star Walkin' (League Of Legends Worlds Anthem)
Heyy
Music For A Sushi Restaurant
Made You Look
Forever
Romantic Homicide
Don't Come Lookin'
Wishful Drinking
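The loop above printed 66 titles (the No. 1 song is handled separately later), while the CSS selector in In [4] counted 100 `<h3>` elements. A likely cause, not verified against the live page, is how `find_all` treats a class value that contains spaces: Beautiful Soup compares it with the whole `class` attribute as one exact string, so an entry whose classes differ by a single token, or appear in another order, is skipped. Matching on one class avoids this. Hypothetical markup again:

```python
from bs4 import BeautifulSoup

demo_html = """
<ul>
  <li class="o-chart-results-list__item lrv-u-flex lrv-u-padding-l-050"><h3>Lift Me Up</h3></li>
  <li class="o-chart-results-list__item lrv-u-flex"><h3>Unholy</h3></li>
  <li class="lrv-u-flex o-chart-results-list__item lrv-u-padding-l-050"><h3>Bad Habit</h3></li>
</ul>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")

full_string = "o-chart-results-list__item lrv-u-flex lrv-u-padding-l-050"
exact = demo_soup.find_all("li", {"class": full_string})                 # whole attribute must match
by_class = demo_soup.find_all("li", class_="o-chart-results-list__item")  # any li with this class
print(len(exact), len(by_class))  # 1 3
```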


In [6]:
# print the artist names (the first <span> in each chart entry)
for songs in soup.find_all("li",{"class":"o-chart-results-list__item // lrv-u-flex-grow-1 lrv-u-flex lrv-u-flex-direction-column lrv-u-justify-content-center lrv-u-border-b-1 u-border-b-0@mobile-max lrv-u-border-color-grey-light lrv-u-padding-l-050 lrv-u-padding-l-1@mobile-max"}):
    print(songs.find_all("span")[0].get_text().replace("\t","").replace("\n",""))

Rihanna
Sam Smith & Kim Petras
Steve Lacy
Harry Styles
Taylor Swift
Taylor Swift
Post Malone Featuring Doja Cat
Taylor Swift
Nicki Minaj
SZA
Taylor Swift
OneRepublic
Morgan Wallen
David Guetta & Bebe Rexha
Taylor Swift Featuring Lana Del Rey
Taylor Swift
Doja Cat
Nicky Youre & dazy
Taylor Swift
Chris Brown
Morgan Wallen
Juice WRLD
Taylor Swift
Drake Featuring 21 Savage
Michael Jackson
The Weeknd
Future Featuring Drake & Tems
Zach Bryan
Bad Bunny
Beyonce
Taylor Swift
Lizzo
Harry Styles
GloRilla & Cardi B
Luke Combs
Cole Swindell
Taylor Swift
Sia
Elton John & Britney Spears
Bad Bunny & Chencho Corleone
Taylor Swift
Kane Brown With Katelyn Brown
Tyler Hubbard
Taylor Swift
JVKE
Bobby "Boris" Pickett And The Crypt-Kickers
Lil Uzi Vert
Ray Parker Jr.
Bailey Zimmerman
JIN
Thomas Rhett Featuring Riley Green
Stephen Sanchez
Bailey Zimmerman
Lil Baby
Jax
Jelly Roll
Jordan Davis
Taylor Swift
Lil Nas X
Lil Baby
Harry Styles
Meghan Trainor
Lil Baby Featuring Fridayy
d4vd
Jackson Dean
Ingrid Andress
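The titles and artists above come from two separate loops, so they only line up because both loops walk the same `<li>` elements in the same order. Reading both values from the same entry keeps each title next to its artist by construction. A sketch on hypothetical markup, assuming (as the loops above do) that the first `<span>` in an entry holds the artist:

```python
from bs4 import BeautifulSoup

demo_html = """
<ul>
  <li class="o-chart-results-list__item"><h3>Unholy</h3><span>Sam Smith &amp; Kim Petras</span></li>
  <li class="o-chart-results-list__item"><h3>Bad Habit</h3><span>Steve Lacy</span></li>
</ul>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")

rows = []
for entry in demo_soup.select("li.o-chart-results-list__item"):
    title = entry.find("h3")
    artist = entry.find("span")
    if title is None or artist is None:
        continue  # skip entries that do not look like chart rows
    rows.append((title.get_text(strip=True), artist.get_text(strip=True)))

print(rows)
```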

In [7]:
# the No. 1 entry uses a different layout, so select its artist <span> on its own
first_artist = soup.select('#post-1479786 > div.pmc-paywall > div > div > div > div > div > ul > li.lrv-u-width-100p > ul > li > span')[0]

print([text.get_text() for text in first_artist])


['\n\t\n\tTaylor Swift\n']
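The raw text carries the page's indentation (`'\n\t\n\tTaylor Swift\n'`). Chaining `.replace("\t","").replace("\n","")` removes tabs and newlines but leaves ordinary spaces, whereas `get_text(strip=True)` trims all surrounding whitespace in one call. A small comparison (the `"  Anti-Hero  "` string is made up to show the difference):

```python
from bs4 import BeautifulSoup

demo_soup = BeautifulSoup("<span>\n\t\n\tTaylor Swift\n</span><h3>  Anti-Hero  </h3>", "html.parser")
span, h3 = demo_soup.span, demo_soup.h3

print(repr(span.get_text().replace("\t", "").replace("\n", "")))  # 'Taylor Swift'
print(repr(span.get_text(strip=True)))                            # 'Taylor Swift'
print(repr(h3.get_text().replace("\t", "").replace("\n", "")))    # '  Anti-Hero  '
print(repr(h3.get_text(strip=True)))                              # 'Anti-Hero'
```

If an element holds several text pieces (for example, nested links), `get_text(" ", strip=True)` joins them with a space instead of gluing them together.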


In [8]:
soup.select('#post-1479786 > div.pmc-paywall > div > div > div > div > div > ul > li.lrv-u-width-100p > ul > li > h3')[0]


<h3 class="c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 u-font-size-23@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max a-truncate-ellipsis u-max-width-245 u-max-width-230@tablet-only u-letter-spacing-0028@tablet" id="title-of-a-story">

	
	
		
					Anti-Hero		
	
</h3>

In [9]:

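# same loop with a variant of the class string (no lrv-u-padding-l-050, double space);
# find_all() compares a class string containing spaces with the whole class attribute,
# so small differences in this string change which entries it matches
for songs in soup.find_all("li",{"class":"o-chart-results-list__item // lrv-u-flex-grow-1 lrv-u-flex lrv-u-flex-direction-column lrv-u-justify-content-center lrv-u-border-b-1 u-border-b-0@mobile-max lrv-u-border-color-grey-light  lrv-u-padding-l-1@mobile-max"}):
    print(songs.find_all("h3")[0].get_text().replace("\t","").replace("\n",""))
    print(songs.find_all("span")[0].get_text().replace("\t","").replace("\n",""))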


In [10]:
# Creating a function for web scraping

def scrape_hot100():
    """Scrape the Billboard Hot 100 and save titles and artists to hot100.csv."""
    url = "https://www.billboard.com/charts/hot-100/"
    response = requests.get(url)
    response.raise_for_status()  # stop here if the download failed
    soup = BeautifulSoup(response.text, 'html.parser')

    song_name = []
    song_artist = []
    # Select entries by a single class instead of the full class string: find_all()
    # only matches a multi-class string exactly, which left out many entries above.
    # In In [4] this selector found 100 titles, i.e. the whole chart.
    for title in soup.select("li.o-chart-results-list__item > h3"):
        entry = title.parent          # the <li> holding both title and artist
        artist = entry.find("span")   # the first <span> in the entry is the artist
        song_name.append(title.get_text(" ", strip=True))
        song_artist.append(artist.get_text(" ", strip=True))

    hot100 = pd.DataFrame({
        "song_name": song_name,
        "song_artist": song_artist})
    hot100.to_csv('hot100.csv', index=False)
    return hot100
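Because `scrape_hot100()` downloads, parses and saves in one go, the parsing cannot be checked without contacting the website. One option is to move the parsing into a function that takes HTML text, so it can run on a saved copy of the page or on a small hand-written snippet. A sketch under assumptions: `parse_hot100` is a hypothetical name, the `rank` column is an addition not present in the original function, and entries are located with the `li.o-chart-results-list__item > h3` selector counted in In [4].

```python
import pandas as pd
from bs4 import BeautifulSoup

def parse_hot100(html):
    """Hypothetical helper: turn Hot 100 page HTML into a DataFrame (no network access)."""
    page = BeautifulSoup(html, "html.parser")
    rows = []
    for title in page.select("li.o-chart-results-list__item > h3"):
        artist = title.parent.find("span")  # assumption: first <span> in the entry is the artist
        rows.append({
            "rank": len(rows) + 1,  # assumes entries appear in chart order
            "song_name": title.get_text(" ", strip=True),
            "song_artist": artist.get_text(" ", strip=True) if artist else None,
        })
    return pd.DataFrame(rows, columns=["rank", "song_name", "song_artist"])

snippet = """
<li class="o-chart-results-list__item"><h3> Anti-Hero </h3><span> Taylor Swift </span></li>
<li class="o-chart-results-list__item x"><h3>Lift Me Up</h3><span>Rihanna</span></li>
"""
print(parse_hot100(snippet))
```

With the notebook's objects, `parse_hot100(response.text).to_csv('hot100.csv', index=False)` would write the same kind of file as `scrape_hot100()`, plus the rank.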

In [11]:
scrape_hot100()

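Now we have a file 'hot100.csv' with the chart's songs and their artists. It is worth opening it (for example with pd.read_csv('hot100.csv')) to confirm it really has 100 rows: the exploratory loops above printed only 66 titles, which shows how a brittle match can silently drop entries.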
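A quick sanity check on the saved file catches such silent gaps. A minimal sketch; `check_chart_csv` is a hypothetical helper, and 100 is simply the expected chart length.

```python
import pandas as pd

def check_chart_csv(path="hot100.csv", expected_rows=100):
    """Hypothetical helper: return a list of problems in the saved chart (empty = looks fine)."""
    chart = pd.read_csv(path)
    problems = []
    if len(chart) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(chart)}")
    # Empty CSV fields are read back as NaN
    missing = chart[["song_name", "song_artist"]].isna().any(axis=1).sum()
    if missing:
        problems.append(f"rows with a missing title or artist: {missing}")
    return problems
```

Running `check_chart_csv()` right after `scrape_hot100()` returns an empty list when the file has 100 complete rows.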