## Web Scraping Lab 1:


### Prepare your project:

#### Business goal:

Make sure you've understood the big picture of your project: the goal of the company (Gnod), their current product (Gnoosic), their strategy, and how your project fits into this context. Re-read the business case and the e-mail from the CTO, take a look at the flowchart, and create an initial Trello board with the tasks you think you'll have to accomplish.

#### Scraping popular songs:

Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song, but the CTO thinks that if the input song is currently on the top charts, the user will prefer a recommendation that is also currently popular.

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: https://www.billboard.com/charts/hot-100.

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

In [4]:
# Import necessary tools

import requests
from bs4 import BeautifulSoup
import pandas as pd
from tqdm.notebook import tqdm

In [5]:
url = "https://www.billboard.com/charts/hot-100"

In [6]:
response = requests.get(url)

In [7]:
response.status_code

200
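
A status code of 200 only confirms that this one request succeeded. As a minimal sketch, the download can be wrapped in a helper that sets a timeout and raises on HTTP errors. The function name `fetch_html`, the User-Agent string, and the 10-second timeout are illustrative assumptions, not part of the lab.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    # Hypothetical User-Agent string; some sites reject requests without one.
    headers = {"User-Agent": "gnoosic-lab-scraper/0.1"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of failing silently.
    response.raise_for_status()
    return response.text
```

With such a helper, `BeautifulSoup(fetch_html(url), 'html.parser')` could stand in for the separate request and status-check cells.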

In [8]:
soup = BeautifulSoup(response.content, 'html.parser')

In [None]:
print(soup.prettify())

In [10]:
soup.select("span.chart-element__information__song")[0].text

'Butter'

In [11]:
soup.select("span.chart-element__information__artist")[0].text

'BTS'

In [12]:
soup.select("span.chart-element__rank__number")[0].text

'1'
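
Indexing a `select` result with `[0]` raises an `IndexError` when the selector matches nothing, which can happen whenever a site changes its markup. The sketch below shows one way to guard against that with `select_one`. The helper name `first_text` and the hand-written sample HTML are assumptions for illustration; the sample only mimics the class names used above.

```python
from bs4 import BeautifulSoup


def first_text(soup, selector, default=None):
    """Return the stripped text of the first match, or `default` if none."""
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else default


# Hand-written sample, not the real Billboard page.
sample_html = """
<li>
  <span class="chart-element__rank__number">1</span>
  <span class="chart-element__information__song"> Butter </span>
  <span class="chart-element__information__artist">BTS</span>
</li>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

top_song = first_text(sample_soup, "span.chart-element__information__song")
missing = first_text(sample_soup, "span.no-such-class", default="(not found)")
```

Here `top_song` holds the cleaned title, while `missing` falls back to the default instead of raising.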

In [13]:
song = []
artist = []
rank = []

len_charts = len(soup.select('span.chart-element__information__song'))

In [14]:
for i in tqdm(range(len_charts)):
    song.append(soup.select("span.chart-element__information__song")[i].text)
    artist.append(soup.select("span.chart-element__information__artist")[i].text)
    rank.append(soup.select("span.chart-element__rank__number")[i].text)

  0%|          | 0/100 [00:00<?, ?it/s]
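
The loop above calls `soup.select` three times per iteration, so the whole document is searched 300 times for 100 songs. As a sketch of an alternative, each selector can be run once and the three result lists walked in parallel with `zip`. The sample HTML is hand-written for illustration and only mimics the class names used above.

```python
from bs4 import BeautifulSoup

# Hand-written sample, not the real Billboard page.
sample_html = """
<ol>
  <li>
    <span class="chart-element__rank__number">1</span>
    <span class="chart-element__information__song">Butter</span>
    <span class="chart-element__information__artist">BTS</span>
  </li>
  <li>
    <span class="chart-element__rank__number">2</span>
    <span class="chart-element__information__song">Good 4 U</span>
    <span class="chart-element__information__artist">Olivia Rodrigo</span>
  </li>
</ol>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Run each selector once, then walk the three result lists in parallel.
song_tags = sample_soup.select("span.chart-element__information__song")
artist_tags = sample_soup.select("span.chart-element__information__artist")
rank_tags = sample_soup.select("span.chart-element__rank__number")

rows = [
    {
        "song": s.get_text(strip=True),
        "artist": a.get_text(strip=True),
        "rank": r.get_text(strip=True),
    }
    for s, a, r in zip(song_tags, artist_tags, rank_tags)
]
```

Note that `zip` stops at the shortest list, so it is worth comparing the three lengths first if any entry might be missing a field.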

In [15]:
billboard100 = pd.DataFrame({"song":song, "artist":artist, "rank":rank, "source":"Billboard100"})

In [16]:
billboard100.head()

|   | song | artist | rank | source |
|---|------|--------|------|--------|
| 0 | Butter | BTS | 1 | Billboard100 |
| 1 | Good 4 U | Olivia Rodrigo | 2 | Billboard100 |
| 2 | Levitating | Dua Lipa Featuring DaBaby | 3 | Billboard100 |
| 3 | Kiss Me More | Doja Cat Featuring SZA | 4 | Billboard100 |
| 4 | Montero (Call Me By Your Name) | Lil Nas X | 5 | Billboard100 |
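
Because `.text` returns strings, the `rank` column holds text rather than numbers, so sorting or comparing ranks would behave lexically. A minimal cleanup sketch follows; the small `chart` frame is a stand-in built from the first rows of the output above, not the full scraped data.

```python
import pandas as pd

# Small stand-in for the scraped frame; values copied from the output above.
chart = pd.DataFrame(
    {
        "song": ["Butter", "Good 4 U"],
        "artist": ["BTS", "Olivia Rodrigo"],
        "rank": ["1", "2"],
        "source": "Billboard100",
    }
)

# Trim stray whitespace from the text columns and make rank numeric.
for column in ["song", "artist"]:
    chart[column] = chart[column].str.strip()
chart["rank"] = chart["rank"].astype(int)
```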


### Bonus

Can you find other websites with lists of "hot" songs? What about songs that were popular in a certain decade?

You can scrape more lists and add extra features to the project.

In [None]:
# The 90s are great, so here's a playlist
# Obviously 'I want it that way' is #1, no questions asked
# Source: http://www.discjockey.org/top-100-songs-of-the-1990s/

In [65]:
url2 = "http://www.discjockey.org/top-100-songs-of-the-1990s/"

In [66]:
response = requests.get(url2)

In [67]:
response.status_code

200

In [68]:
soup = BeautifulSoup(response.content, 'html.parser')

In [None]:
print(soup.prettify())

In [76]:
soup.select("tbody > tr:nth-child(1) > td:nth-child(2)")[0].text

'Electric Slide'

In [89]:
iterations = range(1, 101)  # CSS nth-child is 1-based, so rows run from 1 to 100

In [95]:
song2 = []

for i in tqdm(iterations):
    search = "tbody > tr:nth-child(" + str(i) +") > td:nth-child(2)"
    song2.append(soup.select(search)[0].text)

  0%|          | 0/100 [00:00<?, ?it/s]

In [96]:
artist2 = []

for i in tqdm(iterations):
    search = "tbody > tr:nth-child(" + str(i) +") > td:nth-child(3)"
    artist2.append(soup.select(search)[0].text)

  0%|          | 0/100 [00:00<?, ?it/s]

In [97]:
rank2 = []

for i in tqdm(iterations):
    search = "tbody > tr:nth-child(" + str(i) +") > td:nth-child(1)"
    rank2.append(soup.select(search)[0].text)

  0%|          | 0/100 [00:00<?, ?it/s]
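
The three loops above each build a separate `nth-child` selector per row and per column. As a sketch of an alternative, the rows can be selected once and each row's cells read together, which also keeps song, artist, and rank aligned. The sample table is hand-written for illustration and assumes the same column order as the selectors above (rank, song, artist).

```python
from bs4 import BeautifulSoup

# Hand-written sample, not the real page.
sample_html = """
<table>
  <tbody>
    <tr><td>1</td><td>Electric Slide</td><td>Marcia Griffiths</td></tr>
    <tr><td>2</td><td>Baby Got Back</td><td>Sir Mix-A-Lot</td></tr>
  </tbody>
</table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

records = []
for row in sample_soup.select("tbody > tr"):
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    if len(cells) < 3:
        continue  # skip rows that do not have all three columns
    rank, song, artist = cells[:3]
    records.append({"song": song, "artist": artist, "rank": rank})
```

A list of dictionaries like `records` can be passed directly to `pd.DataFrame`.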

In [98]:
Top90s = pd.DataFrame({"song":song2, "artist":artist2, "rank":rank2, "source":"Top100 90s"})

In [99]:
Top90s.head()

|   | song | artist | rank | source |
|---|------|--------|------|--------|
| 0 | Electric Slide | Marcia Griffiths | 1 | Top100 90s |
| 1 | Baby Got Back | Sir Mix-A-Lot | 2 | Top100 90s |
| 2 | Friends In Low Places | Garth Brooks | 3 | Top100 90s |
| 3 | Cotton Eye Joe | Rednex | 4 | Top100 90s |
| 4 | Macarena | Los Del Rio | 5 | Top100 90s |
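
Since the product needs to check whether an input song appears on a chart, the scraped lists can be stacked into one frame and searched by title. The sketch below is one possible approach; the sample frames reuse a few rows from the outputs above, and the names `all_songs` and `sources_for` are hypothetical.

```python
import pandas as pd

# Small stand-ins for the two scraped frames.
billboard_sample = pd.DataFrame(
    {
        "song": ["Butter", "Good 4 U"],
        "artist": ["BTS", "Olivia Rodrigo"],
        "rank": ["1", "2"],
        "source": "Billboard100",
    }
)
nineties_sample = pd.DataFrame(
    {
        "song": ["Electric Slide", "Baby Got Back"],
        "artist": ["Marcia Griffiths", "Sir Mix-A-Lot"],
        "rank": ["1", "2"],
        "source": "Top100 90s",
    }
)

# Stack both charts; ignore_index gives the result a fresh 0..n-1 index.
all_songs = pd.concat([billboard_sample, nineties_sample], ignore_index=True)


def sources_for(title, songs=all_songs):
    """Return the sources whose charts contain `title` (case-insensitive)."""
    matches = songs[songs["song"].str.lower() == title.strip().lower()]
    return matches["source"].tolist()
```

An empty list from `sources_for` would mean the song is on none of the scraped charts, so the recommender would fall back to similarity.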
