# Web Scraping - Part I 

## Scrape Billboard 100 Hot Songs

Create a function that scrapes the Billboard Hot 100 and builds a local dataframe of the songs, including:

- Song’s name
- Song’s artist
- Song’s album
- Song’s release year
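
A minimal sketch of such a function, assuming the chart markup uses the `chart-element__information__song` and `chart-element__information__artist` classes that appear in the cells below. The name `parse_chart` and the sample HTML are illustrative assumptions, not part of the original notebook. Album and release year are not extracted here, since the notebook does not find a source for them.

```python
from bs4 import BeautifulSoup
import pandas as pd


def parse_chart(html):
    """Return a DataFrame with one row per chart entry (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # class names are taken from the Billboard markup shown in this notebook;
    # they will stop matching if the site changes its markup
    titles = [t.get_text(strip=True)
              for t in soup.find_all("span", class_="chart-element__information__song")]
    artists = [a.get_text(strip=True)
               for a in soup.find_all("span", class_="chart-element__information__artist")]
    # zip() would silently drop unmatched entries, so fail loudly instead
    if len(titles) != len(artists):
        raise ValueError("number of titles and artists differs")
    return pd.DataFrame({"artist_name": artists, "title": titles})


# made-up two-entry snippet that mimics the chart markup
sample_html = """
<ul>
  <li><span class="chart-element__information__song text--truncate">Butter</span>
      <span class="chart-element__information__artist text--truncate">BTS</span></li>
  <li><span class="chart-element__information__song text--truncate">Stay</span>
      <span class="chart-element__information__artist text--truncate">The Kid LAROI &amp; Justin Bieber</span></li>
</ul>
"""
chart = parse_chart(sample_html)
print(chart)
```

Keeping the parsing separate from the download makes it possible to try the function on a saved or hand-written HTML string before pointing it at the live page.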

## Libraries 

In [41]:
from urllib.request import urlopen 
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [None]:
# 2. find url and store it in a variable
# (the output below comes from the Billboard page, so that url is the active one)
url = "https://www.billboard.com/charts/hot-100"
#url = "http://www.discjockey.org/top-100-songs-of-the-1950s/"

In [None]:
# 3. download html with a get request
response = requests.get(url)

In [85]:
response.status_code # 200 status code means OK!

200

In [86]:
# 4.1. parse html (create the 'soup')
soup = BeautifulSoup(response.content, "html.parser")

# 4.2. check that the html code looks like it should
soup

<!DOCTYPE html>

<html class="" lang="">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1, user-scalable=no" name="viewport"/>
<title>The Hot 100 Chart | Billboard</title>
<meta content="The Hot 100 Chart" name="title" property="title">
<meta content="@billboard" name="twitter:site"/>
<meta content="Billboard" property="og:site_name">
<meta content="article" property="og:type">
<link href="/manifest.json" rel="manifest"/>
<style>
        .chart-pro-access {
            background-image: url('https://www.billboard.com/assets/1629982496/images/piano/chart-pro-access-mb.png?c5ccab1679d336f9b241');
        }

        @media (min-width: 769px) {
            .chart-pro-access {
                background-image: url('https://www.billboard.com/assets/1629982496/images/piano/chart-pro-access-dk.png?c5ccab1679d336f9b241');
            }
        }
    </style>
<script async="async" data-cfasync="false" src="ht

In [87]:
# pretty-print the html if the raw output is hard to read
print(soup.prettify())

<!DOCTYPE html>
<html class="" lang="">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1, user-scalable=no" name="viewport"/>
  <title>
   The Hot 100 Chart | Billboard
  </title>
  <meta content="The Hot 100 Chart" name="title" property="title">
   <meta content="@billboard" name="twitter:site"/>
   <meta content="Billboard" property="og:site_name">
    <meta content="article" property="og:type">
     <link href="/manifest.json" rel="manifest"/>
     <style>
      .chart-pro-access {
            background-image: url('https://www.billboard.com/assets/1629982496/images/piano/chart-pro-access-mb.png?c5ccab1679d336f9b241');
        }

        @media (min-width: 769px) {
            .chart-pro-access {
                background-image: url('https://www.billboard.com/assets/1629982496/images/piano/chart-pro-access-dk.png?c5ccab1679d336f9b241');
            }
        }
     </style>
     <script a

In [None]:
# 5. retrieve/extract the desired info (here, you'll paste the "Selector" you copied before to get the elements that hold the song titles)
titles = soup.select("li button span span.chart-element__information__song.text--truncate.color--primary")
print(titles)

# other selectors copied from the browser, kept for reference:
# #musicTable > tbody > tr:nth-child(1) > td:nth-child(2)
# #musicTable > tbody

In [89]:
#or use find_all function
titles = soup.find_all("span", class_="chart-element__information__song")
titles


[<span class="chart-element__information__song text--truncate color--primary">Butter</span>,
 <span class="chart-element__information__song text--truncate color--primary">Stay</span>,
 <span class="chart-element__information__song text--truncate color--primary">Bad Habits</span>,
 <span class="chart-element__information__song text--truncate color--primary">Good 4 U</span>,
 <span class="chart-element__information__song text--truncate color--primary">Kiss Me More</span>,
 <span class="chart-element__information__song text--truncate color--primary">Hurricane</span>,
 <span class="chart-element__information__song text--truncate color--primary">Industry Baby</span>,
 <span class="chart-element__information__song text--truncate color--primary">Levitating</span>,
 <span class="chart-element__information__song text--truncate color--primary">Fancy Like</span>,
 <span class="chart-element__information__song text--truncate color--primary">Jail</span>,
 <span class="chart-element__information__so

In [90]:
song_title = [song.getText() for song in titles]
song_title

['Butter',
 'Stay',
 'Bad Habits',
 'Good 4 U',
 'Kiss Me More',
 'Hurricane',
 'Industry Baby',
 'Levitating',
 'Fancy Like',
 'Jail',
 'Off The Grid',
 'Ok Ok',
 'Deja Vu',
 'Save Your Tears',
 'Montero (Call Me By Your Name)',
 'Junya',
 'Moon',
 'Family Ties',
 'Essence',
 'Praise God',
 'You Right',
 'Sharing Locations',
 'Heat Waves',
 'Need To Know',
 'Take My Breath',
 'Jesus Lord',
 'Jonah',
 'Believe What I Say',
 'Leave The Door Open',
 'God Breathed',
 'Peaches',
 'Rumors',
 'Leave Before You Love Me',
 'Heartbreak Anniversary',
 "Beggin'",
 'Pepas',
 'Waves',
 'Happier Than Ever',
 'Traitor',
 'Remote Control',
 'Forever After All',
 'Heaven And Hell',
 '24',
 'Arcade',
 'Things A Man Oughta Know',
 'Thot Shit',
 'Chasing After You',
 'Late At Night',
 'Every Chance I Get',
 "If I Didn't Love You",
 'Country Again',
 'Pure Souls',
 'No Child Left Behind',
 'Wockesha',
 'Cold Beer Calling My Name',
 'A-O-K',
 'Skate',
 'Donda',
 'Keep My Spirit Alive',
 'Whole Lotta Money',

In [91]:
# now we are doing the same with the artist
artist_name = soup.find_all("span", class_="chart-element__information__artist")
artist_name


[<span class="chart-element__information__artist text--truncate color--secondary">BTS</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">The Kid LAROI &amp; Justin Bieber</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Ed Sheeran</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Olivia Rodrigo</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Doja Cat Featuring SZA</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Kanye West</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Lil Nas X &amp; Jack Harlow</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Dua Lipa</span>,
 <span class="chart-element__information__artist text--truncate color--secondary">Walker Hayes</span>,
 <span class="chart-element__information__

In [92]:
artist = [a.getText() for a in artist_name]
artist

['BTS',
 'The Kid LAROI & Justin Bieber',
 'Ed Sheeran',
 'Olivia Rodrigo',
 'Doja Cat Featuring SZA',
 'Kanye West',
 'Lil Nas X & Jack Harlow',
 'Dua Lipa',
 'Walker Hayes',
 'Kanye West',
 'Kanye West',
 'Kanye West',
 'Olivia Rodrigo',
 'The Weeknd & Ariana Grande',
 'Lil Nas X',
 'Kanye West',
 'Kanye West',
 'Baby Keem & Kendrick Lamar',
 'Wizkid Featuring Justin Bieber & Tems',
 'Kanye West',
 'Doja Cat & The Weeknd',
 'Meek Mill Featuring Lil Baby & Lil Durk',
 'Glass Animals',
 'Doja Cat',
 'The Weeknd',
 'Kanye West',
 'Kanye West',
 'Kanye West',
 'Silk Sonic (Bruno Mars & Anderson .Paak)',
 'Kanye West',
 'Justin Bieber Featuring Daniel Caesar & Giveon',
 'Lizzo Featuring Cardi B',
 'Marshmello X Jonas Brothers',
 'Giveon',
 'Maneskin',
 'Farruko',
 'Luke Bryan',
 'Billie Eilish',
 'Olivia Rodrigo',
 'Kanye West',
 'Luke Combs',
 'Kanye West',
 'Kanye West',
 'Duncan Laurence',
 'Lainey Wilson',
 'Megan Thee Stallion',
 'Ryan Hurd With Maren Morris',
 'Roddy Ricch',
 'DJ Kh

In [93]:
# TODO: where do we get album and release year from? No working source found (for now)

In [116]:
df = pd.DataFrame(list(zip(artist, song_title)),
               columns =['artist_name', 'title'])
df

Unnamed: 0,artist_name,title
0,BTS,Butter
1,The Kid LAROI & Justin Bieber,Stay
2,Ed Sheeran,Bad Habits
3,Olivia Rodrigo,Good 4 U
4,Doja Cat Featuring SZA,Kiss Me More
...,...,...
95,J Balvin & Skrillex,In Da Getto
96,Olivia Rodrigo,Brutal
97,Nio Garcia X J Balvin X Bad Bunny,AM
98,Lil Tecca & Gunna,Repeat It


# Part II 

Steps:

1. User input: a song title
2. Check if the song is currently "hot"
    - 2a. If yes, recommend another hot song
    - 2b. If no, get the audio features of the song and recommend a song that sounds similar
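
The steps above can be sketched as a single function. This is only an outline under stated assumptions: `recommend` and its parameters are hypothetical names, the hot songs are passed in as a plain list of titles, and step 2b returns `None` because the audio-feature lookup is not implemented in this notebook.

```python
import random


def recommend(song, hot_titles, rng=random):
    """Return a recommendation for `song` following the steps above (sketch)."""
    wanted = song.strip().lower()
    # every hot title except the one the user asked for
    others = [t for t in hot_titles if t.lower() != wanted]
    if len(others) < len(hot_titles):
        # step 2a: the song is hot, so suggest a different hot song
        return rng.choice(others) if others else None
    # step 2b: audio-feature lookup is not implemented here
    return None


hot = ["Butter", "Stay", "Bad Habits"]
print(recommend("stay", hot))       # one of the other two hot songs
print(recommend("Yesterday", hot))  # None: not on the chart
```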

In [133]:
def user_input_song():
    # ask the user for a song title and return it exactly as typed
    song = input("What's the title of your favourite song? ")
    return song

In [134]:
def is_it_hot():
    # exact, case-sensitive match against every cell of df (artist and title),
    # so "repeat it" will not match "Repeat It"
    return user_input_song() in df.values

In [135]:
is_it_hot()

What's the title of your favourite song? repeat it


False
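
The run above returns `False` for "repeat it" even though "Repeat It" is on the chart, because the membership test compares strings exactly. A case-insensitive variant could look like the sketch below. `is_title_hot` is a hypothetical helper, and the small DataFrame stands in for the scraped `df`; only the `title` column is searched.

```python
import pandas as pd


def is_title_hot(title, chart):
    """Case-insensitive lookup in the chart's `title` column (hypothetical helper)."""
    wanted = title.strip().lower()
    # compare lowercased titles so capitalisation does not matter
    return bool(chart["title"].str.lower().eq(wanted).any())


# stand-in for the scraped chart
chart = pd.DataFrame({"artist_name": ["Lil Tecca & Gunna", "BTS"],
                      "title": ["Repeat It", "Butter"]})
print(is_title_hot("repeat it", chart))
print(is_title_hot("Yesterday", chart))
```

Passing the title and the DataFrame as arguments, rather than calling `input()` inside the check, also makes the function easy to test without typing at a prompt.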

# Part III

In [2]:
#1. Get Top 100 songs from each decade
#2. Filter per decade, genre, length 

In [2]:
def get_decade():
    decade = input("Which decade do you want to hear? (1950s, 1960s, ..., 2020s) ")
    return decade

In [3]:
#get_decade()

In [4]:
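def get_genre(genres=()):
    # `genres` is an assumed parameter: an iterable of genre names,
    # e.g. the unique values of a "Genre" column (a str has no .unique)
    print("Here's a list of genres you can choose from:", list(genres))
    genre = input("What genre are you in the mood for? ")
    return genre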

In [None]:
#get_genre()

In [1]:
# inquirer is a third-party package and must be installed first (pip install inquirer)
import inquirer
questions = [
  inquirer.List('size',
                message="What size do you need?",
                choices=['Jumbo', 'Large', 'Standard', 'Medium', 'Small', 'Micro'],
            ),
]
answers = inquirer.prompt(questions)
print(answers["size"])

ModuleNotFoundError: No module named 'inquirer'
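
Since `inquirer` is not installed in this environment, a numbered menu built on the standard library is one possible fallback. The sketch below is an assumption about what would be good enough here, not a replacement for `inquirer`; `choose` and its `read` parameter are hypothetical names. `read` defaults to `input` and is swapped for a scripted reader in the example so the cell runs without a prompt.

```python
def choose(message, choices, read=input):
    """Print a numbered menu and return the chosen item (stdlib-only sketch)."""
    for number, choice in enumerate(choices, start=1):
        print(f"{number}. {choice}")
    while True:
        answer = read(f"{message} [1-{len(choices)}] ").strip()
        # accept only a number that points at one of the listed choices
        if answer.isdecimal() and 1 <= int(answer) <= len(choices):
            return choices[int(answer) - 1]
        print("Please enter one of the listed numbers.")


# scripted answers: "7" is rejected, "2" selects the second choice
scripted = iter(["7", "2"])
picked = choose("Which decade do you want to hear?",
                ["1950s", "1960s", "1970s"],
                read=lambda prompt: next(scripted))
print(picked)
```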

In [None]:
questions

# New Attempt

In [58]:
df_50 = pd.read_html("http://www.discjockey.org/top-100-songs-of-the-1950s/")
df_60 = pd.read_html("http://www.discjockey.org/top-100-songs-of-the-1960s/")
df_70 = pd.read_html("http://www.discjockey.org/top-100-songs-of-the-1970s/")
df_80 = pd.read_html("http://www.discjockey.org/top-100-songs-of-the-1980s/")
df_90 = pd.read_html("http://www.discjockey.org/top-100-songs-of-the-1990s/")
df_00 = pd.read_html("http://www.discjockey.org/top-100-songs-of-the-2000s/")
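
The six calls above differ only in the decade, so they could be written as a loop. In the sketch below, `BASE_URL`, `DECADES`, `decade_urls` and `load_decade_tables` are illustrative names. The download function is defined but not called, because it needs network access, and it assumes the chart is the first table on each page, as the cells below do with `[0]`.

```python
import pandas as pd

BASE_URL = "http://www.discjockey.org/top-100-songs-of-the-{decade}/"
DECADES = ["1950s", "1960s", "1970s", "1980s", "1990s", "2000s"]

# one URL per decade, built from the pattern used in the cell above
decade_urls = {decade: BASE_URL.format(decade=decade) for decade in DECADES}


def load_decade_tables(urls):
    """Download the first HTML table from each page (needs network access)."""
    tables = {}
    for decade, url in urls.items():
        # read_html returns a list of DataFrames; the chart is assumed to be the first
        tables[decade] = pd.read_html(url)[0]
    return tables


print(decade_urls["1950s"])
```

Keying the result by decade keeps the label attached to each table, which helps later when the tables are combined and filtered.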

In [59]:
#print(df_50[0].tail(15))

df1 = df_50[0]
#df1 = df.drop(index = 100)
df1.drop(df1.tail(1).index,inplace=True)
df1

Unnamed: 0,Rank,Song Title,Song Artist,Year,Genre
0,1,That's Amore,Dean Martin,1953,Oldies
1,2,Come Fly With Me,Frank Sinatra,1958,Oldies
2,3,Jailhouse Rock,Elvis Presley,1957,Oldies
3,4,I Walk The Line,Johnny Cash,1956,Country
4,5,I've Got You Under My Skin,Frank Sinatra,1953,Oldies
...,...,...,...,...,...
95,96,Loving You,Elvis Presley,1957,Ballad
96,97,My Prayer,Platters,1956,Oldies
97,98,Sincerely,McGuire Sisters,1955,Oldies
98,99,Cherry Pink And Apple Blossom White,Perez Prado,1955,Oldies


In [60]:
df2 = df_60[0]
df2.drop(df2.tail(1).index,inplace=True)
df2

Unnamed: 0,Rank,Song Title,Song Artist,Year,Genre
0,1,Sweet Caroline (Good Times Never Seemed So Good),Neil Diamond,1969,Oldies
1,2,Shout,Otis Day And The Knights/Isley Brothers,1967,Oldies
2,3,Brown Eyed Girl,Van Morrison,1967,Oldies
3,4,The Way You Look Tonight,Frank Sinatra,1964,Ballad
4,5,Twist And Shout,Beatles,1963,Oldies
...,...,...,...,...,...
95,96,Born To Be Wild,Steppenwolf,1968,Oldies
96,97,Down On The Corner/Fortunate Son,Creedence Clearwater Revival,1969,Oldies
97,98,This Magic Moment,Drifters,1960,Oldies
98,99,My Cherie Amour,Stevie Wonder,1969,Oldies


In [61]:
df3 = df_70[0]
df3.drop(df3.tail(1).index,inplace=True)
df3

Unnamed: 0,Rank,Song Title,Song Artist,Year,Genre
0,1,Wonderful Tonight,Eric Clapton,1978,Ballad
1,2,YMCA,Village People,1975,Disco
2,3,Sweet Home Alabama,Lynyrd Skynyrd,1974,Rock
3,4,We Are Family,Sister Sledge,1979,Popular
4,5,Old Time Rock & Roll,Bob Seger & The Silver Bullet Band,1978,Rock
...,...,...,...,...,...
95,96,Soul Man,Blues Brothers,1979,Rock
96,97,Born To Run,Bruce Springsteen,1975,Rock
97,98,My Sharona,Knack,1979,Oldies
98,99,Gimme Three Steps,Lynyrd Skynyrd,1973,Rock


In [62]:
df4 = df_80[0]
df4.drop(df4.tail(1).index,inplace=True)
df4

Unnamed: 0,Rank,Song Title,Song Artist,Year,Genre
0,1,Don't Stop Believin',Journey,1981,Rock
1,2,You Shook Me All Night Long,AC/DC,1980,Rock
2,3,Love Shack,B-52's,1989,Popular
3,4,Livin' On A Prayer,Bon Jovi,1986,Rock
4,5,Pour Some Sugar On Me,Def Leppard,1987,Rock
...,...,...,...,...,...
95,96,Heaven,Bryan Adams,1985,Oldies
96,97,Here And Now,Luther Vandross,1989,Ballad
97,98,Nothin' But A Good Time,Poison,1988,Rock
98,99,Its Raining Men,Weather Girls,1983,Funk


In [63]:
df5 = df_90[0]
df5.drop(df5.tail(1).index,inplace=True)
df5

Unnamed: 0,Rank,Song Title,Song Artist,Year,Genre
0,1,Electric Slide,Marcia Griffiths,1990,Popular
1,2,Baby Got Back,Sir Mix-A-Lot,1992,Popular
2,3,Friends In Low Places,Garth Brooks,1990,Country
3,4,Cotton Eye Joe,Rednex,1994,Country
4,5,Macarena,Los Del Rio,1995,Popular
...,...,...,...,...,...
95,96,Barbie Girl,Aqua,1997,Popular
96,97,Whatta Man,Salt 'N Pepa,1994,Hip Hop
97,98,I Like It I Love It,Tim McGraw,1995,Country
98,99,Somewhere over the Rainbow,Israel Kamakawiwow'ole,1993,Easy Listening


In [64]:
df6 = df_00[0]
df6.drop(df6.tail(1).index,inplace=True)
df6

Unnamed: 0,Rank,Song Title,Song Artist,Year,Genre
0,1,Cupid Shuffle,Cupid,2007,Popular
1,2,Cha-Cha Slide,Mr. C The Slide Man,2000,Popular
2,3,I Gotta Feeling,Black Eyed Peas,2009,Popular
3,4,Single Ladies (Put A Ring On It),Beyonce,2008,Popular
4,5,Wobble,V.I.C.,2008,Popular
...,...,...,...,...,...
95,96,The Way You Move,Outkast Featuring Sleepy Brown,2003,Hip Hop
96,97,Big Green Tractor,Jason Aldean,2009,Country
97,98,Remember When,Alan Jackson,2001,Ballad
98,99,Shake It,Metro Station,2008,Popular


In [None]:
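#combine all dataframes
# pd.concat expects a single list of DataFrames; pd.concat(df_50, df_60) passes
# two lists of tables as separate arguments and raises
# "TypeError: unhashable type: 'list'"
df_all = pd.concat([df1, df2, df3, df4, df5, df6], ignore_index=True)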
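
Once the per-decade tables are combined, the decade and genre filters planned for Part III could look like the sketch below. The two small DataFrames reuse a few rows from the tables above as stand-ins. The `Decade` column and the `filter_songs` helper are assumptions added for illustration. Song length is not covered, since none of the scraped tables contains it.

```python
import pandas as pd

# tiny stand-ins for the per-decade tables
fifties = pd.DataFrame({"Song Title": ["That's Amore", "I Walk The Line"],
                        "Song Artist": ["Dean Martin", "Johnny Cash"],
                        "Year": [1953, 1956],
                        "Genre": ["Oldies", "Country"]})
nineties = pd.DataFrame({"Song Title": ["Friends In Low Places", "Macarena"],
                         "Song Artist": ["Garth Brooks", "Los Del Rio"],
                         "Year": [1990, 1995],
                         "Genre": ["Country", "Popular"]})

# concat takes one list of DataFrames; ignore_index renumbers the rows
songs = pd.concat([fifties, nineties], ignore_index=True)
# derive the decade label from the year, e.g. 1956 -> "1950s"
songs["Decade"] = (songs["Year"] // 10 * 10).astype(str) + "s"


def filter_songs(songs, decade=None, genre=None):
    """Keep rows matching the given decade and/or genre (hypothetical helper)."""
    mask = pd.Series(True, index=songs.index)
    if decade is not None:
        mask &= songs["Decade"] == decade
    if genre is not None:
        # compare genres case-insensitively
        mask &= songs["Genre"].str.lower() == genre.lower()
    return songs[mask]


print(filter_songs(songs, decade="1990s", genre="country"))
```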