# Lab | Web Scraping Single Page

### Instructions - Scraping popular songs

Your product will take a song as an input from the user and will output another song (the recommendation). In most cases, the recommended song will have to be similar to the inputted song, but the CTO thinks that if the song is on the top charts at the moment, the user will enjoy more a recommendation of a song that's also popular at the moment.

You have find data on the internet about currently popular songs. 

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

In [261]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
from random import randint
from time import sleep
import random

In [262]:
url = 'https://www.popvortex.com/music/charts/top-100-songs.php'

In [263]:
response = requests.get(url)
response.status_code # 200 status code means OK!

200

In [264]:
soup = BeautifulSoup(response.content, "html.parser")

In [265]:
soup.select('#chart-position-1 > div.chart-content.col-xs-12.col-sm-8 > p')

[<p class="title-artist"><cite class="title">Unholy</cite><em class="artist">Sam Smith &amp; Kim Petras</em></p>]

In [266]:
top100_songs = []
top100_artist = []
year = []

num_iter = len(soup.select(".title-artist"))
song = soup.select(".title")
artist = soup.select(".artist")

for i in range(num_iter):
    top100_songs.append(song[i].get_text())
    top100_artist.append(artist[i].get_text())
    year.append(2022)

#print(top100_songs)
#print(top100_artist)

In [267]:
top100_songs_df = pd.DataFrame({'title':top100_songs,
                                'artist':top100_songs,
                                'year':year})

In [268]:
top100_songs_df.head()

Unnamed: 0,title,artist,year
0,Unholy,Unholy,2022
1,I'm Good (Blue),I'm Good (Blue),2022
2,wait in the truck,wait in the truck,2022
3,Thank God,Thank God,2022
4,Everywhere,Everywhere,2022


In [269]:
#ask user for a song he/she likes
music = input("\nEnter your music? ") 

#get a random music form the top100
random_music = random.choice(top100_songs) 
    
if music in top100_songs_df['title'].values:
    print('Great choice. Here is another music from the Top 100: ',random_music)
else:
    print('Oh, bad luck! Try again tomorrow or listen to one of the musics from the Top 100: ', random_music)



Enter your music? Thank God
Great choice. Here is another music from the Top 100:  Build a Boat


# Lab | Web Scraping Multiple Pages

#### Expand the project

If you're done, you can try to expand the project on your own. Here are a few suggestions:

 - Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!

In [270]:
url = "https://playback.fm/charts/top-100-songs/2000"

In [271]:
response = requests.get(url)
response.status_code

200

In [272]:
soup = BeautifulSoup(response.content, "html.parser")

In [273]:
iterations = range(2000, 2022)
for i in iterations:
    print(i)


2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021


In [274]:
pages = []

iterations = range(2000, 2022)
[i for i in iterations]

for i in iterations:
    year= str(i)
    url = "https://playback.fm/charts/top-100-songs/" + year
    
    response = requests.get(url)

    # monitor the process by printing the status code
    #print("Status code: " + str(response.status_code))

    # store response into "pages" list
    pages.append(response)

    # respectful nap:
    wait_time = randint(1,2)
    sleep(wait_time)
    

In [275]:
new_song_list = []
new_artist_list = []
year = []


for i in range(len(pages)):
    # parse all pages
    soup = BeautifulSoup(pages[i].content, "html.parser")
    
    songs_catalogue = soup.select('#myTable')
    
    song = soup.select(".song a")
    artist = soup.select(".artist")
    
    
  
    
    for j in range(len(song)):
        new_song_list.append(song[j].get_text().replace('\n','')) 
        new_artist_list.append(artist[j].get_text().replace('\n',''))
        year.append(i)
            
#print(new_song_list)
#print(new_artist_list)
#print(year)


In [276]:
# Turn new list into a dataframe

top100_new_list_df = pd.DataFrame({'title':new_song_list,
                                  'artist':new_artist_list,
                                  'year':year})
top100_new_list_df.head()

Unnamed: 0,title,artist,year
0,Music,Madonna,0
1,Beautiful Day,U2,0
2,"Bye, Bye, Bye",N Sync,0
3,Stan,Eminem,0
4,Oops!... I Did it Again,Britney Spears,0


In [277]:
top100_new_list_df['year']=top100_new_list_df['year'].replace(to_replace = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
                                                              value = [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022])

In [278]:
full_list_df = pd.concat([top100_songs_df,top100_new_list_df], axis=0)

In [279]:
full_list_df.shape

(2298, 3)

In [280]:
full_list_df = full_list_df.drop_duplicates()

In [281]:
full_list_df.shape

(2296, 3)

In [283]:
full_list_df

Unnamed: 0,title,artist,year
0,Unholy,Unholy,2022
1,I'm Good (Blue),I'm Good (Blue),2022
2,wait in the truck,wait in the truck,2022
3,Thank God,Thank God,2022
4,Everywhere,Everywhere,2022
...,...,...,...
2193,Leave Before You Love Me,Marshmello & Jonas Brothers,2021
2194,Beggin,Maneskin,2021
2195,Famous Friends,Chris Young + Kane Brown,2021
2196,Lil Bit,Nelly & Florida Georgia Line,2021


 - Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.