## Web Scraping Lab 1:


### Prepare your project:

#### Business goal:

Make sure you've understood the big picture of your project: the goal of the company (Gnod), their current product (Gnoosic), their strategy, and how your project fits into this context. Re-read the business case and the e-mail from the CTO, take a look at the flowchart and create an initial Trello board with the tasks you think you'll have to accomplish.

#### Scraping popular songs:

Your product will take a song as an input from the user and will output another song (the recommendation). In most cases, the recommended song will have to be similar to the input song, but the CTO thinks that if the song is currently on the top charts, the user will prefer a recommendation of another song that is also popular right now.

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: https://www.billboard.com/charts/hot-100.

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

# Setup

In [3]:
# Import libraries
import requests # to download html code
from bs4 import BeautifulSoup # to navigate through the html code
import pandas as pd
import numpy as np
import re # for cleanup
import spotipy
from config import *
import json
from spotipy.oauth2 import SpotifyClientCredentials

In [4]:
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id= Client_ID, client_secret=Client_Secret))

In [5]:
# url used for top100
url = "https://www.billboard.com/charts/hot-100"

In [6]:
# Download html
response = requests.get(url)
# A 200 status code means the request succeeded
print(response.status_code)

200


In [7]:
# Parse html
soup = BeautifulSoup(response.text, 'html.parser')
# Check up
# soup

# Scraping

## Top 100

### Songname

In [8]:
# charts > div > div.chart-list__wrapper > div > ol > li:nth-child(1) > button > span.chart-element__information > span.chart-element__information__song.text--truncate.color--primary

In [9]:
songnametext = soup.select("span.chart-element__information__song")[0].get_text()
songnametext

'Montero (Call Me By Your Name)'

In [10]:
Songnames = []
for elem in soup.select("span.chart-element__information__song"):
    Songnames.append(elem.get_text())

print(Songnames)

['Montero (Call Me By Your Name)', 'Peaches', 'Leave The Door Open', 'Up', 'Drivers License', 'Save Your Tears', 'Levitating', 'Blinding Lights', 'Mood', 'What You Know Bout Love', 'Tombstone', 'Astronaut In The Ocean', "What's Next", '34+35', 'Go Crazy', 'Street Runner', 'Best Friend', 'Calling My Phone', 'Therefore I Am', 'Back In Blood', 'You Broke Me First.', 'Richer', "You're Mines Still", 'Beat Box', 'Positions', 'On Me', 'Dakiti', 'Heartbreak Anniversary', "My Ex's Best Friend", 'Dynamite', 'Without You', 'The Good Ones', 'Beautiful Mistakes', 'Wants And Needs', 'Streets', 'Anyone', 'For The Night', 'Whoopty', 'No More Parties', 'Good Days', 'Starting Over', "What's Your Country Song", 'Put Your Records On', 'Cry Baby', 'Long Live', 'Hold On', 'Track Star', "We're Good", 'Hard For The Next', 'Heat Waves', "You All Over Me (Taylor's Version) (From The Vault)", 'Lady', 'Telepatia', 'Forever After All', 'SoulFly', 'Goosebumps', 'My Head And My Heart', 'Time Today', 'Willow', 'Just 

### Artist

In [11]:
# charts > div > div.chart-list__wrapper > div > ol > li:nth-child(1) > button > span.chart-element__information > span.chart-element__information__artist.text--truncate.color--secondary

In [12]:
artisttext = soup.select("span.chart-element__information__artist")[0].get_text()
artisttext

'Lil Nas X'

In [13]:
Artists = []
for elem in soup.select("span.chart-element__information__artist"):
    Artists.append(elem.get_text())

print(Artists)

['Lil Nas X', 'Justin Bieber Featuring Daniel Caesar & Giveon', 'Silk Sonic (Bruno Mars & Anderson .Paak)', 'Cardi B', 'Olivia Rodrigo', 'The Weeknd', 'Dua Lipa Featuring DaBaby', 'The Weeknd', '24kGoldn Featuring iann dior', 'Pop Smoke', 'Rod Wave', 'Masked Wolf', 'Drake', 'Ariana Grande', 'Chris Brown & Young Thug', 'Rod Wave', 'Saweetie Featuring Doja Cat', 'Lil Tjay Featuring 6LACK', 'Billie Eilish', 'Pooh Shiesty Featuring Lil Durk', 'Tate McRae', 'Rod Wave Featuring Polo G', 'Yung Bleu Featuring Drake', 'SpotemGottem Featuring Pooh Shiesty Or DaBaby', 'Ariana Grande', 'Lil Baby', 'Bad Bunny & Jhay Cortez', 'Giveon', 'Machine Gun Kelly X blackbear', 'BTS', 'The Kid LAROI', 'Gabby Barrett', 'Maroon 5 Featuring Megan Thee Stallion', 'Drake Featuring Lil Baby', 'Doja Cat', 'Justin Bieber', 'Pop Smoke Featuring Lil Baby & DaBaby', 'CJ', 'Coi Leray Featuring Lil Durk', 'SZA', 'Chris Stapleton', 'Thomas Rhett', 'Ritt Momney', 'Megan Thee Stallion Featuring DaBaby', 'Florida Georgia Line
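Collecting titles and artists with two separate `select` calls assumes that both lists line up entry by entry. A minimal sketch of an alternative, assuming each chart entry keeps its title and artist inside one `span.chart-element__information` parent (as the selector paths in the comments above suggest): iterate over the parent and read both fields from it. The HTML snippet and the helper name `parse_chart` are made up for illustration; the real page markup may differ or change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on the selector paths above.
SAMPLE_HTML = """
<ol>
  <li><button><span class="chart-element__information">
    <span class="chart-element__information__song">Montero (Call Me By Your Name)</span>
    <span class="chart-element__information__artist">Lil Nas X</span>
  </span></button></li>
  <li><button><span class="chart-element__information">
    <span class="chart-element__information__song">Peaches</span>
    <span class="chart-element__information__artist">Justin Bieber Featuring Daniel Caesar &amp; Giveon</span>
  </span></button></li>
</ol>
"""


def parse_chart(html):
    """Return a list of (title, artist) tuples, one per chart entry."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for info in soup.select("span.chart-element__information"):
        song = info.select_one("span.chart-element__information__song")
        artist = info.select_one("span.chart-element__information__artist")
        if song is None or artist is None:
            continue  # skip incomplete entries instead of misaligning the lists
        entries.append((song.get_text(strip=True), artist.get_text(strip=True)))
    return entries


chart = parse_chart(SAMPLE_HTML)
print(chart)
```

The resulting tuples could be turned into a dataframe with `pd.DataFrame(chart, columns=['Songtitle', 'Artist(s)'])`.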

## Spotify

### Playlist 1

In [18]:
# https://open.spotify.com/playlist/1Ien12ACUL1Q2V2xIra3WO
playlist_1 = sp.user_playlist_tracks("spotify", "1Ien12ACUL1Q2V2xIra3WO")

In [19]:
type(playlist_1)

dict

In [20]:
print(list(playlist_1.keys())) # items and total songs
print("Total number of songs in the playlist: ",playlist_1["total"]) 
len(playlist_1["items"]) # It is limited to 100 tracks, we will have to fix it:

['href', 'items', 'limit', 'next', 'offset', 'previous', 'total']
Total number of songs in the playlist:  8845


100

In [128]:
def get_playlist_tracks(username, playlist_id):
    results = sp.user_playlist_tracks(username,playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks


In [126]:
playlist = get_playlist_tracks("spotify", "1Ien12ACUL1Q2V2xIra3WO")


In [127]:
len(playlist)

8845
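The loop in `get_playlist_tracks` follows the `next` link of each page until it is empty. A minimal sketch of that pattern, using a made-up `FakePagedClient` in place of the Spotify client so that it runs without credentials; the class, its page size, and `collect_all` are illustrative assumptions, not spotipy's API.

```python
class FakePagedClient:
    """Hypothetical stand-in for a paged API: serves `items` in fixed-size pages."""

    def __init__(self, items, page_size):
        self.items = items
        self.page_size = page_size

    def first_page(self):
        return self._page(0)

    def next(self, page):
        # Mirrors the idea of following the 'next' link of the current page.
        return self._page(page["next"])

    def _page(self, offset):
        end = offset + self.page_size
        return {
            "items": self.items[offset:end],
            "total": len(self.items),
            # 'next' is None on the last page, which ends the loop below.
            "next": end if end < len(self.items) else None,
        }


def collect_all(client):
    """Gather the items of every page, following 'next' until it is None."""
    page = client.first_page()
    collected = list(page["items"])
    while page["next"] is not None:
        page = client.next(page)
        collected.extend(page["items"])
    return collected


client = FakePagedClient(items=list(range(250)), page_size=100)
all_items = collect_all(client)
print(len(all_items))
```

Comparing the length of the collected list with the reported `total`, as the notebook does, is a cheap check that no page was skipped.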

In [129]:
playlist

[{'added_at': '2021-01-23T10:04:33Z',
  'added_by': {'external_urls': {'spotify': 'https://open.spotify.com/user/1237826676'},
   'href': 'https://api.spotify.com/v1/users/1237826676',
   'id': '1237826676',
   'type': 'user',
   'uri': 'spotify:user:1237826676'},
  'is_local': False,
  'primary_color': None,
  'track': {'album': {'album_type': 'album',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/76Plkuk64KSXfG04kwxNZk'},
      'href': 'https://api.spotify.com/v1/artists/76Plkuk64KSXfG04kwxNZk',
      'id': '76Plkuk64KSXfG04kwxNZk',
      'name': 'Koi',
      'type': 'artist',
      'uri': 'spotify:artist:76Plkuk64KSXfG04kwxNZk'}],
    'available_markets': ['AD',
     'AE',
     'AG',
     'AL',
     'AM',
     'AO',
     'AR',
     'AT',
     'AU',
     'AZ',
     'BA',
     'BB',
     'BD',
     'BE',
     'BF',
     'BG',
     'BH',
     'BI',
     'BJ',
     'BN',
     'BO',
     'BR',
     'BS',
     'BT',
     'BW',
     'BY',
     'BZ',
     'C

In [132]:
playlist[0]['track']['name'] #songname

'beverly'

In [135]:
for artist in playlist[0]['track']['artists']: #artists name (can be more than 1)
    print(artist['name'])

Koi


In [136]:
playlist[0]['track']['uri'] #uri for audio features

'spotify:track:2pTW1GkqQ6dPTBOff2ZqQc'

In [153]:
song_names = []
song_uri = []
artist_names = []
audio_features = []
counter = 0
song_uri_total = []

for item in playlist:
    if not item["is_local"]:   # skip local files: they are not part of the Spotify catalog
        counter = counter + 1

        song_names.append(item['track']['name'])
        song_uri_total.append(item['track']['uri'])
        song_uri.append(item['track']['uri'])

        tempo_artists = []
        for artist in item['track']['artists']:
            tempo_artists.append(artist['name'])

        artist_names.append(tempo_artists)

        # Request audio features in batches of 100 tracks
        if counter == 100:
            audio_features.append(sp.audio_features(song_uri))
            counter = 0
            song_uri = []

# Request the features of the last, incomplete batch (if any tracks are left)
if song_uri:
    audio_features.append(sp.audio_features(song_uri))
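The counter logic above sends the URIs to `sp.audio_features` in groups of 100. A minimal sketch of the same batching as a standalone helper; `batched` and the fake URIs are illustrative names, and the batch size of 100 is taken from the loop above.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical URIs, only to show the resulting batch sizes.
uris = [f"spotify:track:{n:04d}" for n in range(250)]
batch_sizes = [len(batch) for batch in batched(uris, 100)]
print(batch_sizes)  # [100, 100, 50]
```

With the real client, each batch would be passed to `sp.audio_features(batch)` and the results collected into one list. An empty input yields no batches, so no request is sent.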

In [155]:
audio_features_total = [subitem for item in audio_features for subitem in item]

In [158]:
len(song_names)

8844

In [156]:
len(audio_features_total)

8844

In [161]:
audi_features_df = pd.DataFrame(audio_features_total)
audi_features_df.head()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,0.735,0.491,0,-7.766,0,0.0427,0.157,6e-05,0.121,0.574,167.13,audio_features,2pTW1GkqQ6dPTBOff2ZqQc,spotify:track:2pTW1GkqQ6dPTBOff2ZqQc,https://api.spotify.com/v1/tracks/2pTW1GkqQ6dP...,https://api.spotify.com/v1/audio-analysis/2pTW...,103473,4
1,0.602,0.796,0,-3.657,0,0.103,0.0682,0.12,0.15,0.265,126.06,audio_features,47Z5890IcjSed81ldeLgqc,spotify:track:47Z5890IcjSed81ldeLgqc,https://api.spotify.com/v1/tracks/47Z5890IcjSe...,https://api.spotify.com/v1/audio-analysis/47Z5...,245053,4
2,0.8,0.585,10,-7.343,1,0.0924,0.264,0.0,0.153,0.779,126.058,audio_features,6PGoSes0D9eUDeeAafB2As,spotify:track:6PGoSes0D9eUDeeAafB2As,https://api.spotify.com/v1/tracks/6PGoSes0D9eU...,https://api.spotify.com/v1/audio-analysis/6PGo...,213400,4
3,0.747,0.592,8,-6.334,1,0.0457,0.00517,1.5e-05,0.124,0.176,110.991,audio_features,3sTCfUmYXSVWDacTd6uMbQ,spotify:track:3sTCfUmYXSVWDacTd6uMbQ,https://api.spotify.com/v1/tracks/3sTCfUmYXSVW...,https://api.spotify.com/v1/audio-analysis/3sTC...,212027,3
4,0.341,0.193,2,-16.915,1,0.0364,0.848,0.923,0.131,0.0749,104.448,audio_features,2Twe7p278J2GxzjQZWJZWM,spotify:track:2Twe7p278J2GxzjQZWJZWM,https://api.spotify.com/v1/tracks/2Twe7p278J2G...,https://api.spotify.com/v1/audio-analysis/2Twe...,164107,4


In [162]:
audi_features_df = audi_features_df.drop(columns=['type', 'id', 'track_href', 'analysis_url', 'duration_ms', 'time_signature'])

In [163]:
audi_features_df.head()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,uri
0,0.735,0.491,0,-7.766,0,0.0427,0.157,6e-05,0.121,0.574,167.13,spotify:track:2pTW1GkqQ6dPTBOff2ZqQc
1,0.602,0.796,0,-3.657,0,0.103,0.0682,0.12,0.15,0.265,126.06,spotify:track:47Z5890IcjSed81ldeLgqc
2,0.8,0.585,10,-7.343,1,0.0924,0.264,0.0,0.153,0.779,126.058,spotify:track:6PGoSes0D9eUDeeAafB2As
3,0.747,0.592,8,-6.334,1,0.0457,0.00517,1.5e-05,0.124,0.176,110.991,spotify:track:3sTCfUmYXSVWDacTd6uMbQ
4,0.341,0.193,2,-16.915,1,0.0364,0.848,0.923,0.131,0.0749,104.448,spotify:track:2Twe7p278J2GxzjQZWJZWM


In [167]:
spotify_names = pd.DataFrame({'Songtitle': song_names, 'Artist(s)': artist_names, 'uri': song_uri_total})

In [168]:
spotify_names

Unnamed: 0,Songtitle,Artist(s),uri
0,beverly,[Koi],spotify:track:2pTW1GkqQ6dPTBOff2ZqQc
1,Titanium (feat. Sia),"[David Guetta, Sia]",spotify:track:47Z5890IcjSed81ldeLgqc
2,LOVE. FEAT. ZACARI.,"[Kendrick Lamar, Zacari]",spotify:track:6PGoSes0D9eUDeeAafB2As
3,do re mi,[blackbear],spotify:track:3sTCfUmYXSVWDacTd6uMbQ
4,Lord Of The Rings: The Fellowship Of The Ring ...,"[Raine, The City of Prague Philharmonic Orches...",spotify:track:2Twe7p278J2GxzjQZWJZWM
...,...,...,...
8839,Killer,[Kingmichaelbeats],spotify:track:0F9RJPp8LCfpxM7rS0pZVk
8840,Disco Man,[Remi Wolf],spotify:track:0T7aTl1t15HKHfwep4nANV
8841,"Chillin Vibe (Feat. Chaboom, kitsyojii, Wonjja...","[YoBoy, Chaboom, kitsyojii, 원쨩, Leebido, 오아이, ...",spotify:track:6wXNvp1ZAm13IHAsU0AxsH
8842,Run It Up (feat. Offset & Moneybagg Yo),"[Lil Tjay, Offset, Moneybagg Yo]",spotify:track:5pmvv42h1bcK3frE8CEblt


# Dataframe

## Top 100

In [14]:
Top_100 = pd.DataFrame({'Songtitle': Songnames, 'Artist(s)': Artists})
Top_100

Unnamed: 0,Songtitle,Artist(s)
0,Montero (Call Me By Your Name),Lil Nas X
1,Peaches,Justin Bieber Featuring Daniel Caesar & Giveon
2,Leave The Door Open,Silk Sonic (Bruno Mars & Anderson .Paak)
3,Up,Cardi B
4,Drivers License,Olivia Rodrigo
...,...,...
95,Shock Da World,Rod Wave
96,You Got It,VEDO
97,Sneaky Links,Rod Wave
98,Nobody,Dylan Scott


## Spotify

In [164]:
audi_features_df.head()

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,uri
0,0.735,0.491,0,-7.766,0,0.0427,0.157,6e-05,0.121,0.574,167.13,spotify:track:2pTW1GkqQ6dPTBOff2ZqQc
1,0.602,0.796,0,-3.657,0,0.103,0.0682,0.12,0.15,0.265,126.06,spotify:track:47Z5890IcjSed81ldeLgqc
2,0.8,0.585,10,-7.343,1,0.0924,0.264,0.0,0.153,0.779,126.058,spotify:track:6PGoSes0D9eUDeeAafB2As
3,0.747,0.592,8,-6.334,1,0.0457,0.00517,1.5e-05,0.124,0.176,110.991,spotify:track:3sTCfUmYXSVWDacTd6uMbQ
4,0.341,0.193,2,-16.915,1,0.0364,0.848,0.923,0.131,0.0749,104.448,spotify:track:2Twe7p278J2GxzjQZWJZWM


In [169]:
spotify_names.head()

Unnamed: 0,Songtitle,Artist(s),uri
0,beverly,[Koi],spotify:track:2pTW1GkqQ6dPTBOff2ZqQc
1,Titanium (feat. Sia),"[David Guetta, Sia]",spotify:track:47Z5890IcjSed81ldeLgqc
2,LOVE. FEAT. ZACARI.,"[Kendrick Lamar, Zacari]",spotify:track:6PGoSes0D9eUDeeAafB2As
3,do re mi,[blackbear],spotify:track:3sTCfUmYXSVWDacTd6uMbQ
4,Lord Of The Rings: The Fellowship Of The Ring ...,"[Raine, The City of Prague Philharmonic Orches...",spotify:track:2Twe7p278J2GxzjQZWJZWM


In [170]:
Spotify_df = pd.merge(spotify_names, audi_features_df, how="inner", on="uri")

In [171]:
Spotify_df.head()

Unnamed: 0,Songtitle,Artist(s),uri,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,beverly,[Koi],spotify:track:2pTW1GkqQ6dPTBOff2ZqQc,0.735,0.491,0,-7.766,0,0.0427,0.157,6e-05,0.121,0.574,167.13
1,Titanium (feat. Sia),"[David Guetta, Sia]",spotify:track:47Z5890IcjSed81ldeLgqc,0.602,0.796,0,-3.657,0,0.103,0.0682,0.12,0.15,0.265,126.06
2,LOVE. FEAT. ZACARI.,"[Kendrick Lamar, Zacari]",spotify:track:6PGoSes0D9eUDeeAafB2As,0.8,0.585,10,-7.343,1,0.0924,0.264,0.0,0.153,0.779,126.058
3,do re mi,[blackbear],spotify:track:3sTCfUmYXSVWDacTd6uMbQ,0.747,0.592,8,-6.334,1,0.0457,0.00517,1.5e-05,0.124,0.176,110.991
4,Lord Of The Rings: The Fellowship Of The Ring ...,"[Raine, The City of Prague Philharmonic Orches...",spotify:track:2Twe7p278J2GxzjQZWJZWM,0.341,0.193,2,-16.915,1,0.0364,0.848,0.923,0.131,0.0749,104.448
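An inner merge on `uri` pairs every left row with every right row that shares the key, so a track that appears twice in the playlist would yield four rows after the merge. A minimal sketch with made-up data, assuming duplicates should be collapsed to one row per `uri` before merging; `validate="one_to_one"` makes pandas raise an error if a key is still repeated on either side.

```python
import pandas as pd

# Made-up rows: track "a" appears twice, as a duplicated playlist entry would.
names = pd.DataFrame({"Songtitle": ["x", "x", "y"], "uri": ["a", "a", "b"]})
features = pd.DataFrame({"uri": ["a", "a", "b"], "tempo": [120.0, 120.0, 90.0]})

# Without deduplication the duplicated key multiplies: 2 * 2 + 1 = 5 rows.
naive = pd.merge(names, features, how="inner", on="uri")

# Keep one row per uri on both sides, then merge with a one-to-one check.
deduped = pd.merge(
    names.drop_duplicates(subset="uri"),
    features.drop_duplicates(subset="uri"),
    how="inner",
    on="uri",
    validate="one_to_one",
)
print(len(naive), len(deduped))
```

Whether the playlist actually contains repeated tracks is not shown here; comparing `len(Spotify_df)` with `len(spotify_names)` would reveal it.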


In [172]:
Spotify_df.to_csv(r'../Gnoosic/Spotify.csv', sep=',', index=False)

# Prototype 1

Pseudocode:
User inputs a song (case-insensitive)
Is the song in the Top 100?
    NO: Prototype 2
    YES: Recommend another song from the Top 100 list
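The pseudocode above can be sketched as a pure function that does not depend on global variables. This is an illustration under assumptions: `recommend_from_top`, the small sample chart, and the choice to exclude the input song from the candidates are not part of the original prototype.

```python
import random

# Small sample of (title, artist) pairs taken from the scraped chart above.
SAMPLE_CHART = [
    ("Montero (Call Me By Your Name)", "Lil Nas X"),
    ("Up", "Cardi B"),
    ("Drivers License", "Olivia Rodrigo"),
]


def recommend_from_top(song_input, chart, rng=random):
    """Return a different (title, artist) from `chart` if the input is on it, else None."""
    wanted = song_input.strip().casefold()  # case-insensitive comparison
    titles = [title.casefold() for title, _ in chart]
    if wanted not in titles:
        return None  # not a chart song: hand over to Prototype 2
    candidates = [entry for entry in chart if entry[0].casefold() != wanted]
    return rng.choice(candidates) if candidates else None


pick = recommend_from_top("  UP ", SAMPLE_CHART, rng=random.Random(0))
print(pick)
```

Returning `None` for songs outside the chart leaves the caller free to decide what Prototype 2 should do.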

In [15]:
# Songinput (not lower/uppercase sensitive)
# Songinput = input("To teach Gnod what you are like, please type in 1 Song that you already know and like: ").lower()

In [16]:
# Functions

# Check if it is in the top 100
# Top_100.loc[Top_100['Songtitle'].str.lower() == Songinput]

# Song recommendation: pick a random row from the Top 100

def song_recommendation():
    Recommendation = Top_100.sample()
    print("Songrecommendation: " + Recommendation.iloc[0,0] + " by " + Recommendation.iloc[0,1])


# Recommendation: relies on the global Songinput set in the next cell

def Recommendation():
    Match = (Top_100.loc[Top_100['Songtitle'].str.lower() == Songinput])
    if Match.shape[0] >= 1:   # at least one chart entry has this title
        song_recommendation()
    else:
        print("Prototype 2 filler")

# How Match works:
# Match holds the rows of Top_100 whose lowercased title equals Songinput
# Match.shape[0] is the number of matching rows, so 0 means the song is not in the Top 100

In [17]:
Songinput  = input("To teach Gnod what you are like, please type in 1 Song that you already know and like: ").lower()
Recommendation()

To teach Gnod what you are like, please type in 1 Song that you already know and like: up
Songrecommendation: Beat Box by SpotemGottem Featuring Pooh Shiesty Or DaBaby


# Bonus

Can you find other websites with lists of "hot" songs? What about songs that were popular in a certain decade?

You can scrape more lists and add extra features to the project.