## Case Study: The recommendation site "Gnod"

Antonio Montilla

### Lab | Web Scraping Single Page (GNOD part 1)
- Web scraping the top 100 songs from https://www.popvortex.com/music/charts/top-100-songs.php

In [2]:
#importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
import random

In [3]:
#copying url
url = "https://www.popvortex.com/music/charts/top-100-songs.php"

In [4]:
#downloading html
response = requests.get(url)
response.status_code # 200 status code means OK!

200

In [5]:
#parsing the HTML with BeautifulSoup
soup = BeautifulSoup(response.content, "html.parser")

In [6]:
#checking the parsed HTML (uncomment to print it)
#soup

In [7]:
#selecting the top songs body > div.container > div:nth-child(4) > div.col-xs-12.col-md-8 > div.chart-wrapper
#print(soup.select("body > div.container > div:nth-child(4) > div.col-xs-12.col-md-8 > div.chart-wrapper > div" ))
len(soup.select("body > div.container > div:nth-child(4) > div.col-xs-12.col-md-8 > div.chart-wrapper > div" ))

105

In [8]:
#looking at the first item
soup.select("body > div.container > div:nth-child(4) > div.col-xs-12.col-md-8 > div.chart-wrapper > div" )[0]

<div class="feed-item music-chart flex-row" id="chart-position-1"><div class="cover-art col-xs-12 col-sm-4"><p class="chart-position">1</p><img alt="Lovin On Me - Jack Harlow Cover Art" class="cover-image" data-pin-description="Lovin On Me - Jack Harlow" data-pin-media="https://is1-ssl.mzstatic.com/image/thumb/Music116/v4/c1/95/0e/c1950e0a-3fc8-83c2-8913-a14dd33bfe62/075679664990.jpg/1200x1200bb.png" data-pin-url="https://www.popvortex.com/music/charts/top-100-songs.php" height="170" loading="lazy" src="https://is1-ssl.mzstatic.com/image/thumb/Music116/v4/c1/95/0e/c1950e0a-3fc8-83c2-8913-a14dd33bfe62/075679664990.jpg/170x170bb.png" width="170"/> <audio controls="" controlslist="nodownload" preload="none"><source src="https://audio-ssl.itunes.apple.com/itunes-assets/AudioPreview116/v4/ef/09/34/ef09346f-91c8-bc8d-7a40-b94d871bcf98/mzaf_11177645511189211683.plus.aac.p.m4a"/></audio> </div><div class="chart-content col-xs-12 col-sm-8"><p class="title-artist"><cite class="title">Lovin On Me

In [9]:
artists = []
songs = []
target_classes = ['feed-item', 'music-chart', 'flex-row', 'new-release'] #a list matches divs with ANY of these classes; using only 'flex-row' excluded some songs
chart_entries = soup.find_all('div', class_=target_classes)
for entry in chart_entries:
    title_artist = entry.find('p', class_='title-artist')
    if title_artist:
        title = title_artist.find('cite', class_='title').text.strip()
        artist = title_artist.find('em', class_='artist').text.strip()
        artists.append(artist)
        songs.append(title)

hot_songs = pd.DataFrame({"artist":artists, "song":songs})
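The `class_` list in the cell above works because Beautiful Soup treats a list of classes as "match any of these". A minimal sketch on a hypothetical, trimmed-down snippet modelled on the chart item printed earlier (the `SAMPLE_HTML` markup and the `parse_chart` helper are illustrative, not taken from the page). It also skips entries whose title or artist tag is missing, instead of raising `AttributeError` on `None.text`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the printed chart item.
# The second entry has no 'flex-row' class, so searching for 'flex-row' alone would miss it.
SAMPLE_HTML = """
<div class="feed-item music-chart flex-row" id="chart-position-1">
  <p class="title-artist"><cite class="title">Lovin On Me</cite>
  <em class="artist">Jack Harlow</em></p>
</div>
<div class="music-chart new-release" id="chart-position-2">
  <p class="title-artist"><cite class="title">Lil Boo Thang</cite>
  <em class="artist">Paul Russell</em></p>
</div>
<div class="ad-slot"><p>Advertisement</p></div>
"""

def parse_chart(html):
    """Return a list of (artist, song) tuples from chart-item divs."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # A list for class_ matches a div that carries ANY of these classes.
    for entry in soup.find_all("div", class_=["feed-item", "music-chart", "flex-row", "new-release"]):
        title_artist = entry.find("p", class_="title-artist")
        if title_artist is None:
            continue
        title = title_artist.find("cite", class_="title")
        artist = title_artist.find("em", class_="artist")
        # Skip malformed entries rather than failing on a missing tag.
        if title is None or artist is None:
            continue
        rows.append((artist.get_text(strip=True), title.get_text(strip=True)))
    return rows

print(parse_chart(SAMPLE_HTML))  # [('Jack Harlow', 'Lovin On Me'), ('Paul Russell', 'Lil Boo Thang')]
```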

In [10]:
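#checking the dataframe
print(len(hot_songs))
#note: drop_duplicates() returns a new DataFrame; without assignment this line has no effect (hence 101 both times)
hot_songs.drop_duplicates()
print(len(hot_songs))
#deleting the first row, which was repeated:
hot_songs.drop(index=hot_songs.index[0], axis=0, inplace=True)
#note: reset_index() also returns a copy; it is not assigned here, so the index below still starts at 1
hot_songs.reset_index()
hot_songs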

101
101


| | artist | song |
|---|---|---|
| 1 | Jack Harlow | Lovin On Me |
| 2 | Paul Russell | Lil Boo Thang |
| 3 | Zach Bryan | I Remember Everything (feat. Kacey Musgraves) |
| 4 | Chris Stapleton | White Horse |
| 5 | Jelly Roll | Need A Favor |
| ... | ... | ... |
| 96 | Wham! | Last Christmas (Single Version) |
| 97 | HARDY | TRUCK BED |
| 98 | David Kushner | Daylight |
| 99 | Bing Crosby | White Christmas |
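
The counts above stay at 101 because `drop_duplicates()` and `reset_index()` return new objects rather than modifying `hot_songs` in place. A minimal sketch of the assigned version on toy data, assuming the repeated first row is an exact duplicate (the rows below are illustrative, not the full scrape):

```python
import pandas as pd

# Toy data standing in for the scraped chart; the first row is repeated.
hot_songs = pd.DataFrame({
    "artist": ["Jack Harlow", "Jack Harlow", "Paul Russell", "Zach Bryan"],
    "song": ["Lovin On Me", "Lovin On Me", "Lil Boo Thang",
             "I Remember Everything (feat. Kacey Musgraves)"],
})

# Both methods return new objects, so assign the result back.
# drop=True discards the old index instead of adding it as a column.
hot_songs = hot_songs.drop_duplicates().reset_index(drop=True)

print(len(hot_songs))             # 3
print(hot_songs.index.tolist())   # [0, 1, 2]
```

With exact duplicates, `drop_duplicates()` keeps the first occurrence by default, so the manual `drop` of the first row would no longer be needed.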


In [11]:
hot_songs.to_csv('hot_songs', index = False)
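Writing with `index=False` keeps the pandas index out of the file; without it, reading the CSV back adds an extra `Unnamed: 0` column. A small round-trip check, sketched with in-memory buffers (`io.StringIO`) instead of a real file:

```python
import io
import pandas as pd

hot_songs = pd.DataFrame({
    "artist": ["Jack Harlow", "Paul Russell"],
    "song": ["Lovin On Me", "Lil Boo Thang"],
})

# Default to_csv writes the index as an unnamed first column.
with_index = io.StringIO()
hot_songs.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)
print(reloaded_with_index.columns.tolist())  # ['Unnamed: 0', 'artist', 'song']

# With index=False only the data columns survive the round trip.
without_index = io.StringIO()
hot_songs.to_csv(without_index, index=False)
without_index.seek(0)
restored = pd.read_csv(without_index)
print(restored.columns.tolist())  # ['artist', 'song']
```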

### Extending the hot_songs df with public Billboard data

For the second lab, I want to extend the hot_songs dataframe with the top 100 songs from Billboard's greatest Hot 100 singles chart, published here: https://www.billboard.com/charts/greatest-hot-100-singles/

In [12]:
#copying url
url = "https://www.billboard.com/charts/greatest-hot-100-singles/"

In [13]:
#downloading html
response = requests.get(url)
response.status_code # 200 status code means OK!

200

In [14]:
#parsing the HTML with BeautifulSoup
soup = BeautifulSoup(response.content, "html.parser")

In [15]:
#checking the parsed HTML (uncomment to print it)
#soup

In [16]:
#selecting the top songs #post-6760926 > div.pmc-paywall > div
#soup.select("#post-6760926 > div.pmc-paywall > div > div")

In [17]:
#now creating the dataframe
artists = []
songs = []
chart_rows = soup.select('.o-chart-results-list-row')
for row in chart_rows:
    title_element = row.select_one('.c-title')
    label_element = row.select_one('.c-label')
    if title_element and label_element:
        title = title_element.text.strip()
        #the first .c-label in a row is not the artist, so take the next span with that class
        artist = label_element.find_next('span', class_='c-label').text.strip()
        songs.append(title)
        artists.append(artist)
    
hot_songs2 = pd.DataFrame({"artist":artists, "song":songs})
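The cell above takes the artist from the `span.c-label` that follows the first `.c-label` in each row. A sketch of the same logic as a reusable helper, run on a hypothetical snippet: the `SAMPLE_HTML` layout (rank label, title, artist label) is an assumption for illustration, not Billboard's actual markup, and `parse_billboard_rows` is a hypothetical name. One caveat: `find_next` is not limited to the current row, so on a malformed row it could pick up a label from the following row.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names come from the cell above.
SAMPLE_HTML = """
<div class="o-chart-results-list-row">
  <span class="c-label">1</span>
  <h3 class="c-title">Blinding Lights</h3>
  <span class="c-label">The Weeknd</span>
</div>
<div class="o-chart-results-list-row">
  <span class="c-label">2</span>
</div>
"""

def parse_billboard_rows(html):
    """Return (artists, songs) lists parsed from chart rows."""
    soup = BeautifulSoup(html, "html.parser")
    artists, songs = [], []
    for row in soup.select(".o-chart-results-list-row"):
        title_element = row.select_one(".c-title")
        first_label = row.select_one(".c-label")
        # Skip rows that lack a title or any label (like the second sample row).
        if title_element is None or first_label is None:
            continue
        # In this assumed layout the first label is the rank,
        # so the artist is the next span with the same class.
        artist_element = first_label.find_next("span", class_="c-label")
        if artist_element is None:
            continue
        songs.append(title_element.get_text(strip=True))
        artists.append(artist_element.get_text(strip=True))
    return artists, songs

print(parse_billboard_rows(SAMPLE_HTML))  # (['The Weeknd'], ['Blinding Lights'])
```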

In [18]:
hot_songs2

| | artist | song |
|---|---|---|
| 0 | The Weeknd | Blinding Lights |
| 1 | Chubby Checker | The Twist |
| 2 | Santana Featuring Rob Thomas | Smooth |
| 3 | Bobby Darin | Mack The Knife |
| 4 | Mark Ronson Featuring Bruno Mars | Uptown Funk! |
| ... | ... | ... |
| 95 | Donna Summer | Hot Stuff |
| 96 | Post Malone Featuring 21 Savage | Rockstar |
| 97 | Coolio Featuring L.V. | Gangsta's Paradise |
| 98 | The Steve Miller Band | Abracadabra |


In [19]:
#appending the two df (DataFrame.append was deprecated and then removed in pandas 2.0, so use pd.concat)
hot_songs_all = pd.concat([hot_songs, hot_songs2], ignore_index=True)
hot_songs_all


| | artist | song |
|---|---|---|
| 0 | Jack Harlow | Lovin On Me |
| 1 | Paul Russell | Lil Boo Thang |
| 2 | Zach Bryan | I Remember Everything (feat. Kacey Musgraves) |
| 3 | Chris Stapleton | White Horse |
| 4 | Jelly Roll | Need A Favor |
| ... | ... | ... |
| 195 | Donna Summer | Hot Stuff |
| 196 | Post Malone Featuring 21 Savage | Rockstar |
| 197 | Coolio Featuring L.V. | Gangsta's Paradise |
| 198 | The Steve Miller Band | Abracadabra |
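
`pd.concat` with `ignore_index=True` renumbers the combined rows from 0. Since a song could in principle appear on both charts, it may also be worth dropping duplicates after combining. A sketch on toy frames (the overlapping "Lovin On Me" row in `hot_songs2` is invented for illustration):

```python
import pandas as pd

hot_songs = pd.DataFrame({"artist": ["Jack Harlow", "Paul Russell"],
                          "song": ["Lovin On Me", "Lil Boo Thang"]})
hot_songs2 = pd.DataFrame({"artist": ["The Weeknd", "Jack Harlow"],
                           "song": ["Blinding Lights", "Lovin On Me"]})

# Stack the two charts and renumber the index 0..n-1.
hot_songs_all = pd.concat([hot_songs, hot_songs2], ignore_index=True)

# Keep only the first occurrence of each (artist, song) pair, then renumber again.
hot_songs_all = (hot_songs_all
                 .drop_duplicates(subset=["artist", "song"])
                 .reset_index(drop=True))

print(hot_songs_all["song"].tolist())  # ['Lovin On Me', 'Lil Boo Thang', 'Blinding Lights']
```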


## Part 2: song recommendation from hot_songs_all


Build a function that:
1) takes a song title as input,
2) checks whether it is in the hot_songs_all df,
3) if so, recommends another random song from that list,
4) otherwise, returns a message saying there is no recommendation
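The four steps above map onto a small pure-Python function. A sketch over a plain list of titles; `pick_another_song` is a hypothetical name, it returns `None` instead of a message, and it reads "another" as "a title different from the one entered". Passing a seeded `random.Random` makes the choice reproducible:

```python
import random

def pick_another_song(song_name, titles, rng=random):
    """Return a random title other than song_name, or None when song_name
    is not in titles (the comparison ignores case and outer whitespace)."""
    wanted = song_name.strip().casefold()
    # Step 2: is the requested song in the list?
    if wanted not in {title.casefold() for title in titles}:
        return None  # Step 4: no recommendation
    # Step 3: choose among the remaining titles.
    candidates = [title for title in titles if title.casefold() != wanted]
    return rng.choice(candidates) if candidates else None

titles = ["Lovin On Me", "Blinding Lights", "White Christmas"]
print(pick_another_song("blinding lights", titles, random.Random(0)))
print(pick_another_song("Fulana", titles))  # None
```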

In [21]:
#trying the random selection 
random.choice(hot_songs_all['song']) 

'White Christmas'
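`random.choice` works on `hot_songs_all['song']` because `ignore_index=True` gave it a 0-based index. `random.choice` picks an integer position from 0 to len-1 and indexes the sequence with it; on a Series with a different integer index (such as `hot_songs`, whose index starts at 1), pandas reads that integer as a label, so position 0 could raise a `KeyError`. A sketch of two safer options on a toy Series:

```python
import random
import pandas as pd

# A Series whose integer index starts at 1, like hot_songs after its first row was dropped.
songs = pd.Series(["Lovin On Me", "Lil Boo Thang", "White Horse"], index=[1, 2, 3])

# Option 1: convert to a plain list, so indexing is always positional.
pick = random.choice(songs.tolist())

# Option 2: pandas-native sampling; random_state makes the draw reproducible.
pick_pd = songs.sample(n=1, random_state=42).iloc[0]

print(pick, "|", pick_pd)
```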

In [23]:
#trying the tolist() function
hot_songs_all['song'].str.lower().tolist()

['lovin on me',
 'lil boo thang',
 'i remember everything (feat. kacey musgraves)',
 'white horse',
 'need a favor',
 'save me (with lainey wilson)',
 'around me the cold night',
 '90s rap mashup',
 'next winter i will wait for you forever',
 'cruel summer',
 'standing next to you',
 'thinkin’ bout me',
 'all i want for christmas is you',
 '3d',
 'even though i know you',
 'christmas / sarajevo 12/24 (instrumental)',
 'greedy',
 'sin so sweet',
 'fast car',
 'in the sky a white cloud drifted lazily',
 'now and then',
 'lose control',
 'standing next to you (band version)',
 'houdini',
 'water',
 'standing next to you (future funk remix)',
 'standing next to you (slow jam remix)',
 'standing next to you (holiday remix)',
 'standing next to you (instrumental)',
 "rockin' around the christmas tree (single)",
 'standing next to you (latin trap remix)',
 'standing next to you (pbr&b remix)',
 'last night',
 "now i know i know i've lost you",
 'where the wild things are',
 'paint the town re

In [26]:
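#combining these two steps into a function
def recommend_song(song_name, df):
    lower_case_song = song_name.strip().lower()
    song_titles = df['song'].tolist()
    if lower_case_song in [title.lower() for title in song_titles]:
        #recommend "another" song: exclude the one the user typed
        candidates = [title for title in song_titles if title.lower() != lower_case_song]
        recommended_song = random.choice(candidates)
        return f"We recommend you to listen '{recommended_song}' as well:)"
    else:
        return "Sorry, there is no recommendation at this stage"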

In [27]:
recommend_song('Blinding Lights', hot_songs_all)

"We recommend you to listen 'Save Me (with Lainey Wilson)' as well:)"

In [28]:
recommend_song('Fulana', hot_songs_all)

'Sorry, there is no recommendation at this stage'
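Lower-casing alone will not match every title: the chart contains typographic apostrophes (e.g. `thinkin’ bout me` in the list above), which a user typing a plain `'` would miss. A sketch of a hypothetical `normalise_title` helper that could be applied to both the input and the stored titles; `casefold()` is a slightly more aggressive `lower()` intended for caseless matching:

```python
def normalise_title(title):
    """Trim, case-fold, and turn curly apostrophes (U+2019) into straight ones."""
    return title.replace("\u2019", "'").strip().casefold()

titles = ["Thinkin\u2019 Bout Me", "Blinding Lights"]
lookup = {normalise_title(t) for t in titles}

print(normalise_title("thinkin' bout me") in lookup)       # True
print("thinkin' bout me" in {t.lower() for t in titles})   # False
```

Building `lookup` once as a set also avoids rebuilding a list of lower-cased titles on every call.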