In [1]:
'''Dear xxxxxxxx,

We are thrilled to welcome you as a Data Analyst for Gnoosic!

As you know, we are trying to come up with ways to enhance our music recommendations. 
One of the new features we'd like to research is to recommend songs (not only bands). 
We're also aware of the limitations of our collaborative filtering algorithms, 
and would like to give users two new possibilities when searching for recommendations:

- Songs that are actually similar to the ones they picked from an acoustic point of view.
- Songs that are popular around the world right now, independently from their tastes.

Coming up with the perfect song recommender will take us months - no need to stress out too much. 
In this first week, we want you to explore new data sources for songs. 
The internet is full of information and our first step is to acquire it and do an initial exploration. 
Feel free to use APIs or directly scrape the web to collect as much information as possible from popular songs. 
Eventually, we'll need to collect data from millions of songs, but we can start with a few hundred or a few thousand 
from each source and see if the collected features are useful. 

Once the data is collected, we want you to create clusters of songs that are similar to each other. 
The idea is that if a user inputs a song from one group, we'll prioritize giving them recommendations 
of songs from that same group.

On Friday, you will present your work to me and Marek, the CEO and founder. 
Full disclosure: I need you to be very convincing about this whole song-recommender, 
as this has been my personal push and the main reason we hired you!

Be open-minded about this process: we are agile, and that means that we define our products and features 
on-the-go, while exploring the tools and the data that's available to us. We'd love you to provide your 
own vision of the product and the next steps to be taken.

Lots of luck and strength for this first week with us!

Jane
'''

In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
from tqdm.notebook import tqdm

In [3]:
# find url and store it in a variable
url = "https://www.popvortex.com/music/charts/top-100-songs.php"

In [4]:
# download html with a get request
response = requests.get(url)
response.status_code # 200 status code means OK!

200
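Checking the status code by eye works for a single page, but once several sources are scraped it can help to wrap the request in a small helper. The following is a minimal sketch, not part of the original notebook: `fetch_html`, its `timeout` default and the optional `session` argument are hypothetical names chosen for illustration. It relies on `raise_for_status()`, which raises an exception for 4xx/5xx responses instead of returning silently.

```python
import requests


def fetch_html(url, timeout=10, session=None):
    """Download `url` and return the raw body as bytes.

    Hypothetical helper: `timeout` and `session` are assumptions of this sketch.
    """
    # fall back to the module-level requests API when no session is passed
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # turn 4xx/5xx answers into an exception instead of parsing an error page
    response.raise_for_status()
    return response.content
```

With this in place, `BeautifulSoup(fetch_html(url), "html.parser")` would replace the separate request and status check.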

In [5]:
response.content

b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>iTunes Top 100 Songs Chart 2022</title><meta name="viewport" content="width=device-width, initial-scale=1"><meta name="description" content="iTunes top 100 songs chart list. The most popular hit music and trending songs of 2022. Chart of today\'s current iTunes top 100 songs is updated daily."><meta property="og:title" content="iTunes Top 100 Songs Chart 2022"/><meta property="og:description" content="Chart of the top 100 songs on iTunes. Chart list of the top 100 song downloads of 2022 is updated daily."/><meta property="og:type" content="article"/><meta property="og:image" content="https://www.popvortex.com/images/logo-facebook.png"/><meta property="og:site_name" content="PopVortex"/><meta property="og:url" content="https://www.popvortex.com/music/charts/top-100-songs.php"/><meta property="fb:admins" content="100000239962942"/><meta property="fb:app_id" content="178831188827052"/><link rel="shortcut icon" href="/favi

In [6]:
# parse html (create the 'soup')
soup = BeautifulSoup(response.content, "html.parser")

In [7]:
# check that the html code looks like it should
soup

<!DOCTYPE html>
<html lang="en"><head><meta charset="utf-8"/><title>iTunes Top 100 Songs Chart 2022</title><meta content="width=device-width, initial-scale=1" name="viewport"/><meta content="iTunes top 100 songs chart list. The most popular hit music and trending songs of 2022. Chart of today's current iTunes top 100 songs is updated daily." name="description"/><meta content="iTunes Top 100 Songs Chart 2022" property="og:title"><meta content="Chart of the top 100 songs on iTunes. Chart list of the top 100 song downloads of 2022 is updated daily." property="og:description"><meta content="article" property="og:type"><meta content="https://www.popvortex.com/images/logo-facebook.png" property="og:image"/><meta content="PopVortex" property="og:site_name"/><meta content="https://www.popvortex.com/music/charts/top-100-songs.php" property="og:url"/><meta content="100000239962942" property="fb:admins"/><meta content="178831188827052" property="fb:app_id"/><link href="/favicon.png" rel="shortcut

In [8]:
soup.select("#chart-position-1 > div.chart-content.col-xs-12.col-sm-8 > p > em")

[<em class="artist">Sam Smith &amp; Kim Petras</em>]

In [9]:
soup.select("em")

[<em class="artist">Sam Smith &amp; Kim Petras</em>,
 <em class="artist">Transformation Worship</em>,
 <em>New Release</em>,
 <em class="artist">Fleetwood Mac</em>,
 <em class="artist">David Guetta &amp; Bebe Rexha</em>,
 <em class="artist">HARDY &amp; Lainey Wilson</em>,
 <em class="artist">Christina Perri</em>,
 <em class="artist">Kane Brown &amp; Katelyn Brown</em>,
 <em class="artist">Jelly Roll</em>,
 <em class="artist">Morgan Wallen</em>,
 <em class="artist">OneRepublic</em>,
 <em class="artist">Sia</em>,
 <em class="artist">Beyoncé</em>,
 <em class="artist">Charlie Puth &amp; Jung Kook</em>,
 <em class="artist">Cole Swindell</em>,
 <em class="artist">HARDY</em>,
 <em>New Release</em>,
 <em class="artist">Morgan Wallen</em>,
 <em class="artist">HARDY</em>,
 <em>New Release</em>,
 <em class="artist">HARDY</em>,
 <em>New Release</em>,
 <em class="artist">Harry Styles</em>,
 <em class="artist">Luke Combs</em>,
 <em class="artist">Bailey Zimmerman</em>,
 <em class="artist">Lady Gaga 

In [10]:
soup.select("cite")

[<cite class="title">Unholy</cite>,
 <cite class="title">Eagle (feat. KB)</cite>,
 <cite class="title">Everywhere</cite>,
 <cite class="title">I'm Good (Blue)</cite>,
 <cite class="title">wait in the truck</cite>,
 <cite class="title">A Thousand Years</cite>,
 <cite class="title">Thank God</cite>,
 <cite class="title">Son Of A Sinner</cite>,
 <cite class="title">You Proof</cite>,
 <cite class="title">I Ain't Worried</cite>,
 <cite class="title">Unstoppable</cite>,
 <cite class="title">CUFF IT</cite>,
 <cite class="title">Left and Right</cite>,
 <cite class="title">She Had Me At Heads Carolina</cite>,
 <cite class="title">TRUCK BED</cite>,
 <cite class="title">Wasted On You</cite>,
 <cite class="title">here lies country music</cite>,
 <cite class="title">the mockingbird &amp; THE CROW</cite>,
 <cite class="title">As It Was</cite>,
 <cite>Billboard Hot 100</cite>,
 <cite class="title">The Kind of Love We Make</cite>,
 <cite class="title">Fall In Love</cite>,
 <cite class="title">Shallow<

In [12]:
soup.select("p.title-artist")

[<p class="title-artist"><cite class="title">Unholy</cite><em class="artist">Sam Smith &amp; Kim Petras</em></p>,
 <p class="title-artist"><cite class="title">Eagle (feat. KB)</cite><em class="artist">Transformation Worship</em></p>,
 <p class="title-artist"><cite class="title">Everywhere</cite><em class="artist">Fleetwood Mac</em></p>,
 <p class="title-artist"><cite class="title">I'm Good (Blue)</cite><em class="artist">David Guetta &amp; Bebe Rexha</em></p>,
 <p class="title-artist"><cite class="title">wait in the truck</cite><em class="artist">HARDY &amp; Lainey Wilson</em></p>,
 <p class="title-artist"><cite class="title">A Thousand Years</cite><em class="artist">Christina Perri</em></p>,
 <p class="title-artist"><cite class="title">Thank God</cite><em class="artist">Kane Brown &amp; Katelyn Brown</em></p>,
 <p class="title-artist"><cite class="title">Son Of A Sinner</cite><em class="artist">Jelly Roll</em></p>,
 <p class="title-artist"><cite class="title">You Proof</cite><em class

In [13]:
# full selector paths for the first chart entry, kept for reference:
#chart-position-1 > div.chart-content.col-xs-12.col-sm-8 > p > cite
#chart-position-1 > div.chart-content.col-xs-12.col-sm-8 > p > em

In [14]:
soup.select("p.title-artist cite.title")[0].get_text()

'Unholy'

In [15]:
soup.select("p.title-artist em.artist")[0].get_text()

'Sam Smith & Kim Petras'

In [16]:
soup.select("p.title-artist")[0].get_text()

'UnholySam Smith & Kim Petras'
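The concatenated string above shows that `get_text()` on the parent paragraph simply glues the child texts together. One way to keep title and artist paired is to walk over each `p.title-artist` element and read both children from it. The sketch below assumes a hand-written HTML sample modelled on the outputs above (it is not the real page), and `sample_soup`, `rows` and `joined` are names made up for this example. It also shows why the class-qualified selectors matter: bare `em` and `cite` match the "New Release" and "Billboard Hot 100" labels too.

```python
from bs4 import BeautifulSoup

# hand-written sample that imitates the chart markup seen above
sample_html = """
<div>
  <p class="title-artist"><cite class="title">Unholy</cite><em class="artist">Sam Smith &amp; Kim Petras</em></p>
  <em>New Release</em>
  <p class="title-artist"><cite class="title">Everywhere</cite><em class="artist">Fleetwood Mac</em></p>
  <cite>Billboard Hot 100</cite>
</div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for entry in sample_soup.select("p.title-artist"):
    title_tag = entry.select_one("cite.title")
    artist_tag = entry.select_one("em.artist")
    if title_tag is None or artist_tag is None:
        continue  # skip entries that lack one of the two fields
    rows.append({
        "title": title_tag.get_text(strip=True),
        "artist": artist_tag.get_text(strip=True),
    })

# passing a separator keeps the fields apart when reading the whole paragraph
joined = sample_soup.select_one("p.title-artist").get_text(" - ")
```

Because both fields come from the same parent element, a missing artist cannot shift every later row by one, which can happen when two independently selected lists are zipped by position.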

## making a beautiful dataframe

In [17]:
artist = []
title = []

In [18]:
num_iter = len(soup.select("p.title-artist cite.title"))

title_list = soup.select("p.title-artist cite.title")

artist_list = soup.select("p.title-artist em.artist")

In [19]:
for i in range(num_iter):
    title.append(title_list[i].get_text())
    artist.append(artist_list[i].get_text())

print(title)
print(artist)

['Unholy', 'Eagle (feat. KB)', 'Everywhere', "I'm Good (Blue)", 'wait in the truck', 'A Thousand Years', 'Thank God', 'Son Of A Sinner', 'You Proof', "I Ain't Worried", 'Unstoppable', 'CUFF IT', 'Left and Right', 'She Had Me At Heads Carolina', 'TRUCK BED', 'Wasted On You', 'here lies country music', 'the mockingbird & THE CROW', 'As It Was', 'The Kind of Love We Make', 'Fall In Love', 'Shallow', 'Under the Influence', 'Lose Yourself', 'Life Is a Highway', 'Celestial', 'Super Freaky Girl', 'Sunroof', 'Shivers', 'Victoria’s Secret', 'High Heels', 'Love Me Like You Do', 'Hold Me Closer', 'I Like You (A Happier Song) [feat. Doja Cat]', 'About Damn Time', 'Numb', '2 Be Loved (Am I Ready)', 'Running Up That Hill (A Deal with God)', 'Earned It', 'You, Me, And Whiskey', 'Rock and a Hard Place', 'Next Thing You Know', "You'll Be In My Heart", 'Cold Heart (PNAU Remix)', 'Vegas (From the Original Motion Picture Soundtrack ELVIS)', 'Soul', '500 Miles', 'Monster Mash', 'What My World Spins Around'

In [20]:
# build the frame from the extracted text lists, not from the lists of Tag objects
top100 = pd.DataFrame({'title': title, 'artist': artist})

In [32]:
display(top100)

Unnamed: 0,title,artist
0,Unholy,Sam Smith & Kim Petras
1,Eagle (feat. KB),Transformation Worship
2,Everywhere,Fleetwood Mac
3,I'm Good (Blue),David Guetta & Bebe Rexha
4,wait in the truck,HARDY & Lainey Wilson
...,...,...
95,Fancy Like,Walker Hayes
96,Ghostbusters,Ray Parker Jr.
97,We Don't Talk About Bruno,"Carolina Gaitán - La Gaita, Mauro Castillo, A..."
98,No Se Va (En Vivo),Grupo Frontera


In [59]:
# save it as csv
top100.to_csv('top100.csv', index = False, header = True)

## LAB2

### Build user input and output

In [64]:
a = input('Sing a song for me: ')

Sing a song for me: Unholy


In [65]:
print(top100['title'].sample(1).values)

['Half Of Me (feat. Riley Green)']


In [66]:
if a in top100['title'].values:
    # draw one suggestion and redraw until it differs from the input
    suggestion = top100['title'].sample(1).iloc[0]
    while suggestion == a:
        suggestion = top100['title'].sample(1).iloc[0]
    print('Pump up the volume and let us dance to', suggestion)
else:
    print('When the music is over')

In [None]:
# Pitfall: if top100 is built from title_list / artist_list (bs4 Tag objects)
# instead of the text lists, no typed title ever matches and the else branch always runs.
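The lookup above is sensitive to exact spelling and capitalisation. A slightly more forgiving version could normalise both sides before comparing. This is only a sketch: `recommend` and its `rng` parameter are hypothetical, and it works on a plain list of title strings rather than on the DataFrame.

```python
import random


def recommend(user_title, titles, rng=random):
    """Return a different title from `titles` if `user_title` is in it, else None."""
    wanted = user_title.strip().casefold()
    normalized = [t.strip().casefold() for t in titles]
    if wanted not in normalized:
        return None  # the song is not on the chart
    # every title except the one the user typed is a valid suggestion
    candidates = [t for t, n in zip(titles, normalized) if n != wanted]
    if not candidates:
        return None
    return rng.choice(candidates)
```

It could be called as `recommend(a, top100['title'].tolist())`, printing the fallback message when the result is `None`.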

## GETTING MORE SONGS FROM A NEW SOURCE: ROLLING STONE

In [67]:
# and again

# find url and store it in a variable
url = "https://www.cs.ubc.ca/~davet/music/list/Best9.html"

In [68]:
# download html with a get request
response = requests.get(url)
response.status_code 

200

In [69]:
# 200 status code means OK!

In [71]:
# parse html (create the soup)
rollingsoup = BeautifulSoup(response.content, 'html.parser')

# inspect the soup (rollingsoup.prettify() returns an indented string instead)
rollingsoup

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="en-us" http-equiv="Content-Language"/>
<link href="../davetmusic.css" rel="stylesheet" title="davetmusic" type="text/css"/>
<script src="../playsamples.js" type="text/javascript"></script>
<title>Rolling Stone - 500 Greatest Songs (Music Database :: Dave Tompkins)</title>
</head>
<body>
<div class="menutitle"><a href="../../index.html">Dave Tompkins</a> :: <a href="../index.html">Music Database</a></div>
<div class="menurow">
<a class="singlemenu" href="../index.html">INTRODUCTION</a>
<a class="singlemenu" href="../disc/index.html">DISCS</a>
<a class="singlemenu" href="../covers/index.html">COVERS</a>
<a class="singlemenu" href="../genre/index.html">GENRE</a>
<a class="singlemenu" href="../artisttag/index.html">ARTIST TAGS</a>
<a class="singlemenu" href="../

In [72]:
# call the artist

rollingsoup.select('td:nth-child(3) > a:nth-child(2)')[0].get_text(strip=True)

'Bob Dylan'

In [73]:
# sing the song

rollingsoup.select('td:nth-child(4) > a')[0].get_text(strip=True)

'Like a Rolling Stone'

In [74]:
# check results

Greatest500 = len(rollingsoup.select('td:nth-child(4) > a'))
Greatest500

500
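The two selectors rely on column position rather than on class names. To illustrate how `:nth-child` and the child combinator `>` behave, here is a small sketch on an invented table. The markup is an assumption made for the example (the real page layout may differ), and `demo`, `demo_artists` and `demo_titles` are hypothetical names.

```python
from bs4 import BeautifulSoup

# invented two-row table: rank, placeholder, artist cell with two links, title cell
demo_html = """
<table>
  <tr>
    <td>1</td>
    <td>x</td>
    <td><a href="#sample">play</a><a href="#artist">Bob Dylan</a></td>
    <td><a href="#song">Like a Rolling Stone</a></td>
  </tr>
  <tr>
    <td>2</td>
    <td>x</td>
    <td><a href="#sample">play</a><a href="#artist">The Rolling Stones</a></td>
    <td><a href="#song">Satisfaction</a></td>
  </tr>
</table>
"""
demo = BeautifulSoup(demo_html, "html.parser")

# third cell of each row, then the link that is the second child of that cell
demo_artists = [a.get_text(strip=True)
                for a in demo.select("td:nth-child(3) > a:nth-child(2)")]
# fourth cell of each row, any link that is a direct child of it
demo_titles = [a.get_text(strip=True)
               for a in demo.select("td:nth-child(4) > a")]
```

Positional selectors like these break quietly if the site adds or removes a column, so comparing the lengths of both result lists before building a DataFrame is a cheap safeguard.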

In [75]:
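# loop over the songs
artist = []
title = []

# run each selector once, outside the loop, instead of re-querying the soup on every iteration
title_tags = rollingsoup.select('td:nth-child(4) > a')
artist_tags = rollingsoup.select('td:nth-child(3) > a:nth-child(2)')

for i in tqdm(range(Greatest500)):
    title.append(title_tags[i].get_text(strip=True))
    artist.append(artist_tags[i].get_text(strip=True))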

  0%|          | 0/500 [00:00<?, ?it/s]

In [76]:
Greatest_500 = pd.DataFrame({'title':title,'artist':artist})

In [77]:
Greatest_500.head()

Unnamed: 0,title,artist
0,Like a Rolling Stone,Bob Dylan
1,Satisfaction,The Rolling Stones
2,Imagine,John Lennon
3,What's Going On,Marvin Gaye
4,Respect,Aretha Franklin


In [78]:
Greatest_500.to_csv(r'Greatest_500.csv', index = False, header = True)
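With two sources saved, a natural next step is to stack them into one table and remember where each song came from. The sketch below uses tiny toy frames so it runs on its own. The overlapping "Everywhere" row is made up to demonstrate de-duplication, and `hot`, `classics`, `songs` and the `source` column are hypothetical names, not part of the notebook.

```python
import pandas as pd

# toy stand-ins for the two scraped tables
hot = pd.DataFrame({
    "title": ["Unholy", "Everywhere"],
    "artist": ["Sam Smith & Kim Petras", "Fleetwood Mac"],
})
classics = pd.DataFrame({
    "title": ["Imagine", "Everywhere"],
    "artist": ["John Lennon", "Fleetwood Mac"],
})

songs = (
    pd.concat(
        [hot.assign(source="hot"), classics.assign(source="classic")],
        ignore_index=True,
    )
    # a song listed in both sources keeps the label of the first one
    .drop_duplicates(subset=["title", "artist"], keep="first")
    .reset_index(drop=True)
)
```

In the notebook, the same pattern would apply to `top100` and `Greatest_500`, since both frames share the `title` and `artist` columns.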