                                                            GNOD Project

Context:
Dear xxxxxxxx,
 
We are thrilled to welcome you as a Data Analyst for Gnoosic!
 
As you know, we are trying to come up with ways to enhance our music recommendations. One of the new features we'd like to research is to recommend songs (not only bands). We're also aware of the limitations of our collaborative filtering algorithms, and would like to give users two new possibilities when searching for recommendations:
 
- Songs that are acoustically similar to the ones they picked.
- Songs that are popular around the world right now, regardless of the user's tastes.
 
Coming up with the perfect song recommender will take us months, so there is no need to stress out too much. In this first week, we want you to explore new data sources for songs. The internet is full of information, and our first step is to acquire it and do an initial exploration. Feel free to use APIs or scrape the web directly to collect as much information as possible about popular songs. Eventually, we'll need to collect data from millions of songs, but we can start with a few hundred or a few thousand from each source and see if the collected features are useful.
 
Once the data is collected, we want you to create clusters of songs that are similar to each other. The idea is that if a user inputs a song from one group, we'll prioritize giving them recommendations of songs from that same group.
 
On Friday, you will present your work to me and Marek, the CEO and founder. Full disclosure: I need you to be very convincing about this whole song recommender, as it has been my personal push and the main reason we hired you!
 
Be open-minded about this process: we are agile, which means that we define our products and features on the go, while exploring the tools and the data available to us. We'd love you to provide your own vision of the product and the next steps to be taken.
 
Lots of luck and strength for this first week with us!
 
Jane

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

* PART I: Scraping popular songs
Popvortex maintains a weekly Top 100 of "hot" songs here: http://www.popvortex.com/music/charts/top-100-songs.php. Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.

In [2]:
#store the url in a variable
url = "http://www.popvortex.com/music/charts/top-100-songs.php"

In [3]:
#download the html with a request
response = requests.get(url)

In [4]:
#verify the response status
response.status_code

200
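
Checking `status_code` by eye works in a notebook; in a script the same check can be made automatic. A minimal sketch, assuming a helper named `fetch_html` (a hypothetical name, not part of the original notebook) and an arbitrary 10-second timeout:

```python
import requests


def fetch_html(url, timeout=10.0):
    """Download a page and return its raw bytes (hypothetical helper)."""
    # timeout (in seconds) is an assumed value; it keeps the call from hanging
    response = requests.get(url, timeout=timeout)
    # raises requests.exceptions.HTTPError on a 4xx/5xx status
    response.raise_for_status()
    return response.content
```

Calling `fetch_html(url)` would then cover both the download and the status check; any failure surfaces as a `requests.exceptions.RequestException`.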

In [5]:
#parse the html
soup = BeautifulSoup(response.content, "html.parser")

In [6]:
# retrieve/extract the top 100 songs general chart (using the CSS selector of the chart)
top100=soup.select("body > div.container > div:nth-child(4) > div.col-xs-12.col-md-8 > div.chart-wrapper")
#top100

In [7]:
# from the general chart, extract the title and artist of each song and store them in lists
title=[]
artist=[]
for chart in top100:
    # every element with class "title" holds a song title
    for t in chart.select(".title"):
        title.append(t.get_text())
    # every element with class "artist" holds the matching artist name
    for a in chart.select(".artist"):
        artist.append(a.get_text())
print(title)
print(artist)

['I Had Some Help (feat. Morgan Wallen)', 'A Bar Song (Tipsy)', 'Come back to me', 'Not Like Us', 'Lose Control', 'Lose My Breath', 'Heartbreak Summer', 'MILLION DOLLAR BABY', 'euphoria', 'Ronald', 'Too Sweet', 'Beautiful Things', 'I Had Some Help (feat. Morgan Wallen)', "HIND'S HALL", 'meet the grahams', 'Espresso', 'A Bar Song (Tipsy)', 'Miles On It', 'Feel Like Hell Today', 'Come back to me (Radio Edit)', 'Fortnight (feat. Post Malone)', 'Stargazing', "I Can't", 'Troubles', 'Where That Came From', 'Lose My Breath (Instrumental)', 'Like That', 'BOA', 'The Door', 'Family Matters', 'Jelly (feat. 2Rare) [Remix]', 'Live Like You Were Dying', 'I Can Do It With a Broken Heart', 'Come back to me', 'Training Season', 'Austin', 'Save Me (with Lainey Wilson)', 'Take It All Back', 'Cowgirls (feat. ERNEST)', 'Can You Hear Me', 'i like the way you kiss me', 'Halfway To Hell', 'Lil Boo Thang', 'Where the Wild Things Are', 'Illusion', 'A Country Boy Can Survive', "we can't be friends (wait for your
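
Collecting titles and artists in two separate passes relies on both lists staying in the same order and having the same length. One alternative is to read the title and artist from the same chart entry. The sketch below runs on hypothetical markup: the `entry` class and the `cite`/`em` tags are assumptions for illustration, so the real container selector has to be found by inspecting the page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real chart page
sample_html = """
<div class="chart-wrapper">
  <div class="entry"><cite class="title">Song A</cite><em class="artist">Artist A</em></div>
  <div class="entry"><cite class="title">Song B</cite><em class="artist">Artist B</em></div>
  <div class="entry"><cite class="title">Song C</cite></div>
</div>
"""


def extract_pairs(html, entry_selector=".chart-wrapper .entry"):
    """Return (title, artist) tuples, one per complete chart entry."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for entry in soup.select(entry_selector):
        title_tag = entry.select_one(".title")
        artist_tag = entry.select_one(".artist")
        # skip entries missing either field so the pairs never shift
        if title_tag is None or artist_tag is None:
            continue
        pairs.append((title_tag.get_text(strip=True), artist_tag.get_text(strip=True)))
    return pairs


pairs = extract_pairs(sample_html)
print(pairs)
```

The resulting list of tuples can be passed to `pd.DataFrame(pairs, columns=["title", "artist"])`.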

In [8]:
# combining the lists into a dataframe
top100_df = pd.DataFrame({"title":title, "artist":artist})
top100_df

    title                                  artist
0   I Had Some Help (feat. Morgan Wallen)  Post Malone
1   A Bar Song (Tipsy)                     Shaboozey
2   Come back to me                        RM
3   Not Like Us                            Kendrick Lamar
4   Lose Control                           Teddy Swims
..  ...                                    ...
95  umademeast4r.mp3                       4batz
96  act iii: on god? (she like)            4batz
97  get out yo feelings ho                 4batz
98  Wanna Be                               GloRilla & Megan Thee Stallion
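
The printed title list contains some titles more than once (for example 'Come back to me'), so it may be worth checking for repeated rows before using the dataframe. A minimal sketch on hypothetical rows, assuming that a row counts as a duplicate only when both title and artist match:

```python
import pandas as pd

# Hypothetical rows standing in for the scraped chart
sample_df = pd.DataFrame({
    "title": ["Song A", "Song B", "Song A", "Song A (Radio Edit)"],
    "artist": ["Artist A", "Artist B", "Artist A", "Artist A"],
})

# count rows that repeat an earlier (title, artist) pair
n_duplicates = int(sample_df.duplicated(subset=["title", "artist"]).sum())

# keep the first occurrence of each pair and renumber the rows
deduped = sample_df.drop_duplicates(subset=["title", "artist"]).reset_index(drop=True)
print(n_duplicates, len(deduped))
```

Under this rule, a different version such as a radio edit is kept as its own row; whether that is desirable depends on how the recommender should treat alternate versions.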


* Asking for user input and recommending a random song from top100

In [9]:
#asking for user input
song_title=input("Enter song title: ")

Enter song title: NOT LIKE US


In [10]:
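# recommending a song: if the input is in the chart, suggest a random chart song
# (the comparison ignores case and surrounding whitespace)
if song_title.strip().lower() in top100_df['title'].str.lower().values:
    # "recommendation" avoids reusing the name of the standard "random" module
    recommendation=top100_df.sample()
    print("You should check out:")
    print(recommendation['title'].values[0], ' by ', recommendation['artist'].values[0])
else:
    print("Can't recommend songs at the moment")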

You should check out:
Beautiful As You  by  Thomas Rhett
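
Because the sample is drawn from the whole chart, the suggestion can be the very song the user typed. A possible variant, sketched with a hypothetical `recommend` function and made-up rows, removes the input song from the candidates first:

```python
import pandas as pd


def recommend(song_title, songs_df, seed=None):
    """Return (title, artist) for a random chart song other than the input.

    Returns None when the input is not in the chart or nothing else is left.
    """
    wanted = song_title.strip().lower()
    titles = songs_df["title"].str.strip().str.lower()
    if not titles.eq(wanted).any():
        return None
    # drop every row whose title matches the user's song
    candidates = songs_df[titles != wanted]
    if candidates.empty:
        return None
    pick = candidates.sample(n=1, random_state=seed).iloc[0]
    return pick["title"], pick["artist"]


# Hypothetical rows standing in for top100_df
demo_df = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C"],
    "artist": ["Artist A", "Artist B", "Artist C"],
})
result = recommend("  SONG a ", demo_df, seed=0)
print(result)
```

The `seed` argument is only there to make the draw repeatable when testing; leaving it as `None` gives a different suggestion on each call.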


In [11]:
# save the chart to a CSV file (without the dataframe index) for later use
top100_df.to_csv('hot_songs.csv', index=False)
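
When the file is read back later, pandas infers column types, so a numeric-looking title could come back as a number. A small round-trip sketch on hypothetical rows, written to a temporary directory so it does not touch the real `hot_songs.csv`:

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows; "1999" is a title that looks like a number
songs = pd.DataFrame({
    "title": ["1999", "Song B"],
    "artist": ["Artist A", "Artist B"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "hot_songs.csv")
    songs.to_csv(path, index=False)
    # dtype=str keeps every column as text; keep_default_na=False stops
    # strings such as "NA" from being converted to missing values
    restored = pd.read_csv(path, dtype=str, keep_default_na=False)

round_trip_ok = restored.to_dict("list") == songs.to_dict("list")
print(round_trip_ok)
```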