Music Recommendations

Ignacio Rus Prados

DAFT-MAR21, Remote, 09/04/2021

Content

Project Description
Dataset
Organization
Links

Project Description

I have been (fictionally) hired as a Data Analyst by Gnod, a company whose website provides music recommendations, among other things. They hired me to improve their algorithms and lay the groundwork for a collaboration with bigger companies such as Spotify. My first task is to develop a new feature: song recommendations based on the user's favorite songs.

My approach is the following: if the song provided by the user is in Billboard's Hot 100 chart (https://www.billboard.com/charts/hot-100), we simply pick another song from that chart (Path A). If the song is not among the most popular songs of the moment, we analyze its audio features and compare them with our song database, find the songs most "similar" to it, and pick one of those at random (Path B).
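The routing between the two paths can be sketched as follows. This is a minimal illustration, not the project's Recommendation() function: it assumes the Hot 100 is available as a plain list of song titles and that Path B is supplied as a separate callable. The names `recommend`, `top100` and `path_b` are hypothetical.

```python
import random
from typing import Callable, List, Optional


def recommend(song: str,
              top100: List[str],
              path_b: Callable[[str], str],
              rng: Optional[random.Random] = None) -> str:
    """Hypothetical router between Path A (chart) and Path B (audio features)."""
    rng = rng or random.Random()
    key = song.strip().lower()  # case- and whitespace-insensitive title match

    # Path A: the song is on the chart, so suggest a different song from the chart.
    on_chart = any(title.strip().lower() == key for title in top100)
    if on_chart:
        others = [title for title in top100 if title.strip().lower() != key]
        if others:
            return rng.choice(others)

    # Path B: the song is not on the chart, so fall back to audio-feature similarity.
    return path_b(song)
```

For example, `recommend("song a", ["Song A", "Song B"], path_b=lambda s: "...")` returns `"Song B"`. Matching on the title alone is a simplification; a real chart can contain different songs with the same title, so matching on title and artist would be more robust.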

Dataset

I scraped "The Hot 100" chart by Billboard (https://www.billboard.com/charts/hot-100) for Path A. For Path B, I used the Spotify Web API, through the Spotipy library, to build a database of more than 27,000 songs, which I divided into 11 clusters based on their audio features.
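The clustering step can be illustrated without third-party libraries. This is a minimal sketch under stated assumptions: each song is a list of numeric audio features (the exact feature set used in Clusters.ipynb is not listed here), features are standardized to z-scores, and songs are grouped with Lloyd's k-means algorithm. The repository's scaler.pickle and kmeans.pickle presumably hold scikit-learn objects doing the same job, but that is an assumption; all function names below are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def fit_scaler(rows: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Per-feature mean and (population) standard deviation."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid dividing by zero
    return means, stds


def scale(row: Sequence[float], means: Vector, stds: Vector) -> Vector:
    """Convert raw features to z-scores."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the closest centroid (the 'predict' step used for a new song)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points: List[Vector], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Lloyd's algorithm: returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

With k=11 this mirrors the setup described above. For Path B, a new song's features would be scaled with the same means and standard deviations, assigned to a cluster with `nearest`, and a random song from that cluster returned.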

Organization

I used a Trello template to organize my work and keep track of remaining tasks.

My repository consists of:

Folder: Music-Recommendations
- README.md
- .gitignore
- Presentation.pdf (slides used to present this project during the Data Analytics bootcamp)
- Music4you.mp4 (recorded presentation of the project)
- Folder: data
  - top100songs.csv (the Billboard Hot 100 chart; can be updated by calling update_top100() from get_top_100.py)
  - song_database.csv (initial database gathered from Spotify playlists; NOT divided into clusters)
  - clustered_database.csv (song_database.csv but with an added column that identifies the cluster each song belongs to)
  - scaler.pickle (trained scaler)
  - kmeans.pickle (trained model for clustering)
- Folder: code
  - get_top_100.ipynb (code for scraping and storing the Billboard Hot 100 chart; includes a function that updates the stored chart to its latest version)
  - get_top_100.py (same code as get_top_100.ipynb as a Python module, so update_top100() can be imported and called)
  - Database.ipynb (code that uses the Spotipy library to gather and store songs from several Spotify playlists)
  - Clusters.ipynb (code that takes song_database.csv and divides the songs into clusters by analyzing their audio features)
  - Music_recommendations.ipynb (code that defines the Recommendation() function. It asks for a song name and provides a song recommendation following Path A or B, depending on the input)
  - MUSIC4YOU.py (same code as Music_recommendations.ipynb as a standalone script; run it from the terminal)
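The relationship between song_database.csv and clustered_database.csv (the same rows plus a column identifying each song's cluster) can be sketched with Python's standard csv module. This is a hypothetical illustration: the column names (title, danceability, energy), the output column name `cluster`, and the `assign` callable are assumptions, not the project's actual schema. In the repository, the assignment would come from the trained scaler and k-means model.

```python
import csv
import io
from typing import Callable, List, Sequence, TextIO


def add_cluster_column(src: TextIO,
                       dst: TextIO,
                       feature_columns: Sequence[str],
                       assign: Callable[[List[float]], int]) -> int:
    """Copy every row from src to dst, appending a 'cluster' column.

    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + ["cluster"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        # Parse the audio-feature columns and ask the model for a cluster id.
        features = [float(row[col]) for col in feature_columns]
        row["cluster"] = assign(features)
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # In-memory example with a toy rule standing in for the trained model.
    src = io.StringIO("title,danceability,energy\nA,0.9,0.8\nB,0.1,0.2\n")
    dst = io.StringIO()
    add_cluster_column(src, dst, ["danceability", "energy"],
                       lambda f: 0 if f[0] > 0.5 else 1)
    print(dst.getvalue())
```

With real files, open both with `newline=""`, as the csv module documentation recommends, e.g. `open("data/song_database.csv", newline="")` for reading and `open("data/clustered_database.csv", "w", newline="")` for writing.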

Links

Repository
Trello
