
Collect information on songs, analyze and build a recommender system with the help of Spotify API.


janampatel15/trackData


Building a Song Recommendation and Analyzer System

The purpose of this project was to collect information about songs released from 1900 to 2021. We gathered metadata such as release date, artist, and album name, along with each song's audio features, such as danceability, speechiness, loudness, and key.

We originally had information for 170K songs from a Kaggle dataset. We didn't think that was enough, so we collected more from playlists that contained a large number of songs. In the end, we had information on 370K songs ranging from 1900 to 2021.

We also decided to compare the dataset with the Top 100 Billboard songs, which we scraped from Wikipedia pages for the years 1990 to 2020.

Process

To collect information about songs, we first created a collection notebook for the regular songs from 1900 to 2021, which can be seen in the Collecting Notebook. The Top 100 Billboard songs were collected in the CollectingTop100 Notebook.

To store and work with this much data, we chose an SQL server, since querying it is much faster than loading and filtering the same data from CSV or JSON files.
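As a rough illustration of the storage step, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the SQL server. The table name `songs`, its columns, and the sample rows are assumptions made for this example, not the project's actual schema.

```python
import sqlite3


def create_song_store(conn):
    # Hypothetical schema: only a few of the features are shown.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS songs (
            id TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            year INTEGER,
            danceability REAL
        )
        """
    )


def insert_songs(conn, rows):
    # INSERT OR IGNORE skips rows whose id is already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO songs (id, name, year, danceability) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def average_danceability_by_year(conn):
    cur = conn.execute(
        "SELECT year, AVG(danceability) FROM songs GROUP BY year ORDER BY year"
    )
    return cur.fetchall()


# Made-up sample rows; the third one repeats the first id on purpose.
conn = sqlite3.connect(":memory:")
create_song_store(conn)
insert_songs(
    conn,
    [
        ("a1", "Song A", 1990, 0.5),
        ("b2", "Song B", 1990, 0.7),
        ("a1", "Song A", 1990, 0.5),
        ("c3", "Song C", 2020, 0.9),
    ],
)
yearly = average_danceability_by_year(conn)
```

Once the rows are in a table, aggregations like the per-year average become a single query instead of a full pass over a file.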

To pull the song features, we used the Spotify API. The API does not impose a fixed cap on the total number of requests, although it is rate limited, and it still took approximately 48 hours to collect information for 370K songs.
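One way to keep the number of requests down is to ask for features in batches. The sketch below assumes a batch size of 100 track IDs per request, which is our understanding of the audio-features endpoint's limit. `fetch_batch` is a hypothetical callable standing in for the real API call, and `fake_fetch` only simulates a response so the example runs offline.

```python
def chunked(ids, size=100):
    # Yield consecutive slices of at most `size` track IDs.
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def collect_features(track_ids, fetch_batch, batch_size=100):
    # `fetch_batch` is hypothetical: it takes a list of IDs and returns
    # one feature dict (or None) per ID.
    features = {}
    for batch in chunked(track_ids, batch_size):
        for item in fetch_batch(batch):
            # Assumption: tracks without features come back as None.
            if item is not None:
                features[item["id"]] = item
    return features


def fake_fetch(batch):
    # Offline stand-in for the real request, returning made-up values.
    return [{"id": track_id, "tempo": 120.0} for track_id in batch]


demo_ids = [f"track{i}" for i in range(250)]
demo_features = collect_features(demo_ids, fake_fetch)
```

At 100 IDs per request, 370K songs would need roughly 3,700 feature requests, not counting the searches needed to find the tracks in the first place.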

After collecting the data, we cleaned and analyzed it, as can be seen in the EDA Notebook. To our surprise, only around 0.3% of the data was bad (duplicates or missing information). The same notebook also shows different analyses and relationships.
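A minimal sketch of this kind of cleaning, assuming each song is a dictionary and that "bad" means a missing required field or a repeated track id. The field names and sample rows are made up for illustration, and the EDA Notebook may do this differently.

```python
def clean_songs(rows, required=("id", "name", "year")):
    # Keep the first row seen for each id and drop rows with missing fields.
    seen = set()
    clean = []
    for row in rows:
        if any(row.get(field) in (None, "") for field in required):
            continue  # missing information
        if row["id"] in seen:
            continue  # duplicate track
        seen.add(row["id"])
        clean.append(row)
    return clean


# Made-up sample: one duplicate and one row with a missing name.
raw = [
    {"id": "a1", "name": "Song A", "year": 1990},
    {"id": "a1", "name": "Song A", "year": 1990},
    {"id": "b2", "name": "", "year": 2001},
    {"id": "c3", "name": "Song C", "year": 2020},
]
cleaned = clean_songs(raw)
bad_share = 1 - len(cleaned) / len(raw)
```

`bad_share` is the fraction of rows that were dropped, the same kind of figure as the 0.3% reported above.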

We created a recommender system using cosine similarity and sklearn's preprocessing. It takes a song's features and recommends songs with similar features. This can be seen in the Recommender Notebook.
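To make the idea concrete, here is a dependency-free sketch of feature scaling followed by cosine similarity. It is not the code from the Recommender Notebook: min-max scaling is written by hand here as a stand-in for sklearn's preprocessing, and the song names and feature values are invented.

```python
import math


def min_max_scale(matrix):
    # Rescale every feature column to the 0-1 range.
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in matrix
    ]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(names, matrix, query_index, k=2):
    # Rank every other song by similarity to the query song.
    scaled = min_max_scale(matrix)
    query = scaled[query_index]
    scores = [
        (cosine_similarity(query, row), name)
        for i, (name, row) in enumerate(zip(names, scaled))
        if i != query_index
    ]
    scores.sort(reverse=True)
    return [name for _, name in scores[:k]]


# Invented data: columns are danceability, energy, tempo.
names = ["Dance A", "Dance B", "Ballad C", "Ballad D"]
matrix = [
    [0.90, 0.80, 128.0],
    [0.85, 0.75, 124.0],
    [0.30, 0.20, 70.0],
    [0.25, 0.30, 76.0],
]
top_match = recommend(names, matrix, query_index=0, k=1)
```

Scaling matters because tempo is measured in BPM while most other features lie between 0 and 1; without it, tempo would dominate the similarity score.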

Finally, in the Relationship_Top10_AllSongs Notebook, we computed each year's average song features for the Top 100 Billboard songs, studied how those averages relate to one another, and identified which features are strongly or weakly correlated.
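The yearly-average comparison can be sketched as follows, assuming each song is a dictionary with a `year` and numeric features, and using the Pearson coefficient as the correlation measure. The sample rows are invented, and the notebook may use a different measure or library.

```python
import math
from collections import defaultdict


def yearly_average(rows, feature):
    # Average one feature per year, returned in year order.
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row[feature])
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}


def pearson(xs, ys):
    # Pearson correlation coefficient of two equally long series.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented sample rows.
rows = [
    {"year": 2018, "energy": 0.6, "loudness": -8.0},
    {"year": 2018, "energy": 0.8, "loudness": -6.0},
    {"year": 2019, "energy": 0.5, "loudness": -9.0},
    {"year": 2019, "energy": 0.7, "loudness": -7.0},
    {"year": 2020, "energy": 0.4, "loudness": -10.0},
]
energy = yearly_average(rows, "energy")
loudness = yearly_average(rows, "loudness")
r = pearson(list(energy.values()), list(loudness.values()))
```

Values of `r` near 1 or -1 indicate a strong linear relationship between two features' yearly averages, and values near 0 a weak one.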

Data

Some of the dataset's features don't need further explanation, such as artists, name, and year. The other numerical and categorical features are briefly explained below:

  • id - It's the track id generated by Spotify itself

  • acousticness - It's the confidence measure of whether the track is acoustic or not and it ranges from 0 to 1.

  • danceability - It describes how suitable a track is for dancing based on several musical elements and ranges from 0 to 1.

  • energy - It represents the perception of intensity and activity that someone has while listening to a song. The features that contribute to this perception include dynamic range, loudness, and onset rate. It ranges from 0 to 1.

  • duration_ms - It's the duration of a track in milliseconds. Spotify reports the value in ms, but converting it to minutes makes it easier to analyze.

  • instrumentalness - It predicts whether a track contains vocals or not. It ranges from 0 (only vocal content) to 1 (no vocal content).

  • valence - It describes the musical positiveness conveyed by a track and it ranges from 0 to 1 as well.

  • popularity - It's based on the total number of plays a track has had so far and how recent those plays are. It ranges from 0 to 100.

  • tempo - It's the estimated pace of a track in beats per minute (BPM).

  • liveness - It detects the presence of an audience during the song's recording and ranges from 0 to 1.

  • loudness - It's the average loudness value across the entire track in decibels (dB). Values typically range between -60 and 0 dB.

  • speechiness - It detects the presence of spoken words in a track and ranges from 0 to 1.

  • key - It represents the key of the track as a pitch class, encoded as a value ranging from 0 to 11, starting with C as 0, C#/Db as 1, D as 2, and so on.
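Two of the features above are easier to read after a small conversion. The helpers below are illustrative (the names `key_name` and `duration_minutes` are ours, not the notebooks'): one turns the numeric key into a note name, the other turns `duration_ms` into minutes.

```python
# Pitch classes in the order used by the key feature (0 = C).
PITCH_CLASSES = [
    "C", "C#/Db", "D", "D#/Eb", "E", "F",
    "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B",
]


def key_name(key):
    # Values outside 0-11 (Spotify uses -1 when no key was detected)
    # are reported as "unknown".
    if 0 <= key < len(PITCH_CLASSES):
        return PITCH_CLASSES[key]
    return "unknown"


def duration_minutes(duration_ms):
    # 60,000 milliseconds per minute.
    return duration_ms / 60000
```

For example, `key_name(2)` gives `"D"`, and a track with a `duration_ms` of 210000 lasts 3.5 minutes.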

Our dataset is too large to be hosted on GitHub. If you'd like to see it, you can email me, and I'll be more than happy to share it with you.

Acknowledgments

All work was done by myself, Ambrose Karella, and Mariachiara Acconcia.

Thanks to the Spotify API and Soundiiz for helping us search for songs and get their features.
