song-popularity-prediction

This is a machine learning project to predict the popularity of a track on Spotify.

About the dataset

The dataset, sourced from Kaggle, contains 170,653 tracks spanning 1921 to 2020, each described by 16 features. Visualizing feature importances shows that 'year' is the most important predictor of a track's popularity. This is not surprising: the Spotify documentation states that popularity is calculated with respect to year.
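A minimal sketch of how feature importances can be ranked, assuming the Kaggle CSV has already been loaded into a numeric feature matrix `X`, a popularity vector `y` and a list of column names. The helper `rank_feature_importances` is hypothetical, and using a scikit-learn `RandomForestRegressor` (and its `feature_importances_` attribute) is an assumption; the project may have plotted importances from a different model. The demo uses synthetic data with made-up `energy` and `noise` columns, not the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_feature_importances(X, y, feature_names, random_state=0):
    """Fit a random forest and return (feature, importance) pairs, most important first.

    Hypothetical helper: X is a 2-D numeric array, y holds popularity scores.
    """
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(X, y)
    # feature_importances_ holds impurity-based importances normalized to sum to 1.
    pairs = zip(feature_names, model.feature_importances_)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Synthetic stand-in for the Kaggle data (assumption): popularity rises with
    # year, depends weakly on a made-up "energy" column, and ignores "noise".
    rng = np.random.default_rng(0)
    n = 1_000
    year = rng.integers(1921, 2021, size=n)
    energy = rng.random(n)
    noise = rng.random(n)
    X = np.column_stack([year, energy, noise])
    y = 0.8 * (year - 1921) + 5 * energy + rng.normal(0, 2, size=n)
    for name, importance in rank_feature_importances(X, y, ["year", "energy", "noise"]):
        print(f"{name:>8}: {importance:.3f}")
```

Impurity-based importances are quick to compute but can favour features with many distinct values, such as `year`; permutation importance is a common cross-check.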

Predicting popularity

The approach to predicting popularity is deliberately simple and is broken into four steps:

  1. Create a simple decision tree model
  2. Optimize the decision tree model with GridSearchCV
  3. Tweak only the max_leaf_nodes parameter of the decision tree model
  4. Create a random forest model
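The four steps can be sketched with scikit-learn as follows. This is a minimal sketch, not the project's code: the train/validation split, the `param_grid` passed to `GridSearchCV`, the candidate `max_leaf_nodes` values and the helper name `compare_models` are all assumptions, and the demo runs on synthetic data rather than the Kaggle dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, random_state=1):
    """Run the four modelling steps and return {model name: validation MAE}."""
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=random_state)
    results = {}

    # 1. Simple decision tree with default hyperparameters.
    tree = DecisionTreeRegressor(random_state=random_state).fit(X_train, y_train)
    results["Simple decision tree"] = mean_absolute_error(y_val, tree.predict(X_val))

    # 2. Decision tree tuned with GridSearchCV (this grid is an assumption).
    param_grid = {"max_depth": [5, 10, 20, None], "min_samples_leaf": [1, 5, 20]}
    search = GridSearchCV(
        DecisionTreeRegressor(random_state=random_state),
        param_grid,
        scoring="neg_mean_absolute_error",
        cv=3,
    ).fit(X_train, y_train)
    # With the default refit=True, search.predict uses the best estimator found.
    results["Decision tree (GridSearchCV)"] = mean_absolute_error(y_val, search.predict(X_val))

    # 3. Vary only max_leaf_nodes (hypothetical candidates) and keep the best score.
    leaf_scores = {}
    for leaves in [50, 100, 500, 1000, 5000]:
        model = DecisionTreeRegressor(max_leaf_nodes=leaves, random_state=random_state)
        model.fit(X_train, y_train)
        leaf_scores[leaves] = mean_absolute_error(y_val, model.predict(X_val))
    results["Decision tree (max_leaf_nodes)"] = min(leaf_scores.values())

    # 4. Random forest with default hyperparameters.
    forest = RandomForestRegressor(random_state=random_state).fit(X_train, y_train)
    results["Random Forest"] = mean_absolute_error(y_val, forest.predict(X_val))
    return results


if __name__ == "__main__":
    # Synthetic stand-in for the Kaggle data (assumption): popularity driven mostly by year.
    rng = np.random.default_rng(0)
    n = 2_000
    year = rng.integers(1921, 2021, size=n)
    energy = rng.random(n)
    X = np.column_stack([year, energy, rng.random(n)])
    y = 0.8 * (year - 1921) + 5 * energy + rng.normal(0, 2, size=n)
    for name, score in compare_models(X, y).items():
        print(f"{name:<32} MAE {score:.3f}")
```

Because step 3 picks `max_leaf_nodes` on the same validation set it reports, its score is slightly optimistic; a separate test set or cross-validation would give a fairer comparison between the four models.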

The random forest model gives the best results, with a mean absolute error (MAE) of 6.750.

| Model | MAE |
| --- | ---: |
| Simple decision tree | 9.196 |
| Decision tree (GridSearchCV) | 6.915 |
| Decision tree (max_leaf_nodes) | 6.829 |
| Random Forest | 6.750 |
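Each score above is a mean absolute error: the average absolute difference between predicted and actual popularity. Since Spotify popularity runs from 0 to 100, an MAE of 6.750 means the random forest's predictions are off by about 6.75 points on average. A minimal pure-Python sketch of the metric (the name `mae` and the three example tracks are illustrative; scikit-learn provides `sklearn.metrics.mean_absolute_error`), followed by the relative improvement implied by the table:

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|, in popularity points."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical popularity scores (0-100 scale) for three tracks.
actual = [50, 30, 70]
predicted = [45, 36, 70]
print(round(mae(actual, predicted), 2))  # errors 5, 6, 0 -> 11 / 3 -> 3.67

# Relative reduction in MAE from the simple tree to the random forest (table values).
reduction = (9.196 - 6.750) / 9.196
print(f"{reduction:.1%}")  # -> 26.6%
```

Most of that gain comes from tuning the single tree: the tuned trees already reach 6.915 and 6.829, so the random forest adds a comparatively small further improvement.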
