Skip to content

Predict probabilities of whether songs will become a hit using Billboard's top-50 Hip Hop/R&B dataset

Notifications You must be signed in to change notification settings

cyrusParvereshi/HipHop_and_RNB_Hit_Song_Prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

61 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HipHop_and_RNB_Hit_Song_Prediction (STA141C_Project)

This project was done by Cyrus Parvereshi, Rohan Malhotra, and Jonathan Fernandez. The goal of our project is to come up with a predictive model that can be used to predict a hit hip-hop/rnb song. We have defined a hit song by a song that has appeared on the top 50 hip-hop/rnb billboard in the last 10 years. Below categorizes our files and provides a brief description of each one.

Getting the Data:

To get the data we looked at every artist that had appeared in the top 50 hip-hop/rnb billboard. We then got a discography of all those artists. We did this as some of their songs would be hit while others would be a non-hit.

Files:

  • artisits.txt: list of all artists from 2009-2019 on top 50 hip-hop/rnb playlist

  • BillboardAPI_EDA: Looked at the bill board data and is where we got artists.txt from using the Billboard API

  • get_artist_features.R: Used this get a df for each artist and all of their songs along with the spotify provided features using spotifyr

  • combiningFunction.R: Used to combine all the dfs into one big df from get_artist_features.R. Final output was final_artist.pkl

  • final_artist.pkl: Df with all songs from all the artists along with features from Spotify

  • R&B10yrDF.pkl: A list of every song that has been on top 50 hip-hop/rnb playlist

  • Final Data Creation.ipynb: Uses "R&B10yrDF.pkl" to identify which songs are hit songs in the final_artist.pkl data set. The df is cleaned our final data set is made which is "merged.pkl" which we were unable to be upload to github.

Machine Learning Algorithms:

  • Logistic Regression all variables.ipynb: Logistic regression using all variables.

  • Logistic Regression Recursive Feature Elimination.ipynb: Model using features found from Recursive Feature Elimination, which was only loudness.

  • Logistic Regression using selectkbest feature selection.ipynb: Used ANOVA F-scores to find the top 3 features which were Valence, Liveness, and Loudness.

  • Logistic Regression using variables from LDA.ipynb: Features that were found to be important from LDA.ipynb which were instrumentalism and energy.

  • LDA.ipynb: Used to determine important features.

  • RandomForrests.ipynb: implementation of Random Forest algorithim

  • SVM classification.ipynb: implementation of SVM classification method

Sound Analysis

  • sound analysis folder: .wav files for hit and non-hit songs
  • Audio data and rf.ipynb: performed random forest algorithm on the audio features obtained from the .wav files

About

Predict probabilities of whether songs will become a hit using Billboard's top-50 Hip Hop/R&B dataset

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published