A Machine Learning project by Yiyi Chen, Avi Dixit, Sayan Sanyal, and Ed Yip
In this project we seek to understand what makes a song popular, and specifically whether one can predict a song’s popularity from its audio features and metadata. While a song’s virality may depend on many social factors, such as the effectiveness of its marketing campaign and the demographics of its listeners, we hypothesize that a song’s inherent characteristics, such as its name and extracted musical features, are also correlated with and indicative of its popularity.
To explore this question, we will first define what popularity means within the context of the dataset. We will then build a baseline model that predicts a song’s popularity from the audio features and metadata available in the dataset. From there, we will expand and refine the model to answer three main questions:
- What is the model’s overall performance on the test dataset? What are the most important features in determining a song’s popularity?
- How much, if at all, does the model’s performance differ across genres? How do feature importances differ across genres?
- What is the model’s external validity? How does it perform on songs not in the test dataset, e.g. songs from the Billboard Top 100?
- (If time allows) Can we design better features to improve the model’s overall performance?
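As a rough illustration of the first two questions, the sketch below compares an overall test error with a per-genre breakdown. It is a minimal sketch under stated assumptions, not the project's actual evaluation code: the choice of RMSE as the metric, the helper names `rmse` and `rmse_by_genre`, the genre labels, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmse_by_genre(y_true, y_pred, genres):
    """Return {genre: RMSE}, computed over the test rows of each genre."""
    grouped = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, genres):
        grouped[g][0].append(t)  # true popularity values for this genre
        grouped[g][1].append(p)  # predicted popularity values for this genre
    return {g: rmse(ts, ps) for g, (ts, ps) in grouped.items()}


# Made-up values for illustration only; these are not project results.
y_true = [10.0, 12.0, 3.0, 5.0]   # hypothetical popularity scores
y_pred = [11.0, 11.0, 3.0, 9.0]   # hypothetical model predictions
genres = ["Rock", "Rock", "Folk", "Folk"]

overall = rmse(y_true, y_pred)
per_genre = rmse_by_genre(y_true, y_pred, genres)
```

A large gap between `overall` and an individual entry of `per_genre` would suggest that the model fits some genres better than others, which is the kind of difference the second question asks about.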
Understanding what factors contribute to a song’s popularity has practical significance in a few areas. It helps us better understand:
- how people evaluate music consciously and subconsciously,
- how people’s preferences vary across genres, and
- any systematic differences in preferences between people who listen to songs on Free Music Archive and the general public.
See INFO 251 - Final Project Proposal_v2-5.pdf for the full project proposal.