# COGS 108 - Project Proposal

## Authors

- Austin Flippo: Research Question, Hypothesis, Ethics
- David Li: Research Question, Hypothesis, Data
- Farzad Kashani: Research Question, Hypothesis, Background and Prior Work
- Roxy Behjat: Research Question, Hypothesis, Background and Prior Work, Team Expectations
- Ryan Namdar: Research Question, Hypothesis, Timeline

## Research Question

For Spotify songs, which audio features (such as danceability, energy, valence, tempo, loudness, acousticness, and speechiness) are most strongly associated with a song’s relative popularity within its genre? Popularity will be measured using Spotify’s track popularity score, compared within each genre and expressed as a percentage, and we will also include control variables such as artist popularity. We will use regression analysis to estimate these associations, which allows us to identify which audio features are most predictive of high relative popularity.
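As a minimal sketch of the within-genre measure described above, the snippet below converts raw popularity scores into a percentage rank inside each genre. It assumes one possible definition (the share of same-genre tracks whose popularity is less than or equal to the track's own); the dictionary keys `name`, `genre`, and `popularity` and the sample values are hypothetical and made up for illustration, not taken from Spotify.

```python
from bisect import bisect_right
from collections import defaultdict


def within_genre_percentile(tracks):
    """Map each track name to its popularity percentile (0-100) within its genre.

    Assumed definition: the percentage of tracks in the same genre whose
    popularity is less than or equal to this track's popularity.
    """
    # Collect and sort the popularity scores of every genre.
    scores_by_genre = defaultdict(list)
    for track in tracks:
        scores_by_genre[track["genre"]].append(track["popularity"])
    for scores in scores_by_genre.values():
        scores.sort()

    percentiles = {}
    for track in tracks:
        scores = scores_by_genre[track["genre"]]
        # Number of same-genre tracks with popularity <= this track's score.
        rank = bisect_right(scores, track["popularity"])
        percentiles[track["name"]] = 100.0 * rank / len(scores)
    return percentiles


# Hypothetical sample rows, made up for illustration only.
sample_tracks = [
    {"name": "a", "genre": "pop", "popularity": 80},
    {"name": "b", "genre": "pop", "popularity": 60},
    {"name": "c", "genre": "pop", "popularity": 90},
    {"name": "d", "genre": "pop", "popularity": 70},
    {"name": "e", "genre": "jazz", "popularity": 40},
    {"name": "f", "genre": "jazz", "popularity": 20},
]
percentiles = within_genre_percentile(sample_tracks)
print(percentiles)
```

With this definition, a jazz track with a raw score of 40 can rank higher within its genre than a pop track with a raw score of 80, which is the kind of comparison the research question is about.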


## Background and Prior Work

Our group began with the premise that what makes a hit song has become increasingly quantifiable in the era of big data and streaming. Research into what makes a song popular, or at least what increases its likelihood of becoming a hit, has expanded rapidly in recent years. With the proliferation of music datasets and streaming platforms such as Spotify and Apple Music, researchers have sought to determine whether measurable audio features and musical characteristics are associated with commercial success, listener engagement, and critical reception. We focus on Spotify because it provides both a standardized set of audio features (such as danceability, energy, and valence) and a track-level popularity score on a 0-100 scale, calculated algorithmically and primarily driven by total plays and their recency.

One research project by Araujo et al. <sup><a href="#ref1">1</a></sup> used Spotify charts and audio features to model song popularity and predict whether a track would appear in Spotify’s Top 50 Global rankings in the future. They framed the task as a classification problem. They found that machine-learning models achieved strong predictive performance, with AUC (Area Under the Curve) values exceeding 0.80 when forecasting popularity up to two months in advance. Interestingly, they found that adding acoustic features to chart-based predictions yielded only minimal performance gains, suggesting that audio characteristics alone may have limited predictive power. This finding highlights a significant limitation and motivates our focus on identifying which specific Spotify audio features meaningfully contribute to popularity prediction rather than assuming all features are equally informative.

Another related study by Ceulemans and Detry <sup><a href="#ref2">2</a></sup> investigated whether musical characteristics influence commercial success and critics’ rankings, using a dataset of 514 songs from 2009 that appeared on multiple year-end charts. The authors constructed variables for tempo, duration, and other musical attributes. They then used regression models to assess their association with success metrics such as Billboard rank, chart longevity, and critics’ lists. Their results suggested that while some features were associated with chart survival and critics’ preferences, other attributes had limited or context-dependent effects on commercial success. This finding is particularly relevant to our project, as it suggests that not all musical features contribute equally to popularity outcomes, and that feature importance may vary with the success metric used.

Georgieva et al. <sup><a href="#ref3">3</a></sup> similarly explore the predictive power of audio features for song popularity. They also framed hit prediction as a classification problem, showing that audio features extracted from songs, such as rhythm and harmony, can help predict whether songs are hits or non-hits in historical chart data. Their work further highlights the difficulty of consistently defining “success” across time periods and datasets, underscoring the importance of carefully selecting and interpreting popularity metrics.

Together, these studies suggest that while audio features do contain meaningful information about popularity, their predictive strength is uneven and highly dependent on modeling choices and outcome definitions. Building on this prior work, our project aims to quantify the extent to which Spotify audio features predict popularity and identify which features are the strongest predictors.


1. <a name="ref1"></a>Araujo, Carlos, et al. Predicting Music Popularity on Streaming Platforms, <a href="https://www.researchgate.net/publication/341420234_Predicting_Music_Popularity_on_Streaming_Platforms">www.researchgate.net/publication/341420234_Predicting_Music_Popularity_on_Streaming_Platforms</a>.
2. <a name="ref2"></a>Ceulemans, Cedric, and Lionel Detry. Does Music Matter in “Pop” Music? The Impact of Musical ..., <a href="https://www.cedricceulemans.net/uploads/2/0/4/2/20423775/does_music_matter_in_%E2%80%9Cpop%E2%80%9D_music.pdf">www.cedricceulemans.net/uploads/2/0/4/2/20423775/does_music_matter_in_%E2%80%9Cpop%E2%80%9D_music.pdf</a>.
3. <a name="ref3"></a>Georgieva, Elena, et al. HIT PREDICT: Predicting Hit Songs Using Spotify Data, <a href="https://ccrma.stanford.edu/~egeorgie/documents/HitPredict_Final.pdf">ccrma.stanford.edu/~egeorgie/documents/HitPredict_Final.pdf</a>.


## Hypothesis


We hypothesize that songs with higher danceability, energy, and loudness will have higher relative within-genre popularity percentages, and that songs with higher instrumentalness and acousticness will tend to have lower relative within-genre popularity. We expect these patterns because high-energy songs are often promoted in mainstream and playlist-driven listening environments on Spotify, whereas more acoustic or instrumental tracks may appeal to narrower, more niche audiences.
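A first, informal check of the hypothesized directions could look like the sketch below. It uses a plain Pearson correlation as a simpler stand-in for the planned regression, so it does not control for artist popularity or any other variable. All values are made up for illustration, and the names `expected_sign` and `relative_popularity` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, not real Spotify data.
features = {
    "danceability": [0.2, 0.4, 0.6, 0.8],
    "acousticness": [0.9, 0.7, 0.4, 0.1],
}
relative_popularity = [25.0, 50.0, 75.0, 100.0]

# Hypothesized direction of association for each feature (+1 or -1).
expected_sign = {"danceability": +1, "acousticness": -1}

for name, values in features.items():
    r = pearson_r(values, relative_popularity)
    # The sign of r is compared with the hypothesized direction.
    consistent = (r > 0) == (expected_sign[name] > 0)
    print(f"{name}: r = {r:.2f}, consistent with hypothesis: {consistent}")
```

A correlation with the expected sign would only be suggestive; the regression with control variables is what the hypothesis is ultimately evaluated against.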

## Data

The ideal dataset should have variables containing features of the song that we want to analyze, such as danceability and energy, as well as a popularity score for the song. The dataset should also contain the genre so it can be properly filtered. We should aim for 10,000 observations, as this provides enough data to reliably calculate statistics for several different genres without being skewed by outliers. The data should be collected directly from Spotify and should be stored in a tidy data format such as a CSV that can be easily read in Python. 

Potential sources:

- Spotify Web API (https://developer.spotify.com/documentation/web-api): The data is on Spotify’s servers, and a registered application is required to access it. The important variables here are popularity and audio features such as danceability, energy, and valence. This source also has a release date variable that allows us to filter by year, which may be helpful.
- Kaggle Spotify tracks dataset (https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset/data): The data is located at this Kaggle URL and can be downloaded immediately as a CSV file. No special permissions are required. The important variables are popularity and audio features such as danceability, energy, and valence.
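Once a CSV is downloaded, a small loading step can confirm that the variables we need are present before any analysis. The sketch below is a minimal example using Python's standard `csv` module on an inline sample. The column names (`track_genre`, `popularity`, `danceability`, `energy`, `valence`) are assumptions about the file's header and should be checked against the actual download; the sample rows are made up.

```python
import csv
import io

# Assumed column names; verify against the real CSV header.
REQUIRED_COLUMNS = {"track_genre", "popularity", "danceability", "energy", "valence"}
NUMERIC_COLUMNS = REQUIRED_COLUMNS - {"track_genre"}


def load_tracks(file_obj):
    """Read track rows, keeping only rows whose numeric fields all parse."""
    reader = csv.DictReader(file_obj)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for row in reader:
        try:
            parsed = {"track_genre": row["track_genre"]}
            for column in NUMERIC_COLUMNS:
                parsed[column] = float(row[column])
        except (TypeError, ValueError):
            continue  # skip rows with empty or non-numeric values
        rows.append(parsed)
    return rows


# Made-up sample standing in for the downloaded file.
sample_csv = io.StringIO(
    "track_genre,popularity,danceability,energy,valence\n"
    "pop,80,0.7,0.8,0.6\n"
    "jazz,,0.4,0.3,0.5\n"
    "rock,65,0.5,0.9,0.4\n"
)
tracks = load_tracks(sample_csv)
print(len(tracks), "usable rows")
```

For the real file, the same function could be called on `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved. In the project itself a data-frame library would likely replace this manual parsing.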


## Ethics 

Instructions: Keep the contents of this cell. For each item on the checklist
-  put an X there if you've considered the item
-  IF THE ITEM IS RELEVANT place a short paragraph after the checklist item discussing the issue.
  
Items on this checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Your teams will document these discussions and decisions for posterity using this section.  You don't have to solve these problems, you just have to acknowledge any potential harm no matter how unlikely.

Here is a [list of real world examples](https://deon.drivendata.org/examples/) for each item in the checklist that you can refer to.

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

### A. Data Collection
 - X **A.1 Informed consent**  
We do not collect data from human subjects. The data we collect comes from publicly available Spotify track-level metadata and audio features, not from individual users or surveys.
 - X **A.2 Collection bias**  
Our dataset only reflects Spotify’s platform, and Spotify’s recommendation algorithms and promotional systems influence which songs are visible to people who use the service. This could bias our sample, and it means that our findings describe Spotify popularity rather than universal music popularity.
 - X **A.3 Limit PII exposure**  
We do not collect or use personally identifiable information. Instead, we analyze track-level attributes, which minimizes privacy risks for Spotify user data.
 - X **A.4 Downstream bias mitigation**  
Spotify does not provide protected demographic attributes like race or gender, so we cannot test for downstream bias across these groups. We acknowledge this limitation and avoid making demographic claims.
### B. Data Storage
 - X **B.1 Data security**
 - X **B.2 Right to be forgotten**
 - X **B.3 Data retention plan**
### C. Analysis
 - X **C.1 Missing perspectives**  
Our analysis may miss perspectives from artists, smaller creators, or listeners outside Spotify, which is an important limitation to acknowledge. We treat our findings as platform-specific and exploratory, and we do not make claims about what music should be created.
 - X **C.2 Dataset bias**  
Popularity may reflect marketing, current events, user playlist placement, or social trends rather than musical quality. We avoid interpreting popularity as artistic value and frame our results as platform-specific insights and associations.
 - X **C.3 Honest representation**  
In our insights and analysis, we will avoid overstating any correlations or patterns we observe. We will not imply causation, and we will present results as associations.
 - X **C.4 Privacy in analysis**
 - X **C.5 Auditability**
### D. Modeling
 - X **D.1 Proxy discrimination**  
Our models use musical features rather than demographic data, but genre or language could be strongly related to cultural groups and act as indirect proxies for them. We will therefore avoid making demographic claims in our conclusions.
 - X **D.2 Fairness across groups**
 - X **D.3 Metric selection**  
Spotify popularity reflects exposure and trends just as much as it reflects the quality of music streamed. We clearly communicate that it is not a measure of artistic value.
 - X **D.4 Explainability**
 - X **D.5 Communicate limitations**  
We clearly state that our findings are specific to the Spotify platform and should be interpreted as associations.
### E. Deployment
 - X **E.1 Monitoring and evaluation**
 - X **E.2 Redress**
 - X **E.3 Roll back**
 - X **E.4 Unintended use**  
Models like ours could be used to encourage creating music that conforms to the traits we find to be associated with popularity, which could homogenize art. We emphasize that our project explores patterns within the Spotify app, rather than prescribing how music should be made.

## Team Expectations 

Team Members: Roxana Behjat, David Li, Austin Flippo, Ryan Namdar, Farzad Kashani

* We agree that all team members will respond in our iMessage group chat in a timely manner and will give honest, prompt updates in case of emergencies. We agree to meet at least once a week, aiming for twice, to discuss our project and responsibilities and to work together.

* We aim for consensus-style decision making: if we come to an impasse, we will talk with each other, try to find the root of the issue, and move forward as a group.

* We agree to divide tasks evenly and to work through GitHub. Although certain people might work on more specific types of tasks, everyone will carry an equal share of the work. We also agree to split work evenly across areas, so no one will only be doing coding, writing, etc.

* We also all agree to be honest with each other, especially in times of conflict or miscommunication.

## Project Timeline Proposal




| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 1/28  |  4 PM | Review COGS 108 proposal requirements and rubric; brainstorm research question & measurable variables  | Decide on an iMessage group chat as the group's main form of communication; finalize research question direction and project idea; assign proposal sections |
| 2/4  |  2 PM |  Research question, hypothesis and background/prior work, initial Ethics checklist, candidate dataset list and variables | Group edit and finalize proposal for clarity and full scope; determine dataset plan and analysis approach; finalize ethics writeup and submit proposal |
| 2/18  | 3 PM  | Acquire dataset(s); create data dictionary with features and target; complete initial data cleaning with plan and code  | Discuss and review wrangling decisions (missing values/outliers/duplicates); confirm feature set and target definition; plan EDA figures; finalize and submit Data Checkpoint |
| 3/4  | 6 PM  | Produce EDA outputs (distributions, correlations, popularity vs key features, any transforms); save key plots to results folder | Review and interpret EDA findings; discuss and refine analysis plan; choose our modeling approach and evaluation metrics; finalize and submit EDA checkpoint |
| 3/11  | 2 PM  | Implement baseline models and sketch the first "complete" model(s); train/test split + cross-validation; initial feature importance and coefficients; draft methods outline for final report | Compare models and metrics; run error and diagnostic analysis; decide next iterations of features and tuning; outline final project sections |
| 3/17  | 12 PM  | Finalize analysis and checks; finalize tables/figures; draft reflections on results, discussion, limitations and ethics; draft the Final Project submission end-to-end | Full-project review pass for clarity, visuals, claims vs evidence; reproducibility check to minimize any missed errors; finalize submission checklist |
| 3/18  | Before 11:59 PM  | Final proofread; ensure all notebooks and modules run clean with no hidden errors; push final versions to GitHub; complete any surveys | Turn in Final Project & Group Project Surveys |