# COGS 108 - Project Proposal

## Authors

This is a modified [CRediT taxonomy of contributions](https://credit.niso.org). For each group member please list how they contributed to this project using these terms:
> Analysis, Background research, Conceptualization, Data curation, Experimental investigation, Methodology, Project administration, Software, Visualization, Writing – original draft, Writing – review & editing

- Alexis Garcia: Analysis
- Emily Vega: Project administration
- James Bartelloni: Background research, Methodology
- Mohan Dong: Background research, Writing
- Nalin Joshi: Project administration

## Research Question

How did the popularity of music genres change from the pre-COVID period (2017–2019) to the post-COVID period (2022–2024), and are changes in genre popularity associated with measurable audio characteristics of tracks (tempo, energy, valence, danceability)?


## Background and Prior Work

Music has long served as a window into cultural links and emotional states. People’s choices in music reflect not only individual taste but also broader societal conditions, such as social environment and collective experience. The outbreak of the COVID-19 pandemic in early 2020 brought unprecedented disruption to everyday life worldwide, prompting researchers to examine how such a global shock altered human behaviour — including music listening and creation — and whether these changes persisted beyond lockdowns into the post-COVID era. Studies show that during acute pandemic periods, listeners turned to music not just for entertainment, but as an emotional regulation strategy, with distinct preferences emerging for nostalgic and emotionally positive songs as a means of coping with stress and isolation.<a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1)

Early empirical work on music consumption during the pandemic found that lockdowns significantly changed listening behaviour. For example, analysis of Last.fm users’ streaming records revealed reduced variety and novelty in music consumption during the initial shock of the pandemic, coupled with a greater preference for mainstream artists and familiar tracks, suggesting that uncertainty led listeners to gravitate toward familiar, comforting music.<a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2) Another investigation in the UK documented a surge in listening to older ‘nostalgic’ music during lockdown, particularly positive-toned songs, indicating that both nostalgia and emotional valence influenced choices during COVID-19.<a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1)

Beyond behavioural changes, researchers have also analysed the structure of musical content itself over time. Some studies have explored how measurable audio features — such as tempo, energy, danceability, and valence — relate to listening contexts or activities. One such study found that Spotify’s audio features like valence and energy are associated with listeners’ motivational contexts (e.g., dance, relaxation), suggesting that higher energy and valence are linked to more active listening situations.<a name="cite_ref-3"></a>[<sup>3</sup>](#cite_note-3) Another data-driven project using Spotify’s API showed that tempo, energy, and valence are among the features most strongly correlated with track popularity across genres, forming a basis for linking audio characteristics with success metrics.<a name="cite_ref-4"></a>[<sup>4</sup>](#cite_note-4)

Recent longitudinal work has explicitly contextualized music trends across COVID and post-COVID periods. A comprehensive analysis of the top 1,000 Spotify songs from 2011 to 2023 reported significant shifts in dominant musical features over time, with pre-pandemic years showing rising trends in energy and loudness, and post-pandemic years characterized by increases in acoustic and introspective qualities.<a name="cite_ref-5"></a>[<sup>5</sup>](#cite_note-5) Though focusing broadly on feature evolution, such work underscores that major global events can coincide with measurable changes in the sonic qualities of popular music, supporting the idea that cultural and technological factors shape music trends.

Other research frames these shifts within broader emotional and societal trends. For instance, sentiment analysis of popular music lyrics across the pre-, during-, and post-COVID periods found thematic changes reflecting anxiety and introspection during the pandemic, suggesting that external stressors are mirrored not only in listening behaviour but also in the creative output of music itself.<a name="cite_ref-6"></a>[<sup>6</sup>](#cite_note-6) Collectively, this prior work suggests that music both reflects and shapes human experience during major societal shocks, and that measurable audio features can serve as quantifiable indicators of broader shifts in cultural mood and preference. Our project, which investigates how genre popularity and audio characteristics changed from the pre-COVID to the post-COVID period, builds directly on these streams of research by adding a structured comparison of popularity and audio features over clearly defined time periods.

<a name="cite_note-1"></a> [^](#cite_ref-1) Yeung, T. Y. C. (2023). Revival of positive nostalgic music during the first Covid-19 lockdown. *Humanities and Social Sciences Communications*. https://www.nature.com/articles/s41599-023-01614-0

<a name="cite_note-2"></a> [^](#cite_ref-2) Ghaffari, M. et al. (2024). The impact of COVID-19 on online music listening behaviours. *Multimedia Tools and Applications*. https://doi.org/10.1007/s11042-023-16079-1

<a name="cite_note-3"></a> [^](#cite_ref-3) Duman, D. (2022). Music we move to: Spotify audio features and reasons. *PLOS ONE*. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0275228

<a name="cite_note-4"></a> [^](#cite_ref-4) Singh, A. (2025). A Data-Driven Approach Using Spotify API. *Research-Archive.org*. https://research-archive.org/index.php/rars/preprint/view/2615/3674

<a name="cite_note-5"></a> [^](#cite_ref-5) Tan, E. E. L., & Ko, A. M. S. (2026). Dynamics of Popular Music Over a Decade. *JSR*. https://www.jsr.org/hs/index.php/path/article/view/8744

<a name="cite_note-6"></a> [^](#cite_ref-6) Hemmati, H. & Frohock, B. (2025). Evolving Human Emotions Under a Global Crisis. *WUSS Proceedings*. https://www.wuss.org/proceedings/2025/WUSS-2025-Paper-174.pdf


## Hypothesis


In the post-COVID era, music genres with higher energy, faster tempo, and more positive emotional valence increased in popularity compared to lower-energy or more melancholic genres, as listeners gravitated toward more upbeat and emotionally uplifting music.
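One way this hypothesis could be checked is sketched below: summarize each genre by an audio feature and by its change in popularity between the two periods, then measure how strongly the two move together. This is only a minimal illustration, not our final method. The names `GENRES` and `pearson_r` are hypothetical, and every number is a made-up placeholder rather than real data.

```python
from math import sqrt

# Hypothetical genre-level summaries; every number here is a made-up placeholder.
# popularity_change = mean post-COVID popularity minus mean pre-COVID popularity.
GENRES = {
    "pop": {"energy": 0.8, "popularity_change": 6.0},
    "hip hop": {"energy": 0.7, "popularity_change": 3.0},
    "rock": {"energy": 0.6, "popularity_change": 0.0},
    "folk": {"energy": 0.4, "popularity_change": -3.0},
}


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / sqrt(spread_x * spread_y)


energy = [summary["energy"] for summary in GENRES.values()]
change = [summary["popularity_change"] for summary in GENRES.values()]
r = pearson_r(energy, change)
print(f"Correlation between genre energy and popularity change: r = {r:.2f}")
```

A positive `r` on real data would be consistent with the hypothesis, but it would not by itself show that audio characteristics caused the change in popularity. The same check could be repeated for tempo, valence, and danceability.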

## Data


### Ideal Dataset

1. **Ideal variables for the dataset**
   - Time variables: year and period (pre-COVID vs. post-COVID).
   - Popularity variables: number of streams per track, a normalized popularity score, and a monthly or yearly ranking.
   - Genre variables: artist genre classification, with primary genre and/or subgenre.
   - Audio characteristics: tempo, energy, valence, danceability, and loudness.
   - Identification: a song ID, an artist ID, and the platforms on which the track is available.

2. **Number of observations needed**

   An ideal range would be 40,000 to 100,000 tracks. This ensures an ample number of tracks per year across different genres and reduces bias from songs that went viral for only a short period. In theory, a larger number of tracks yields more stable averages for each genre and its audio features.

3. **Who/what/how the data would be collected**
   - Who: large music platforms (Spotify, Apple Music, etc.).
   - What: music that was popular and/or charted between 2017–2019 and 2020–2024.
   - How: API data collection, publicly listed charts, or pre-existing datasets (e.g., Kaggle).

4. **How the data would be stored/organized**

   The data would be stored in tidy format, so that each column represents a variable and each row represents a track in a given year. Example columns:

   `track_index, track_name, artist, genre, year, period, popularity, tempo, energy, valence, danceability, loudness`

   This structure makes it easy to compare the pre- and post-COVID periods, and it also allows for regression and correlation analysis.
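To make the tidy layout concrete, here is a small sketch of how rows in that shape could be labeled by period and averaged per genre. It assumes the period boundaries from our research question (2017–2019 as pre-COVID, 2022–2024 as post-COVID) and treats other years as outside the comparison. `ROWS`, `label_period`, and `mean_by_genre_period` are hypothetical names, and the values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows in the tidy layout (one row per track per year); values are made up.
ROWS = [
    {"track_name": "Song A", "genre": "pop", "year": 2018, "popularity": 70, "energy": 0.8},
    {"track_name": "Song B", "genre": "pop", "year": 2019, "popularity": 60, "energy": 0.6},
    {"track_name": "Song C", "genre": "pop", "year": 2023, "popularity": 80, "energy": 0.9},
    {"track_name": "Song D", "genre": "rock", "year": 2017, "popularity": 50, "energy": 0.7},
    {"track_name": "Song E", "genre": "rock", "year": 2021, "popularity": 55, "energy": 0.7},
    {"track_name": "Song F", "genre": "rock", "year": 2024, "popularity": 40, "energy": 0.5},
]


def label_period(year):
    """Map a year to the comparison period assumed from the research question."""
    if 2017 <= year <= 2019:
        return "pre"
    if 2022 <= year <= 2024:
        return "post"
    return None  # years such as 2020-2021 fall outside both comparison windows


def mean_by_genre_period(rows, column):
    """Average one numeric column for every (genre, period) pair."""
    groups = defaultdict(list)
    for row in rows:
        period = label_period(row["year"])
        if period is not None:
            groups[(row["genre"], period)].append(row[column])
    return {key: mean(values) for key, values in groups.items()}


popularity_means = mean_by_genre_period(ROWS, "popularity")
for (genre, period), value in sorted(popularity_means.items()):
    print(genre, period, value)
```

Because every row carries its own `genre` and `year`, the same function works for any numeric column, such as `tempo` or `valence`, without reshaping the data.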
    
### Potential Datasets

A single dataset of this kind probably does not exist, so we would combine the following sources.

**Spotify Web API and Kaggle Spotify dataset**

- Access: free developer access and API authentication are required.
- Variables: tempo, energy, valence, danceability, track popularity score, artist genre, and release dates.
- Why it supports our project: provides quality audio characteristics.
- Limitations: only includes music available on Spotify and has no chart rankings. The Kaggle version of the Spotify dataset does include time variables, but they do not look completely consistent with other data.

**Billboard charts data (billboard.com/charts)**

- Access: publicly accessible.
- Variables: weekly chart rankings, chart dates, and track and artist names.
- Why it supports our project: provides popularity measures beyond the Spotify API.
- Limitations: no audio characteristic features.

Since no "perfect" dataset exists, our best option is to combine the Spotify API, the Kaggle Spotify dataset, and Billboard chart data, which together approximate the ideal dataset for our project.
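Combining these sources means matching chart entries to audio features even when titles are spelled slightly differently. The sketch below shows one simple approach, assuming we match on a normalized track name and artist name. The field names, the sample rows, and the helpers `normalize` and `join_charts_with_features` are all hypothetical, and real data may need fuzzier matching or shared track IDs.

```python
def normalize(text):
    """Lowercase, drop punctuation, and collapse spaces so names from two sources can match."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def join_charts_with_features(chart_rows, feature_rows):
    """Attach audio features to chart rows that share a normalized (track, artist) key."""
    features = {
        (normalize(row["track_name"]), normalize(row["artist"])): row
        for row in feature_rows
    }
    matched, unmatched = [], []
    for row in chart_rows:
        key = (normalize(row["track_name"]), normalize(row["artist"]))
        if key in features:
            # Chart fields overwrite feature fields that share a name (e.g. track_name).
            matched.append({**features[key], **row})
        else:
            unmatched.append(row)
    return matched, unmatched


# Made-up example rows, for illustration only.
CHART_ROWS = [
    {"track_name": "Song A!", "artist": "The Band", "chart_date": "2023-01-07", "rank": 3},
    {"track_name": "Unknown Tune", "artist": "Nobody", "chart_date": "2023-01-07", "rank": 40},
]
FEATURE_ROWS = [
    {"track_name": "song a", "artist": "the  band", "tempo": 120.0, "energy": 0.8},
]

matched, unmatched = join_charts_with_features(CHART_ROWS, FEATURE_ROWS)
print(len(matched), "matched;", len(unmatched), "unmatched")
```

Keeping the unmatched rows in a separate list makes it possible to report how many charted tracks are lost in the merge, which matters for judging collection bias.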
  

## Ethics 

Instructions: Keep the contents of this cell. For each item on the checklist
-  put an X there if you've considered the item
-  IF THE ITEM IS RELEVANT place a short paragraph after the checklist item discussing the issue.
  
Items on this checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Your teams will document these discussions and decisions for posterity using this section.  You don't have to solve these problems, you just have to acknowledge any potential harm no matter how unlikely.

Here is a [list of real world examples](https://deon.drivendata.org/examples/) for each item in the checklist that you can refer to.

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

### A. Data Collection
 - [X] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?

 - [X] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 - [X] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
 - [X] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

### B. Data Storage
 - [X] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?
 - [X] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?
 - [X] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?

### C. Analysis
 - [X] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?
 - [X] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?
 - [X] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?
 - [X] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?
 - [X] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?

### D. Modeling
 - [X] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?
 - [X] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?
 - [X] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?
 - [X] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?
 - [X] **D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

### E. Deployment
 - [X] **E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?
 - [X] **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?
 - [X] **E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary?
 - [X] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?


## Team Expectations 

Team Expectation 1: Communication and Meetings

The team will primarily use Discord as its main platform for communication. Discord has
already proven effective, as all members naturally check it regularly and respond at least once
per day, with increased activity as deadlines or checkpoints approach. This platform will be used
for general updates, task coordination, and quick clarifications. In cases where a team member
is non-responsive on Discord and the matter is time-sensitive, email will serve as an acceptable
backup communication channel.

The team will meet at least once per week, with the option to increase to twice per week
during periods of heavier workload or approaching milestones. Meeting frequency will be
adjusted collaboratively based on project needs. Meetings may be held in person or virtually,
depending on availability and practicality, but consistency and attendance are expected. Clear
communication and predictable check-ins are considered essential to maintaining project
momentum and accountability.


Team Expectation 2: Tone and Interpersonal Conduct

At the first in-person meeting, the team will explicitly agree on communication tone and
interpersonal norms. The agreed-upon standard will be respectful, blunt, and polite,
recognizing that direct feedback is important for project success while maintaining
professionalism. Personal insults, dismissive language, or hostile behavior will not be tolerated
under any circumstances.

Communication is expected to remain professional, friendly, and collegial, allowing for a
natural and conversational style without crossing boundaries. Team members are encouraged
to express disagreement clearly and constructively, for example by explaining concerns with
reasoning and inviting discussion. A core expectation is that all concerns—technical, logistical,
or interpersonal—should be voiced openly without fear of judgment or shame. Silence due to
discomfort is discouraged, and psychological safety is considered a shared responsibility. The
team values clarity, honesty, and mutual respect in all interactions.

Team Expectation 3: Decision-Making Process

Most major project decisions will aim for unanimous agreement, with the team making a
good-faith effort to hear and address all perspectives. If disagreement arises, the group will
discuss concerns openly and attempt to resolve them through clarification, compromise, or
alternative approaches. Persistent disagreement will prompt the team to consider adjusted
solutions rather than forcing consensus prematurely.

For sub-tasks and specialized components, decision-making authority will be delegated to
the individual or subgroup responsible for that task. These task owners are trusted to make
informed decisions independently, while retaining the option to request broader team input when
appropriate. In situations requiring urgent decisions and where a team member is
non-responsive, the responsibility will fall to those who are available and actively participating at
that time. Decisions made under time constraints will prioritize progress while remaining aligned
with the project’s overall goals.

Team Expectation 4: Task Allocation and Role Structure

Formal role specialization has not yet been finalized and will be discussed during the team’s first
in-person meeting. Until then, leadership has emerged organically, with Mohan taking a
coordinating role and all members contributing actively and equitably. The team values flexibility
and collaboration over rigid role boundaries.

Tasks will primarily be assigned based on individual preference and interest, with the
understanding that no member will be overly selective or resistant to necessary work. If a team
member feels overwhelmed or encounters difficulty with a task, they are encouraged to request
assistance without hesitation. Other members are expected to step in when possible. Progress
and task status will be tracked using GitHub, ensuring transparency and shared visibility into
ongoing work. The overall goal is balanced contribution, adaptability, and collective ownership of
project success.

Team Expectation 5: Support and Handling Struggles

If a team member begins struggling with a task, decision, or deliverable, they are expected to
communicate this early and openly, preferably through Discord. If Discord communication fails
or is not acknowledged, email may be used as a secondary option. The team recognizes that
members bring different strengths, particularly in data science and computer science, and will
leverage prior experience to provide targeted support when needed.

When assistance is required, effort will be reallocated collaboratively, ensuring that no single
person consistently bears additional workload. A rotating support approach will be used so
responsibility is shared fairly over time. If someone steps in to help during one instance, others
will be encouraged to step up in future situations. The priority is to maintain progress while
supporting teammates in a way that is sustainable, transparent, and respectful of everyone’s
time and capacity.

Team Expectation 6: Planning, Workflow, and Deadlines

The team will follow an agile-style workflow, structured around two- to three-week sprints.
Sprint length and scope will be adjusted depending on task complexity, dependencies, and
project priorities. Goals, tasks, and deadlines will be reviewed and updated regularly to reflect
evolving project needs.

While the team values flexibility, deadlines will be treated seriously. The guiding principle is a
fluid but firm approach: plans may adapt, but commitments are respected. The team is
considering using Jira to manage sprint planning, task assignment, and progress tracking,
though final tooling decisions may evolve. Clear documentation of tasks, timelines, and changes
is expected so that all members remain aligned and informed throughout the project lifecycle.

Team Expectation 7: Documentation and Visibility of Expectations

All team expectations, rules, and policies will be documented in a dedicated #rules channel on
the team’s Discord server. This channel will serve as a centralized and permanent reference
point for communication norms, decision-making processes, workflow expectations, and support
policies.

All relevant messages outlining team expectations will be pinned to ensure they remain easily
accessible to every member at all times. This approach promotes transparency, accountability,
and shared understanding, especially as the project evolves. By keeping expectations visible
and documented, the team aims to reduce ambiguity, prevent miscommunication, and reinforce
collective responsibility. Updates or revisions to these expectations will be discussed openly and
reflected in the pinned documentation as needed.

## Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 1/28  |  9 PM | Reviewed the Google Form and looked briefly at all the projects to see which one stood out most to us. Determined what topics we could focus on for the group project | Got together to share topic ideas and find one we all liked. Discussed the Google Form and gave ourselves time to find a past project related to what we have in mind |
| 1/30  |  6 PM | Went over what the project review was asking of us | Worked on the project review and read the two selected projects that stood out most and gave us ideas. Came back together to discuss what we found in those projects and how they inspired more ideas for our project, leading us in a good direction |
| 2/4   |  2 PM | Considered which project topic would be best to pick and looked over the project proposal | Chose the topic to work on (music during the COVID pandemic), wrote out hypotheses until we settled on one we all liked, then split up the work and started on our individual parts while checking in with other groupmates |
| 2/10 | Online | Started looking into possible datasets we could use for our project and what information they include (genres, popularity, audio features). | Shared dataset ideas and discussed which ones would work best for answering our research question. Talked about any limitations or concerns with the data we found. |
| 2/14 | Online | Began organizing and cleaning the dataset by checking for missing values and making sure the genre and time period data were usable. | Compared what each person found while cleaning the data and discussed initial patterns we noticed. Planned what kind of analysis we wanted to focus on moving forward. |
| 2/23 | Online | Continued working with the data and explored trends in genre popularity and audio features across pre- and post-COVID time periods. | Came together to discuss early trends from the exploratory data analysis and made adjustments to our analysis plan based on what we observed. |
| 3/5 | Online | Ran the main analysis to compare genre popularity and audio features between the pre- and post-COVID periods. | Discussed results and whether they supported our hypothesis. Talked through how to present our findings clearly in the final project. |
| 3/13 | Online | Drafted the results, discussion, and conclusion sections of the project. | Reviewed the full project together, suggested edits, and made final revisions before submission. |
| 3/18 | Online | Take one final look individually to make sure everything is good to go. Add any last-minute changes to our branches. | Look over the project together, determine whether everything is good to go, discuss what's left on our branches, decide whether to merge it into main, then take one more glance and submit |
