## Central Idea

### Research Question
**How are musical themes correlated with artist collaborations, and can we identify common thematic elements or trends emerging specifically from collaborative music pieces?**

### Why is this interesting?
Collaborative songs often incorporate different musical styles, genres, or lyrical themes stemming from the habits or niches occupied by the individual contributors. By representing these collaborations as a network, we can uncover patterns and trends that may not be evident when examining songs on a case-by-case basis.

Collaboration frequently results in creative innovation stemming from the interaction of diverse groups, enabling artists to experiment beyond their typical stylistic boundaries. It can facilitate the exchange of ideas, lead to genre-crossing hits, and introduce audiences to new combinations of sounds and lyrical styles. Identifying how themes shift or persist in collaborative contexts may also provide valuable insights into cultural influences, artistic interactions, and audience reception of musical innovation.

Additionally, examining thematic patterns specifically arising from artist collaborations can help illuminate broader trends within the music industry, such as increasing genre fluidity, the impact of artist networks on creativity, and how collaborations may influence chart success or artist visibility.

In short, this project aims to bridge the gap between lyrical analysis and network analysis, highlighting how artistic collaborations contribute to evolving musical expression and innovation.


### What is your dataset?

The dataset is a combination of data collected from two main sources: the **Spotify Web API** and the **Genius API**.

- From the **Spotify API**, we retrieve metadata about songs and artists. This includes information such as:
  - `Song title`
  - `Primary and featured artists`
  - `Release date`
  - `Genres (inferred from artist)`
  - `Popularity metrics (e.g., Spotify popularity score)`
  - `Collaboration indicators (i.e., songs with multiple listed artists)`

- From the **Genius API**, we collect the **lyrics** for each of the songs obtained from Spotify. This enables us to perform textual analysis and extract thematic elements from the song lyrics.

The final dataset is constructed by merging Spotify metadata with corresponding Genius lyrics, resulting in a structured dataset where each row represents a song. Key variables include:
- `title`
- `artists`
- `release_date`
- `lyrics`
- `genres`
- `is_collaboration` (Boolean flag)

This combined dataset allows us to analyze both the **network of artist collaborations** and the **thematic content of lyrics**, forming the foundation for both our network and text analyses.
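The merge described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function names, the record layout of the two inputs, and the choice to join on a normalised (title, primary artist) pair are all assumptions, and the sample records are invented.

```python
def normalise_key(title, primary_artist):
    """Build a case- and whitespace-insensitive join key (assumed matching rule)."""
    return (title.strip().lower(), primary_artist.strip().lower())


def merge_records(spotify_tracks, genius_lyrics):
    """Attach lyrics to Spotify metadata, producing one row per song."""
    lyrics_by_key = {
        normalise_key(row["title"], row["artist"]): row["lyrics"]
        for row in genius_lyrics
    }
    merged = []
    for track in spotify_tracks:
        key = normalise_key(track["title"], track["artists"][0])
        lyrics = lyrics_by_key.get(key)
        if lyrics is None:
            continue  # skip songs for which no lyrics were found
        merged.append({
            "title": track["title"],
            "artists": track["artists"],
            "release_date": track["release_date"],
            "lyrics": lyrics,
            "genres": track["genres"],
            # A song counts as a collaboration when several artists are listed.
            "is_collaboration": len(track["artists"]) > 1,
        })
    return merged


# Invented sample records, for illustration only.
spotify_tracks = [
    {"title": "Song A", "artists": ["Artist X", "Artist Y"],
     "release_date": "2020-01-01", "genres": ["pop"]},
    {"title": "Song B", "artists": ["Artist Z"],
     "release_date": "2019-05-10", "genres": ["rap"]},
]
genius_lyrics = [
    {"title": "song a ", "artist": "artist x", "lyrics": "la la la"},
]

dataset = merge_records(spotify_tracks, genius_lyrics)
```

Joining on a normalised key keeps the example simple; in practice, titles often differ between services (for example "feat." suffixes), so a real merge would likely need fuzzier matching.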


### Why did you choose it? ###

This dataset showed considerable promise for revealing how artistic culture and collaboration work in practice. Everyone in our group is passionate about pop music, and we were excited by the opportunity to explore the motives and mechanisms of the musical ecosystem.

### What is your goal for the end-user experience? ###

The website offers a view into the subdivisions among the top 50 artists in the English-speaking world right now. The ultimate aim is to show the audience how sentiment varies between stylistic groups: where like-minded artists working together can produce something completely different, and how collaborations that blend different subgroups often end up following the main artist's stylistic preference.

### Preprocessing ###

Fortunately, the APIs returned the data in a very clean format, so minimal preprocessing was needed. It mainly consisted of removing duplicate songs and ensuring that only English lyrics were passed to the analysis page.
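A rough sketch of these two steps is shown below. The duplicate key, the stopword list, and the threshold are illustrative assumptions; the stopword ratio is a crude stand-in for whatever language check was actually used, and a dedicated language-detection library would be more reliable.

```python
import re

# Small, hand-picked list of common English words (an assumption for this sketch).
ENGLISH_STOPWORDS = {
    "the", "and", "you", "i", "a", "to", "it", "in",
    "me", "my", "of", "is", "that", "on", "we",
}


def looks_english(lyrics, threshold=0.15):
    """Heuristic: treat lyrics as English if enough tokens are common English words."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    if not tokens:
        return False
    hits = sum(token in ENGLISH_STOPWORDS for token in tokens)
    return hits / len(tokens) >= threshold


def preprocess(songs):
    """Drop duplicate songs, then keep only those whose lyrics look English."""
    seen = set()
    cleaned = []
    for song in songs:
        # Two rows are duplicates if title and artist set match, ignoring case and order.
        key = (
            song["title"].strip().lower(),
            tuple(sorted(artist.lower() for artist in song["artists"])),
        )
        if key in seen:
            continue
        seen.add(key)
        if looks_english(song["lyrics"]):
            cleaned.append(song)
    return cleaned


# Invented sample rows, for illustration only.
songs = [
    {"title": "Song A", "artists": ["X", "Y"],
     "lyrics": "I love the way you look at me in the morning"},
    {"title": "song a", "artists": ["Y", "X"],
     "lyrics": "I love the way you look at me in the morning"},
    {"title": "Canción", "artists": ["Z"],
     "lyrics": "te quiero mucho mi corazón siempre contigo"},
]

cleaned_songs = preprocess(songs)
```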

### Dataset statistics ###

The data, including the network and lyrics, amounted to about 15 MB. The graph contains 418 unique artists as nodes and 696 unique collaborations as edges. The average degree of the network is 4.5, and the graph is quite sparse, with a density of only 0.008.
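For reference, statistics of this kind can be computed as in the sketch below, which assumes a simple undirected, unweighted graph where average degree is `2m / n` and density is `2m / (n(n - 1))` for `n` nodes and `m` edges. The function names and toy songs are hypothetical, and the reported figures may follow a different convention (for instance, weighting edges by the number of shared songs), so this sketch is not guaranteed to reproduce them.

```python
from itertools import combinations


def build_collaboration_edges(songs):
    """Return the set of unique undirected artist pairs that share a song."""
    edges = set()
    for song in songs:
        # Sorting makes (A, B) and (B, A) the same edge.
        for a, b in combinations(sorted(set(song["artists"])), 2):
            edges.add((a, b))
    return edges


def network_stats(edges):
    """Compute basic statistics for a simple undirected graph."""
    # Only artists with at least one collaboration appear as nodes here.
    nodes = {artist for edge in edges for artist in edge}
    n, m = len(nodes), len(edges)
    average_degree = 2 * m / n if n else 0.0
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "edges": m,
        "average_degree": average_degree,
        "density": density,
    }


# Toy input, for illustration only.
toy_songs = [
    {"artists": ["A", "B"]},
    {"artists": ["B", "C"]},
    {"artists": ["A", "B"]},  # repeated pair, counted once
    {"artists": ["D"]},       # solo song, adds no edge
]

stats = network_stats(build_collaboration_edges(toy_songs))
```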

### Text work and network analysis ###

For the text analysis, we used the song lyrics as the source, running sentiment analysis to attribute a sentiment to each artist and to better understand the average sentiment of each community identified in the collaboration graph. Word clouds were created using TF-IDF, identifying words overrepresented in each community's songs. From these, it becomes clear that Community 0 is rap. Further sentiment analysis with a pretrained model led to the following observations:
- Community 0 consisted mainly of rap artists and vocabulary. This group showed more negative sentiment.
- Communities 1 and 2 both consisted mainly of pop artists and vocabulary. However, the sentiment analysis suggests that Community 1 is more melancholic pop, while Community 2 is happier, more uplifting pop.
- In the sentiment-over-time analysis, all groups followed similar oscillatory patterns, with highs around 2007 and 2015 and lows around 2010, 2020 (most likely due to COVID-19), and again in 2024.
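To illustrate how TF-IDF surfaces words overrepresented in one community, the sketch below treats each community's concatenated lyrics as a single document and scores terms with a plain `tf * log(N / df)` weighting. This is an assumed formulation for illustration: the function names and toy lyrics are invented, and libraries such as scikit-learn use a smoothed variant of the IDF term.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_by_community(community_lyrics):
    """Score each term per community, treating each community as one document."""
    term_counts = {c: Counter(tokenize(t)) for c, t in community_lyrics.items()}
    n_docs = len(term_counts)

    # Document frequency: in how many communities does each term appear?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    scores = {}
    for community, counts in term_counts.items():
        total = sum(counts.values())
        scores[community] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per community (ties broken alphabetically)."""
    return {
        community: [
            term
            for term, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        ]
        for community, s in scores.items()
    }


# Toy lyrics per community, for illustration only.
community_lyrics = {
    0: "money streets money hustle love",
    1: "tears love alone tears night",
    2: "dance love sunshine dance tonight",
}

scores = tfidf_by_community(community_lyrics)
top = top_terms(scores, k=1)
```

A word that appears in every community, such as "love" in the toy data, receives a score of zero, which is why TF-IDF word clouds emphasise community-specific vocabulary rather than words common to all lyrics.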


### Discussion ###

#### What went well ####

Overall, combining the network with text analysis was very informative about the different cultures within popular music. It highlighted the tendency of artists to form "creative cliques" and how indicative these cliques were of the content and sentiment of each collaborator's songs. The fluctuation of sentiment over time was also extremely interesting, as it appears to track recent significant worldwide events.

#### Future Improvements ####

Given more time, it would have been interesting to include more artists in the dataset, providing a more detailed and comprehensive view of differences in sentiment and word content between subgroups of artists within popular music. It would also be worthwhile to incorporate the sentiment data into the graph visualisation, potentially through colour coding or other means. Overall, we are proud of the final product and feel that it reflects the dedication of the team of students behind the project.