# Background

The early 20th century marks the transition from romanticism (Tchaikovsky, Mahler, Strauss) to modernism (Schoenberg and the *twelve-tone* technique, Debussy, Gershwin and *jazz*). Restrictions are rejected, while free form and experimentation become the norm. The 20th century is also an era of massive technological and socio-political change, in which technology opens new avenues for the production and distribution of music.

According to ClassicFM, *'modernism in music was about being radical and different.'* The 1920s saw the rise of the *blues*, a predecessor to country music and rock n’ roll.

As we enter the 21st century, contemporary music, along with new forms such as electronic music, starts to surface.

We will be focusing our analysis on the modern period beginning in the early 20th century, since the earliest track recorded in the Spotify dataset was released in 1920. We expect the older songs to reflect the shift to modernism while still preserving the major characteristics associated with traditional music. Conversely, more recent songs are expected to exhibit the opposite characteristics as we move into modernism and post-modernism.


**References:** 
- https://www.classicfm.com/discover-music/periods-genres/modern/
- https://wmich.edu/mus-gened/mus150/1500%20webbook%20modern%20artmusic/Modern%20ArtMusic.htm




# Dataset

## Description of dataset

The Spotify dataset has been obtained from Kaggle, a data science community where public datasets are available for use. It is courtesy of Yamac Eren Ay, a Data Scientist from Berlin and a passionate music listener. This dataset has been created to serve the scientific exploration of the evolution of musical tastes.  
It contains data on audio features of more than 175,000 songs released  between 1921 and 2021 collected using the Spotify Web API. The noteworthy features include acousticness, danceability, energy, popularity, tempo, and speechiness. In addition, songs are categorized by key, artists, release date, and name. The ‘id’ field uniquely identifies the observations.   

The dataset contains five files. ‘data.csv’ is the main file, and the other ones were created using data from this file. The numerical values (acousticness, danceability, energy, valence, instrumentalness, speechiness, tempo, loudness, duration_ms, liveness, and popularity) were obtained by calculating the mean of the values obtained from the API, whereas categorical features such as ‘key’ and ‘mode’ were obtained by calculating the mode of the values. Moreover, ‘popularity’ is based on US data, not worldwide data, and this could potentially limit the scope of the analysis.

## Limitations of the dataset

The Spotify dataset only reflects American tastes as it mainly uses data from an American audience.  In addition, we cannot claim that the `popularity` variable accurately quantifies the popularity of a track during a certain period as the data is retrieved from a more modern audience. The `popularity` column only reflects how likeable a certain track is to a modern audience. However, the track characteristics can be used to identify the trends for each period as they are independent of the listener.

Another limitation is the smaller number of older tracks in the dataset. Hence, for a fairer analysis, the maximum popularity for each period was first identified, and the 75th percentile based on that maximum value was then used to select the popular songs whose characteristics were examined.
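
The thresholding step described above can be read in more than one way. The sketch below assumes it means keeping the tracks whose `popularity` is at least 75% of their period's maximum. The `period` column, the toy values, and the helper name `popular_tracks` are illustrative assumptions and are not taken from the original notebook.

```python
import pandas as pd

# Toy stand-in for data.csv; the values are made up for illustration.
tracks = pd.DataFrame({
    "period": ["1920-1950"] * 3 + ["2001-2021"] * 3,
    "popularity": [10, 30, 40, 60, 80, 100],
})


def popular_tracks(df: pd.DataFrame, fraction: float = 0.75) -> pd.DataFrame:
    """Keep tracks whose popularity is at least `fraction` of their period's maximum."""
    # transform("max") broadcasts each period's maximum back onto its rows.
    period_max = df.groupby("period")["popularity"].transform("max")
    return df[df["popularity"] >= fraction * period_max]


popular = popular_tracks(tracks)
print(popular)
```

Scaling the cut-off by each period's own maximum lets older tracks, whose popularity scores are lower overall, still qualify as popular within their period. If the intended reading is instead a true per-period quantile, `groupby("period")["popularity"].quantile(0.75)` would be the starting point.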


## Notes on the variables 
(see https://developer.spotify.com/documentation/web-api/reference/#objects-index)

### Tracks:

- **acousticness**:    A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
- **artists**:   The artists who performed the track. This is stored as an array.
- **danceability**: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
- **duration_ms**: The track length in milliseconds.
- **energy**: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
- **explicit**: Whether or not the track has explicit lyrics (`true` = yes it does; `false` = no it does not OR unknown).
- **instrumentalness**: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
- **key**: The key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on.
- **liveness**: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
- **loudness**: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
- **mode**: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
- **speechiness**: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
- **tempo**: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
- **valence**: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
- **year**: The release year of the track.
- **popularity**: The popularity of the track. The value will be between 0 and 100, with 100 being the most popular.
The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are.
Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity. Note that the popularity value may lag actual popularity by a few days: the value is not updated in real time.
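
As a small illustration of how the encoded variables above can be read, the following sketch decodes `key` and `mode` into a key name and buckets `speechiness` using the 0.33 and 0.66 thresholds quoted in the list. The function names and the band labels are hypothetical and chosen only for this example.

```python
# Pitch classes 0-11 in standard Pitch Class notation (0 = C, 1 = C♯/D♭, ...).
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def describe_key(key: int, mode: int) -> str:
    """Turn the integer `key` and `mode` columns into a readable key name."""
    modality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {modality}"


def speechiness_band(value: float) -> str:
    """Bucket a speechiness score using the documented 0.33 / 0.66 thresholds."""
    if value > 0.66:
        return "mostly speech"
    if value >= 0.33:
        return "music and speech"
    return "mostly music"


print(describe_key(0, 1))       # key 0 with mode 1
print(speechiness_band(0.5))    # a value between the two thresholds
```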

### Artist

- **genres**: A list of the genres the artist is associated with. For example: "Prog Rock" , "Post-Grunge". (If not yet classified, the array is empty.)
- **popularity**: The popularity of the artist. The value will be between 0 and 100, with 100 being the most popular. The artist’s popularity is calculated from the popularity of all the artist’s tracks.

# Methodology and Results

## Procedure

We will be considering three 'periods' of the modern era roughly based on the breakdown provided on the wmich.edu webpage listed in the references.

1. 1920 - 1950.  The period of transition from traditional to modern forms of music
2. 1951 - 2000.  Advent of electronic music and other novel approaches to creating music.
3. 2001 - 2021.  The present.
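
The three periods listed above can be attached to each track with `pandas.cut`. This is a minimal sketch on made-up years; the `period` column name and the label strings are assumptions made for illustration.

```python
import pandas as pd

# Toy release years covering the boundaries of each period.
tracks = pd.DataFrame({"year": [1921, 1950, 1951, 2000, 2001, 2021]})

# Bins are right-inclusive by default: (1919, 1950], (1950, 2000], (2000, 2021].
period_edges = [1919, 1950, 2000, 2021]
period_labels = ["1920-1950", "1951-2000", "2001-2021"]

tracks["period"] = pd.cut(tracks["year"], bins=period_edges, labels=period_labels)
print(tracks)
```

Using explicit bin edges keeps the boundary years (1950, 2000) in the earlier period, matching the ranges in the list.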


## Analysis and Research Questions

## 1. Investigating the shifts in musical taste throughout the years

The three periods outlined in the *Procedure* section are compared using the mean value of each feature among the tracks selected by the 75th-percentile popularity threshold for that period.

![summary_dataframe.png](attachment:31b5b706-5acd-4d13-869f-e1aa4a7c93a5.png)
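
A per-period summary of this kind can be produced with a single `groupby`. The sketch below uses a toy `popular` frame with made-up values; it assumes the popular tracks have already been selected and labelled with a hypothetical `period` column.

```python
import pandas as pd

# Toy stand-in for the already-filtered popular tracks (values are invented).
popular = pd.DataFrame({
    "period": ["1920-1950", "1920-1950", "2001-2021", "2001-2021"],
    "acousticness": [0.9, 0.7, 0.2, 0.4],
    "energy": [0.2, 0.4, 0.7, 0.9],
})

# One row per period, one column per audio feature.
summary = popular.groupby("period")[["acousticness", "energy"]].mean()
print(summary)
```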



**1. 1920 - 1950. The period of transition from traditional to modern forms of music**

This is a period characterised by the following trends:

- High `acousticness`
- Low `energy`
- No `explicit` content
- Low `loudness`
- Low `speechiness`
- Significant variation in tempo
- High `valence`

These observations are in line with our assumptions.


**2. 1951 - 2000. Advent of electronic music and other novel approaches to creating music.**

The observations for this period are:

- `acousticness` has significantly decreased
- High `energy` throughout
- `explicit` content becoming more accepted
- `loudness` is still quite low
- `valence` unchanged 

Interestingly, explicit content in music starts to become more acceptable during this period.  The high levels of `energy` could be related to the introduction of electronic sounds.

**3. 2001 - 2021. The present.**

This is a period characterised by the following trends:

- Low `acousticness`
- High `energy`
- More `explicit` content
- More `loudness` (closer to 0 dB)
- More `speechiness` 
- Lower `valence`
- Higher `tempo`

One possible reason for the increase in `speechiness` could be rap music becoming more popular.



![trends_energy_acousticness_instrumentalness.png](attachment:ba2a847c-93bf-4a69-a231-3b427233d717.png)

From the plot, a decrease in both acousticness and instrumentalness as well as a significant increase in energy can be observed.

## 2. Trends in Popularity

### Elements

![2.9.2. Plot the Release Counts Over Time by Mode (Major and Minor).png](attachment:ebc26503-e627-4839-96db-e8930e899866.png)

The number of releases in both the major and minor modes is increasing, probably because more songs are being produced each year. Since both grow together, this suggests that mode alone is not a deciding factor in the popularity of a song. There is, however, a preference for the major mode, as shown by the trends for each decade.
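
The counts behind a plot like this one can be sketched as follows. The data is invented, and the `decade` and `mode_name` columns are helper names introduced only for this example.

```python
import pandas as pd

# Toy tracks: mode 1 = major, mode 0 = minor.
tracks = pd.DataFrame({
    "year": [1925, 1928, 1929, 2015, 2016, 2018, 2019],
    "mode": [1, 1, 0, 1, 1, 1, 0],
})

tracks["decade"] = (tracks["year"] // 10) * 10
tracks["mode_name"] = tracks["mode"].map({1: "major", 0: "minor"})

# Rows are decades, columns are modes, cells are release counts.
counts = tracks.groupby(["decade", "mode_name"]).size().unstack(fill_value=0)

# Share of major-mode releases per decade, which is independent of total volume.
major_share = counts["major"] / counts.sum(axis=1)
print(counts)
print(major_share)
```

Looking at the share rather than the raw counts separates a preference for one mode from the overall growth in the number of releases.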



![2.9.3. Plot the Release Counts by Keys with Modes.png](attachment:8ebe5a35-83d2-4dba-903e-8921e4e43053.png)

This second plot confirms the finding that the major mode is preferred. A reason could be the positivity associated with major keys.

### Genres


![Popularity of the most popular genres.png](attachment:1b7aacd2-3a32-49cb-a439-079ab88b35a2.png)

Surprisingly, Chinese electropop is the top genre. This is followed by the 'korean mask singer' genre. 'Dutch rap indie', 'rochester mn indie' and 'dong-yo' follow closely behind.

All of these genres have low acousticness, low instrumentalness, high energy and high loudness.  This reflects the music preferences of the current generation.



### Popular tracks and artists

In this section, we look at the popular tracks and their respective artists for each period.

**1. 1920 - 1950. The period of transition from traditional to modern forms of music**

![period1_top5.png](attachment:b45a5d78-c2e7-4518-8f15-ce30c456664a.png)

The top tracks of this period reflect the trends previously identified. They are all high in `acousticness` and low in `energy`.  None of them contain any explicit content.  In addition, significant variations in tempo preferences and overall low `speechiness` levels can be observed.

It can also be seen that most of these songs are Christmas songs.

The top artists of this period based on the number of popular tracks released can be seen in the plot:

![Top 3 Artists (1920-1950).png](attachment:e50b5977-6af0-4c1d-b7cf-717e3737412d.png)


**2. 1951 - 2000. Advent of electronic music and other novel approaches to creating music.**

![period2_top5.png](attachment:171d6b6f-00c0-4016-855a-8d3d72f0d97b.png)

Again, the observations match the general trend of the period. There is a decrease in `acousticness` as well as an increase in `energy`. In addition, as shown by the negative `loudness` values getting closer to zero, the tracks are getting louder.


The top artists of this period based on the number of popular tracks released can be seen in the plot:

![Top 5 Artists (1950-2000).png](attachment:0efcabe3-3f05-455b-a097-9e7450e9362b.png)


**3. 2001 - 2021. The present.**

![period3_top5.png](attachment:dc34c776-e374-4b8b-b40e-6177559730e2.png)

A significant increase in acceptance of explicit content can be observed.  Higher tempos are preferred and the tracks tend to be much higher in energy as compared to the previous period.

The top artists of this period based on the number of popular tracks released can be seen in the plot:

![Top 5 artists (2001-2021).png](attachment:50f0e0dd-a4e7-4972-a96e-8a68606527c5.png)



Looking at the overall picture, we observe that Justin Bieber and Bad Bunny have the most songs with a popularity above 90.

![popularity_above_90.png](attachment:8031f028-b0ae-4261-a556-1d2b65927900.png)
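
Because `artists` is stored as an array, counting high-popularity songs per artist requires splitting each track across its artists first. The sketch below assumes the column arrives from the CSV as a string such as `"['Artist A', 'Artist B']"`; that storage format, the placeholder artist names, and the popularity values are assumptions made for illustration.

```python
import ast

import pandas as pd

# Toy tracks; the artists column mimics a list serialised as a string.
tracks = pd.DataFrame({
    "artists": ["['Artist A']", "['Artist A', 'Artist B']", "['Artist B']", "['Artist C']"],
    "popularity": [95, 92, 60, 91],
})

# Parse the string into a real Python list, then give each artist their own row.
tracks["artists"] = tracks["artists"].apply(ast.literal_eval)
top = tracks[tracks["popularity"] > 90].explode("artists")

# Number of tracks with popularity above 90 credited to each artist.
hits_per_artist = top["artists"].value_counts()
print(hits_per_artist)
```

With this approach a collaboration counts once for every credited artist, which is worth keeping in mind when comparing totals.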



![2.9.5. Counts of Releases of 5 Artists That Own the Top 5 Popular Songs.png](attachment:b4024689-4126-412f-ae37-5834ef559095.png)

The artists who released the five most popular songs across all three periods are Sia, Ariana Grande, Pop Smoke, iann dior, and Olivia Rodrigo. Among them, Ariana Grande has the highest number of releases.

![2.9.1. Counts of top 5 releases by artists.png](attachment:e690b6aa-1234-4dad-bd1e-87e3be6e4ea7.png)

The top 5 artists in terms of the number of songs released are Francisco Canaro, Tadeusz Dolega Mostowicz, Эрнест Хемингуэй, Эрих Мария Ремарк, and Frédéric Chopin.

## 3. Predicting future trends


The factors which could predict the popularity of a track are:

- Fast tempo
- Low speechiness
- Low acousticness
- High `loudness` and `energy` (leading to high `danceability` in some cases)

This reflects the current trends in music, where party music has more traction.

While other factors such as explicit content are becoming more accepted, we do not believe that they are necessary components of a popular song.
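
One simple way to check which features move together with `popularity` is to compute their correlation with it. This is a sketch on invented values, not the method used to derive the list above, and a correlation on its own does not establish that a feature predicts popularity.

```python
import pandas as pd

# Toy data constructed so that energy rises and acousticness falls with popularity.
tracks = pd.DataFrame({
    "popularity": [10, 20, 30, 40, 50],
    "energy": [0.1, 0.2, 0.3, 0.4, 0.5],
    "acousticness": [0.9, 0.8, 0.7, 0.6, 0.5],
})

features = ["energy", "acousticness"]

# Pearson correlation of each feature column with popularity.
correlations = tracks[features].corrwith(tracks["popularity"])
print(correlations.sort_values(ascending=False))
```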

# Other Observations

The most popular songs of the period from 1920 to 2000 are almost all Christmas songs.  This observation confirms our suspicion that the popularity of older songs does not accurately reflect their real popularity at the time of release.

In addition, there is no correlation between danceability and energy. This may be surprising, as highly danceable songs are generally more energetic nowadays. However, slower music such as the waltz and slow dances can be highly danceable but low in energy.

# Conclusion 

Musical tastes have significantly changed over the last 100 years.  The gain in popularity of both electronic music and rap could have played an instrumental role in shifting the trends.  There is now a preference for fast, loud, and energetic music as compared to more acoustic and instrumental sounds.