## SPOTIFY NEW DATA SET

TARGET: Descriptive Analysis

- Trends Over Time: Analyze how the popularity of songs has changed over time. This could include looking at how the attributes of popular music (like tempo, key, and duration) have evolved.
- Top Artists and Tracks: Identify the most popular artists and tracks in your dataset based on streaming statistics and presence on various music platforms.


In [1]:
import pandas as pd

# Load the dataset
df = pd.read_csv('Popular_Spotify_Songs.csv')

# Display the first few rows of the dataframe to understand its structure
df.head()


UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 7250-7251: invalid continuation byte

In [3]:
# Attempt to load the dataset with ISO-8859-1 encoding
df = pd.read_csv('Popular_Spotify_Songs.csv', encoding='ISO-8859-1')

# Display the first few rows of the dataframe to understand its structure
df.head()


Unnamed: 0,track_name,artist(s)_name,artist_count,released_year,released_month,released_day,in_spotify_playlists,in_spotify_charts,streams,in_apple_playlists,...,bpm,key,mode,danceability_%,valence_%,energy_%,acousticness_%,instrumentalness_%,liveness_%,speechiness_%
0,Seven (feat. Latto) (Explicit Ver.),"Latto, Jung Kook",2,2023,7,14,553,147,141381703,43,...,125,B,Major,80,89,83,31,0,8,4
1,LALA,Myke Towers,1,2023,3,23,1474,48,133716286,48,...,92,C#,Major,71,61,74,7,0,10,4
2,vampire,Olivia Rodrigo,1,2023,6,30,1397,113,140003974,94,...,138,F,Major,51,32,53,17,0,31,6
3,Cruel Summer,Taylor Swift,1,2019,8,23,7858,100,800840817,116,...,170,A,Major,55,58,72,11,0,11,15
4,WHERE SHE GOES,Bad Bunny,1,2023,5,18,3133,50,303236322,84,...,144,A,Minor,65,23,80,14,63,11,6
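
The two load attempts above can be folded into one helper that tries a list of encodings in order. This is only a sketch: `read_csv_with_fallback` and the demo file are hypothetical names introduced for illustration, not part of the original notebook.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "ISO-8859-1")):
    """Try each encoding in order; return (DataFrame, encoding_used)."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    if last_error is None:
        raise ValueError("no encodings were supplied")
    raise last_error


# Demo with a small, made-up file written in ISO-8859-1 (not the real dataset)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "wb") as fh:
        fh.write("track_name,streams\nDéjà Vu,100\n".encode("ISO-8859-1"))
    demo_df, used_encoding = read_csv_with_fallback(demo_path)

print(used_encoding)
```

ISO-8859-1 assigns a character to every possible byte, so it never raises a decode error. That makes it a safe last resort, but it can also silently produce wrong characters if the file is actually in a different encoding, so it is worth checking a few non-ASCII track names after loading.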


Here's a quick overview of the dataset's structure and the types of data it includes:

1) track_name: Name of the track.
2) artist(s)_name: Name of the artist(s).
3) artist_count: Number of artists involved in the track.
4) released_year, released_month, released_day: Release date information.
5) in_spotify_playlists: Number of Spotify playlists featuring the track.
6) in_spotify_charts: Number of times the track appeared in Spotify charts.
7) streams: Number of streams on Spotify.
8) in_apple_playlists, in_deezer_playlists, in_shazam_charts: Presence of the track on other platforms.
9) Various audio features like bpm (beats per minute), key, mode, danceability_%, valence_% (musical positiveness), energy_%, acousticness_%, instrumentalness_%, liveness_%, and speechiness_%.

With this data, we can start our descriptive analysis. Here are some initial analyses we can perform:

- Trends Over Time:
        Calculate the total number of streams per year to see how music consumption has changed.
        Analyze the evolution of audio features over the years, like danceability and energy.

- Top Artists and Tracks:
        Identify the top artists by the total number of streams or appearances in playlists.
        Find the tracks with the highest number of streams.

In [5]:
# Convert 'streams' to numeric; errors='coerce' turns unparseable values into NaN
df['streams'] = pd.to_numeric(df['streams'], errors='coerce')

# Drop rows where 'streams' could not be converted
df_cleaned = df.dropna(subset=['streams'])

# Total number of streams per release year
streams_per_year_cleaned = df_cleaned.groupby('released_year')['streams'].sum().reset_index()

# Display the result
streams_per_year_cleaned


Unnamed: 0,released_year,streams
0,1930,90598520.0
1,1942,395591400.0
2,1946,389772000.0
3,1950,473248300.0
4,1952,395591400.0
5,1957,919962000.0
6,1958,1310563000.0
7,1959,573417800.0
8,1963,1311263000.0
9,1968,1145728000.0
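
Before dropping rows, it can help to look at which values failed the numeric conversion. The sketch below uses a small made-up frame in place of the real dataset; the column names match the notebook, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the real data, with one unparseable entry
demo = pd.DataFrame({
    "track_name": ["A", "B", "C"],
    "streams": ["1000", "not-a-number", "2500"],
})

converted = pd.to_numeric(demo["streams"], errors="coerce")

# Rows that had a value originally but became NaN after conversion
bad_rows = demo[converted.isna() & demo["streams"].notna()]
print(bad_rows)

# Same cleaning as in the notebook: keep only rows with a numeric stream count
demo_cleaned = demo.assign(streams=converted).dropna(subset=["streams"])
```

Inspecting `bad_rows` first shows whether the problem is a handful of corrupted entries or something systematic, such as thousands separators, that could be repaired instead of dropped.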


The cell above starts the Trends Over Time analysis, which covers two main areas:

- Changes in Music Consumption Over Time: the total number of streams per year, to observe trends in music consumption.

- Evolution of Audio Features Over the Years: how audio features such as danceability, energy, and acousticness have changed, to see whether the characteristics of popular music have shifted significantly.

A first attempt at the per-year total ran into an issue with the streams column: some entries were not interpreted as numerical values, which points to inconsistencies or errors in the dataset.

The cell above therefore converts the streams column to numeric, drops the rows that cannot be converted, and then calculates the total number of streams per year.

With the data cleaned, we can look at the total number of streams per year; the table above shows only the first ten rows. Streams are grouped by each track's release year, not by the year in which the listening happened, so the totals describe which release years carry the most streams in the dataset. Here are some key points:

- There is a general upward trend in total streams by release year, especially noticeable from the early 2000s onwards. This likely reflects the growing popularity and accessibility of music streaming platforms, as well as the dataset's focus on currently popular songs.
- The highest total is observed for 2022, meaning tracks released in that year account for the most streams in the dataset.
- A significant increase begins around the mid-2010s, which aligns with the global adoption of streaming services like Spotify.
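
Because the totals are grouped by release year, a year's sum depends heavily on how many of its tracks are in the dataset. A minimal sketch, using made-up numbers, of reporting the track count and the mean streams per track next to the total:

```python
import pandas as pd

# Hypothetical data: one older track and three recent ones
demo = pd.DataFrame({
    "released_year": [2019, 2022, 2022, 2022],
    "streams": [800.0, 100.0, 200.0, 300.0],
})

# Named aggregation: total, number of tracks, and mean per track for each year
per_year = (
    demo.groupby("released_year")["streams"]
    .agg(total_streams="sum", track_count="count", mean_streams="mean")
    .reset_index()
)
print(per_year)
```

In this toy example 2019 has the larger total even though it contributes a single track, which is the kind of effect the extra columns make visible.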

Next, let's examine the evolution of audio features over the years. We'll focus on three main features: danceability, energy, and acousticness. These features can give us insights into the changing characteristics of popular music over time. Let's calculate the average values of these audio features for each year:

In [6]:
# Calculate the average values of danceability, energy, and acousticness for each year
audio_features_per_year = df_cleaned.groupby('released_year').agg({
    'danceability_%': 'mean',
    'energy_%': 'mean',
    'acousticness_%': 'mean'
}).reset_index()

# Display the result
audio_features_per_year


Unnamed: 0,released_year,danceability_%,energy_%,acousticness_%
0,1930,65.0,80.0,22.0
1,1942,23.0,25.0,91.0
2,1946,36.0,15.0,84.0
3,1950,60.0,32.0,88.0
4,1952,67.0,36.0,64.0
5,1957,62.5,35.5,78.5
6,1958,70.666667,48.0,75.666667
7,1959,57.0,30.0,86.0
8,1963,37.0,71.0,52.0
9,1968,74.0,70.0,7.0


Danceability

- There's a general upward trend in danceability, especially noticeable in the more recent years (2021-2023). This suggests that popular music has become more dance-friendly, potentially reflecting changes in listener preferences or the influence of dance music genres.

Energy

- The energy of songs shows some fluctuation over the decades but appears to increase slightly in the latest years. This could indicate a preference for more upbeat and energetic music in recent times.

Acousticness

- Acousticness varies significantly over the years. In the earliest rows shown above (1942 to 1959) it is high, between 64% and 91%, with 1930 (22%) as an exception, while the latest years show moderate to low levels, suggesting a shift towards more electronic or processed music in popular tracks. Nonetheless, there's a notable peak in acousticness around 2018, indicating a temporary surge in preference for more acoustic and organic sounds.

These trends reflect how musical tastes and production techniques have evolved, with recent years showing a preference for more danceable and energetic tracks, possibly driven by the global popularity of genres like pop, electronic dance music (EDM), and hip-hop.

These insights could form the basis for further analysis, such as investigating the factors that influence these trends or exploring how other audio features have evolved alongside changes in music technology and consumption habits.
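
Yearly averages built from only one or two tracks can swing widely, which matters when reading the trends above. One way to guard against that, sketched here on made-up data, is to keep only years with a minimum number of tracks. `MIN_TRACKS` is a hypothetical threshold chosen for illustration.

```python
import pandas as pd

# Hypothetical data: 1942 has a single track, the recent years have several
demo = pd.DataFrame({
    "released_year": [1942, 2021, 2021, 2022, 2022, 2022],
    "danceability_%": [23, 60, 70, 65, 75, 85],
    "energy_%": [25, 60, 80, 70, 70, 70],
})

MIN_TRACKS = 2  # assumption: a year needs at least this many tracks to be kept
features = ["danceability_%", "energy_%"]

grouped = demo.groupby("released_year")
summary = grouped[features].mean()
summary["track_count"] = grouped.size()  # aligned on the released_year index

# Keep only years whose average rests on enough tracks
reliable = summary[summary["track_count"] >= MIN_TRACKS].reset_index()
print(reliable)
```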

Next, we turn to Trends Over Time in Attributes of Popular Music, focusing on how tempo (bpm), key, and duration have changed over the years. Afterward, we'll identify the Top Artists and Tracks based on their streaming numbers and platform presence.

Analyzing the Evolution of Tempo, Key, and Duration Over Time

First, we'll calculate the average tempo (bpm) and the most common key of songs per year. Duration is left out of the calculation because the cell below runs without a 'duration_s' column:

In [8]:
# Most common key per release year (None when a year has no key values)
keys_mode_per_year_corrected = df_cleaned.groupby('released_year')['key'].agg(lambda x: x.mode()[0] if not x.mode().empty else None).reset_index()

# Average tempo (bpm) per release year; the 'duration_s' column is not used
tempo_per_year_corrected = df_cleaned.groupby('released_year')['bpm'].mean().reset_index()

# Combine the average tempo with the most common key per year
tempo_key_combined_corrected = pd.merge(tempo_per_year_corrected, keys_mode_per_year_corrected, on='released_year')

# Display the combined result for tempo and key
tempo_key_combined_corrected


Unnamed: 0,released_year,bpm,key
0,1930,130.0,F#
1,1942,96.0,A
2,1946,139.0,C#
3,1950,143.0,D
4,1952,140.0,
5,1957,147.0,D
6,1958,135.0,G
7,1959,120.5,C#
8,1963,140.0,D
9,1968,116.0,


- Tempo (bpm): The average tempo varies from year to year. In the rows shown above it ranges from 96 bpm (1942) to 147 bpm (1957), and years with higher averages might indicate a preference for faster-paced music.
- Key: The most common key also changes over the years. For some years, a specific key might dominate popular tracks, reflecting musical trends or preferences at the time. A blank key, as for 1952 and 1968, means no key value was available for that year's tracks.

This analysis shows how certain musical attributes of popular songs have evolved over time. Changes in tempo and key can reflect shifts in musical styles, influences, and listener preferences.
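
The average tempo and the most common key can also be computed in a single groupby using named aggregation, which avoids the separate merge. This is a sketch on made-up data; `most_common` is a hypothetical helper name.

```python
import pandas as pd


def most_common(values):
    """Return the most frequent value, or None if every value is missing."""
    modes = values.mode()  # mode() ignores missing values by default
    return modes.iloc[0] if not modes.empty else None


# Hypothetical data: 1952 has no key recorded
demo = pd.DataFrame({
    "released_year": [1950, 1952, 1958, 1958, 1958],
    "bpm": [143, 140, 120, 130, 155],
    "key": ["D", None, "G", "G", "A"],
})

tempo_key = (
    demo.groupby("released_year")
    .agg(mean_bpm=("bpm", "mean"), most_common_key=("key", most_common))
    .reset_index()
)
print(tempo_key)
```

When several keys are tied for most frequent, `mode()` returns them sorted and this helper takes the first, so ties are broken alphabetically rather than by any musical criterion.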

In [11]:
# Calculate the total number of streams for each artist
total_streams_by_artist = df_cleaned.groupby('artist(s)_name')['streams'].sum().reset_index()

# Sort the artists by total streams in descending order and take the top 10
top_artists_by_streams = total_streams_by_artist.sort_values(by='streams', ascending=False).head(10)

# Display the top 10 artists by total streams
top_artists_by_streams



Unnamed: 0,artist(s)_name,streams
571,The Weeknd,14185550000.0
557,Taylor Swift,14053660000.0
159,Ed Sheeran,13908950000.0
222,Harry Styles,11608650000.0
43,Bad Bunny,9997800000.0
430,Olivia Rodrigo,7442149000.0
170,Eminem,6183806000.0
75,Bruno Mars,5846921000.0
25,Arctic Monkeys,5569807000.0
228,Imagine Dragons,5272485000.0


Identifying Top Artists and Tracks

This part of the analysis identifies the top artists and tracks in the dataset. Both the total number of streams and the presence on various music platforms are indicators of popularity; the rankings here use total streams. The table above lists the top 10 artists by total streams for all their songs combined, and the top 10 tracks with the highest number of streams follow.

The top artists based on total streams are as follows:

- The Weeknd: With a total of approximately 14.19 billion streams.
- Taylor Swift: Close behind with about 14.05 billion streams.
- Ed Sheeran: With around 13.91 billion streams.
- Harry Styles: Accumulating about 11.61 billion streams.
- Bad Bunny: With nearly 10 billion streams.
- Olivia Rodrigo: With approximately 7.44 billion streams.
- Eminem: Garnering about 6.18 billion streams.
- Bruno Mars: With around 5.85 billion streams.
- Arctic Monkeys: Having about 5.57 billion streams.
- Imagine Dragons: With approximately 5.27 billion streams.

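These artists have achieved remarkable success on streaming platforms, reflecting their widespread popularity and the enduring appeal of their music. Note that the grouping uses the full artist(s)_name string, so a collaboration such as "The Weeknd, Daft Punk" counts as its own entry and is not added to either artist's total.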
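
The ranking above groups on the full `artist(s)_name` string, so collaborations are treated as separate "artists". A possible refinement, sketched here on made-up data, is to split the string on commas and credit each listed artist with the track's full stream count. Both the crediting rule and the comma split are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: one solo track each, plus one collaboration
demo = pd.DataFrame({
    "artist(s)_name": ["The Weeknd", "The Weeknd, Daft Punk", "Ed Sheeran"],
    "streams": [3000.0, 2000.0, 3500.0],
})

# One row per (track, artist): split the names, then explode the lists
per_artist = (
    demo.assign(artist=demo["artist(s)_name"].str.split(","))
    .explode("artist", ignore_index=True)
)
per_artist["artist"] = per_artist["artist"].str.strip()

# Each credited artist receives the full stream count of the track
artist_totals = (
    per_artist.groupby("artist")["streams"].sum().sort_values(ascending=False)
)
print(artist_totals)
```

Splitting on commas would wrongly break up any artist whose name itself contains a comma, so the result is best spot-checked against the original names before it is relied on.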

In [12]:
# Sort the tracks by total streams in descending order and take the top 10
top_tracks_by_streams = df_cleaned.sort_values(by='streams', ascending=False).head(10)[['track_name', 'artist(s)_name', 'streams']]

# Display the top 10 tracks by total streams
top_tracks_by_streams


Unnamed: 0,track_name,artist(s)_name,streams
55,Blinding Lights,The Weeknd,3703895000.0
179,Shape of You,Ed Sheeran,3562544000.0
86,Someone You Loved,Lewis Capaldi,2887242000.0
620,Dance Monkey,Tones and I,2864792000.0
41,Sunflower - Spider-Man: Into the Spider-Verse,"Post Malone, Swae Lee",2808097000.0
162,One Dance,"Drake, WizKid, Kyla",2713922000.0
84,STAY (with Justin Bieber),"Justin Bieber, The Kid Laroi",2665344000.0
140,Believer,Imagine Dragons,2594040000.0
725,Closer,"The Chainsmokers, Halsey",2591224000.0
48,Starboy,"The Weeknd, Daft Punk",2565530000.0


The table above lists the top 10 tracks with the highest number of streams, showing which songs have captivated listeners the most. The top tracks based on total streams are:

- Blinding Lights by The Weeknd: Approximately 3.70 billion streams.
- Shape of You by Ed Sheeran: Around 3.56 billion streams.
- Someone You Loved by Lewis Capaldi: Approximately 2.89 billion streams.
- Dance Monkey by Tones and I: About 2.86 billion streams.
- Sunflower - Spider-Man: Into the Spider-Verse by Post Malone, Swae Lee: Around 2.81 billion streams.
- One Dance by Drake, WizKid, Kyla: Approximately 2.71 billion streams.
- STAY (with Justin Bieber) by Justin Bieber, The Kid Laroi: About 2.67 billion streams.
- Believer by Imagine Dragons: Approximately 2.59 billion streams.
- Closer by The Chainsmokers, Halsey: Around 2.59 billion streams.
- Starboy by The Weeknd, Daft Punk: Approximately 2.57 billion streams.

These songs have achieved massive success, showcasing their global appeal and the impact they've had on listeners worldwide. Their high streaming numbers reflect their popularity and the significant role they play in the current music landscape.
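
The rankings above rely on streams alone, while the stated goal also mentions presence on various music platforms. A rough sketch of a playlist-based ranking follows, using the playlist columns named in the overview. The demo values are made up, and the idea that some counts may be stored as text with thousands separators is an assumption that should be checked against the real file.

```python
import pandas as pd

# Hypothetical data; the Deezer column is text here to illustrate the cleanup
demo = pd.DataFrame({
    "track_name": ["Song A", "Song B", "Song C"],
    "in_spotify_playlists": [7858, 553, 3133],
    "in_apple_playlists": [116, 43, 84],
    "in_deezer_playlists": ["1,200", "45", "300"],
})

playlist_cols = ["in_spotify_playlists", "in_apple_playlists", "in_deezer_playlists"]

# Normalize every playlist column to numbers: strip commas, coerce failures to NaN
counts = (
    demo[playlist_cols]
    .astype(str)
    .replace(",", "", regex=True)
    .apply(pd.to_numeric, errors="coerce")
)

# Simple presence score: total playlist appearances across the three platforms
demo["total_playlists"] = counts.sum(axis=1)
top_by_playlists = demo.nlargest(2, "total_playlists")[["track_name", "total_playlists"]]
print(top_by_playlists)
```

A plain sum lets the platform with the largest counts dominate the score; ranking within each platform first and then combining the ranks would be one alternative.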