In [60]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

# ABOUT THE DATA

The data used in this analysis were retrieved through the Spotify API, which exposes catalog metadata and audio features for tracks and albums. We collected data for a single artist, Taylor Swift, because this project focuses exclusively on her music and the insights we can draw from it.

For each of her songs and albums we retrieved a range of attributes, including audio features (such as danceability, energy, and valence), album release dates, and more.

## Data Generation and Export

Querying the Spotify API requires access keys and authentication. Once we had gathered the data for Taylor Swift's music, we exported it to a CSV (comma-separated values) file, for two reasons:

 - **Data Preservation:** Storing the data in a CSV file gives the project access to the dataset without repeatedly fetching it from the Spotify API. This reduces the load on the API and guarantees that every analysis runs against the same snapshot of the data.

 - **Accessibility:** CSV files are supported by virtually all data analysis tools and programming languages, so the data is easy to load and manipulate in our chosen environment. The data is stored at https://github.com/gu-dsan6000/fall-2023-reddit-project-team-21/tree/main/data/csv/spotify_Taylor_Swift
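
The caching idea described above can be sketched as follows. This is a minimal illustration, not the project's actual export code: `load_or_fetch` and `fetch_fn` are hypothetical names, and `fetch_fn` stands in for whatever authenticated Spotify API call produced the data.

```python
from pathlib import Path

import pandas as pd


def load_or_fetch(csv_path, fetch_fn):
    """Return the cached CSV if it exists; otherwise fetch once and cache it.

    `fetch_fn` is a hypothetical zero-argument callable that returns a
    DataFrame (for example, a wrapper around authenticated API requests).
    """
    path = Path(csv_path)
    if path.exists():
        # Cached copy found: no API call is made.
        return pd.read_csv(path)
    df = fetch_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return df
```

With a helper like this, the first run pays the cost of the API calls and every later run reads the same file, which is what keeps the analysis reproducible.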

In [61]:
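# Load the data from the local CSV export
data = pd.read_csv("spotify_Taylor_Swift.csv")

# Alternative: read the file directly from GitHub. pd.read_csv needs the raw file URL
# (the "/tree/" page URL returns HTML); this assumes the repository is publicly readable.
# data = pd.read_csv("https://raw.githubusercontent.com/gu-dsan6000/fall-2023-reddit-project-team-21/main/data/csv/spotify_Taylor_Swift.csv")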

# Print the head of the data
data.head()

Unnamed: 0,artist_name,artist_id,album_id,album_type,album_images,album_release_date,album_release_year,album_release_date_precision,danceability,energy,...,track_name,track_preview_url,track_number,type,track_uri,external_urls.spotify,album_name,key_name,mode_name,key_mode
0,Taylor Swift,06HL4z0CvFAxyc27GXpf02,1o59UpKw81iHR0HPiSkJR0,album,,2023-10-27,2023,day,0.757,0.61,...,Welcome To New York (Taylor's Version),,1,track,spotify:track:4WUepByoeqcedHoYhSNHRt,https://open.spotify.com/track/4WUepByoeqcedHo...,1989 (Taylor's Version) [Deluxe],G,major,G major
1,Taylor Swift,06HL4z0CvFAxyc27GXpf02,1o59UpKw81iHR0HPiSkJR0,album,,2023-10-27,2023,day,0.733,0.733,...,Blank Space (Taylor's Version),,2,track,spotify:track:0108kcWLnn2HlH2kedi1gn,https://open.spotify.com/track/0108kcWLnn2HlH2...,1989 (Taylor's Version) [Deluxe],C,major,C major
2,Taylor Swift,06HL4z0CvFAxyc27GXpf02,1o59UpKw81iHR0HPiSkJR0,album,,2023-10-27,2023,day,0.511,0.822,...,Style (Taylor's Version),,3,track,spotify:track:3Vpk1hfMAQme8VJ0SNRSkd,https://open.spotify.com/track/3Vpk1hfMAQme8VJ...,1989 (Taylor's Version) [Deluxe],B,minor,B minor
3,Taylor Swift,06HL4z0CvFAxyc27GXpf02,1o59UpKw81iHR0HPiSkJR0,album,,2023-10-27,2023,day,0.545,0.885,...,Out Of The Woods (Taylor's Version),,4,track,spotify:track:1OcSfkeCg9hRC2sFKB4IMJ,https://open.spotify.com/track/1OcSfkeCg9hRC2s...,1989 (Taylor's Version) [Deluxe],C,major,C major
4,Taylor Swift,06HL4z0CvFAxyc27GXpf02,1o59UpKw81iHR0HPiSkJR0,album,,2023-10-27,2023,day,0.588,0.721,...,All You Had To Do Was Stay (Taylor's Version),,5,track,spotify:track:2k0ZEeAqzvYMcx9Qt5aClQ,https://open.spotify.com/track/2k0ZEeAqzvYMcx9...,1989 (Taylor's Version) [Deluxe],C,major,C major


# Data Cleaning

We took several steps to prepare the dataset for analysis. We began by examining the data type of each column and checking for missing values, and we checked for duplicate rows. To keep the analysis focused, we kept only albums released from 2021 to 2023.

We also dropped columns that do not serve the analysis. "artist_id" and "artist_name" were removed because every row refers to the same artist, Taylor Swift. "album_images" was excluded because it is not relevant to our goal of understanding public responses to her music. A range of technical and URL-related columns, including "analysis_url," "track_uri," and others, were removed as well. These columns carry information from the Spotify API that is useful in other contexts but does not bear on how the public perceives Taylor Swift's music.
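
The cleaning steps described above can be sketched on a toy data frame. This is an illustrative sketch only: `clean_tracks` is a hypothetical helper, the toy rows are invented, and dropping duplicates is one possible way of handling them rather than a record of what the project did.

```python
import pandas as pd


def clean_tracks(df, drop_cols, first_year, last_year):
    """Drop duplicate rows, keep a release-year window, remove unused columns."""
    out = df.drop_duplicates()
    # Series.between is inclusive on both ends by default.
    in_window = out["album_release_year"].between(first_year, last_year)
    out = out.loc[in_window]
    # errors="ignore" skips names that are absent instead of raising KeyError.
    return out.drop(columns=drop_cols, errors="ignore").reset_index(drop=True)


# Toy rows, invented purely for illustration.
toy = pd.DataFrame({
    "artist_name": ["Taylor Swift"] * 4,
    "track_name": ["A", "A", "B", "C"],
    "album_release_year": [2022, 2022, 2019, 2023],
    "energy": [0.6, 0.6, 0.4, 0.8],
})
cleaned = clean_tracks(toy, ["artist_name", "analysis_url"], 2021, 2023)
print(cleaned)
```

Passing `errors="ignore"` makes the column removal tolerant of columns that are missing from a particular export, at the cost of silently ignoring misspelled names.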

In [62]:
# Print the data type of each column in the data frame 'data'
# (print is needed because only the last expression of a cell is displayed)
print(data.dtypes)

# Check which columns contain null values
print(data.isnull().any())

# Show any duplicate rows in the data frame
print(data[data.duplicated()])

# Keep only albums released from 2021 through 2023
data = data[(data['album_release_year'] >= 2021) & (data['album_release_year'] <= 2023)]

In [63]:
# List of columns to remove
columns_to_remove = ["artist_id", "artist_name", "album_type", "album_images", "album_release_date_precision",
                     "analysis_url", "artists", "available_markets", "disc_number", "track_href", "is_local",
                     "track_preview_url", "track_number", "type", "track_uri", "external_urls.spotify"]

# Remove the specified columns from the 'data' data frame
data = data.drop(columns=columns_to_remove)

# Desired column order for the cleaned data frame
column_order = ["album_id", "track_id", "track_name", "album_name", "album_release_date", "album_release_year",
                "danceability", "energy", "key", "loudness", "mode", "speechiness", "acousticness",
                "instrumentalness", "liveness", "valence", "tempo", "time_signature", "duration_ms", "explicit",
                "key_name", "mode_name", "key_mode"]

# Create a new data frame with columns in the specified order
data = data[column_order]

# Select only the numeric columns for basic statistics
numeric_data = data.select_dtypes(include=['number'])

# Remove the 'album_release_year' column from numeric_data (a year is not an audio feature)
numeric_data = numeric_data.drop(columns=['album_release_year'])

# Compute basic summary statistics
summary_stats = numeric_data.describe()
# Save summary statistics to a CSV file
summary_stats.to_csv('summary_statistics.csv', index=True)
summary_stats

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature,duration_ms
count,194.0,194.0,194.0,194.0,194.0,194.0,194.0,194.0,194.0,194.0,194.0,194.0,194.0
mean,0.59182,0.571149,4.510309,-7.910356,0.92268,0.060171,0.286196,0.008185,0.147087,0.375365,123.576289,3.989691,236445.572165
std,0.106741,0.178326,3.314655,2.981381,0.267789,0.054466,0.307694,0.051625,0.093326,0.208453,29.75659,0.203332,49365.306771
min,0.316,0.131,0.0,-15.489,0.0,0.025,0.000191,0.0,0.054,0.0374,73.942,3.0,146436.0
25%,0.51525,0.438,1.25,-9.7425,1.0,0.032725,0.028125,0.0,0.0943,0.19425,100.014,4.0,204852.0
50%,0.6,0.571,5.0,-7.3445,1.0,0.0433,0.157,1e-06,0.1165,0.3595,119.0245,4.0,231475.0
75%,0.66075,0.71175,7.0,-5.79125,1.0,0.06265,0.50975,5.1e-05,0.152,0.532,143.99875,4.0,257360.75
max,0.87,0.915,11.0,-1.909,1.0,0.39,0.967,0.488,0.611,0.921,208.918,5.0,613026.0


**Inferences from Summary Statistics**

The summary statistics describe the key audio features of the 194 tracks in the filtered dataset:

 - **Danceability (Mean: 0.592, Std: 0.107):** Danceability is moderate on average, and the small standard deviation shows that most tracks sit close to the mean (the middle 50% fall between 0.515 and 0.661).

 - **Energy (Mean: 0.571, Std: 0.178):** Energy is also moderate on average but more spread out, ranging from 0.131 to 0.915, which indicates a blend of energetic and calmer tracks.

 - **Key (Mean: 4.510, Std: 3.315):** Key is a pitch-class code (0 = C, 1 = C♯/D♭, ..., 11 = B), so its mean has no musical interpretation. The informative part is the range: values run from 0 to 11, so the tracks span a wide range of keys.

 - **Loudness (Mean: -7.910 dB, Std: 2.981 dB):** Average loudness is about -7.9 dB, with individual tracks ranging from -15.489 dB to -1.909 dB.

 - **Speechiness (Mean: 0.0602, Std: 0.0545):** With an average speechiness of 0.0602, the tracks contain very little spoken-word content relative to singing.

 - **Acousticness (Mean: 0.286, Std: 0.308):** The mean is 0.286, but the distribution is skewed: the median is only 0.157 while the upper quartile is 0.510. Most tracks score low, and roughly a quarter are strongly acoustic.

 - **Instrumentalness (Mean: 0.0082, Std: 0.0516):** Instrumentalness is close to zero for almost every track (the median is 0.000001), indicating that vocals are present throughout.

 - **Liveness (Mean: 0.147, Std: 0.0933):** Liveness is low, with three quarters of the tracks at or below 0.152. This is consistent with studio recordings rather than live performances.

 - **Valence (Mean: 0.375, Std: 0.208):** Valence, which represents the positivity of the music, averages 0.375 with a median of 0.360, below the midpoint of the scale. The tracks lean toward a less positive mood on average, although the range (0.037 to 0.921) covers nearly the full emotional spectrum.

 - **Tempo (Mean: 123.576 BPM, Std: 29.757 BPM):** The average tempo is about 124 beats per minute. The large standard deviation and the range from 73.9 to 208.9 BPM show that tempo varies widely across tracks.

 - **Time Signature (Most Frequent: 4/4):** The 25th, 50th, and 75th percentiles are all 4, so 4/4, the standard time signature in popular music, dominates. A few tracks use 3 or 5 beats per bar.

 - **Duration (Mean: 236,446 ms, Std: 49,365 ms):** The average track lasts about 236,446 milliseconds, or approximately 3.94 minutes. Tracks range from about 2.44 minutes to about 10.22 minutes.
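
Because `key` and `time_signature` are coded categories rather than continuous measurements, frequency counts describe them better than means do. The sketch below shows the idea on toy values that are invented for illustration; only the mean duration is taken from the summary table above.

```python
import pandas as pd

# Toy values, invented for illustration; they are not the project's data.
toy = pd.DataFrame({
    "key": [0, 0, 5, 7, 7, 7],
    "time_signature": [4, 4, 4, 3, 4, 4],
})

# Frequency of each pitch-class code (0 = C, ..., 11 = B).
key_counts = toy["key"].value_counts().sort_index()

# Most frequent time signature (mode() returns a Series; take the first entry).
most_common_signature = toy["time_signature"].mode().iloc[0]

# Convert the mean duration reported in the summary table from ms to minutes.
mean_duration_min = 236445.572165 / 60_000

print(key_counts.to_dict())         # {0: 2, 5: 1, 7: 3}
print(most_common_signature)        # 4
print(round(mean_duration_min, 2))  # 3.94
```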


In [64]:
# Select relevant columns for correlation analysis
selected_df = data[["danceability", "energy", "key", "loudness", "speechiness", "acousticness", "instrumentalness",
                    "liveness", "valence", "tempo"]]

# Calculate the correlation matrix
correlation_matrix = selected_df.corr()

# Create a correlation heatmap using plotly
heatmap = go.Figure(data=go.Heatmap(z=correlation_matrix.values,
                                    x=correlation_matrix.columns,
                                    y=correlation_matrix.columns,
                                    colorscale="GnBu"))

# Customize the layout and center-align the title
heatmap.update_layout(
    title="Correlation Heatmap of Taylor Swift Audio Features",
    xaxis_title="Audio Features",
    yaxis_title="Audio Features",
    font=dict(family="Arial", size=12),
    width=700,
    height=500,
    title_x=0.5,  # Set the title's horizontal position to the middle (0 to 1)
)


heatmap.show()

# Save the correlation heatmap to an HTML file
heatmap.write_html("spotify_heatmap_plotly.html")

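# Display the HTML file (optional). display() is required here because the
# IFrame is not the last expression in the cell.
from IPython.display import IFrame, display
display(IFrame(src="spotify_heatmap_plotly.html", width="100%", height=600))

correlation_matrix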


The correlation heatmap reveals several relationships among the audio features of Taylor Swift's songs. There is a strong positive correlation between energy and loudness: her more energetic tracks tend to be louder, which matches musical intuition. Speechiness and acousticness are negatively correlated, so tracks with more spoken-word content tend to rely less on acoustic instrumentation. Valence (positivity) and energy show a moderate positive correlation, meaning that higher-energy songs tend to carry a more positive emotional tone. Finally, danceability and tempo are negatively correlated: faster songs tend to score lower on danceability, while slower tracks tend to score higher.
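
Reading the strongest relationships off a heatmap by eye is error-prone, so it can help to rank the feature pairs by absolute correlation. The following is a sketch under stated assumptions: `strongest_pairs` is a hypothetical helper, and the toy values are invented and are not the project's data.

```python
import numpy as np
import pandas as pd


def strongest_pairs(df, n=3):
    """Return the n feature pairs with the largest absolute Pearson correlation."""
    corr = df.corr()
    # Keep only the upper triangle (k=1 excludes the diagonal) so that each
    # pair appears once and self-correlations are dropped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Toy values, invented for illustration.
toy = pd.DataFrame({
    "energy": [0.2, 0.4, 0.6, 0.8, 0.9],
    "loudness": [-12.0, -10.0, -7.0, -5.0, -4.0],
    "acousticness": [0.9, 0.7, 0.4, 0.2, 0.3],
})
print(strongest_pairs(toy, n=2))
```

Applied to the notebook's `selected_df`, the same function would list the pairs discussed above together with their signed coefficients.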

In [65]:


# Assuming you already have the 'data' DataFrame loaded

# Select columns to exclude
exclude_columns = ['track_id', 'track_name', 'album_name', 'key_name', 'key', 'loudness', 'tempo',
                   'duration_ms', 'album_id', 'album_release_date', 'album_release_year', 'mode_name',
                   'key_mode', 'mode', 'time_signature', 'explicit']

# Calculate the mean of remaining columns
mean_track_features = data.drop(columns=exclude_columns).mean()

# Create a scatter polar plot from the computed means (the hard-coded list
# previously repeated the danceability value in the energy slot).
# The first feature is repeated at the end to close the polygon.
features = ['danceability', 'energy', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence']
r_values = mean_track_features[features].tolist()
fig = go.Figure(data=go.Scatterpolar(
    r=r_values + r_values[:1],
    theta=features + features[:1],
    fill='toself',
    fillcolor='rgba(212, 239, 223, 1)',
    marker=dict(color='rgba(40, 180, 99, 1)')
))

# Customize the layout
fig.update_layout(
    title="Music Characteristics of Taylor Swift",
    title_x=0.5,  # Set the title's horizontal position to the middle (0 to 1)
    title_y=0.95, 
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, 1]
        )
    ),
    showlegend=False
)

# Show the radar plot
fig.show()

# Save the radar plot to an HTML file
pio.write_html(fig, "radar_plotly.html")

# Display the HTML file (optional)
from IPython.display import IFrame
IFrame(src="radar_plotly.html", width="100%", height=600)


The radar plot of mean feature values summarizes Taylor Swift's musical profile for the albums in this dataset. Danceability (0.59) and energy (0.57) are the two highest-scoring features, both slightly above the midpoint of the 0 to 1 scale, which points to moderately danceable and energetic tracks on average. Tempo does not appear on the plot because it is measured in BPM rather than on a 0 to 1 scale. Mean valence is 0.38, below the midpoint, so the average emotional tone is more subdued than joyful.

Speechiness (0.06) and instrumentalness (0.01) are both very low, as expected for vocal-driven songs that are sung rather than spoken, and consistent with her emphasis on melodic storytelling and lyrics. Acousticness (0.29) is low to moderate, suggesting a blend of produced and acoustic elements, and liveness (0.15) is low, as expected for studio recordings. Taken together, these characteristics describe songs that are moderately energetic, centered on vocals, and emotionally restrained on average.

In [66]:


# Assuming you already have the 'data' DataFrame loaded

# Function to map energy and valence to emotion categories and names
def map_emotion(energy, valence):
    if energy >= 0.5 and valence >= 0.5:
        return "Happy/Joyful"
    elif energy < 0.5 and valence >= 0.5:
        return "Relaxed/Calm"
    elif energy >= 0.5 and valence < 0.5:
        return "Angry/Tense"
    else:
        return "Sad/Depressing"

# Add the 'emotion' column to the data frame
data['emotion'] = data.apply(lambda row: map_emotion(row['energy'], row['valence']), axis=1)

# Define custom color mapping for emotions
color_mapping = {
    "Happy/Joyful": "#1ABC9C",
    "Relaxed/Calm": "#A2D9CE",
    "Angry/Tense": "#148F77",
    "Sad/Depressing": "#0E6251"
}

# Create a scatter plot of energy vs valence with color-coded emotions
fig = px.scatter(data, x='energy', y='valence', color='emotion',
                 hover_data=['track_name', 'album_name', 'album_release_year'],
                 title='Taylor Swift Songs Emotion',
                 color_discrete_map=color_mapping)

# Customize marker size and opacity
fig.update_traces(marker=dict(size=10, opacity=0.7))

# Add horizontal and vertical dashed lines
fig.add_hline(y=0.5, line_dash="dash", line_color="black")
fig.add_vline(x=0.5, line_dash="dash", line_color="black")

# Center-align the title
fig.update_layout(
    title_x=0.5,  # Set the title's horizontal position to the middle (0 to 1)
)

# Show the scatter plot
fig.show()

# Save the emotion scatter plot to an HTML file
fig.write_html("emotion_plotly.html")

# Display the HTML file (optional)
IFrame(src="emotion_plotly.html", width="100%", height=600)


The emotion plot shows how Taylor Swift's songs are distributed across the four energy/valence quadrants. Most songs fall in the two high-energy quadrants, "Happy/Joyful" and "Angry/Tense," which is consistent with a median energy of 0.57 and with her catalog of catchy, energetic tracks. "Sad/Depressing" songs are also well represented, reflecting the more introspective side of her writing. While "Angry/Tense" songs are prominent in albums from every year in the dataset, 2022 stands out for the prevalence of "Sad/Depressing" songs in albums like "Midnights," suggesting a turn toward more introspective themes during that period. Note that these labels are a heuristic based on fixed thresholds of 0.5 for energy and valence, not a direct measurement of emotion.


A closer look at the breakdown of song counts by album and year reveals a pattern: each album moves through a range of emotions, while some albums emphasize specific emotional tones. This diversity lets her music reach a wide audience, whether through uplifting anthems, fiery expressions of emotion, or poignant reflections on life and love, and it is a defining feature of her songwriting.
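
A breakdown of song counts by album and emotion can be produced with a cross-tabulation. The sketch below is illustrative only: `label_emotions` is a hypothetical vectorised equivalent of the quadrant rule used for the emotion plot, and the album names and feature values are invented placeholders.

```python
import numpy as np
import pandas as pd


def label_emotions(df, threshold=0.5):
    """Vectorised version of the four-quadrant energy/valence rule."""
    high_energy = df["energy"] >= threshold
    high_valence = df["valence"] >= threshold
    conditions = [
        high_energy & high_valence,
        ~high_energy & high_valence,
        high_energy & ~high_valence,
    ]
    labels = ["Happy/Joyful", "Relaxed/Calm", "Angry/Tense"]
    # Rows matching none of the conditions (low energy, low valence)
    # receive the default label.
    return pd.Series(
        np.select(conditions, labels, default="Sad/Depressing"),
        index=df.index,
        name="emotion",
    )


# Toy rows with placeholder album names, invented for illustration.
toy = pd.DataFrame({
    "album_name": ["Album X", "Album X", "Album Y", "Album Y"],
    "energy": [0.7, 0.3, 0.8, 0.2],
    "valence": [0.6, 0.2, 0.3, 0.9],
})
toy["emotion"] = label_emotions(toy)

# Rows are albums, columns are emotion categories, cells are song counts.
counts = pd.crosstab(toy["album_name"], toy["emotion"])
print(counts)
```

On the real data, the same call with `data["album_name"]` (or `data["album_release_year"]`) as the row variable would give the per-album or per-year counts discussed above.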