## **Necessary Imports**

In [1]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

import warnings 
warnings.filterwarnings("ignore")

## **Dataset analysis**

In [2]:
spotify_data = pd.read_csv(r'/kaggle/input/top-spotify-songs-in-countries/spotify_history.csv')
spotify_data_cleaned = spotify_data.copy()

In [3]:
spotify_data.head()

Unnamed: 0,spotify_track_uri,ts,platform,ms_played,track_name,artist_name,album_name,reason_start,reason_end,shuffle,skipped
0,2J3n32GeLmMjwuAzyhcSNe,2013-07-08 02:44:34,web player,3185,"Say It, Just Say It",The Mowgli's,Waiting For The Dawn,autoplay,clickrow,False,False
1,1oHxIPqJyvAYHy0PVrDU98,2013-07-08 02:45:37,web player,61865,Drinking from the Bottle (feat. Tinie Tempah),Calvin Harris,18 Months,clickrow,clickrow,False,False
2,487OPlneJNni3NWC8SYqhW,2013-07-08 02:50:24,web player,285386,Born To Die,Lana Del Rey,Born To Die - The Paradise Edition,clickrow,unknown,False,False
3,5IyblF777jLZj1vGHG2UD3,2013-07-08 02:52:40,web player,134022,Off To The Races,Lana Del Rey,Born To Die - The Paradise Edition,trackdone,clickrow,False,False
4,0GgAAB0ZMllFhbNc3mAodO,2013-07-08 03:17:52,web player,0,Half Mast,Empire Of The Sun,Walking On A Dream,clickrow,nextbtn,False,False


In [4]:
spotify_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149860 entries, 0 to 149859
Data columns (total 11 columns):
 #   Column             Non-Null Count   Dtype 
---  ------             --------------   ----- 
 0   spotify_track_uri  149860 non-null  object
 1   ts                 149860 non-null  object
 2   platform           149860 non-null  object
 3   ms_played          149860 non-null  int64 
 4   track_name         149860 non-null  object
 5   artist_name        149860 non-null  object
 6   album_name         149860 non-null  object
 7   reason_start       149717 non-null  object
 8   reason_end         149743 non-null  object
 9   shuffle            149860 non-null  bool  
 10  skipped            149860 non-null  bool  
dtypes: bool(2), int64(1), object(8)
memory usage: 10.6+ MB


In [5]:
print(spotify_data.isnull().sum())

spotify_track_uri      0
ts                     0
platform               0
ms_played              0
track_name             0
artist_name            0
album_name             0
reason_start         143
reason_end           117
shuffle                0
skipped                0
dtype: int64


**Insights**
1. The timestamp (ts) column needs to be converted from object to datetime format.
2. A few rows have missing values in reason_start (143) and reason_end (117).
3. We can drop spotify_track_uri, as it adds little information for this analysis.
4. Rows where ms_played = 0 can be filtered out when analysing listening time.
5. We can create new time-based features (hour, day, weekday, month, year, date) from the timestamp column.

## **Cleaning the Dataset**

In [6]:
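# 1. Convert the timestamp column from object to datetime
spotify_data_cleaned['ts'] = pd.to_datetime(spotify_data_cleaned['ts'])

# 2. Keep a separate copy without zero-length plays; spotify_data_cleaned retains
#    them because the instant-skip analysis later filters on ms_played == 0
spotify_data_played = spotify_data_cleaned[spotify_data_cleaned['ms_played'] > 0].reset_index(drop=True)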

# 3. Create new time-based features
spotify_data_cleaned['hour'] = spotify_data_cleaned['ts'].dt.hour
spotify_data_cleaned['day'] = spotify_data_cleaned['ts'].dt.day
spotify_data_cleaned['weekday'] = spotify_data_cleaned['ts'].dt.day_name()
spotify_data_cleaned['month'] = spotify_data_cleaned['ts'].dt.month
spotify_data_cleaned['year'] = spotify_data_cleaned['ts'].dt.year
spotify_data_cleaned['date'] = spotify_data_cleaned['ts'].dt.date

# 4. Drop rows with a missing start or end reason
spotify_data_cleaned = spotify_data_cleaned.dropna(subset=['reason_start', 'reason_end']).reset_index(drop=True)


In [7]:
spotify_data_cleaned.head()

Unnamed: 0,spotify_track_uri,ts,platform,ms_played,track_name,artist_name,album_name,reason_start,reason_end,shuffle,skipped,hour,day,weekday,month,year,date
0,2J3n32GeLmMjwuAzyhcSNe,2013-07-08 02:44:34,web player,3185,"Say It, Just Say It",The Mowgli's,Waiting For The Dawn,autoplay,clickrow,False,False,2,8,Monday,7,2013,2013-07-08
1,1oHxIPqJyvAYHy0PVrDU98,2013-07-08 02:45:37,web player,61865,Drinking from the Bottle (feat. Tinie Tempah),Calvin Harris,18 Months,clickrow,clickrow,False,False,2,8,Monday,7,2013,2013-07-08
2,487OPlneJNni3NWC8SYqhW,2013-07-08 02:50:24,web player,285386,Born To Die,Lana Del Rey,Born To Die - The Paradise Edition,clickrow,unknown,False,False,2,8,Monday,7,2013,2013-07-08
3,5IyblF777jLZj1vGHG2UD3,2013-07-08 02:52:40,web player,134022,Off To The Races,Lana Del Rey,Born To Die - The Paradise Edition,trackdone,clickrow,False,False,2,8,Monday,7,2013,2013-07-08
4,0GgAAB0ZMllFhbNc3mAodO,2013-07-08 03:17:52,web player,0,Half Mast,Empire Of The Sun,Walking On A Dream,clickrow,nextbtn,False,False,3,8,Monday,7,2013,2013-07-08


In [8]:
spotify_data_cleaned.isnull().sum()

spotify_track_uri    0
ts                   0
platform             0
ms_played            0
track_name           0
artist_name          0
album_name           0
reason_start         0
reason_end           0
shuffle              0
skipped              0
hour                 0
day                  0
weekday              0
month                0
year                 0
date                 0
dtype: int64

# **EDA**

## **Track-Level Insights**

In [9]:
print('-' * 90)
avg_listen_time_per_track = (spotify_data_cleaned.groupby('track_name')['ms_played'].mean().sort_values(ascending=False) / 60000).reset_index(name='avg_minutes_played')
num_unique_tracks = spotify_data_cleaned['track_name'].nunique()
num_unique_artist = spotify_data_cleaned['artist_name'].nunique()
num_unique_album = spotify_data_cleaned['album_name'].nunique()

# Each row is one play, so the row count is the number of plays, not of distinct tracks
print(f'Total Plays: {len(spotify_data_cleaned)}')
print(f'Unique Artists: {num_unique_artist}')
print(f'Unique Albums: {num_unique_album}')
print(f'Unique Tracks: {num_unique_tracks}')
# Mean of the per-track averages (summing them would add one average per track)
print(f'Average play time per track: {avg_listen_time_per_track["avg_minutes_played"].mean():.1f} minutes')
print(f"Platforms Used: {spotify_data_cleaned['platform'].unique()}")
print('-' * 90)

------------------------------------------------------------------------------------------
Total Plays: 149648
Unique Artists: 4105
Unique Albums: 7934
Unique Tracks: 13801
Average play time per track: 2.4 minutes
Platforms Used: ['web player' 'windows' 'android' 'iOS' 'cast to device' 'mac']
------------------------------------------------------------------------------------------
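
As a quick consistency check on the cleaning step, the row count printed above can be reconciled with the null counts from the earlier `isnull().sum()` output. The sketch below uses only figures already shown in this notebook; the variable names are illustrative.

```python
# Figures copied from the outputs above.
rows_before = 149_860       # RangeIndex size reported by info()
rows_after = 149_648        # len(spotify_data_cleaned) after dropna
null_reason_start = 143
null_reason_end = 117

# Rows removed by dropna(subset=['reason_start', 'reason_end']).
rows_dropped = rows_before - rows_after

# Inclusion-exclusion: rows missing both reasons are counted twice in the null totals.
rows_missing_both = null_reason_start + null_reason_end - rows_dropped

print(f"Rows dropped: {rows_dropped}")
print(f"Rows missing both reasons: {rows_missing_both}")
```

The number of dropped rows is consistent with the null filter alone, which fits the `head()` output above where a row with ms_played = 0 is still present in spotify_data_cleaned.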


### 1. Most played tracks

In [10]:
top_tracks = (
    spotify_data_cleaned.groupby(['track_name', 'artist_name'], as_index=False)['ms_played']
    .sum()
    .sort_values('ms_played', ascending=False)
    .head(10)
)

top_tracks['short_track'] = top_tracks['track_name'].apply(lambda x: x[:20] + '...' if len(x) > 20 else x)
top_tracks['track_display'] = top_tracks['track_name'] + ' - ' + top_tracks['artist_name']

fig = px.bar(
    data_frame=top_tracks,
    x='short_track',
    y='ms_played',
    hover_name='track_display',
    title='Top 10 Most Played Tracks',
    color='ms_played',
    text='ms_played',
    labels={'track_display': 'Track - Artist', 'ms_played': 'Total Milliseconds Played'}
)

fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(
    xaxis_title='Tracks',
    yaxis_title='MS Played',
    template='plotly_dark',
    xaxis_tickangle=-45,
    height=600
)

fig.show(renderer='iframe')


Franchise Power: <br>
* Multiple tracks from The Lord of the Rings suggest strong fan engagement with cinematic soundtracks.

Diverse Appeal: <br>
* Mix of genres — from rock to Latin to instrumental — shows varied user tastes.

Top 2 Dominate: <br>
* The top two tracks alone account for over 130M milliseconds, a significant chunk of total playtime.
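
Milliseconds are hard to read at this scale, so it can help to restate the figure in hours. A minimal sketch, taking the 130M ms lower bound quoted above as input (the helper name is illustrative):

```python
MS_PER_HOUR = 60 * 60 * 1000


def ms_to_hours(ms: float) -> float:
    """Convert a duration in milliseconds to hours."""
    return ms / MS_PER_HOUR


# Lower bound quoted above for the top two tracks combined.
top_two_ms = 130_000_000
top_two_hours = ms_to_hours(top_two_ms)
print(f"{top_two_hours:.1f} hours")
```

The same division could be applied to the `ms_played` column before plotting so the y-axis reads in hours rather than milliseconds.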

### 2. Longest Average Listening Time per Track

In [11]:
avg_listen_time_per_track = (
    spotify_data_cleaned.groupby(['track_name', 'artist_name'], as_index=False)['ms_played']
    .mean()
    .sort_values(by='ms_played', ascending=False)
)
avg_listen_time_per_track['avg_minutes_played'] = avg_listen_time_per_track['ms_played'] / 60000

avg_listen_time_per_track['short_track'] = avg_listen_time_per_track['track_name'].apply(lambda x: x[:25] + '...' if len(x) > 28 else x)
avg_listen_time_per_track['track_display'] = avg_listen_time_per_track['track_name'] + ' - ' + avg_listen_time_per_track['artist_name']

top_avg_tracks = avg_listen_time_per_track.head(10)

fig = px.bar(
    data_frame=top_avg_tracks,
    x='short_track',
    y='avg_minutes_played',
    hover_name='track_display',
    title='Average Listening Time Per Track',
    color='avg_minutes_played',
    text='avg_minutes_played',
    labels={'short_track': 'Track', 'avg_minutes_played': 'Avg Minutes Played'}
)

fig.update_traces(texttemplate='%{text:.2f}', textposition='outside')
fig.update_layout(
    xaxis_title='Tracks',
    yaxis_title='Avg Minutes Played',
    template='plotly_dark',
    xaxis_tickangle=-45,
    height=600
)

fig.show(renderer='iframe')


Long-Form Tracks Win: <br>
* Extended compositions like “Tubular Bells” and “Terrapin Station” hold attention longer.

Live & Classical Performances Engage: <br>
* Tracks with rich instrumentation or live energy show strong retention.

Emotional & Narrative Depth Matters: <br>
* Songs like “We Three” and “Mortal Man” suggest listeners value storytelling.
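
One caveat with ranking by average listening time: a track played once in full will outrank a track played many times with a few early stops. A minimum play count is one way to guard against that. The sketch below uses invented rows and an assumed threshold (`MIN_PLAYS`), not values from the dataset.

```python
from collections import defaultdict

# Hypothetical (track_name, ms_played) rows, invented for illustration.
plays = [
    ("Long Suite", 1_500_000),   # played once, in full
    ("Pop Single", 200_000),
    ("Pop Single", 190_000),
    ("Pop Single", 30_000),      # stopped early
]

MIN_PLAYS = 3  # assumed threshold; tune to the dataset


def avg_minutes_by_track(rows, min_plays=1):
    """Average minutes per play for tracks with at least `min_plays` plays."""
    totals = defaultdict(lambda: [0, 0])  # track -> [total_ms, play_count]
    for track, ms in rows:
        totals[track][0] += ms
        totals[track][1] += 1
    return {
        track: total_ms / count / 60_000
        for track, (total_ms, count) in totals.items()
        if count >= min_plays
    }


all_tracks = avg_minutes_by_track(plays)
frequent_tracks = avg_minutes_by_track(plays, MIN_PLAYS)
print(all_tracks)
print(frequent_tracks)
```

In pandas, the same filter could be expressed by aggregating `ms_played` with both `mean` and `count` and keeping rows whose count meets the threshold.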

## **Time-based Usage**

### 1.  Streaming Activity by Hour

In [12]:
hourly_minutes = (
    spotify_data_cleaned.groupby('hour')['ms_played']
    .sum()
    .reset_index(name='total_ms_played')
)

hourly_minutes['minutes_played'] = hourly_minutes['total_ms_played'] / 60000

fig = px.line(
    hourly_minutes,
    x='hour',
    y='minutes_played',
    title='Total Listening Time by Hour of Day',
    labels={'hour': 'Hour of Day', 'minutes_played': 'Total Minutes Played'},
    template='plotly_dark',
)

fig.update_traces(mode='lines+markers', text=hourly_minutes['minutes_played'].round(1), textposition='top center')

fig.update_layout(xaxis=dict(dtick=1), height=500)
fig.show(renderer='iframe')


* Peak Hours:<br>
Listening sharply increases between 16:00 and 19:00, with 17:00 and 18:00 showing the highest activity.

---

* Late Night Activity:<br>
Surprisingly, the early hours (00:00–03:00) also show relatively high playtime — likely from night owls or sleep/ambient sessions.

---

* Low Engagement Period:<br>
Drops significantly between 08:00–12:00, possibly due to school/work hours.
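
The hour feature is read directly from ts, so these peaks are only as meaningful as the timezone of the timestamps. If ts is stored in UTC (an assumption; the notebook does not state the timezone), the hours can be shifted to the listener's local time before grouping. A minimal sketch with a hypothetical UTC-5 offset:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical listener offset; replace with the real one if it is known.
ASSUMED_OFFSET = timezone(timedelta(hours=-5))


def local_hour(ts: str, tz: timezone = ASSUMED_OFFSET) -> int:
    """Interpret `ts` as UTC and return the hour of day in timezone `tz`."""
    utc_time = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(tz).hour


# First timestamp from the head() output above.
print(local_hour("2013-07-08 02:44:34"))
```

Under that assumed offset, a play logged at 02:44 would fall in the evening rather than the early morning, which would change how the late-night activity is read. In pandas, `dt.tz_localize('UTC')` followed by `dt.tz_convert(...)` performs the same shift on the whole column.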

### 2.  Streaming Activity by Day of Week

In [13]:
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

spotify_data_cleaned['weekday'] = pd.Categorical(spotify_data_cleaned['weekday'], categories=weekday_order, ordered=True)
weekday_activity = spotify_data_cleaned.groupby('weekday', observed=False).size().reset_index(name='play_count')

fig = px.line(
    weekday_activity,
    x='weekday',
    y='play_count',
    title='Streaming Activity by Day of Week',
    template='plotly_dark'
)

fig.update_traces(mode='lines+markers')
fig.show(renderer='iframe')

Friday Peak:<br>
* Friday has the highest streaming activity, suggesting users are more engaged heading into the weekend — perhaps due to free time, parties, or winding down the week.

Weekend Drop-off:<br>
Saturday and Sunday show the lowest activity. This may seem counterintuitive, but it might indicate:

* Less structured routine = less music through the app (e.g., more socializing, travel).

* Users switching to other platforms or devices not tracked.

Midweek Consistency:<br>
* Monday through Thursday are fairly balanced, with a small dip on Tuesday and a bump on Wednesday.

### 3.  Streaming Activity by Month

In [14]:
monthly_activity = spotify_data_cleaned.groupby('month').size().reset_index(name='play_count')

fig = px.line(
    monthly_activity,
    x='month',
    y='play_count',
    title='Streaming Activity by Month',
    template='plotly_dark',
)

fig.update_traces(mode='lines+markers')
fig.show(renderer='iframe')

August & September Peaks

Streaming activity hits its highest in **September**, with **August** close behind. This might be due to:

- Back-to-school routines boosting structured habits  
- Fewer holidays = more consistent listening patterns  
- Possible surge in new music releases during this period  

---

February Dip

**February** shows the **lowest activity** overall, likely influenced by:

- Fewer days in the month (only 28 or 29)  
- Post-holiday fatigue or reset periods affecting engagement  
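
To separate the "fewer days" effect from a real drop in listening, monthly counts can be divided by the length of each month. The sketch below assumes the history covers each calendar month about equally often and uses invented counts; `DAYS_IN_MONTH` and `plays_per_day` are illustrative names.

```python
# Average month lengths in days; February uses 28.25 to account for leap years.
DAYS_IN_MONTH = {
    1: 31, 2: 28.25, 3: 31, 4: 30, 5: 31, 6: 30,
    7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31,
}


def plays_per_day(monthly_counts):
    """Divide each month's play count by its average length in days."""
    return {
        month: count / DAYS_IN_MONTH[month]
        for month, count in monthly_counts.items()
    }


# Hypothetical counts for a listener with a constant 100 plays per day.
example_counts = {1: 3100, 2: 2825, 3: 3100}
normalised = plays_per_day(example_counts)
print(normalised)
```

In this made-up case February has the lowest raw count even though the daily rate never changes, so a February dip in the real chart is worth re-checking on a per-day basis.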

---

Summer Climb (May → August)

A **gradual rise** in streaming from **May through August**, potentially linked to:

- School and college breaks creating more free time  
- Increased travel and leisure fueling playlist consumption  

---

Slight Year-End Decline

A **dip in November and December** may stem from:

- Holiday distractions pulling attention away  
- Switching to offline listening, smart speakers, or alternative platforms  


## **Behaviour Pattern**

### **Skip analysis**

#### 1. Skipped vs Non-Skipped ratio

In [15]:
skipped_counts = spotify_data_cleaned['skipped'].value_counts().reset_index()
skipped_counts.columns = ['skipped', 'Count']

fig = px.pie(
    skipped_counts,
    values='Count',
    names='skipped',
    title='Skipped vs Non-skipped Tracks',
    hole=0.4,
    template='plotly_dark',
    color_discrete_sequence=["#1ED62A", "#DD5907"],
    height=500,
    width=500
)

fig.show(renderer='iframe')

Non-skipped Tracks (false) — **94.8%**

* A vast majority of tracks are listened to without being skipped.

Skipped Tracks (true) — **5.18%**

* A relatively low skip rate, which is a good sign.

#### 2. Number of Skips per Song

In [16]:
top_skipped_tracks = (
    spotify_data_cleaned[spotify_data_cleaned['skipped']]
    .groupby(['track_name', 'artist_name'], as_index=False)
    .size()
    .sort_values(by='size', ascending=False)
    .rename(columns={'size': 'skip_count'})
)

top_skipped_tracks['track_display'] = top_skipped_tracks['track_name'] + ' - ' + top_skipped_tracks['artist_name']
top_skipped_tracks['short_track'] = top_skipped_tracks['track_name'].apply(lambda x: x[:25] + '...' if len(x) > 28 else x)

fig = px.bar(
    top_skipped_tracks.head(10),
    x='short_track',
    y='skip_count',
    text='skip_count',
    hover_name='track_display',
    title='Top Skipped Tracks',
    template='plotly_dark',
    labels={'short_track': 'Track', 'skip_count': 'Skips'},
    color='skip_count'
)

fig.update_traces(textposition='outside')
fig.update_layout(xaxis_tickangle=-45, height=600)
fig.show(renderer='iframe')


* **Paraiso** stands out with a significantly higher number of skips.

* Most other tracks cluster between 14–21 skips.

#### 3. Artists with Highest Skip Ratio

In [17]:
total_plays_per_artist = spotify_data_cleaned.groupby('artist_name').size().reset_index(name='total_plays')

skips_per_artist = spotify_data_cleaned[spotify_data_cleaned['skipped']] \
    .groupby('artist_name').size().reset_index(name='skips')

artist_skip_ratio = pd.merge(skips_per_artist, total_plays_per_artist, on='artist_name')
artist_skip_ratio['skip_ratio'] = artist_skip_ratio['skips'] / artist_skip_ratio['total_plays']

# Top 10 by skip ratio
top_skip_ratio = artist_skip_ratio[artist_skip_ratio['total_plays'] >= 30] \
    .sort_values('skip_ratio', ascending=False).head(10)


In [18]:
fig = px.bar(
    top_skip_ratio,
    x='artist_name',
    y='skip_ratio',
    title='Artists with Highest Skip Ratio',
    template='plotly_dark',
    text='skip_ratio',
    labels={'artist_name': 'Artist', 'skip_ratio': 'Skip Ratio'},
    color='skip_ratio',
    height=600
)
fig.update_traces(texttemplate='%{text:.2%}', textposition='outside')
fig.update_layout(xaxis_tickangle=-45)
fig.show(renderer='iframe')


* **Dvicio** and **The Notorious B.I.G.** have extremely high skip rates (>80%), indicating that most plays are skipped quickly. These could be accidental plays, playlist fillers, or a mismatch with the listener's taste.

* **One Direction** also has a notable skip rate, suggesting inconsistent interest across their tracks.

* Artists like **Kygo** and **Stevie Wonder** appear lower on the list with ~35% skip ratios, which is relatively moderate.

#### 4. Tracks Frequently Clicked but Immediately Skipped

In [19]:
clicked_but_skipped = spotify_data_cleaned[
    (spotify_data_cleaned['reason_start'] == 'clickrow') &
    (spotify_data_cleaned['ms_played'] == 0)
]

top_clicked_skipped = (
    clicked_but_skipped.groupby(['track_name', 'artist_name'], as_index=False)
    .size()
    .sort_values(by='size', ascending=False)
    .rename(columns={'size': 'immediate_skips'})
)

top_clicked_skipped['track_display'] = top_clicked_skipped['track_name'] + ' - ' + top_clicked_skipped['artist_name']
top_clicked_skipped['short_track'] = top_clicked_skipped['track_name'].apply(lambda x: x[:25] + '...' if len(x) > 28 else x)

fig = px.bar(
    top_clicked_skipped.head(10),
    x='short_track',
    y='immediate_skips',
    text='immediate_skips',
    hover_name='track_display',
    title='Tracks Clicked but Skipped Instantly',
    template='plotly_dark',
    color='immediate_skips',
    labels={'short_track': 'Track', 'immediate_skips': 'Instant Skips'}
)

fig.update_traces(textposition='outside')
fig.update_layout(xaxis_tickangle=-45, height=600)
fig.show(renderer='iframe')


## **Reason Analysis**

### 1. Distribution of reason_start and reason_end

In [20]:
# Frequency of reason_start
start_reason_counts = spotify_data_cleaned['reason_start'].value_counts().reset_index()
start_reason_counts.columns = ['reason_start', 'count']

fig1 = px.bar(
    start_reason_counts,
    x='reason_start',
    y='count',
    title='Frequency of reason_start',
    text='count',
    template='plotly_dark',
    color='count',
    height=540
)
fig1.update_traces(textposition='outside')
fig1.update_layout(xaxis_tickangle=-45)
fig1.show(renderer='iframe')

# Frequency of reason_end
end_reason_counts = spotify_data_cleaned['reason_end'].value_counts().reset_index()
end_reason_counts.columns = ['reason_end', 'count']

fig2 = px.bar(
    end_reason_counts,
    x='reason_end',
    y='count',
    title='Frequency of reason_end',
    text='count',
    template='plotly_dark',
    color='count',
    height=580
)
fig2.update_traces(textposition='outside')
fig2.update_layout(xaxis_tickangle=-45)
fig2.show(renderer='iframe')


#### **Reason Start**
Majority of Tracks Start Automatically (trackdone – **76.5k**): <br>

* Most playback begins when the previous track finishes.

* Indicates users often listen in a continuous flow (playlists, albums, or radio).

High Manual Skipping (fwdbtn – **53.7k**): <br>

* A significant portion of listening involves users hitting “Next.”

* Suggests low listener engagement with many tracks or exploratory behavior.

Direct Click Plays (clickrow – **11.2k**): <br>

* Tracks were explicitly chosen by users.

#### **Reason End**
trackdone:<br>
* Most frequent reason with **77,110** occurrences. Likely indicates users completing a track or activity.

fwdbtn:<br>
* Second highest with **53,462** occurrences. Suggests users actively navigating forward.

endplay:<br>
* Third with **10,116** occurrences. Possibly users manually stopping playback.

User Intent: <br>
* High counts for trackdone and fwdbtn suggest intentional and goal-oriented user behavior.

Experience Gaps: <br>
* Unexpected exits (paused or otherwise) may point to technical issues or user frustration.

Rare Events:<br>
* Low-frequency reasons like trackerror or popup might be edge cases but are worth monitoring for anomalies.


### 2. Start reason most associated with skips

In [21]:
skip_rate_by_reason_start = (
    spotify_data_cleaned.groupby('reason_start')['skipped']
    .mean()
    .reset_index()
    .rename(columns={'skipped': 'skip_rate'})
    .sort_values(by='skip_rate', ascending=False)
)

fig = px.bar(
    skip_rate_by_reason_start,
    x='reason_start',
    y='skip_rate',
    text=skip_rate_by_reason_start['skip_rate'].apply(lambda x: f'{x:.2%}'),
    title='Skip Rate by reason_start',
    template='plotly_dark',
    color='skip_rate',
    labels={'reason_start': 'Start Reason', 'skip_rate': 'Skip Rate'},
    height=500
)

fig.update_traces(textposition='outside')
fig.update_layout(xaxis_tickangle=-45)
fig.show(renderer='iframe')


Popups Are Problematic:<br>
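* With a 100% skip rate, popups may be disrupting user flow, though the rate should be read alongside the number of popup-started plays in the reason_start chart above. Consider redesigning or removing them.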

Autoplay & Endplay Work Well:<br>
* These passive or completion-based starts retain users effectively.

Manual Actions Vary:<br>
* clickrow, fwdbtn, and backbtn show moderate skip rates — suggesting room for UX improvement.

## **Platform Usage**

### 1. Number of Sessions by Platform

In [22]:
platform_counts = spotify_data_cleaned['platform'].value_counts().reset_index()
platform_counts.columns = ['platform', 'session_count']

fig1 = px.bar(
    platform_counts,
    x='platform',
    y='session_count',
    text='session_count',
    title='Number of Sessions by Platform',
    template='plotly_dark',
    color='session_count',
    height=500
)
fig1.update_traces(textposition='outside')
fig1.update_layout(xaxis_tickangle=-45)
fig1.show(renderer='iframe')


Android Is King: <br>
* The overwhelming session count suggests Android should be the focus for optimization, feature rollout, and testing.

iOS Opportunity: <br>
* If parity in features exists, marketing or UX tweaks could boost iOS engagement.

Web Player Underperformance: <br>
* With only 225 sessions, it may need better visibility or functionality improvements.

Cross-Device Potential: <br>
* “Cast to Device” shows users are interested in flexible playback — a good area to expand.

### 2. Skip Rate per Platform

In [23]:
skip_rate_by_platform = (
    spotify_data_cleaned.groupby('platform')['skipped']
    .mean()
    .reset_index()
    .rename(columns={'skipped': 'skip_rate'})
    .sort_values(by='skip_rate', ascending=False)
)

fig2 = px.bar(
    skip_rate_by_platform,
    x='platform',
    y='skip_rate',
    text=skip_rate_by_platform['skip_rate'].apply(lambda x: f'{x:.2%}'),
    title='Skip Rate by Platform',
    color='skip_rate',
    template='plotly_dark',
    labels={'skip_rate': 'Skip Rate'},
    height=500
)
fig2.update_traces(textposition='outside')
fig2.update_layout(xaxis_tickangle=-45)
fig2.show(renderer='iframe')


Windows Needs Attention: 
* The highest skip rate suggests potential friction — investigate UI, performance, or content relevance.

Apple Ecosystem Consistency: 
* iOS and Mac show similar skip behavior, hinting at shared UX patterns.

Android Strength: 
* Low skip rate aligns with its high session count — a well-performing platform.

Casting Wins: 
* Zero skips imply casting is a deliberate and satisfying experience.

Web Player Curiosity:
* Zero skips may be due to low usage or highly targeted sessions — worth exploring further.
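
A zero skip rate on a small sample leaves room for a non-zero underlying rate. One way to quantify that is a Wilson score interval, sketched below for 0 skips out of the 225 web player sessions mentioned earlier. The 95% level (`z = 1.96`) and the treatment of plays as independent trials are assumptions of this sketch.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# 0 skips out of the 225 web player sessions mentioned above.
low, high = wilson_interval(0, 225)
print(f"Plausible skip rate: {low:.1%} to {high:.1%}")
```

Comparing the upper bound with the overall skip rate from the pie chart gives a rough sense of whether low usage alone could explain the zero; the same function can be applied to the cast-to-device counts.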

## **Shuffle Analysis**

### 1. Average Playtime By Shuffle Mode

In [24]:
avg_playtime_shuffle = (
    spotify_data_cleaned.groupby('shuffle')['ms_played']
    .mean()
    .reset_index()
)

avg_playtime_shuffle['minutes_played'] = avg_playtime_shuffle['ms_played'] / 60000

fig1 = px.pie(
    avg_playtime_shuffle,
    values='minutes_played',
    names='shuffle',
    title='Average Playtime (in Minutes) by Shuffle Mode',
    color_discrete_sequence=["#1ED62A", "#DD5907"],
    template='plotly_dark',
    height=500,
    width=500
)

fig1.update_traces(textinfo='label+percent+value', texttemplate='%{label}: %{value:.2f} min')
fig1.show(renderer='iframe')


Shuffle Reduces Engagement: 
* Users spend significantly less time when shuffle is enabled — possibly due to lack of control or mismatch in content.

Intentional Listening Wins: 
* Non-shuffle sessions suggest users are more invested in specific tracks or sequences.

### 2. Skip Rate By Shuffle Mode

In [25]:
skip_rate_shuffle = (
    spotify_data_cleaned.groupby('shuffle')['skipped']
    .mean()
    .reset_index()
    .rename(columns={'skipped': 'skip_rate'})
)

fig = px.pie(
    skip_rate_shuffle,
    values='skip_rate',
    names='shuffle',
    title='Skip Rate by Shuffle Mode',
    color_discrete_sequence=["#1ED62A", "#DD5907"],
    template='plotly_dark',
    height=500,
    width=500
)

fig.update_traces(textinfo='percent+label')
fig.update_layout(showlegend=True)
fig.show(renderer='iframe')

Shuffle May Disrupt Flow: 
* The higher skip rate suggests that randomized playback might not align with user expectations or mood.

Sequential Listening Is Stickier: 
* Users appear more committed when content follows a set order.