## 📊 Average Session Length by Smart Shuffle Usage

This bar chart shows how session length differs between users who had Smart Shuffle enabled vs. those who didn’t. We're looking to see if the feature drives higher engagement (longer listening sessions).

In [3]:
import pandas as pd

# Load the mock dataset
df = pd.read_csv('./data/mock_smart_shuffle_data.csv')

# Preview the first five rows
df.head()

,user_id,region,is_premium,smart_shuffle_enabled,session_length,skips,timestamp
0,25795,North America,False,True,35.712309,3,2025-04-01 00:00:00
1,10860,Latin America,False,False,31.186791,5,2025-04-01 01:00:00
2,86820,North America,False,True,39.954116,1,2025-04-01 02:00:00
3,64886,North America,True,True,25.857079,1,2025-04-01 03:00:00
4,16265,Asia,True,False,36.473327,3,2025-04-01 04:00:00


In [4]:
df.isnull().sum()

user_id                  0
region                   0
is_premium               0
smart_shuffle_enabled    0
session_length           0
skips                    0
timestamp                0
dtype: int64

In [5]:
df.session_length.mean()

32.39731591353088

In [7]:
df.shape

(2000, 7)

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(8, 5))
# Bar height is the mean session length for each group
sns.barplot(x='smart_shuffle_enabled', y='session_length', data=df)

plt.title('Average Length of Session by Smart Shuffle Usage')
plt.xlabel('Smart Shuffle Enabled')
plt.ylabel('Session Length (min)')
plt.xticks([0, 1], ['No', 'Yes'])
plt.tight_layout()
plt.show()

# Insight: Session Length by Smart Shuffle Usage

The bar chart shows a difference in average session length: users with Smart Shuffle enabled average about 35 minutes per session, compared with about 30 minutes for users without it. One possible explanation is that Smart Shuffle surfaces songs that users find more engaging, and users who like what they hear are more likely to keep listening.
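To check the group averages read off the chart, it may help to compute them directly and run the same kind of Welch's t-test on session length. The sketch below is one way to do that; the helper names `summarize_by_flag` and `welch_t_test` are hypothetical, and the toy rows are made up for illustration rather than taken from the dataset.

```python
import pandas as pd
from scipy import stats


def summarize_by_flag(data: pd.DataFrame, flag: str, metric: str) -> pd.DataFrame:
    """Return mean, standard deviation, and row count of `metric` per value of `flag`."""
    return data.groupby(flag)[metric].agg(['mean', 'std', 'count'])


def welch_t_test(data: pd.DataFrame, flag: str, metric: str):
    """Welch's t-test (unequal variances) of `metric` between flag=True and flag=False rows.

    Assumes `flag` is a boolean column.
    """
    enabled = data.loc[data[flag], metric]
    disabled = data.loc[~data[flag], metric]
    return stats.ttest_ind(enabled, disabled, equal_var=False)


# Toy rows for illustration only; these are NOT from mock_smart_shuffle_data.csv.
toy = pd.DataFrame({
    'smart_shuffle_enabled': [True, True, True, False, False, False],
    'session_length': [36.0, 34.0, 35.0, 31.0, 29.0, 30.0],
})

summary = summarize_by_flag(toy, 'smart_shuffle_enabled', 'session_length')
t_stat, p_value = welch_t_test(toy, 'smart_shuffle_enabled', 'session_length')
print(summary)
print("T-statistic: ", t_stat)
print("P-value: ", p_value)
```

On the notebook's data, the same helpers could be called as `summarize_by_flag(df, 'smart_shuffle_enabled', 'session_length')` and `welch_t_test(df, 'smart_shuffle_enabled', 'session_length')`.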

In [None]:

region_df = df.groupby('region')['smart_shuffle_enabled'].mean().reset_index()

plt.figure(figsize=(10, 6))
sns.barplot(x='region', y='smart_shuffle_enabled', data=region_df)

plt.title('Smart Shuffle Adoption by Region')
plt.xlabel('Region')
plt.ylabel('Share of Users with Smart Shuffle Enabled')
plt.ylim(0, 1)  # the group mean of a boolean column is a proportion between 0 and 1
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Insight: Smart Shuffle Adoption by Region

The chart shows no meaningful difference in Smart Shuffle adoption across regions. In all 5 regions, about 50% of users have Smart Shuffle enabled.
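The "no difference across regions" reading can also be checked numerically. One option, sketched below, is a chi-square test of independence on the region-by-adoption counts using `scipy.stats.chi2_contingency`. The helper name `adoption_by_group` is hypothetical and the toy rows are invented for illustration; they are far too few for a meaningful test and only show the shape of the output.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def adoption_by_group(data: pd.DataFrame, group: str, flag: str):
    """Return per-group adoption rates plus a chi-square test of independence."""
    # Rows are groups, columns are the flag values (False, True).
    counts = pd.crosstab(data[group], data[flag])
    # Divide each row by its total to get the share of users per flag value.
    rates = counts.div(counts.sum(axis=1), axis=0)
    chi2, p_value, dof, _expected = chi2_contingency(counts)
    return rates, chi2, p_value, dof


# Toy rows for illustration only; these are NOT from mock_smart_shuffle_data.csv.
toy = pd.DataFrame({
    'region': ['Asia'] * 4 + ['North America'] * 4,
    'smart_shuffle_enabled': [True, True, False, False] * 2,
})

rates, chi2, p_value, dof = adoption_by_group(toy, 'region', 'smart_shuffle_enabled')
print(rates)
print("Chi-square: ", chi2, "P-value: ", p_value, "Degrees of freedom: ", dof)
```

On the notebook's data this would be called as `adoption_by_group(df, 'region', 'smart_shuffle_enabled')`. A large p-value would be consistent with adoption being similar across regions.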

In [None]:
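plt.figure(figsize=(10, 6))
sns.barplot(x='smart_shuffle_enabled', y='skips', data=df)

plt.title('Average Skips per Session by Smart Shuffle Usage')
plt.xlabel('Smart Shuffle Enabled')
plt.ylabel('Skips per Session')
plt.tight_layout()
plt.show()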

# Insight: Average Skips by Smart Shuffle Usage

The bar chart above shows a difference in average skips: users with Smart Shuffle enabled average about 2 skips per session, while users without it average about 3. This suggests Smart Shuffle can improve satisfaction with the songs played during a session, since users may skip fewer songs when the music is tailored to their liking.

# T-Test

In [15]:
from scipy import stats

In [16]:
df.columns

Index(['user_id', 'region', 'is_premium', 'smart_shuffle_enabled',
       'session_length', 'skips', 'timestamp'],
      dtype='object')

In [17]:
enabled_skips = df[df['smart_shuffle_enabled'] == 1]['skips']
disabled_skips = df[df['smart_shuffle_enabled'] == 0]['skips']

In [18]:
# Welch's t-test: equal_var=False does not assume the two groups share a variance
t_stat, p_value = stats.ttest_ind(enabled_skips, disabled_skips, equal_var=False)
print("T-statistic: ", t_stat)
print("P-value: ", p_value)

T-statistic:  -8.843118069936338
P-value:  2.0014527241221168e-18


### Insight: Results from the T-Test
Users with Smart Shuffle enabled skip fewer songs on average than users without it, and the negative t-statistic (about -8.84) reflects that direction.
The p-value (about 2.0e-18) is far below 0.05, so there is strong statistical evidence that the difference in mean skips is not due to chance. This supports the idea that Smart Shuffle reduces the average number of skips per session, likely improving user satisfaction with song recommendations.
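A very small p-value says the difference is unlikely to be chance, but not how large it is. One common way to describe the size is Cohen's d, the difference in means divided by the pooled standard deviation. The sketch below is a minimal version; `cohens_d` is a hypothetical helper written for this notebook, not a SciPy function, and the toy values are made up for illustration.

```python
import numpy as np


def cohens_d(a, b) -> float:
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


# Toy values for illustration only; these are NOT from mock_smart_shuffle_data.csv.
toy_enabled = [1, 2, 3]
toy_disabled = [2, 3, 4]

d = cohens_d(toy_enabled, toy_disabled)
print("Cohen's d: ", d)
```

On the notebook's data this would be `cohens_d(enabled_skips, disabled_skips)`. A negative value means the enabled group skips less, matching the sign of the t-statistic.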