# Spotify Wrapped Dupe

### Outline
Wrapped has gotten kinda mid, and I realized it would be pretty easy to recreate, so that's what I'm going to do.

### Want to find...
**Songs**
- Total minutes streamed
- Number of songs streamed
- Times top song was streamed
- Top 5 songs

**Albums**
- Total number of albums
- Minutes top album was streamed
- Top 5 albums

**Artists**
- Total number of artists streamed
- Top 5 artists
- Minutes top artist was streamed

Wrapped also gave some other weird insights this year that nobody really knew what to do with (listening age, clubs?) so we're gonna leave that out for now.

Genres would be cool to look at too, but I'd need to connect to an API for that, so I'll think about it at a later date.

## Setup

In [8]:
import pandas as pd

df = pd.read_csv("streams_raw.csv")
df["ts"] = pd.to_datetime(df["ts"])

df.info()

print("")
print("max date:")
print(max(df['ts']))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365072 entries, 0 to 365071
Data columns (total 16 columns):
 #   Column             Non-Null Count   Dtype              
---  ------             --------------   -----              
 0   ts                 365072 non-null  datetime64[ns, UTC]
 1   platform           365072 non-null  object             
 2   s_played           365072 non-null  float64            
 3   conn_country       365072 non-null  object             
 4   ip_addr            365072 non-null  object             
 5   track              363920 non-null  object             
 6   artist             363920 non-null  object             
 7   album              363920 non-null  object             
 8   spotify_track_uri  363920 non-null  object             
 9   reason_start       364924 non-null  object             
 10  reason_end         364932 non-null  object             
 11  shuffle            365072 non-null  bool               
 12  skipped            365072 non-

We only want one year of data, so I'm gonna take Aug 1 2024 - July 31 2025 bc that's what I have lol.

In [2]:
start_date = "2024-08-01"
end_date = "2025-08-01"  # exclusive, so the last day included is July 31

filtered = df.loc[
    (df['ts'] >= start_date) &
    (df['ts'] < end_date)].copy()

filtered['ts'].min(), filtered['ts'].max(), len(filtered)

(Timestamp('2024-08-01 01:10:57+0000', tz='UTC'),
 Timestamp('2025-07-31 21:43:38+0000', tz='UTC'),
 38140)
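
A quick sanity check on the window edges, using a made-up toy dataframe (these timestamps are hypothetical, not from my export). It shows why the upper bound matters: with `<=`, a play at exactly midnight on Aug 1 2025 sneaks into the "year", while `<` keeps the window to Aug 1 2024 through July 31 2025.

```python
import pandas as pd

# Toy data: hypothetical timestamps sitting right on the window edges
toy = pd.DataFrame({
    "ts": pd.to_datetime(
        [
            "2024-07-31 23:59:59",  # just before the window
            "2024-08-01 00:00:00",  # first instant of the window
            "2025-07-31 23:59:59",  # last second of the window
            "2025-08-01 00:00:00",  # first instant after the window
        ],
        utc=True,
    )
})

start = pd.Timestamp("2024-08-01", tz="UTC")
end = pd.Timestamp("2025-08-01", tz="UTC")  # meant as an exclusive bound

# Closed on both ends: also picks up the Aug 1 2025 midnight row
closed = toy[(toy["ts"] >= start) & (toy["ts"] <= end)]

# Half-open [start, end): only rows inside the one-year window
half_open = toy[(toy["ts"] >= start) & (toy["ts"] < end)]

print(len(closed), len(half_open))
```

In my real data the latest play in range is on July 31, so both versions return the same rows, but the half-open form is the safer habit.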

Now that we have our filtered dataframe we can start actually finding what we need.

## Songs
- Total minutes streamed
- Number of songs streamed
- Times top song was streamed
- Top 5 songs

In [3]:
# total minutes streamed (s_played is in seconds)
tot_minutes_streamed = filtered['s_played'].sum() / 60

# number of songs (counts distinct track titles, so same-named songs by different artists count once)
unique_songs = filtered['track'].nunique()

# top 5 songs
# According to Spotify, a stream is counted when a listener plays a song for at least 30 seconds
# So top 5 songs are counted by how many times each track was played for at least 30 sec

streams = filtered.loc[(filtered['s_played'] >= 30)].copy()

top_5_songs_streams = (
    streams.groupby('track')
    .size()
    .sort_values(ascending = False)
    .head(5)
)
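
One caveat with grouping on `track` alone: two different songs that share a title get lumped together. Here's a rough sketch of grouping on title plus artist instead. The sample rows and artist names below are made up for illustration, and I haven't rerun my real numbers this way, so treat it as an option rather than what produced the results in this notebook.

```python
import pandas as pd

# Hypothetical sample rows; column names match the export used above
sample = pd.DataFrame({
    "track": ["White Noise", "White Noise", "White Noise", "misses", "misses"],
    "artist": ["Artist A", "Artist B", "Artist A", "Artist C", "Artist C"],
    "s_played": [200.0, 45.0, 10.0, 120.0, 31.0],
})

# Same 30-second rule as above
sample_streams = sample.loc[sample["s_played"] >= 30]

# Title only: both "White Noise" songs are counted as one
by_title = sample_streams.groupby("track").size().sort_values(ascending=False)

# Title + artist: each distinct song gets its own count
by_title_artist = (
    sample_streams.groupby(["track", "artist"])
    .size()
    .sort_values(ascending=False)
)

print(by_title)
print(by_title_artist)
```

Grouping on `spotify_track_uri` would be another way to do it, assuming the URI is unique per song.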

## Albums
- Total number of albums
- Minutes top album was streamed
- Top 5 albums

In [4]:
# total albums
unique_albums = filtered['album'].nunique()

# top 5 albums (by tot minutes streamed)
top_5_albums_minutes = (
    filtered.groupby('album')['s_played']
    .sum()
    .div(60)
    .round()
    .sort_values(ascending=False)
    .head(5)
)

## Artists
- Total number of artists streamed
- Top 5 artists
- Minutes top artist was streamed

In [5]:
# total artists
unique_artists = filtered['artist'].nunique()

# top artists
top_5_artists = (
    filtered.groupby('artist')['s_played']
    .sum()
    .div(60)
    .round()
    .sort_values(ascending=False)
    .head(5)
)
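
The album and artist cells are the same groupby with a different column, so they could be folded into one small helper. This is just a sketch: `top_n_minutes` is a name I made up, and the sample data is hypothetical. It also shows how to pull out the "minutes top artist was streamed" number on its own.

```python
import pandas as pd


def top_n_minutes(frame: pd.DataFrame, column: str, n: int = 5) -> pd.Series:
    """Return the n values of `column` with the most minutes played."""
    return (
        frame.groupby(column)["s_played"]
        .sum()        # total seconds per group
        .div(60)      # seconds -> minutes
        .round()
        .sort_values(ascending=False)
        .head(n)
    )


# Hypothetical sample rows; s_played is in seconds
sample = pd.DataFrame({
    "artist": ["A", "A", "B", "C"],
    "album": ["X", "Y", "Z", "W"],
    "s_played": [600.0, 300.0, 1200.0, 60.0],
})

top_artists = top_n_minutes(sample, "artist", n=2)

# The first row of the sorted result is the top artist
top_artist_name = top_artists.index[0]
top_artist_minutes = top_artists.iloc[0]

print(top_artist_name, top_artist_minutes)
```

On the real data this would be `top_n_minutes(filtered, 'album')` and `top_n_minutes(filtered, 'artist')`.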

## Output:

In [6]:
print(f"date range: {start_date} - {end_date}")
print("")
print("You listened to a lot of music this year! Let's take a look at your favourites.")
print("")
print(f"You spent {tot_minutes_streamed:.0f} minutes listening to {unique_songs} different songs! What a diverse range!")
print("")
print("Even with that many tracks, these 5 stood out...")
print(top_5_songs_streams)
print("")
print("")
print(f'You listened to {unique_albums} albums! These were your favourites:')
print("")
print(top_5_albums_minutes)
print("")
print("")
print(f'You also listened to {unique_artists} different artists this year. Your favourites were:')
print(top_5_artists)

date range: 2024-08-01 - 2025-08-01

You listened to a lot of music this year! Let's take a look at your favourites.

You spent 67782 minutes listening to 8556 different songs! What a diverse range!

Even with that many tracks, these 5 stood out...
track
Craving 4 U (feat. bbyclose)    68
The Days - NOTION Remix         67
White Noise                     55
misses                          49
It's Not Right But It's Okay    48
dtype: int64


You listened to 5389 albums! These were your favourites:

album
Sunburn                            710.0
White Noise 3 Hour Long            649.0
channel ORANGE                     595.0
THE FIRST TIME (DELUXE VERSION)    571.0
Because the Internet               544.0
Name: s_played, dtype: float64


You also listened to 2468 different artists this year. Your favourites were:
artist
Drake           2777.0
Mac Miller      1841.0
The Weeknd      1637.0
A$AP Rocky      1589.0
Dominic Fike    1281.0
Name: s_played, dtype: float64
