# Spotify Data Analysis Project
#### Description: 
In this project, you will analyze a dataset containing information about various songs on Spotify. For each song, the dataset records the track and artist names, the number of credited artists, the release date, playlist and chart counts on Spotify, Apple Music, Deezer, and Shazam, the total number of Spotify streams, and audio features such as tempo (BPM), key, mode, danceability, valence, energy, acousticness, instrumentalness, liveness, and speechiness. This analysis will help uncover trends and insights in the music industry, particularly in how certain attributes relate to a song's popularity on Spotify.

#### Aim:
The primary aim of this project is to perform an Exploratory Data Analysis (EDA) on the Spotify music dataset to:
- Identify Trends: Analyze trends in the music industry over time, such as how the popularity of songs has changed or how musical attributes have evolved.
- Correlate Features: Investigate the relationships between different audio features (e.g., energy, tempo) and their influence on a song’s popularity.
- Visualize Insights: Create interactive and informative visualizations using Python and Streamlit to effectively communicate the findings.
- Dashboard Creation: Develop a well-designed dashboard that allows users to explore the data and insights interactively.

### (1.) Import Relevant Libraries/Packages

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

### (2.) Load Original Data and Data Exploration

In [3]:
# Load the data
df = pd.read_csv('data/spotify_data_original.csv', encoding='latin1')
df.head()

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Seven (feat. Latto) (Explicit Ver.) | Latto, Jung Kook | 2 | 2023 | 7 | 14 | 553 | 147 | 141381703 | 43 | ... | 125 | B | Major | 80 | 89 | 83 | 31 | 0 | 8 | 4 |
| 1 | LALA | Myke Towers | 1 | 2023 | 3 | 23 | 1474 | 48 | 133716286 | 48 | ... | 92 | C# | Major | 71 | 61 | 74 | 7 | 0 | 10 | 4 |
| 2 | vampire | Olivia Rodrigo | 1 | 2023 | 6 | 30 | 1397 | 113 | 140003974 | 94 | ... | 138 | F | Major | 51 | 32 | 53 | 17 | 0 | 31 | 6 |
| 3 | Cruel Summer | Taylor Swift | 1 | 2019 | 8 | 23 | 7858 | 100 | 800840817 | 116 | ... | 170 | A | Major | 55 | 58 | 72 | 11 | 0 | 11 | 15 |
| 4 | WHERE SHE GOES | Bad Bunny | 1 | 2023 | 5 | 18 | 3133 | 50 | 303236322 | 84 | ... | 144 | A | Minor | 65 | 23 | 80 | 14 | 63 | 11 | 6 |


In [5]:
# Information about the data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 953 entries, 0 to 952
Data columns (total 24 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   track_name            953 non-null    object
 1   artist(s)_name        953 non-null    object
 2   artist_count          953 non-null    int64 
 3   released_year         953 non-null    int64 
 4   released_month        953 non-null    int64 
 5   released_day          953 non-null    int64 
 6   in_spotify_playlists  953 non-null    int64 
 7   in_spotify_charts     953 non-null    int64 
 8   streams               953 non-null    object
 9   in_apple_playlists    953 non-null    int64 
 10  in_apple_charts       953 non-null    int64 
 11  in_deezer_playlists   953 non-null    object
 12  in_deezer_charts      953 non-null    int64 
 13  in_shazam_charts      903 non-null    object
 14  bpm                   953 non-null    int64 
 15  key                   858 non-null    ob

In [6]:
# Statistical summary of the numeric columns (object columns such as streams are excluded)
df.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| artist_count | 953.0 | 1.556139 | 0.893044 | 1.0 | 1.0 | 1.0 | 2.0 | 8.0 |
| released_year | 953.0 | 2018.238195 | 11.116218 | 1930.0 | 2020.0 | 2022.0 | 2022.0 | 2023.0 |
| released_month | 953.0 | 6.033578 | 3.566435 | 1.0 | 3.0 | 6.0 | 9.0 | 12.0 |
| released_day | 953.0 | 13.930745 | 9.201949 | 1.0 | 6.0 | 13.0 | 22.0 | 31.0 |
| in_spotify_playlists | 953.0 | 5200.124869 | 7897.60899 | 31.0 | 875.0 | 2224.0 | 5542.0 | 52898.0 |
| in_spotify_charts | 953.0 | 12.009444 | 19.575992 | 0.0 | 0.0 | 3.0 | 16.0 | 147.0 |
| in_apple_playlists | 953.0 | 67.812172 | 86.441493 | 0.0 | 13.0 | 34.0 | 88.0 | 672.0 |
| in_apple_charts | 953.0 | 51.908709 | 50.630241 | 0.0 | 7.0 | 38.0 | 87.0 | 275.0 |
| in_deezer_charts | 953.0 | 2.666317 | 6.035599 | 0.0 | 0.0 | 0.0 | 2.0 | 58.0 |
| bpm | 953.0 | 122.540399 | 28.057802 | 65.0 | 100.0 | 121.0 | 140.0 | 206.0 |


In [7]:
# Statistical summary of the data for the object data type
df.describe(include='object').transpose()

| | count | unique | top | freq |
|---|---|---|---|---|
| track_name | 953 | 943 | Flowers | 2 |
| artist(s)_name | 953 | 645 | Taylor Swift | 34 |
| streams | 953 | 949 | 395591396 | 2 |
| in_deezer_playlists | 953 | 348 | 0 | 24 |
| in_shazam_charts | 903 | 198 | 0 | 344 |
| key | 858 | 11 | C# | 120 |
| mode | 953 | 2 | Major | 550 |


In [8]:
# Count the missing values in each column
df.isnull().sum()

track_name               0
artist(s)_name           0
artist_count             0
released_year            0
released_month           0
released_day             0
in_spotify_playlists     0
in_spotify_charts        0
streams                  0
in_apple_playlists       0
in_apple_charts          0
in_deezer_playlists      0
in_deezer_charts         0
in_shazam_charts        50
bpm                      0
key                     95
mode                     0
danceability_%           0
valence_%                0
energy_%                 0
acousticness_%           0
instrumentalness_%       0
liveness_%               0
speechiness_%            0
dtype: int64
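The raw counts above are easier to compare as a share of all rows. A minimal sketch on a toy frame: the track names come from outputs in this notebook, and the missing `key` mirrors 'When I Was Your Man', whose key is missing in one of the tables below. On the real data the same expression is `df.isnull().mean().mul(100)`; by hand, that gives about 10% missing for `key` (95 of 953) and about 5.2% for `in_shazam_charts` (50 of 953).

```python
import numpy as np
import pandas as pd

# Toy stand-in for df (illustrative only): track names from this notebook,
# with the 'key' of "When I Was Your Man" left missing as in the real data
sample = pd.DataFrame({
    "track_name": ["Seven (feat. Latto) (Explicit Ver.)", "LALA",
                   "When I Was Your Man", "Cruel Summer"],
    "key": ["B", "C#", np.nan, "A"],
    "mode": ["Major", "Major", "Major", "Major"],
})

# Share of missing values per column, in percent, largest first
missing_pct = sample.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```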

In [11]:
# Count fully duplicated rows
df.duplicated().sum()

np.int64(0)

In [12]:
# columns in the data
df.columns

Index(['track_name', 'artist(s)_name', 'artist_count', 'released_year',
       'released_month', 'released_day', 'in_spotify_playlists',
       'in_spotify_charts', 'streams', 'in_apple_playlists', 'in_apple_charts',
       'in_deezer_playlists', 'in_deezer_charts', 'in_shazam_charts', 'bpm',
       'key', 'mode', 'danceability_%', 'valence_%', 'energy_%',
       'acousticness_%', 'instrumentalness_%', 'liveness_%', 'speechiness_%'],
      dtype='object')

In [13]:
# Top 5 artist credits by number of songs (exact artist(s)_name strings, so collaborations count separately)
df['artist(s)_name'].value_counts().head()

artist(s)_name
Taylor Swift    34
The Weeknd      22
Bad Bunny       19
SZA             19
Harry Styles    17
Name: count, dtype: int64
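`value_counts()` here counts exact `artist(s)_name` strings, so a collaboration such as 'Latto, Jung Kook' is a separate entry and is not added to either artist's total. Below is a sketch of a per-artist count, assuming artist names are separated by commas (an artist whose own name contains a comma would be split incorrectly). The toy rows reuse artist strings shown in this notebook, including 'Kaliii, Kaliii', where the same artist is listed twice for one track.

```python
import pandas as pd

# Toy stand-in for df (illustrative only), using artist strings from this notebook
sample = pd.DataFrame({
    "track_name": ["Seven (feat. Latto) (Explicit Ver.)", "Cruel Summer",
                   "One Kiss (with Dua Lipa)", "Don't Start Now", "Area Codes"],
    "artist(s)_name": ["Latto, Jung Kook", "Taylor Swift",
                       "Calvin Harris, Dua Lipa", "Dua Lipa", "Kaliii, Kaliii"],
})

# One row per (track, artist): split on commas, explode, trim spaces
artists = sample["artist(s)_name"].str.split(",").explode().str.strip()

# Count an artist at most once per track (the index still identifies the track)
artists = artists.reset_index().drop_duplicates()["artist(s)_name"]

per_artist = artists.value_counts()
print(per_artist.head())
```

On the real data, replace `sample` with `df`; the ranking may differ from the table above because featured appearances are now counted too.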

In [4]:
# Top 5 songs by the number of Spotify playlists they appear in
df.nlargest(5, 'in_spotify_playlists')

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 757 | Get Lucky - Radio Edit | Pharrell Williams, Nile Rodgers, Daft Punk | 3 | 2013 | 1 | 1 | 52898 | 0 | 933815613 | 203 | ... | 116 | F# | Minor | 79 | 87 | 81 | 4 | 0 | 10 | 4 |
| 630 | Mr. Brightside | The Killers | 1 | 2003 | 9 | 23 | 51979 | 15 | 1806617704 | 306 | ... | 148 | C# | Major | 35 | 24 | 93 | 0 | 0 | 10 | 8 |
| 720 | Wake Me Up - Radio Edit | Avicii | 1 | 2013 | 1 | 1 | 50887 | 34 | 1970673297 | 315 | ... | 124 | D | Major | 53 | 66 | 78 | 0 | 0 | 16 | 5 |
| 624 | Smells Like Teen Spirit - Remastered 2021 | Nirvana | 1 | 1991 | 9 | 10 | 49991 | 9 | 1690192927 | 265 | ... | 117 | C# | Major | 52 | 73 | 91 | 0 | 0 | 11 | 7 |
| 199 | Take On Me | a-ha | 1 | 1984 | 10 | 19 | 44927 | 17 | 1479115056 | 34 | ... | 84 | F# | Minor | 57 | 86 | 90 | 2 | 0 | 9 | 5 |


In [5]:
# Bottom 5 songs by the number of Spotify playlists they appear in
df.nsmallest(5, 'in_spotify_playlists')

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 94 | Still With You | Jung Kook | 1 | 2020 | 6 | 5 | 31 | 39 | 38411956 | 2 | ... | 88 | C# | Minor | 53 | 34 | 47 | 9 | 0 | 83 | 4 |
| 327 | Peaches (from The Super Mario Bros. Movie) | Jack Black | 1 | 2023 | 4 | 7 | 34 | 0 | 68216992 | 0 | ... | 92 | A# | Minor | 71 | 41 | 31 | 79 | 0 | 10 | 5 |
| 112 | LAGUNAS | Jasiel Nuï¿½ï¿½ez, Peso P | 2 | 2023 | 6 | 22 | 58 | 18 | 39058561 | 2 | ... | 116 | B | Major | 77 | 79 | 62 | 33 | 1 | 15 | 3 |
| 185 | Cheques | Shubh | 1 | 2023 | 5 | 19 | 67 | 8 | 47956378 | 7 | ... | 90 | E | Minor | 74 | 36 | 63 | 26 | 0 | 27 | 5 |
| 104 | New Jeans | NewJeans | 1 | 2023 | 7 | 7 | 77 | 35 | 29562220 | 8 | ... | 134 | E | Minor | 81 | 53 | 72 | 51 | 0 | 12 | 5 |
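Some names in these outputs, such as 'Jasiel Nuï¿½ï¿½ez', contain the sequence 'ï¿½'. That is what the Unicode replacement character U+FFFD looks like when its UTF-8 bytes are decoded as latin-1, which suggests the original accented characters were already lost in the file and cannot be recovered simply by re-reading it. A sketch that flags the affected rows so they can be reviewed or corrected by hand; the toy rows are copied from the outputs in this notebook.

```python
import pandas as pd

# Toy rows copied from the outputs above (illustrative only)
sample = pd.DataFrame({
    "track_name": ["LAGUNAS", "Cartï¿½ï¿½o B", "Cheques"],
    "artist(s)_name": ["Jasiel Nuï¿½ï¿½ez, Peso P", "MC Caverinha, KayBlack", "Shubh"],
})

# U+FFFD encoded as UTF-8 (EF BF BD) and decoded as latin-1 reads as this marker
marker = "\ufffd".encode("utf-8").decode("latin-1")

# Flag a row if either the track or the artist string contains the marker
has_mojibake = (
    sample["track_name"].str.contains(marker, regex=False)
    | sample["artist(s)_name"].str.contains(marker, regex=False)
)
print(sample.loc[has_mojibake, ["track_name", "artist(s)_name"]])
```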


In [6]:
# Top 5 songs by the number of Apple Music playlists they appear in
df.nlargest(5, 'in_apple_playlists')

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 55 | Blinding Lights | The Weeknd | 1 | 2019 | 11 | 29 | 43899 | 69 | 3703895074 | 672 | ... | 171 | C# | Major | 50 | 38 | 80 | 0 | 0 | 9 | 7 |
| 403 | One Kiss (with Dua Lipa) | Calvin Harris, Dua Lipa | 2 | 2017 | 6 | 2 | 27705 | 10 | 1897517891 | 537 | ... | 124 | A | Minor | 79 | 59 | 86 | 4 | 0 | 8 | 11 |
| 620 | Dance Monkey | Tones and I | 1 | 2019 | 5 | 10 | 24529 | 0 | 2864791672 | 533 | ... | 98 | F# | Minor | 82 | 54 | 59 | 69 | 0 | 18 | 10 |
| 407 | Don't Start Now | Dua Lipa | 1 | 2019 | 10 | 31 | 27119 | 0 | 2303033973 | 532 | ... | 124 | B | Minor | 79 | 68 | 79 | 1 | 0 | 10 | 8 |
| 84 | STAY (with Justin Bieber) | Justin Bieber, The Kid Laroi | 2 | 2021 | 7 | 9 | 17050 | 36 | 2665343922 | 492 | ... | 170 | C# | Major | 59 | 48 | 76 | 4 | 0 | 10 | 5 |


In [7]:
# Bottom 5 songs by the number of Apple Music playlists they appear in
df.nsmallest(5, 'in_apple_playlists')

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29 | Dance The Night (From Barbie The Album) | Dua Lipa | 1 | 2023 | 5 | 25 | 2988 | 101 | 127408954 | 0 | ... | 110 | B | Minor | 67 | 78 | 85 | 2 | 0 | 33 | 5 |
| 53 | (It Goes Like) Nanana - Edit | Peggy Gou | 1 | 2023 | 6 | 15 | 2259 | 59 | 57876440 | 0 | ... | 130 | G | Minor | 67 | 96 | 88 | 12 | 19 | 8 | 4 |
| 154 | Locked Out Of Heaven | Bruno Mars | 1 | 2012 | 12 | 5 | 1622 | 9 | 1481349984 | 0 | ... | 144 | F | Major | 73 | 87 | 70 | 6 | 0 | 28 | 5 |
| 169 | When I Was Your Man | Bruno Mars | 1 | 2012 | 12 | 5 | 2420 | 11 | 1661187319 | 0 | ... | 145 | NaN | Major | 60 | 43 | 27 | 94 | 0 | 14 | 4 |
| 170 | Let Me Down Slowly | Alec Benjamin | 1 | 2018 | 5 | 25 | 5897 | 19 | 1374581173 | 0 | ... | 150 | C# | Minor | 65 | 51 | 55 | 73 | 0 | 14 | 3 |


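**Note:** We cannot rank songs by Deezer playlist count yet, because `in_deezer_playlists` is stored as text (object dtype) and `nlargest`/`nsmallest` do not work on object columns. We will convert it to integers during pre-processing.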

In [11]:
# Top 5 songs by the number of credited artists
df.nlargest(5, 'artist_count')

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 35 | Los del Espacio | Big One, Duki, Lit Killah, Maria Becerra, FMK,... | 8 | 2023 | 6 | 1 | 1150 | 31 | 123122413 | 22 | ... | 120 | NaN | Major | 81 | 63 | 68 | 11 | 0 | 11 | 4 |
| 642 | Se Le Ve | Arcangel, De La Ghetto, Justin Quiles, Lenny T... | 8 | 2021 | 8 | 12 | 1560 | 0 | 223319934 | 72 | ... | 84 | G | Minor | 56 | 61 | 76 | 10 | 0 | 14 | 11 |
| 506 | We Don't Talk About Bruno | Adassa, Mauro Castillo, Stephanie Beatriz, Enc... | 7 | 2021 | 11 | 19 | 2785 | 0 | 432719968 | 95 | ... | 206 | NaN | Minor | 58 | 83 | 45 | 36 | 0 | 11 | 8 |
| 667 | Cayï¿½ï¿½ La Noche (feat. Cruz Cafunï¿½ï¿½, Ab... | Quevedo, La Pantera, Juseph, Cruz Cafunï¿½ï¿½,... | 7 | 2022 | 1 | 14 | 1034 | 1 | 245400167 | 19 | ... | 174 | F | Minor | 67 | 74 | 75 | 44 | 0 | 7 | 30 |
| 393 | Jhoome Jo Pathaan | Arijit Singh, Vishal Dadlani, Sukriti Kakar, V... | 6 | 2022 | 12 | 22 | 138 | 4 | 1365184 | 13 | ... | 105 | G | Major | 82 | 62 | 74 | 10 | 0 | 33 | 7 |


In [12]:
# Bottom 5 songs by the number of credited artists
df.nsmallest(5, 'artist_count')

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LALA | Myke Towers | 1 | 2023 | 3 | 23 | 1474 | 48 | 133716286 | 48 | ... | 92 | C# | Major | 71 | 61 | 74 | 7 | 0 | 10 | 4 |
| 2 | vampire | Olivia Rodrigo | 1 | 2023 | 6 | 30 | 1397 | 113 | 140003974 | 94 | ... | 138 | F | Major | 51 | 32 | 53 | 17 | 0 | 31 | 6 |
| 3 | Cruel Summer | Taylor Swift | 1 | 2019 | 8 | 23 | 7858 | 100 | 800840817 | 116 | ... | 170 | A | Major | 55 | 58 | 72 | 11 | 0 | 11 | 15 |
| 4 | WHERE SHE GOES | Bad Bunny | 1 | 2023 | 5 | 18 | 3133 | 50 | 303236322 | 84 | ... | 144 | A | Minor | 65 | 23 | 80 | 14 | 63 | 11 | 6 |
| 7 | Columbia | Quevedo | 1 | 2023 | 7 | 7 | 714 | 43 | 58149378 | 25 | ... | 100 | F | Major | 67 | 26 | 71 | 37 | 0 | 11 | 4 |
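`nsmallest(5, 'artist_count')` returns only the first five of many tied rows: the statistical summary shows a median `artist_count` of 1, so at least half of the songs have a single artist. Passing `keep='all'` to `nsmallest` returns every tied row, but the distribution of `artist_count` is usually more informative. A sketch using the `artist_count` values of the ten songs in the two tables above:

```python
import pandas as pd

# artist_count values of the ten songs shown in the two tables above
sample = pd.Series([8, 8, 7, 7, 6, 1, 1, 1, 1, 1], name="artist_count")

# Number of songs for each artist_count, ordered by artist_count
counts = sample.value_counts().sort_index()
print(counts)
```

On the full data, `df['artist_count'].value_counts().sort_index()` gives the same breakdown for all 953 songs.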


In [14]:
# Top 5 songs by energy (energy_%)
df.nlargest(5, 'energy_%')

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 42 | I'm Good (Blue) | Bebe Rexha, David Guetta | 2 | 2022 | 8 | 26 | 12482 | 80 | 1109433169 | 291 | ... | 128 | G | Minor | 56 | 38 | 97 | 4 | 0 | 35 | 4 |
| 319 | Murder In My Mind | Kordhell | 1 | 2022 | 1 | 21 | 2459 | 20 | 448843705 | 20 | ... | 120 | A# | Major | 71 | 57 | 97 | 1 | 0 | 13 | 11 |
| 60 | Tï¿½ï¿ | dennis, MC Kevin o Chris | 2 | 2023 | 5 | 4 | 731 | 15 | 111947664 | 27 | ... | 130 | B | Major | 86 | 59 | 96 | 50 | 1 | 9 | 5 |
| 795 | That That (prod. & feat. SUGA of BTS) | PSY, Suga | 2 | 2022 | 4 | 29 | 802 | 0 | 212109195 | 16 | ... | 130 | E | Major | 91 | 91 | 96 | 3 | 0 | 3 | 9 |
| 367 | Bombonzinho - Ao Vivo | Israel & Rodolffo, Ana Castela | 2 | 2022 | 11 | 3 | 1254 | 6 | 263453310 | 26 | ... | 158 | C# | Major | 65 | 72 | 95 | 31 | 0 | 92 | 5 |


In [15]:
# Top 5 songs by danceability (danceability_%)
df.nlargest(5, 'danceability_%')

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 595 | Peru | Ed Sheeran, Fireboy DML | 2 | 2021 | 12 | 23 | 2999 | 0 | 261286503 | 60 | ... | 108 | G | Minor | 96 | 71 | 42 | 57 | 0 | 8 | 9 |
| 224 | Players | Coi Leray | 1 | 2022 | 11 | 30 | 4096 | 6 | 335074782 | 118 | ... | 105 | F# | Major | 95 | 62 | 52 | 3 | 0 | 5 | 16 |
| 250 | The Real Slim Shady | Eminem | 1 | 2000 | 1 | 1 | 20763 | 27 | 1424589568 | 81 | ... | 104 | F | Minor | 95 | 78 | 66 | 3 | 0 | 4 | 6 |
| 321 | CAIRO | Karol G, Ovy On The Drums | 2 | 2022 | 11 | 13 | 2418 | 26 | 294352144 | 52 | ... | 115 | F | Minor | 95 | 43 | 69 | 47 | 0 | 9 | 31 |
| 423 | Super Freaky Girl | Nicki Minaj | 1 | 2022 | 8 | 12 | 4827 | 0 | 428685680 | 104 | ... | 133 | D | Major | 95 | 91 | 89 | 6 | 0 | 31 | 24 |


In [16]:
# Top 5 songs by instrumentalness (instrumentalness_%)
df.nlargest(5, 'instrumentalness_%')

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 684 | Alien Blues | Vundabar | 1 | 2015 | 7 | 24 | 1930 | 0 | 370068639 | 3 | ... | 82 | D# | Major | 47 | 44 | 76 | 8 | 91 | 9 | 3 |
| 284 | METAMORPHOSIS | INTERWORLD | 1 | 2021 | 11 | 25 | 1561 | 24 | 357580552 | 18 | ... | 175 | G | Minor | 59 | 15 | 64 | 43 | 90 | 12 | 10 |
| 917 | Poland | Lil Yachty | 1 | 2022 | 6 | 23 | 1584 | 0 | 115331792 | 38 | ... | 150 | F | Minor | 70 | 26 | 56 | 14 | 83 | 11 | 5 |
| 691 | Forever | Labrinth | 1 | 2019 | 10 | 4 | 3618 | 0 | 282883169 | 21 | ... | 80 | E | Minor | 56 | 19 | 46 | 92 | 72 | 11 | 3 |
| 4 | WHERE SHE GOES | Bad Bunny | 1 | 2023 | 5 | 18 | 3133 | 50 | 303236322 | 84 | ... | 144 | A | Minor | 65 | 23 | 80 | 14 | 63 | 11 | 6 |


In [17]:
# Top 5 songs by speechiness (speechiness_%)
df.nlargest(5, 'speechiness_%')

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 247 | Cartï¿½ï¿½o B | MC Caverinha, KayBlack | 2 | 2023 | 5 | 11 | 269 | 4 | 71573339 | 7 | ... | 108 | A | Minor | 84 | 55 | 47 | 26 | 0 | 20 | 64 |
| 935 | On BS | Drake, 21 Savage | 2 | 2022 | 11 | 4 | 1338 | 0 | 170413877 | 9 | ... | 158 | A | Major | 84 | 33 | 36 | 2 | 0 | 39 | 59 |
| 209 | Area Codes | Kaliii, Kaliii | 2 | 2023 | 3 | 17 | 1197 | 13 | 113509496 | 44 | ... | 155 | C# | Major | 82 | 51 | 39 | 2 | 0 | 9 | 49 |
| 426 | Limbo | Freddie Dredd | 1 | 2022 | 8 | 11 | 688 | 0 | 199386237 | 14 | ... | 75 | B | Minor | 80 | 46 | 62 | 3 | 6 | 11 | 46 |
| 780 | Savior | Kendrick Lamar, Sam Dew, Baby Keem | 3 | 2022 | 5 | 13 | 2291 | 0 | 86176890 | 9 | ... | 123 | G# | Major | 61 | 66 | 71 | 53 | 0 | 32 | 46 |
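The four `nlargest` calls above follow the same pattern, so they can also be written as one loop. A compact sketch that finds the top song for each feature with `idxmax`, on a toy frame built from rows in those tables. Note that `idxmax` keeps only the first row on ties; for example, two songs share the top `energy_%` of 97.

```python
import pandas as pd

# Toy rows copied from the tables above (illustrative only)
sample = pd.DataFrame({
    "track_name": ["I'm Good (Blue)", "Peru", "Alien Blues", "On BS"],
    "energy_%": [97, 42, 76, 36],
    "danceability_%": [56, 96, 47, 84],
    "instrumentalness_%": [0, 0, 91, 0],
    "speechiness_%": [4, 9, 3, 59],
})

features = ["energy_%", "danceability_%", "instrumentalness_%", "speechiness_%"]

# For each feature, the track with the highest score (first row wins on ties)
top_by_feature = {f: sample.loc[sample[f].idxmax(), "track_name"] for f in features}
print(top_by_feature)
```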


### (3.) Data Pre-Processing

In [9]:
df.iloc[574]

track_name                            Love Grows (Where My Rosemary Goes)
artist(s)_name                                          Edison Lighthouse
artist_count                                                            1
released_year                                                        1970
released_month                                                          1
released_day                                                            1
in_spotify_playlists                                                 2877
in_spotify_charts                                                       0
streams                 BPM110KeyAModeMajorDanceability53Valence75Ener...
in_apple_playlists                                                     16
in_apple_charts                                                         0
in_deezer_playlists                                                    54
in_deezer_charts                                                        0
in_shazam_charts                      

In [10]:
print('Irrelevant data for particular track (Love Grows):',df.iloc[574]['streams'])

Irrelevant data for particular track (Love Grows): BPM110KeyAModeMajorDanceability53Valence75Energy69Acousticness7Instrumentalness0Liveness17Speechiness3


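This is not a stream count: the track's audio-feature metadata (BPM, key, mode, danceability, and so on) has been written into the `streams` field. We will look up this track on Spotify, take its approximate stream count, and use it to replace the malformed value. Note that the count Spotify shows today may be higher than it was when the dataset was collected.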
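The notebook inspects row 574 directly. One way to find such malformed rows in the first place is to try parsing every value as a number and keep the ones that fail. A sketch on a toy `streams` column (two values from `df.head()` plus the malformed entry); on the real data, apply the same lines to `df['streams']`.

```python
import pandas as pd

# Toy 'streams' column: two values from df.head() plus the malformed entry at 574
streams = pd.Series(
    ["141381703", "133716286",
     "BPM110KeyAModeMajorDanceability53Valence75Energy69Acousticness7Instrumentalness0Liveness17Speechiness3"],
    index=[0, 1, 574],
    name="streams",
)

# errors="coerce" turns values that cannot be parsed as numbers into NaN
parsed = pd.to_numeric(streams.str.replace(",", ""), errors="coerce")
bad_rows = streams[parsed.isna()]
print(bad_rows.index.tolist())
```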

In [16]:
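# Spotify shows about 268,498,010 streams for 'Love Grows'; use that value.
# Assign with .loc (the index is a RangeIndex, so label 574 is row 574):
# chained assignment like df['streams'].iloc[574] = ... may not update df.
df.loc[574, 'streams'] = '268498010'

# Convert the whole 'streams' column from text to 64-bit integers
df['streams'] = df['streams'].str.replace(',', '').astype('int64')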

We have now replaced the malformed `streams` value for this track and converted the entire `streams` column to integers.
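The largest stream count in this notebook, 3,703,895,074 for 'Blinding Lights', is above the 32-bit integer limit of 2,147,483,647. On platforms where `int` maps to a 32-bit type (for example Windows with NumPy versions before 2.0), `astype(int)` could therefore fail or overflow, so requesting `int64` explicitly is the safer choice. A small check, using two stream counts shown above as text, the way they are stored in the CSV:

```python
import numpy as np
import pandas as pd

# Two stream counts from this notebook, as text
streams_text = pd.Series(["268498010", "3703895074"], name="streams")

int32_max = np.iinfo(np.int32).max          # 2,147,483,647
streams = streams_text.astype("int64")     # explicit 64-bit type on every platform

print(streams.max() > int32_max)
```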

In [21]:
# in_deezer_playlists is stored as text with comma thousands separators; convert it to integers
df['in_deezer_playlists'] = df['in_deezer_playlists'].str.replace(',', '').astype('int64')
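`in_shazam_charts` is still an object column after these steps (the info output below shows 903 non-null values, so 50 are missing), and a plain `astype(int)` cannot hold missing values. A sketch using pandas' nullable `Int64` dtype, assuming the column uses the same comma thousands separators as `in_deezer_playlists`; the values in the toy series are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical values in the style of in_shazam_charts: text, commas, and a gap
shazam = pd.Series(["0", "1,021", np.nan, "13"], name="in_shazam_charts")

# Strip commas, parse to numbers, then keep the gap as <NA> with nullable Int64
shazam_int = pd.to_numeric(shazam.str.replace(",", ""), errors="coerce").astype("Int64")
print(shazam_int)
```

On the real data, the same expression would be applied to `df['in_shazam_charts']`.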

In [17]:
df.head()

| | track_name | artist(s)_name | artist_count | released_year | released_month | released_day | in_spotify_playlists | in_spotify_charts | streams | in_apple_playlists | ... | bpm | key | mode | danceability_% | valence_% | energy_% | acousticness_% | instrumentalness_% | liveness_% | speechiness_% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Seven (feat. Latto) (Explicit Ver.) | Latto, Jung Kook | 2 | 2023 | 7 | 14 | 553 | 147 | 141381703 | 43 | ... | 125 | B | Major | 80 | 89 | 83 | 31 | 0 | 8 | 4 |
| 1 | LALA | Myke Towers | 1 | 2023 | 3 | 23 | 1474 | 48 | 133716286 | 48 | ... | 92 | C# | Major | 71 | 61 | 74 | 7 | 0 | 10 | 4 |
| 2 | vampire | Olivia Rodrigo | 1 | 2023 | 6 | 30 | 1397 | 113 | 140003974 | 94 | ... | 138 | F | Major | 51 | 32 | 53 | 17 | 0 | 31 | 6 |
| 3 | Cruel Summer | Taylor Swift | 1 | 2019 | 8 | 23 | 7858 | 100 | 800840817 | 116 | ... | 170 | A | Major | 55 | 58 | 72 | 11 | 0 | 11 | 15 |
| 4 | WHERE SHE GOES | Bad Bunny | 1 | 2023 | 5 | 18 | 3133 | 50 | 303236322 | 84 | ... | 144 | A | Minor | 65 | 23 | 80 | 14 | 63 | 11 | 6 |


In [22]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 953 entries, 0 to 952
Data columns (total 24 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   track_name            953 non-null    object
 1   artist(s)_name        953 non-null    object
 2   artist_count          953 non-null    int64 
 3   released_year         953 non-null    int64 
 4   released_month        953 non-null    int64 
 5   released_day          953 non-null    int64 
 6   in_spotify_playlists  953 non-null    int64 
 7   in_spotify_charts     953 non-null    int64 
 8   streams               953 non-null    int64 
 9   in_apple_playlists    953 non-null    int64 
 10  in_apple_charts       953 non-null    int64 
 11  in_deezer_playlists   953 non-null    int64 
 12  in_deezer_charts      953 non-null    int64 
 13  in_shazam_charts      903 non-null    object
 14  bpm                   953 non-null    int64 
 15  key                   858 non-null    ob
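With `streams` numeric, the "Correlate Features" aim can be approached directly. A minimal sketch using Spearman (rank) correlation, which is less sensitive than Pearson correlation to the huge spread in stream counts (from about 1.4 million to 3.7 billion in the tables above). The toy frame holds the five rows of `df.head()`, far too few to support any conclusion; on the real data, select the same columns from `df`.

```python
import pandas as pd

# The five rows of df.head() (streams already converted to integers)
sample = pd.DataFrame({
    "streams": [141381703, 133716286, 140003974, 800840817, 303236322],
    "danceability_%": [80, 71, 51, 55, 65],
    "energy_%": [83, 74, 53, 72, 80],
    "bpm": [125, 92, 138, 170, 144],
})

# Spearman correlation of each feature with streams, weakest to strongest
corr_with_streams = (
    sample.corr(method="spearman")["streams"].drop("streams").sort_values()
)
print(corr_with_streams)
```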