# Dive Deeper

## Spotify Data Analysis
 
In this section, you will analyze Spotify data on the top tracks of 2020-2021. The data contains the artist, song name, number of streams, and audio features such as popularity, danceability, and more.

**Data Sources**
- [Spotify Top 200 Charts](https://www.kaggle.com/sashankpillai/spotify-top-200-charts-20202021).

### Import Library

We first import the libraries needed for the data analysis.

In [46]:
# Import library
import pandas as pd

# Set option
pd.set_option('display.float_format', lambda x: '%.2f' % x)
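
The `display.float_format` option above only affects how floats are printed, not how they are stored. A minimal sketch of that behaviour, assuming a small made-up frame (the `demo` name and its values are illustrative and not part of the dataset):

```python
import pandas as pd

# Same formatter as in the cell above: render floats with two decimals.
fmt = lambda x: '%.2f' % x
pd.set_option('display.float_format', fmt)

# Hypothetical values chosen only to show the rounding in the display.
demo = pd.DataFrame({'x': [1 / 3, 2 / 3]})
print(demo)  # values are shown with two decimals

# The stored value keeps full precision; only the rendering changed.
stored = demo.loc[0, 'x']
```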

### Data Preparation

Read the data and store it in a variable named `spotify`.

In [47]:
# Import data
spotify = pd.read_csv('./data/spotify_clean.csv')
spotify.head()

Unnamed: 0,Song Name,Streams,Artist,Artist Followers,Release Date,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Tempo,Duration (ms),Valence
0,Beggin',48633449,Måneskin,3377762.0,2017-12-08,100.0,0.71,0.8,-4.81,0.05,0.13,0.36,134.0,211560.0,0.59
1,STAY (with Justin Bieber),47248719,The Kid LAROI,2230022.0,2021-07-09,99.0,0.59,0.76,-5.48,0.05,0.04,0.1,169.93,141806.0,0.48
2,good 4 u,40162559,Olivia Rodrigo,6266514.0,2021-05-21,99.0,0.56,0.66,-5.04,0.15,0.34,0.08,166.93,178147.0,0.69
3,Bad Habits,37799456,Ed Sheeran,83293380.0,2021-06-25,98.0,0.81,0.9,-3.71,0.03,0.05,0.36,126.03,231041.0,0.59
4,INDUSTRY BABY (feat. Jack Harlow),33948454,Lil Nas X,5473565.0,2021-07-23,96.0,0.74,0.7,-7.41,0.06,0.02,0.05,150.0,212000.0,0.89


### Change Data Types

Inspect the data type of each column in `spotify`. Is every column stored with an appropriate type? If not, convert it to the appropriate type.

In [48]:
# Inspect column data types and non-null counts
spotify.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1556 entries, 0 to 1555
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Song Name         1556 non-null   object 
 1   Streams           1556 non-null   int64  
 2   Artist            1556 non-null   object 
 3   Artist Followers  1545 non-null   float64
 4   Release Date      1545 non-null   object 
 5   Popularity        1545 non-null   float64
 6   Danceability      1545 non-null   float64
 7   Energy            1545 non-null   float64
 8   Loudness          1545 non-null   float64
 9   Speechiness       1545 non-null   float64
 10  Acousticness      1545 non-null   float64
 11  Liveness          1545 non-null   float64
 12  Tempo             1545 non-null   float64
 13  Duration (ms)     1545 non-null   float64
 14  Valence           1545 non-null   float64
dtypes: float64(11), int64(1), object(3)
memory usage: 182.5+ KB
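
The `info()` output reports 1545 non-null values out of 1556 rows for most columns, so each of those columns has 11 missing values. A minimal sketch of how such gaps could be counted and located, assuming a tiny made-up frame (the rows in `sample` are hypothetical, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second one imitates a record with missing features.
sample = pd.DataFrame({
    'Song Name': ['song a', 'song b', 'song c'],
    'Streams': [300, 200, 100],
    'Popularity': [90.0, np.nan, 70.0],
    'Release Date': ['2021-07-09', None, '2017-12-08'],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Rows that have at least one missing value.
rows_with_missing = sample[sample.isna().any(axis=1)]
```

Running the same two expressions on `spotify` would show whether the 11 gaps fall in the same rows across columns.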


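In [49]:
# Target types:
#   Artist       --> category
#   Release Date --> datetime64
# pd.to_datetime is used because astype('datetime64') without a unit raises a TypeError in pandas 2.x
spotify['Release Date'] = pd.to_datetime(spotify['Release Date'])
spotify['Artist'] = spotify['Artist'].astype('category')
spotify.dtypes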

Song Name                   object
Streams                      int64
Artist                    category
Artist Followers           float64
Release Date        datetime64[ns]
Popularity                 float64
Danceability               float64
Energy                     float64
Loudness                   float64
Speechiness                float64
Acousticness               float64
Liveness                   float64
Tempo                      float64
Duration (ms)              float64
Valence                    float64
dtype: object
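
Once the columns have the right types, the `.dt` and `.cat` accessors become available. A minimal sketch of what that enables, assuming a small sample (the first two rows are copied from the table above; the third row with a missing date is hypothetical):

```python
import pandas as pd

sample = pd.DataFrame({
    'Artist': ['Måneskin', 'Ed Sheeran', 'Ed Sheeran'],
    'Release Date': ['2017-12-08', '2021-06-25', None],  # last date is a made-up gap
})

# errors='coerce' turns unparseable or missing dates into NaT instead of raising.
sample['Release Date'] = pd.to_datetime(sample['Release Date'], errors='coerce')
sample['Artist'] = sample['Artist'].astype('category')

# Datetime columns expose date parts through the .dt accessor.
release_year = sample['Release Date'].dt.year

# Categorical columns store each distinct label once.
n_artists = sample['Artist'].cat.categories.size
```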

## Most Popular Song and Artist

The data includes the **number of streams for each song** and the **number of followers of each artist**.

Find:

1. What is the most popular song based on the number of streams?
2. Who is the most popular artist based on the number of followers?

In [50]:
# Summary statistics for the numeric columns of spotify
spotify.describe()

Unnamed: 0,Streams,Artist Followers,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Tempo,Duration (ms),Valence
count,1556.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0
mean,6340219.38,14716902.87,70.09,0.69,0.63,-6.35,0.12,0.25,0.18,122.81,197940.82,0.51
std,3369478.84,16675788.51,15.82,0.14,0.16,2.51,0.11,0.25,0.14,29.59,47148.93,0.23
min,4176083.0,4883.0,0.0,0.15,0.05,-25.17,0.02,0.0,0.02,46.72,30133.0,0.03
25%,4915322.25,2123734.0,65.0,0.6,0.53,-7.49,0.05,0.05,0.1,97.96,169266.0,0.34
50%,5275747.5,6852509.0,73.0,0.71,0.64,-5.99,0.08,0.16,0.12,122.01,193591.0,0.51
75%,6455044.25,22698747.0,80.0,0.8,0.75,-4.71,0.17,0.39,0.22,143.86,218902.0,0.69
max,48633449.0,83337783.0,100.0,0.98,0.97,1.51,0.88,0.99,0.96,205.27,588139.0,0.98


In [51]:
# Summary of the non-numeric columns
spotify.describe(exclude='number')


Unnamed: 0,Song Name,Artist,Release Date
count,1556,1556,1545
unique,1556,716,475
top,Beggin',Taylor Swift,2020-01-17 00:00:00
freq,1,52,34
first,,,1942-01-01 00:00:00
last,,,2021-08-13 00:00:00


In [52]:
# Question 1: song with the most streams
# spotify[spotify['Streams'] == 48633449]
spotify[spotify['Streams'] == spotify['Streams'].max()]

Unnamed: 0,Song Name,Streams,Artist,Artist Followers,Release Date,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Tempo,Duration (ms),Valence
0,Beggin',48633449,Måneskin,3377762.0,2017-12-08,100.0,0.71,0.8,-4.81,0.05,0.13,0.36,134.0,211560.0,0.59


> The output above shows that the most popular song is `Beggin'`, with `48633449` streams.

In [53]:
# Question 2: artist with the most followers
spotify[spotify['Artist Followers'] == spotify['Artist Followers'].max()]

Unnamed: 0,Song Name,Streams,Artist,Artist Followers,Release Date,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Tempo,Duration (ms),Valence
529,Photograph,4974880,Ed Sheeran,83337783.0,2014-06-21,83.0,0.61,0.38,-10.48,0.05,0.61,0.1,107.99,258987.0,0.2
541,I Don't Care (with Justin Bieber),4984399,Ed Sheeran,83337783.0,2019-05-10,80.0,0.8,0.68,-5.04,0.04,0.09,0.09,101.96,219947.0,0.84
548,Thinking out Loud,4995623,Ed Sheeran,83337783.0,2014-06-21,82.0,0.78,0.45,-6.06,0.03,0.47,0.18,79.0,281560.0,0.59
800,Beautiful People (feat. Khalid),4813796,Ed Sheeran,83337783.0,2019-06-28,79.0,0.64,0.65,-8.11,0.19,0.12,0.08,92.98,197867.0,0.55
1349,South of the Border (feat. Camila Cabello & Ca...,5315827,Ed Sheeran,83337783.0,2019-07-12,77.0,0.86,0.62,-6.38,0.08,0.15,0.09,97.99,204467.0,0.67


> The output above shows that the most popular artist is `Ed Sheeran`, with `83337783` followers.
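
The boolean-mask pattern used above returns every row that ties for the maximum, which is why Ed Sheeran appears five times in the last output. A minimal sketch of two alternatives, `idxmax` and `nlargest`, assuming a three-row sample hand-copied from the first table (the `sample` frame is illustrative, not the full dataset):

```python
import pandas as pd

# Three rows copied from the head of the dataset shown earlier.
sample = pd.DataFrame({
    'Song Name': ["Beggin'", 'STAY (with Justin Bieber)', 'Bad Habits'],
    'Artist': ['Måneskin', 'The Kid LAROI', 'Ed Sheeran'],
    'Streams': [48633449, 47248719, 37799456],
    'Artist Followers': [3377762.0, 2230022.0, 83293380.0],
})

# idxmax returns the index label of the first maximum, so exactly one row is picked.
top_song = sample.loc[sample['Streams'].idxmax(), 'Song Name']
top_artist = sample.loc[sample['Artist Followers'].idxmax(), 'Artist']

# nlargest returns the top-n rows already sorted in descending order.
top_two = sample.nlargest(2, 'Streams')[['Song Name', 'Streams']]
```

`idxmax` is convenient when a single answer is wanted; the mask is safer when ties matter.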

## Hi Ed Sheeran, I want to know more about your song! :)

Use conditional subsetting to keep only the songs of the most popular artist, Ed Sheeran, then answer the following questions:

1. How many of Ed Sheeran's songs made the chart in 2020-2021?
2. Which of his songs is the most streamed?

In [54]:
# 1. How many of Ed Sheeran's songs made the chart in 2020-2021?
spotify[spotify['Artist'] == 'Ed Sheeran'].shape[0]

9

> Insight: 9 Ed Sheeran songs made the top chart in 2020-2021.

In [55]:
# Create a DataFrame containing only Ed Sheeran's songs
spotify_ed_Sheeran = spotify[spotify['Artist'] == 'Ed Sheeran']
# spotify_ed_Sheeran.describe()

In [56]:
# 2. Ed Sheeran's most streamed song
spotify_ed_Sheeran[spotify_ed_Sheeran['Streams'] == spotify_ed_Sheeran['Streams'].max()]

Unnamed: 0,Song Name,Streams,Artist,Artist Followers,Release Date,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Tempo,Duration (ms),Valence
3,Bad Habits,37799456,Ed Sheeran,83293380.0,2021-06-25,98.0,0.81,0.9,-3.71,0.03,0.05,0.36,126.03,231041.0,0.59


> Insight: Ed Sheeran's most streamed song is Bad Habits.
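
Filtering by artist and then ranking by streams can also be done in one pass with `sort_values`. A minimal sketch, assuming a four-row sample copied from the outputs above (it covers only some of the nine Ed Sheeran rows, so the count here differs from the real one):

```python
import pandas as pd

# Rows hand-copied from earlier outputs; not the complete dataset.
rows = [
    ("Beggin'", 'Måneskin', 48633449),
    ('Bad Habits', 'Ed Sheeran', 37799456),
    ('Photograph', 'Ed Sheeran', 4974880),
    ('Thinking out Loud', 'Ed Sheeran', 4995623),
]
sample = pd.DataFrame(rows, columns=['Song Name', 'Artist', 'Streams'])

# Conditional subsetting: keep only the rows for one artist.
ed = sample[sample['Artist'] == 'Ed Sheeran']
n_songs = len(ed)

# Rank the artist's songs from most to least streamed.
ranked = ed.sort_values('Streams', ascending=False).reset_index(drop=True)
```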

3. Danceability describes how suitable a track is for dancing, based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value close to 1 indicates a highly danceable song.
    + What is the average danceability of Ed Sheeran's songs?

In [57]:
# 3. Average danceability of Ed Sheeran's songs
spotify_ed_Sheeran['Danceability'].mean()

0.7292222222222223

> Insight: The average `danceability` of Ed Sheeran's songs is 0.73, which means his songs are fairly danceable.
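
To put one artist's average in context, the same mean can be computed for every artist with `groupby`. A minimal sketch, assuming a six-row sample copied from the outputs above (it holds only four of Ed Sheeran's nine songs, so his mean here is 0.75 rather than 0.73):

```python
import pandas as pd

# Danceability values hand-copied from earlier tables; not the full dataset.
sample = pd.DataFrame({
    'Artist': ['Ed Sheeran', 'Ed Sheeran', 'Ed Sheeran', 'Ed Sheeran',
               'Måneskin', 'Olivia Rodrigo'],
    'Song Name': ['Bad Habits', 'Photograph', "I Don't Care (with Justin Bieber)",
                  'Thinking out Loud', "Beggin'", 'good 4 u'],
    'Danceability': [0.81, 0.61, 0.80, 0.78, 0.71, 0.56],
})

# Mean danceability per artist, highest first.
mean_by_artist = (
    sample.groupby('Artist')['Danceability']
    .mean()
    .sort_values(ascending=False)
)

ed_mean = mean_by_artist['Ed Sheeran']
```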

## Try your Own

Come up with at least 2 questions of your own that can be answered from the Spotify data above.

1. Find the song with the highest popularity.
2. Find the song with the longest duration.
3. Describe the average `Loudness`, `Speechiness`, `Acousticness`, etc. of songs with a popularity greater than or equal to 90.
4. Find the most danceable song.

In [58]:
spotify.describe()

Unnamed: 0,Streams,Artist Followers,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Tempo,Duration (ms),Valence
count,1556.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0,1545.0
mean,6340219.38,14716902.87,70.09,0.69,0.63,-6.35,0.12,0.25,0.18,122.81,197940.82,0.51
std,3369478.84,16675788.51,15.82,0.14,0.16,2.51,0.11,0.25,0.14,29.59,47148.93,0.23
min,4176083.0,4883.0,0.0,0.15,0.05,-25.17,0.02,0.0,0.02,46.72,30133.0,0.03
25%,4915322.25,2123734.0,65.0,0.6,0.53,-7.49,0.05,0.05,0.1,97.96,169266.0,0.34
50%,5275747.5,6852509.0,73.0,0.71,0.64,-5.99,0.08,0.16,0.12,122.01,193591.0,0.51
75%,6455044.25,22698747.0,80.0,0.8,0.75,-4.71,0.17,0.39,0.22,143.86,218902.0,0.69
max,48633449.0,83337783.0,100.0,0.98,0.97,1.51,0.88,0.99,0.96,205.27,588139.0,0.98


In [59]:
# Question 1: song with the highest popularity
spotify[spotify['Popularity'] == spotify['Popularity'].max()]

Unnamed: 0,Song Name,Streams,Artist,Artist Followers,Release Date,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Tempo,Duration (ms),Valence
0,Beggin',48633449,Måneskin,3377762.0,2017-12-08,100.0,0.71,0.8,-4.81,0.05,0.13,0.36,134.0,211560.0,0.59


In [60]:
# Question 2: song with the longest duration
spotify[spotify['Duration (ms)'] == spotify['Duration (ms)'].max()]

Unnamed: 0,Song Name,Streams,Artist,Artist Followers,Release Date,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Tempo,Duration (ms),Valence
259,SWEET / I THOUGHT YOU WANTED TO DANCE (feat. B...,9142721,"Tyler, The Creator",6777818.0,2021-06-25,79.0,0.47,0.65,-4.91,0.07,0.33,0.55,140.22,588139.0,0.39


In [23]:
# Question 3: summary statistics for songs with popularity >= 90
spotify_popularity_90_until_100 = spotify[spotify['Popularity'] >= 90]
spotify_popularity_90_until_100.describe()

Unnamed: 0,Streams,Artist Followers,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Tempo,Duration (ms),Valence
count,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0
mean,20986429.63,16944506.57,93.46,0.67,0.65,-5.6,0.08,0.24,0.18,135.46,198284.2,0.55
std,10168353.31,18296106.48,2.86,0.14,0.15,1.8,0.05,0.25,0.12,28.96,35018.92,0.24
min,8843110.0,83689.0,90.0,0.37,0.27,-10.5,0.03,0.0,0.05,80.02,137876.0,0.08
25%,14301399.5,5629315.0,91.0,0.59,0.59,-6.6,0.04,0.02,0.09,118.09,173673.5,0.34
50%,17617965.0,6266514.0,93.0,0.69,0.66,-5.04,0.06,0.16,0.11,130.0,195053.0,0.59
75%,24790859.5,33113264.5,95.0,0.76,0.75,-4.37,0.12,0.32,0.29,165.43,213753.5,0.73
max,48633449.0,83293380.0,100.0,0.89,0.9,-3.42,0.23,0.87,0.42,180.92,287120.0,0.96
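
When only a few averages are needed, selecting the columns before calling `mean` gives a more compact result than the full `describe()` table. A minimal sketch, assuming a four-row sample copied from the outputs above (three songs with popularity of at least 90 and one below):

```python
import pandas as pd

# Rows hand-copied from earlier tables; not the full dataset.
sample = pd.DataFrame({
    'Song Name': ["Beggin'", 'STAY (with Justin Bieber)', 'good 4 u', 'Photograph'],
    'Popularity': [100.0, 99.0, 99.0, 83.0],
    'Loudness': [-4.81, -5.48, -5.04, -10.48],
    'Speechiness': [0.05, 0.05, 0.15, 0.05],
    'Acousticness': [0.13, 0.04, 0.34, 0.61],
})

# Keep only the songs at or above the popularity threshold.
popular = sample[sample['Popularity'] >= 90]

# Average of the selected audio features for that subset.
feature_means = popular[['Loudness', 'Speechiness', 'Acousticness']].mean()
```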


In [62]:
# Question 4: most danceable song
spotify[spotify['Danceability'] == spotify['Danceability'].max()]

Unnamed: 0,Song Name,Streams,Artist,Artist Followers,Release Date,Popularity,Danceability,Energy,Loudness,Speechiness,Acousticness,Liveness,Tempo,Duration (ms),Valence
583,Dancing in My Room,4914093,347aidan,211500.0,2020-10-26,79.0,0.98,0.41,-11.05,0.1,0.67,0.17,119.99,180139.0,0.76


## Conclusion

- The song with the highest popularity score, `Beggin'`, is also the song with the most streams.