
# Analysis of Top 50 Spotify Tracks of 2019

###### At the end of each year, Spotify compiles a playlist of the songs streamed most often over the course of that year. The playlist Top Tracks of 2019 includes 50 songs. Our questions are: what is the relationship between the top genres and artists, and why do people like these songs?
##### Data Source: Kaggle
##### Data Description: The dataset contains one file, read below from top50.csv. This file includes:
1. Name of the song
2. Artist of the song
3. Genre of the song
4. Audio features for the song (such as danceability, tempo, energy, etc.)
5. Popularity of the song

At the end of this notebook, we provide a conclusion of our study.


### Import NumPy, Pandas, Matplotlib, Seaborn

In [None]:
import pandas as pd
import numpy as np

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### Read the CSV file as a DataFrame named df

In [None]:
df=pd.read_csv('../input/top50spotify2019/top50.csv',encoding='ISO-8859-1')

In [None]:
df.info()

In [None]:
df.columns=['Position','Track Name','Artist Name','Genre','Beats Per Minute','Energy','Danceability','Loudness','Liveness','Valence','Length','Acousticness','Speechiness','Popularity']

In [None]:
df.head()

### Conversion of column 'Length' into standard time format

In [None]:
# 'Length' holds each track's duration in whole seconds; convert it directly to a Timedelta
df['Duration'] = pd.to_timedelta(df['Length'], unit='s')

In [None]:
df.drop('Length',axis=1,inplace=True)
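Assuming `Length` stores each track's duration in whole seconds, the conversion comes down to a single `pd.to_timedelta(..., unit='s')` call. The sketch below applies it to a hypothetical three-track sample (not the real data) and adds a hypothetical `format_mss` helper that renders each duration as an `m:ss` string for display:

```python
import pandas as pd

# Hypothetical sample: track lengths in whole seconds, like the 'Length' column
lengths = pd.Series([191, 302, 176], name='Length')

# Seconds -> Timedelta in one step
duration = pd.to_timedelta(lengths, unit='s')

def format_mss(td):
    """Hypothetical helper: render a Timedelta as 'm:ss'."""
    total = int(td.total_seconds())
    minutes, seconds = divmod(total, 60)
    return f'{minutes}:{seconds:02d}'

duration_text = duration.apply(format_mss)
print(duration_text.tolist())  # ['3:11', '5:02', '2:56']
```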

In [None]:
df.drop('Position',axis=1,inplace=True)

### Identification of correlation between columns

For this, we use the pandas corr() function to compute the pairwise correlation between the numeric columns and visualize the result as a heat map.

In [None]:
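# numeric_only=True restricts corr() to numeric columns; text columns such as
# 'Track Name' and 'Genre' would otherwise raise an error in pandas 2.0+
sns.heatmap(df.corr(numeric_only=True),cmap='coolwarm')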
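Reading the strongest relationships off a heat map by eye can be imprecise. A minimal sketch, on a hypothetical five-track sample whose values are invented for illustration, that lists each column pair once and ranks the pairs by absolute correlation:

```python
from itertools import combinations

import pandas as pd

# Hypothetical numeric sample standing in for the audio-feature columns
sample = pd.DataFrame({
    'Energy':       [55, 78, 62, 80, 41],
    'Loudness':     [-7, -4, -6, -3, -9],
    'Danceability': [76, 70, 88, 64, 59],
    'Popularity':   [86, 92, 80, 88, 84],
})

corr = sample.corr()

# Visit each column pair once (no diagonal, no mirrored duplicates)
pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)})

# Rank pairs by strength of the relationship, ignoring its sign
strongest = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(strongest.round(2))
```

With the real data, `sample` would be replaced by the numeric columns of `df`.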

### Top 5 Artists with Maximum Presence in 2019

In [None]:
df['Artist Name'].value_counts().head(5)

### Danceability Column Analysis


In [None]:
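sns.set_style('darkgrid')
# distplot is deprecated since seaborn 0.11; histplot with kde=True draws a
# comparable histogram with a density curve
sns.histplot(df['Danceability'], kde=True)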



##### Analysis: Based on this distribution, we treat tracks with a danceability of 60 or more as danceable. For a finer analysis, we divide the tracks into 3 groups:
1. 80 or more: Extremely Danceable
2. 60 or more, but below 80: Moderately Danceable
3. Below 60: Not Danceable

In [None]:
Vd=df['Danceability']>=80
Rd=(df['Danceability']>=60) & (df['Danceability']<80)
Nd=df['Danceability']<60

In [None]:
Dancing=[Vd.sum(),Rd.sum(),Nd.sum()]

In [None]:
# The values are track counts, not percentages
Dance=pd.DataFrame(Dancing,columns=['Total'],index=['Extremely Danceable','Moderately Danceable','Not Danceable'])

In [None]:
Dance
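The three boolean masks above can also be expressed with `pd.cut`, which assigns each value to a labelled bin in one call. A sketch on hypothetical scores (not taken from the dataset), using the same thresholds:

```python
import numpy as np
import pandas as pd

# Hypothetical danceability scores on the dataset's 0-100 scale
danceability = pd.Series([85, 80, 79, 60, 59, 45, 72])

# right=False makes each bin include its left edge: [-inf, 60), [60, 80), [80, inf),
# matching the thresholds above (below 60, 60 up to 80, 80 or more)
groups = pd.cut(
    danceability,
    bins=[-np.inf, 60, 80, np.inf],
    right=False,
    labels=['Not Danceable', 'Moderately Danceable', 'Extremely Danceable'],
)

# Count tracks per group, keeping the label order instead of sorting by count
counts = groups.value_counts(sort=False)
print(counts)
```

Keeping the bin edges in one list makes it easier to change a threshold consistently than editing three separate masks.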

### Energy Column Analysis

In [None]:
# histplot replaces the deprecated distplot
sns.histplot(df['Energy'], kde=True)

##### Analysis:
Based on this distribution, we treat tracks with an energy score of 60 or more as energetic. For a finer analysis, we divide the tracks into 3 groups:
1. 75 or more: Extremely Energetic
2. 60 or more, but below 75: Moderately Energetic
3. Below 60: Not Energetic

In [None]:
Ve=df['Energy']>=75
Re=(df['Energy']>=60) & (df['Energy']<75)
Ne=df['Energy']<60

In [None]:
Energy=[Ve.sum(),Re.sum(),Ne.sum()]

In [None]:
En=pd.DataFrame(Energy,columns=['Total'],index=['Extremely Energetic','Moderately Energetic','Not Energetic'])

In [None]:
En
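The `Total` column holds raw counts. A share of the playlist is sometimes easier to compare across groups; a sketch on hypothetical scores (not the real data) that reports each energy group as a percentage via `value_counts(normalize=True)`:

```python
import numpy as np
import pandas as pd

# Hypothetical energy scores on the 0-100 scale
energy = pd.Series([82, 75, 74, 60, 59, 40, 66, 90])

# Same thresholds as above: below 60, 60 up to 75, 75 or more
levels = pd.cut(
    energy,
    bins=[-np.inf, 60, 75, np.inf],
    right=False,
    labels=['Not Energetic', 'Moderately Energetic', 'Extremely Energetic'],
)

# normalize=True returns fractions of the total; multiply by 100 for percentages
share = (levels.value_counts(normalize=True, sort=False) * 100).round(1)
print(share)
```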

### Correlation Zone
Based on the preliminary heat map above, we focus on the most relevant audio-feature columns.

In [None]:
correlation=df[['Energy','Danceability','Loudness','Liveness','Valence','Acousticness','Speechiness','Popularity']]

In [None]:
sns.heatmap(correlation.corr(),cmap='coolwarm')
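Since one question is why people like these songs, it can help to isolate the `Popularity` column of the correlation matrix. A sketch on a hypothetical sample (values invented for illustration) that ranks each feature by the magnitude of its correlation with popularity:

```python
import pandas as pd

# Hypothetical sample standing in for the 'correlation' DataFrame above
sample = pd.DataFrame({
    'Energy':       [55, 78, 62, 80, 41],
    'Danceability': [76, 70, 88, 64, 59],
    'Valence':      [50, 80, 45, 70, 30],
    'Popularity':   [86, 92, 80, 88, 84],
})

# Correlation of every feature with Popularity, strongest (by magnitude) first
with_popularity = (
    sample.corr()['Popularity']
    .drop('Popularity')
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(with_popularity.round(2))
```

With only 50 tracks, such correlations are indicative rather than conclusive.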

### Top 10 Most Danceable Songs

In [None]:
df[['Track Name','Artist Name','Genre','Energy','Danceability','Popularity']].sort_values('Danceability',ascending=False).head(10)


### Top 10 Most Energetic Songs

In [None]:
df[['Track Name','Artist Name','Genre','Energy','Danceability','Popularity']].sort_values('Energy',ascending=False).head(10)


### Energy vs. Popularity

In [None]:
plt.subplot(1,1,1)
sns.barplot(x='Energy',y='Popularity',data=df,palette='coolwarm')
plt.tight_layout()

### Danceability vs. Popularity

In [None]:
plt.subplot(1,1,1)
sns.barplot(x='Danceability',y='Popularity',data=df,palette='coolwarm')
plt.tight_layout()
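With a numeric x-axis, `sns.barplot` draws one bar per distinct `Energy` or `Danceability` value, which can hide the overall trend. Grouping scores into bands first is one way to summarise the relationship; a sketch on hypothetical data using 10-point bands (an arbitrary choice):

```python
import pandas as pd

# Hypothetical sample of energy scores and popularity
sample = pd.DataFrame({
    'Energy':     [41, 48, 55, 62, 66, 78, 80, 85],
    'Popularity': [84, 82, 86, 80, 89, 92, 88, 90],
})

# Floor each score to its 10-point band: 41 -> 40, 66 -> 60, ...
band = (sample['Energy'] // 10) * 10

# Number of tracks and average popularity per band
summary = sample.groupby(band)['Popularity'].agg(['count', 'mean'])
print(summary)
```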

### Musical Tempo
Musical Tempo is defined as the speed or pace at which a section of music is played. Tempo helps the composer to convey a feeling of either intensity or relaxation. We can think of the tempo as the speedometer of the music. Typically, the speed of the music is measured in beats per minute, or BPM.

The most common tempo markings, with the BPM ranges used in this notebook, are:
1. Adagio: slow and majestic (66 to 85 bpm)
2. Andante: at a walking pace; calm, a little lively (86 to 100 bpm)
3. Allegretto: moderately fast (101 to 120 bpm)
4. Allegro: animated and fast (121 to 156 bpm)
5. Vivace: faster than Allegro (157 to 176 bpm)
6. Presto: very fast (177 to 200 bpm)

In [None]:
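def Rhythm(value):
    # Map beats per minute to a tempo marking. Each branch starts exactly where
    # the previous one ends, so values such as 86 or 101 no longer fall through
    # and return None.
    if value <= 85:
        return 'Adagio'      # also covers anything slower than 66 bpm
    elif value <= 100:
        return 'Andante'
    elif value <= 120:
        return 'Allegretto'
    elif value <= 156:
        return 'Allegro'
    elif value <= 176:
        return 'Vivace'
    else:
        return 'Presto'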

In [None]:
df['Rhythm']=df['Beats Per Minute'].apply(Rhythm)
df.head()
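A vectorised alternative to `apply(Rhythm)` is `pd.cut`, which cannot leave gaps between ranges because each bin starts exactly where the previous one ends. A sketch, with hypothetical names `TEMPO_BINS`, `TEMPO_LABELS` and `classify_tempo`:

```python
import numpy as np
import pandas as pd

# Upper edges follow the tempo table above. Bins are right-closed, so 85 is
# Adagio and 86 is Andante; anything at or below 85 counts as Adagio.
TEMPO_BINS = [-np.inf, 85, 100, 120, 156, 176, np.inf]
TEMPO_LABELS = ['Adagio', 'Andante', 'Allegretto', 'Allegro', 'Vivace', 'Presto']

def classify_tempo(bpm):
    """Hypothetical helper: map a Series of BPM values to tempo markings."""
    return pd.cut(bpm, bins=TEMPO_BINS, labels=TEMPO_LABELS)  # right=True by default

bpm = pd.Series([70, 86, 101, 120, 121, 176, 190])
print(classify_tempo(bpm).tolist())
# ['Adagio', 'Andante', 'Allegretto', 'Allegretto', 'Allegro', 'Vivace', 'Presto']
```

Applied to the notebook's data, this would be `df['Rhythm'] = classify_tempo(df['Beats Per Minute'])`. The result is categorical, so seaborn plots the markings in tempo order.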


### Classification according to Tempo

In [None]:
df['Rhythm'].value_counts()

### Analysis of Top 3 Artists

In [None]:
df['Artist Name'].value_counts().head(3)

### Artist : Ed Sheeran

In [None]:
EdSheeran=df[df['Artist Name']=='Ed Sheeran']
EdSheeran[['Artist Name','Track Name','Genre','Danceability','Energy','Loudness','Liveness','Valence','Acousticness','Speechiness','Popularity',]]

In [None]:
sns.countplot(x='Rhythm',data=EdSheeran,palette='coolwarm')

### Artist : J Balvin

In [None]:
JBalvin=df[df['Artist Name']=='J Balvin']
JBalvin[['Artist Name','Track Name','Genre','Danceability','Energy','Loudness','Liveness','Valence','Acousticness','Speechiness','Popularity',]]

In [None]:
sns.countplot(x='Rhythm',data=JBalvin,palette='coolwarm')

### Artist : Post Malone

In [None]:
PostMalone=df[df['Artist Name']=='Post Malone']
PostMalone[['Artist Name','Track Name','Genre','Danceability','Energy','Loudness','Liveness','Valence','Acousticness','Speechiness','Popularity']]

In [None]:
sns.countplot(x='Rhythm',data=PostMalone,palette='coolwarm')
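The three artist sections repeat the same filter-and-select steps. One way to compare the artists side by side is to filter once and average the features per artist; a sketch on invented rows (the feature values are not from the dataset):

```python
import pandas as pd

# Hypothetical rows; the notebook would use df instead
sample = pd.DataFrame({
    'Artist Name':  ['Ed Sheeran', 'Ed Sheeran', 'J Balvin', 'J Balvin',
                     'Post Malone', 'Post Malone', 'Other'],
    'Danceability': [64, 80, 74, 88, 76, 58, 70],
    'Energy':       [65, 56, 80, 70, 53, 40, 60],
    'Popularity':   [89, 90, 85, 82, 91, 88, 70],
})

# Pick the three most frequent artists, then average their features
top_artists = sample['Artist Name'].value_counts().head(3).index
profile = (
    sample[sample['Artist Name'].isin(top_artists)]
    .groupby('Artist Name')[['Danceability', 'Energy', 'Popularity']]
    .mean()
)
print(profile)
```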

### Analysis of Top 3 Genres

In [None]:
df['Genre'].value_counts().head(3)

### Genre: Dance Pop

In [None]:
DancePop=df[df['Genre']=='dance pop']
DancePop[['Artist Name','Genre','Danceability','Energy','Loudness','Liveness','Valence','Acousticness','Speechiness','Rhythm']]

In [None]:
plt.figure(figsize=(7,5))
sns.countplot(x='Rhythm',data=DancePop,palette='coolwarm')

### Genre: Pop

In [None]:
Pop=df[df['Genre']=='pop']
Pop[['Artist Name','Genre','Danceability','Energy','Loudness','Liveness','Valence','Acousticness','Speechiness','Rhythm']]

In [None]:
plt.figure(figsize=(7,5))
sns.countplot(x='Rhythm',data=Pop,palette='coolwarm')

### Genre: Latin

In [None]:
Latin=df[df['Genre']=='latin']
Latin[['Artist Name','Genre','Danceability','Energy','Loudness','Liveness','Valence','Acousticness','Speechiness','Rhythm']]

In [None]:
plt.figure(figsize=(7,5))
sns.countplot(x='Rhythm',data=Latin,palette='coolwarm')
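Instead of one count plot per genre, a single cross-tabulation shows how the tempo markings are distributed across all three genres at once. A sketch with invented labels (not the real track data):

```python
import pandas as pd

# Hypothetical genre/tempo labels for a handful of tracks
sample = pd.DataFrame({
    'Genre':  ['dance pop', 'dance pop', 'pop', 'latin', 'latin', 'pop', 'dance pop'],
    'Rhythm': ['Allegro', 'Andante', 'Allegro', 'Andante', 'Andante', 'Adagio', 'Allegro'],
})

top_genres = ['dance pop', 'pop', 'latin']
subset = sample[sample['Genre'].isin(top_genres)]

# Rows: genre, columns: tempo marking, cells: number of tracks (0 if none)
table = pd.crosstab(subset['Genre'], subset['Rhythm'])
print(table)
```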

### Top 10 Most Positive Songs

Tracks with high valence sound more positive (e.g. happy, cheerful), while tracks with low valence sound more negative (e.g. sad, angry).

In [None]:
df[['Track Name','Artist Name','Genre','Energy','Danceability','Loudness','Liveness','Valence','Acousticness','Speechiness','Popularity',
    'Rhythm']].sort_values(by='Valence',ascending=False).head(10)

##### Analysis:  
The 10 songs with the highest valence suggest that songs in the Andante and Allegro categories tend to be more positive, and that songs in the Allegro category tend to be more danceable.

In [None]:
plt.figure(figsize=(9,5))
plt.subplot(1,2,1)
sns.barplot(x='Rhythm', y='Danceability', data=df, palette='coolwarm')
plt.subplot(1,2,2)
sns.barplot(x='Rhythm', y='Energy', data=df, palette='coolwarm')

##### Analysis:  
The graphs above show that the Andante and Allegro rhythms are more danceable, while Presto and Vivace have more energy.
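`sns.barplot` draws each group's mean (with a confidence interval) by default, so the same means can be printed as a table to check the reading of the plots. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical tracks with a tempo marking and two audio features
sample = pd.DataFrame({
    'Rhythm':       ['Andante', 'Andante', 'Allegro', 'Allegro', 'Presto'],
    'Danceability': [80, 76, 78, 70, 60],
    'Energy':       [55, 61, 66, 72, 84],
})

# Mean of each feature per tempo marking, i.e. the bar heights of the plots
means = sample.groupby('Rhythm')[['Danceability', 'Energy']].mean().round(1)
print(means.sort_values('Danceability', ascending=False))
```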

In [None]:
plt.figure(figsize=(7,5))
sns.barplot(x='Rhythm',y='Loudness',data=df,palette='viridis')

In [None]:
plt.figure(figsize=(9,5))
plt.subplot(2,2,1)
sns.stripplot(x='Rhythm',y='Valence',data=df,palette='coolwarm')
plt.subplot(2,2,2)
sns.stripplot(x='Rhythm',y='Loudness',data=df,palette='coolwarm')
plt.subplot(2,2,3)
sns.stripplot(x='Rhythm',y='Acousticness',data=df,palette='coolwarm')
plt.subplot(2,2,4)
sns.stripplot(x='Rhythm',y='Popularity',data=df,palette='coolwarm')

##### Analysis:  
The strip plots show that the Allegro rhythm has the largest number of highly popular songs, but its popularity varies widely, while Andante songs are consistently fairly popular.
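The observation that Allegro popularity varies while Andante is steadier can be checked numerically with per-group summary statistics, where a larger standard deviation means more spread. A sketch on invented values:

```python
import pandas as pd

# Hypothetical popularity scores for two tempo groups
sample = pd.DataFrame({
    'Rhythm':     ['Allegro'] * 4 + ['Andante'] * 3,
    'Popularity': [95, 70, 92, 75, 86, 88, 87],
})

# count = tracks per group, std = spread of popularity within the group
spread = sample.groupby('Rhythm')['Popularity'].agg(['count', 'mean', 'std', 'max'])
print(spread.round(1))
```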

In [None]:
plt.figure(figsize=(15,5))
plt.subplot(2,2,1)
sns.countplot(x='Artist Name',data=DancePop,palette='coolwarm')
plt.subplot(2,2,2)
sns.countplot(x='Artist Name',data=Pop,palette='coolwarm')
plt.subplot(2,2,3)
sns.countplot(x='Artist Name',data=Latin,palette='coolwarm')

## Conclusion

To answer the questions posed by this data set: the 3 artists with the most tracks on the playlist work in the Pop, Latin and Rap genres.
On the other hand, Dance Pop was the genre with the most entries, with 7 different artists.
Looking at it from a different perspective, Allegro and Andante were among the most popular rhythms. These rhythms are characteristic of songs in the Pop and Dance Pop genres.
To conclude, the most popular genres were:
1. Dance Pop 
2. Pop
3. Latin

Together, these three genres account for 15 artists, including top artists such as Ed Sheeran and J Balvin.
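Artist counts per genre can be computed with `nunique`. Note that an artist listed under two genres is counted once in the overall total but once per genre in the per-genre counts. A sketch on invented rows (the artist names are placeholders):

```python
import pandas as pd

# Hypothetical rows; 'Artist A' appears under two genres
sample = pd.DataFrame({
    'Genre':       ['dance pop', 'dance pop', 'dance pop', 'pop', 'pop', 'latin', 'latin'],
    'Artist Name': ['Artist A', 'Artist B', 'Artist A', 'Artist A', 'Artist C',
                    'Artist E', 'Artist E'],
})

top_genres = ['dance pop', 'pop', 'latin']
subset = sample[sample['Genre'].isin(top_genres)]

# Distinct artists within each genre
per_genre = subset.groupby('Genre')['Artist Name'].nunique()

# Distinct artists across all three genres combined
total = subset['Artist Name'].nunique()
print(per_genre)
print(total)
```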

We also examined the relationship between danceability, energy and valence, and used it to identify the most energetic, danceable and positive tracks.