<div style="background:#96D225; padding:0.5rem; font-size:1.5rem">
Author: Ikshit Gupta
    </div>

<div style="background:#96DDD5; padding:1rem; font-size:1.5rem">
 
Introduction
</div>

<div style="background:#91f2f5; padding:1rem; font-size:1.5rem">
 
Songs on Spotify span many genres. The dataset I use for this project covers just 15 of them. The objective is to analyze how each audio feature varies across these genres.
</div>

## <div style="background:#96DDD5; padding:1rem">Data Loading</div>

In [112]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# plt.rcParams['figure.figsize'] = [15,8]
# plt.rcParams['axes.spines.right'] = False
# plt.rcParams['axes.spines.top'] = False

import warnings
warnings.filterwarnings('ignore')

In [113]:
df = pd.read_csv("../input/dataset-of-songs-in-spotify/genres_v2.csv")
df.head()

In [114]:
df.shape

In [115]:
df.isna().sum()

In [116]:
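# Drop identifier/metadata columns that are not needed for the analysis.
# `columns=` is used because pandas 2.x no longer accepts a positional axis.
df = df.drop(columns=['type', 'uri', 'track_href', 'analysis_url', 'Unnamed: 0', 'title'])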

In [117]:
df.select_dtypes('number').describe().T

## <div style="background:#96DDD5; padding:1rem">Data Distribution</div>

In [118]:
categorical_cols = ['time_signature', 'genre']

fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(15,6))
axs = np.ravel(axs)

for i, col in enumerate(categorical_cols):
    plt.sca(axs[i])
    sns.countplot(data=df, x=col)
    plt.xticks(rotation=90)

plt.show()

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- The data contains only a few instances with ‘time signature’ 1, 3, or 5; most instances have ‘time signature’ 4.
- Underground Rap has the most instances, and Pop has the fewest.
</div>
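
The counts behind these two observations can also be read off numerically rather than from the plot. The following is a minimal sketch, assuming a small made-up frame `toy` that only borrows the `genre` and `time_signature` column names from `df`; the values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for `df`; the rows are invented for illustration.
toy = pd.DataFrame({
    "genre": ["Underground Rap"] * 5 + ["Dark Trap"] * 3 + ["Pop"] * 1,
    "time_signature": [4, 4, 4, 4, 3, 4, 4, 5, 4],
})

# Absolute number of rows per genre, largest first.
genre_counts = toy["genre"].value_counts()

# Fraction of rows per time signature (the fractions sum to 1).
ts_share = toy["time_signature"].value_counts(normalize=True)

print(genre_counts)
print(ts_share.round(3))
```

On the real data, the same two calls on `df` should give the exact counts that the bar heights represent.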

In [119]:
numerical_cols = df.select_dtypes('number').drop(columns='time_signature').columns.to_list()

In [120]:
fig, axs = plt.subplots(nrows=4, ncols=3, figsize=(15,20))
axs = np.ravel(axs)

for i, col in enumerate(numerical_cols):
    plt.sca(axs[i])
    sns.kdeplot(data=df, x=col, fill=True, color='green')

plt.tight_layout()
plt.show()

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- Danceability follows an approximately normal distribution.
- The energy feature is negatively skewed.
- The loudness feature is approximately normally distributed with a few outliers.
- The number of instances with mode 1 is greater than the number with mode 0.
- The speechiness feature is positively skewed, which typically means its mean is greater than its median.
- The 'acousticness', 'instrumentalness', 'liveness', and 'valence' features are all positively skewed.
- The 'tempo' and 'duration_ms' features seem to follow an approximately normal distribution.
</div>
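
The skewness statements above are read from the density plots. One way to check them numerically is to compare the sample skewness and the gap between mean and median. The sketch below uses synthetic columns (`right_skewed`, `left_skewed`, `symmetric` are hypothetical names, not columns of the dataset) purely to show the pattern.

```python
import numpy as np
import pandas as pd

# Synthetic data with known shapes; not taken from the Spotify dataset.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "right_skewed": rng.exponential(scale=1.0, size=5000),
    "left_skewed": -rng.exponential(scale=1.0, size=5000),
    "symmetric": rng.normal(size=5000),
})

# Sample skewness, mean and median for every column.
summary = pd.DataFrame({
    "skew": toy.skew(),
    "mean": toy.mean(),
    "median": toy.median(),
})

# Positive skew usually goes with mean > median, negative skew with mean < median.
summary["mean_minus_median"] = summary["mean"] - summary["median"]

print(summary.round(3))
```

Applying the same summary to `df[numerical_cols]` would show which of the visual impressions hold up.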

## <div style="background:#96DDD5; padding:1rem">Song Genre and Danceability</div>

In [121]:
sns.barplot(data=df, x='genre', y='danceability')
plt.xticks(rotation=15)
plt.show()

In [122]:
def plot_genre_horizontal_bar(col, title=None):
    data = df.groupby('genre')[col].mean().sort_values()

    cmap = plt.cm.coolwarm_r
    norm = plt.Normalize(vmin=data.min(), vmax=data.max())
    colors = [cmap(norm(value)) for value in data]

    data.plot.barh(color=colors)
    plt.xlabel(col)
    plt.title(title, fontdict={'size': 18, 'color': '#de5d83'})
    plt.show()

In [123]:
plot_genre_horizontal_bar('danceability',
                          title="Average Danceability in each genre")

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- Songs in the 'techhouse' genre have the highest average danceability, followed by Underground Rap. Hardstyle songs have the lowest average danceability.
</div>
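
The ranking can also be extracted without a plot, together with the spread and sample size of each group, which helps judge whether a gap between two genres is meaningful. This is a minimal sketch on a hypothetical frame `toy` with invented values; only the column names `genre` and `danceability` come from the notebook.

```python
import pandas as pd

# Hypothetical rows for illustration; not the real dataset.
toy = pd.DataFrame({
    "genre": ["techhouse", "techhouse", "hardstyle", "hardstyle", "Pop", "Pop"],
    "danceability": [0.80, 0.84, 0.45, 0.49, 0.66, 0.70],
})

# Mean, standard deviation and group size per genre, highest mean first.
stats = (toy.groupby("genre")["danceability"]
            .agg(["mean", "std", "count"])
            .sort_values("mean", ascending=False))

# Labels of the genres with the highest and lowest mean.
top_genre = stats["mean"].idxmax()
bottom_genre = stats["mean"].idxmin()

print(stats.round(3))
print(top_genre, bottom_genre)
```

The same pattern applies to every feature examined in the following sections by swapping the column name.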

## <div style="background:#96DDD5; padding:1rem">Song Genre and Energy</div>

In [124]:
sns.barplot(data=df, x='genre', y='energy')
plt.xticks(rotation=15)
plt.show()

In [125]:
plot_genre_horizontal_bar('energy', title="Average energy in each genre")

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- Songs in the genres "trap", "psytrance", "hardstyle", "trance", and "dnb" have the highest average energy; RnB, Rap, and Underground Rap have the lowest.
</div>

## <div style="background:#96DDD5; padding:1rem">Song Genre and Loudness</div>

In [126]:
sns.barplot(data=df, x='genre', y='loudness')
plt.xticks(rotation=15)
plt.show()

In [127]:
plot_genre_horizontal_bar('loudness', title="Average loudness in each genre")

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- Average loudness is negative in every genre. The "trap" genre has the highest average "loudness", and the "techno" genre has the lowest.
</div>

## <div style="background:#96DDD5; padding:1rem">Song Genre and Speechiness</div>

In [128]:
sns.barplot(data=df, x='genre', y='speechiness')
plt.xticks(rotation=15)
plt.show()

In [129]:
plot_genre_horizontal_bar('speechiness',
                          title='Average speechiness in each genre')

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- Underground Rap, Rap, and Hiphop are the top three genres for “speechiness”; trance, techno, and psytrance are the bottom three.
</div>

## <div style="background:#96DDD5; padding:1rem">Song Genre and Acousticness</div>

In [130]:
sns.barplot(data=df, x='genre', y='acousticness')
plt.xticks(rotation=15)
plt.show()

In [131]:
plot_genre_horizontal_bar('acousticness',
                          title='Average acousticness in each genre')

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- Songs in the 'RnB' genre have the highest average acousticness, followed by Hiphop and Rap.
</div>

## <div style="background:#96DDD5; padding:1rem">Song Genre and Instrumentalness</div>

In [132]:
sns.barplot(data=df, x='genre', y='instrumentalness')
plt.xticks(rotation=15)
plt.show()

In [133]:
plot_genre_horizontal_bar('instrumentalness',
                          title='Average instrumentalness in each genre')

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- Songs in the 'techno' genre have the highest average instrumentalness, followed by the 'psytrance' genre.
</div>

## <div style="background:#96DDD5; padding:1rem">Song Genre and Liveness</div>

In [134]:
sns.barplot(data=df, x='genre', y='liveness')
plt.xticks(rotation=15)
plt.show()

In [135]:
plot_genre_horizontal_bar('liveness', title='Average liveness in each genre')

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- "Trance" and "psytrance" are the genres with the highest average liveness.
</div>

## <div style="background:#96DDD5; padding:1rem">Song Genre and Valence</div>

In [136]:
sns.barplot(data=df, x='genre', y='valence')
plt.xticks(rotation=15)
plt.show()

In [137]:
plot_genre_horizontal_bar('valence', title='Average valence in each genre')

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- Average valence is highest for the 'techhouse' genre, followed by Pop, and lowest for 'techno' and 'trance'.
</div>

## <div style="background:#96DDD5; padding:1rem">Song Genre and Tempo</div>

In [138]:
sns.barplot(data=df, x='genre', y='tempo')
plt.xticks(rotation=15)
plt.show()

In [139]:
plot_genre_horizontal_bar('tempo', title='Average tempo in each genre')

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- Average tempo is highest for the 'dnb' genre, followed by the 'Hiphop' genre. The 'techhouse' genre has the lowest.
</div>

## <div style="background:#96DDD5; padding:1rem">Song Genre and Time Duration</div>

In [140]:
sns.barplot(data=df, x='genre', y='duration_ms')
plt.xticks(rotation=15)
plt.show()

In [141]:
plot_genre_horizontal_bar('duration_ms',
                          title='Average time duration in each genre')

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- Songs in the 'psytrance' and 'techno' genres are, on average, longer than those in all other genres.
</div>

## <div style="background:#96DDD5; padding:1rem">Song Genre and Mode</div>

In [142]:
data = (df
        .groupby(['genre', 'mode'])['mode']
        .count()
        .unstack(1))

data.style.background_gradient(cmap=plt.cm.coolwarm_r)

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- In both modes, Underground Rap has the maximum number of instances.
</div>
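
Raw counts favor whichever genre has the most rows, so a genre can lead in both modes simply because it is the largest. A row-normalized cross-tabulation gives the share of each mode within a genre instead. The sketch below assumes a hypothetical frame `toy` with invented rows; only the column names `genre` and `mode` are taken from the notebook.

```python
import pandas as pd

# Hypothetical rows for illustration; not the real dataset.
toy = pd.DataFrame({
    "genre": ["Underground Rap"] * 6 + ["Pop"] * 2,
    "mode": [1, 1, 1, 0, 0, 0, 1, 1],
})

# Raw counts per (genre, mode), comparable to the groupby/unstack table.
counts = pd.crosstab(toy["genre"], toy["mode"])

# Share of each mode within a genre; every row sums to 1.
shares = pd.crosstab(toy["genre"], toy["mode"], normalize="index")

print(counts)
print(shares.round(3))
```

In this toy example Underground Rap has the most rows in both modes, yet Pop has the larger share of mode 1, which the count table alone would not reveal.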

## <div style="background:#96DDD5; padding:1rem">Song Genre and Time Signature</div>

In [143]:
data = (df
        .groupby(['genre', 'time_signature'])['time_signature']
        .count()
        .unstack(1))

data.style.background_gradient(cmap=plt.cm.coolwarm_r)

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- For every 'time-signature' except 4, the 'Dark Trap' genre has the maximum number of instances. For 'time-signature' 4, Underground Rap has the maximum number of instances.
</div>

## <div style="background:#96DDD5; padding:1rem">Correlation</div>

In [144]:
corr_mat = df[numerical_cols].corr()

sns.heatmap(corr_mat,
            annot=True,
            fmt='.2f',
            cmap=plt.cm.coolwarm_r,
            # Boolean mask: hide the diagonal and the redundant upper triangle.
            mask=np.triu(np.ones_like(corr_mat, dtype=bool)))
plt.show()

<div style="background:#fae7b5; padding:1.5rem; font-size:1.5rem">
 
- 'Acousticness' has a positive correlation with 'tempo' and 'valence'.
- 'Energy' has a positive correlation with 'loudness', 'instrumentalness', 'liveness', and 'time duration'.
- 'Danceability' has a positive correlation with the 'valence' feature.

</div>
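
Reading signs off a heatmap is error-prone, especially with a reversed colormap, so it can help to list the feature pairs sorted by their correlation coefficient. The following is a minimal sketch on synthetic data. The column names mirror the notebook, but the values and the relationships between them are invented for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data with built-in relationships; not the real dataset.
rng = np.random.default_rng(0)
n = 500
energy = rng.uniform(0, 1, n)
toy = pd.DataFrame({
    "energy": energy,
    "loudness": 10 * energy + rng.normal(0, 1, n),
    "acousticness": 1 - energy + rng.normal(0, 0.2, n),
    "tempo": rng.normal(120, 20, n),
})

corr = toy.corr()

# Keep only the strictly lower triangle so each pair appears once.
lower = corr.where(np.tril(np.ones(corr.shape, dtype=bool), k=-1))

# Long format: one row per feature pair, sorted from most negative to most positive.
pairs = lower.stack().dropna().sort_values()

print(pairs.round(2))
```

Running the same steps on `df[numerical_cols].corr()` would give the exact sign and size of each coefficient to compare against the observations above.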