# Spotify Song Database Analysis

The dataset contains songs released between 1921 and 2020. In the data section, the songs are also grouped by artist, year, or genre.
The "data.csv" file contains more than 160,000 songs.
Primary:
- id (Id of track generated by Spotify)
Numerical:
- acousticness (Ranges from 0 to 1)
- danceability (Ranges from 0 to 1)
- energy (Ranges from 0 to 1)
- duration_ms (Integer typically ranging from 200k to 300k)
- instrumentalness (Ranges from 0 to 1)
- valence (Ranges from 0 to 1)
- popularity (Ranges from 0 to 100)
- tempo (Float typically ranging from 50 to 150)
- liveness (Ranges from 0 to 1)
- loudness (Float typically ranging from -60 to 0)
- speechiness (Ranges from 0 to 1)
- year (Ranges from 1921 to 2020)
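
The documented ranges above can be checked programmatically once the data is loaded. The following is a minimal sketch, not part of the original analysis: the helper `out_of_range_counts` and the toy frame are hypothetical and only illustrate the idea on made-up values.

```python
import pandas as pd

# Columns the description above says lie in the closed interval [0, 1].
UNIT_INTERVAL_COLUMNS = [
    "acousticness", "danceability", "energy", "instrumentalness",
    "valence", "liveness", "speechiness",
]

def out_of_range_counts(df, columns, low=0.0, high=1.0):
    """Count, per column, the values that fall outside [low, high]."""
    # Only check the columns that actually exist in the frame.
    present = [c for c in columns if c in df.columns]
    outside = (df[present] < low) | (df[present] > high)
    return outside.sum()

# Toy frame standing in for data.csv (values are made up for illustration).
toy_df = pd.DataFrame({
    "acousticness": [0.10, 0.95, 1.20],   # 1.20 is deliberately invalid
    "energy": [0.50, 0.70, 0.30],
    "popularity": [10, 55, 80],           # not a unit-interval column
})

violations = out_of_range_counts(toy_df, UNIT_INTERVAL_COLUMNS)
print(violations)
```

On the real data, the same call would be `out_of_range_counts(spotify_df, UNIT_INTERVAL_COLUMNS)` after the CSV has been read.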

In [None]:
project_name = "Spotify-analysis"

In [None]:
!pip install jovian --upgrade -q

In [None]:
import jovian

In [None]:
jovian.commit(project="Spotify-analysis")

In [None]:
#Installing the upgraded version of all libraries
!pip install numpy pandas matplotlib seaborn --upgrade --quiet

In [None]:
## here we are importing all the libraries we are going to use in this notebook

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
jovian.commit(project=project_name)

## Data Preparation and Cleaning

As the dataset contains a vast amount of information, we first import the libraries used in this notebook, read the data, handle the null values, and get familiar with the data so that we can analyze it further.
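
One possible way to handle null values is sketched below on a toy frame. The helper `missing_value_report` and the cleaning strategy (drop rows without a name, fill numeric gaps with the median) are assumptions chosen for illustration, not the steps this notebook necessarily needs.

```python
import numpy as np
import pandas as pd

def missing_value_report(df):
    """Return count and percentage of missing values for columns that have any."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return report[report["missing"] > 0].sort_values("missing", ascending=False)

# Toy frame with made-up values and a few deliberate gaps.
toy_df = pd.DataFrame({
    "name": ["Song A", "Song B", None, "Song D"],
    "tempo": [120.0, np.nan, np.nan, 98.5],
    "year": [1999, 2005, 2010, 2020],
})

report = missing_value_report(toy_df)
print(report)

# Drop rows that lack a name, then fill the remaining numeric gaps with the median.
cleaned = toy_df.dropna(subset=["name"]).copy()
cleaned["tempo"] = cleaned["tempo"].fillna(cleaned["tempo"].median())
print(cleaned)
```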

In [None]:
# Reading data.csv into the spotify_df dataframe
spotify_df = pd.read_csv('data.csv')

In [None]:
spotify_df

In [None]:
# Information about the spotify dataframe
spotify_df.info()

In [None]:
#Finding the shape of the spotify dataframe
spotify_df.shape

In [None]:
#Finding the columns of spotify dataframe
spotify_df.columns

In [None]:
spotify_df[['artists','name','year']].head(15)

In [None]:
spotify_df[['artists','name','year']].tail(15)

In [None]:
## Let's count the unique values in each column; nunique() ignores NaN values by default

spotify_df.nunique()

In [None]:
## Check for Null values

spotify_df.isnull().sum()

In [None]:
## This is a visual representation of the null check above

sns.heatmap(spotify_df.isnull())

In [None]:
jovian.commit()

In [None]:
## Let's create a copy of the dataframe so that the original stays intact while we clean the data

sf_df=spotify_df.copy()

In [None]:
sf_df

In [None]:
sf_df.drop(['danceability','energy','instrumentalness','id','loudness','liveness'],axis=1,inplace=True)

sf_df

In [None]:
jovian.commit(project=project_name, environment=None)

## Exploratory Analysis and Visualization

In this section we ask interesting questions to analyze the data and draw some conclusions based on the analysis.

In [None]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

Q1: How many songs has each artist composed? And which artist has composed the most songs?

In [None]:
## value_counts() returns a Series containing counts of unique values.

sf_df['artists'].value_counts()

According to the counts above, Ernest Hemingway has the most songs in the dataset.
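
If the `artists` column stores list-like strings such as `"['Artist A', 'Artist B']"` (an assumption about the CSV that should be confirmed by inspecting a few rows), `value_counts()` treats every combination of artists as a separate value. A hedged sketch of counting tracks per individual artist, using a hypothetical helper and toy data:

```python
import ast
import pandas as pd

def count_tracks_per_artist(df, column="artists"):
    """Parse list-like strings and count the tracks each individual artist appears on."""
    # "['A', 'B']" -> ['A', 'B']; literal_eval only evaluates Python literals.
    parsed = df[column].apply(ast.literal_eval)
    # explode() gives each artist in a list its own row before counting.
    return parsed.explode().value_counts()

# Toy frame with made-up artists; the string format is the assumption under test.
toy_df = pd.DataFrame({
    "artists": ["['Artist A']", "['Artist A', 'Artist B']", "['Artist C']"],
    "name": ["Track 1", "Track 2", "Track 3"],
})

per_artist = count_tracks_per_artist(toy_df)
print(per_artist)
```

Counting this way credits a collaboration to every artist involved, which may or may not be the desired definition of "songs per artist".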

Q2: Which songs have the highest tempo in the data frame?

In [None]:
## Let's get the songs with the highest tempo by sorting the rows on the 'tempo' column.
## sort_values() sorts by the values along either axis; ascending=False puts the fastest songs first.
## reset_index(drop=True) renumbers the rows after sorting.

toptempo_songs = sf_df.sort_values('tempo', ascending=False).reset_index(drop=True)

toptempo_songs

In [None]:
plt.figure(figsize=(15, 15))

# Plot only the first 15 rows so that x and y have the same length
top15 = toptempo_songs.head(15)
sns.scatterplot(data=top15, x='tempo', y='artists');
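
As an alternative to sorting the whole frame, `DataFrame.nlargest` selects the top rows by a column directly. The sketch below uses a hypothetical helper `fastest_tracks` and made-up tracks; column names follow the dataset description.

```python
import pandas as pd

def fastest_tracks(df, n=15):
    """Return the n tracks with the highest tempo, fastest first."""
    return df.nlargest(n, "tempo")[["name", "artists", "tempo"]].reset_index(drop=True)

# Toy frame with made-up tracks and tempos.
toy_df = pd.DataFrame({
    "name": ["Track A", "Track B", "Track C", "Track D"],
    "artists": ["Artist 1", "Artist 2", "Artist 3", "Artist 4"],
    "tempo": [90.0, 180.5, 120.0, 210.0],
})

top2 = fastest_tracks(toy_df, n=2)
print(top2)
```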

Q3: What are the most common popularity scores of the songs in the dataset?

In [None]:
## Count how often each popularity score occurs; value_counts() lists the most common score first

sf_df_popularity = sf_df['popularity'].value_counts()
sf_df_popularity = pd.DataFrame(sf_df_popularity).reset_index()
sf_df_popularity.columns = ['popularity','count']
sns.barplot(x = 'popularity',y = 'count', data=sf_df_popularity)
sf_df_popularity
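
With up to 101 distinct scores, the bar chart can be hard to read. One option, sketched here on made-up values, is to report the single most common score and to bin the scores into ranges of 10 with `pd.cut`; the bin width is an arbitrary choice for illustration.

```python
import pandas as pd

# Toy popularity scores (made up for illustration).
toy_popularity = pd.Series([0, 0, 0, 35, 35, 60, 72], name="popularity")

# Most common single score and the share of tracks that have it.
counts = toy_popularity.value_counts()
most_common_score = counts.idxmax()
share = counts.max() / len(toy_popularity)
print(most_common_score, round(share, 3))

# Bin the scores into [0, 10), [10, 20), ... so the distribution is easier to read.
binned = pd.cut(toy_popularity, bins=range(0, 111, 10), right=False)
binned_counts = binned.value_counts().sort_index()
print(binned_counts)
```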

Q4: Which 10 song names appear most often in the dataset?

In [None]:
top_listed=sf_df["name"].value_counts()
top_listed.head(10)

In [None]:
sf_df.hist(figsize=(20, 20))
plt.show()

Next, let us look at the distribution of the liveness values.

In [None]:
plt.figure(figsize=(16, 4))
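# distplot() is deprecated in recent seaborn releases; histplot() with kde=True is the replacement
sns.histplot(spotify_df["liveness"], kde=True)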

It's time to find out which songs and which artists are the most popular.

In [None]:
plt.figure(figsize=(16, 4))
sns.set(style="whitegrid")
x = spotify_df.groupby("name")["popularity"].mean().sort_values(ascending=False).head(20)
axis = sns.barplot(x=x.index, y=x.values)  # recent seaborn versions require keyword arguments
axis.set_title('Top Tracks with Popularity')
axis.set_ylabel('Popularity')
axis.set_xlabel('Tracks')
plt.xticks(rotation = 90)

In [None]:
plt.figure(figsize=(16, 4))
sns.set(style="whitegrid")
x = spotify_df.groupby("artists")["popularity"].sum().sort_values(ascending=False).head(20)
ax = sns.barplot(x=x.index, y=x.values)  # recent seaborn versions require keyword arguments
ax.set_title('Top Artists with Popularity')
ax.set_ylabel('Popularity')
ax.set_xlabel('Artists')
plt.xticks(rotation = 90)
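
Summing popularity favors artists with many tracks, while a plain mean favors artists with a single popular track. A hedged sketch of a middle ground, ranking by mean popularity among artists with a minimum number of tracks; the helper `rank_artists` and the threshold are hypothetical choices:

```python
import pandas as pd

def rank_artists(df, min_tracks=2):
    """Rank artists by mean popularity, keeping only those with enough tracks."""
    stats = df.groupby("artists")["popularity"].agg(
        tracks="count",
        total_popularity="sum",
        mean_popularity="mean",
    )
    eligible = stats[stats["tracks"] >= min_tracks]
    return eligible.sort_values("mean_popularity", ascending=False)

# Toy frame: A has many unpopular tracks, C has a single very popular one.
toy_df = pd.DataFrame({
    "artists": ["A", "A", "A", "A", "B", "B", "C"],
    "popularity": [20, 20, 20, 20, 70, 60, 95],
})

ranking = rank_artists(toy_df)
print(ranking)
```

In the toy data, C is excluded for having only one track, and B ranks above A even though A has the larger popularity sum among the remaining artists.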

In [None]:
# Time analysis 
plt.figure(figsize=(16, 10))
sns.set(style="whitegrid")
x = spotify_df.groupby("year")["id"].count()
ax = sns.lineplot(x=x.index, y=x.values)  # assign to ax so the labels below apply to this plot
ax.set_title('Count of Tracks added')
ax.set_ylabel('Count')
ax.set_xlabel('Year')
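
Yearly counts can be noisy, so grouping by decade may make the long-term trend easier to see. A minimal sketch with a hypothetical helper `tracks_per_decade` and made-up rows, assuming the `year` and `id` columns described at the top of the notebook:

```python
import pandas as pd

def tracks_per_decade(df):
    """Count tracks per decade, e.g. 1921 and 1929 both fall into 1920."""
    decade = (df["year"] // 10) * 10
    return df.groupby(decade)["id"].count().rename("tracks")

# Toy frame with made-up ids and years.
toy_df = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f"],
    "year": [1921, 1929, 1930, 1995, 1999, 2020],
})

per_decade = tracks_per_decade(toy_df)
print(per_decade)
```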

In [None]:
jovian.commit()

## Inferences and Conclusion

1. In this notebook we analyzed various trends in songs on Spotify.
2. While working on this project, I looked up a lot of information about pandas and plotting.
3. After the analysis we concluded that Spotify has a huge collection of songs from various categories, spanning 1921 to 2020.
4. While doing this project I realized that there is a lot more to learn, and I'm excited to move forward.

In [None]:
jovian.commit()

## References and Future Work

1. Link to the dataset: https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-160k-tracks
2. Numerical computing with Numpy: https://jovian.ml/aakashns/python-numerical-computing-with-numpy
3. Analyzing tabular data with Pandas: https://jovian.ml/aakashns/python-pandas-data-analysis
4. Matplotlib & Seaborn tutorial: https://jovian.ml/aakashns/python-matplotlib-data-visualization
5. Pandas user guide: https://pandas.pydata.org/docs/user_guide/index.html
6. Matplotlib user guide: https://matplotlib.org/3.3.1/users/index.html
7. Seaborn user guide & tutorial: https://seaborn.pydata.org/tutorial.html

In [None]:
jovian.commit()