# Spotify Music - EDA 
This kernel continues the previous kernel on Spotify music data extraction (Part 1):
https://www.kaggle.com/pavansanagapati/spotify-music-api-data-extraction-part1

We will now use the data extracted from Spotify to perform the following two steps:

#### 1. Explore the Audio Features and analyze
#### 2. Build a Machine Learning Model 

## 1. Explore the Audio Features and analyze

In [None]:
#Import Libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
from sklearn.model_selection import cross_val_predict
from sklearn import metrics
from sklearn import svm
%matplotlib inline
import pandas_profiling 

#### Let us first take a high-level look at the Spotify music dataframe that we built from the Spotify API in part 1 of this kernel: https://www.kaggle.com/pavansanagapati/spotify-music-api-data-extraction-part1

In [None]:
spotify_music_df = pd.read_csv('../input/spotify-music-data/spotify_music.csv')
spotify_music_df.profile_report()

In [None]:
spotify_music_df.shape

In [None]:
spotify_music_df.head()

In [None]:
spotify_music_df.columns

Let us now load an additional dataset available on Kaggle for a deeper analysis.

In [None]:
spotify_music_other_df = pd.read_csv('../input/spotifyclassification/data.csv')
spotify_music_other_df.shape

In [None]:
spotify_music_other_df.head()

In [None]:
spotify_music_other_df.columns

#### **Important Note**: We consider only the columns related to audio features, which are described below:

**Acousticness**: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

**Danceability**: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

**Energy**: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

**Instrumentalness**: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

**Liveness**: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

**Loudness**: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.

**Speechiness**: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

**Valence**: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

**Tempo**: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
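The speechiness and instrumentalness descriptions above give concrete thresholds. As a minimal sketch, those thresholds can be turned into labels with plain pandas. The helper names `speechiness_category` and `is_instrumental` are hypothetical, the toy values are made up, and treating exactly 0.33 as "music and speech" is an assumption, since the description does not say which side the boundary falls on.

```python
import pandas as pd


def speechiness_category(value: float) -> str:
    """Bucket a speechiness value using the thresholds described above (hypothetical helper)."""
    if value > 0.66:
        return "spoken word"
    if value >= 0.33:  # assumption: the 0.33 boundary counts as mixed music and speech
        return "music and speech"
    return "music"


def is_instrumental(value: float) -> bool:
    """Values above 0.5 are intended to represent instrumental tracks (hypothetical helper)."""
    return value > 0.5


# Hypothetical toy values, not taken from either dataset
toy = pd.DataFrame({"speechiness": [0.05, 0.45, 0.9],
                    "instrumentalness": [0.8, 0.1, 0.0]})
toy["speech_type"] = toy["speechiness"].apply(speechiness_category)
toy["instrumental"] = toy["instrumentalness"].apply(is_instrumental)
print(toy)
```

The same helpers could be applied to a real column, for example `spotify_music_audio_features_df['speechiness'].apply(speechiness_category).value_counts()`.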

In [None]:
# Audio-feature columns shared by both datasets
AUDIO_FEATURES = ['acousticness', 'danceability', 'energy', 'instrumentalness',
                  'liveness', 'loudness', 'speechiness', 'tempo', 'valence']

# Create a dataframe with only the audio features.
# who == 1: other-artists dataset (audio features only)
# who == 0: Michael Jackson dataset (audio features plus 'popularity')
def features(df, who):
    if who == 1:
        columns = AUDIO_FEATURES
    elif who == 0:
        columns = AUDIO_FEATURES + ['popularity']
    else:
        raise ValueError("who must be 0 or 1")
    return df.loc[:, columns]

In [None]:
spotify_music_other_audio_features_df = features(spotify_music_other_df, 1)
spotify_music_other_audio_features_df.head()

In [None]:
spotify_music_audio_features_df = features(spotify_music_df,0)
spotify_music_audio_features_df.head()

In [None]:
# Number of (non-null) records in each dataset: by artist for the other-artists data, by album for the Michael Jackson data
spotify_music_other_df.artist.count()

In [None]:
spotify_music_df.album.count()

Now let us count the number of songs per album (Michael Jackson data) and per artist (other-artists data), i.e. a mapping from each album or artist to its number of songs, and plot the most frequent ones.

In [None]:
spotify_music_df['album'].value_counts().head(50).plot(kind='barh')

In [None]:
spotify_music_other_df['artist'].value_counts().head(100).plot(kind='barh', figsize=(20,20))
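`value_counts()` already returns the mapping from each artist (or album) to its number of songs as a pandas Series, sorted by count in descending order. If a plain Python dictionary is preferred, `.to_dict()` converts it. A minimal sketch on hypothetical toy artist names:

```python
import pandas as pd

# Hypothetical toy data standing in for the 'artist' column
artists = pd.Series(["artist_a", "artist_b", "artist_a", "artist_c", "artist_a", "artist_c"])

# value_counts() maps each artist to its number of songs, most frequent first;
# to_dict() turns that Series into a regular dictionary
songs_per_artist = artists.value_counts().to_dict()
print(songs_per_artist)
```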

### Visualise the data:
We will plot a bar chart and a radar chart showing the mean of each feature.

In [None]:
# Set the style and figure size *before* plotting so they apply to this figure
style.use("ggplot")
plt.rcParams['figure.figsize'] = (8, 8)

# Mean of each audio feature
means = spotify_music_audio_features_df.mean()
# One bar position per feature
ind = np.arange(len(means))
width = 0.35

# Horizontal bar plot with the Michael Jackson data
plt.barh(ind, means, width, label='Spotify Music Data - Michael Jackson', color='blue')
plt.xlabel('Mean', fontsize=12)
plt.title('Mean Values of the Audio Features for Michael Jackson')
# barh centres each bar on its y position, so the tick labels go at `ind`
plt.yticks(ind, list(means.index), fontsize=12)
plt.legend(loc='best')
plt.show()

In [None]:
# Set the style and figure size *before* plotting so they apply to this figure
style.use("ggplot")
plt.rcParams['figure.figsize'] = (8, 8)

# Mean of each audio feature for the other artists
means = spotify_music_other_audio_features_df.mean()
# One bar position per feature
ind = np.arange(len(means))
width = 0.35

# Horizontal bar plot with the other artists' data
plt.barh(ind, means, width, label='Spotify Music Data - Other Artists', color='red')
plt.xlabel('Mean', fontsize=12)
plt.title('Mean Values of the Audio Features for Other Artists')
# barh centres each bar on its y position, so the tick labels go at `ind`
plt.yticks(ind, list(means.index), fontsize=12)
plt.legend(loc='best')
plt.show()

In [None]:
# The radar chart uses a common 0-1 radial scale, so keep only the features measured
# from 0.0 to 1.0 (loudness, tempo and popularity would be clipped by ylim(0, 1))
radar_df = spotify_music_audio_features_df.drop(columns=['loudness', 'tempo', 'popularity'])
labels = list(radar_df.columns)
stats = radar_df.mean().tolist()

# One angle per feature, evenly spaced around the circle
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)

# Close the polygon by repeating the first point at the end
stats = np.concatenate((stats, [stats[0]]))
angles = np.concatenate((angles, [angles[0]]))

# Size of the figure
fig = plt.figure(figsize=(18, 18))

ax = fig.add_subplot(221, polar=True)
ax.plot(angles, stats, 'o-', linewidth=2, label="Michael Jackson", color='blue')
ax.fill(angles, stats, alpha=0.25, facecolor='blue')
# One label per feature: skip the repeated closing angle
ax.set_thetagrids(angles[:-1] * 180 / np.pi, labels, fontsize=13)

ax.set_rlabel_position(250)
plt.yticks([0.2, 0.4, 0.6, 0.8], ["0.2", "0.4", "0.6", "0.8"], color="blue", size=12)
plt.ylim(0, 1)

ax.set_title('Mean Values of the Audio Features for Michael Jackson')
ax.grid(True)

plt.legend(loc='best', bbox_to_anchor=(0.1, 0.1))

In [None]:
# The radar chart uses a common 0-1 radial scale, so keep only the features measured
# from 0.0 to 1.0 (loudness and tempo would be clipped by ylim(0, 1))
radar_df = spotify_music_other_audio_features_df.drop(columns=['loudness', 'tempo'])
labels = list(radar_df.columns)
stats = radar_df.mean().tolist()

# One angle per feature, evenly spaced around the circle
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)

# Close the polygon by repeating the first point at the end
stats = np.concatenate((stats, [stats[0]]))
angles = np.concatenate((angles, [angles[0]]))

# Size of the figure
fig = plt.figure(figsize=(18, 18))

ax = fig.add_subplot(221, polar=True)
ax.plot(angles, stats, 'o-', linewidth=2, label="Other Artists", color='red')
ax.fill(angles, stats, alpha=0.25, facecolor='red')
# One label per feature: skip the repeated closing angle
ax.set_thetagrids(angles[:-1] * 180 / np.pi, labels, fontsize=13)

ax.set_rlabel_position(250)
plt.yticks([0.2, 0.4, 0.6, 0.8], ["0.2", "0.4", "0.6", "0.8"], color="red", size=12)
plt.ylim(0, 1)

ax.set_title('Mean Values of the Audio Features for Other Artists')
ax.grid(True)

plt.legend(loc='best', bbox_to_anchor=(0.1, 0.1))

The standard deviations of the individual audio features do not give us much information on their own (as we can see in the plots below); instead, we can summarise them by calculating the mean of the standard deviations across the features.

In [None]:
# Create the figure first so that its size applies to these plots
plt.figure(figsize=(20, 20))

ax1 = plt.subplot(221)
spotify_music_audio_features_df.std().sort_values(ascending=False).plot(kind='bar', color='lightslategray', ax=ax1)
plt.xlabel('Features', fontsize=14)
plt.ylabel('Standard Deviation', fontsize=14)
plt.title("Standard Deviation of Michael Jackson Audio Features")

ax2 = plt.subplot(222)
spotify_music_other_audio_features_df.std().sort_values(ascending=False).plot(kind='bar', color='mediumvioletred', ax=ax2)
plt.xlabel('Features', fontsize=14)
plt.ylabel('Standard Deviation', fontsize=14)
plt.title("Standard Deviation of Other Artist Audio Features")
plt.show()
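The mean of the standard deviations described above can be computed in one line. This is a minimal sketch: `mean_std` is a hypothetical helper and the toy data is made up. Restricting it to the features on the 0.0 to 1.0 scale is an assumption on our part, since loudness, tempo and popularity use different units and would otherwise dominate the average.

```python
import pandas as pd


def mean_std(df: pd.DataFrame, columns: list) -> float:
    """Mean of the per-column (sample) standard deviations for the given columns."""
    return df[columns].std().mean()


# Hypothetical toy data; in the notebook, df would be one of the audio-feature dataframes
toy = pd.DataFrame({"energy": [0.2, 0.4, 0.6],
                    "valence": [0.1, 0.1, 0.1]})
print(mean_std(toy, ["energy", "valence"]))
```

Applied to the notebook data, a call could look like `mean_std(spotify_music_audio_features_df, ['acousticness', 'danceability', 'energy', 'instrumentalness', 'liveness', 'speechiness', 'valence'])`.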


### Correlation Between Variables

We will correlate the feature **valence** which describes the musical positiveness with **danceability** and **energy**.
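Besides eyeballing scatter plots, the strength of a linear relationship can be quantified with the Pearson correlation coefficient, which pandas computes via `DataFrame.corrwith`. A minimal sketch: `valence_correlations` is a hypothetical helper, and the toy values are constructed so that energy rises exactly with valence and danceability falls exactly with it.

```python
import pandas as pd


def valence_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of valence with energy and with danceability."""
    return df[["energy", "danceability"]].corrwith(df["valence"])


# Hypothetical toy data; in the notebook, pass e.g. spotify_music_audio_features_df
toy = pd.DataFrame({"valence": [0.1, 0.5, 0.9],
                    "energy": [0.2, 0.5, 0.8],
                    "danceability": [0.9, 0.5, 0.1]})
print(valence_correlations(toy))
```

Values close to +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.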


#### Valence and Energy
The correlation between valence and energy shows a cluster of songs with high energy and a low level of valence. This means that many of the energetic songs sound more negative, with feelings of sadness, anger and depression. For the other artists, on the other hand, the energy of the songs tends to increase as valence (positive feeling) increases. Although their data is spread out, we can identify this pattern, which suggests a roughly linear correlation between the two variables.

In [None]:
# Apply seaborn's default theme (the 'seaborn' Matplotlib style name is deprecated in recent Matplotlib)
sns.set_theme()
fig, ax = plt.subplots()
# A single colour is used, so no colormap is needed
spotify_music_audio_features_df.plot(kind='scatter', x='valence', y='energy', ax=ax, c='red', title="Valence x Energy for Michael Jackson")
ax.set_xlabel("Valence")
ax.set_ylabel("Energy")
plt.show()

In [None]:
# Apply seaborn's default theme (the 'seaborn' Matplotlib style name is deprecated in recent Matplotlib)
sns.set_theme()
fig, ax = plt.subplots()
# A single colour is used, so no colormap is needed
spotify_music_other_audio_features_df.plot(kind='scatter', x='valence', y='energy', ax=ax, c='red', title="Valence x Energy for other artists")
ax.set_xlabel("Valence")
ax.set_ylabel("Energy")
plt.show()

#### Valence and Danceability

In [None]:
fig, ax = plt.subplots()
# A single colour is used, so no colormap is needed
spotify_music_audio_features_df.plot(kind='scatter', x='valence', y='danceability', c='red', ax=ax, title='Valence x Danceability for Michael Jackson')
ax.set_xlabel("Valence")
ax.set_ylabel("Danceability")
plt.show()

In [None]:
fig, ax = plt.subplots()
# A single colour is used, so no colormap is needed
spotify_music_other_audio_features_df.plot(kind='scatter', x='valence', y='danceability', c='red', ax=ax, title='Valence x Danceability for other artists')
ax.set_xlabel("Valence")
ax.set_ylabel("Danceability")
plt.show()

## 2. The Machine Learning Approach
I will be trying different algorithms as I update this kernel notebook to improve the model accuracy, so please keep checking back for updates.

#### Removing Features
The first step is to preprocess our dataset so that every column of the dataframe holds numerical values. So let's start by dropping the features that are not relevant to our model, namely the index column (`Unnamed: 0`), `song_title`, `duration_ms`, `time_signature`, `mode` and `key`, and by separating the target from the other-artists dataframe. We can easily do that with a function, feature_elimination, which receives as a parameter the list of features we want to drop.

Notice that after this removal we still have a categorical feature (artist), which we will deal with in the second step. It is also worth mentioning that the target has two roughly balanced classes, which indicate which list each song belongs to.

In [None]:
# Drop the given columns from spotify_music_other_df in place
def feature_elimination(features_list):
    spotify_music_other_df.drop(columns=features_list, inplace=True)

In [None]:
spotify_music_other_df.head()

In [None]:
feature_elimination(['Unnamed: 0', 'song_title', 'duration_ms', 'time_signature', 'mode', 'key'])
spotify_music_other_df.head(3)

In [None]:
#Remove target column from our data set
target = spotify_music_other_df['target']
spotify_music_other_df.drop('target', axis = 1, inplace = True)

In [None]:
target.head()

In [None]:
# Check whether the dataset is balanced: count the songs in each target class
target.value_counts()

So the dataset is well balanced.

In [None]:
spotify_music_other_df.head()

#### Label Encoder
The second task is to transform all categorical data (artist names) into numeric data. Why do we have to do that? Most machine learning algorithms in scikit-learn accept only numerical input, which is why we use the LabelEncoder class to encode each artist name as a specific integer. The encoding process is shown below.
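To see what LabelEncoder actually does, here is a small toy example with hypothetical artist names. It learns the sorted unique labels (`classes_`) and replaces each label with its position in that sorted list; `inverse_transform` maps the integers back to names.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical artist names, not taken from the dataset
names = ["artist_b", "artist_a", "artist_c", "artist_a"]

encoder = LabelEncoder()
codes = encoder.fit_transform(names)

print(list(encoder.classes_))                    # sorted unique labels
print(list(codes))                               # integer code for each name
print(list(encoder.inverse_transform(codes)))    # back to the original names
```

Note that the integers only identify the artists; their order follows the alphabetical order of the names.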

In [None]:
# Import Label Encoder
from sklearn.preprocessing import LabelEncoder

# create Label Encoder instance
label_encoder = LabelEncoder()

# Set the artist labels
artist_labels = label_encoder.fit_transform(spotify_music_other_df.artist)

#Create column containing the labels
spotify_music_other_df['labels_artists'] = artist_labels

#Remove artist column as it contains categorical data
feature_elimination(['artist'])
spotify_music_other_df.sample(10)

In [None]:
spotify_music_other_df.labels_artists.value_counts()

# Music Classification


### Introduction
When we get started with data science, we usually begin with simple projects such as the Loan Prediction problem or Big Mart Sales Prediction. These problems have structured data arranged neatly in a tabular format, i.e. we are spoon-fed the hardest part of the data science pipeline. Real-life datasets are much more complex: they often come in unstructured formats such as audio or images, and we have to collect them from various sources and arrange them in a format that is ready for processing.


I have chosen unstructured data, in the form of the urban sound classification problem, because it represents a huge under-exploited opportunity. It is closer to how we communicate and interact as humans, and it contains a lot of useful and powerful information. For example, when a person speaks, you get not only what he or she says but also the emotions in the voice. A person's body language can also tell you much more about them, because actions speak louder than words! In short, unstructured data is complex, but processing it can be very rewarding.


#### So what does audio data really mean?

Let's go through some theory before we jump into the real problem and its solution.

Directly or indirectly, you are always in contact with audio. Your brain is continuously processing and understanding audio data and giving you information about the environment. A simple example is the conversations you have with people every day: your speech is discerned by the other person to carry on the discussion. Even when you think you are in a quiet environment, you tend to catch much more subtle sounds, like the rustling of leaves or the splatter of rain. This is the extent of your connection with audio.

To capture the audio floating around us, devices record it in computer-readable formats. Examples of these formats are:

- wav (Waveform Audio File) format
- mp3 (MPEG-1 Audio Layer 3) format
- WMA (Windows Media Audio) format

Audio is typically represented as a wave, where the amplitude of the signal changes with respect to time. This can be represented pictorially as follows.

![](sound.png)


Real-world applications of audio processing include, but are not limited to:

- Indexing music collections according to their audio features.
- Recommending music for radio channels
- Similarity search for audio files (e.g. Shazam)
- Speech processing and synthesis – generating artificial voice for conversational agents 

#### Data Handling in audio domain

Audio data first has to be loaded into a machine-understandable format. There are two main ways to represent it:

- Firstly, we can sample the signal in the time domain.
    For this, we simply take values at specific time steps. For example, in a 2 second audio file, we could extract a value every half second. This is called ***sampling of audio data***, and the rate at which it is sampled is called the ***sampling rate***.
    The disadvantage of this approach is that we need many data points to represent the whole signal, and the sampling rate has to be high: more than twice the highest frequency present in the signal, according to the Nyquist–Shannon sampling theorem. To offset this, we can look at the second approach.

- The second approach is to convert the audio data into a different domain of data representation, namely the ***frequency domain***, which requires less computational space.
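To make sampling concrete, the sketch below computes how many samples a short clip produces and samples a pure tone at those instants. The values are assumptions for illustration: 22050 Hz is the default sampling rate used by `librosa.load`, 4 s is the maximum excerpt length in the dataset used later, and the 440 Hz tone is arbitrary.

```python
import numpy as np

# Assumed values: librosa.load resamples to 22050 Hz by default,
# and the excerpts used later are at most 4 seconds long
sampling_rate = 22050   # samples per second
duration = 4.0          # seconds

# Number of values needed to represent the clip in the time domain
n_samples = int(sampling_rate * duration)
print(n_samples)

# Sampling: evaluate a 440 Hz tone at t = 0, 1/sr, 2/sr, ...
t = np.arange(n_samples) / sampling_rate
signal = np.sin(2 * np.pi * 440 * t)

# Nyquist frequency: the highest frequency this sampling rate can represent
nyquist = sampling_rate / 2
print(nyquist)
```

Even a 4 second clip needs 88,200 values at this rate, which illustrates why the time-domain representation is data-hungry.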

Let us look at this in more detail.

![](time_freq.png)

Here, we separate one audio signal into 3 different pure signals, which can now be represented as three unique values in frequency domain.
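The decomposition in the figure can be reproduced with NumPy's FFT. In this sketch (the three frequencies, their amplitudes and the 1000 Hz sampling rate are illustrative choices, not values from the dataset) one time-domain signal is built from three pure tones, and the three largest peaks of its spectrum recover exactly those three frequencies.

```python
import numpy as np

sampling_rate = 1000                            # samples per second (illustrative)
t = np.arange(sampling_rate) / sampling_rate    # 1 second of sample times

# Time domain: one signal made of three pure tones (50, 120 and 300 Hz)
signal = (np.sin(2 * np.pi * 50 * t)
          + 0.5 * np.sin(2 * np.pi * 120 * t)
          + 0.25 * np.sin(2 * np.pi * 300 * t))

# Frequency domain: magnitude of the real FFT and the frequency of each bin
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sampling_rate)

# The three largest peaks correspond to the three pure tones
peaks = sorted(float(f) for f in freqs[np.argsort(spectrum)[-3:]])
print(peaks)  # [50.0, 120.0, 300.0]
```

In the frequency domain, the 1000 time samples collapse to three dominant values, one per tone.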

There are a few more ways in which audio data can be represented, for example using MFCCs (Mel-frequency cepstral coefficients, which are used as features later in this notebook). These are just different ways to represent the same data.
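The "Mel" in MFCC refers to the mel scale, a perceptual frequency scale on which equal steps sound roughly equally far apart to a listener. A common conversion is the HTK-style formula m = 2595 · log10(1 + f / 700). The sketch below implements that formula; note that `librosa.hz_to_mel` uses a different (Slaney) variant by default and only follows this formula when called with `htk=True`.

```python
import numpy as np


def hz_to_mel_htk(f_hz):
    """Convert frequency in Hz to mels with the HTK-style formula m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


# A 1000 Hz step at low frequencies spans far more mels than at high frequencies
print(hz_to_mel_htk([500, 1000, 4000, 8000]))
print(hz_to_mel_htk(1500) - hz_to_mel_htk(500))
print(hz_to_mel_htk(8000) - hz_to_mel_htk(7000))
```

This compression of high frequencies is why mel-based features emphasise the lower part of the spectrum, where human hearing is more discriminative.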

The next step is to extract features from these audio representations, so that our algorithm can work on them and perform the task it is designed for. Here is a visual representation of the categories of audio features that can be extracted.

![](audio-features.png)


The extracted features are then passed to the machine learning model for further analysis.

Enough theory. Let's jump into solving the Urban Sound Classification problem.

### Objective

The automatic classification of environmental sound is a growing research field with multiple applications to large-scale, content-based multimedia indexing and retrieval. In particular, the sonic analysis of urban environments is the subject of increased interest, partly enabled by multimedia sensor networks, as well as by large quantities of online multimedia content depicting urban scenes.

However, while there is a large body of research in related areas such as speech, music and bioacoustics, work on the analysis of urban acoustic environments is relatively scarce. Furthermore, when it exists, it mostly focuses on classifying the type of auditory scene (e.g. street, park), as opposed to identifying the sound sources in those scenes (e.g. car horn, engine idling, bird tweet).



There are two major challenges in urban sound research:

- Lack of labeled audio data. Previous work has focused on audio from carefully produced movies or television tracks, from specific environments such as elevators or office spaces, and on commercial or proprietary datasets. The large effort involved in manually annotating real-world data means that datasets based on field recordings tend to be relatively small (e.g. the event detection dataset of the IEEE AASP Challenge consists of 24 recordings for each of 17 classes).

- Lack of a common vocabulary when working on urban sounds. This means the classification of sounds into semantic groups may vary from study to study, making it hard to compare results.

So the objective of this notebook is to address the two challenges mentioned above.


### Data

The dataset is called UrbanSound and contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes, namely:

- Air Conditioner
- Car Horn
- Children Playing
- Dog bark
- Drilling
- Engine Idling
- Gun Shot
- Jackhammer
- Siren
- Street Music

The attributes of the data are as follows:

- ID: unique ID of the sound excerpt
- Class: type of sound

The evaluation metric for this problem is "Accuracy Score"
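Accuracy is simply the fraction of excerpts whose predicted class matches the true class. A minimal sketch with scikit-learn's `accuracy_score` on hypothetical labels for four excerpts, three of which are predicted correctly:

```python
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted classes for four excerpts
y_true = ["siren", "dog_bark", "jackhammer", "siren"]
y_pred = ["siren", "dog_bark", "drilling", "siren"]

# Accuracy = number of correct predictions / total number of predictions = 3 / 4
print(accuracy_score(y_true, y_pred))  # 0.75
```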

#### Source

- Source of the dataset : https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU
- Source of research document : https://serv.cusp.nyu.edu/projects/urbansounddataset/salamon_urbansound_acmmm14.pdf


Now let us listen to a sample sound excerpt from the dataset.

In [None]:
import IPython.display as ipd
ipd.Audio('../input/ultrasound-dataset/train/Train/2022.wav')

To load the audio files into the Jupyter notebook as NumPy arrays, I use the 'librosa' Python library, which can be installed with pip as follows:

 ***pip install librosa***

In [None]:
!pip install librosa

In [None]:
import os
import pandas as pd
import librosa
import librosa.display
import glob
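# Explicit imports instead of %pylab (random is used below to pick random rows)
%matplotlib inline
import matplotlib.pyplot as plt
import random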
from sklearn.preprocessing import LabelEncoder
import numpy as np
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import Adam
from sklearn import metrics 

Now let us load a sample audio file using librosa

In [None]:
data,sampling_rate = librosa.load('../input/ultrasound-dataset/train/Train/2010.wav')
plt.figure(figsize=(12,4))
# waveplot was replaced by waveshow in recent librosa versions
librosa.display.waveshow(data, sr=sampling_rate)

Now let us visually inspect the data to see if we can find patterns in it.

In [None]:
train = pd.read_csv('../input/ultrasound-dataset/train/train.csv')
# Pick a random row of the training set
i = random.choice(train.index)

audio_name = train.ID[i]
path = os.path.join('../input/ultrasound-dataset/train/', 'Train', str(audio_name) + '.wav')

print('Class: ', train.Class[i])
x, sr = librosa.load(path)

plt.figure(figsize=(12, 4))
librosa.display.waveshow(x, sr=sr)

When this notebook was run, the randomly selected excerpt belonged to the air conditioner class, and we could see its pattern. Since the selection is random, your run may show a different class. Let us run the same code again to randomly select another excerpt and observe its pattern.

In [None]:
# Pick another random row of the training set
i = random.choice(train.index)
audio_name = train.ID[i]
path = os.path.join('../input/ultrasound-dataset/train/', 'Train', str(audio_name) + '.wav')
print('Class: ', train.Class[i])
x, sr = librosa.load(path)
plt.figure(figsize=(12, 4))
librosa.display.waveshow(x, sr=sr)

Let us see the class distributions for this problem

In [None]:
print(train.Class.value_counts(normalize=True)) #distribution of data

It appears that the jackhammer class has more samples than any other class.

Now let us see how we can leverage the concepts learned above to solve the problem, following these steps:

- Step 1: Load audio files & Extract features
- Step 2: Convert the data to pass it in our deep learning model
- Step 3: Run a deep learning model and get results

#### Step 1: Load audio files & Extract features

Let us create a function to load audio files and extract features

In [None]:
def parser(row):
    # Build the path to the .wav file for this row's ID
    file_name = os.path.join(os.path.abspath('../input/ultrasound-dataset/train/'), 'Train', str(row.ID) + '.wav')
    try:
        # res_type='kaiser_fast' uses a faster resampling method
        # (in recent librosa versions it may require the optional resampy package)
        X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # Extract 40 MFCCs per frame, then average over time to get one 40-value vector per file
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
    except Exception as e:
        print('Error encountered while parsing the file:', file_name, e)
        # Return a Series with the same index so that DataFrame.apply keeps a consistent shape
        return pd.Series([None, None], index=['feature', 'label'])

    return pd.Series([mfccs, row.Class], index=['feature', 'label'])
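The line `np.mean(librosa.feature.mfcc(...).T, axis=0)` turns a variable-length clip into a fixed-length vector. `librosa.feature.mfcc` returns a matrix of shape `(n_mfcc, n_frames)`; transposing and averaging over the frame axis leaves one value per coefficient. The sketch below uses a random stand-in matrix rather than real audio; the 173 frames are an assumption, roughly what a 4 s clip at 22050 Hz gives with librosa's default hop length of 512.

```python
import numpy as np

# Hypothetical stand-in for an MFCC matrix of shape (n_mfcc, n_frames)
n_mfcc, n_frames = 40, 173
mfcc_matrix = np.random.default_rng(0).normal(size=(n_mfcc, n_frames))

# Transpose to (n_frames, n_mfcc) and average over frames (axis=0):
# one summary value per coefficient, whatever the clip length
feature_vector = np.mean(mfcc_matrix.T, axis=0)
print(feature_vector.shape)  # (40,)

# Equivalent without the transpose: average along the frame axis directly
print(np.allclose(feature_vector, mfcc_matrix.mean(axis=1)))
```

Averaging discards the temporal order of the frames, which keeps the input small and fixed-size at the cost of some information.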

In [None]:
temp = train.apply(parser, axis=1)
temp.columns = ['feature', 'label']
# Drop rows for files that could not be parsed
temp = temp.dropna()

#### Step 2: Convert the data to pass it in our deep learning model


In [None]:
X = np.array(temp.feature.tolist())
y = np.array(temp.label.tolist())

label_encoder = LabelEncoder()
print(temp.label.dtype)

In [None]:
y = np_utils.to_categorical(label_encoder.fit_transform(y))   
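`to_categorical` converts the integer codes produced by LabelEncoder into one-hot rows, which is the target format expected by a softmax output layer trained with categorical cross-entropy. The NumPy sketch below illustrates the same idea on hypothetical codes; it is not Keras's own implementation.

```python
import numpy as np

# Integer class codes, e.g. the output of LabelEncoder.fit_transform (hypothetical values)
codes = np.array([0, 2, 1, 2])
num_classes = codes.max() + 1

# One-hot encoding: row i has a 1 in column codes[i] and 0 elsewhere
one_hot = np.eye(num_classes)[codes]
print(one_hot)
```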

## If you like this kernel, I would greatly appreciate an upvote.