# <center> Spotify Song Attributes | EDA and Prediction
    
![image.png](https://c.tenor.com/d_IO8M1rCD0AAAAC/listening-to-music-jerry.gif)
 

# <span style="color:Crimson;">Importing Libraries</span>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.simplefilter("ignore")

# <span style="color:Crimson;">Data Description</span>

* **acousticness:** A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

* **danceability:** Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

* **duration_ms:** The duration of the track in milliseconds.

* **energy:** Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

* **instrumentalness:** Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

* **key:** The key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on.

* **liveness:** Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

* **loudness:** The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.

* **mode:** Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

* **speechiness:** Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

* **tempo:** The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

* **time_signature:** An estimated overall time signature of a track. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure).

* **valence:** A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

* **target:** "1" means I like the song and "0" means I don't.

* **song_title:** The song's title.

* **artist:** The song's artist.
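The `key`, `mode` and `speechiness` descriptions above define simple encodings. The following is a minimal sketch that turns them into readable labels. It assumes the pitch-class mapping (0 = C, 1 = C♯/D♭, ...) and the 0.33 / 0.66 speechiness thresholds exactly as described. The helper names `describe_key` and `speechiness_category` are hypothetical and are not part of the dataset or any library.

```python
# Pitch-class names for key integers 0..11 (0 = C, 1 = C#/Db, 2 = D, ...)
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def describe_key(key, mode) -> str:
    """Return a readable key name such as 'D major' (mode 1 = major, 0 = minor)."""
    key = int(key)
    if not 0 <= key <= 11:
        raise ValueError(f"key must be in 0..11, got {key}")
    scale = "major" if int(mode) == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {scale}"


def speechiness_category(value: float) -> str:
    """Bucket a speechiness score using the 0.33 / 0.66 thresholds."""
    if value > 0.66:
        return "speech"            # probably made entirely of spoken words
    if value >= 0.33:
        return "music and speech"  # e.g. rap
    return "music"                 # mostly non-speech-like


print(describe_key(2, 1))          # D major
print(speechiness_category(0.05))  # music
```

In the notebook this could be applied row by row, for example `df.apply(lambda r: describe_key(r['key'], r['mode']), axis=1)`.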

In [None]:
df = pd.read_csv("/kaggle/input/spotifyclassification/data.csv")
df.head(n=5)

In [None]:
df = df.drop('Unnamed: 0',axis=1)

The `Unnamed: 0` column is only a leftover row index, so we drop it.

In [None]:
#The shape of the data
print("shape of the dataset: ",df.shape)

In [None]:
#Checking the number of unique values in each column
# nunique() counts distinct non-null values per column (and avoids shadowing the built-in dict)
df.nunique().to_frame(name='Unique count')

> **Observations**
* There are 1343 distinct artists.
* `key` takes 12 distinct values, one per pitch class.

In [None]:
df.info()

# <span style="color:Crimson;">Summary Statistics</span>

In [None]:
df.describe()

> **Observations**

* `liveness` is a 0 to 1 probability-style score, so its `min` and `max` above fall within that interval.
* The maximum `energy` is about 0.97.

# <span style="color:Crimson;">Checking Null Values</span>

In [None]:
pd.DataFrame(df.isnull().sum(),columns=['null'])

# <span style="color:Crimson;">Exploratory Data Analysis</span>

In [None]:
# Classifying data into numerical and categorical variables.
data_numerical=df[['acousticness','danceability','duration_ms','energy','instrumentalness','liveness','loudness','speechiness','tempo','valence']]
data_categorical=df[['key','mode','time_signature','target']]

In [None]:
# Skewness and kurtosis
s_k=[]
for i in data_numerical.columns:
    s_k.append([i,data_numerical[i].skew(),data_numerical[i].kurt()])
skew_kurt=pd.DataFrame(s_k,columns=['Columns','Skewness','Kurtosis'])
skew_kurt

* Rule of thumb for skewness:
    * between -0.5 and 0.5: the data are nearly symmetrical;
    * between -1 and -0.5 (negative skew) or between 0.5 and 1 (positive skew): the data are slightly skewed;
    * below -1 (negative skew) or above 1 (positive skew): the data are extremely skewed.
* Kurtosis measures whether the data are heavy-tailed or light-tailed compared with a normal distribution. pandas' `kurt()` reports excess kurtosis (Fisher's definition), so a normal distribution scores 0.
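The skewness rule of thumb above can be written as a small function. The following is a minimal sketch; `skew_label` is a hypothetical helper name. It is checked here on synthetic samples rather than on the Spotify columns: an exponential sample (long right tail, theoretical skewness 2) and a normal sample (skewness and excess kurtosis both near 0).

```python
import numpy as np
import pandas as pd


def skew_label(skewness: float) -> str:
    """Classify a skewness value with the -0.5/0.5 and -1/1 rule of thumb."""
    direction = "negative" if skewness < 0 else "positive"
    if abs(skewness) <= 0.5:
        return "nearly symmetrical"
    if abs(skewness) <= 1:
        return f"slightly skewed ({direction})"
    return f"extremely skewed ({direction})"


rng = np.random.default_rng(0)
# Synthetic data: an exponential sample has a long right tail (skewness near 2)
right_tailed = pd.Series(rng.exponential(scale=1.0, size=10_000))
# A normal sample: skewness and excess kurtosis both near 0
bell_shaped = pd.Series(rng.normal(size=10_000))

print(round(right_tailed.skew(), 2), skew_label(right_tailed.skew()))
print(round(bell_shaped.skew(), 2), skew_label(bell_shaped.skew()))
# pandas' kurt() is excess kurtosis, so this prints a value close to 0
print(round(bell_shaped.kurt(), 2))
```

Applied to the table built above, `skew_kurt['Skewness'].apply(skew_label)` would add a label column.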

> **Observations**
* `acousticness`, `duration_ms`, `instrumentalness`, `liveness`, `loudness`, `energy`, `danceability` and `speechiness` are extremely skewed.

## Numerical Variable analysis

In [None]:
plt.style.use('ggplot')
fig, ax = plt.subplots(figsize = (12,6))
fig.patch.set_facecolor('#f6f5f7')
ax.set_facecolor('#f6f5f5')
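# Overlay the distributions for liked (target=1) and not-liked (target=0) songs.
# `fill` replaces the deprecated `shade`; `label` must be lowercase and needs a legend to show.
sns.kdeplot(df.loc[df['target'] == 1, 'acousticness'], color='r',
            fill=True, label='Liked', ax=ax)
sns.kdeplot(df.loc[df['target'] == 0, 'acousticness'], color='b',
            fill=True, label='Not Liked', ax=ax)
for i in ["top", "right"]:
    ax.spines[i].set_visible(False)
ax.legend()
plt.title('KDE Plots for acousticness', weight='bold')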

plt.show()

* Highly right-skewed: most tracks have low acousticness, with a long tail toward 1.0.

In [None]:
fig, ax = plt.subplots(figsize = (12,6))
fig.patch.set_facecolor('#f6f5f7')
ax.set_facecolor('#f6f5f5')
sns.kdeplot(df.loc[df['target'] == 1, 'danceability'], color='r',
            fill=True, label='Liked', ax=ax)
sns.kdeplot(df.loc[df['target'] == 0, 'danceability'], color='b',
            fill=True, label='Not Liked', ax=ax)
for i in ["top", "right"]:
    ax.spines[i].set_visible(False)
ax.legend()
plt.title('KDE Plots for danceability', weight='bold')

plt.show()

* Roughly bell-shaped, with the bulk of the values slightly to the right (a mild left skew).

In [None]:
fig, ax = plt.subplots(figsize = (12,6))
fig.patch.set_facecolor('#f6f5f7')
ax.set_facecolor('#f6f5f5')
sns.kdeplot(df.loc[df['target'] == 1, 'duration_ms'], color='r',
            fill=True, label='Liked', ax=ax)
sns.kdeplot(df.loc[df['target'] == 0, 'duration_ms'], color='b',
            fill=True, label='Not Liked', ax=ax)
for i in ["top", "right"]:
    ax.spines[i].set_visible(False)
ax.legend()
plt.title('KDE Plots for duration', weight='bold')

plt.show()

* Highly right-skewed: most tracks cluster at shorter durations, with a long tail of much longer tracks.

In [None]:
fig, ax = plt.subplots(figsize = (12,6))
fig.patch.set_facecolor('#f6f5f7')
ax.set_facecolor('#f6f5f5')
sns.kdeplot(df.loc[df['target'] == 1, 'energy'], color='r',
            fill=True, label='Liked', ax=ax)
sns.kdeplot(df.loc[df['target'] == 0, 'energy'], color='b',
            fill=True, label='Not Liked', ax=ax)
for i in ["top", "right"]:
    ax.spines[i].set_visible(False)
ax.legend()
plt.title('KDE Plots for energy', weight='bold')

plt.show()

* Left-skewed: most tracks have high energy, with a tail toward low values.

In [None]:
fig, ax = plt.subplots(figsize = (12,6))
fig.patch.set_facecolor('#f6f5f7')
ax.set_facecolor('#f6f5f5')
sns.kdeplot(df.loc[df['target'] == 1, 'instrumentalness'], color='r',
            fill=True, label='Liked', ax=ax)
sns.kdeplot(df.loc[df['target'] == 0, 'instrumentalness'], color='b',
            fill=True, label='Not Liked', ax=ax)
for i in ["top", "right"]:
    ax.spines[i].set_visible(False)
ax.legend()
plt.title('KDE Plots for instrumentalness', weight='bold')

plt.show()

* Highly right-skewed: most tracks have instrumentalness near 0, i.e. they contain vocals.

In [None]:
fig, ax = plt.subplots(figsize = (12,6))
fig.patch.set_facecolor('#f6f5f7')
ax.set_facecolor('#f6f5f5')
sns.kdeplot(df.loc[df['target'] == 1, 'liveness'], color='r',
            fill=True, label='Liked', ax=ax)
sns.kdeplot(df.loc[df['target'] == 0, 'liveness'], color='b',
            fill=True, label='Not Liked', ax=ax)
for i in ["top", "right"]:
    ax.spines[i].set_visible(False)
ax.legend()
plt.title('KDE Plots for liveness', weight='bold')

plt.show()

* Highly right-skewed: most tracks have low liveness, so they are unlikely to be live recordings.

In [None]:
fig, ax = plt.subplots(figsize = (12,6))
fig.patch.set_facecolor('#f6f5f7')
ax.set_facecolor('#f6f5f5')
sns.kdeplot(df.loc[df['target'] == 1, 'loudness'], color='r',
            fill=True, label='Liked', ax=ax)
sns.kdeplot(df.loc[df['target'] == 0, 'loudness'], color='b',
            fill=True, label='Not Liked', ax=ax)
for i in ["top", "right"]:
    ax.spines[i].set_visible(False)
ax.legend()
plt.title('KDE Plots for loudness', weight='bold')

plt.show()

* Left-skewed: most tracks are loud (close to 0 dB), with a tail toward quieter values.

In [None]:
fig, ax = plt.subplots(figsize = (12,6))
fig.patch.set_facecolor('#f6f5f7')
ax.set_facecolor('#f6f5f5')
sns.kdeplot(df.loc[df['target'] == 1, 'speechiness'], color='r',
            fill=True, label='Liked', ax=ax)
sns.kdeplot(df.loc[df['target'] == 0, 'speechiness'], color='b',
            fill=True, label='Not Liked', ax=ax)
for i in ["top", "right"]:
    ax.spines[i].set_visible(False)
ax.legend()
plt.title('KDE Plots for speechiness', weight='bold')

plt.show()

* Highly right-skewed: most tracks have low speechiness, i.e. they are mostly music rather than speech.

In [None]:
fig, ax = plt.subplots(figsize = (12,6))
fig.patch.set_facecolor('#f6f5f7')
ax.set_facecolor('#f6f5f5')
sns.kdeplot(df.loc[df['target'] == 1, 'tempo'], color='r',
            fill=True, label='Liked', ax=ax)
sns.kdeplot(df.loc[df['target'] == 0, 'tempo'], color='b',
            fill=True, label='Not Liked', ax=ax)
for i in ["top", "right"]:
    ax.spines[i].set_visible(False)
ax.legend()
plt.title('KDE Plots for tempo', weight='bold')

plt.show()

In [None]:
fig, ax = plt.subplots(figsize = (12,6))
fig.patch.set_facecolor('#f6f5f7')
ax.set_facecolor('#f6f5f5')
sns.kdeplot(df.loc[df['target'] == 1, 'valence'], color='r',
            fill=True, label='Liked', ax=ax)
sns.kdeplot(df.loc[df['target'] == 0, 'valence'], color='b',
            fill=True, label='Not Liked', ax=ax)
for i in ["top", "right"]:
    ax.spines[i].set_visible(False)
ax.legend()
plt.title('KDE Plots for valence', weight='bold')

plt.show()
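The KDE cells above compare each feature's distribution for liked and not-liked songs. A numeric complement is a table of per-class medians, which are robust to the skew seen in several of these features. The following is a minimal sketch on a made-up frame named `demo`; its values are synthetic and are not from the Spotify data. In the notebook you would group `df` instead and select `data_numerical.columns`.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame mimicking a few of the notebook's columns (synthetic values)
rng = np.random.default_rng(4)
demo = pd.DataFrame({
    "target": np.repeat([0, 1], 50),
    "danceability": np.concatenate([rng.normal(0.55, 0.1, 50),
                                    rng.normal(0.65, 0.1, 50)]),
    "energy": rng.uniform(0, 1, 100),
})

# One row per class, one column per feature
summary = demo.groupby("target")[["danceability", "energy"]].median()
# Difference of medians (liked minus not liked) for each feature
gap = summary.loc[1] - summary.loc[0]

print(summary.round(3))
print(gap.round(3))
```

Features with a large `gap` relative to their spread are the ones where the two KDE curves separate most visibly.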

In [None]:
plt.figure(figsize=(15,20))
plt.subplot(3,2,1)
sns.scatterplot(data=df, x='acousticness', y='danceability', hue='target', style='target', palette="OrRd")
plt.title('Scatterplot for acousticness vs danceability')
plt.subplot(3,2,2)
sns.scatterplot(data=df, x='duration_ms', y='energy', hue='target', style='target', palette="OrRd")
plt.title('Scatterplot for duration_ms vs energy')
plt.subplot(3,2,3)
sns.scatterplot(data=df, x='instrumentalness', y='liveness', hue='target', style='target', palette="OrRd")
plt.title('Scatterplot for instrumentalness vs liveness')
plt.subplot(3,2,4)
sns.scatterplot(data=df, x='loudness', y='speechiness', hue='target', style='target', palette="OrRd")
plt.title('Scatterplot for loudness vs speechiness')
plt.subplot(3,2,5)
sns.scatterplot(data=df, x='tempo', y='valence', hue='target', style='target', palette="OrRd")
plt.title('Scatterplot for tempo vs valence')

plt.show()

> **Observations:**


* There are a few outliers in the acousticness vs danceability and loudness vs speechiness plots.

* In the acousticness vs danceability and duration_ms vs energy plots, most points are concentrated between 0 and 0.2 and between 0.2 and 0.4.

## Correlation plot for numerical variables

In [None]:
plt.figure(figsize=(10,6))
sns.heatmap(data_numerical.corr(),annot=True,cmap='OrRd')
plt.show()

* `energy` and `loudness` have the highest correlation of any pair of features.
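`.corr()` computes Pearson's r by default: the covariance of two columns divided by the product of their standard deviations. The following is a minimal sketch of that formula, checked against pandas. The `energy` and `loudness` arrays here are synthetic stand-ins with a built-in linear relationship and are not the Spotify columns. The helper `pearson_r` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y) -> float:
    """Pearson correlation: centred cross-products over the product of centred norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


# Synthetic stand-ins (not the real data): loudness rises linearly with energy plus noise
rng = np.random.default_rng(1)
energy = rng.uniform(0, 1, 500)
loudness = -20 + 15 * energy + rng.normal(0, 2, 500)

r = pearson_r(energy, loudness)
print(round(r, 3))
# Should agree with pandas' default (Pearson) correlation
print(round(pd.Series(energy).corr(pd.Series(loudness)), 3))
```

Because Pearson's r only captures linear association, `df.corr(method='spearman')` is a reasonable cross-check for the skewed features noted earlier.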

In [None]:
# pairplot draws its own figure; `height` is the current name of the old `size` argument
sns.pairplot(data=df, hue='target', height=2, palette='OrRd')
plt.show()

## Univariate Analysis of Categorical Variables

In [None]:
fig=plt.figure(figsize=(20,23))
background_color = '#f6f5f7'
fig.patch.set_facecolor(background_color) 
for indx,val in enumerate(data_categorical.columns):
    ax=plt.subplot(4,2,indx+1)
    ax.set_facecolor(background_color)
    ax.set_title(val,fontweight='bold',fontfamily='serif')
    for i in ['top','right']:
        ax.spines[i].set_visible(False)
    ax.grid(linestyle=':',axis='y')
    # Pass the column as `x` and draw on the current subplot
    sns.countplot(x=data_categorical[val], palette='OrRd', ax=ax)

> **Observations:**
* Key 2 (D) is the most common key in the data.
* The vast majority of songs have a time_signature of 4.0.
* The target classes are roughly balanced.
* There are more songs in major mode (1) than in minor mode (0).
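Whether the target is "distributed equally" can be checked numerically as well as visually. The following is a minimal sketch; `class_balance` is a hypothetical helper, and the labels below are made up to stand in for `df['target']`.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Count and share of each class, sorted by class label."""
    return pd.DataFrame({
        "count": labels.value_counts(),
        "share": labels.value_counts(normalize=True),
    }).sort_index()


# Hypothetical labels standing in for df['target']
labels = pd.Series([1, 0, 1, 1, 0, 0, 1, 0])
balance = class_balance(labels)
print(balance)
```

A share close to 0.5 for each class means plain accuracy is a reasonable headline metric for the models below.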

## Analysing Categorical Variables with target

In [None]:
data_cat=df[['key','mode','time_signature']]
fig=plt.figure(figsize=(20,23))
background_color = '#f6f5f7'
fig.patch.set_facecolor(background_color) 
for indx,val in enumerate(data_cat.columns):
    ax=plt.subplot(4,2,indx+1)
    ax.set_facecolor(background_color)
    ax.set_title(val,fontweight='bold',fontfamily='serif')
    for i in ['top','right']:
        ax.spines[i].set_visible(False)
    ax.grid(linestyle=':',axis='y')
    sns.countplot(x=data_cat[val], hue=df['target'], palette='OrRd_r', ax=ax)

In [None]:
plt.figure(figsize=(12,8))
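# Only numeric columns can be correlated (song_title and artist are still strings here)
sns.heatmap(df.corr(numeric_only=True),annot=True,cmap='OrRd')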
plt.show()

> **Observations:**
* Apart from the `energy`/`loudness` pair, the features are not strongly correlated with one another.

In [None]:
df.head(n=3)

In [None]:
# The 20 artists with the most songs in the dataset
df['artist'].value_counts().head(20).plot(kind='bar', figsize=(12, 5))
plt.show()

In [None]:
plt.figure(figsize=(15,8))
sns.boxplot(data=df ,orient="h",color='crimson')
plt.show()

In [None]:
from sklearn.preprocessing import LabelEncoder
cols = ['song_title','artist']
df[cols] = df[cols].apply(LabelEncoder().fit_transform)
df.head(n=5)
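`LabelEncoder` sorts the distinct values and replaces each one with its position in that sorted list. The following is a minimal sketch of the same mapping using `np.unique`. The artist strings are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical artist names, for illustration only
artists = pd.Series(["artist_c", "artist_a", "artist_c", "artist_b"])

# Same mapping as LabelEncoder: sorted classes, codes indexed from 0
classes, codes = np.unique(artists.to_numpy(), return_inverse=True)
print(list(classes))   # ['artist_a', 'artist_b', 'artist_c']
print(codes.tolist())  # [2, 0, 2, 1]
```

The resulting integers follow alphabetical order, which carries no meaning. Tree-based models usually tolerate this. Distance-based and linear models (KNN, SVC, logistic regression) may treat the codes as if they were ordered quantities.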

In [None]:
from scipy import stats
zscore = np.abs(stats.zscore(df))
print(zscore)

In [None]:
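threshold = 3
# (row, column) positions of every value whose |z| exceeds the threshold
print(np.where(zscore > threshold))

In [None]:
# Keep only the rows in which every column lies within the threshold
df = df[(zscore < threshold).all(axis=1)]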
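A z-score is the distance from the column mean in units of the column's standard deviation. `scipy.stats.zscore` uses the population standard deviation (`ddof=0`) by default. The following is a minimal sketch of the same filter in plain pandas; `drop_outliers_zscore` is a hypothetical helper. It runs on a synthetic frame with one planted outlier.

```python
import numpy as np
import pandas as pd


def drop_outliers_zscore(frame: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Keep rows whose |z| is below `threshold` in every column (population std, ddof=0)."""
    z = (frame - frame.mean()) / frame.std(ddof=0)
    return frame[(z.abs() < threshold).all(axis=1)]


rng = np.random.default_rng(2)
frame = pd.DataFrame({"a": rng.normal(size=200), "b": rng.normal(size=200)})
frame.loc[0, "a"] = 50.0  # plant one obvious outlier

cleaned = drop_outliers_zscore(frame)
print(len(frame), len(cleaned))
```

A constant column has a standard deviation of 0. Its z-scores become NaN, and every row then fails the `< threshold` test. It is worth checking for such columns before filtering.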

In [None]:
x = df.drop('target',axis=1)
y = df['target']

## Splitting the data into training and testing sets

In [None]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.25,random_state=0)

In [None]:
# Standardizing our training and testing data.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
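`StandardScaler` subtracts each column's mean and divides by its standard deviation (`ddof=0`). Both statistics are learned from the training split only, which is why the code above calls `fit_transform` on `x_train` but only `transform` on `x_test`. The following is a minimal NumPy sketch of that behaviour. It uses synthetic arrays with new names so that it does not overwrite the notebook's `x_train`/`x_test`.

```python
import numpy as np

rng = np.random.default_rng(3)
train_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic training features
test_demo = rng.normal(loc=5.0, scale=2.0, size=(20, 3))    # synthetic test features

# "fit": statistics come from the training split only
mean = train_demo.mean(axis=0)
std = train_demo.std(axis=0)  # population std (ddof=0)

# "transform": both splits use the training statistics
train_scaled = (train_demo - mean) / std
test_scaled = (test_demo - mean) / std

print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
```

The test split is typically close to, but not exactly at, mean 0 and standard deviation 1. That difference is expected, because it never influenced the scaling.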

# <span style="color:Crimson;">Training models</span>

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score,confusion_matrix

**Logistic Regression**

In [None]:
log_reg = LogisticRegression()
log_reg.fit(x_train,y_train)

log_acc=accuracy_score(y_test,log_reg.predict(x_test))

print("Train Set Accuracy:"+str(accuracy_score(y_train,log_reg.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,log_reg.predict(x_test))*100))

In [None]:
plt.figure(figsize=(6,4))
df_ = pd.DataFrame(confusion_matrix(y_test, log_reg.predict(x_test)), range(2),range(2))
sns.set(font_scale=1.4)#for label size
sns.heatmap(df_, annot=True,annot_kws={"size": 16}, fmt='g')
plt.xlabel('Predicted Class')
plt.ylabel('Original Class')
plt.show()
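In `sklearn.metrics.confusion_matrix`, rows are the actual class and columns the predicted class, which matches the axis labels used here. The following is a minimal NumPy sketch of the 2×2 case; `confusion_matrix_2x2` is a hypothetical helper. It is run on five hand-made labels.

```python
import numpy as np


def confusion_matrix_2x2(y_true, y_pred) -> np.ndarray:
    """Rows = actual class (0, 1), columns = predicted class (0, 1)."""
    cm = np.zeros((2, 2), dtype=int)
    # Add 1 at (actual, predicted) for every sample; np.add.at handles repeated indices
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


cm = confusion_matrix_2x2([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)  # 1 1 1 2
```

Reading the matrix this way, accuracy is `(tn + tp) / cm.sum()`.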

In [None]:
# Use the predicted probability of class 1 so the ROC curve sweeps over all thresholds
y_score = log_reg.predict_proba(x_test)[:, 1]

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, y_score)

from sklearn.metrics import auc
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8,6))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='Logistic (area = {:.3f})'.format(roc_auc))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()
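An ROC curve needs a continuous score per sample, for example `predict_proba(x_test)[:, 1]`. If hard 0/1 labels from `predict()` are passed instead, `roc_curve` has only one threshold to work with, so the "curve" is a single point joined to (0, 0) and (1, 1), and its area reduces to (TPR + 1 - FPR) / 2. The following is a minimal sketch that illustrates this with the ranking interpretation of AUC: the probability that a random positive scores higher than a random negative, with ties counting one half. `auc_by_ranking` is a hypothetical helper, and the scores are hand-made.

```python
import numpy as np


def auc_by_ranking(y_true, scores) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive with every negative via broadcasting
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


y_true = [0, 0, 1, 1]
probabilities = [0.1, 0.3, 0.45, 0.8]  # e.g. predict_proba(...)[:, 1]
hard_labels = [0, 0, 0, 1]             # the same scores cut at 0.5, as predict() does

print(auc_by_ranking(y_true, probabilities))  # 1.0
print(auc_by_ranking(y_true, hard_labels))    # 0.75
```

The probabilities rank every positive above every negative, but thresholding at 0.5 discards that ordering and the AUC drops. For models without `predict_proba` (such as `SVC` with default settings), `decision_function` provides a suitable ranking score.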

In [None]:
d_tree = DecisionTreeClassifier()
d_tree.fit(x_train,y_train)

d_acc=accuracy_score(y_test,d_tree.predict(x_test))

print("Train Set Accuracy:"+str(accuracy_score(y_train,d_tree.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,d_tree.predict(x_test))*100))

In [None]:
plt.figure(figsize=(6,4))
df_ = pd.DataFrame(confusion_matrix(y_test, d_tree.predict(x_test)), range(2),range(2))
sns.set(font_scale=1.4)#for label size
sns.heatmap(df_, annot=True,annot_kws={"size": 16}, fmt='g')
plt.xlabel('Predicted Class')
plt.ylabel('Original Class')
plt.show()

In [None]:
# Use the predicted probability of class 1 so the ROC curve sweeps over all thresholds
y_score = d_tree.predict_proba(x_test)[:, 1]

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, y_score)

from sklearn.metrics import auc
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8,6))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='Decision tree (area = {:.3f})'.format(roc_auc))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

In [None]:
r_for = RandomForestClassifier()
r_for.fit(x_train,y_train)

r_acc=accuracy_score(y_test,r_for.predict(x_test))

print("Train Set Accuracy:"+str(accuracy_score(y_train,r_for.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,r_for.predict(x_test))*100))

In [None]:
plt.figure(figsize=(6,4))
df_ = pd.DataFrame(confusion_matrix(y_test, r_for.predict(x_test)), range(2),range(2))
sns.set(font_scale=1.4)#for label size
sns.heatmap(df_, annot=True,annot_kws={"size": 16}, fmt='g')
plt.xlabel('Predicted Class')
plt.ylabel('Original Class')
plt.show()

In [None]:
# Use the predicted probability of class 1 so the ROC curve sweeps over all thresholds
y_score = r_for.predict_proba(x_test)[:, 1]

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, y_score)

from sklearn.metrics import auc
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8,6))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='Random forest (area = {:.3f})'.format(roc_auc))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

In [None]:
k_nei = KNeighborsClassifier()
k_nei.fit(x_train,y_train)

k_acc = accuracy_score(y_test,k_nei.predict(x_test))

print("Train set Accuracy:"+str(accuracy_score(y_train,k_nei.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,k_nei.predict(x_test))*100))

In [None]:
plt.figure(figsize=(6,4))
df_ = pd.DataFrame(confusion_matrix(y_test, k_nei.predict(x_test)), range(2),range(2))
sns.set(font_scale=1.4)#for label size
sns.heatmap(df_, annot=True,annot_kws={"size": 16}, fmt='g')
plt.xlabel('Predicted Class')
plt.ylabel('Original Class')
plt.show()

In [None]:
# Use the predicted probability of class 1 so the ROC curve sweeps over all thresholds
y_score = k_nei.predict_proba(x_test)[:, 1]

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, y_score)

from sklearn.metrics import auc
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8,6))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='KNN (area = {:.3f})'.format(roc_auc))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

In [None]:
s_vec = SVC()
s_vec.fit(x_train,y_train)

s_acc = accuracy_score(y_test,s_vec.predict(x_test))

print("Train set Accuracy:"+str(accuracy_score(y_train,s_vec.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,s_vec.predict(x_test))*100))

In [None]:
plt.figure(figsize=(6,4))
df_ = pd.DataFrame(confusion_matrix(y_test, s_vec.predict(x_test)), range(2),range(2))
sns.set(font_scale=1.4)#for label size
sns.heatmap(df_, annot=True,annot_kws={"size": 16}, fmt='g')
plt.xlabel('Predicted Class')
plt.ylabel('Original Class')
plt.show()

In [None]:
# SVC has no predict_proba unless probability=True; its decision function ranks samples for ROC
y_score = s_vec.decision_function(x_test)

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, y_score)

from sklearn.metrics import auc
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8,6))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='SVC (area = {:.3f})'.format(roc_auc))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

In [None]:
g_clf = GaussianNB()
g_clf.fit(x_train,y_train)

g_acc = accuracy_score(y_test,g_clf.predict(x_test))

print("Train set Accuracy:"+str(accuracy_score(y_train,g_clf.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,g_clf.predict(x_test))*100))

In [None]:
plt.figure(figsize=(6,4))
df_ = pd.DataFrame(confusion_matrix(y_test, g_clf.predict(x_test)), range(2),range(2))
sns.set(font_scale=1.4)#for label size
sns.heatmap(df_, annot=True,annot_kws={"size": 16}, fmt='g')
plt.xlabel('Predicted Class')
plt.ylabel('Original Class')
plt.show()

In [None]:
# Use the predicted probability of class 1 so the ROC curve sweeps over all thresholds
y_score = g_clf.predict_proba(x_test)[:, 1]

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, y_score)

from sklearn.metrics import auc
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8,6))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='Gaussian NB (area = {:.3f})'.format(roc_auc))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

In [None]:
x_clf = XGBClassifier()
x_clf.fit(x_train,y_train)

x_acc = accuracy_score(y_test,x_clf.predict(x_test))

print("Train Set Accuracy:"+str(accuracy_score(y_train,x_clf.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,x_clf.predict(x_test))*100))

In [None]:
plt.figure(figsize=(6,4))
df_ = pd.DataFrame(confusion_matrix(y_test, x_clf.predict(x_test)), range(2),range(2))
sns.set(font_scale=1.4)#for label size
sns.heatmap(df_, annot=True,annot_kws={"size": 16}, fmt='g')
plt.xlabel('Predicted Class')
plt.ylabel('Original Class')
plt.show()

In [None]:
models = pd.DataFrame({
    'Model': ['Logistic','KNN', 'SVC', 'Decision Tree',
             'Random Forest', 'Gaussian','xgboost'],
    'Score': [ log_acc,k_acc, s_acc, d_acc, r_acc, g_acc,x_acc]
})

models.sort_values(by = 'Score', ascending = False)
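The ranking above comes from a single 25% hold-out split, so small differences in `Score` may simply reflect which songs landed in the test set. k-fold cross-validation averages over several splits and also reports the spread. The following is a hedged sketch of that approach, which is an alternative evaluation and not what this notebook runs. It uses synthetic data from `make_classification` in place of the Spotify features and wraps the scaler in a pipeline so that it is refitted on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Spotify features (not the real data)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# The scaler is refitted inside each training fold, so validation folds never leak into it
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")

print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

In the notebook, the same pattern could be applied to the unscaled `x` and `y` for each classifier in turn. Models whose mean scores differ by less than the fold-to-fold spread are hard to tell apart.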

In [None]:
plt.figure(figsize=(15,6))
sns.barplot(x='Model',y='Score',data=models)
plt.show()

https://open.spotify.com/playlist/1lXkBYLNc7DwttQmiwZwrz?si=eZNJni-dRnaUJn0hF0dUFQ 

😂😍 This is my best playlist on Spotify