# 🧐Evaluating Classification ML Models

Need to choose between different models

**OR**

Choose between different features

**OR even**

Choose between different tuning parameters

**Then**

We need to <font color="blue">***evaluate the model***</font>

so that we can find out whether the model we wanted to build is complete or still needs more work.

In this Kernel/Notebook, I discuss a few of the most popular techniques used to evaluate a model.

So, sit back, have a look, and find out what you need to evaluate your model.

Before going forward:
<font color="Red">Please Upvote ( It motivates me )</font>

In this notebook I discuss the following evaluation techniques:
<font color="Purple">
1. Classification Accuracy
2. Confusion Matrix
3. Precision and Recall
4. F1 Score
5. ROC Curve
6. Classification Report </font>

## Preprocessing and Plotting

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict, StratifiedKFold
sns.set()

In [None]:
# Reading the Dataset
clas_data = pd.read_csv('../input/health-care-data-set-on-heart-attack-possibility/heart.csv')

### Let's first find out details of our dataset

Looking at its info, head, and description gives us a rough idea of our data.

In [None]:
def data_info(data):
    """Print a quick overview of a DataFrame."""
    print('\t\t Data Info:')
    data.info()  # info() prints directly and returns None, so it is not wrapped in print()
    print('\n\n\t\t Data Head:')
    print(data.head())
    print('\n\n\t\t Data Describe:')
    print(data.describe())
    print('\n\nData Shape: ', data.shape)
    print('\n\n\t\t Null Values')
    print(data.isna().sum())

In [None]:
data_info(clas_data)

### Now, let's have some visual information of our dataset

First, let us find out about our **target** column.

In [None]:
var = 'target'
sns.countplot(x=var, data=clas_data)

Followed by the **gender/sex** column.

In [None]:
var = 'sex'
sns.countplot(x=var, data=clas_data)

In [None]:
var = 'age'
f, ax = plt.subplots(figsize=(15,8))
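# distplot is deprecated since seaborn 0.11; histplot with kde=True and stat='density' draws the equivalent plot
sns.histplot(clas_data[var], kde=True, stat='density', ax=ax)
ax.set_xlim([0, 80])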

Clearly, most people in the dataset are older than 40, and the distribution peaks at around 60.

Below, I plot a few more columns: serum cholesterol (`chol`) and resting blood pressure (`trestbps`).

In [None]:
var = 'chol'
f, ax = plt.subplots(figsize=(15,8))
sns.histplot(clas_data[var], kde=True, stat='density', ax=ax)
ax.set_xlim([0, 600])

In [None]:
var = 'trestbps'
f, ax = plt.subplots(figsize=(15,8))
sns.histplot(clas_data[var], kde=True, stat='density', ax=ax)
ax.set_xlim([0, 250])

Now comes the **heat map**.

### 🔥Heat Map
A heat map (or heatmap) is a data visualization technique that shows the magnitude of a phenomenon as color in two dimensions. The variation in color may be by hue or intensity, giving obvious visual cues to the reader about how the phenomenon is clustered or varies over space. Here, each cell shows the (Pearson) correlation between two columns of the dataset.

In [None]:
plt.figure(figsize=(18,18))
sns.heatmap(clas_data.corr(),annot=True,cmap='RdYlGn')

plt.show()
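To read the heat map numerically, one option is to pull out the `target` column of the correlation matrix and sort it by absolute value. The sketch below uses a tiny made-up DataFrame (`df`) standing in for `clas_data`, so the numbers mean nothing medically; with the real data the same call would be `clas_data.corr()['target']`. It assumes pandas >= 1.1 for the `key` argument of `sort_values`.

```python
import pandas as pd

# Hypothetical toy frame standing in for clas_data (values are made up)
df = pd.DataFrame({
    'age':    [40, 50, 60, 45, 65, 55],
    'chol':   [200, 240, 260, 210, 300, 250],
    'target': [1, 1, 0, 1, 0, 0],
})

# Each heat-map cell is a Pearson correlation; the 'target' column alone
# shows how strongly each feature moves with the label
target_corr = (
    df.corr()['target']
    .drop('target')                          # a column always correlates 1.0 with itself
    .sort_values(key=abs, ascending=False)   # strongest relationships first, sign ignored
)
print(target_corr)
```

Sorting by absolute value keeps strong negative correlations near the top, which a plain descending sort would push to the bottom.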

As stated in the dataset description, these 14 columns are the ones used in most models built on this data.

In [None]:
clas_data.columns

Here I first divide the dataset into the features `X` and the target `y` (the last column).

Then I split them into **train and test sets**.

In [None]:
X = clas_data.iloc[:,:-1]
y = clas_data.iloc[:,-1]
print("\n\n\t\tIndependent features of Dataset: ")
print(X.head())
print("\n\n\t\tDependent feature (target) of Dataset: ")
print(y.head())

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state = 25)

## 👨🏼‍💻 Model Training

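I use a basic logistic regression model here, because our focus is on evaluation rather than on a well-tuned model. The features are first standardized with `StandardScaler`, which is fitted on the training set only and then applied to the test set.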

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)

## 🧑🏼‍🍳Evaluation of model

From here on, we look at the different ways we can evaluate our classification model.

### Model Accuracy
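Accuracy is the fraction of predictions that match the true labels. It is easy to read, but it can be misleading when one class is much more common than the other. For multilabel classification, scikit-learn's `accuracy_score` computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in `y_true`.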

In [None]:
from sklearn import metrics
print("Classification Model Accuracy is: ",metrics.accuracy_score(y_test, y_pred))
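A quick way to see what `accuracy_score` computes is to reproduce it by hand. The sketch below uses small hypothetical label arrays (`y_true`, `y_hat`), not the heart data, and also demonstrates the multilabel subset-accuracy behaviour described above.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical toy labels for illustration only (not from the heart dataset)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Accuracy = correct predictions / all predictions
manual_acc = np.mean(y_true == y_hat)
sk_acc = accuracy_score(y_true, y_hat)
print(manual_acc, sk_acc)

# Multilabel case: each row is one sample's set of labels. A row only counts
# as correct if every label in it matches (subset accuracy).
Y_true = np.array([[1, 0, 1], [0, 1, 0]])
Y_hat = np.array([[1, 0, 1], [0, 1, 1]])
subset_acc = accuracy_score(Y_true, Y_hat)
print(subset_acc)
```

Here 6 of the 8 binary predictions are right, so both accuracy values are 0.75; in the multilabel case the second row has one wrong label, so only half of the samples count as correct.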

### 🤯Confusion Matrix
One of the most widely used tools for evaluating classification models.

A confusion matrix is a summary of prediction results on a classification problem.

The number of correct and incorrect predictions is summarized with count values and broken down by each class. This is the key to the confusion matrix.

The confusion matrix shows the ways in which your classification model is confused when it makes predictions.

It gives you insight not only into the errors being made by your classifier but more importantly the types of errors that are being made.

Let's now define the most basic terms, which are whole numbers (not rates):
* true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
* true negatives (TN): We predicted no, and they don't have the disease.
* false positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
* false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")
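The four counts can be computed straight from their definitions and checked against scikit-learn's `confusion_matrix`. This is a minimal sketch on hypothetical toy labels (1 = has the disease, 0 = does not), not on the notebook's test set.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels (1 = has the disease, 0 = does not)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Count each outcome directly from its definition
tp = int(np.sum((y_hat == 1) & (y_true == 1)))  # predicted yes, actually yes
tn = int(np.sum((y_hat == 0) & (y_true == 0)))  # predicted no, actually no
fp = int(np.sum((y_hat == 1) & (y_true == 0)))  # predicted yes, actually no (Type I error)
fn = int(np.sum((y_hat == 0) & (y_true == 1)))  # predicted no, actually yes (Type II error)

# For binary labels scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
# (rows = true class, columns = predicted class), so ravel() yields tn, fp, fn, tp
sk_tn, sk_fp, sk_fn, sk_tp = confusion_matrix(y_true, y_hat).ravel()
print(tp, tn, fp, fn)
print(sk_tp, sk_tn, sk_fp, sk_fn)
```

The `ravel()` order is a common source of mistakes: it follows the matrix row by row, so TN comes first, not TP.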

In [None]:
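# scikit-learn's built-in display (needs scikit-learn >= 1.0)
# Rows are the true labels, columns are the predicted labels
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)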

### Precision and Recall
Precision attempts to answer the following question:

*What proportion of positive identifications was actually correct?*

And recall attempts to answer the following question:

*What proportion of actual positives was identified correctly?*
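In terms of the confusion-matrix counts, precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch with made-up counts, written out as label arrays so the result can be cross-checked against scikit-learn's `precision_score` and `recall_score`:

```python
from sklearn.metrics import precision_score, recall_score

# Counts from a hypothetical confusion matrix (toy numbers)
tp, fp, fn = 3, 1, 2

# Precision: of everything predicted positive, how much was truly positive?
precision = tp / (tp + fp)
# Recall: of everything truly positive, how much did we catch?
recall = tp / (tp + fn)

# The same toy data as label arrays: 3 TP, 2 FN, 1 FP, 1 TN
y_true = [1, 1, 1, 1, 1, 0, 0]
y_hat = [1, 1, 1, 0, 0, 1, 0]
print(precision, precision_score(y_true, y_hat))
print(recall, recall_score(y_true, y_hat))
```

The two metrics pull in opposite directions: predicting "positive" more often tends to raise recall and lower precision.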

In [None]:
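from sklearn.metrics import precision_score, recall_score, average_precision_score, PrecisionRecallDisplay
# Precision and recall of the hard predictions (default 0.5 probability threshold)
print('Precision: {0:0.2f}'.format(precision_score(y_test, y_pred)))
print('Recall: {0:0.2f}'.format(recall_score(y_test, y_pred)))
# Average precision summarizes the whole curve, so it needs scores, not hard 0/1 predictions
y_score = log_reg.predict_proba(X_test)[:, 1]
average_precision = average_precision_score(y_test, y_score)
print('Average precision-recall score: {0:0.2f}'.format(average_precision))
# plot_precision_recall_curve was removed in scikit-learn 1.2; PrecisionRecallDisplay replaces it
disp = PrecisionRecallDisplay.from_estimator(log_reg, X_test, y_test)
disp.ax_.set_title('2-class Precision-Recall curve: AP={0:0.2f}'.format(average_precision))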

### 😵F1 score
The F1 score is the harmonic mean of precision and recall, F1 = 2 × (precision × recall) / (precision + recall). It reaches its best value at 1 (perfect precision and recall) and its worst at 0. The F1 score is also known as the Sørensen–Dice coefficient or Dice similarity coefficient (DSC).

In [None]:
from sklearn.metrics import f1_score
print("Macro F1 Score: ",f1_score(y_test, y_pred, average='macro'))
print("Micro F1 Score: ",f1_score(y_test, y_pred, average='micro'))
print("Weighted F1 Score: ",f1_score(y_test, y_pred, average='weighted'))
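The three averages printed above can be reproduced by hand. The sketch below, on hypothetical toy labels, computes F1 for each class as the harmonic mean of precision and recall, then combines them: macro is the plain mean over classes, weighted weights each class by its support (number of true samples), and micro pools the counts over both classes, which for single-label data equals accuracy. It also checks the Dice-coefficient equivalence on the sets of positive indices. `f1_for` is an illustrative helper, not a library function.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical toy labels
y_true = np.array([1, 1, 1, 1, 1, 0, 0])
y_hat = np.array([1, 1, 1, 0, 0, 1, 0])

def f1_for(label):
    """F1 for one class, treating `label` as the positive class (illustrative helper)."""
    tp = np.sum((y_hat == label) & (y_true == label))
    fp = np.sum((y_hat == label) & (y_true != label))
    fn = np.sum((y_hat != label) & (y_true == label))
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)  # harmonic mean

f1_0, f1_1 = f1_for(0), f1_for(1)
support = np.array([np.sum(y_true == 0), np.sum(y_true == 1)])

macro = (f1_0 + f1_1) / 2                                          # unweighted mean over classes
weighted = (f1_0 * support[0] + f1_1 * support[1]) / support.sum()  # weighted by class size
micro = np.mean(y_true == y_hat)  # pooled counts; equals accuracy for single-label data

# Dice coefficient between the predicted-positive and truly-positive index sets
pred_pos = set(np.flatnonzero(y_hat == 1))
true_pos = set(np.flatnonzero(y_true == 1))
dice = 2 * len(pred_pos & true_pos) / (len(pred_pos) + len(true_pos))

print(macro, f1_score(y_true, y_hat, average='macro'))
print(weighted, f1_score(y_true, y_hat, average='weighted'))
print(micro, f1_score(y_true, y_hat, average='micro'))
print(dice, f1_1)
```

Macro treats the small class as equally important as the large one, so it drops when the minority class is predicted poorly; weighted hides that effect in proportion to class size.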

### ROC Curve
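The ROC (receiver operating characteristic) curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) as the decision threshold varies. Choosing the best model or threshold is a balance between predicting 1's accurately (sensitivity) and predicting 0's accurately (specificity). The area under the curve (AUC) summarizes the whole curve in one number: 1.0 is a perfect classifier and 0.5 is no better than random guessing.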

In [None]:
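# roc_curve needs continuous scores (predicted probability of class 1), not hard 0/1 labels
y_score = log_reg.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_score)
roc_auc = metrics.roc_auc_score(y_test, y_score)

plt.plot(fpr, tpr, label='Logistic regression (AUC = {0:0.2f})'.format(roc_auc))
plt.plot([0, 1], [0, 1], linestyle='--', label='Random guess')  # chance-level diagonal
plt.title('ROC curve for Heart Attack classifier')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')
plt.legend()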
plt.grid(True)
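To make the threshold trade-off concrete, here is a small sketch that builds ROC points by hand: for each threshold it predicts 1 whenever the score is at or above it and records (FPR, TPR). The labels and scores are hypothetical, and `rates_at` is an illustrative helper, not a library function. The area computed with the trapezoidal rule should match `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical toy labels and predicted probabilities of class 1
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

def rates_at(threshold):
    """Return (FPR, TPR) when every score >= threshold is predicted as 1."""
    pred = scores >= threshold
    tpr = np.sum(pred & (y_true == 1)) / np.sum(y_true == 1)  # sensitivity
    fpr = np.sum(pred & (y_true == 0)) / np.sum(y_true == 0)  # 1 - specificity
    return fpr, tpr

# Sweep thresholds from "predict nothing as 1" (inf) down to "predict everything as 1"
cutoffs = np.r_[np.inf, np.sort(scores)[::-1]]
points = np.array([rates_at(t) for t in cutoffs])
fpr_pts, tpr_pts = points[:, 0], points[:, 1]

# Area under the curve by the trapezoidal rule
manual_auc = np.sum(np.diff(fpr_pts) * (tpr_pts[1:] + tpr_pts[:-1]) / 2)

# scikit-learn's version (it may drop collinear intermediate points)
sk_fpr, sk_tpr, sk_thresholds = roc_curve(y_true, scores)
print(manual_auc, roc_auc_score(y_true, scores))
```

Lowering the threshold can only add predicted positives, so both rates grow monotonically and the curve runs from (0, 0) to (1, 1).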

### Classification Report 
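Finally, I use the classification report from **scikit-learn**, which summarizes the main evaluation criteria (precision, recall, F1 score, and support for each class, plus overall accuracy and averages) in one table.

Ref: [https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html)

I would suggest reading this documentation.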

In [None]:
from sklearn.metrics import classification_report
target_names = ['class 0', 'class 1']
print(classification_report(y_test, y_pred, target_names=target_names))
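If you need the numbers programmatically rather than as printed text, `classification_report` accepts `output_dict=True`. A minimal sketch on hypothetical toy labels, showing that the per-class rows are the same values `precision_recall_fscore_support` returns:

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

# Hypothetical toy labels
y_true = [1, 1, 1, 1, 1, 0, 0]
y_hat = [1, 1, 1, 0, 0, 1, 0]

# output_dict=True returns a nested dict keyed by class name, 'accuracy',
# 'macro avg' and 'weighted avg' instead of a formatted string
report = classification_report(
    y_true, y_hat, target_names=['class 0', 'class 1'], output_dict=True
)

# Per-class arrays (ordered by sorted label: 0, then 1)
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_hat)
print(report['class 1'])
print(report['macro avg'])
```

The dict form makes it easy to log a single metric, for example `report['class 1']['recall']`, when comparing several models.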

And there are many more metrics.

Do check out [https://scikit-learn.org/stable/modules/model_evaluation.html](https://scikit-learn.org/stable/modules/model_evaluation.html)

So, finally, this notebook ends here 🤵🏻

Before you go, a humble request: if you liked the notebook,

<font color="Red">Please Upvote ( It motivates me )</font>

<font color="Green">Do check my other notebooks: </font>
https://www.kaggle.com/iabhishekmaurya/used-car-price-prediction
https://www.kaggle.com/iabhishekmaurya/applied-machine-learning/notebook

And stay tuned... I WILL ALSO DO THE SAME FOR REGRESSION MODELS.

Till then, happy coding!
# 🤝🤝🤝🤝🤝