# Classification Metrics

## Introduction
In this section of the notebook, we will learn about the various ways in which a model's performance can be judged; these measures are called metrics. <br>
Each metric tells a different story about the model. Knowing several metrics gives a Data Scientist a broader understanding of how to measure performance in different scenarios. <br>

One of the most useful tools for estimating metrics is the <b>Confusion Matrix</b>.<br>
To understand its importance, we need to observe it in action, so we will begin by running a classification on a dataset.
We will use scikit-learn's Breast Cancer dataset, which contains 30 features per sample and a label marking each tumor as malignant (0) or benign (1). To keep the example simple, the code below uses only the first two features. The process below is as follows:
<ol>
    <li>Block 1: We import the necessary libraries to run a simple classification</li>
    <li>Block 2:</li> 
    <ol>
        <li>We split the dataset into a training set and a test set.</li>
        <li>Then we instantiate a classifier and fit it to the training set.</li>
        <li>Finally, we obtain the predicted values for the test set and use them to build a confusion matrix.</li>
    </ol>
    
</ol>

```Python
################### Block 1 - Importing Libraries #############################################
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import numpy as np
import pandas as pd

################### Block 2 - Running a Classification #############################################
data = load_breast_cancer()
# Keep only the first two features to keep the example simple
X = pd.DataFrame(data['data']).iloc[:, :2].values
y = data['target']
# Hold out 33% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# The forest itself is not seeded, so the counts below may vary slightly between runs
cls = RandomForestClassifier(n_estimators=90)
cls = cls.fit(X_train, y_train)
y_pred = cls.predict(X_test)
# Rows are the true classes, columns are the predicted classes
mat = confusion_matrix(y_test, y_pred)
```
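Before reading the confusion matrix, it can help to check what the dataset actually contains. The sketch below uses scikit-learn's `load_breast_cancer` loader (the variable name `cancer` is just illustrative) to report the number of samples and features and how the labels are encoded. In scikit-learn's encoding, 0 means malignant and 1 means benign, so the "Positive" class (1) in the rest of this section corresponds to a benign tumor.

```Python
from sklearn.datasets import load_breast_cancer
import numpy as np

cancer = load_breast_cancer()

# Each row is one sample, each column one numeric feature
n_samples, n_features = cancer['data'].shape
print(n_samples, n_features)  # → 569 30

# Class names in label order: index 0 first, then index 1
print([str(name) for name in cancer['target_names']])  # → ['malignant', 'benign']

# Number of samples per class (0 = malignant, 1 = benign)
class_counts = np.bincount(cancer['target'])
print(class_counts)  # → [212 357]
```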


## Confusion Matrix
A confusion matrix is a table that summarizes a classifier's performance by counting how many test points of each true class were predicted as each class. <br>It requires 2 objects: <br>(1) a list of the true labels of the tested data points, and<br> (2) a list of the labels predicted by your model. <br>

<img src='Images/confusion_matrix.png' width="350" height="200">

The above diagram depicts a confusion matrix for a dataset that has 2 classes. <br>
From the diagram it can be observed that the matrix separates the correctly predicted values from the incorrect ones. <br>
Let's begin with the true values:<br>
We have <br>
<ol>
    <li>Negative</li>
        Within the true values (<b>values which are established to be correct</b>), these are the points whose actual class is <b>Negative</b> (0)
    <li>Positive</li>
    Within the true values, these are the points whose actual class is <b>Positive</b> (1)
</ol>

Similarly, the predicted values section splits the <b>model's</b> predictions into those labeled Negative and those labeled Positive. Crossing the true and predicted classes gives four cells:


<ul>
    <li><u>True Negative</u></li>
<small>
    These are Negative values that the model <i>correctly</i> predicted as Negative
</small>

<li><u>False Negative</u></li>
<small>
    These are Positive values that the model <i>incorrectly</i> classified as Negative
</small>

<li><u>True Positive</u></li>
<small>
    These are Positive values that the model <i>correctly</i> predicted as Positive
</small>

<li><u>False Positive</u></li>
<small>
    These are Negative values that the model <i>incorrectly</i> classified as Positive
</small>
</ul>
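The four definitions above can be made concrete by counting the cells by hand. This is a minimal sketch on a small made-up pair of label arrays (`toy_true` and `toy_pred` are illustrative names, not part of the notebook): each cell is the number of positions where the true and predicted labels take a particular pair of values. The hand-built matrix matches scikit-learn's `confusion_matrix`, which places true classes on the rows and predicted classes on the columns.

```Python
import numpy as np
from sklearn.metrics import confusion_matrix

# Small illustrative labels (not taken from the breast cancer data)
toy_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
toy_pred = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Count each cell with a boolean mask over the (true, predicted) pairs
tn = int(np.sum((toy_true == 0) & (toy_pred == 0)))  # actual Negative, predicted Negative
fp = int(np.sum((toy_true == 0) & (toy_pred == 1)))  # actual Negative, predicted Positive
fn = int(np.sum((toy_true == 1) & (toy_pred == 0)))  # actual Positive, predicted Negative
tp = int(np.sum((toy_true == 1) & (toy_pred == 1)))  # actual Positive, predicted Positive

# Arrange the counts with true classes as rows and predicted classes as columns
manual = np.array([[tn, fp],
                   [fn, tp]])
print(manual)
print(confusion_matrix(toy_true, toy_pred))
```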


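The `confusion_matrix` function is provided by `sklearn.metrics`. In its output, rows correspond to the true classes and columns to the predicted classes, so for a binary problem the layout is `[[TN, FP], [FN, TP]]`.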

```Python
print('Confusion Matrix\n', mat)
```
<img src="Images/mat.PNG" width="150" height="150">

<h4>The number of True Negatives: 57</h4>

```Python
TN = mat[0][0]  # row 0 (actual Negative), column 0 (predicted Negative)
```

<h4>The number of True Positives: 111</h4>

```Python
TP = mat[1][1]  # row 1 (actual Positive), column 1 (predicted Positive)
```

<h4>The number of False Negatives: 10</h4>

```Python
FN = mat[1][0]  # row 1 (actual Positive), column 0 (predicted Negative)
```

<h4>The number of False Positives: 10</h4>

```Python
FP = mat[0][1]  # row 0 (actual Negative), column 1 (predicted Positive)
```
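As an alternative to indexing `mat[i][j]` cell by cell, scikit-learn's documentation shows a compact idiom for the binary case: `ravel()` flattens the 2-by-2 matrix row by row, yielding TN, FP, FN, TP in that order. The sketch below uses small illustrative labels rather than the breast cancer predictions, and passes `labels=[0, 1]` so the row and column order stays fixed even if one class happens to be absent from a sample.

```Python
from sklearn.metrics import confusion_matrix

# Small illustrative labels (not taken from the breast cancer data)
toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# labels=[0, 1] fixes the order of rows/columns: Negative first, then Positive
toy_mat = confusion_matrix(toy_true, toy_pred, labels=[0, 1])

# ravel() flattens [[TN, FP], [FN, TP]] row by row
tn, fp, fn, tp = toy_mat.ravel()
print(tn, fp, fn, tp)  # → 3 1 1 3
```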


## Model Evaluations

Now that we have a confusion matrix, we can calculate several metrics from the data we created previously.<br>
We will use the example matrix `mat`. The following metrics can be calculated from a confusion matrix:
<br>
<ol>
<li>Accuracy Score:</li> Gives the ratio of correctly predicted values to the whole set of values<br>
Score: (TP+TN)/(TP+TN+FP+FN)

```Python
accuracy_score = (mat[0][0] + mat[1][1])/(mat[0][0] + mat[0][1] + mat[1][0] + mat[1][1])
```
<li>Recall Score:</li> Gives the ratio of correctly predicted Positives to all values that are actually Positive<br>
Score:  TP/(TP+FN)

```Python
recall_score = mat[1][1]/(mat[1][1]+mat[1][0])  # TP / (TP + FN)
```
<li>Precision Score:</li> Gives the ratio of correctly predicted Positives to all values predicted as Positive<br>
Score:  TP/(TP+FP)

```Python
precision_score = mat[1][1]/(mat[1][1]+mat[0][1])  # TP / (TP + FP)
```
<li>F1 Score:</li> The harmonic mean of precision and recall. It is more reliable than accuracy when the confusion matrix is one-sided, i.e. when one class greatly outnumbers the other<br>
Score: (2 * precision_score * recall_score)/(precision_score + recall_score)


```Python
f1_score = (2* precision_score * recall_score)/( precision_score + recall_score )
```
</ol>
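A good way to check the formulas above is to compare them against scikit-learn's own implementations in `sklearn.metrics`. The sketch below is an illustration on made-up labels (`toy_true` and `toy_pred` are hypothetical names), chosen so that recall and precision differ and a swapped formula would show up. It uses shorter names like `recall` instead of `recall_score` so it does not shadow the library functions, and it also computes F1 in the equivalent form 2TP/(2TP+FP+FN), which follows from substituting the precision and recall formulas into the harmonic mean.

```Python
import numpy as np
from sklearn import metrics

# Illustrative labels, chosen so that recall and precision differ
toy_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
toy_pred = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 1])

# Binary layout [[TN, FP], [FN, TP]], flattened row by row
tn, fp, fn, tp = metrics.confusion_matrix(toy_true, toy_pred, labels=[0, 1]).ravel()

# Hand-computed metrics, using the formulas above
accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

# Equivalent form of F1 written directly in terms of the matrix cells
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

# Pair each hand-computed value with scikit-learn's result
pairs = {
    'accuracy': (accuracy, metrics.accuracy_score(toy_true, toy_pred)),
    'recall': (recall, metrics.recall_score(toy_true, toy_pred)),
    'precision': (precision, metrics.precision_score(toy_true, toy_pred)),
    'f1': (f1, metrics.f1_score(toy_true, toy_pred)),
}
for name, (manual, library) in pairs.items():
    print(f'{name:<10} manual={manual:.2f} sklearn={library:.2f}')
print(f'f1 from counts: {f1_from_counts:.2f}')
```

Here recall is 0.80 but precision only 0.67, because the model flags two Negatives as Positive while missing just one Positive.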

All of the scores above are ratios between 0 and 1; to express them as percentages, multiply them by 100.

<hr>
Exercise: For the y_true and y_pred values given below, calculate the
accuracy_score, recall_score, precision_score and f1_score. <br>
Display each value rounded to 2 decimal places, e.g. if the answer is 0.9234 <br>
display 0.92



```Python
# Write your solution here
import numpy as np
from sklearn.metrics import confusion_matrix
np.random.seed(1)
y_true = np.random.randint(2, size=20)
y_pred = np.random.randint(2, size=20)
```


### Solution
```Python
import numpy as np
from sklearn.metrics import confusion_matrix
np.random.seed(1)
y_true = np.random.randint(2, size=20)
y_pred = np.random.randint(2, size=20)
# Layout for binary labels: [[TN, FP], [FN, TP]]
mat = confusion_matrix(y_true=y_true, y_pred=y_pred)
accuracy_score = (mat[0][0] + mat[1][1])/(mat[0][0] + mat[0][1] + mat[1][0] + mat[1][1])
recall_score = mat[1][1]/(mat[1][1]+mat[1][0])      # TP / (TP + FN)
precision_score = mat[1][1]/(mat[1][1]+mat[0][1])   # TP / (TP + FP)
f1_score = (2* precision_score * recall_score)/( precision_score + recall_score )
print('Accuracy Score:  {0:.2f}'.format(accuracy_score))
print('Recall Score:    {0:.2f}'.format(recall_score))
print('Precision Score: {0:.2f}'.format(precision_score))
print('F1 Score:        {0:.2f}'.format(f1_score))
```


## ROC Curve
