# <center>Performance Metrics (Parameters) for Classification</center>

## 1. [Confusion Matrix](e.ConfusionMatrix.ipynb)

![](_pic/img-performance/ConfusionMatrics.png)

**Terms associated with Confusion matrix:**

**1. True Positives (TP):**

- True positives are the cases where the actual class of the data point is 1 (True) and the predicted class is also 1 (True).

**2. True Negatives (TN):**

- True negatives are the cases where the actual class of the data point is 0 (False) and the predicted class is also 0 (False).

**3. False Positives (FP):**

- False positives are the cases where the actual class of the data point is 0 (False) but the predicted class is 1 (True). "False" because the model predicted incorrectly, and "positive" because the predicted class was the positive one (1).

**4. False Negatives (FN):**

- False negatives are the cases where the actual class of the data point is 1 (True) but the predicted class is 0 (False). "False" because the model predicted incorrectly, and "negative" because the predicted class was the negative one (0).
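
The four outcomes above can be counted directly from paired actual and predicted labels. The snippet below is a minimal sketch in plain Python; the label lists and the helper name `count_outcomes` are made up for illustration and are not part of the original notebook.

```python
def count_outcomes(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:  # actual == 1 and predicted == 0
            fn += 1
    return tp, tn, fp, fn


# Hypothetical labels, chosen only so that every case occurs at least once
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, tn, fp, fn = count_outcomes(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 3 1 1
```

The four counts always add up to the number of samples, which is a quick sanity check when counting by hand.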

**Accuracy** measures how well the model predicts both the positive and the negative class.
(Overall correctness of the model)

<center>$\begin{align*}Accuracy = \frac{TP +TN}{TP + FP + FN + TN}\end{align*}$</center>


**Sensitivity (Recall or True positive rate)** measures the proportion of positives that are correctly identified as such 
(Accuracy of class 1)

 <center>$\begin{align*}Recall =\frac{TP }{TP + FN}\end{align*}$</center>


**Specificity (True negative rate)** measures the proportion of negatives that are correctly identified as such. 
(Accuracy of class 0)

 <center>$\begin{align*}Specificity = \frac{TN}{TN + FP}\end{align*}$</center>
 
**Precision (Positive Predictive Value)** measures the proportion of predicted positives that are actually positive. Intuitively, it is the ability of the classifier not to label a negative sample as positive.
(How many predicted 1s are actually 1)

<center>$\begin{align*}Precision =\frac{TP }{FP + TP}\end{align*}$</center>

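**Negative Predictive Value** measures the proportion of predicted negatives that are actually negative.
(How many predicted 0s are actually 0)

<center>$\begin{align*}\text{Negative Predictive Value} =\frac{TN}{FN + TN}\end{align*}$</center>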


**False Positive Rate (FPR):**
The false positive rate is the proportion of all actual negatives that are incorrectly predicted as positive. It equals 1 - Specificity.

<center>$\begin{align*}\text{False Positive Rate} = \frac{FP}{FP + TN}\end{align*}$</center>
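
The formulas above all derive from the same four counts, so they can be gathered in one small helper. This is a sketch under stated assumptions: the names `safe_div` and `classification_rates`, the example counts, and the choice to return 0.0 when a denominator is zero are illustrative and not taken from the notebook.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero.

    Returning 0.0 for an undefined ratio is a convention assumed here.
    """
    return numerator / denominator if denominator else 0.0


def classification_rates(tp, tn, fp, fn):
    """Compute the rates defined above from the four confusion-matrix counts."""
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "recall": safe_div(tp, tp + fn),       # sensitivity, true positive rate
        "specificity": safe_div(tn, tn + fp),  # true negative rate
        "precision": safe_div(tp, tp + fp),    # positive predictive value
        "npv": safe_div(tn, tn + fn),          # negative predictive value
        "fpr": safe_div(fp, fp + tn),          # false positive rate
    }


# Hypothetical counts for 100 samples
rates = classification_rates(tp=40, tn=45, fp=5, fn=10)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Because specificity and the false positive rate share the denominator `TN + FP`, the two always sum to 1 whenever that denominator is non-zero.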
 
 
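**F-1 Score:**
With imbalanced data, accuracy alone can be misleading, because it can look high even when the model rarely finds the minority class. In the Titanic dataset, for example, the majority of samples belong to class 0.

Or consider:

- 100 samples (instances) -> class 0

- 20 samples (instances) -> class 1

This data is imbalanced.

So, with imbalanced data, we should also evaluate model performance with the F-1 score, the harmonic mean of precision and recall.

<center> $\begin{align*}F1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \end{align*}$ </center>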
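
To make the imbalance argument concrete, the sketch below scores a hypothetical "model" that always predicts class 0 on the 100/20 split from the example. The always-zero predictor and the convention of treating an undefined 0/0 ratio as 0.0 are assumptions made for illustration.

```python
# 100 samples of class 0 and 20 samples of class 1, as in the example
y_true = [0] * 100 + [1] * 20
# A hypothetical "model" that always predicts the majority class
y_pred = [0] * len(y_true)

pairs = list(zip(y_true, y_pred))
tp = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
tn = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 0)
fp = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 1)
fn = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 0)

accuracy = (tp + tn) / len(y_true)
# Undefined ratios (0/0) are treated as 0.0; this convention is an assumption
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f"accuracy={accuracy:.3f} recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.833 recall=0.000 f1=0.000
```

Accuracy rewards the 100 correct negatives, while the F-1 score drops to zero because not a single class 1 sample is found.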

**Load dataset/Clean**

In [2]:
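import pandas as pd

# Load the online ads dataset
df = pd.read_csv('_dataset/Dataset-ConfusionMatrix/Online_Ads.csv')
# Map the Gender labels to integers (a label mapping, not one-hot encoding)
df['Gender'] = df['Gender'].map({'Male': 1, 'Female': 2})
# Feature matrix and target vector as NumPy arrays
X = df.loc[:, ['Age', 'EstimatedSalary', 'Gender']].values
y = df.loc[:, 'Purchased'].values
df.head()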

| Unnamed: 0 | Gender | Age | EstimatedSalary | Purchased |
|---|---|---|---|---|
| 0 | 1 | 19 | 19000 | 0 |
| 1 | 1 | 35 | 20000 | 0 |
| 2 | 2 | 26 | 43000 | 0 |
| 3 | 2 | 27 | 57000 | 0 |
| 4 | 1 | 19 | 76000 | 0 |


**Preprocessing (Standard Scaler)**

In [5]:
from sklearn import preprocessing
sc=preprocessing.StandardScaler()
X_new=sc.fit_transform(X.astype(float))

In [10]:
len(X_new)

400
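
For intuition, `StandardScaler` rescales each feature column to zero mean and unit variance, using the population standard deviation (dividing by n). The sketch below applies the same formula to a single column in plain Python; the helper name `standardize` is made up, and the sample values are the five `Age` entries visible in the preview, not the full column.

```python
from statistics import mean, pstdev


def standardize(column):
    """Scale one feature column to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (divides by n)
    return [(value - mu) / sigma for value in column]


# The five Age values shown in the df.head() preview
ages = [19, 35, 26, 27, 19]
scaled = standardize(ages)
print([round(value, 3) for value in scaled])
```

This sketch would divide by zero on a constant column, so it is meant to illustrate the formula only, not to replace the scaler.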

**Model selection (Train Test Split)**

In [11]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X_new,y,random_state=13)

In [12]:
len(X_train)

300
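
`train_test_split` is called here without `test_size`, so scikit-learn's default of 0.25 applies, which is why 400 rows yield 300 training rows and 100 test rows. The sketch below mimics the idea with the standard library only. The helper name `simple_train_test_split` is hypothetical, and although it reuses the seed 13, it does not reproduce the exact shuffle that scikit-learn produces.

```python
import random


def simple_train_test_split(n_samples, test_fraction=0.25, seed=13):
    """Shuffle row indices and split them into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = simple_train_test_split(400)
print(len(train_idx), len(test_idx))  # 300 100
```

Fixing the seed (the role of `random_state` in the cell above) keeps the split identical between runs, so scores from different models stay comparable.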

**Model Train/Test**

In [17]:
from sklearn.linear_model import LogisticRegression
log=LogisticRegression()
log.fit(X_train,y_train)

LogisticRegression()

In [18]:
test_pred=log.predict(X_test)

**Score**

In [19]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test,test_pred)

0.83

In [24]:
from sklearn import neighbors
knn=neighbors.KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train,y_train)

KNeighborsClassifier(n_neighbors=7)

In [25]:
pred_test=knn.predict(X_test)

accuracy_score(y_test,pred_test)

0.9

**Confusion Matrix**

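- `sklearn.metrics.confusion_matrix(y_true, y_pred)`: rows are the actual classes and columns are the predicted classes, so for binary labels 0 and 1 the result is laid out as `[[TN, FP], [FN, TP]]`.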

In [26]:
from sklearn import metrics
cm=metrics.confusion_matrix(y_test,pred_test)

# Manual calculation (cm rows = actual class, cm columns = predicted class)
tn = cm[0][0]
fp = cm[0][1]
fn = cm[1][0]
tp = cm[1][1]

Accuracy = (tn + tp) / (tn + fp + fn + tp)
Sensitivity = Recall = TruePositiveRate = TPR = tp / (tp + fn)
Specificity = tn / (tn + fp)
Precision = tp / (fp + tp)
FalsePositiveRate = FPR = fp / (fp + tn)
F1Score = 2 * (Precision * Recall) / (Precision + Recall)
print("Accuracy :"+str(Accuracy))
print("Sensitivity = Recall = TruePositiveRate = TPR :"+str(Recall))
print("Specificity :"+str(Specificity))
print("Precision :"+str(Precision))
print("FalsePositiveRate :"+str(FalsePositiveRate))
print("F1Score :"+str(F1Score))

Accuracy :0.9
Sensitivity = Recall = TruePositiveRate = TPR :0.9285714285714286
Specificity :0.8888888888888888
Precision :0.7647058823529411
FalsePositiveRate :0.1111111111111111
F1Score :0.8387096774193549
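
The confusion matrix itself is not printed in this notebook. With 100 test rows (400 rows in total minus 300 training rows), the values above are consistent with TN = 64, FP = 8, FN = 2 and TP = 26. These counts are inferred from the printed metrics and are an assumption, not notebook output. The sketch below checks the inference with exact fractions and also computes the negative predictive value, which the cell above does not print.

```python
from fractions import Fraction

# Counts inferred from the printed metrics (an assumption, not notebook output)
tn, fp, fn, tp = 64, 8, 2, 26

accuracy = Fraction(tn + tp, tn + fp + fn + tp)
recall = Fraction(tp, tp + fn)
specificity = Fraction(tn, tn + fp)
precision = Fraction(tp, tp + fp)
f1 = 2 * precision * recall / (precision + recall)
npv = Fraction(tn, tn + fn)

print(accuracy, recall, specificity, precision, f1, npv)
# 9/10 13/14 8/9 13/17 26/31 32/33

# Compare against the decimal values printed by the notebook
assert abs(float(recall) - 0.9285714285714286) < 1e-12
assert abs(float(precision) - 0.7647058823529411) < 1e-12
assert abs(float(f1) - 0.8387096774193549) < 1e-12
```

With scikit-learn available, the same four counts can be unpacked from a binary confusion matrix in one step via `tn, fp, fn, tp = cm.ravel()`.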


In [28]:
metrics.recall_score(y_test,pred_test)

0.9285714285714286

In [29]:
metrics.precision_score(y_test,pred_test)

0.7647058823529411

In [30]:
metrics.f1_score(y_test,pred_test)

0.8387096774193549

In [31]:
y_train.size

300