# Binary Classification and Performance Metrics

## Learning Objectives

By the end of this experiment, you will be able to:

* describe classification tasks in machine learning
* perform Logistic Regression and Softmax Regression
* choose performance metrics appropriate to the use case
* understand decision boundaries

## Information

### Classification

**Classification** refers to a predictive modeling problem where a class label is predicted for a given example of input data.

**Examples include:**

* Email spam detection (spam or not).
* Churn prediction (churn or not).
* Conversion prediction (buy or not).

**Binary classification** refers to those classification tasks that have two class labels.

**Logistic Regression** is a Machine Learning classification algorithm that predicts the probability that an example belongs to a given class. In binary logistic regression, the dependent variable is coded as 1 (yes, success, etc.) or 0 (no, failure, etc.), and the predicted probability is compared with a threshold (0.5 by default) to assign the class label.
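To make the idea of "predicting a probability" concrete, here is a minimal sketch of the sigmoid (logistic) function that logistic regression applies to a linear score. The weight `w`, the bias `b`, and the input values are hypothetical numbers chosen only for illustration; they are not fitted to any dataset in this notebook.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weight and bias, chosen only for illustration
w, b = 1.5, -0.5
x = np.array([-2.0, 0.0, 2.0])        # three standardized feature values

proba = sigmoid(w * x + b)            # estimated P(y = 1 | x)
labels = (proba >= 0.5).astype(int)   # apply the default 0.5 threshold
print(proba, labels)
```

A score of 0 maps to a probability of exactly 0.5, which is why the default threshold corresponds to the sign of the linear score.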

### Implementing Binary Classification with Logistic Regression 

#### Dataset

In this example, we will be using the "Social_Network_Ads" dataset.

The variable descriptions are as follows:

* Age
* EstimatedSalary

The target feature is:
* Purchased

**Problem Statement:** To predict if a person will purchase an item based on age and estimated salary. 

#### Importing required packages


In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

#### Importing the Dataset

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/Adithya-Thonse/python_ML_DL_basics/main/ML/Datasets/Social_Network_Ads.csv')
X = df.iloc[:, 1].values # estimated salary
y = df.iloc[:, -1].values
X = X.reshape(-1, 1)
df.head()

#### Splitting the dataset into the Training set and Test set

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)


In [None]:
print(X_train)

In [None]:
print(y_train)

In [None]:
print(X_test)

In [None]:
print(y_test)

#### Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [None]:
print(X_train)

In [None]:
print(X_test)

#### Training the Logistic Regression model on the Training set




In [None]:
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

#### Predicting a new test instance

In [None]:
print(classifier.predict(sc.transform([[87000]])))

#### Predicting the Test set results

In [None]:
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

### Model Evaluation 

To evaluate the performance of a classification model, the following metrics are used:

* Confusion matrix
  * Accuracy
  * Precision
  * Recall
  * F1-Score
* ROC curve
* AUROC

#### Confusion Matrix

* **Confusion matrix:** a table that describes the performance of a classification model on a set of test data for which the true values are known. Its four cells are:

  * **true positive** (TP): an event correctly predicted as an event.
  * **false positive** (FP): a no-event incorrectly predicted as an event.
  * **true negative** (TN): a no-event correctly predicted as a no-event.
  * **false negative** (FN): an event incorrectly predicted as a no-event.
* **Accuracy:** the ratio of the number of correct predictions to the total number of input samples.


In [None]:
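# Creating a confusion matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
print(cm)
print(accuracy_score(y_test, y_pred))  # print explicitly; a bare expression mid-cell is not displayed
print(classification_report(y_test, y_pred))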

This Confusion Matrix tells us that there were 81 correct predictions and 19 incorrect ones.

* True Positive: 15
* True Negative: 66
* False Positive: 2
* False Negative: 17
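The four counts can also be read off the matrix programmatically. The sketch below rebuilds the matrix from the counts listed above rather than from the model. It assumes scikit-learn's layout for binary labels 0 and 1, where rows are true classes and columns are predicted classes, giving `[[TN, FP], [FN, TP]]`. The name `cm_example` is introduced here for illustration.

```python
import numpy as np

# Matrix rebuilt from the counts reported above, laid out as [[TN, FP], [FN, TP]]
cm_example = np.array([[66, 2],
                       [17, 15]])

# ravel() flattens the matrix row by row, so the order is TN, FP, FN, TP
tn, fp, fn, tp = cm_example.ravel()

# Accuracy = correct predictions / all predictions
accuracy = (tp + tn) / cm_example.sum()
print(tn, fp, fn, tp, accuracy)
```

The same unpacking works on the `cm` array computed in the cell above.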

#### Precision-Recall Metrics

* **Precision:** the fraction of examples assigned to the positive class that actually belong to the positive class.

    Precision = $\mathbf{\frac{TruePositive}{TruePositive + FalsePositive}}$

* **Recall:** the fraction of actual positive examples that were predicted as positive; it is the same calculation as sensitivity.

   Recall = $\mathbf{\frac{TruePositive}{TruePositive + FalseNegative}}$

* **F1-score:** the harmonic mean of precision and recall, which combines them into a single score that seeks to balance both concerns. It is also called the F-score or the F-measure.
  
   F1-score = $\mathbf{\frac{2*Precision*Recall}{Precision+Recall}}$
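As a worked example, the three formulas can be applied to the confusion-matrix counts reported earlier for the test set (15 true positives, 66 true negatives, 2 false positives, 17 false negatives). This is only arithmetic on those counts, not a new evaluation of the model.

```python
tp, tn, fp, fn = 15, 66, 2, 17   # counts reported for the test set above

precision = tp / (tp + fp)       # of the predicted positives, how many were right
recall = tp / (tp + fn)          # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Note that recall is below 0.5 even though accuracy is 0.81: the model misses more than half of the actual buyers, which accuracy alone does not reveal.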

##### Plotting precision-recall curve using sklearn

In [None]:
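# Use sklearn to plot precision-recall curves
# (plot_precision_recall_curve was removed in scikit-learn 1.2; PrecisionRecallDisplay replaces it)

from sklearn.metrics import PrecisionRecallDisplay

PrecisionRecallDisplay.from_estimator(classifier, X_test, y_test, name = 'Logistic Regression')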

In the plot above, the blue line is the precision-recall curve.

### ROC-AUC curve

A ROC curve is a diagnostic plot for summarizing the behavior of a model by calculating the false positive rate and true positive rate for a set of predictions by the model under different thresholds.
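The threshold sweep described above can be sketched by hand. The labels and scores below are made-up toy values, and `roc_point` is a helper name introduced here for illustration. In practice `roc_curve` from scikit-learn performs this computation over all relevant thresholds.

```python
import numpy as np

# Toy labels and predicted probabilities, made up for illustration
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when scores >= threshold are predicted positive."""
    y_hat = scores >= threshold
    tp = np.sum(y_hat & (y_true == 1))
    fp = np.sum(y_hat & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)   # true positive rate (recall)
    fpr = fp / np.sum(y_true == 0)   # false positive rate
    return float(fpr), float(tpr)

# Lowering the threshold moves the point from (0, 0) towards (1, 1)
for t in (0.9, 0.5, 0.3, 0.0):
    print(t, roc_point(y_true, scores, t))
```

Each threshold yields one point on the ROC curve; joining the points gives the curve itself.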

Area Under the Curve (AUC) is one of the most widely used metrics for evaluation. It is used for binary classification problems.

AUC ranges from 0 to 1. The greater the value, the better the performance of the model; a value of 0.5 corresponds to random guessing.
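One way to build intuition for AUC is its ranking interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way on made-up toy data. `pairwise_auc` is a helper name introduced here, and in practice `sklearn.metrics.roc_auc_score` is the usual way to obtain the value.

```python
import numpy as np

# Toy labels and predicted probabilities, made up for illustration
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score; ties count as half
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

auc = pairwise_auc(y_true, scores)
print(auc)
```

Here three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.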

#### Plotting the ROC-AUC curve for Logistic Regression algorithm using matplotlib

In [None]:
# roc_curve() computes the ROC for the classifier and returns the FPR, TPR, and threshold values
from sklearn.metrics import roc_curve

classifier.fit(X_train, y_train)
pred_prob1 = classifier.predict_proba(X_test)

# roc curve for models
fpr1, tpr1, thresh1 = roc_curve(y_test, pred_prob1[:,1], pos_label=1)


# roc curve for tpr = fpr 
random_probs = [0 for i in range(len(y_test))]
p_fpr, p_tpr, _ = roc_curve(y_test, random_probs, pos_label=1)

In [None]:
plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before Matplotlib 3.6

# plot roc curves
plt.plot(fpr1, tpr1, linestyle='--',color='orange', label='Logistic Regression')

plt.plot(p_fpr, p_tpr, linestyle='--', color='blue')
# title
plt.title('ROC curve')
# x label
plt.xlabel('False Positive Rate')
# y label
plt.ylabel('True Positive Rate')

plt.legend(loc='best')
plt.savefig('ROC',dpi=300)
plt.show();

The above diagram shows:

ROC curve: the orange dashed line.

AUROC: the area under the orange dashed line.

The blue dashed line is the reference line, where the true positive rate equals the false positive rate.

Please refer to the given [link](https://medium.com/@MohammedS/performance-metrics-for-classification-problems-in-machine-learning-part-i-b085d432082b) for further information on Performance metrics and [ROC-AUC curve](https://medium.com/greyatom/lets-learn-about-auc-roc-curve-4a94b4d88152)

### Example: Predicting Diabetes with Logistic Regression

Let us now apply what we have learned above to perform logistic regression on the 'UCI PIMA Indian Diabetes' dataset.

 * Fit the model
 * Do the prediction
 * Plot the ROC-AUC curve for the Logistic Regression algorithm



#### Dataset

In this example, we will be using the "UCI PIMA Indian Diabetes" dataset.

The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

The variable descriptions are as follows:

* Pregnancies: Number of Pregnancies
* Glucose: Plasma glucose concentration over 2 hours in an oral glucose tolerance test
* Blood pressure: Diastolic blood pressure (mm Hg)
* SkinThickness: Triceps skinfold thickness (mm)
* Insulin: 2-Hour serum insulin (mu U/ml)
* BMI: Body mass index (weight in kg/(height in m)$^2$)
* DiabetesPedigreeFunction: Diabetes pedigree function (a function which scores likelihood of diabetes based on family history)
* Age: Age (years)
* Outcome: Class variable (0 if non-diabetic, 1 if diabetic)

Problem statement:

We will be using this dataset to predict if a person has diabetes or not using the medical attributes provided.

#### Loading the dataset

In [None]:
DF = pd.read_csv('https://raw.githubusercontent.com/Adithya-Thonse/python_ML_DL_basics/main/ML/Datasets/diabetes.csv')
print(DF.head())

#### Finding if there are any null values

In [None]:
# YOUR CODE HERE

#### Training our model

In [None]:
# Separating the data into independent and dependent variables

# YOUR CODE HERE

#### Splitting the data into training and testing data

In [None]:
# YOUR CODE HERE

#### Training the Logistic Regression model on the Training set

In [None]:
# YOUR CODE HERE

#### Training/Fitting the Model

In [None]:
# YOUR CODE HERE

#### Making Predictions

In [None]:
# YOUR CODE HERE

#### Confusion Matrix

In [None]:
# YOUR CODE HERE

#### Plotting the ROC curve for Logistic Regression algorithm using matplotlib

In [None]:
# YOUR CODE HERE

### Softmax Regression

**Softmax regression** generalizes logistic regression to more than two classes: it computes a score for each class and applies the softmax function, which normalizes the scores into a probability distribution that sums to 1.

It is also called **multinomial logistic regression.**
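The normalization step can be sketched in a few lines of NumPy. The class scores below are hypothetical values chosen for illustration. Subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(scores):
    """Convert a vector of class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

scores = np.array([2.0, 1.0, 0.1])      # hypothetical scores for three classes
proba = softmax(scores)

# The predicted class is the one with the highest probability
print(proba, proba.sum(), proba.argmax())
```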

We now perform Softmax Regression on the "Social_Network_Ads" dataset used above, this time with both Age and EstimatedSalary as features.

In [None]:
X = df.iloc[:, :-1].values # considering age,estimated salary
y = df.iloc[:, -1].values

df.head()

#### Splitting the dataset into the Training set and Test set

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

#### Feature Scaling

In [None]:
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

#### Training the Softmax Regression model on the Training set

In [None]:
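# Note: the multi_class parameter is deprecated in recent scikit-learn releases (1.5 and later)
softmax_reg = LogisticRegression(multi_class='multinomial', # switch to Softmax Regression
                                 solver='lbfgs', # handle multinomial loss, L2 penalty
                                 C=10)
softmax_reg.fit(X_train, y_train) # fit on the scaled training data, matching sc.transform() at prediction time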


#### Predicting a new result

In [None]:
softmax_reg.predict(sc.transform([[30,87000]]))

In [None]:
softmax_reg.predict_proba(sc.transform([[30,87000]]))

### Decision Boundary

In classification problems with two or more classes, a decision boundary is a hypersurface that separates the underlying vector space into sets, one for each class.
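For a logistic regression model with two features, the decision boundary is the straight line where the linear score equals zero, i.e. where the predicted probability is exactly 0.5. The sketch below solves that equation for the second feature. The weights `w` and intercept `b` are hypothetical; for a fitted scikit-learn model they would be read from `clf.coef_` and `clf.intercept_`.

```python
import numpy as np

# Hypothetical coefficients of a fitted two-feature logistic regression
w = np.array([2.0, -1.0])   # weights for features x1 and x2
b = 0.5                     # intercept

def boundary_x2(x1):
    """x2 on the decision boundary w1*x1 + w2*x2 + b = 0 (requires w2 != 0)."""
    return -(w[0] * x1 + b) / w[1]

x1 = np.array([-1.0, 0.0, 1.0])
x2 = boundary_x2(x1)

# Every point on the boundary has a linear score of 0, hence probability 0.5
scores = w[0] * x1 + w[1] * x2 + b
print(x2, scores)
```

Points with a positive score fall on the class-1 side of the line, and points with a negative score on the class-0 side.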

#### Creating Dummy Dataset

In [None]:
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=200, n_features=2, n_informative=2, n_redundant=0, n_classes=2, random_state=1)

#### Creating Decision Boundary

In [None]:
import matplotlib.gridspec as gridspec
from mlxtend.plotting import plot_decision_regions
gs = gridspec.GridSpec(3, 2)

fig = plt.figure(figsize=(14,10))

label = 'Logistic Regression'
clf = LogisticRegression()
clf.fit(X, y)

fig = plot_decision_regions(X=X, y=y, clf=clf, legend=2)
plt.title(label)
plt.show()

### Reference

https://towardsdatascience.com/micro-macro-weighted-averages-of-f1-score-clearly-explained-b603420b292f

### Theory Questions

1. Is it a good idea to stop Mini-batch Gradient Descent immediately when the validation error goes up?

  No. Both Mini-batch and Stochastic Gradient Descent have randomness built into them, so neither is guaranteed to reduce the error at every step: Mini-batch performs each update on a randomly chosen subset of the training examples, while Stochastic uses a single randomly chosen example. A temporary rise in validation error therefore does not necessarily mean the model has started to overfit. A better option is to save the model at regular intervals; when it has not improved for a long time, stop training and revert to the best saved model.

2. Can Gradient Descent get stuck in a local minimum when training a Logistic Regression model?

  No. The cost function of Logistic Regression (the log loss) is convex, so it has a single global minimum and no local minima. Gradient Descent therefore cannot get stuck in a local minimum.

3. Do all Gradient Descent algorithms lead to the same model provided you let them run long enough?

  No. Stochastic Gradient Descent and Mini-batch Gradient Descent have randomness built into them. They can get close to the global optimum, but with a constant learning rate they keep bouncing around it rather than converging, so the resulting models differ slightly. One way to help them converge is to gradually reduce the learning rate hyperparameter.

4. Suppose you want to classify pictures as outdoor/indoor and daytime/nighttime, should you implement two Logistic Regression classifiers or one Softmax Regression classifier?

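  Softmax regression predicts exactly one class per instance (it is multiclass, not multioutput), so it cannot output two labels such as [indoor, daytime] at once. Because outdoor/indoor and daytime/nighttime are two independent binary decisions, you'll need to use two logistic regression classifiers.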
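For question 1, the idea of keeping the best model and stopping only after a sustained lack of improvement can be sketched as follows. The function name, the `patience` parameter, and the validation errors are all made up for illustration; a real training loop would save model weights where the comment indicates.

```python
def best_epoch_with_patience(val_errors, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation errors.

    Training stops once the error has not improved for `patience` epochs;
    the model saved at `best_epoch` is the one to restore.
    """
    best_error = float("inf")
    best_epoch = -1
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_error, best_epoch = error, epoch  # a real loop would save the model here
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_errors) - 1

# Made-up noisy validation errors: the rise at epoch 2 is only temporary
errors = [0.50, 0.40, 0.42, 0.35, 0.36, 0.37, 0.38, 0.39]
print(best_epoch_with_patience(errors, patience=3))
print(best_epoch_with_patience(errors, patience=1))
```

With `patience=1`, which amounts to stopping immediately, training ends at epoch 2 and keeps the epoch-1 model with error 0.40. With `patience=3` it continues past the temporary rise and keeps the better epoch-3 model with error 0.35.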
