In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Logistic Regression: Training Algorithm

**Cost (loss) function** for logistic regression:

\begin{equation}
c(\theta) = \left\{
\begin{array}{cc}
-\log(\hat{p}) & \textit{if }y=1,\\
-\log(1-\hat{p}) & \textit{if }y=0.
\end{array}
\right.
\end{equation}

The cost function $c(\theta)$:

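- is small if $y=1$ (the data example belongs to the class) and $\hat{p}$ is close to 1, and grows without bound as $\hat{p}\to 0$.
- is small if $y=0$ (the data example does not belong to the class) and $\hat{p}$ is close to 0, and grows without bound as $\hat{p}\to 1$.
- is a convex function of $\theta$, so it has no local minima other than the global one; gradient descent with a suitably small learning rate therefore approaches the global minimum.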
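
To make the first two properties concrete, here is a small sketch that evaluates the per-example cost for a few probabilities. The helper name `instance_cost` and the chosen probability values are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def instance_cost(p_hat, y):
    """Per-example cost c(theta): -log(p_hat) if y == 1, else -log(1 - p_hat)."""
    return -np.log(p_hat) if y == 1 else -np.log(1.0 - p_hat)

# Illustrative probabilities: confident and right, undecided, confident and wrong (for y=1).
for p_hat in (0.99, 0.5, 0.01):
    print(f"p_hat={p_hat:4.2f}  "
          f"cost if y=1: {instance_cost(p_hat, 1):.3f}  "
          f"cost if y=0: {instance_cost(p_hat, 0):.3f}")
```

The cost is near zero when the predicted probability agrees with the label and becomes large when the model is confidently wrong.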

**Unified expression for the cost function** (log loss):

$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\big[y^{(i)}\log(\hat{p}^{(i)}) + (1-y^{(i)})\log(1-\hat{p}^{(i)})\big]$

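- For a single instance, the summand (with the leading minus sign) equals $c(\theta)$ for both $y=0$ and $y=1$, so $J(\theta)$ is the average of $c(\theta)$ over the $m$ training instances.
- There is no known closed-form solution for the minimizing $\theta$ (no equivalent of the Normal Equation), so $J(\theta)$ is minimized iteratively.
- $J(\theta)$ is a convex function.
- $\frac{\partial J}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^{m}\big(\sigma(\textbf{x}^{(i)}\cdot\theta^T) - y^{(i)}\big)x_j^{(i)}$, where $\sigma(t)=\frac{1}{1+e^{-t}}$ is the sigmoid function and $\hat{p}^{(i)}=\sigma(\textbf{x}^{(i)}\cdot\theta^T)$.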
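
The cost and gradient formulas above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the synthetic one-feature dataset, the learning rate `eta`, the step count, and the small `eps` used to keep the logarithm finite are all choices made for this example.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_loss(theta, X, y):
    """J(theta): average log loss over the m rows of X."""
    eps = 1e-12  # assumption: clip probabilities so log() stays finite
    p_hat = np.clip(sigmoid(X @ theta), eps, 1 - eps)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

def gradient(theta, X, y):
    """Vector of partial derivatives dJ/dtheta_j."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Synthetic, noisy one-feature data (assumption for illustration only).
rng = np.random.default_rng(0)
m = 200
x = rng.normal(size=m)
y = (x + 0.5 * rng.normal(size=m) > 0).astype(float)
X = np.c_[np.ones(m), x]  # first column of ones carries the bias term

theta = np.zeros(2)
eta = 0.5  # learning rate (assumed value)
initial_cost = log_loss(theta, X, y)
for _ in range(500):  # batch gradient descent
    theta -= eta * gradient(theta, X, y)
final_cost = log_loss(theta, X, y)

print("initial cost:", initial_cost)
print("final cost:  ", final_cost)
print("theta:       ", theta)
```

With $\theta=0$ every prediction is 0.5, so the initial cost is $\log 2$; the cost then decreases as gradient descent proceeds.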

**Question**: Why not use the mean-square-error (MSE) cost function?

## Logistic Regression: Varying The Threshold
We can change the default decision threshold (0.5 on $\hat{p}$) to classify one particular class more accurately. The tradeoff is usually reduced accuracy on the other class.
- A **Receiver Operating Characteristic (ROC)** curve shows this tradeoff by plotting one point per threshold:
    - x-axis: false positive rate (= false positives / (false positives + true negatives))
    - y-axis: true positive rate (= true positives / (true positives + false negatives))
- The **Area Under the Curve (AUC)** score of the ROC curve is often used to measure the quality of the model:
    - AUC close to 1: The model gives good classification results for most choices of threshold.
    - AUC close to 0.5: The model does little better than random guessing, whatever the threshold.
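
As a rough illustration of the tradeoff, the sketch below computes the true positive rate and false positive rate by hand at three thresholds. The ten labels and probabilities are made-up values, and the helper `rates` is a hypothetical name introduced only for this example.

```python
import numpy as np

# Made-up labels and estimated probabilities (assumption for illustration).
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
p_hat = np.array([0.05, 0.10, 0.30, 0.45, 0.40, 0.60, 0.55, 0.70, 0.80, 0.95])

def rates(y_true, p_hat, threshold):
    """Return (true positive rate, false positive rate) at the given threshold."""
    y_pred = (p_hat >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

for threshold in (0.3, 0.5, 0.7):
    tpr, fpr = rates(y_true, p_hat, threshold)
    print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Lowering the threshold catches more positives (higher TPR) but also raises more false alarms (higher FPR); each threshold contributes one point to the ROC curve.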


In [None]:
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)

In [None]:
from sklearn.metrics import roc_auc_score

roc_auc_score(y_train_5, y_scores)
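
The two cells above rely on `y_train_5` and `y_scores` being defined elsewhere. For a version that runs on its own, here is a sketch on synthetic data; the dataset from `make_classification`, the 3-fold `cross_val_predict` call, and all variable names are assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict

# Synthetic binary problem (assumption: stands in for the real training data).
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=42)

# Out-of-fold decision scores, so each score comes from a model
# that did not see that instance during training.
demo_scores = cross_val_predict(
    LogisticRegression(), X_demo, y_demo, cv=3, method="decision_function"
)

fpr, tpr, thresholds = roc_curve(y_demo, demo_scores)
auc_score = roc_auc_score(y_demo, demo_scores)
print("number of thresholds:", len(thresholds))
print("AUC:", auc_score)
```

Calling `plt.plot(fpr, tpr)` afterwards draws the curve, and adding the diagonal `plt.plot([0, 1], [0, 1], "k--")` gives the random-guessing baseline for comparison.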

## Logistic Regression for Multiple Classes (Softmax regression)
**Model**:

$\hat{p}_k = \frac{\exp(s_k(\textbf{x}))}{\sum_{i=1}^K\exp(s_i(\textbf{x}))}$.

$s_k(\textbf{x}) = \textbf{x}\cdot\theta_k^T$

- $\hat{p}_k$ is the probability that the instance belongs to class $k$.
- $K$ is the number of classes.
- $\theta_k$ is the coefficient vector associated with class $k$. All these vectors are stored as rows in a parameter matrix $\Theta$.
- The softmax classifier predicts the class with the highest estimated probability (which is simply the class with the highest score).

**Cross-entropy cost function**

$J(\Theta) = -\frac{1}{m}\sum_{i=1}^m\sum_{k=1}^K
y_k^{(i)}\log(\hat{p}_k^{(i)})$

- $y_k^{(i)}$ is equal to 1 if the target class for the $i$-th instance is $k$; otherwise, it is equal to 0.
- For $K=2$ this cost function reduces to the log loss of binary logistic regression.
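
The softmax model and the cross-entropy cost can be sketched in NumPy as below. The tiny dataset is made up, and subtracting the row maximum before exponentiating is a numerical-stability step added for this example (it leaves the probabilities unchanged); neither comes from the original notes.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; each row of `scores` holds s_1(x), ..., s_K(x)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # stability trick (assumption)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def cross_entropy(Theta, X, y, K):
    """J(Theta) for integer targets y in {0, ..., K-1}; rows of Theta are theta_k."""
    P = softmax(X @ Theta.T)  # P[i, k] is p_hat_k for instance i
    Y = np.eye(K)[y]          # one-hot targets y_k^(i)
    return -np.mean(np.sum(Y * np.log(P), axis=1))

# Made-up data: 3 instances, a bias column plus 2 features, K = 3 classes.
X = np.array([[1.0, 2.0, 0.5],
              [1.0, -1.0, 1.5],
              [1.0, 0.0, -2.0]])
y = np.array([0, 2, 1])
K = 3

Theta = np.zeros((K, X.shape[1]))
print("cost with Theta = 0:", cross_entropy(Theta, X, y, K))

rng = np.random.default_rng(1)
Theta_random = rng.normal(size=(K, X.shape[1]))
P = softmax(X @ Theta_random.T)
print("row sums:", P.sum(axis=1))
print("predicted classes:", P.argmax(axis=1))
```

With $\Theta=0$ all classes receive probability $1/K$, so the cost equals $\log K$. The predicted class is the argmax of the probabilities, which coincides with the argmax of the scores.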


## Classifying Iris Data
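
One possible way to fill in the steps of this section is sketched below. It is only a sketch under assumptions: it uses all four features and all three classes from `load_iris`, an 80/20 stratified split with `random_state=42`, and `max_iter=1000`; none of these choices is prescribed by the notes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_score, train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Train-test split (assumed 80/20, stratified so every class appears in the test set).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Build the logistic regression model (multiclass, i.e. softmax regression).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Training and testing accuracy.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print("train accuracy:", train_acc)
print("test accuracy: ", test_acc)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)

# 3-fold cross validation on the training set.
cv_scores = cross_val_score(model, X_train, y_train, cv=3, scoring="accuracy")
print("CV accuracies:", cv_scores)

# One-vs-rest ROC data for each class, using that class's predicted probability.
proba = model.predict_proba(X_test)
class_auc = {}
for k, name in enumerate(iris.target_names):
    fpr, tpr, _ = roc_curve(y_test == k, proba[:, k])
    class_auc[name] = roc_auc_score(y_test == k, proba[:, k])
print(class_auc)
```

Inside the loop, `plt.plot(fpr, tpr, label=name)` would draw one ROC curve per class on a shared set of axes.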

In [2]:
# Train-test split
from sklearn.model_selection import train_test_split



In [1]:
# Build the logistic regression model
from sklearn.linear_model import LogisticRegression



In [3]:
# Calculate the training accuracy and testing accuracy
from sklearn.metrics import accuracy_score



In [4]:
# Display the confusion matrix
from sklearn.metrics import confusion_matrix



In [None]:
# Perform 3-fold cross validation
from sklearn.model_selection import cross_val_score



In [None]:
# Plot ROC curve for each class



## Homework:

1. Divide the dataset randomly into 80% training set and 20% test set, and build a logistic classifier to identify Iris-Setosa using the petal width and petal length. 
2. Calculate the test accuracy, precision, recall, and F1 score.
3. Plot the ROC curve and calculate AUC.
4. (optional for undergraduates) Build a grid of points using `np.meshgrid` and use their probabilities to draw the decision boundary of the model.