In [None]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

# <font color="red">Test I: Wednesday, March 18th 11:00 - 11:45am </font>
- Use `numpy` and `pandas` to perform data handling.
- Use `sklearn` to train linear regression, polynomial regression, and logistic regression models.
- Understand how cost functions measure errors of the predictions.
- Understand how the normal equation and gradient descent work.

# Logistic Regression

*Readings: Chapter 4*

We have studied how to use linear regression and polynomial regression to *predict a target numeric value*. Another learning task, **classification**, aims at predicting group membership rather than numeric values. An email spam filter is a good example: it is trained on many example emails labeled with their class (spam or non-spam), and it must learn how to classify new emails.

Linear regression is **not** a good choice for classification tasks. We will introduce the **logistic regression** model and use the iris dataset to illustrate how the model works.

## Logistic Regression: Intuition
- Picture the data as points on the plane.
- A classifier's job is to determine the decision regions for each class.
- If a point is far from the decision boundary, then the classifier should be fairly confident about its prediction.
- If a point is near the decision boundary, then the classifier may be less confident about its prediction.
- The **logistic regression** model aims to provide a **probability distribution** for each point. The probability distribution has little variance if the point is far from the decision boundary.
- **Probability distribution with high variance**: rolling a die - there is no way to predict the exact outcome
- **Probability distribution with low variance**: getting the flu today - probably not going to happen
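
For a two-class problem, the link between confidence and variance can be made concrete: a 0/1 outcome with probability $p$ has variance $p(1-p)$, which is largest at $p = 0.5$ and shrinks toward 0 as $p$ approaches 0 or 1. The sketch below is only an illustration; the probabilities in `p` are made-up values, not model output.

```python
import numpy as np

# Hypothetical estimated probabilities of Class 1 for five points,
# ordered from "far on the Class 0 side" to "far on the Class 1 side"
p = np.array([0.01, 0.2, 0.5, 0.8, 0.99])

# Variance of a 0/1 (Bernoulli) outcome with probability p
variance = p * (1 - p)

for prob, var in zip(p, variance):
    print(f"p = {prob:.2f}  ->  variance = {var:.4f}")
```

The point with $p = 0.5$ (on the decision boundary for a 0.5 threshold) has the highest variance, 0.25, while the two extreme points have variance below 0.01.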

<img src="https://mlr-org.com/docs/2015-07-28-Visualisation-of-predictions_files/figure-html/qda-1.png" width="600">

## Binary Classifier
- Suppose there are only two classes for the output feature: **Class 0** (the negative class) and **Class 1** (the positive class).
- A **binary classifier** tries to estimate the probability $p$ that a point belongs to Class 1.
- The probability that a point belongs to Class 0 is $1 - p$.
- Given the probability, the binary classifier will compare it with a chosen **threshold** (for example, 0.5), and then predict the class as
    - prediction = 1 if $\hat{p}$ $\ge$ threshold
    - prediction = 0 if $\hat{p}$ < threshold
- The **boundary** of the decision regions is the curve formed by the points whose probability equals the threshold value.

## Logistic Regression: Model Assumption
**Binary classifier model**: The logistic regression model assumes that the log-odds of belonging to Class 1 are a linear function of the input features, so the decision boundary (where $\hat{p}$ equals the threshold) is linear:

$\log\frac{\hat{p}}{1 - \hat{p}} = \theta_0 + \theta_1x_1 + \theta_2x_2 +\cdots + \theta_nx_n,$
- $n$: number of input features
- $x_1, ..., x_n$: input features
- $\hat{p}$: the estimated probability that the data point belongs to Class 1
- $\theta_0, \theta_1,...,\theta_n$: parameters of the model ($\theta_0$ is the intercept)

**Alternative format**:

$\hat{p} = \sigma(\textbf{x}\cdot\theta^T).$

- $\textbf{x} = (1, x_1, ..., x_n)$, where the leading 1 multiplies the intercept $\theta_0$.
- $\theta = (\theta_0, \theta_1, ..., \theta_n)$.
- $\sigma(t) = \frac{1}{1+e^{-t}}$: logistic function
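
The two formats describe the same model because the logistic function is the inverse of the log-odds. The following sketch checks this numerically. The parameter values in `theta` and the data point `x` are hypothetical, and the helper names `sigmoid` and `log_odds` are introduced here only for illustration.

```python
import numpy as np

def sigmoid(t):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

def log_odds(p):
    """Log-odds (logit): the inverse of the logistic function."""
    return np.log(p / (1.0 - p))

# Hypothetical parameters (theta_0, theta_1, theta_2) and one data point
theta = np.array([-3.0, 1.0, 0.5])
x = np.array([1.0, 2.0, 4.0])   # the leading 1 multiplies the intercept theta_0

t = x @ theta        # linear part: -3 + 1*2 + 0.5*4 = 1.0
p_hat = sigmoid(t)   # estimated probability of Class 1, about 0.731

# Applying the log-odds to p_hat recovers the linear part
print(t, p_hat, log_odds(p_hat))
```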

In [None]:
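# Plot the graph of the logistic function
t = np.linspace(-10, 10, 200)
plt.plot(t, 1 / (1 + np.exp(-t)))
plt.xlabel("t")
plt.ylabel("sigma(t)")
plt.show()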

## Logistic Regression: Decision Rule

**Decision rule**: Pick a threshold (for example, 0.5), and then

- prediction = 1 if $\hat{p}$ $\ge$ threshold
- prediction = 0 if $\hat{p}$ < threshold

**Trade-off with threshold**:
- If the threshold is chosen closer to 1, then the positive predictions are __more likely__ to be correct (fewer **false positives**). However, the negative predictions are __less likely__ to be correct (more **false negatives**).
- If the threshold is chosen closer to 0, then the negative predictions are __more likely__ to be correct (fewer **false negatives**). However, the positive predictions are __less likely__ to be correct (more **false positives**).
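
The trade-off can be seen by counting errors at several thresholds. This is a minimal sketch on made-up data: the labels in `y_true`, the probabilities in `p_hat`, and the helper `count_errors` are all hypothetical.

```python
import numpy as np

# Hypothetical true labels and estimated probabilities for eight points
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p_hat = np.array([0.1, 0.3, 0.45, 0.7, 0.4, 0.6, 0.8, 0.95])

def count_errors(threshold):
    """Return (false positives, false negatives) when predicting 1 for p_hat >= threshold."""
    y_pred = (p_hat >= threshold).astype(int)
    false_pos = int(np.sum((y_pred == 1) & (y_true == 0)))
    false_neg = int(np.sum((y_pred == 0) & (y_true == 1)))
    return false_pos, false_neg

for threshold in (0.35, 0.5, 0.75):
    false_pos, false_neg = count_errors(threshold)
    print(f"threshold = {threshold:.2f}: "
          f"false positives = {false_pos}, false negatives = {false_neg}")
```

On this data, raising the threshold from 0.35 to 0.75 removes both false positives but introduces two false negatives.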

<img src="https://hackernoon.com/hn-images/1*YV7zy1NGN1-HGQxY56nc_Q.png" width="600">

## Logistic Regression Example: The Iris Dataset

The **Iris dataset** is a famous dataset that contains the sepal and petal lengths and widths of 150 iris flowers of three different species: Iris-Setosa, Iris-Versicolor, and Iris-Virginica. [wiki page](https://en.wikipedia.org/wiki/Iris_flower_data_set)

- Import the dataset using <code>sklearn.datasets.load_iris()</code>
- Explore the dataset: data description, feature names, data types, data histograms, scatter plots.
- Split the dataset into train_set and test_set
- Apply <code>sklearn.linear_model.LogisticRegression</code> to build a binary classifier on **Iris-Virginica**.
- Evaluate the performance of the model: Accuracy, cross-validation, precision vs. recall, confusion matrix...
- Visualize the model (show decision boundary)

<img src="https://lh3.googleusercontent.com/proxy/kGs0Y8tElhGYuH6BUpxNg4F14JsepyVrrWUfMoN-uUKaJh-V3AUHsWI6b4zBTy3z-ipCrXMG8IRQxaiIRyxMfSU" width="600">


In [None]:
# Load the dataset
from sklearn import datasets
iris = datasets.load_iris()

iris.keys()

In [None]:
# Explore the dataset
print(iris['DESCR'])

In [None]:
print(iris['feature_names'])

In [None]:
# Convert the data into a data frame
iris_df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])
iris_df.head()

In [None]:
# Add the target class
# print(iris['target'])
iris_df['target'] = iris['target']
iris_df.head()

In [None]:
iris['target_names']

In [None]:
# Create a function that maps 0-2 to the actual type of iris
def get_target_name(x):
    return iris['target_names'][x]

x = iris_df.loc[0, 'target']
name = get_target_name(x)
print(x, name)

In [None]:
# Apply get_target_name() to all target values
iris_df['target_name'] = iris_df['target'].apply(get_target_name)
iris_df.head()

In [None]:
iris_df.describe()

In [None]:
# Draw scatter plots.
# scatter plot: sepal length vs. sepal width
plt.scatter(iris_df.iloc[:, 0], iris_df.iloc[:, 1], c=iris_df['target'])
plt.xlabel("sepal length")
plt.ylabel("sepal width")
plt.show()

In [None]:
# Draw all scatter plots
from pandas.plotting import scatter_matrix
scatter_matrix(iris_df.iloc[:, :4], figsize=(15, 15), marker='x',
               c=iris_df['target'])
plt.show()

## Build A Binary Classifier for Iris-Virginica

In [None]:
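# Define a function is_virginica(target) that returns 1 if target is Virginica, and 0 otherwise
def is_virginica(target):
    # In iris['target_names'], index 2 is 'virginica'
    return 1 if target == 2 else 0

In [None]:
# Apply function is_virginica() to the data frame, creating a new column "is_virginica"
iris_df['is_virginica'] = iris_df['target'].apply(is_virginica)
iris_df.head()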

In [None]:
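# Train-test split
# Split the data frame into 85% training data and 15% test data
# (random_state=42 is an arbitrary choice that makes the split reproducible)
from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(iris_df, test_size=0.15, random_state=42)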


In [None]:
# Display the number of Virginica and non-Virginica cases in the training set
print(df_train['is_virginica'].value_counts())


In [None]:
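# Build the logistic regression model
# (assumption: train on all four input features, with the same solver used later in this notebook)
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(solver='lbfgs')
model.fit(df_train[iris['feature_names']], df_train['is_virginica'])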


## Model Evaluation
- Classification accuracy
- Cross Validation
- Examine four categories using the confusion matrix:
    - True Positive
    - True Negative
    - False Positive
    - False Negative
- Precision, recall, and F1 score
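
The sketch below shows how these quantities relate to one another, computing each metric from the four confusion-matrix counts and comparing the results with the `sklearn.metrics` functions. The labels in `y_true` and `y_pred` are made up for illustration and are not results from the iris model.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, f1_score)

# Hypothetical true labels and predictions for ten points
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For 0/1 labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of positive predictions that are correct
recall = tp / (tp + fn)                      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("accuracy:", accuracy)
print("precision:", precision, precision_score(y_true, y_pred))
print("recall:", recall, recall_score(y_true, y_pred))
print("f1:", f1, f1_score(y_true, y_pred))
```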

In [None]:
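# 1. Find the prediction accuracy on test set
from sklearn.metrics import accuracy_score
test_predictions = model.predict(df_test[iris['feature_names']])
print(accuracy_score(df_test['is_virginica'], test_predictions))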


In [None]:
# 2. cross validation
from sklearn.model_selection import cross_val_score
input_cols = iris_df.columns[:4]
print(cross_val_score(model, df_train[input_cols], df_train['is_virginica'],
                      cv=3))

In [None]:
# 3. confusion matrix
from sklearn.metrics import confusion_matrix
matrix = confusion_matrix(df_test['is_virginica'], test_predictions)
plt.matshow(matrix)
print(matrix)

### Precision and Recall
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/26/Precisionrecall.svg/525px-Precisionrecall.svg.png" width="600">

In [None]:
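# precision - recall - f1 score
from sklearn.metrics import precision_score, recall_score, f1_score
print("precision:", precision_score(df_test['is_virginica'], test_predictions))
print("recall:", recall_score(df_test['is_virginica'], test_predictions))
print("f1 score:", f1_score(df_test['is_virginica'], test_predictions))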


In [None]:
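# Predicted classes and class probabilities on the test set
# (the original names model2 and X are not defined in this notebook; model and df_test are used instead)
Y_pred = model.predict(df_test[input_cols])
Y_pred_prob = model.predict_proba(df_test[input_cols])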


## Logistic Regression: Model Visualization
- Create a grid of points from a list of x coordinates and y coordinates.
- Use the model to obtain prediction probability on each point from the grid
- Find the points whose probability is close to the threshold (these trace the decision boundary).
- Plot the grid.

In [None]:
# Train a new logistic regression model on petal length and petal width only
model = LogisticRegression(solver='lbfgs')
# Select two columns with a list of names (double brackets)
model.fit(df_train[['petal length (cm)', 'petal width (cm)']], df_train['is_virginica'])

In [None]:
# 1. Create a grid of points
x0, x1 = np.meshgrid(np.linspace(0, 7, 500),
                     np.linspace(0, 2.7, 500))
print(x0.shape, x1.shape)

In [None]:
# 2. Obtain prediction probabilities
X_new = np.hstack([x0.reshape([-1, 1]), x1.reshape([-1, 1])])
y_new_prob = model.predict_proba(X_new)

In [None]:
# 3. Find boundary points.
# Which points give 0.5 probability?
indices = np.where((y_new_prob[:, 1] > 0.499) & (y_new_prob[:, 1] < 0.501))
X_boundary = X_new[indices]
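
As a cross-check on the grid search above, the 0.5 boundary of a two-feature logistic regression model can also be written in closed form: it is the line where the log-odds are zero. The sketch below is a self-contained illustration that, as a simplifying assumption, fits on all 150 rows rather than on `df_train`; the names `clf`, `length`, and `width` are introduced only for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris['data'][:, 2:4]                 # petal length, petal width
y = (iris['target'] == 2).astype(int)    # 1 for Virginica, 0 otherwise

clf = LogisticRegression(solver='lbfgs').fit(X, y)
theta_0 = clf.intercept_[0]
theta_1, theta_2 = clf.coef_[0]

# On the boundary the log-odds are zero:
#   theta_0 + theta_1 * length + theta_2 * width = 0
length = np.linspace(3, 7, 50)
width = -(theta_0 + theta_1 * length) / theta_2

# Every point on this line should get probability 0.5 of being Virginica
boundary_prob = clf.predict_proba(np.column_stack([length, width]))[:, 1]
print(boundary_prob.min(), boundary_prob.max())
```

Plotting `length` against `width` gives a straight line that should coincide with the boundary traced by the grid points with probability near 0.5.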

In [None]:
# 4. Plot the boundary
plt.plot(X_boundary[:, 0], X_boundary[:, 1])
index_virginica = (iris_df['is_virginica'] == 1)
index_not_virginica = (iris_df['is_virginica'] == 0)
plt.scatter(iris_df.loc[index_virginica, 'petal length (cm)'],
            iris_df.loc[index_virginica, 'petal width (cm)'],
            c='yellow',
            label='Virginica')
plt.scatter(iris_df.loc[index_not_virginica, 'petal length (cm)'],
            iris_df.loc[index_not_virginica, 'petal width (cm)'],
            c='purple',
            label='Not Virginica')
plt.legend()

In [None]:
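# 5. Plot probabilities
# Background color: estimated probability of Class 0 (not Virginica), i.e. column 0 of predict_proba
plt.scatter(X_new[:, 0], X_new[:, 1], c=y_new_prob[:, 0])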
plt.colorbar()
plt.scatter(iris_df.loc[index_virginica, 'petal length (cm)'],
            iris_df.loc[index_virginica, 'petal width (cm)'],
            c='yellow',
            label='Virginica')
plt.scatter(iris_df.loc[index_not_virginica, 'petal length (cm)'],
            iris_df.loc[index_not_virginica, 'petal width (cm)'],
            c='purple',
            label='Not Virginica')
plt.legend()