# Logistic Regression

Logistic regression is commonly used for binary classification.
First, let us prepare a dataset that has only 2 classes as its labels.
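
As background for the model fitted later, the sketch below illustrates the idea behind logistic regression: a linear score is passed through the sigmoid function to obtain a probability, which is then thresholded at 0.5. The weights `w`, the bias `b`, and the sample points are made-up values chosen only for illustration; they are not taken from any dataset in this notebook.

```python
import numpy as np


def sigmoid(z):
    """Map any real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical parameters and inputs, for illustration only
w = np.array([1.5, -0.5])
b = 0.25
x = np.array([[2.0, 1.0], [-1.0, 3.0]])

scores = x @ w + b                    # linear part: one score per sample
probs = sigmoid(scores)               # estimated probability of class 1
labels = (probs >= 0.5).astype(int)   # threshold at 0.5 to get a class

print(scores, probs, labels)
```

A positive score gives a probability above 0.5 (class 1) and a negative score gives a probability below 0.5 (class 0).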

## 1 - Import necessary packages

Let's first import all the packages that you will need during this assignment.

- **numpy** is the main package for scientific computing with Python.
- **matplotlib** is a library to plot graphs in Python.
- **sklearn** (scikit-learn) provides a range of machine learning algorithms in Python.

In [None]:
import numpy as np
from sklearn import datasets
from sklearn import model_selection
from sklearn import metrics

import matplotlib.pyplot as plt
%matplotlib inline

## 2 - Load dataset

We can load the readily available Iris dataset from scikit-learn, which has 3 classes.
We will then remove 1 of the 3 classes to suit our needs.

In [None]:
# TODO: Replace {} with your solution to load the iris dataset
iris = datasets.load_{}()

In [None]:
dir(iris)

In [None]:
iris.data.shape

In [None]:
iris.target_names

In [None]:
iris.feature_names

Check the available classes/labels in the Iris dataset.

In [None]:
np.unique(iris.target)

Remove the data entries with label 2.

In [None]:
# TODO: Replace {} with your solution to remove the data entries with label 2
idx = iris.target != {}

In [None]:
print(iris.target)

In [None]:
print(idx)
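
The filtering step relies on NumPy boolean masking. The sketch below shows the mechanism on a tiny made-up array rather than on the Iris data; `labels`, `features`, and `mask` are hypothetical names used only here.

```python
import numpy as np

# Toy labels and features, invented for illustration
labels = np.array([0, 1, 2, 1, 0, 2])
features = np.arange(12).reshape(6, 2)

# Comparing an array with a scalar yields one boolean per element
mask = labels != 2

# Indexing with the mask keeps only the rows where mask is True
kept_features = features[mask]
kept_labels = labels[mask]

print(mask)
print(kept_features.shape, kept_labels)
```

The same mask can index both the feature matrix and the label vector, which keeps the two aligned row by row.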

Load features to "data" and load targets to "target".

In [None]:
# TODO: Replace {} with your solution to load features to "data"
{} = iris.data[idx].astype(np.float32)

# TODO: Replace {} with your solution to load targets to "target"
{} = iris.target[idx].astype(np.float32)

The target now has only 2 classes, 0 and 1.

In [None]:
print(target)

If you plot the remaining data points, which belong to 2 classes, you can see that the classes are well separated by a straight line.

In [None]:
plt.figure(figsize=(10, 6))
plt.scatter(data[:, 0], data[:, 1], c=target, cmap=plt.cm.coolwarm, s=100)
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])

Split the dataset into a 70% training set and a 30% test set.

In [None]:
# TODO: Replace {} with your solution to split the dataset into 70% training set and 30% test set
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    data, target, test_size={}, random_state=123
)
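
To see what `train_test_split` returns, here is a minimal sketch on a made-up array of 10 samples. The arrays `X` and `y` are invented for illustration. Passing the same `random_state` twice reproduces the same shuffle.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10 toy samples with 2 features each, invented for illustration
X = np.arange(20, dtype=np.float32).reshape(10, 2)
y = np.array([0, 1] * 5, dtype=np.float32)

# test_size is the fraction of samples held out for testing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=123)

# Repeating the call with the same seed gives the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.3, random_state=123)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```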

In [None]:
X_train.shape, y_train.shape

In [None]:
X_test.shape, y_test.shape

## 3 - Logistic Regression

Import `LogisticRegression` from `sklearn.linear_model` to use the logistic regression model.

In [None]:
# TODO: Replace {} with your solution to import the logistic regression class
from sklearn.linear_model import {}

# TODO: Replace {} with your solution to create the logistic regression model
model = {}(solver="liblinear")

> `solver` specifies which algorithm is used to solve the optimization problem; the default is 'lbfgs'. Here we use the 'liblinear' algorithm because it works well on small datasets.
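
As a rough comparison of the two solvers mentioned in the note, the sketch below fits both on a tiny, made-up one-dimensional dataset. `X_toy` and `y_toy` are invented for illustration. The fitted coefficients can differ slightly between solvers, but on clearly separated data like this both are expected to classify the training points correctly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: negative values are class 0, positive values are class 1
X_toy = np.array([[-5.0], [-4.0], [-3.0], [-2.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])

fitted = {}
for solver in ("liblinear", "lbfgs"):
    clf = LogisticRegression(solver=solver)
    clf.fit(X_toy, y_toy)
    fitted[solver] = clf
    # coef_ and intercept_ hold the learned weight and bias
    print(solver, clf.coef_.ravel(), clf.intercept_, clf.score(X_toy, y_toy))
```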

In [None]:
model.fit(X_train, y_train)

In [None]:
predictions = model.predict(X_test)
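
`predict` returns hard class labels. A fitted `LogisticRegression` also exposes `predict_proba`, which returns one probability per class. The sketch below, on made-up data (`X_toy`, `y_toy`, and `X_new` are invented for illustration), shows that for a binary problem the hard labels correspond to thresholding the class-1 probability at 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D training data, invented for illustration
X_toy = np.array([[-5.0], [-4.0], [-3.0], [-2.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression(solver="liblinear").fit(X_toy, y_toy)

X_new = np.array([[-1.0], [0.5], [6.0]])
proba = clf.predict_proba(X_new)   # shape (n_samples, 2): P(class 0), P(class 1)
hard = clf.predict(X_new)          # hard labels taken from clf.classes_

# Rebuild the hard labels by thresholding the class-1 probability
from_proba = clf.classes_[(proba[:, 1] > 0.5).astype(int)]

print(proba)
print(hard, from_proba)
```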

## 4 - Evaluating Model

Get the classification report.

In [None]:
# TODO: Replace {} with your solution to import the classification report function
from sklearn.metrics import {}

# TODO: Replace {} with your solution to print the classification report
print({}(y_test, predictions))
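
The classification report lists precision, recall, and F1-score per class. To make those numbers less opaque, the sketch below computes them by hand for class 1 on a small made-up set of labels and checks them against scikit-learn's `precision_score`, `recall_score`, and `f1_score`. The label arrays are invented for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy ground truth and predictions, invented for illustration
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives

precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```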

Get the confusion matrix.

In [None]:
# TODO: Replace {} with your solution to import the confusion matrix function
from sklearn.metrics import {}

# TODO: Replace {} with your solution to print the confusion matrix
print({}(y_test, predictions))
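
When reading the confusion matrix, it helps to know its layout: in scikit-learn, rows correspond to the true labels and columns to the predicted labels. The sketch below uses made-up labels to show how the four cells of a binary confusion matrix map to true negatives, false positives, false negatives, and true positives.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy ground truth and predictions, invented for illustration
y_true = np.array([0, 0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 1, 0])

cm = confusion_matrix(y_true, y_pred)

# For binary labels 0/1, flattening the matrix row by row gives tn, fp, fn, tp
tn, fp, fn, tp = cm.ravel()

print(cm)
print(tn, fp, fn, tp)
```

The diagonal holds the correct predictions, so accuracy is the sum of the diagonal divided by the total number of samples.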

# Exercise: Binary Classification of Breast Cancer dataset

## 1 - Load dataset

In [None]:
# TODO: Replace {} with your solution to load the breast cancer dataset
bcancer = datasets.{}()

In [None]:
dir(bcancer)

In [None]:
bcancer.target_names

In [None]:
bcancer.data.shape, bcancer.target.shape

Load the features to "data" and load targets to "target".

In [None]:
# TODO: Replace {} with your solution to load the features to "data"
data = bcancer.{}.astype(np.float32)

# TODO: Replace {} with your solution to load the targets to "target"
target = bcancer.{}.astype(np.float32)

In [None]:
# TODO: Replace {} with your solution to split the dataset into 70% training set and 30% test set
X_train, X_test, y_train, y_test = model_selection.{}(
    {}, {}, test_size=0.3, random_state=123
)

## 2 - Logistic Regression

In [None]:
# TODO: Replace {} with your solution to import logistic regression class
from sklearn.linear_model import {}

# TODO: Replace {} with your solution to implement logistic regression model
model = {}(solver="liblinear")

model.fit(X_train, y_train)

# TODO: Replace {} with your solution to make the prediction using the trained model
predictions = model.{}(X_test)

## 3 - Evaluating Model

In [None]:
from sklearn.metrics import confusion_matrix

# TODO: Replace {} with your solution to print the confusion matrix
print(confusion_matrix(y_test, {}))

In [None]:
from sklearn.metrics import classification_report

# TODO: Replace {} with your solution to print the classification report
print(classification_report({}, predictions))