# Logistic Regression - Advanced

In the lesson slides for this learning unit, we looked at:

* The form and function of the logistic regression model
* Use of binary logistic regression to classify a single variable
* Evaluating a trained model with the accuracy score
* Interpreting model predictions via probability scores

In this practical we will implement a simple multivariate logistic regression, before moving on to letting `sklearn` do all the work.

# Part 1: implementing logistic regression

In matrix notation, the logistic model is $$ y = \sigma(Xw)$$

where $X$ is a matrix of observations × features and $w$ is the vector of model parameters.

Key components are:

1. The sigmoid function: $$\sigma(x) = \frac{1}{1 + e^{-x}}$$

2. A model that takes features and parameters, and combines them to make predictions.

3. The cross-entropy function: $$L(w) = -\frac{1}{N}\sum_{i=1}^{N}\log\left ( p_i^{y_i}(1-p_i)^{(1-y_i)} \right )$$

4. A loss function that combines the model and the cross-entropy, so that the loss can be evaluated for any choice of $w$.

5. An optimiser that finds the $w$ for which $y = \sigma(Xw)$ best fits the data, by minimising the loss.
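
Because the logarithm of a product is a sum of logarithms, the cross-entropy in item 3 can also be written in the form below, which is usually the easier one to code. Here $p_i$ is taken to be the $i$-th entry of $\sigma(Xw)$ and $y_i$ the matching true label.

$$L(w) = -\frac{1}{N}\sum_{i=1}^{N}\left [ y_i \log p_i + (1-y_i)\log(1-p_i) \right ]$$

For a data point with $y_i = 1$ only the $\log p_i$ term survives, and for $y_i = 0$ only the $\log(1-p_i)$ term does.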

## The sigmoid function

First, implement the `sigmoid` function using numpy. It should take in an array and apply the sigmoid function to each item in it. It should return an array of the transformed values.

In [None]:
import numpy as np

# Your code here
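
If you want something to check your own answer against, the following is one possible sketch rather than the definitive solution. It assumes the input can be converted to a float array.

```python
import numpy as np

def sigmoid(x):
    """Apply 1 / (1 + exp(-x)) to every item and return an array."""
    x = np.asarray(x, dtype=float)  # accept lists as well as arrays
    return 1.0 / (1.0 + np.exp(-x))

# Quick sanity check on a few hand-picked values
print(sigmoid(np.array([-2.0, 0.0, 2.0])))
```

Two properties are handy for testing: `sigmoid(0)` is exactly 0.5, and `sigmoid(x) + sigmoid(-x)` is 1 for any `x`.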


## The logistic model

The `logistic_model` function should implement $y = \sigma(Xw)$.

Inputs:

1. `X` - a matrix of observations x features
2. `w` - a vector of model parameters (a bias, plus one component per feature in X)

It should return `p`, the vector of predicted probabilities (one per row of `X`).

Note: you can compute the matrix product of one `numpy` array with another using the first array's `.dot()` method.

In [None]:
# Your code here
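
One way this could look is sketched below. The sigmoid is repeated so the snippet runs on its own, and `X_demo` and `w_demo` are made-up values used only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def logistic_model(X, w):
    """Return p = sigmoid(Xw): one predicted probability per row of X."""
    return sigmoid(np.asarray(X).dot(w))

# Made-up example: a bias column of ones plus a single feature
X_demo = np.array([[1, 0.0],
                   [1, 2.0]])
w_demo = np.array([0.0, 1.0])  # bias of 0, weight of 1

p_demo = logistic_model(X_demo, w_demo)
print(p_demo)
```

The first row gives $\sigma(0) = 0.5$ and the second gives $\sigma(2)$, so the output should have one entry per observation.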


## The cross-entropy function

Now, implement the cross-entropy function.

Inputs:

1. `truth` - a vector of true class labels ($0$ or $1$)
2. `preds` - a vector of predicted probabilities (between $0$ and $1$), from `logistic_model`

Should return `loss`, a single float representing the cross-entropy loss.

In [None]:
# Your code here
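
A possible sketch is shown below. The `eps` clipping is an addition that is not part of the formula above: it is an assumption made here to avoid taking `log(0)` when a prediction is exactly 0 or 1.

```python
import numpy as np

def cross_entropy(truth, preds, eps=1e-12):
    """Mean binary cross-entropy between true labels and predicted probabilities."""
    truth = np.asarray(truth, dtype=float)
    # Keep probabilities strictly inside (0, 1) so the logs stay finite (assumed safeguard)
    preds = np.clip(np.asarray(preds, dtype=float), eps, 1 - eps)
    per_point = truth * np.log(preds) + (1 - truth) * np.log(1 - preds)
    return float(-np.mean(per_point))

# Predicting 0.5 for everything costs log(2) per data point
print(cross_entropy([0, 1], [0.5, 0.5]))
```

A model that always predicts 0.5 scores $\log 2$, which is a useful baseline: a trained model should do better than that.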


## Loss function

Using the functions you have built so far, we can create a loss function to be optimised.

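This takes in the following (note that `w` comes first in the function signature, because `minimize` optimises the first argument):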

1. `X` - a matrix of observations x features
2. `w` - a vector of model parameters (a bias, plus one component per feature in X)
3. `y` - a vector of the true class labels

In [None]:
def loss(w, X, y):
    # Predict probabilities with the current parameters, then score them
    p = logistic_model(X, w)
    return cross_entropy(y, p)

## Optimiser

Rather than implement gradient descent or something similar, we'll let `scipy.optimize.minimize` sort this out for us!

The `minimize` function takes in your loss function, your initial guess at the parameters, and the X and y data (passed through its `args` argument).

In [None]:
from scipy.optimize import minimize

## Testing it all so far

Below is some fake data: one feature (plus our bias term) predicting one of two classes.

The `minimize` function finds the best values for the bias and weights, which you can see labeled `x` at the end of the output.

In [None]:
X = np.array([[1, 0.1],
              [1, 0.8],
              [1, 0.3],
              [1, 0.2],
              [1, 0.9]
             ])

y = np.array([0, 1, 0, 0, 1])

w = np.array([0, 0])

results = minimize(loss, w, args=(X, y))

results
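
The object returned by `minimize` can also be read field by field. The sketch below is self-contained: `demo_loss` is a compact stand-in for the loss built above, and the data is made up with deliberately overlapping classes so that the minimum is finite.

```python
import numpy as np
from scipy.optimize import minimize

def demo_loss(w, X, y):
    # Compact stand-in for loss(): sigmoid, then mean cross-entropy
    p = 1.0 / (1.0 + np.exp(-X.dot(w)))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # assumed safeguard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up data: bias column plus one feature, with overlapping classes
feature = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
X_demo = np.column_stack([np.ones_like(feature), feature])
y_demo = np.array([0, 0, 1, 0, 1, 0, 1, 1])

demo_results = minimize(demo_loss, np.zeros(2), args=(X_demo, y_demo))

print("parameters (bias, weight):", demo_results.x)
print("final loss:", demo_results.fun)
print("optimiser reported success:", demo_results.success)
```

Starting from all-zero parameters the loss is $\log 2$, so a final loss below that indicates the optimiser has made progress.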

## Checking the curve

Since we have a simple 2D example, we can use the parameters from `results.x` and plot the learned curve.

In [None]:
import seaborn as sns

# Get a whole range of x values
x = np.linspace(0, 1, 100).reshape(-1,1)
# Add a column of ones for the bias term
x = np.insert(x, 0, values=1, axis=1)

# Use the model to get predictions for all of them, using the learned parameters
y_pred = logistic_model(x, results.x)

# Plot the model
sns.lineplot(x=x[:,1], y=y_pred, lw=4);

# Part 2: logistic regression in sklearn

For the rest of this practical, we will use sklearn to create and evaluate the model.

First, let's look at binary classification - predicting breast cancer from 30 different measurements.

In [None]:
from sklearn import datasets
import pandas as pd

data = datasets.load_breast_cancer()

X = pd.DataFrame(data['data'], columns=data['feature_names'])
y = data.target

X.head()

Instantiate a LogisticRegression model using `sklearn` and name it `model`.

(You should set `max_iter` to something larger than the default of 100, like 10000, for the model to converge during training.)

Use its `.fit()` method to learn from the data in `X` and `y`.

How accurate is the model? You can quickly compute this using the model's `.score()` method and passing it the data in `X` and `y`.

In [None]:
from sklearn.linear_model import LogisticRegression

# Your code here
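
A minimal sketch of these steps follows. It reloads the dataset so that it runs on its own; the names `X_bc` and `y_bc` are chosen here just to avoid clashing with the `X` and `y` above.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

data = datasets.load_breast_cancer()
X_bc, y_bc = data.data, data.target

# A larger max_iter gives the solver room to converge on unscaled features
model = LogisticRegression(max_iter=10000)
model.fit(X_bc, y_bc)

# .score() returns the mean accuracy on the data it is given
accuracy = model.score(X_bc, y_bc)
print(f"Training accuracy: {accuracy:.3f}")
```

Keep in mind that this accuracy is measured on the same data the model was trained on.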


Look at the distribution of class labels in the first 500 data points. What do you notice?

In [None]:
# Your code here
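
One way to count the labels is `np.bincount`, sketched below on a fresh copy of the dataset (the name `y_bc` is an assumption made so the snippet stands alone).

```python
import numpy as np
from sklearn import datasets

data = datasets.load_breast_cancer()
y_bc = data.target

# Count how often each class label appears in the first 500 data points
counts = np.bincount(y_bc[:500])
for label, count in enumerate(counts):
    print(f"{data.target_names[label]} (label {label}): {count}")
```

With a `pandas` Series, `value_counts()` would give the same information.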


Use `model` to get predictions for all the data points in `X`, via its `.predict()` method.

Use these, and the true classes, to calculate overall precision, recall and F1 score for the model using the imported functions from `sklearn.metrics`. These functions take two arguments: the original `y` values and those predicted using the `.predict()` method.

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

# Your code here
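
To see how the three functions are called, here is a small sketch on made-up labels rather than on the breast cancer data. The values are small enough to check by hand.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up labels: for class 1 there is 1 true positive,
# 1 false positive and 1 false negative
y_true_demo = [0, 0, 0, 0, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 0]

precision = precision_score(y_true_demo, y_pred_demo)  # TP / (TP + FP)
recall = recall_score(y_true_demo, y_pred_demo)        # TP / (TP + FN)
f1 = f1_score(y_true_demo, y_pred_demo)                # harmonic mean of the two

print(f"precision={precision}, recall={recall}, f1={f1}")
```

For your model, replace the made-up lists with `y` and the output of `model.predict(X)`.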


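By default, these functions report scores for the positive class (label 1) only; passing `average="macro"` averages over the classes instead. It is also useful to see how the model does on each class separately. Print a classification report. What do you observe?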

In [None]:
from sklearn.metrics import classification_report

# Your code here
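
As a sketch of what the report contains, the example below uses made-up labels instead of the model's predictions. Passing `output_dict=True` returns the same numbers as a dictionary, which is convenient if you want to use them in code.

```python
from sklearn.metrics import classification_report

# Made-up labels, small enough to verify by hand
y_true_demo = [0, 0, 0, 0, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 0]

# Text table with precision, recall, F1 and support for each class
print(classification_report(y_true_demo, y_pred_demo))

# Same content as a nested dictionary keyed by class label (as a string)
report = classification_report(y_true_demo, y_pred_demo, output_dict=True)
print("class 0 precision:", report["0"]["precision"])
print("class 1 precision:", report["1"]["precision"])
```

Here class 0 has 3 correct out of 4 predictions, while class 1 has 1 correct out of 2, so the per-class rows differ.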


# sklearn vs DIY

Let's finish by comparing sklearn to your implementation, in terms of accuracy. What do you make of the results?

In [None]:
from sklearn.metrics import accuracy_score

X = np.array([[1, 0.1],
              [1, 0.8],
              [1, 0.3],
              [1, 0.2],
              [1, 0.9]
             ])

y = np.array([0, 1, 0, 0, 1])

model = LogisticRegression()
model.fit(X, y)
print(f"sklearn Logistic Regression accuracy: {model.score(X, y)}")

y_pred = logistic_model(X, results.x)
print(f"DIY Logistic Regression accuracy: {accuracy_score(y, y_pred.round())}")

# Your thoughts here


# Summary

In this practical, you implemented the core components of the logistic regression model, before going on to use `sklearn` to do all the work - training the model and also evaluating it using suitable metrics.

In practice, you wouldn't train a model on the entire dataset and then evaluate it on that same data. So you could explore the models here a bit more and see how they perform when asked to classify new, totally unseen, data points.
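
A starting point for that exploration could look like the sketch below. The 25% test size, the fixed `random_state` and the name `held_out_model` are arbitrary choices made for this example.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = datasets.load_breast_cancer()

# Hold back a quarter of the data; stratify keeps the class balance similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0, stratify=data.target
)

held_out_model = LogisticRegression(max_iter=10000)
held_out_model.fit(X_train, y_train)

train_accuracy = held_out_model.score(X_train, y_train)
test_accuracy = held_out_model.score(X_test, y_test)
print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

Comparing the two numbers gives a rough idea of how well the model generalises beyond the data it has seen.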

To go even further, you could implement regularisation (using an L1 or L2 norm) for your DIY model and see if you can match the output of `sklearn`.
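
As a hedged starting point for the L2 case, the sketch below adds a penalty to a compact copy of the DIY loss. It assumes that `sklearn`'s default objective is equivalent to the mean cross-entropy plus $\|w\|^2 / (2CN)$ with the bias left unpenalised; check this against the `sklearn` documentation for your version. The names `l2_loss`, `X_reg` and `y_reg` are made up for this example.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression

feature = np.array([0.1, 0.8, 0.3, 0.2, 0.9])
X_reg = np.column_stack([np.ones_like(feature), feature])  # bias column + feature
y_reg = np.array([0, 1, 0, 0, 1])

def l2_loss(w, X, y, C=1.0):
    p = np.clip(1.0 / (1.0 + np.exp(-X.dot(w))), 1e-12, 1 - 1e-12)
    data_term = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Assumed penalty scaling; the bias w[0] is not penalised
    penalty = np.sum(w[1:] ** 2) / (2 * C * len(y))
    return data_term + penalty

diy = minimize(l2_loss, np.zeros(2), args=(X_reg, y_reg))

# sklearn fits its own intercept, so it only needs the feature column
sk = LogisticRegression().fit(feature.reshape(-1, 1), y_reg)

print("DIY bias, weight:    ", diy.x)
print("sklearn bias, weight:", sk.intercept_[0], sk.coef_[0, 0])
```

If the assumption about the objective holds, the two pairs of numbers should be close, up to the tolerances of the two optimisers.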