### Logistic Regression

In this notebook we implement the logistic regression machine learning algorithm from scratch using the `numpy` package.

In machine learning, logistic regression is used to model the probability of a certain class or event, such as pass/fail or win/lose.


Logistic regression starts from the same linear formula as linear regression; the difference is that the `sigmoid` function $\sigma$ is applied to the linear output to produce the prediction:

$\hat{y} = \sigma(mx + b)$

### Sigmoid function

The `sigmoid` function always returns a value between `0` and `1`.

<p align="center"><img src="https://www.gstatic.com/education/formulas2/397133473/en/sigmoid_function.svg" alt=""/>
</p>
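
As a quick numerical check of the statement above, here is a minimal sketch of the sigmoid, $\sigma(x) = \frac{1}{1 + e^{-x}}$, written with `numpy`. The standalone function name `sigmoid` and the sample inputs are chosen for illustration only.

```python
import numpy as np

def sigmoid(x):
    # Map any real number (or array of numbers) into the open interval (0, 1).
    return 1 / (1 + np.exp(-x))

# Illustrative inputs: a negative value, zero, and a positive value.
z = np.array([-5.0, 0.0, 5.0])
probs = sigmoid(z)
print(probs)  # close to 0, exactly 0.5, and close to 1
```

Negative inputs map below `0.5` and positive inputs map above it, which is why `0.5` is a natural threshold for turning the output into a class label.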

### Gradient descent

We start with some initial value of the weights `w` and keep updating it until the loss reaches a minimum.
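
To make the idea concrete, the following sketch runs gradient descent on a hypothetical one-dimensional loss, $(w - 3)^2$, whose minimum is at $w = 3$. The loss, the starting value, and the learning rate are assumptions chosen for illustration; they are not part of the logistic regression model itself.

```python
# Hypothetical one-dimensional loss with its minimum at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0    # starting value
lr = 0.1   # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)  # step against the gradient

print(w, loss(w))
```

Each step moves `w` a little in the direction that decreases the loss, so after enough iterations `w` ends up very close to `3`.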


### Updating the weights and bias

In every iteration, the weights `w` and bias `b` are updated as follows:

1. $w = w - \alpha \cdot dw$

2. $b = b - \alpha \cdot db$

Where:
* $\alpha$ - is the learning rate
* $w$ - is the weights
* $b$ - is the bias
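
A single update step can be sketched as follows. All numbers here are made up for illustration; in the actual algorithm the gradients `dw` and `db` are computed from the data in every iteration.

```python
import numpy as np

# Illustrative values only.
w = np.array([0.5, -0.2])   # weights
b = 0.1                     # bias
alpha = 0.1                 # learning rate
dw = np.array([1.0, -2.0])  # gradient of the loss with respect to w
db = 0.5                    # gradient of the loss with respect to b

# One gradient descent step.
w = w - alpha * dw
b = b - alpha * db

print(w, b)
```

A larger `alpha` takes bigger steps and can overshoot the minimum, while a smaller one converges more slowly.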


Here `dw` and `db` are the derivatives of the loss function with respect to the weights and bias. Given the loss function

$J = \frac{1}{m} \sum_{i=1}^{m}(y_i - h(x_i))^2$

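The derivatives of the loss with respect to the weights (`dw`) and bias (`db`) used in this notebook are given below. They omit the sigmoid derivative factor $h(x_i)(1 - h(x_i))$ that the chain rule would contribute; up to the constant factor of 2, they match the gradient of the cross-entropy loss commonly used for logistic regression: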


1. `dw`

$\frac{\partial}{\partial w} J = -\frac{2}{m} \sum_{i=1}^{m}(y_i - h(x_i)) \cdot x_i$


2. `db`


$\frac{\partial}{\partial b} J = -\frac{2}{m} \sum_{i=1}^{m}(y_i - h(x_i))$
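
The two expressions above can be checked numerically. The sketch below assumes a linear hypothesis $h(x) = wx + b$, for which these expressions are the exact gradient of $J$, and compares them against central finite differences on random toy data. The data, the seed, and the helper names `h` and `J` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # 20 samples, 3 features (toy data)
y = rng.normal(size=20)
w = rng.normal(size=3)
b = 0.5
m = X.shape[0]

def h(X, w, b):
    # Assumption for this sketch: a linear hypothesis.
    return X @ w + b

def J(w, b):
    # Mean squared error loss from the text.
    return np.mean((y - h(X, w, b)) ** 2)

# Analytic gradients from the formulas above.
residual = y - h(X, w, b)
dw = -(2 / m) * (X.T @ residual)
db = -(2 / m) * np.sum(residual)

# Central finite differences as an independent check.
eps = 1e-6
dw_num = np.array([
    (J(w + eps * e, b) - J(w - eps * e, b)) / (2 * eps)
    for e in np.eye(3)
])
db_num = (J(w, b + eps) - J(w, b - eps)) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-6), np.isclose(db, db_num, atol=1e-6))
```

Note that `-(2 / m) * np.dot(X.T, y - h)` is the same quantity as `(2 / m) * np.dot(X.T, h - y)`, which is the form that appears in the implementation.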


### Implementation


In [1]:
import numpy as np


Unlike linear regression, logistic regression predicts class labels, so we can compare the predicted labels with the observed labels directly. Classification metrics such as accuracy and the confusion matrix can be used to measure the performance of the algorithm.
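
As a small illustration of these metrics, the sketch below computes accuracy and a 2x2 confusion matrix with `numpy` alone. The label arrays are made up for the example and are not taken from the dataset used later.

```python
import numpy as np

# Illustrative labels, not taken from the dataset used below.
y_true = np.array([1, 0, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# Accuracy: fraction of predictions that match the true labels.
accuracy = np.mean(y_true == y_pred)

# 2x2 confusion matrix: rows are true classes, columns are predicted classes.
confusion = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    confusion[t, p] += 1

print(accuracy)
print(confusion)
```

The diagonal of the confusion matrix counts correct predictions per class, and the off-diagonal entries count the two kinds of mistakes.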

In [2]:
class LogisticRegression:
  def __init__(self, lr=0.001, n_iters=10000):
    self.lr = lr
    self.n_iters = n_iters

  def fit(self, X, y):
    n_samples, n_features = X.shape

    # initialize the parameters with zeros (random numbers also work)
    self.w = np.zeros(n_features)
    self.b = 0

    # gradient descent
    for _ in range(self.n_iters):
      # linear model: z = Xw + b
      linear_model = np.dot(X, self.w) + self.b
      # apply the sigmoid to turn z into a probability
      y_predicted = self._sigmoid(linear_model)

      # compute dw and db (gradients)
      dw = (2 / n_samples) * np.dot(X.T, (y_predicted - y))
      db = (2 / n_samples) * np.sum(y_predicted - y)

      # updating the parameters
      self.w -= self.lr * dw
      self.b -= self.lr * db

  def predict(self, X):
    linear_model = np.dot(X, self.w) + self.b
    y_predicted = self._sigmoid(linear_model)
    # threshold the probabilities at 0.5 to get class labels
    return np.where(y_predicted < 0.5, 0, 1)

  def _sigmoid(self, x):
    return 1/(1 + np.exp(-x))

  def _accuracy(self, y_true, y_pred):
    # fraction of predictions that match the true labels
    return np.sum(y_true == y_pred) / len(y_true)

  def evaluate(self, y_true, y_pred):
    print("acc: ", self._accuracy(y_true, y_pred))

### Testing the classifier

We load the data from the `sklearn` library as follows:

In [3]:
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt

In [4]:
bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target


Splitting the data into training and test sets:

In [5]:
X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
)

In [6]:
regressor = LogisticRegression(lr=0.001)
regressor.fit(X_train, y_train)
y_preds = regressor.predict(X_test)



In [7]:
y_preds[:10], y_test[:10]

(array([0, 0, 0, 1, 1, 0, 0, 0, 1, 1]), array([1, 0, 0, 1, 1, 0, 0, 0, 1, 1]))

In [8]:
regressor.evaluate(y_test, y_preds)

acc:  0.9736842105263158
