In [4]:
import numpy as np

We assume our classes are $y\in \{0,1\}$. We model our data as $$P(y_i=1|x_i;\theta)\approx \frac{1}{1+\exp(-\theta^T x)}.$$
We try to choose $\theta$ such that the likelihood $$\prod P(y_i|x_i;\theta)$$ is maximal. This can't be analytically solved, so we use gradient descent.

We apply log, multiply by $-1$, and try to minize using gradient descent (we analytically can calculate the partial derivatives, see attached pdf for more details on the math).

In [5]:
def sigmoid(x,coefs):
  return 1/(1+np.exp(-x@coefs))

def predict(X,theta):
  return np.array(sigmoid(X,theta)>0.5,int)

def gradient(theta,X,Y):
  difference=predict(X,theta)-Y
  # all the MATH is hidden in this expression, the rest is routine code
  # see pdf for more details
  return X.transpose()@difference/Y.shape[0]

def GradientDescent(X,Y,iterations,learningRate):
  theta=np.zeros(X.shape[1])
  for iter in range(iterations):
    theta=theta-learningRate*(gradient(theta,X,Y))
  return theta

We apply this to some toy dataset

In [6]:
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split


X,Y = datasets.load_breast_cancer(return_X_y=True)

#adds a column of ones (to account for bias)
X=np.c_[X,np.ones(X.shape[0])]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=42)

In [7]:
theta=GradientDescent(X_train,y_train, 200,0.1)
prediction=predict(X_test,theta)

  return 1/(1+np.exp(-x@coefs))


In [8]:
np.sum(prediction==y_test)/y_test.shape[0]

0.8947368421052632

Hooray! Accuracy of 0.89 -> our algorithm works.

Be careful: We do not get back the same coefficients we started with, but the ratios are preserved, and that is enough