In [12]:
import numpy as np

# ML Methods

## Gradient Descent

$$\boldsymbol{w}^{(t+1)}=\boldsymbol{w}^{(t)}- \gamma \nabla \mathcal{L}(\boldsymbol{w})$$

## Logistic Regression

Likelihood for logistic regression: $\mathcal{L}(\boldsymbol{w})=\prod_{n=1}^N \pi_{n}^{y_n} (1-\pi_{n})^{1-y_n}$, where $\pi_n(\boldsymbol{x_n})=\frac{e^{\boldsymbol{x_n}^T\boldsymbol{w}}}{1+e^{\boldsymbol{x_n}^T\boldsymbol{w}}} $
Log-likelihood: $l(\boldsymbol{w})=\sum_{n=1}^N(y_n\boldsymbol{x_n}^T\boldsymbol{w}-log(1+e^{\boldsymbol{x_n}^T\boldsymbol{w}}))$

Log-likelihood is differenetiable and concave, all local maxima are global maxima. Use gradient descent or SGD to optimize.

Gradient: $\nabla l(\boldsymbol{w})=[\frac{\partial l}{\partial w_1},\dots, \frac{\partial l}{\partial w_D}]^T =
[\sum_{n=1}^N(y_n-\frac{e^{\boldsymbol{x_n}^T\boldsymbol{w}}}{1+e^{\boldsymbol{x_n}^T\boldsymbol{w}}})x_{n1},\dots,\sum_{n=1}^N(y_n-\frac{e^{\boldsymbol{x_n}^T\boldsymbol{w}}}{1+e^{\boldsymbol{x_n}^T\boldsymbol{w}}})x_{nD}]^T=
\sum_{n=1}^N(y_n-\frac{e^{\boldsymbol{x_n}^T\boldsymbol{w}}}{1+e^{\boldsymbol{x_n}^T\boldsymbol{w}}})\boldsymbol{x_{n}}^T$

Maximize the previous quantity is equivalent to use the following gradient via GD SGD
$\sum_{n=1}^N(\frac{e^{\boldsymbol{x_n}^T\boldsymbol{w}}}{1+e^{\boldsymbol{x_n}^T\boldsymbol{w}}}-y_n)\boldsymbol{x_{n}}^T=\sum_{n=1}^N(\sigma({\boldsymbol{x_n}^T\boldsymbol{w}})-y_n)\boldsymbol{x_{n}}^T$

In [152]:
def sigmoid(x):
    x= pow(x,2.718281828459045)/(1+pow(x,2.718281828459045)) 
    return x

In [153]:
def partial_deriv_w(y, X, w, i):
    S=0
    for n in range(X.shape[0]):
        S+=(sigmoid(np.dot(X[n,:],w))-y[n])*X[n,i]
    return S

In [149]:
def compute_gradient(y, X, w):
    D=w.shape[0]
    L=np.zeros(D)
    for n in range(D):
        L+=(sigmoid(np.dot(X[n,:],w))-y[n])*X[n,:]
    return L

In [155]:
def gradient(y, X, w):
    D=w.shape[0]
    L=np.zeros(D)
    for i in range(D):
        L[i]=partial_deriv_w(y,X,w,i)
    return L

In [1]:
def gradient_descent(y, X, w0, gamma, max_iters):
    w=w0
    for n in range(max_iters):
        grad = gradient(y,X,w)
        #print(grad)
        w = w - gamma*grad
        print(w)
    return w

#y=np.array([1,0,1])
#X=np.array([[9,0,11],[1,0,1],[1,0,1]])
#w0=np.array([-1,1,1])
y=np.array([1,0,1,7])
X=np.array([[9,0,11],[1,0,1],[1,0,1],[1,2,3]])
w0=np.array([0,0,1000])
max_iters=10000
gamma=0.1
#gradient_descent(y, X, w0, gamma, max_iters)

NameError: name 'np' is not defined

In [171]:
def gradient_descent1(y, X, w0, gamma, max_iters):
    w=w0
    for n in range(max_iters):
        grad = compute_gradient(y,X,w)
        #print(grad)
        w = w - gamma*grad
        #print(w)
    return w

y=np.array([1,0,1,7])
X=np.array([[9,0,11],[1,0,1],[1,0,1],[1,2,3]])
w0=np.array([0,90,0])
#y=np.array([1,0,1])
#X=np.array([[9,0,11],[1,0,1],[1,0,1]])
#w0=np.array([-1,1,1])
max_iters=1000
gamma=0.1
gradient_descent1(y, X, w0, gamma, max_iters)
compute_gradient(y,X,w0)

array([-10.,   0., -12.])

In [105]:
def logistic_regression(y, tx, initial_w, max_iters, gamma):
    w=gradient_descent(y, tx, initial_w, gamma, max_iters)
    S=0
    for n in range(tx.shape[0]):
        z=np.dot(tx[n,:],w)
        z=pow(z,2.718281828459045)
        z=np.log(1+z)
        m=y[n]*np.dot(tx[n,:],w)
        print(z)
        print(m)
        S+= z-m
    return (w,S)        

### Test

In [124]:
y=np.array([1,0,1])
tx=np.array([[9,0,11],[1,0,1],[1,0,1]])
initial_w=np.array([-1,0,2])
max_iters=1000
gamma=0.1

logistic_regression(y, tx, initial_w, max_iters, gamma)


7.02434801244219
13.247193017189197
0.7020951594542215
0.0
0.7020951594542215
1.0065758880582614


(array([-1.08742912,  0.        ,  2.09400501]), -5.8252305738968255)