### Newton Method

Our objective is to minimize the function:
\begin{equation}
    f(\vec{x}) = \frac{1}{m} \sum_{j=1}^m \log\Bigg(1+\exp\Big(-b_j \vec{a}_j^T\vec{x}\Big)\Bigg)\qquad \text{for } \vec{x} \in \mathbb{R}^d
\end{equation}
where $m$ is the number of samples, $d$ is the number of features, $\vec{a}_j \in \mathbb{R}^d$ are the feature vectors and $b_j \in \{-1, 1\}$ are the corresponding labels.
We now apply Newton's method to find a minimizer of this function. Since $f$ is convex, every stationary point is a minimizer, so it suffices to find a root of the equation $\nabla f(\vec{x}) = 0$.
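The convexity claim can be checked directly from the derivatives of $f$. Writing $\sigma(t) = 1/(1+e^{-t})$ for the logistic function and using $\frac{d}{dt}\log(1+e^{-t}) = -\sigma(-t)$ together with $\sigma'(t) = \sigma(t)\big(1-\sigma(t)\big)$, the chain rule gives
\begin{equation}
    \nabla f(\vec{x}) = -\frac{1}{m}\sum_{j=1}^m \sigma\big(-b_j\vec{a}_j^T\vec{x}\big)\, b_j\vec{a}_j,
\end{equation}
\begin{equation}
    Hf(\vec{x}) = \frac{1}{m}\sum_{j=1}^m b_j^2\,\sigma\big(b_j\vec{a}_j^T\vec{x}\big)\Big(1-\sigma\big(b_j\vec{a}_j^T\vec{x}\big)\Big)\,\vec{a}_j\vec{a}_j^T .
\end{equation}
Every weight $b_j^2\sigma(\cdot)\big(1-\sigma(\cdot)\big)$ is nonnegative, so $\vec{v}^T Hf(\vec{x})\vec{v} = \frac{1}{m}\sum_{j=1}^m b_j^2\,\sigma\big(b_j\vec{a}_j^T\vec{x}\big)\big(1-\sigma(b_j\vec{a}_j^T\vec{x})\big)(\vec{a}_j^T\vec{v})^2 \ge 0$ for every $\vec{v}$: the Hessian is positive semidefinite everywhere and $f$ is convex. Note also that $Hf(\vec{x})$ has rank at most $m$, so it is singular whenever there are fewer samples than features (as in the run below, where $m = 99$ and $d = 119$); the Newton system is nonetheless solvable, because $\nabla f(\vec{x})$ is a linear combination of the $\vec{a}_j$ and therefore lies in the range of $Hf(\vec{x})$.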
The Newton iteration we implement, damped by a step size, has the form
\begin{equation}
    \vec{x}_{n+1} = \vec{x}_n -\gamma Hf(\vec{x}_n)^{-1}\nabla f(\vec{x}_n)
\end{equation}
where $\gamma > 0$ is the step size and $Hf(\vec{x}_n)$ is the Hessian of $f$ at $\vec{x}_n$.
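To make the iteration concrete, here is a minimal dense NumPy sketch of the damped Newton method applied to $f$, using the closed-form gradient and Hessian of the logistic loss. It is not the TensorFlow/numsa code used in the notebook: the function names (`logistic_loss`, `gradient`, `hessian`, `damped_newton`) are hypothetical, the Hessian is formed explicitly, and the Newton system is solved with a dense least-squares solve instead of preconditioned CG.

```python
import numpy as np

def sigmoid(t):
    # Logistic function 1 / (1 + exp(-t)), written via tanh to avoid overflow
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def logistic_loss(x, A, b):
    # f(x) = (1/m) sum_j log(1 + exp(-b_j a_j^T x)); logaddexp(0, t) = log(1 + e^t)
    return np.mean(np.logaddexp(0.0, -b * (A @ x)))

def gradient(x, A, b):
    # grad f(x) = -(1/m) sum_j sigma(-b_j a_j^T x) b_j a_j
    z = b * (A @ x)
    return -(A.T @ (b * sigmoid(-z))) / A.shape[0]

def hessian(x, A, b):
    # Hf(x) = (1/m) sum_j b_j^2 sigma(z_j)(1 - sigma(z_j)) a_j a_j^T = (1/m) A^T diag(w) A
    z = b * (A @ x)
    w = b**2 * sigmoid(z) * sigmoid(-z)
    return (A.T * w) @ A / A.shape[0]

def damped_newton(A, b, x0, gamma=1.0, tol=1e-8, itmax=50):
    # x_{n+1} = x_n - gamma * q_n, where q_n solves Hf(x_n) q = grad f(x_n)
    x = np.array(x0, dtype=float)
    for _ in range(itmax):
        g = gradient(x, A, b)
        if np.linalg.norm(g) < tol:
            break
        # Solve the Newton system instead of inverting Hf; lstsq also copes with a singular Hf
        q = np.linalg.lstsq(hessian(x, A, b), g, rcond=None)[0]
        x = x - gamma * q
    return x

# Usage on synthetic data with random labels (hypothetical, not the a1a data set)
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
b = rng.choice([-1.0, 1.0], size=200)
x_star = damped_newton(A, b, np.zeros(5), gamma=1.0)
print(logistic_loss(x_star, A, b), np.linalg.norm(gradient(x_star, A, b)))
```

Dropping the $1/m$ factor, as the `Loss` function in the notebook cell does, multiplies both $\nabla f$ and $Hf$ by $m$, so the Newton direction $Hf^{-1}\nabla f$ is unchanged; only the reported loss and gradient norm are scaled by $m$.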
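Rather than forming $Hf(\vec{x}_n)^{-1}$ explicitly, we compute the search direction by solving the linear system $Hf(\vec{x}_n)\,\vec{q}_n=\nabla f(\vec{x}_n)$ with the preconditioned conjugate gradient (PCG) method and then set $\vec{x}_{n+1} = \vec{x}_n - \gamma\vec{q}_n$. CG is applicable because the Hessian of the convex function $f$ is symmetric positive semidefinite. As preconditioner we use an approximate inverse of $Hf(\vec{x}_n)$ computed with the randomized SVD presented in [1].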
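The PCG solve can be sketched in NumPy as follows. This is an illustration under assumptions, not numsa's `pCG`: the hypothetical helper `randomized_lowrank_preconditioner` approximates the dominant eigenpairs of a symmetric positive semidefinite matrix with a randomized range finder (in the style of Halko, Martinsson and Tropp) and applies the exact inverse of $M = U\Lambda U^T + \mu I$, where $\mu > 0$ is an assumed shift. The construction from [1] used by numsa may differ in its details.

```python
import numpy as np

def pcg(H, g, apply_Minv, tol=1e-8, itmax=200):
    """Preconditioned conjugate gradient for H q = g.
    H: symmetric positive definite matrix (only products H @ v are used).
    apply_Minv: function applying an SPD approximation of H^{-1} to a vector."""
    q = np.zeros_like(g, dtype=float)
    r = g - H @ q                 # residual
    z = apply_Minv(r)             # preconditioned residual
    p = z.copy()                  # search direction
    rz = r @ z
    g_norm = np.linalg.norm(g)
    for _ in range(itmax):
        if np.linalg.norm(r) <= tol * g_norm:
            break
        Hp = H @ p
        alpha = rz / (p @ Hp)
        q = q + alpha * p
        r = r - alpha * Hp
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return q

def randomized_lowrank_preconditioner(H, rank, oversample, mu, rng):
    """Hypothetical helper: randomized approximation H ~ U diag(lam) U^T of the
    dominant eigenpairs, returned as a function applying M^{-1} with
    M = U diag(lam) U^T + mu * I (mu > 0 is an assumed shift)."""
    d = H.shape[0]
    Omega = rng.standard_normal((d, rank + oversample))   # random test matrix
    Q, _ = np.linalg.qr(H @ Omega)                        # orthonormal basis of the sampled range
    lam, V = np.linalg.eigh(Q.T @ H @ Q)                  # small projected eigenproblem
    keep = np.argsort(lam)[::-1][:rank]                   # indices of the largest eigenvalues
    lam, U = np.maximum(lam[keep], 0.0), Q @ V[:, keep]
    def apply_Minv(r):
        Ur = U.T @ r
        # Exact inverse of M: U (Lambda + mu I)^{-1} U^T r + (r - U U^T r) / mu
        return U @ (Ur / (lam + mu)) + (r - U @ Ur) / mu
    return apply_Minv

# Usage on an ill-conditioned SPD test matrix with 8 large eigenvalues
rng = np.random.default_rng(1)
d = 60
Qm, _ = np.linalg.qr(rng.standard_normal((d, d)))
eigs = np.concatenate([np.logspace(3, 1, 8), np.linspace(0.5, 1.0, d - 8)])
H = (Qm * eigs) @ Qm.T
g = rng.standard_normal(d)
Minv = randomized_lowrank_preconditioner(H, rank=8, oversample=4, mu=0.75, rng=rng)
q = pcg(H, g, Minv, tol=1e-10, itmax=200)
print(np.linalg.norm(H @ q - g))
```

Both functions only multiply by `H`, so in a matrix-free setting `H @ v` could be replaced by a Hessian-vector product routine.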

In [6]:
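# We import all the libraries we need
import tensorflow as tf
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from numsa.TFHessian import *   # provides the Hessian class (gradient and preconditioned CG)
from tqdm import tqdm           # progress bar used in the optimization loop
import dsdl                     # dataset loader providing the a1a data set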

In [7]:
ds = dsdl.load("a1a")

X, Y = ds.get_train()
# Keep a small subset of the training set (rows 1 to 99) so the run stays short
X = X[1:100]
Y = Y[1:100]
print(X.shape, Y.shape)

(99, 119) (99,)


In [8]:
# Parameters of this run (we use optimization nomenclature, not the ML one)
itmax = 100       # maximum number of Newton iterations
tol = 1e-4        # stopping tolerance on the gradient norm
step_size = 0.1   # step size gamma (the "learning rate" in ML terms)
n_features = X.shape[1]

# Loss function. The 1/m factor of f is omitted: this multiplies the gradient and
# the Hessian by m, so the Newton direction is unchanged, but the printed loss and
# gradient norm are m times larger.
def Loss(x):
    S = tf.constant(0.0)
    x = tf.reshape(x, (n_features, 1))
    for j in range(X.shape[0]):
        a = tf.constant(X[j, :].todense().reshape(n_features, 1), dtype=tf.float32)
        b = tf.constant(Y[j], dtype=tf.float32)
        dot = tf.matmul(tf.transpose(a), x)   # a_j^T x, shape (1, 1)
        S = S + tf.math.log(1 + tf.math.exp(-b * dot))
    return S

# Build the Hessian object for the loss at the starting point x0 = 0.1 * ones
x = tf.Variable(0.1 * np.ones((n_features, 1), dtype=np.float32))
H = Hessian(Loss, x)
grad = H.grad().numpy()
print("Computed the first gradient ...")
# Solve H q = grad with numsa's preconditioned CG; the positional arguments (10, 2)
# are kept from the original run, see the numsa documentation for their meaning
q = H.pCG(grad, 10, 2, tol=1e-4, itmax=20)
print("Computed the search direction ...")
print("Entering the Newton optimization loop")
for it in tqdm(range(itmax)):
    # Damped Newton step: x_{n+1} = x_n - gamma * q
    x = tf.Variable(x - step_size * tf.cast(q, tf.float32))
    if it % 10 == 0:
        print("Loss function at this iteration {} and gradient norm {}".format(Loss(x), np.linalg.norm(grad)))
    if np.linalg.norm(grad) < tol:
        break
    # Recompute the gradient and the search direction at the new iterate
    H = Hessian(Loss, x)
    grad = H.grad().numpy()
    q = H.pCG(grad, 10, 2, tol=1e-4, itmax=20)

Computed the first gradient ...
Computed the search direction ...
Entering the Newton optimization loop
Loss function at this iteration [[107.55325]] and gradient norm 148.99732971191406
Loss function at this iteration [[47.29411]] and gradient norm 50.6136360168457
Loss function at this iteration [[39.568394]] and gradient norm 20.326595306396484
Loss function at this iteration [[36.40213]] and gradient norm 9.324344635009766
Max itetation reached !
Loss function at this iteration [[35.06703]] and gradient norm 6.246491432189941
Max itetation reached !
Loss function at this iteration [[33.996437]] and gradient norm 5.34960412979126
Loss function at this iteration [[33.292046]] and gradient norm 4.903813362121582
Loss function at this iteration [[33.510406]] and gradient norm 4.561596870422363
Loss function at this iteration [[33.546192]] and gradient norm 4.460367202758789
Loss function at this iteration [[34.895325]] and gradient norm 4.901057243347168
100%|██████████| 100/100 [25:01<00:00, 15.02s/it]


In [9]:
print("Loss function at this iteration {} and gradient norm {}".format(Loss(x), np.linalg.norm(grad)))

Loss function at this iteration [[35.192173]] and gradient norm 4.761678218841553
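Each call to `Loss` loops over the samples in Python and converts one sparse row to a dense tensor per sample, and the `Hessian` object differentiates through this function at every Newton iteration; this is a plausible contributor to the roughly 15 s per iteration reported by the progress bar. A minimal NumPy sketch of a vectorized alternative, assuming the data are converted once to a dense array `A` (for example with `X.toarray()`, if `X` is a SciPy sparse matrix as the `.todense()` call suggests) and the labels are held in `b`; the function names are hypothetical. It checks that a single matrix-vector product gives the same sum as the per-sample loop.

```python
import numpy as np

def loss_loop(x, A, b):
    """Per-sample sum, mirroring the loop in `Loss` (no 1/m factor)."""
    S = 0.0
    for j in range(A.shape[0]):
        S += np.log(1.0 + np.exp(-b[j] * (A[j] @ x)))
    return S

def loss_vectorized(x, A, b):
    """Same sum from a single matrix-vector product.
    np.logaddexp(0, t) = log(1 + exp(t)) without overflow for large t."""
    return np.sum(np.logaddexp(0.0, -b * (A @ x)))

# Usage with random data of the same shape as the subset used above
rng = np.random.default_rng(0)
A = (rng.random((99, 119)) < 0.1).astype(float)   # sparse-looking 0/1 features
b = rng.choice([-1.0, 1.0], size=99)
x = 0.1 * np.ones(119)
print(loss_loop(x, A, b), loss_vectorized(x, A, b))
```

In TensorFlow the analogous expression would be `tf.reduce_sum(tf.math.softplus(-b * tf.matmul(A, x)))`, with `A` a dense `float32` tensor built once outside `Loss` and `b` of shape `(m, 1)`.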
