## Newton Method

### Quadratic Energy Minimization
We minimize the following energy functional using a Newton method based on the `numsa.TFHessian` module:
$$J(x,y) = x^2y^2+xy$$

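Newton's method searches for a stationary point of $J$, i.e. a root of $\nabla J = 0$. Note that $J$ is not convex. Writing $t = xy$ gives $J = t^2 + t$, so the global minimizers are the points of the hyperbola $xy = -\tfrac{1}{2}$, where $J = -\tfrac{1}{4}$, while the origin is a saddle point with $J = 0$. A pure Newton iteration can therefore converge to either kind of stationary point.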
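
For reference, the gradient and Hessian of $J$ can be written in closed form (derived by hand here, not produced by the library), which makes the classification of the stationary points explicit:

$$
\nabla J(x,y) = (2xy+1)\begin{pmatrix} y \\ x \end{pmatrix}, \qquad
\nabla^2 J(x,y) = \begin{pmatrix} 2y^2 & 4xy+1 \\ 4xy+1 & 2x^2 \end{pmatrix}.
$$

At the origin the Hessian has eigenvalues $\pm 1$ (a saddle), while on the hyperbola $xy = -\tfrac{1}{2}$ it is positive semidefinite but singular, since $\det \nabla^2 J = 4x^2y^2 - (4xy+1)^2 = 0$ there. Starting on the diagonal $x_0 = y_0 = a_0$ with $\gamma = 1$, the Newton iterates stay on the diagonal and, in exact arithmetic, satisfy

$$
a_{k+1} = a_k - \frac{a_k(2a_k^2+1)}{6a_k^2+1} = \frac{4a_k^3}{6a_k^2+1},
$$

so from $a_0 = 0.1$ they converge cubically to the saddle point at the origin: $a_1 \approx 3.77\times 10^{-3}$ and $J(a_1,a_1) \approx 1.42\times 10^{-5}$, consistent with the loss value logged after the first iteration of the run below.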

In [1]:
# Import all the libraries we are going to need
import tensorflow as tf
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from tqdm import tqdm  # progress bar used in the optimization loop
from numsa.TFHessian import *

In [2]:
itmax = 10          # maximum number of Newton iterations
tol = 1e-8          # stopping tolerance on the gradient norm
step_size = 1       # Newton step length (gamma)

def Loss(x):
    return (x[0]**2)*(x[1]**2) + x[0]*x[1]

# Build the Hessian object for the loss above at the starting point x0 = (0.1, 0.1)
x = tf.Variable(0.1*np.ones((2,1), dtype=np.float32))
H = Hessian(Loss, x)
grad = H.grad().numpy()
print("Loss function at this iteration {}, gradient norm {} and is achieved at point {}"
      .format(Loss(x), np.linalg.norm(grad), x))
print("Computed the first gradient ...")
# Newton direction: solve H q = grad with preconditioned CG
q = H.pCG(grad, 1, 1, tol=tol, itmax=100)
print("Computed search direction ...")
print("Entering the Newton optimization loop")
for it in tqdm(range(itmax)):
    # Newton update x <- x - gamma * q
    x = x - tf.constant(step_size, dtype=np.float32)*tf.Variable(q, dtype=np.float32)
    x = tf.Variable(x)
    # Note: `grad` still holds the gradient at the previous iterate at this point
    if it%50 == 0:
        print("Loss function at this iteration {} and gradient norm {}".format(Loss(x), np.linalg.norm(grad)))
    if np.linalg.norm(grad) < tol:
        print("Loss function at this iteration {}, gradient norm {} and is achieved at point {}"
              .format(Loss(x), np.linalg.norm(grad), x))
        break
    # Recompute the gradient and the Newton direction at the new iterate
    H = Hessian(Loss, x)
    grad = H.grad().numpy()
    q = H.pCG(grad, 1, 1, tol=tol, itmax=100)

Loss function at this iteration [0.0101], gradient norm 0.1442497819662094 and is achieved at point <tf.Variable 'Variable:0' shape=(2, 1) dtype=float32, numpy=
array([[0.1],
       [0.1]], dtype=float32)>
Computed the first gradient ...


 10%|█         | 1/10 [00:00<00:01,  8.99it/s]

Computed search direction ...
Entering the Newton optimization loop
Loss function at this iteration [1.4239697e-05] and gradient norm 0.1442497819662094


100%|██████████| 10/10 [00:01<00:00,  9.40it/s]
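
As an independent cross-check that does not rely on `numsa`, the same pure Newton iteration can be sketched in plain NumPy using the closed-form gradient $(2xy+1)(y, x)$ and Hessian of $J$. This is a minimal sketch: the helper names `grad_J`, `hess_J` and `newton` are hypothetical, and the $2\times 2$ Newton system is solved directly with `np.linalg.solve` instead of preconditioned CG.

```python
import numpy as np

def grad_J(p):
    """Gradient of J(x, y) = x^2 y^2 + x y, i.e. (2xy + 1) * (y, x)."""
    x, y = p
    return np.array([2 * x * y**2 + y, 2 * x**2 * y + x])

def hess_J(p):
    """Hessian of J(x, y)."""
    x, y = p
    return np.array([[2 * y**2, 4 * x * y + 1],
                     [4 * x * y + 1, 2 * x**2]])

def newton(p0, step_size=1.0, tol=1e-8, itmax=10):
    """Newton iteration on J (pure for step_size = 1), stopping on the gradient norm."""
    p = np.asarray(p0, dtype=float)
    for _ in range(itmax):
        g = grad_J(p)                        # gradient at the current iterate
        if np.linalg.norm(g) < tol:
            break
        q = np.linalg.solve(hess_J(p), g)    # Newton direction: solve H q = grad J
        p = p - step_size * q
    return p

p_star = newton([0.1, 0.1])
print(p_star, np.linalg.eigvalsh(hess_J(p_star)))
```

From $(0.1, 0.1)$ the first step lands at about $(3.77\times 10^{-3}, 3.77\times 10^{-3})$ and the iteration stops numerically at the origin, where the Hessian has one negative and one positive eigenvalue, i.e. a saddle point rather than a minimizer. Unlike the notebook loop, the stopping test here uses the gradient at the current iterate.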


### Logistic Regression
Our objective is to minimize the function
\begin{equation}
    f(\vec{x}) = \frac{1}{m} \sum_{j=1}^m \log\Bigg(1+\exp\Big(-b_j \vec{a}_j^T\vec{x}\Big)\Bigg)\qquad \text{for } \vec{x} \in \mathbb{R}^d,
\end{equation}

where $m$ is the number of samples, $d$ the number of features, $\vec{a}_j \in \mathbb{R}^d$ the feature vectors and $b_j \in \{-1, +1\}$ the corresponding labels.
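We want to find a minimizer of $f$ with Newton's method. Since $f$ is convex (its Hessian $Hf(\vec{x}) = \frac{1}{m}\sum_{j=1}^m \sigma(z_j)\big(1-\sigma(z_j)\big)\vec{a}_j\vec{a}_j^T$, with $z_j = b_j\vec{a}_j^T\vec{x}$ and $\sigma(t) = 1/(1+e^{-t})$, is positive semidefinite), every stationary point is a minimizer, so it suffices to search for a root of $\nabla f = 0$.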
The Newton iteration we implement is
\begin{equation}
    \vec{x}_{n+1} = \vec{x}_n -\gamma Hf(\vec{x}_n)^{-1}\nabla f(\vec{x}_n)
\end{equation}

where $\gamma$ is the step size.
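
To make the iteration concrete, here is a small NumPy sketch of the damped Newton method for the logistic loss above, run on synthetic data rather than on a1a. It assumes labels $b_j \in \{-1,+1\}$ and uses the closed-form gradient $\nabla f = -\frac{1}{m}\sum_j \sigma(-z_j)\,b_j\vec{a}_j$ and Hessian $Hf = \frac{1}{m}\sum_j \sigma(z_j)\sigma(-z_j)\,\vec{a}_j\vec{a}_j^T$ with $z_j = b_j\vec{a}_j^T\vec{x}$. The function names are hypothetical, and the Newton system is solved directly with `np.linalg.solve`, which is only practical for small $d$.

```python
import numpy as np

def sigmoid(t):
    """Logistic sigmoid sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def logistic_loss(A, b, x):
    """f(x) = (1/m) sum_j log(1 + exp(-b_j a_j^T x)); logaddexp(0, t) = log(1 + e^t) without overflow."""
    return np.mean(np.logaddexp(0.0, -b * (A @ x)))

def logistic_grad_hess(A, b, x):
    """Closed-form gradient and Hessian of f, assuming labels b_j in {-1, +1}."""
    m = A.shape[0]
    z = b * (A @ x)                    # margins z_j = b_j a_j^T x
    s = sigmoid(-z)                    # sigma(-z_j)
    grad = -(A.T @ (b * s)) / m
    w = s * (1.0 - s)                  # sigma(z_j) * sigma(-z_j) >= 0
    hess = (A.T * w) @ A / m           # (1/m) A^T diag(w) A, positive semidefinite
    return grad, hess

def damped_newton(A, b, x0, gamma=1.0, tol=1e-8, itmax=50):
    """x_{n+1} = x_n - gamma * q, where Hf(x_n) q = grad f(x_n)."""
    x = np.array(x0, dtype=float)
    for _ in range(itmax):
        g, H = logistic_grad_hess(A, b, x)
        if np.linalg.norm(g) < tol:
            break
        q = np.linalg.solve(H, g)      # solve the Newton system instead of inverting H
        x = x - gamma * q
    return x

# Usage on synthetic data (a stand-in for the a1a features and labels)
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 5))
x_true = np.array([0.5, -0.5, 0.25, 0.0, 1.0])
b = np.where(rng.random(500) < sigmoid(A @ x_true), 1.0, -1.0)
x_hat = damped_newton(A, b, np.zeros(5))
```

The Hessian $\frac{1}{m}A^T\operatorname{diag}(w)A$ is positive semidefinite by construction, which is the convexity argument used above. With $\gamma = 1$ this is the classical iteratively reweighted least squares (IRLS) scheme, while $\gamma < 1$ gives a damped variant like the one used in the notebook.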
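Rather than forming $Hf(\vec{x}_n)^{-1}$ explicitly, we solve the linear system $Hf(\vec{x}_n)\vec{q} = \nabla f(\vec{x}_n)$ with the preconditioned conjugate gradient (CG) method and then set $\vec{x}_{n+1} = \vec{x}_n - \gamma\vec{q}$. As preconditioner we use an approximate inverse of $Hf(\vec{x}_n)$ computed with the randomized SVD presented in [1].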
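
A minimal sketch of how such a solve could look in plain NumPy: a textbook preconditioned CG loop, plus one possible way to build a symmetric positive definite preconditioner from a randomized low-rank decomposition (the range finder of Halko, Martinsson and Tropp). Because the Hessian is symmetric positive semidefinite, its SVD and eigendecomposition coincide, so `eigh` is used on the small projected matrix. This is not the `numsa` implementation: the parameters of `H.pCG` and the exact preconditioner of [1] are not reproduced, and the names `randomized_eigh`, `make_preconditioner` and `pcg`, as well as scaling the complement of the captured subspace by the smallest retained eigenvalue, are assumptions.

```python
import numpy as np

def randomized_eigh(H, k, oversample=10, seed=0):
    """Rank-k randomized eigendecomposition of a symmetric PSD matrix (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((H.shape[0], k + oversample))  # Gaussian test matrix
    Q, _ = np.linalg.qr(H @ omega)          # orthonormal basis of the sampled range
    T = Q.T @ H @ Q                         # small projected matrix
    evals, evecs = np.linalg.eigh(T)        # eigenvalues in ascending order
    evals, evecs = evals[::-1][:k], evecs[:, ::-1][:, :k]      # keep the k largest
    return evals, Q @ evecs

def make_preconditioner(H, k):
    """Return r -> M^{-1} r: exact inverse on the captured subspace, 1/lam_k on its complement."""
    lam, U = randomized_eigh(H, k)
    lam_min = lam[-1]
    def apply(r):
        Ur = U.T @ r
        return U @ (Ur / lam) + (r - U @ Ur) / lam_min
    return apply

def pcg(H, g, apply_Minv, tol=1e-8, itmax=100):
    """Preconditioned conjugate gradient for H q = g, with H symmetric positive definite."""
    q = np.zeros_like(g)
    r = g.copy()                            # residual g - H q for q = 0
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(itmax):
        if np.linalg.norm(r) <= tol * np.linalg.norm(g):
            break
        Hp = H @ p
        alpha = rz / (p @ Hp)
        q = q + alpha * p
        r = r - alpha * Hp
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return q

# Usage on a synthetic SPD matrix standing in for Hf(x_n)
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
H = A.T @ A / 200 + 1e-3 * np.eye(50)
g = rng.standard_normal(50)
q = pcg(H, g, make_preconditioner(H, k=10), tol=1e-12, itmax=200)
```

Inverting exactly on the captured subspace and using a scaled identity on its complement keeps the preconditioner symmetric positive definite, which PCG requires; applying only the truncated pseudo-inverse $V\Sigma^{-1}U^T$ would be singular on the complement.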

In [4]:
# Import all the libraries we are going to need
import tensorflow as tf
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from tqdm import tqdm  # progress bar used in the optimization loops
from numsa.TFHessian import *
import dsdl

In [5]:
ds = dsdl.load("a1a")

X, Y = ds.get_train()
print(X.shape, Y.shape)

(1605, 119) (1605,)


In [8]:
# Parameters of this run (we use optimization nomenclature, not ML nomenclature)
itmax = 100         # maximum number of iterations per phase
tol = 1e-4          # stopping tolerance on the gradient norm
step_size = 0.2     # step length gamma (the "learning rate" in ML terms)
Err = []            # gradient norms recorded during the Newton phase
"""
# Old loss function (per-sample Python loop, slow); replaced by Stefano's vectorized version below.
def Loss(x):
    S = tf.Variable(0.0)
    for j in range(X.shape[0]):
        a = tf.constant((X[j,:].todense().reshape(119,1)),dtype=np.float32)
        b = tf.constant(Y[j],dtype=np.float32)
        a = tf.reshape(a,(119,1))
        x = tf.reshape(x,(119,1))
        dot = tf.matmul(tf.transpose(a),x)
        S = S+tf.math.log(1+tf.math.exp(-b*dot))
    S = (1/X.shape[0])*S
    return S
"""
# Data as TensorFlow tensors: sparse feature matrix (m x d) and labels (m x 1)
tfX = tf.sparse.from_dense(np.array(X.todense(), dtype=np.float32))
tfY = tf.convert_to_tensor(np.array(Y, dtype=np.float32).reshape(X.shape[0], 1))


# Vectorized logistic loss f(x) = (1/m) sum_j log(1 + exp(-b_j a_j^T x))
def Loss(x):
    x = tf.reshape(x, (tfX.shape[1], 1))
    Z = tf.sparse.sparse_dense_matmul(tfX, x, adjoint_a=False)   # a_j^T x for all j
    Z = tf.math.multiply(tfY, Z)                                  # b_j a_j^T x
    # tf.math.softplus(-Z) is a numerically safer way to write log(1 + exp(-Z))
    S = tf.reduce_sum(tf.math.log(1 + tf.math.exp(-Z)) / tfX.shape[0])
    return S

# Build the Hessian object for the loss above at the starting point x0
x = tf.Variable(0.1*np.ones((119,1), dtype=np.float32))
H = Hessian(Loss, x)
grad = H.grad().numpy()
print("Computed the first gradient ...")
q = grad  # phase 1 uses the gradient direction; alternative tried: H.pCG(grad,10,2,tol=1e-3,itmax=10)
print("Computed search direction ...")
# Phase 1: gradient-descent warm-up
for it in tqdm(range(itmax)):
    x = x - tf.constant(step_size, dtype=np.float32)*tf.Variable(q, dtype=np.float32)
    x = tf.Variable(x)
    # Note: `grad` still holds the gradient at the previous iterate at this point
    if it%50 == 0:
        print("Loss function at this iteration {} and gradient norm {}".format(Loss(x), np.linalg.norm(grad)))
    if np.linalg.norm(grad) < tol:
        break
    H = Hessian(Loss, x)
    grad = H.grad().numpy()
    q = grad  # alternative tried: H.pCG(grad,10,2,tol=1e-3,itmax=10)
# Phase 2: damped Newton, direction from preconditioned CG
# (the first step of this phase still uses the last gradient direction)
itmax = 100
for it in tqdm(range(itmax)):
    x = x - tf.constant(step_size, dtype=np.float32)*tf.Variable(q, dtype=np.float32)
    x = tf.Variable(x)
    Err.append(np.linalg.norm(grad))
    if it%10 == 0:
        print("Loss function at this iteration {} and gradient norm {}".format(Loss(x), np.linalg.norm(grad)))
    if np.linalg.norm(grad) < tol:
        break
    H = Hessian(Loss, x)
    grad = H.grad().numpy()
    q = H.pCG(grad, 65, 10, tol=1e-4, itmax=100)

 20%|██        | 20/100 [00:00<00:00, 194.09it/s]

Computed the first gradient ...
Computed search direction ...
Loss function at this iteration 0.9243690371513367 and gradient norm 1.3872039318084717


 88%|████████▊ | 88/100 [00:00<00:00, 214.75it/s]

Loss function at this iteration 0.3988267183303833 and gradient norm 0.07041343301534653


100%|██████████| 100/100 [00:00<00:00, 211.34it/s]
  0%|          | 0/100 [00:00<?, ?it/s]

Loss function at this iteration 0.37019962072372437 and gradient norm 0.041932351887226105


 10%|█         | 10/100 [00:41<06:14,  4.16s/it]

Loss function at this iteration 0.3065865635871887 and gradient norm 0.0067643579095602036


 20%|██        | 20/100 [01:23<05:33,  4.17s/it]

Loss function at this iteration 0.30081042647361755 and gradient norm 0.001130763441324234


 30%|███       | 30/100 [02:05<04:58,  4.27s/it]

Loss function at this iteration 0.2990850806236267 and gradient norm 0.0003155212034471333


 40%|████      | 40/100 [02:49<04:27,  4.47s/it]

Loss function at this iteration 0.2986849844455719 and gradient norm 0.00022607189021073282


 50%|█████     | 50/100 [03:31<03:23,  4.08s/it]

Loss function at this iteration 0.29835769534111023 and gradient norm 0.00010432228009449318


 51%|█████     | 51/100 [03:35<03:26,  4.22s/it]


In [11]:
print("Loss function at this iteration {} and gradient norm {}".format(Loss(x), np.linalg.norm(grad)))

Loss function at this iteration 0.30211928486824036 and gradient norm 0.000638214813079685


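### Quasi-Newton Method

Here the CG solve is replaced by a direct low-rank approximation of the Newton step. After the same gradient-descent warm-up as in the previous section, each iteration computes a randomized SVD $Hf(\vec{x}_n) \approx U\Sigma V^T$ with `H.RandMatSVD` and takes the search direction $\vec{q} = V\Sigma^{-1}U^T\nabla f(\vec{x}_n)$, i.e. the pseudo-inverse of the low-rank Hessian approximation applied to the gradient. Every fifth iteration the Hessian matrix returned by `H.mat()` is also stored in `Hs`.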
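
The direction $\vec{q} = V\Sigma^{-1}U^T\nabla f$ used in the cell below can be sketched in NumPy with a basic randomized SVD: a Gaussian range finder followed by an SVD of the small projected matrix, as in Halko, Martinsson and Tropp. The names `randomized_svd` and `low_rank_newton_direction`, the oversampling parameter and the random seed are assumptions; they do not reproduce the signature of `H.RandMatSVD`, whose arguments `(65, 10)` are used as-is in the notebook.

```python
import numpy as np

def randomized_svd(H, k, oversample=10, seed=0):
    """Rank-k randomized SVD H ~ U diag(s) Vt (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((H.shape[1], k + oversample))  # Gaussian test matrix
    Q, _ = np.linalg.qr(H @ omega)                              # orthonormal basis of the sampled range
    Ub, s, Vt = np.linalg.svd(Q.T @ H, full_matrices=False)     # SVD of the small projected matrix
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

def low_rank_newton_direction(H, g, k, **kwargs):
    """q = V diag(1/s) U^T g: pseudo-inverse of the rank-k approximation applied to g."""
    U, s, Vt = randomized_svd(H, k, **kwargs)
    return Vt.T @ ((U.T @ g) / s)

# Usage on a synthetic Hessian of exact rank 5 (a stand-in for Hf(x_n))
rng = np.random.default_rng(2)
B = rng.standard_normal((30, 5))
H = B @ B.T
g = H @ rng.standard_normal(30)          # a gradient lying in the range of H
q = low_rank_newton_direction(H, g, k=5)
```

If the Hessian has rank at most $k$ (or $k = d$), the range finder captures it exactly and $\vec{q}$ coincides with the pseudo-inverse (respectively exact) Newton direction; for smaller $k$ the component of the gradient outside the captured subspace is simply dropped.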

In [None]:
# Import all the libraries we are going to need
import tensorflow as tf
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from tqdm import tqdm  # progress bar used in the optimization loops
from numsa.TFHessian import *
import dsdl

In [2]:
ds = dsdl.load("a1a")

X, Y = ds.get_train()
print(X.shape, Y.shape)

(1605, 119) (1605,)


In [5]:
# Parameters of this run (we use optimization nomenclature, not ML nomenclature)
itmax = 100         # maximum number of iterations per phase
tol = 1e-4          # stopping tolerance on the gradient norm
step_size = 0.2     # step length gamma (the "learning rate" in ML terms)
Err = []            # gradient norms recorded during the quasi-Newton phase
Hs = []             # Hessian matrices stored every 5 iterations
"""
# Old loss function (per-sample Python loop, slow); replaced by Stefano's vectorized version below.
def Loss(x):
    S = tf.Variable(0.0)
    for j in range(X.shape[0]):
        a = tf.constant((X[j,:].todense().reshape(119,1)),dtype=np.float32)
        b = tf.constant(Y[j],dtype=np.float32)
        a = tf.reshape(a,(119,1))
        x = tf.reshape(x,(119,1))
        dot = tf.matmul(tf.transpose(a),x)
        S = S+tf.math.log(1+tf.math.exp(-b*dot))
    S = (1/X.shape[0])*S
    return S
"""
# Data as TensorFlow tensors: sparse feature matrix (m x d) and labels (m x 1)
tfX = tf.sparse.from_dense(np.array(X.todense(), dtype=np.float32))
tfY = tf.convert_to_tensor(np.array(Y, dtype=np.float32).reshape(X.shape[0], 1))


# Vectorized logistic loss f(x) = (1/m) sum_j log(1 + exp(-b_j a_j^T x))
def Loss(x):
    x = tf.reshape(x, (tfX.shape[1], 1))
    Z = tf.sparse.sparse_dense_matmul(tfX, x, adjoint_a=False)   # a_j^T x for all j
    Z = tf.math.multiply(tfY, Z)                                  # b_j a_j^T x
    # tf.math.softplus(-Z) is a numerically safer way to write log(1 + exp(-Z))
    S = tf.reduce_sum(tf.math.log(1 + tf.math.exp(-Z)) / tfX.shape[0])
    return S
# Build the Hessian object for the loss above at the starting point x0
x = tf.Variable(0.1*np.ones((119,1), dtype=np.float32))
H = Hessian(Loss, x)
grad = H.grad().numpy()
print("Computed the first gradient ...")
q = grad  # phase 1 uses the gradient direction; alternative tried: H.pCG(grad,10,2,tol=1e-3,itmax=10)
print("Computed search direction ...")
# Phase 1: gradient-descent warm-up
for it in tqdm(range(itmax)):
    x = x - tf.constant(step_size, dtype=np.float32)*tf.Variable(q, dtype=np.float32)
    x = tf.Variable(x)
    # Note: `grad` still holds the gradient at the previous iterate at this point
    if it%50 == 0:
        print("Loss function at this iteration {} and gradient norm {}".format(Loss(x), np.linalg.norm(grad)))
    if np.linalg.norm(grad) < tol:
        break
    H = Hessian(Loss, x)
    grad = H.grad().numpy()
    q = grad  # alternative tried: H.pCG(grad,10,2,tol=1e-3,itmax=10)
# Phase 2: quasi-Newton, direction from a randomized SVD of the Hessian
# (the first step of this phase still uses the last gradient direction)
itmax = 100
for it in tqdm(range(itmax)):
    x = x - tf.constant(step_size, dtype=np.float32)*tf.Variable(q, dtype=np.float32)
    x = tf.Variable(x)
    Err.append(np.linalg.norm(grad))
    if it%5 == 0:
        print("Loss function at this iteration {} and gradient norm {}".format(Loss(x), np.linalg.norm(grad)))
        Hs.append(H.mat())
    if np.linalg.norm(grad) < tol:
        break
    H = Hessian(Loss, x)
    grad = H.grad().numpy()
    U, s, Vt = H.RandMatSVD(65, 10)
    # q = V diag(1/s) U^T grad: pseudo-inverse of the low-rank Hessian approximation
    q = (Vt.transpose() @ np.diag(1.0/s) @ U.transpose()) @ grad

 15%|█▌        | 15/100 [00:00<00:00, 144.26it/s]

Computed the first gradient ...
Computed search direction ...
Loss function at this iteration 0.9243690371513367 and gradient norm 1.3872039318084717


 77%|███████▋  | 77/100 [00:00<00:00, 195.12it/s]

Loss function at this iteration 0.3988267183303833 and gradient norm 0.07041343301534653


100%|██████████| 100/100 [00:00<00:00, 187.89it/s]
  0%|          | 0/100 [00:00<?, ?it/s]

Loss function at this iteration 0.37019962072372437 and gradient norm 0.041932351887226105


  5%|▌         | 5/100 [00:23<06:58,  4.40s/it]

Loss function at this iteration 0.3212338089942932 and gradient norm 0.018444659188389778


 10%|█         | 10/100 [00:47<06:36,  4.40s/it]

Loss function at this iteration 0.3076651096343994 and gradient norm 0.0066759707406163216


 15%|█▌        | 15/100 [01:11<06:19,  4.46s/it]

Loss function at this iteration 0.30385345220565796 and gradient norm 0.0024821243714541197


 20%|██        | 20/100 [01:35<05:56,  4.45s/it]

Loss function at this iteration 0.30251801013946533 and gradient norm 0.0011156557593494654


 25%|██▌       | 25/100 [02:00<05:44,  4.60s/it]

Loss function at this iteration 0.3015819787979126 and gradient norm 0.0007295323302969337


 30%|███       | 30/100 [02:25<05:14,  4.50s/it]

Loss function at this iteration 0.2998315989971161 and gradient norm 0.0005139752174727619


 35%|███▌      | 35/100 [02:49<04:51,  4.49s/it]

Loss function at this iteration 0.29897013306617737 and gradient norm 0.0002672745322342962


 40%|████      | 40/100 [03:14<04:34,  4.57s/it]

Loss function at this iteration 0.29869553446769714 and gradient norm 0.00016679601685609668


 45%|████▌     | 45/100 [03:39<04:06,  4.48s/it]

Loss function at this iteration 0.2984585762023926 and gradient norm 0.00012047346535837278


 49%|████▉     | 49/100 [03:59<04:09,  4.89s/it]


In [7]:
np.save("Hs", Hs)  # writes the stored Hessians to Hs.npy
print("Loss function at this iteration {} and gradient norm {}".format(Loss(x), np.linalg.norm(grad)))

Loss function at this iteration 0.29835739731788635 and gradient norm 9.214008605340496e-05
