## Federated Newton Learn
In this section we present the Federated Newton Learn (FEDNL) algorithm introduced in [2].
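
To fix notation for the cells that follow, the iteration they implement can be summarised as below. This is our reading of the code rather than a quotation from [2], and the symbols are our own: $n$ is the number of workers, $f_i$ the loss on worker $i$ (`Loss`), $H_i^k$ its Hessian estimate (`H.memH`), $H^k$ the master estimate (`Hm`), $\mathcal{C}$ the compressor and $\alpha$ the `step_size`.

$$
\begin{aligned}
S_i^k &= \mathcal{C}\left(\nabla^2 f_i(x^k) - H_i^k\right), &
H_i^{k+1} &= H_i^k + \alpha\, S_i^k, \\
H^{k+1} &= H^k + \frac{\alpha}{n}\sum_{i=1}^{n} S_i^k, &
x^{k+1} &= x^k - \left(H^{k+1}\right)^{-1}\,\frac{1}{n}\sum_{i=1}^{n}\nabla f_i(x^k).
\end{aligned}
$$

In the regularised variant, $H^{k+1}$ in the last equation is replaced by $H^{k+1} + l^k I$, where $l^k$ is the average of the values `ell` returned by the workers.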

| Flavour | Gradient-norm tolerance | Iterations |
| :--- | --- | --- |
| Vanilla Newton | 1e-4 | 4 |
| Rank 1 Compression | 1e-4 | 10 |
| Rank 1 Compression, Diagonal Regularisation, Identity Initial Hessian | 1e-4 | >50 |
| Rank 1 Compression, Diagonal Regularisation, Null Initial Hessian | 1e-4 | >50 |

In [3]:
from ipyparallel import Client
c = Client()
c.ids

[0, 1]

### Vanilla Newton
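First we reimplement the vanilla Newton method in the FEDNL framework. The compression rank is set to 119, the dimension of the parameter vector, so the SVD-based compressor should leave the Hessian difference unchanged and each worker effectively transmits its full Hessian update.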
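
For comparison, the serial method can be sketched in plain NumPy. The sketch uses the same loss as `LossSerial` (mean logistic loss plus `lam * ||x||^2`) with the gradient and Hessian written in closed form. The synthetic data, the five-dimensional problem and the helper names `loss`, `grad_and_hessian` and `newton` are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

def loss(w, X, y, lam=1e-3):
    """Mean logistic loss plus lam * ||w||^2, with labels y in {-1, +1}."""
    z = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -z)) + lam * (w @ w)

def grad_and_hessian(w, X, y, lam=1e-3):
    """Closed-form gradient and Hessian of `loss`."""
    n, d = X.shape
    z = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(z))              # sigmoid(-z)
    grad = -(X.T @ (y * s)) / n + 2.0 * lam * w
    weights = s * (1.0 - s)                  # per-sample curvature
    hess = (X.T * weights) @ X / n + 2.0 * lam * np.eye(d)
    return grad, hess

def newton(X, y, w0, tol=1e-4, itmax=50, lam=1e-3):
    """Full-step Newton iteration, stopped on the gradient norm."""
    w = w0.copy()
    for it in range(itmax):
        g, H = grad_and_hessian(w, X, y, lam)
        if np.linalg.norm(g) < tol:
            return w, it
        w = w - np.linalg.solve(H, g)
    return w, itmax

# Synthetic, noisy (hence non-separable) data: an assumption for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y = np.sign(X @ w_true + rng.normal(size=200))

w_star, n_iter = newton(X, y, 0.1 * np.ones(5))
print(n_iter, loss(w_star, X, y))
```

The stopping rule mirrors the notebook's `res < tol` test on the averaged gradient. With a full step and no line search, fast convergence is only expected when the starting point is not too far from the optimum.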

In [2]:
%%px
import tensorflow as tf
import numpy as np
import scipy.linalg as la
import pandas as pd
import matplotlib.pyplot as plt
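from mpi4py import MPI  # explicit, in case the star import below does not re-export it
from tqdm import tqdm   # likewise for the progress bar used in the loop
from numsa.TFHessian import *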
import dsdl

comm = MPI.COMM_WORLD

ds = dsdl.load("a1a")

X, Y = ds.get_train()
# Split the row indices into one contiguous shard per MPI rank
indx = np.array_split(np.arange(X.shape[0]), comm.Get_size())
tfX = []
tfY = []
for k in range(len(indx)):
    # Entry k holds shard k, so tfX[comm.Get_rank()] is this worker's data
    Xk = np.array(X[indx[k]].todense(), dtype=np.float32)
    Yk = np.array(Y[indx[k]], dtype=np.float32).reshape(-1, 1)
    tfX.append(tf.sparse.from_dense(Xk))
    tfY.append(tf.convert_to_tensor(Yk))

tfXs = tf.sparse.from_dense(np.array(X.todense(), dtype=np.float32))
tfYs = tf.convert_to_tensor(np.array(Y, dtype=np.float32).reshape(X.shape[0], 1))
# Loss on the full data set (used only for logging on the master)
def LossSerial(x):
    lam = 1e-3; #Regularisation
    x = tf.reshape(x, (119, 1))
    Z = tf.sparse.sparse_dense_matmul(tfXs, x, adjoint_a=False)
    Z = tf.math.multiply(tfYs, Z)
    S = tf.reduce_sum(tf.math.log(1 + tf.math.exp(-Z)) / tfXs.shape[0]) + lam*tf.norm(x)**2

    return S
# Loss on the shard owned by the calling rank
def Loss(x,comm):
    lam = 1e-3; #Regularisation
    x = tf.reshape(x, (119, 1))
    Z = tf.sparse.sparse_dense_matmul(tfX[comm.Get_rank()], x, adjoint_a=False)
    Z = tf.math.multiply(tfY[comm.Get_rank()], Z)
    S = tf.reduce_sum(tf.math.log(1 + tf.math.exp(-Z)) / tfX[comm.Get_rank()].shape[0]) + lam*tf.norm(x)**2
    return S
################! Setting Of The Solver!##################
itmax = 50
tol = 1e-4;
step_size=1;
###########################################################
x = tf.Variable(0.1*np.ones((119,1),dtype=np.float32))

H = Hessian(Loss,x);
H.shift(x)#,start=0*np.identity(x.numpy().shape[0])) #We initialize the shifter
#We now collect and average the loc Hessians in the master node (rk 0)
Hs = H.comm.gather(H.memH, root=0);
if H.comm.Get_rank()==0:
    Hm = (1/len(Hs))*np.sum(Hs,0);
else:
    Hm = None
print("The master Hessian has been initialised")
for it in tqdm(range(itmax)):
    # Obtaining the compression of the difference between local mat
    # and next local mat.
    U,sigma,Vt,ell = H.shift(x,{"comp":MatSVDCompDiag,"rk":119,"type":"mat"});
    shift = Vt.transpose()@np.diag(sigma)@U.transpose();
    #print("Updating local Hessian")
    H.memH = H.memH+step_size*shift;
    grad = H.grad().numpy();
    #Now we update the master Hessian and perform the Newton method step
    Shifts = H.comm.gather(shift, root=0);
    Grads = H.comm.gather(grad, root=0);
    Ells = H.comm.gather(ell, root=0);
    if H.comm.Get_rank() == 0:
        #print("Computing the average of the local shifts and grad ...")
        Shift = (1/len(Shifts))*np.sum(Shifts,0);
        Grad = (1/len(Grads))*np.sum(Grads,0);
        Ell = (1/len(Ells))*np.sum(Ells,0);
        res = np.linalg.norm(Grad);
        #print("Computing the master Hessian ...")
        Hm = Hm + step_size*Shift;
        #print("Searching new search direction ...")
        A = Hm; #A = Hm + Ell*np.identity(Hm.shape[0]);
        q = np.linalg.solve(A,Grad);
        #print("Found search dir, ",q);
        if it%1 == 0:
            print("(FedNL) [Iteration. {}] Lost funciton at this iteration {}  and gradient norm {}".format(it,LossSerial(x),np.linalg.norm(Grad)));
        x = x - tf.Variable(q,dtype=np.float32);
        x =  tf.Variable(x)
    else:
        res = None
    #Distributing the search direction
    x = H.comm.bcast(x,root=0)
    res = H.comm.bcast(res,root=0)
    if res < tol:
        break
print("Lost funciton at this iteration {}  and gradient norm {}".format(Loss(x,H.comm),np.linalg.norm(grad)))


[stderr:1]   8%|▊         | 4/50 [00:15<03:03,  3.98s/it]


[stderr:0]   8%|▊         | 4/50 [00:15<03:03,  3.98s/it]


[stdout:0] The master Hessian has been initialised
(FedNL) [Iteration. 0] Lost funciton at this iteration 1.2675940990447998  and gradient norm 1.388305902481079
(FedNL) [Iteration. 1] Lost funciton at this iteration 0.3603573739528656  and gradient norm 0.15029960870742798
(FedNL) [Iteration. 2] Lost funciton at this iteration 0.3375653922557831  and gradient norm 0.019174691289663315
(FedNL) [Iteration. 3] Lost funciton at this iteration 0.3369176983833313  and gradient norm 0.0008387075504288077
(FedNL) [Iteration. 4] Lost funciton at this iteration 0.33691510558128357  and gradient norm 4.160287971899379e-06
Lost funciton at this iteration 0.35959863662719727  and gradient norm 0.02014993503689766


[stdout:1] The master Hessian has been initialised
Lost funciton at this iteration 0.3142033517360687  and gradient norm 0.020152254030108452


### FEDNL Rank 1 Compression
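
The only change with respect to the previous cell is `"rk":1`: each worker now sends a rank-1 approximation of the gap between its current Hessian and its stored estimate. The sketch below illustrates the idea with a truncated SVD on a fixed target matrix. The function `top_k_svd` is a hypothetical stand-in written for this example; it is not the `MatSVDCompDiag` compressor from `numsa`, whose internals we do not reproduce here.

```python
import numpy as np

def top_k_svd(A, k):
    """Best rank-k approximation of A in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
target = B @ B.T + np.eye(6)     # stand-in for a fixed local Hessian
H = np.zeros((6, 6))             # learned estimate, started from zero

errors = []
for _ in range(6):
    # Hessian-learning step with step_size = 1 and a rank-1 compressor
    H = H + top_k_svd(target - H, 1)
    errors.append(np.linalg.norm(target - H))

print(errors)
```

Each step removes the largest remaining singular component of the gap, so for a fixed 6x6 target the estimate becomes exact after six rank-1 updates. In the notebook the target Hessian moves with `x`, so this only suggests why the rank-1 run needs more iterations than the uncompressed one.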

In [3]:
%%px
import tensorflow as tf
import numpy as np
import scipy.linalg as la
import pandas as pd
import matplotlib.pyplot as plt
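from mpi4py import MPI  # explicit, in case the star import below does not re-export it
from tqdm import tqdm   # likewise for the progress bar used in the loop
from numsa.TFHessian import *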
import dsdl

comm = MPI.COMM_WORLD

ds = dsdl.load("a1a")

X, Y = ds.get_train()
# Split the row indices into one contiguous shard per MPI rank
indx = np.array_split(np.arange(X.shape[0]), comm.Get_size())
tfX = []
tfY = []
for k in range(len(indx)):
    # Entry k holds shard k, so tfX[comm.Get_rank()] is this worker's data
    Xk = np.array(X[indx[k]].todense(), dtype=np.float32)
    Yk = np.array(Y[indx[k]], dtype=np.float32).reshape(-1, 1)
    tfX.append(tf.sparse.from_dense(Xk))
    tfY.append(tf.convert_to_tensor(Yk))

tfXs = tf.sparse.from_dense(np.array(X.todense(), dtype=np.float32))
tfYs = tf.convert_to_tensor(np.array(Y, dtype=np.float32).reshape(X.shape[0], 1))
# Loss on the full data set (used only for logging on the master)
def LossSerial(x):
    lam = 1e-3; #Regularisation
    x = tf.reshape(x, (119, 1))
    Z = tf.sparse.sparse_dense_matmul(tfXs, x, adjoint_a=False)
    Z = tf.math.multiply(tfYs, Z)
    S = tf.reduce_sum(tf.math.log(1 + tf.math.exp(-Z)) / tfXs.shape[0]) + lam*tf.norm(x)**2

    return S
# Loss on the shard owned by the calling rank
def Loss(x,comm):
    lam = 1e-3; #Regularisation
    x = tf.reshape(x, (119, 1))
    Z = tf.sparse.sparse_dense_matmul(tfX[comm.Get_rank()], x, adjoint_a=False)
    Z = tf.math.multiply(tfY[comm.Get_rank()], Z)
    S = tf.reduce_sum(tf.math.log(1 + tf.math.exp(-Z)) / tfX[comm.Get_rank()].shape[0]) + lam*tf.norm(x)**2
    return S
################! Setting Of The Solver!##################
itmax = 50
tol = 1e-4;
step_size=1;
###########################################################
x = tf.Variable(0.1*np.ones((119,1),dtype=np.float32))

H = Hessian(Loss,x);
H.shift(x)#,start=0*np.identity(x.numpy().shape[0])) #We initialize the shifter
#We now collect and average the loc Hessians in the master node (rk 0)
Hs = H.comm.gather(H.memH, root=0);
if H.comm.Get_rank()==0:
    Hm = (1/len(Hs))*np.sum(Hs,0);
else:
    Hm = None
print("The master Hessian has been initialised")
for it in tqdm(range(itmax)):
    # Obtaining the compression of the difference between local mat
    # and next local mat.
    U,sigma,Vt,ell = H.shift(x,{"comp":MatSVDCompDiag,"rk":1,"type":"mat"});
    shift = Vt.transpose()@np.diag(sigma)@U.transpose();
    #print("Updating local Hessian")
    H.memH = H.memH+step_size*shift;
    grad = H.grad().numpy();
    #Now we update the master Hessian and perform the Newton method step
    Shifts = H.comm.gather(shift, root=0);
    Grads = H.comm.gather(grad, root=0);
    Ells = H.comm.gather(ell, root=0);
    if H.comm.Get_rank() == 0:
        #print("Computing the average of the local shifts and grad ...")
        Shift = (1/len(Shifts))*np.sum(Shifts,0);
        Grad = (1/len(Grads))*np.sum(Grads,0);
        Ell = (1/len(Ells))*np.sum(Ells,0);
        res = np.linalg.norm(Grad);
        #print("Computing the master Hessian ...")
        Hm = Hm + step_size*Shift;
        #print("Searching new search direction ...")
        A = Hm; #A = Hm + Ell*np.identity(Hm.shape[0]);
        q = np.linalg.solve(A,Grad);
        #print("Found search dir, ",q);
        if it%1 == 0:
            print("(FedNL) [Iteration. {}] Lost funciton at this iteration {}  and gradient norm {}".format(it,LossSerial(x),np.linalg.norm(Grad)));
        x = x - tf.Variable(q,dtype=np.float32);
        x =  tf.Variable(x)
    else:
        res = None
    #Distributing the search direction
    x = H.comm.bcast(x,root=0)
    res = H.comm.bcast(res,root=0)
    if res < tol:
        break
print("Lost funciton at this iteration {}  and gradient norm {}".format(Loss(x,H.comm),np.linalg.norm(grad)))


[stderr:0]  20%|██        | 10/50 [00:35<02:22,  3.56s/it]


[stderr:1]  20%|██        | 10/50 [00:35<02:22,  3.56s/it]


[stdout:0] The master Hessian has been initialised
(FedNL) [Iteration. 0] Lost funciton at this iteration 1.2675940990447998  and gradient norm 1.388305902481079
(FedNL) [Iteration. 1] Lost funciton at this iteration 0.3603573739528656  and gradient norm 0.15029960870742798
(FedNL) [Iteration. 2] Lost funciton at this iteration 0.3409968614578247  and gradient norm 0.01779988408088684
(FedNL) [Iteration. 3] Lost funciton at this iteration 0.3384902775287628  and gradient norm 0.007210684008896351
(FedNL) [Iteration. 4] Lost funciton at this iteration 0.33764347434043884  and gradient norm 0.004843716975301504
(FedNL) [Iteration. 5] Lost funciton at this iteration 0.33708077669143677  and gradient norm 0.0022912921849638224
(FedNL) [Iteration. 6] Lost funciton at this iteration 0.3369518518447876  and gradient norm 0.0010489925043657422
(FedNL) [Iteration. 7] Lost funciton at this iteration 0.3369239270687103  and gradient norm 0.0004203339631203562
(FedNL) [Iteration. 8] Lost funciton 

[stdout:1] The master Hessian has been initialised
Lost funciton at this iteration 0.3142067790031433  and gradient norm 0.02014790289103985


### FEDNL With Diagonal Regularisation And Different Initial Hessian
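
In this variant the master solves with `Hm + Ell*np.identity(...)` instead of `Hm`, and the shifter is started from a null matrix. The sketch below shows the effect of the diagonal shift on a single worker, assuming `ell` is the Frobenius norm of the gap between the true and the learned local Hessian (the choice we understand the FedNL paper to make; the value computed by `numsa` may differ). The helper `regularised_direction` and the random matrices are hypothetical and serve only as illustration.

```python
import numpy as np

def regularised_direction(H, local_hess, grad):
    """Solve (H + ell*I) q = grad with ell = ||local_hess - H||_F."""
    ell = np.linalg.norm(local_hess - H)   # Frobenius norm for 2-D input
    A = H + ell * np.eye(H.shape[0])
    return np.linalg.solve(A, grad), ell, A

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 5))
hess = B @ B.T + 0.1 * np.eye(5)           # positive definite stand-in
g = rng.normal(size=5)

# Null and identity initial estimates, as in the two table rows
q_null, ell_null, A_null = regularised_direction(np.zeros((5, 5)), hess, g)
q_id, ell_id, A_id = regularised_direction(np.eye(5), hess, g)

print(ell_null, ell_id)
```

Because the spectral norm is bounded by the Frobenius norm, the shifted matrix dominates the true Hessian and is therefore positive definite. The price is a conservative step: with a null estimate the direction reduces to `g / ell`, a scaled gradient step, which is one plausible reason for the slow progress seen in the log below.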

In [4]:
%%px
import tensorflow as tf
import numpy as np
import scipy.linalg as la
import pandas as pd
import matplotlib.pyplot as plt
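from mpi4py import MPI  # explicit, in case the star import below does not re-export it
from tqdm import tqdm   # likewise for the progress bar used in the loop
from numsa.TFHessian import *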
import dsdl

comm = MPI.COMM_WORLD

ds = dsdl.load("a1a")

X, Y = ds.get_train()
# Split the row indices into one contiguous shard per MPI rank
indx = np.array_split(np.arange(X.shape[0]), comm.Get_size())
tfX = []
tfY = []
for k in range(len(indx)):
    # Entry k holds shard k, so tfX[comm.Get_rank()] is this worker's data
    Xk = np.array(X[indx[k]].todense(), dtype=np.float32)
    Yk = np.array(Y[indx[k]], dtype=np.float32).reshape(-1, 1)
    tfX.append(tf.sparse.from_dense(Xk))
    tfY.append(tf.convert_to_tensor(Yk))

tfXs = tf.sparse.from_dense(np.array(X.todense(), dtype=np.float32))
tfYs = tf.convert_to_tensor(np.array(Y, dtype=np.float32).reshape(X.shape[0], 1))
# Loss on the full data set (used only for logging on the master)
def LossSerial(x):
    lam = 1e-3; #Regularisation
    x = tf.reshape(x, (119, 1))
    Z = tf.sparse.sparse_dense_matmul(tfXs, x, adjoint_a=False)
    Z = tf.math.multiply(tfYs, Z)
    S = tf.reduce_sum(tf.math.log(1 + tf.math.exp(-Z)) / tfXs.shape[0]) + lam*tf.norm(x)**2

    return S
# Loss on the shard owned by the calling rank
def Loss(x,comm):
    lam = 1e-3; #Regularisation
    x = tf.reshape(x, (119, 1))
    Z = tf.sparse.sparse_dense_matmul(tfX[comm.Get_rank()], x, adjoint_a=False)
    Z = tf.math.multiply(tfY[comm.Get_rank()], Z)
    S = tf.reduce_sum(tf.math.log(1 + tf.math.exp(-Z)) / tfX[comm.Get_rank()].shape[0]) + lam*tf.norm(x)**2
    return S
################! Setting Of The Solver!##################
itmax = 50
tol = 1e-4;
step_size=1;
###########################################################
x = tf.Variable(0.1*np.ones((119,1),dtype=np.float32))

H = Hessian(Loss,x);
H.shift(x,start=0*np.identity(x.numpy().shape[0])) #We initialize the shifter
#We now collect and average the loc Hessians in the master node (rk 0)
Hs = H.comm.gather(H.memH, root=0);
if H.comm.Get_rank()==0:
    Hm = (1/len(Hs))*np.sum(Hs,0);
else:
    Hm = None
print("The master Hessian has been initialised")
for it in tqdm(range(itmax)):
    # Obtaining the compression of the difference between local mat
    # and next local mat.
    U,sigma,Vt,ell = H.shift(x,{"comp":MatSVDCompDiag,"rk":1,"type":"mat"});
    shift = Vt.transpose()@np.diag(sigma)@U.transpose();
    #print("Updating local Hessian")
    H.memH = H.memH+step_size*shift;
    grad = H.grad().numpy();
    #Now we update the master Hessian and perform the Newton method step
    Shifts = H.comm.gather(shift, root=0);
    Grads = H.comm.gather(grad, root=0);
    Ells = H.comm.gather(ell, root=0);
    if H.comm.Get_rank() == 0:
        #print("Computing the average of the local shifts and grad ...")
        Shift = (1/len(Shifts))*np.sum(Shifts,0);
        Grad = (1/len(Grads))*np.sum(Grads,0);
        Ell = (1/len(Ells))*np.sum(Ells,0);
        res = np.linalg.norm(Grad);
        #print("Computing the master Hessian ...")
        Hm = Hm + step_size*Shift;
        #print("Searching new search direction ...")
        A = Hm + Ell*np.identity(Hm.shape[0]);
        q = np.linalg.solve(A,Grad);
        #print("Found search dir, ",q);
        if it%1 == 0:
            print("(FedNL) [Iteration. {}] Lost funciton at this iteration {}  and gradient norm {}".format(it,LossSerial(x),np.linalg.norm(Grad)));
        x = x - tf.Variable(q,dtype=np.float32);
        x =  tf.Variable(x)
    else:
        res = None
    #Distributing the search direction
    x = H.comm.bcast(x,root=0)
    res = H.comm.bcast(res,root=0)
    if res < tol:
        break
print("Lost funciton at this iteration {}  and gradient norm {}".format(Loss(x,H.comm),np.linalg.norm(grad)))


[stderr:1] 100%|██████████| 50/50 [02:45<00:00,  3.31s/it]


[stderr:0] 100%|██████████| 50/50 [02:45<00:00,  3.31s/it]


[stdout:0] The master Hessian has been initialised
(FedNL) [Iteration. 0] Lost funciton at this iteration 1.2675940990447998  and gradient norm 1.388305902481079
(FedNL) [Iteration. 1] Lost funciton at this iteration 0.595215380191803  and gradient norm 0.47412794828414917
(FedNL) [Iteration. 2] Lost funciton at this iteration 0.48810920119285583  and gradient norm 0.20943771302700043
(FedNL) [Iteration. 3] Lost funciton at this iteration 0.4362362027168274  and gradient norm 0.11586670577526093
(FedNL) [Iteration. 4] Lost funciton at this iteration 0.41390082240104675  and gradient norm 0.08477818965911865
(FedNL) [Iteration. 5] Lost funciton at this iteration 0.3987269699573517  and gradient norm 0.0656685009598732
(FedNL) [Iteration. 6] Lost funciton at this iteration 0.3850094676017761  and gradient norm 0.051555052399635315
(FedNL) [Iteration. 7] Lost funciton at this iteration 0.3753131330013275  and gradient norm 0.041504524648189545
(FedNL) [Iteration. 8] Lost funciton at this 

[stdout:1] The master Hessian has been initialised
Lost funciton at this iteration 0.3139961063861847  and gradient norm 0.020151183009147644
