# Bayesian Neural Network (VI) for classification - Distributed Training

```
# Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
#   Licensed under the Apache License, Version 2.0 (the "License").
#   You may not use this file except in compliance with the License.
#   A copy of the License is located at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   or in the "license" file accompanying this file. This file is distributed
#   on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
#   express or implied. See the License for the specific language governing
#   permissions and limitations under the License.
# ==============================================================================
```

This example reproduces [Bayesian Neural Network (VI) for classification](bnn_classification.ipynb), but trains the model in a distributed fashion with Horovod.

In [None]:
import warnings
warnings.filterwarnings('ignore')
import mxfusion as mf
import mxnet as mx
import numpy as np
import mxnet.gluon.nn as nn
import mxfusion.components
import mxfusion.inference

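First, initialize Horovod with <tt>hvd.init()</tt>. We then set the global default context depending on where the code runs: if GPUs are available, each process uses the GPU whose index matches its local rank (<tt>hvd.local_rank()</tt>); otherwise it uses the CPU.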

In [None]:
import horovod.mxnet as hvd
import mxnet as mx
hvd.init()
mx.context.Context.default_ctx = mx.gpu(hvd.local_rank()) if mx.test_utils.list_gpus() else mx.cpu()
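The context line above pins each Horovod process to the GPU whose index equals its local rank and falls back to the CPU when no GPU is visible. The plain-Python sketch below mirrors that rule with a hypothetical <tt>select_device</tt> helper (not part of Horovod or MXNet), so the rank-to-device mapping can be checked without either library installed. The explicit range check is an addition of this sketch; the original one-liner does not perform it.

```python
def select_device(local_rank, num_gpus):
    """Return a (device_type, device_id) pair for one worker process.

    Hypothetical helper mirroring the notebook's one-liner:
        mx.gpu(hvd.local_rank()) if mx.test_utils.list_gpus() else mx.cpu()
    """
    if num_gpus > 0:
        # One GPU per local process: rank 0 -> gpu 0, rank 1 -> gpu 1, ...
        if local_rank >= num_gpus:
            raise ValueError(
                f"local rank {local_rank} has no matching GPU ({num_gpus} visible)")
        return ("gpu", local_rank)
    # No GPU visible: every process shares the CPU context.
    return ("cpu", 0)


# Four local processes on a machine with four GPUs, then on a CPU-only machine.
print([select_device(r, 4) for r in range(4)])
print([select_device(r, 0) for r in range(4)])
```

Under this pinning scheme, the number of processes launched per host should not exceed the number of GPUs on that host.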

### Generate Synthetic Data

In [None]:
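import GPy
import matplotlib.pyplot as plt

np.random.seed(4)
# Sample a latent function from a GP with an RBF kernel (lengthscale 0.1) at 200 random inputs,
# then threshold it at zero to obtain binary class labels.
k = GPy.kern.RBF(1, lengthscale=0.1)
x = np.random.rand(200,1)
y = np.random.multivariate_normal(mean=np.zeros((200,)), cov=k.K(x), size=(1,)).T>0.
plt.plot(x[:,0], y[:,0], '.')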
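The labels come from one draw of a zero-mean Gaussian process, thresholded at zero, so nearby inputs tend to share a class. The NumPy-only sketch below builds the same kind of data without GPy, assuming GPy's RBF parameterization $k(x, x') = \sigma^2 \exp(-(x - x')^2 / (2\ell^2))$ with the default variance $\sigma^2 = 1$. It uses NumPy's <tt>default_rng</tt> generator, so the sampled labels will not match the cell above exactly.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.1, variance=1.0):
    """RBF (squared-exponential) kernel matrix between column vectors a (n, 1) and b (m, 1)."""
    sq_dist = (a - b.T) ** 2          # (n, 1) - (1, m) broadcasts to (n, m)
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

rng = np.random.default_rng(4)
x = rng.random((200, 1))                       # 200 inputs in [0, 1)
K = rbf_kernel(x, x)                           # 200 x 200 GP covariance
f = rng.multivariate_normal(np.zeros(200), K)  # one GP sample, shape (200,)
y = (f > 0.0)[:, None]                         # boolean labels, shape (200, 1)
```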

In [None]:
D = 10  # number of hidden units in each tanh layer
net = nn.HybridSequential(prefix='nn_')
with net.name_scope():
    net.add(nn.Dense(D, activation="tanh", flatten=False, in_units=1))
    net.add(nn.Dense(D, activation="tanh", flatten=False, in_units=D))
    net.add(nn.Dense(2, flatten=False, in_units=D))
net.initialize(mx.init.Xavier(magnitude=1))

In [None]:
from mxfusion.components.variables.var_trans import PositiveTransformation
from mxfusion.inference import VariationalPosteriorForwardSampling
from mxfusion.components.functions.operators import broadcast_to
from mxfusion.components.distributions import Normal, Categorical
from mxfusion import Variable, Model
from mxfusion.components.functions import MXFusionGluonFunction

In [None]:
m = Model()
m.N = Variable()  # number of data points
m.f = MXFusionGluonFunction(net, num_outputs=1, broadcastable=False)
m.x = Variable(shape=(m.N,1))
m.r = m.f(m.x)
# Place an independent N(0, 1) prior on every weight and bias of the network.
for _,v in m.r.factor.parameters.items():
    v.set_prior(Normal(mean=broadcast_to(mx.nd.array([0]), v.shape),
                       variance=broadcast_to(mx.nd.array([1.]), v.shape)))
# Two-class Categorical likelihood, using the network output as log-probabilities.
m.y = Categorical.define_variable(log_prob=m.r, shape=(m.N,1), num_classes=2)
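The model above defines the joint density $\log p(y, w \mid x) = \sum_j \log \mathcal{N}(w_j; 0, 1) + \sum_n \log \mathrm{softmax}(r_n)_{y_n}$, where $w$ collects all network weights and $r_n$ is the network output for input $x_n$. The NumPy sketch below evaluates that quantity for a network of the same shape (1 → 10 tanh → 10 tanh → 2). It assumes the Categorical normalizes <tt>log_prob</tt> with a log-softmax; the parameter names <tt>w0</tt>…<tt>b2</tt> and the helpers are hypothetical, not MXFusion code.

```python
import numpy as np

def init_params(rng, D=10):
    """Weights and biases for a 1 -> D -> D -> 2 MLP, stored as Gluon Dense does: (out, in)."""
    shapes = {"w0": (D, 1), "b0": (D,), "w1": (D, D), "b1": (D,), "w2": (2, D), "b2": (2,)}
    return {name: rng.standard_normal(shape) for name, shape in shapes.items()}

def forward(params, x):
    """Network output r with shape (N, 2), read as unnormalized log-probabilities."""
    h = np.tanh(x @ params["w0"].T + params["b0"])
    h = np.tanh(h @ params["w1"].T + params["b1"])
    return h @ params["w2"].T + params["b2"]

def log_joint(params, x, y):
    """Sum of N(0, 1) log-priors over all parameters plus the Categorical log-likelihood."""
    log_prior = sum(np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * w ** 2) for w in params.values())
    r = forward(params, x)
    # Numerically stable log-softmax over the class axis.
    log_p = r - r.max(axis=1, keepdims=True)
    log_p = log_p - np.log(np.exp(log_p).sum(axis=1, keepdims=True))
    log_lik = np.sum(log_p[np.arange(len(y)), y.astype(int).ravel()])
    return log_prior + log_lik

rng = np.random.default_rng(0)
x = rng.random((200, 1))
y = rng.random((200, 1)) > 0.5   # placeholder labels for illustration only
params = init_params(rng)
print(log_joint(params, x, y))
```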

In [None]:
from mxfusion.inference import DistributedBatchInferenceLoop, create_Gaussian_meanfield, DistributedGradBasedInference, StochasticVariationalInference, MAP

To use distributed training instead of single-process training, use <tt>DistributedGradBasedInference</tt> as the inference class. Its default <tt>grad_loop</tt> is <tt>DistributedBatchInferenceLoop</tt>, whereas the default for <tt>GradBasedInference</tt> is <tt>BatchInferenceLoop</tt>.
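Conceptually, Horovod-style data parallelism gives each worker a shard of the data, computes a gradient on that shard, and averages the gradients across workers with an allreduce before each update. The pure-Python sketch below simulates this with a hypothetical <tt>allreduce_mean</tt> stand-in and a one-parameter least-squares loss. It is not the code of MXFusion's <tt>DistributedBatchInferenceLoop</tt>; it only illustrates why averaging gradients from equal-sized shards reproduces the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error (1/n) * sum((w*x - y)^2)."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def shard(data, rank, size):
    """Strided split: worker `rank` out of `size` takes every size-th item."""
    return data[rank::size]

def allreduce_mean(values):
    """Stand-in for an averaging allreduce: every worker receives the same mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

xs = [0.1 * i for i in range(8)]
ys = [3.0 * x + 1.0 for x in xs]
w, size = 0.5, 4   # current parameter value and number of simulated workers

local_grads = [grad_mse(w, shard(xs, r, size), shard(ys, r, size)) for r in range(size)]
synced = allreduce_mean(local_grads)
full = grad_mse(w, xs, ys)
print(synced[0], full)
```

With unequal shard sizes, the plain average of per-shard mean gradients no longer equals the full-batch mean gradient, which is one reason the data is usually split evenly across workers.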

Note that running this notebook directly does not yet perform distributed training: Horovod only runs multiple processes when the script is launched with <tt>horovodrun</tt> or <tt>mpirun</tt> from the command line, as described in the Running Horovod section.

In [None]:
observed = [m.y, m.x]
q = create_Gaussian_meanfield(model=m, observed=observed)
alg = StochasticVariationalInference(num_samples=5, model=m, posterior=q, observed=observed)
infr = DistributedGradBasedInference(inference_algorithm=alg, grad_loop=DistributedBatchInferenceLoop())
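<tt>create_Gaussian_meanfield</tt> gives every latent weight an independent Gaussian $q(w) = \mathcal{N}(\mu, \sigma^2)$, and <tt>StochasticVariationalInference(num_samples=5, ...)</tt> estimates the evidence lower bound by Monte Carlo. As a hedged illustration on a one-weight toy model ($w \sim \mathcal{N}(0, 1)$, $y_n \mid w \sim \mathcal{N}(w, 1)$), not MXFusion's internal estimator, the ELBO $\mathbb{E}_q[\log p(y \mid w) + \log p(w) - \log q(w)]$ can be estimated from $S$ reparameterized samples $w = \mu + \sigma\epsilon$, $\epsilon \sim \mathcal{N}(0, 1)$, and compared against its closed form:

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)

def log_normal(v, mean, var):
    """Elementwise log density of N(mean, var) evaluated at v."""
    return -0.5 * (LOG_2PI + np.log(var) + (v - mean) ** 2 / var)

def mc_elbo(y, mu, sigma, num_samples, rng):
    """Monte Carlo ELBO for the toy model with q(w) = N(mu, sigma^2)."""
    eps = rng.standard_normal(num_samples)
    w = mu + sigma * eps                                            # reparameterized samples
    log_lik = log_normal(y[None, :], w[:, None], 1.0).sum(axis=1)   # sum over data points
    log_prior = log_normal(w, 0.0, 1.0)
    log_q = log_normal(w, mu, sigma ** 2)
    return np.mean(log_lik + log_prior - log_q)

def exact_elbo(y, mu, sigma):
    """Closed-form ELBO for the same toy model, used only as a check."""
    s2 = sigma ** 2
    exp_lik = np.sum(-0.5 * LOG_2PI - 0.5 * ((y - mu) ** 2 + s2))
    exp_prior = -0.5 * LOG_2PI - 0.5 * (mu ** 2 + s2)
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * s2)
    return exp_lik + exp_prior + entropy

y = np.array([0.5, 1.0, 1.5])
rng = np.random.default_rng(0)
print(mc_elbo(y, 0.8, 0.3, num_samples=5, rng=rng))  # noisy 5-sample estimate
print(exact_elbo(y, 0.8, 0.3))
```

Stochastic variational inference maximizes a noisy estimate of this kind with respect to $\mu$ and $\sigma$ by gradient ascent; a small sample count such as 5 keeps each step cheap at the cost of noisier gradients.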

In [None]:
infr.initialize(y=mx.nd.array(y), x=mx.nd.array(x))

In [None]:
# Start each posterior mean at the network's initialized weights, with a tiny variance.
for v_name, v in m.r.factor.parameters.items():
    infr.params[q[v].factor.mean] = net.collect_params()[v_name].data()
    infr.params[q[v].factor.variance] = mx.nd.ones_like(infr.params[q[v].factor.variance])*1e-6
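Setting every variance to 1e-6 means the variational posterior starts almost as a point mass at the Xavier-initialized weights: with $\sigma = \sqrt{10^{-6}} = 10^{-3}$, reparameterized samples $w = \mu + \sigma\epsilon$ barely move away from $\mu$. A small NumPy check, where the random <tt>mu</tt> is only a stand-in for the network's 152 initial weights and biases:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.standard_normal(152)        # stand-in for the network's initial weights and biases
variance = np.full_like(mu, 1e-6)    # same value as in the cell above
# 5 posterior draws per parameter, matching num_samples=5 in the SVI setup.
samples = mu + np.sqrt(variance) * rng.standard_normal((5, mu.size))
max_dev = np.abs(samples - mu).max()
print(max_dev)
```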

In [None]:
infr.run(max_iter=500, learning_rate=1e-1, y=mx.nd.array(y), x=mx.nd.array(x), verbose=True)

In [None]:
# for uuid, v in infr.inference_algorithm.posterior.variables.items():
#     if uuid in infr.params.param_dict:
#         print(v.name, infr.params[v])

In [None]:
xt = np.linspace(0,1,100)[:,None]

In [None]:
# Draw 10 samples of the network output m.r from the variational posterior at the test inputs.
infr2 = VariationalPosteriorForwardSampling(10, [m.x], infr, [m.r])
res = infr2.run(x=mx.nd.array(xt))

In [None]:
yt = res[0].asnumpy()

In [None]:
yt_mean = yt.mean(0)
yt_std = yt.std(0)
# For each posterior sample, plot the class-1 probability 1 / (1 + exp(r0 - r1)).
for i in range(yt.shape[0]):
    plt.plot(xt[:,0],1./(1+np.exp(yt[i,:,0]-yt[i,:,1])),'k',alpha=0.2)
plt.plot(x[:,0],y[:,0],'.')
plt.show()
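The plotting expression <tt>1./(1+np.exp(yt[i,:,0]-yt[i,:,1]))</tt> is the class-1 probability of a two-class softmax over the network outputs $(r_0, r_1)$, since $e^{r_1} / (e^{r_0} + e^{r_1}) = 1 / (1 + e^{r_0 - r_1})$. A quick numeric check with the standard library:

```python
import math

def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class1_prob(r0, r1):
    """Same formula as the plotting cell: 1 / (1 + exp(r0 - r1))."""
    return 1.0 / (1.0 + math.exp(r0 - r1))

for r0, r1 in [(0.0, 0.0), (2.0, -1.0), (-3.5, 4.0)]:
    print(softmax([r0, r1])[1], class1_prob(r0, r1))
```

Each grey curve in the plot corresponds to one draw of the weights from the variational posterior, so their spread reflects the model's uncertainty; averaging the probabilities over draws would give a Monte Carlo estimate of the posterior predictive.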

## Running Horovod

Currently, the only way to run Horovod with MXFusion is to launch the script with the <tt>horovodrun</tt> or <tt>mpirun</tt> command from a shell. We therefore first convert this notebook into a Python script and then execute that script from the command line.

In [None]:
!jupyter nbconvert --to script bnn_classification-distributed.ipynb

To run the script with Horovod, call <tt>horovodrun</tt> or <tt>mpirun</tt> from the system and specify the number of processes. More details are in the [Horovod documentation on running Horovod](https://github.com/horovod/horovod/blob/master/docs/running.rst). A simple single-machine invocation has the form: <br><tt>horovodrun -np {number of processes} -H localhost:{number of processes} python {python file}</tt>
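When launching from Python instead of a shell, the same command can be assembled as an argument list for <tt>subprocess</tt>. The helper <tt>build_horovodrun_cmd</tt> below is hypothetical; it only uses the <tt>-np</tt> and <tt>-H host:slots</tt> options shown above, and sets the slot count equal to the process count so that a single host accepts all of them.

```python
import shlex

def build_horovodrun_cmd(num_proc, script, host="localhost"):
    """Hypothetical helper: argument list to run `script` on `num_proc` processes of one host."""
    return ["horovodrun", "-np", str(num_proc),
            "-H", f"{host}:{num_proc}",   # host:slots, one slot per process
            "python", script]

cmd = build_horovodrun_cmd(4, "bnn_classification-distributed.py")
print(shlex.join(cmd))
# horovodrun -np 4 -H localhost:4 python bnn_classification-distributed.py
# To actually launch it (requires Horovod to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```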

NOTE: Restart the notebook kernel before executing the cell below.

In [None]:
!mpirun -np 4 -H localhost:4 python bnn_classification-distributed.py