<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px; height: 163px">
</div>

## Horovod Lab

In this notebook, we take the Boston housing dataset that we worked with earlier and train the model across the whole cluster instead of only on the driver.

Let's start by reading in the data, and converting it to a Spark DataFrame.

In [3]:
import pandas as pd
from pyspark.ml.feature import VectorAssembler
from pyspark.sql.functions import array, col, rand, udf, when
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data()

pdTrain = pd.concat([pd.DataFrame(x_train), pd.DataFrame(y_train, columns=["label"])], axis=1)
pdTest = pd.concat([pd.DataFrame(x_test), pd.DataFrame(y_test, columns=["label"])], axis=1)

trainDF = spark.createDataFrame(pdTrain)
testDF = spark.createDataFrame(pdTest)

## Types

The current implementation of Horovod has a few limitations. First, the feature and label columns must be of float type. Second, for inference the features must be stored as an array column rather than a Vector. We have already applied both transformations for you below.

In [5]:
# Training requires float columns. VectorAssembler converts the features
# automatically, but the label has to be cast explicitly.
trainDF = trainDF.select([col(c).cast("float") for c in trainDF.columns])
testDF = testDF.select([col(c).cast("float") for c in testDF.columns])

vec = VectorAssembler(inputCols=trainDF.columns[:-1], outputCol="features")
trainDF = (vec.transform(trainDF)
           .select("features", "label")
           .withColumn("isVal", when(rand() > 0.8, True).otherwise(False)))

# For inference to work, the features must be an array column instead of a Vector
testDF = testDF.select(array("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12").alias("features"), "label")

## Model_fn

Create the `model_fn` below. You have two options:
* Write the `model_fn` directly in TensorFlow
* Use a pre-made estimator from TensorFlow, and extract its `model_fn`

Your model should be a regression model with two hidden layers of 50 and 20 units. Both hidden layers should use `relu` as the activation function.
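
To make the target architecture concrete, here is a framework-free sketch of the forward pass such a model computes: 13 input features, hidden layers of 50 and 20 units with `relu`, and a single linear output. It uses plain Python with random weights, so it only illustrates the layer shapes and is not a solution for `model_fn`. The helper names (`init_layer`, `dense`, `forward`) are hypothetical.

```python
import random

LAYER_SIZES = [13, 50, 20, 1]  # inputs, two hidden layers, one regression output


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights; illustrative only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def relu(z):
    return max(0.0, z)


def dense(x, layer, activation=None):
    """Apply one fully connected layer to the input vector x."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(o) for o in out] if activation else out


def forward(x, layers):
    """relu on the hidden layers, no activation on the output layer."""
    for layer in layers[:-1]:
        x = dense(x, layer, activation=relu)
    return dense(x, layers[-1])[0]


rng = random.Random(42)
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]

# Weights plus biases per layer: 13*50+50, 50*20+20, 20*1+1
num_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
prediction = forward([1.0] * 13, layers)
print(num_params)  # 1741
```

A network this small has only 1,741 trainable parameters, so distributing it is about practicing the Horovod workflow rather than about needing the extra compute.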

In [7]:
from sparkdl.estimators.horovod_estimator.estimator import HorovodEstimator

help(HorovodEstimator)

In [8]:
# TODO

import horovod    
import tensorflow as tf

tf.set_random_seed(seed=42)

def model_fn(features, labels, mode, params):
    # FILL IN

## HorovodEstimator

Create a `HorovodEstimator`. You need to specify `modelFn`, `featureMapping`, `modelDir`, `labelCol`, `batchSize`, `maxSteps`, and `isValidationCol`. You can optionally specify `saveCheckpointsSecs`.

Start with `batchSize=64` and `maxSteps=100`.
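
To get a feel for how much training these starting values buy, the sketch below estimates how many passes over the training rows they cover. It assumes every worker consumes one batch of `batchSize` rows per step, which is an assumption about `HorovodEstimator` that you should check against `help(HorovodEstimator)`. The `epochs_covered` helper is hypothetical.

```python
import math


def epochs_covered(batch_size, max_steps, num_rows, num_workers=1):
    """Approximate number of passes over the training rows.

    Assumes each worker consumes one batch of `batch_size` rows per step.
    """
    rows_seen = batch_size * max_steps * num_workers
    return rows_seen / num_rows


# The Keras Boston housing training split has 404 rows; roughly 80% of them
# stay in the training portion after the isVal split (exact count varies per run).
approx_train_rows = math.floor(404 * 0.8)  # 323

print(round(epochs_covered(64, 100, approx_train_rows), 1))  # 19.8
```

Under these assumptions a single worker sees the training rows about 20 times, so raising `maxSteps` is a reasonable first knob to try if the model underfits.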

In [10]:
# Create Model Directory
import time
model_dir = "/tmp/horovodDemo/" + str(int(time.time()))  # Must be a local path
print(model_dir)

In [11]:
# TODO

est = HorovodEstimator(<FILL_IN>)

In [12]:
transformer = est.fit(trainDF)

In [13]:
res = transformer.transform(testDF)
display(res)

## Extract Predictions and Evaluate

In [15]:
from pyspark.sql.types import FloatType

def _getPrediction(v):  # Extract the single element from the list so we can compute the MSE below
  return float(v[0])
getPrediction = udf(_getPrediction, FloatType())

predDF = res.select(getPrediction("predictions").alias("prediction"), "label")
display(predDF)

In [16]:
from pyspark.ml.evaluation import RegressionEvaluator

regEval = RegressionEvaluator(predictionCol='prediction', labelCol='label', metricName='mse')

regEval.evaluate(predDF)
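
For reference, the `mse` metric is the mean of the squared differences between predictions and labels, and RMSE is its square root, expressed in the same units as the label. A minimal pure-Python sketch with made-up numbers (the `mse` helper here is hypothetical and not part of Spark):

```python
import math


def mse(predictions, labels):
    """Mean squared error over paired predictions and labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


# Made-up values for illustration only
preds = [22.0, 30.0, 18.0]
labels = [24.0, 27.0, 18.0]

error = mse(preds, labels)  # (4 + 9 + 0) / 3
rmse = math.sqrt(error)
print(error, rmse)
```

Because MSE squares each error, a few badly predicted rows can dominate the score; taking the square root makes the number easier to compare against the label values.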

## Improve

Go back and modify the `model_fn` or the `HorovodEstimator` parameters to reduce your MSE!

&copy; 2018 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>