# Getting started with deep learning in Databricks: an end-to-end example using TensorFlow Keras, Hyperopt, and MLflow

This tutorial uses a small dataset to show how to use TensorFlow Keras, Hyperopt, and MLflow to develop a deep learning model in Databricks. 

It includes the following steps:
- Load and preprocess data
- Part 1. Create a neural network model with TensorFlow Keras and view training with inline TensorBoard
- Part 2. Perform automated hyperparameter tuning with Hyperopt and MLflow and use autologging to save results
- Part 3. Use the best set of hyperparameters to build a final model 
- Part 4. Register the model in MLflow and use the model to make predictions

### Setup
- Databricks Runtime for Machine Learning 7.0 or above. This notebook uses TensorBoard to display the results of neural network training. The method for starting TensorBoard depends on the Databricks Runtime version you are using; both methods are covered in Part 1.

In [0]:
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
import mlflow
import mlflow.keras
import mlflow.tensorflow


## Load and preprocess data
This example uses the California Housing dataset from `scikit-learn`.

In [0]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

cal_housing = fetch_california_housing()

# Split 80/20 train-test
X_train, X_test, y_train, y_test = train_test_split(cal_housing.data,
                                                    cal_housing.target,
                                                    test_size=0.2)



### Scale features
Feature scaling is important when working with neural networks. This notebook uses the `scikit-learn` function `StandardScaler`.

In [0]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
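
As a rough illustration of what `StandardScaler` computes, the sketch below standardizes a single feature column in plain Python. The mean and population standard deviation come from the training values only, and the same statistics are then applied to the test values. The helper names `fit_stats` and `standardize` and the sample numbers are made up for this example and are not part of the notebook.

```python
from statistics import fmean, pstdev

def fit_stats(train_column):
    """Return the mean and population standard deviation of the training values."""
    return fmean(train_column), pstdev(train_column)

def standardize(column, mean, std):
    """Apply z = (x - mean) / std using statistics learned from the training data."""
    return [(x - mean) / std for x in column]

train_col = [2.0, 4.0, 6.0, 8.0]  # made-up training values for one feature
test_col = [5.0, 10.0]            # made-up test values for the same feature

mean, std = fit_stats(train_col)
train_scaled = standardize(train_col, mean, std)
test_scaled = standardize(test_col, mean, std)  # reuses the training statistics
```

Fitting on the training split only, as the cell above does with `fit_transform` followed by `transform`, keeps information about the test set out of preprocessing.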



## Part 1. Create model and view TensorBoard in notebook

### Create the neural network

In [0]:
def create_model():
  model = Sequential()
  model.add(Dense(20, input_dim=8, activation="relu"))
  model.add(Dense(20, activation="relu"))
  model.add(Dense(1, activation="linear"))
  return model
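
To get a feel for the size of this network, the following sketch counts its trainable parameters by hand. Each `Dense` layer has one weight per input-unit pair plus one bias per unit. The helper `dense_param_count` is illustrative only; in the notebook itself, `model.summary()` reports the same kind of information.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 8 input features, two hidden layers of 20 units, one output unit
layer_sizes = [8, 20, 20, 1]
per_layer = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer, total_params)  # [180, 420, 21] 621
```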



### Compile the model

In [0]:
model = create_model()

model.compile(loss="mse",
              optimizer="Adam",
              metrics=["mse"])
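
The model is compiled with mean squared error as both the loss and the reported metric. As a small worked example with made-up numbers, the quantity being minimized can be sketched as follows. The function name is illustrative and is not a Keras API.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Squared errors are 0.0, 0.25 and 1.0, so the mean is 1.25 / 3
mse_value = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(mse_value)
```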



### Create callbacks

In [0]:
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

# In the following lines, replace <username> with your username.
experiment_log_dir = "/dbfs/<username>/tb"
checkpoint_path = "/dbfs/<username>/keras_checkpoint_weights.ckpt"

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=experiment_log_dir)
model_checkpoint = ModelCheckpoint(filepath=checkpoint_path, verbose=1, save_best_only=True)
early_stopping = EarlyStopping(monitor="loss", mode="min", patience=3)

history = model.fit(X_train, y_train, validation_split=.2, epochs=35, callbacks=[tensorboard_callback, model_checkpoint, early_stopping])
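
The `EarlyStopping` callback above stops training once the monitored `loss` has failed to improve for `patience=3` consecutive epochs. The sketch below is a simplified model of that patience logic in plain Python. It ignores options such as `min_delta` and `restore_best_weights`, it is not the Keras implementation, and the loss values are made up.

```python
def stopping_epoch(losses, patience=3):
    """Return the 0-based epoch at which training would stop, or None if it never stops early."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            # Improvement: remember the new best value and reset the counter
            best = loss
            wait = 0
        else:
            # No improvement: count the epoch and stop once patience is used up
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60]
print(stopping_epoch(losses))  # 5
```

In this made-up sequence the loss would have improved again at epoch 6, which shows the trade-off: a larger `patience` tolerates temporary plateaus at the cost of longer training.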



### TensorBoard commands for Databricks Runtime 7.2 ML and above

When you start TensorBoard this way, it continues to run until you detach the notebook from the cluster.  
Note: To clear the TensorBoard logs between runs, use this command: `dbutils.fs.rm(experiment_log_dir.replace("/dbfs",""), recurse=True)`

In [0]:
%load_ext tensorboard
%tensorboard --logdir $experiment_log_dir
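
The clean-up command in the note above converts the local `/dbfs/...` path into the form that `dbutils.fs` expects by removing the `/dbfs` prefix. A minimal sketch of that conversion is shown below. The helper `to_dbfs_path` is hypothetical, and it strips only a leading prefix so that a later `/dbfs` segment in the path, if any, is left alone.

```python
def to_dbfs_path(local_path):
    """Strip a leading /dbfs mount prefix; return other paths unchanged."""
    prefix = "/dbfs"
    if local_path.startswith(prefix + "/"):
        return local_path[len(prefix):]
    return local_path

print(to_dbfs_path("/dbfs/<username>/tb"))  # /<username>/tb
```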



### TensorBoard commands for Databricks Runtime 7.1 ML and below

The command in the following cell displays a link that, when clicked, opens TensorBoard in a new tab.

When you start TensorBoard this way, it continues to run until you either stop it with `dbutils.tensorboard.stop()` or you shut down the cluster.

In [0]:
#dbutils.tensorboard.start(experiment_log_dir)



### Evaluate model on test dataset

In [0]:
model.evaluate(X_test, y_test)



## Part 2. Hyperparameter tuning with Hyperopt and MLflow
[Hyperopt](https://github.com/hyperopt/hyperopt) is a Python library for hyperparameter tuning. Databricks Runtime for Machine Learning includes an optimized and enhanced version of Hyperopt, including automated MLflow tracking. For more information about using Hyperopt, see the [Hyperopt documentation](https://github.com/hyperopt/hyperopt/wiki/FMin).

### Create neural network model using variables for number of nodes in hidden layers

In [0]:
def create_model(n):
  model = Sequential()
  model.add(Dense(int(n["dense_l1"]), input_dim=8, activation="relu"))
  model.add(Dense(int(n["dense_l2"]), activation="relu"))
  model.add(Dense(1, activation="linear"))
  return model



### Create Hyperopt objective function

In [0]:
from hyperopt import fmin, hp, tpe, STATUS_OK, SparkTrials

def runNN(n):
  # Import tensorflow 
  import tensorflow as tf
  
  # Log run information with mlflow.tensorflow.autolog()
  mlflow.tensorflow.autolog()
  
  model = create_model(n)

  # Select optimizer
  optimizer_call = getattr(tf.keras.optimizers, n["optimizer"])
  optimizer = optimizer_call(learning_rate=n["learning_rate"])
 
  # Compile model
  model.compile(loss="mse",
                optimizer=optimizer,
                metrics=["mse"])

  history = model.fit(X_train, y_train, validation_split=.2, epochs=10, verbose=2)

  # Evaluate the model
  score = model.evaluate(X_test, y_test, verbose=0)
  obj_metric = score[0]  
  return {"loss": obj_metric, "status": STATUS_OK}



### Define Hyperopt search space

In [0]:
space = {
  "dense_l1": hp.quniform("dense_l1", 10, 30, 1),
  "dense_l2": hp.quniform("dense_l2", 10, 30, 1),
  "learning_rate": hp.loguniform("learning_rate", -5, 0),
  "optimizer": hp.choice("optimizer", ["Adadelta", "Adam"])
 }
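
To make the ranges in this search space concrete, the sketch below imitates the two numeric distributions in plain Python, following the definitions in the Hyperopt documentation: `hp.quniform` draws `round(uniform(low, high) / q) * q`, and `hp.loguniform` draws `exp(uniform(low, high))`. The function names are illustrative, and this is not how Hyperopt samples internally.

```python
import math
import random

def sample_quniform(rng, low, high, q):
    """round(uniform(low, high) / q) * q"""
    return round(rng.uniform(low, high) / q) * q

def sample_loguniform(rng, low, high):
    """exp(uniform(low, high)), so the logarithm of the value is uniform."""
    return math.exp(rng.uniform(low, high))

rng = random.Random(0)  # seeded only to make the sketch repeatable
units = [sample_quniform(rng, 10, 30, 1) for _ in range(1000)]
rates = [sample_loguniform(rng, -5, 0) for _ in range(1000)]

print(min(units), max(units))     # layer sizes stay within 10..30
print(math.exp(-5), math.exp(0))  # learning-rate bounds: about 0.0067 and 1.0
```

Hyperopt returns `quniform` values as floats, which is why `create_model(n)` wraps the layer sizes in `int(...)`.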



### Create the `SparkTrials` object

The `SparkTrials` object tells `fmin()` to distribute the tuning job across a Spark cluster. When you create the `SparkTrials` object, you can use the `parallelism` argument to set the maximum number of trials to evaluate concurrently. The default setting is the number of Spark executors available.  

A higher number lets you scale out testing of more hyperparameter settings. Because Hyperopt proposes new trials based on past results, there is a trade-off between parallelism and adaptivity. For a fixed `max_evals`, greater parallelism speeds up calculations, but lower parallelism may lead to better results since each iteration has access to more past results.

In [0]:
# If you do not specify a parallelism argument, the default is the number of available Spark executors 
spark_trials = SparkTrials()
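
The trade-off between parallelism and adaptivity can be illustrated with some rough arithmetic. The sketch below assumes an idealized setting in which every trial takes the same time, so trials run in equal-sized batches; in practice trials finish at different times, so treat the numbers as an approximation only. The function `adaptive_rounds` is hypothetical.

```python
import math

def adaptive_rounds(max_evals, parallelism):
    """Idealized number of sequential batches, each of which can learn from earlier results."""
    return math.ceil(max_evals / parallelism)

for parallelism in (1, 5, 10, 30):
    print(parallelism, adaptive_rounds(30, parallelism))
# With max_evals=30: 1 -> 30 rounds, 5 -> 6, 10 -> 3, 30 -> 1
```

With `parallelism` equal to `max_evals`, all trials start before any result is available, so the search has no past results to adapt to. To set the value explicitly, pass it when creating the object, for example `SparkTrials(parallelism=4)`.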



### Perform hyperparameter tuning 
Put the `fmin()` call inside an MLflow run to save results to MLflow. MLflow tracks the parameters and performance metrics of each run.   

After running the following cell, you can view the results in MLflow. Click **Experiment** at the upper right to display the Experiment Runs sidebar. Click the icon at the far right next to **Experiment Runs** to display the MLflow Runs Table.

For more information about using MLflow to analyze runs, see ([AWS](https://docs.databricks.com/applications/mlflow/index.html)|[Azure](https://docs.microsoft.com/azure/databricks/applications/mlflow/)|[GCP](https://docs.gcp.databricks.com/applications/mlflow/index.html)).

In [0]:
with mlflow.start_run():
  best_hyperparam = fmin(fn=runNN, 
                         space=space, 
                         algo=tpe.suggest, 
                         max_evals=30, 
                         trials=spark_trials)



## Part 3. Use the best set of hyperparameters to build a final model

In [0]:
import hyperopt

print(hyperopt.space_eval(space, best_hyperparam))
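
`space_eval` is needed here because `fmin()` reports an `hp.choice` parameter as an index into the list of options rather than as the option itself. The sketch below mimics that mapping in plain Python. The values in `raw_best` are made up to show the shape of the result and are not output from this notebook.

```python
optimizer_choices = ["Adadelta", "Adam"]  # same order as in the search space

# Hypothetical raw result in the shape fmin() returns: the optimizer is an index
raw_best = {"dense_l1": 24.0, "dense_l2": 17.0, "learning_rate": 0.01, "optimizer": 1}

# Map the index back to the option it refers to, as space_eval does for hp.choice
resolved = dict(raw_best)
resolved["optimizer"] = optimizer_choices[raw_best["optimizer"]]

print(resolved["optimizer"])  # Adam
```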



In [0]:
best_params = hyperopt.space_eval(space, best_hyperparam)

# hp.quniform returns floats, so cast the layer sizes to int before passing them to Dense
first_layer = int(best_params["dense_l1"])
second_layer = int(best_params["dense_l2"])
learning_rate = best_params["learning_rate"]
optimizer = best_params["optimizer"]



In [0]:
# Get optimizer and update with learning_rate value
optimizer_call = getattr(tf.keras.optimizers, optimizer)
optimizer = optimizer_call(learning_rate=learning_rate)



In [0]:
def create_new_model():
  model = Sequential()
  model.add(Dense(first_layer, input_dim=8, activation="relu"))
  model.add(Dense(second_layer, activation="relu"))
  model.add(Dense(1, activation="linear"))
  return model



In [0]:
new_model = create_new_model()

new_model.compile(loss="mse",
                  optimizer=optimizer,
                  metrics=["mse"])



When `autolog()` is active, MLflow does not automatically end a run. End the run that was started in the hyperparameter tuning cell in Part 2 before starting and autologging a new run.  
For more information, see https://www.mlflow.org/docs/latest/tracking.html#automatic-logging.

In [0]:
mlflow.end_run()



In [0]:
import matplotlib.pyplot as plt

mlflow.tensorflow.autolog()

with mlflow.start_run() as run:
  
  history = new_model.fit(X_train, y_train, epochs=35, callbacks=[early_stopping])
  
  # Save the run information to register the model later
  kerasURI = run.info.artifact_uri
  
  # Evaluate model on test dataset and log result
  mlflow.log_param("eval_result", new_model.evaluate(X_test, y_test)[0])
  
  # Plot predicted vs known values for a quick visual check of the model and log the plot as an artifact
  keras_pred = new_model.predict(X_test)
  plt.plot(y_test, keras_pred, "o", markersize=2)
  plt.xlabel("observed value")
  plt.ylabel("predicted value")
  plt.savefig("kplot.png")
  mlflow.log_artifact("kplot.png") 



## Part 4. Register the model in MLflow and use the model to make predictions
To learn more about the Model Registry, see ([AWS](https://docs.databricks.com/applications/mlflow/model-registry.html)|[Azure](https://docs.microsoft.com/azure/databricks/applications/mlflow/model-registry)|[GCP](https://docs.gcp.databricks.com/applications/mlflow/model-registry.html)).

In [0]:
import time

model_name = "cal_housing_keras"
model_uri = kerasURI+"/model"
new_model_version = mlflow.register_model(model_uri, model_name)

# Registering the model takes a few seconds, so add a delay before continuing with the next cell
time.sleep(5)
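
A fixed delay may turn out shorter or longer than registration actually takes. One possible alternative, sketched here under assumptions, is to poll a readiness check until it passes or a timeout elapses. Both `wait_until` and `fake_is_ready` are hypothetical names. In a notebook, the predicate might compare the `status` of the version returned by `MlflowClient.get_model_version` with `READY`, which is not shown here.

```python
import time

def wait_until(predicate, timeout_s=60.0, interval_s=1.0):
    """Call predicate() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# Stand-in for a real readiness check; it reports ready on the third call
calls = {"n": 0}

def fake_is_ready():
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_until(fake_is_ready, timeout_s=5.0, interval_s=0.01)
print(ready, calls["n"])  # True 3
```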



### Load the model for inference and make predictions

In [0]:
keras_model = mlflow.keras.load_model(f"models:/{model_name}/{new_model_version.version}")

keras_pred = keras_model.predict(X_test)
keras_pred



## Clean up
To stop TensorBoard:
- If you are running Databricks Runtime 7.1 ML or below, uncomment and run the command in the following cell.  
- If you are running Databricks Runtime 7.2 ML or above, detach this notebook from the cluster.

In [0]:
#dbutils.tensorboard.stop()

