# DLHub: A Data and Learning Hub for Science

DLHub is a self-service platform for publishing, applying, and creating machine learning (ML) models, including deep learning (DL) models, and associated data transformations. It is:

1. A **model serving infrastructure**: Users can run or test models (and related services, such as data transformations) via simple Web calls.

2. A **model registry**: Model developers can easily publish models, along with associated descriptive metadata and training data, so that they can then be discovered, cited, and reused by others.

3. A **model development system**: Developers of new models can easily access the data and computing infrastructure needed to re-train models for new applications.

DLHub serves several kinds of users. Data scientists can publish models (i.e., architectures and weights) and methods. Other scientists can apply existing models to new data (e.g., by querying a prediction API for a deployed model) and can build new models with state-of-the-art techniques. Together, these capabilities lower the barriers to employing ML/DL, making it easier for researchers to benefit from advances in ML/DL technologies.


# Publishing a Keras model

The example below shows how to publish a Keras model to DLHub. It covers:
* Model dataset description (*publishing dataset descriptions in DLHub is planned future work*)
* Model metadata description
* Model publishing

As a simple example, we will show how to submit a Keras model based on the [MNIST database](http://yann.lecun.com/exdb/mnist/).

To publish a model with DLHub, we first gather some metadata about the model itself. Our SDK is designed to help the user generate this metadata.
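As a rough illustration of what "metadata" means here, the sketch below builds a plain Python dictionary holding the fields this example sets later (name, title, domains, and input/output descriptions for the `run` method) and serializes it to JSON. The nesting under `servable` mirrors the keys used later in this notebook; the top-level layout and the helper `build_minimal_metadata` are hypothetical and are not the full DLHub schema, which the SDK generates for you.

```python
import json


def build_minimal_metadata(name, title, domains, input_desc, output_desc):
    """Assemble a hypothetical, minimal metadata record.

    The servable -> methods -> run -> input/output nesting mirrors the keys
    used later in this example; the top-level layout is an assumption and
    not the real DLHub schema.
    """
    return {
        "name": name,
        "title": title,
        "domains": list(domains),
        "servable": {
            "methods": {
                "run": {
                    "input": {"description": input_desc},
                    "output": {"description": output_desc},
                }
            }
        },
    }


record = build_minimal_metadata(
    name="mnist_tiny_example",
    title="MNIST Digit Classifier",
    domains=["general", "digit recognition"],
    input_desc="Image of a digit",
    output_desc="Probabilities of being 0-9",
)
print(json.dumps(record, indent=2))
```

The SDK produces a much richer record than this, but the idea is the same: a nested, JSON-serializable description that others can search and read.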

### Make a model using the MNIST dataset

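First, we train a simple convolutional network (convnet) on the MNIST dataset.

The model is adapted from the Keras example script at https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py, with two changes:
* It trains on only the first 512 training examples, to keep training fast.
* It saves the trained model to an HDF5 file (`model.hd5`) at the end.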
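To see why the 512-example subset speeds things up, it helps to count optimizer updates. With the `batch_size = 128` and `epochs = 4` set in the training cell, Keras splits each epoch into ceil(train_size / batch_size) mini-batches. The small helper below (our own, not part of Keras) does that arithmetic and compares the subset with the full 60,000-image MNIST training set.

```python
import math


def count_updates(num_examples, batch_size, epochs):
    """Return (batches per epoch, total gradient updates) for a fit() run.

    The last batch of an epoch may be smaller than batch_size, hence ceil().
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


subset_steps, subset_total = count_updates(512, batch_size=128, epochs=4)
full_steps, full_total = count_updates(60000, batch_size=128, epochs=4)

print(f"512-example subset: {subset_steps} batches/epoch, {subset_total} updates")
print(f"full 60k training set: {full_steps} batches/epoch, {full_total} updates")
```

The subset needs 16 updates instead of 1,876. Note that validation still runs on the full 10,000-image test set after every epoch, so that part of the run is not reduced.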

```python
from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 4
train_size = 512

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train[:train_size], y_train[:train_size],
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# Save the model
model.save("model.hd5")
```
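As a sanity check on how many weights end up in `model.hd5`, the parameter count of this convnet can be worked out by hand. The sketch below assumes `channels_last` inputs of shape 28×28×1, unpadded ("valid") 3×3 convolutions with stride 1 (the Keras default), and a 2×2 max-pool; Dropout and Flatten add no weights. The helper functions are our own, and the total should agree with what `model.summary()` reports for this architecture.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel per (input channel, filter) pair, plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_features, units):
    # Full weight matrix plus one bias per unit.
    return in_features * units + units


def valid_conv_size(size, kernel):
    # 'valid' padding with stride 1: the output shrinks by kernel - 1.
    return size - kernel + 1


h = w = 28
layers = []

layers.append(("Conv2D(32)", conv2d_params(3, 3, 1, 32)))
h, w = valid_conv_size(h, 3), valid_conv_size(w, 3)   # 26 x 26 x 32

layers.append(("Conv2D(64)", conv2d_params(3, 3, 32, 64)))
h, w = valid_conv_size(h, 3), valid_conv_size(w, 3)   # 24 x 24 x 64

h, w = h // 2, w // 2                                 # MaxPooling2D -> 12 x 12 x 64
flat = h * w * 64                                     # Flatten -> 9216 features

layers.append(("Dense(128)", dense_params(flat, 128)))
layers.append(("Dense(10)", dense_params(128, 10)))

for name, count in layers:
    print(f"{name:<12}{count:>10,}")
total = sum(count for _, count in layers)
print(f"{'Total':<12}{total:>10,}")
```

Almost all of the roughly 1.2 million parameters sit in the first Dense layer, which connects 9,216 flattened features to 128 units.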

### Describe the model

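Next, we load the Keras model from the HDF5 file and provide a minimal amount of information about it. The second argument to `create_model` lists the names of the model's output classes, here the digit labels `'0'` through `'9'`.

The SDK inspects the HDF5 file to fill in much of the remaining metadata automatically.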

```python
from dlhub_sdk.models.servables.keras import KerasModel

# Describe the Keras model; the second argument names the output classes '0'-'9'
model_info = KerasModel.create_model('model.hd5', list(map(str, range(10))))
```

Next, we use the SDK to add further metadata: the model's title, its short name, and the domains it applies to.

```python
model_info.set_title("MNIST Digit Classifier")
model_info.set_name("mnist_tiny_example")
model_info.set_domains(["general", "digit recognition"])
```

We also add human-readable descriptions of the model's input and output.

```python
model_info['servable']['methods']['run']['output']['description'] = 'Probabilities of being 0-9'
model_info['servable']['methods']['run']['input']['description'] = 'Image of a digit'
```
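The `run` output described above is a vector of 10 probabilities, one per entry in the label list `list(map(str, range(10)))` passed to `KerasModel.create_model`. A caller typically turns that vector into a predicted digit by taking the index of the largest probability and looking up its label. The sketch below does this in plain Python; `top_prediction` is our own helper, and the probability values are invented for illustration, not output from the model trained here.

```python
labels = list(map(str, range(10)))  # same label list passed to create_model


def top_prediction(probabilities, labels):
    """Return (label, probability) for the most likely class."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]


# Illustrative softmax output (invented numbers that sum to 1.0).
example_output = [0.01, 0.02, 0.05, 0.02, 0.01, 0.03, 0.01, 0.80, 0.03, 0.02]
print(top_prediction(example_output, labels))  # ('7', 0.8)
```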

Finally, we print the complete metadata record as JSON to check it.

```python
import json

print('\n--> Model Information <--')
print(json.dumps(model_info.to_dict(), indent=2))
```

### Publishing the model to DLHub

We use the DLHub SDK to create a `DLHubClient`, which wraps both our REST API and our search catalog. You can use the client to publish, discover, and run models.

Publishing the model to DLHub may take about 10 minutes. The `publish_servable` call returns a task ID for the publication request.

```python
import dlhub_sdk
dl = dlhub_sdk.DLHubClient()

# Publish the model to DLHub
task_id = dl.publish_servable(model_info)
print(task_id)
```
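Because publishing can take around ten minutes, a script may want to wait for the task to finish rather than exit right after submitting it. The helper below is a generic polling loop, not part of the DLHub SDK: `wait_for_task`, the `get_status` callable, and the terminal status names `"SUCCEEDED"`/`"FAILED"` are all hypothetical placeholders. Check the `DLHubClient` documentation for the method that actually reports task status and the values it returns before wiring it in.

```python
import time


def wait_for_task(task_id, get_status, terminal=("SUCCEEDED", "FAILED"),
                  poll_every=15.0, timeout=900.0, sleep=time.sleep):
    """Poll get_status(task_id) until it returns a terminal status.

    `get_status` and the terminal status names are placeholders: supply
    whatever your client actually provides. Raises TimeoutError once the
    accumulated polling interval reaches `timeout` seconds.
    """
    waited = 0.0
    while True:
        status = get_status(task_id)
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still {status!r} after {waited:.0f}s")
        sleep(poll_every)
        waited += poll_every


# Demonstration with a fake status source instead of a real DLHub call.
fake_statuses = iter(["PENDING", "RUNNING", "SUCCEEDED"])
result = wait_for_task("demo-task", lambda _id: next(fake_statuses), poll_every=0.0)
print(result)  # SUCCEEDED
```

Injecting `get_status` and `sleep` keeps the loop independent of any particular client and lets it be exercised without network access; the 15-minute default timeout leaves some headroom over the roughly 10-minute publishing time.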