# Building a custom task

First, we need to import a few things.

In [2]:
from pathlib import Path
import pandas as pd
import tensorflow as tf
import pickle

import keras.models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

from sklearn.pipeline import Pipeline


Now let's build a neural network! First we'll lay out the code, then we'll walk through it.

## The Custom Task Code

In [3]:
from datarobot_drum.custom_task_interfaces import RegressionEstimatorInterface

class CustomTask(RegressionEstimatorInterface):
    def fit(self, X, y, row_weights=None, **kwargs):
        """ This hook defines how DataRobot will train this task.
        DataRobot runs this hook when the task is being trained inside a blueprint.
        As an output, this hook is expected to create an artifact containing a trained object, that is then used to predict new data.
        The input parameters are passed by DataRobot based on project and blueprint configuration.

        Parameters
        -------
        X: pd.DataFrame
            Training data that DataRobot passes when this task is being trained.
        y: pd.Series
            Project's target column.
        row_weights: np.ndarray (optional, default = None)
            A list of weights. DataRobot passes it in case of smart downsampling or when weights column is specified in project settings.

        Returns
        -------
        CustomTask
            returns an object instance of class CustomTask that can be used in chained method calls
        """
        tf.random.set_seed(1234)
        input_dim, output_dim = len(X.columns), 1

        model = Sequential(
            [
                Dense(
                    input_dim, activation="relu", input_dim=input_dim, kernel_initializer="normal"
                ),
                Dense(input_dim // 2, activation="relu", kernel_initializer="normal"),
                Dense(output_dim, kernel_initializer="normal"),
            ]
        )
        model.compile(loss="mse", optimizer="adam", metrics=["mae", "mse"])

        callback = EarlyStopping(monitor="loss", patience=3)
        model.fit(
            X, y, epochs=20, batch_size=8, validation_split=0.33, verbose=1, callbacks=[callback]
        )

        # Attach the model to our object for future use
        self.estimator = model
        return self

    def save(self, artifact_directory):
        """
        Serializes the object and stores it in `artifact_directory`

        Parameters
        ----------
        artifact_directory: str
            Path to the directory to save the serialized artifact(s) to

        Returns
        -------
        self
        """

        # If your estimator is not pickle-able, you can serialize it using its native method,
        # i.e. in this case for keras we use model.save, and then set the estimator to none
        keras.models.save_model(self.estimator, Path(artifact_directory) / "model.h5")

        # Helper method to handle serializing, via pickle, the CustomTask class
        self.save_task(artifact_directory, exclude=['estimator'])

        return self

    @classmethod
    def load(cls, artifact_directory):
        """
        Deserializes the object stored within `artifact_directory`

        Returns
        -------
        cls
            The deserialized object
        """

        # Helper method to load the serialized CustomTask class
        custom_task = cls.load_task(artifact_directory)

        custom_task.estimator = keras.models.load_model(Path(artifact_directory) / "model.h5")

        return custom_task

    def predict(self, X, **kwargs):
        """ This hook defines how DataRobot will use the trained object from fit() to make predictions on new data.
        DataRobot runs this hook when the task is used for scoring inside a blueprint.
        As an output, this hook is expected to return the predictions.
        The input parameters are passed by DataRobot based on dataset and blueprint configuration.

        Parameters
        -------
        X: pd.DataFrame
            Data that DataRobot passes for scoring.

        Returns
        -------
        pd.DataFrame
            Returns a dataframe with a single column of predictions.
        """
        # Note how the regression estimator only outputs one column, so no explicit column names are needed
        return pd.DataFrame(data=self.estimator.predict(X))


There's a lot above, so don't worry about reading through it all now. The key idea is that we have several hooks, specifically fit, save, load, and predict. DataRobot will use these hooks automatically to run our custom task. You can copy the above cell directly into a custom.py file, add an optional (but highly recommended) model-metadata.yaml, and then you're ready to upload this CustomTask to DataRobot! See [placeholder] to see exactly how we set up the code above in a custom task folder ready for upload.
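Before uploading, it can help to sanity-check the task folder locally. The sketch below is only an illustration: `check_task_folder` is a hypothetical helper (not part of DataRobot or DRUM), and the rules it applies are assumptions drawn from the text above, namely that the folder holds a custom.py defining a CustomTask class with at least fit and predict, plus an optional model-metadata.yaml. It parses the file with `ast` rather than importing it, so tensorflow does not need to be installed to run the check.

```python
import ast
from pathlib import Path

# Assumption: fit and predict are the hooks every estimator task writes itself;
# save and load may be inherited, so they are not checked here.
REQUIRED_HOOKS = {"fit", "predict"}


def check_task_folder(folder):
    """Return a list of human-readable problems found in a task folder.

    Hypothetical helper for local checks only; it is not DataRobot's validation.
    """
    folder = Path(folder)
    problems = []

    custom_py = folder / "custom.py"
    if not custom_py.is_file():
        return ["custom.py is missing"]

    if not (folder / "model-metadata.yaml").is_file():
        problems.append("model-metadata.yaml is missing (optional, but recommended)")

    # Parse the source instead of importing it, so heavy dependencies are not needed.
    tree = ast.parse(custom_py.read_text())
    classes = {node.name: node for node in tree.body if isinstance(node, ast.ClassDef)}
    if "CustomTask" not in classes:
        problems.append("custom.py does not define a CustomTask class")
        return problems

    defined = {
        node.name
        for node in classes["CustomTask"].body
        if isinstance(node, ast.FunctionDef)
    }
    for hook in sorted(REQUIRED_HOOKS - defined):
        problems.append(f"CustomTask does not define {hook}()")
    return problems
```

Calling `check_task_folder("path/to/task_folder")` returns an empty list when nothing looks wrong under these assumptions.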

Note: The class above is an ordinary Python class, so you can easily add helper methods or even import entire helper files! See [placeholder for VisualAI] for a more complex neural network that uses helper functions in a separate file.


Now let's actually use the class above. Since this is an ordinary Python class, all we need to do is build an object and we can test it out to ensure our methods work! First, let's grab a dataset and then separate out the target column.

## Training our Custom Task

In [4]:
df = pd.read_csv("tests/testdata/juniors_3_year_stats_regression.csv")

y = df['Grade 2014']
X = df.drop(labels=['Grade 2014'], axis=1)

Now let's train our model!

In [5]:
task = CustomTask()

In [6]:
task = task.fit(X,y)

2022-02-01 09:48:32.035510: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-01 09:48:32.193202: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20


## Saving and Loading our Custom Task

Saving our model is critically important. For performance reasons, DataRobot separates training a model with the fit() function from making predictions with the predict() function. This means that we have to save, or serialize, everything we want to use in predict(). One challenge is that each machine learning library may use a slightly different serialization format. For example, sklearn models are typically serialized with pickle, whereas the keras framework has its own model.save() and load_model() functions.

By default, the CustomTask will pickle a model. That means that for a standard sklearn model, you don't even have to write a save or load hook! The CustomTask class has built-in save and load methods that will create a pickle for you (you will see it in your files as drum_artifact.pkl).
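To make the idea of inherited save and load hooks concrete, here is a minimal sketch using only the standard library. It is not DataRobot's implementation: `PickleTaskBase` and `MeanTask` are hypothetical names, and the base class pickles the instance's attributes as a simple stand-in for whatever the real built-in hooks do. The point is only that the subclass writes fit and predict and gets persistence for free.

```python
import pickle
import tempfile
from pathlib import Path


class PickleTaskBase:
    """Hypothetical stand-in for a base class that supplies default save/load hooks."""

    ARTIFACT_NAME = "drum_artifact.pkl"  # file name mentioned in the text above

    def save(self, artifact_directory):
        # Pickle the instance's attributes (an assumption made for this sketch).
        with open(Path(artifact_directory) / self.ARTIFACT_NAME, "wb") as f:
            pickle.dump(self.__dict__, f)
        return self

    @classmethod
    def load(cls, artifact_directory):
        with open(Path(artifact_directory) / cls.ARTIFACT_NAME, "rb") as f:
            state = pickle.load(f)
        task = cls.__new__(cls)  # rebuild the object without re-running any setup
        task.__dict__.update(state)
        return task


class MeanTask(PickleTaskBase):
    """Toy regressor that predicts the training mean; only fit and predict are written."""

    def fit(self, X, y, **kwargs):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X, **kwargs):
        return [self.mean_ for _ in X]


with tempfile.TemporaryDirectory() as artifact_dir:
    MeanTask().fit(X=[[1], [2], [3]], y=[10.0, 20.0, 30.0]).save(artifact_dir)
    # Prediction may happen separately from training, so rebuild from the artifact only.
    restored = MeanTask.load(artifact_dir)
    predictions = restored.predict([[4], [5]])
```

Anything `predict` needs (here just `mean_`) has to live on the object before `save` runs, otherwise it will not be available after `load`.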

If you look above, you'll see we actually overrode the built-in save and load methods. That's because we need to save our keras model using its own serialization methods, in this case saving it as a .h5 file. One thing to notice is that all we have to do is save our model, typically stored in self.estimator, and then call the save_task helper function. This will automatically pickle the CustomTask class and exclude (i.e. set to None) any attributes we name in exclude.
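The "save the model natively, pickle the rest" pattern can be sketched without keras. Everything in this example is illustrative: `ToyEstimator` stands in for a model that pickle cannot handle (it holds a thread lock), JSON stands in for the .h5 format, and the `Task.save` / `Task.load` bodies are an assumed approximation of what save_task and load_task do, not their actual source.

```python
import json
import pickle
import tempfile
import threading
from pathlib import Path


class ToyEstimator:
    """Stand-in for a model that cannot be pickled because it holds a lock."""

    def __init__(self, weights):
        self.weights = weights
        self._lock = threading.Lock()  # thread locks are not picklable

    def save_native(self, path):
        Path(path).write_text(json.dumps(self.weights))

    @classmethod
    def load_native(cls, path):
        return cls(json.loads(Path(path).read_text()))


class Task:
    def __init__(self):
        self.estimator = ToyEstimator([0.5, 1.5])
        self.feature_names = ["a", "b"]

    def save(self, directory):
        directory = Path(directory)
        # 1) The estimator goes to disk in its own format.
        self.estimator.save_native(directory / "model.json")
        # 2) Everything else is pickled, with the excluded attribute set to None.
        state = {k: (None if k == "estimator" else v) for k, v in self.__dict__.items()}
        (directory / "task.pkl").write_bytes(pickle.dumps(state))
        return self

    @classmethod
    def load(cls, directory):
        directory = Path(directory)
        task = cls.__new__(cls)
        task.__dict__.update(pickle.loads((directory / "task.pkl").read_bytes()))
        # Re-attach the estimator from its native file.
        task.estimator = ToyEstimator.load_native(directory / "model.json")
        return task


task = Task()
try:
    pickle.dumps(task.estimator)  # fails because of the lock
    pickled_directly = True
except (TypeError, pickle.PicklingError):
    pickled_directly = False

with tempfile.TemporaryDirectory() as d:
    task.save(d)
    restored = Task.load(d)
```

After the round trip, `restored` carries both the pickled attributes and a freshly loaded estimator, which mirrors the two steps in the save and load hooks of the CustomTask above.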

In [17]:
??CustomTask.save

Signature: CustomTask.save(self, artifact_directory)
Source:   
    def save(self, artifact_directory):
        """
        Serializes the object and stores it in `artifact_directory`

        Parameters
        ----------
        artifact_directory: str
            Path to the directory to save the serialized artifact(s) to

        Returns
        -------
        self
        """

        # If your estimator is not pickle-able, you can serialize it using its native method,
        # i.e. in this case for keras we use model.save, and then set the estim

Loading a custom task is simply the opposite approach. We use a helper method to read in our CustomTask object, then use the keras load_model method to restore the model and attach it as the estimator.

In [54]:
task.save(".")

INFO:tensorflow:Assets written to: model/assets


<__main__.CustomTask at 0x187f5c350>

In [55]:
task = CustomTask.load(".")

In [56]:
task.predict(X)



              0
0     28.767523
1     27.769978
2     28.650490
3     25.763115
4     30.636791
...         ...
1472  28.275587
1473  27.721256
1474  32.391720
1475  24.609760


TODO: mention what they'll need to copy into custom.py (have a separate folder for this example so they can see the difference between notebook land and custom.py)

In [8]:
import datarobot as dr
dr.Client()


<datarobot.rest.RESTClientObject at 0x18168d7d0>

In [12]:
from datarobot_bp_workshop import Workshop
w = Workshop()