# Building, Testing, and Deploying a Custom Model

This notebook walks through the general workflow for building a custom task. We'll also demonstrate how to then deploy your custom task to a cloud b

## Note
The final sections of this tutorial require access to Cloud DataRobot (app.datarobot.com or app.eu.datarobot.com).

## Agenda
In this tutorial, we'll learn:
1. How to create a custom task using simple Python classes
2. How to test your Python class
3. How to use the DRUM CLI tools to test your custom task
4. How to use the DataRobot API to deploy your custom task to the DataRobot cloud for use in projects
5. How to insert a custom task on the DataRobot cloud into a blueprint

## Setup and Requirements [In Progress]
This tutorial assumes a few things about your filepath and prior work.

**Firstly, you need a feature flag enabled:**

Secondly, you should have a folder at the path `~/datarobot-user-models/`. If you put the folder in a different location, make sure you update the `TESTING_PATH` variable. This folder should contain four things:
1. A folder containing your properly configured custom environment.
    In this example, it's named `public_dropin_environments/python3_pytorch/`.

2. A folder containing your properly configured custom model.
    In this example, it's named `model_templates/python3_pytorch/`.

3. The current version of the DataRobot Python Client.
    - Installation instructions for the client can be found here: [DataRobot Python Client Docs](https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.20.0/setup/getting_started.html#installation)
    - Full documentation for the client can be found here: [DataRobot Python Client Docs](https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.20.0/index.html)

4. A test dataset that you can use to test predictions from your custom model.
    In this example, it's stored at `tests/testdata/juniors_3_year_stats_regression.csv`.

It also assumes that you have access to app.datarobot.com.
If you use a different DataRobot installation, use the appropriate credentials and URL.
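
Before going further, it can help to confirm that the folder layout described above is actually in place. The following is a minimal sketch and not part of the original workflow: `find_missing_paths` is a hypothetical helper, and `TESTING_PATH` is assumed here to point at the default location of the repository.

```python
from pathlib import Path

# Assumption: the repository lives at the default location described above.
TESTING_PATH = Path.home() / "datarobot-user-models"

# Relative paths named in this tutorial.
REQUIRED_PATHS = [
    "public_dropin_environments/python3_pytorch",
    "model_templates/python3_pytorch",
    "tests/testdata/juniors_3_year_stats_regression.csv",
]


def find_missing_paths(base_dir, relative_paths):
    """Return the entries of `relative_paths` that do not exist under `base_dir`."""
    base = Path(base_dir).expanduser()
    return [rel for rel in relative_paths if not (base / rel).exists()]


if __name__ == "__main__":
    missing = find_missing_paths(TESTING_PATH, REQUIRED_PATHS)
    if missing:
        print("Missing from", TESTING_PATH, ":", missing)
    else:
        print("All required paths found under", TESTING_PATH)
```

If the folder is stored elsewhere, changing `TESTING_PATH` is the only adjustment this sketch should need.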


## Configuring Models and Environments
For more information on how to properly configure custom models and environments, read the README of our [DataRobot User Models repository](https://github.com/datarobot/datarobot-user-models).

# Building a custom task

First, we need to import a few things.

In [2]:
from pathlib import Path
import pandas as pd
import tensorflow as tf
import pickle

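# Use the Keras bundled with TensorFlow so save/load match the tensorflow.keras layers below
from tensorflow import keras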
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

from sklearn.pipeline import Pipeline


Now let's build a neural network! First we'll lay out the code, then we'll walk through it.

In [35]:
from datarobot_drum.custom_task_interfaces import RegressionEstimatorInterface

class CustomTask(RegressionEstimatorInterface):
    def create_regression_model(self, num_features: int) -> Sequential:
        """
        Create a regression model.

        Parameters
        ----------
        num_features: int
            Number of features in X to be trained with

        Returns
        -------
        model: Sequential
            Compiled regression model
        """
        input_dim, output_dim = num_features, 1

        # create model
        model = Sequential(
            [
                Dense(input_dim, activation="relu", input_dim=input_dim, kernel_initializer="normal"),
                Dense(input_dim // 2, activation="relu", kernel_initializer="normal"),
                Dense(output_dim, kernel_initializer="normal"),
            ]
        )
        model.compile(loss="mse", optimizer="adam", metrics=["mae", "mse"])
        return model


    def build_regressor(self, X: pd.DataFrame):
        """
        Build the scikit-learn-compatible Keras regressor that serves as the estimator.

        Parameters
        ----------
        X: pd.DataFrame
            X containing all the required features for training

        Returns
        -------
        regressor: KerasRegressor
            Wrapper that builds the network with `create_regression_model` and trains it
        """

        return KerasRegressor(
            build_fn=self.create_regression_model,
            num_features=len(X.columns),
            epochs=20,
            batch_size=8,
            verbose=1,
            validation_split=0.33,
            callbacks=[EarlyStopping(patience=20)],
        )

    
    def fit(self, X, y, row_weights=None, **kwargs):
        """ This hook defines how DataRobot will train this task.
        DataRobot runs this hook when the task is being trained inside a blueprint.
        As an output, this hook is expected to create an artifact containing a trained object, that is then used to predict new data.
        The input parameters are passed by DataRobot based on project and blueprint configuration.

        Parameters
        ----------
        X: pd.DataFrame
            Training data that DataRobot passes when this task is being trained.
        y: pd.Series
            Project's target column.
        row_weights: np.ndarray (optional, default = None)
            A list of weights. DataRobot passes it in case of smart downsampling or when weights column is specified in project settings.

        Returns
        -------
        CustomTask
            returns an object instance of class CustomTask that can be used in chained method calls
        """
        self.estimator = self.build_regressor(X)

        tf.random.set_seed(1234)
        self.estimator.fit(X, y)
        return self

    def save(self, artifact_directory):
        """
        Serializes the object and stores it in `artifact_directory`

        Parameters
        ----------
        artifact_directory: str
            Path to the directory to save the serialized artifact(s) to

        Returns
        -------
        self
        """

        # If your estimator is not pickle-able, you can serialize it using its native method,
        # i.e. in this case for keras we use model.save, and then set the estimator to none
        keras.models.save_model(self.estimator.model, Path(artifact_directory) / "model")
        self.estimator.model = None

        # Now that the estimator is none, it won't be pickled with the CustomTask class (i.e. this one)
        with open(Path(artifact_directory) / "artifact.pkl", "wb") as fp:
            pickle.dump(self, fp)

        return self

    @classmethod
    def load(cls, artifact_directory):
        """
        Deserializes the object stored within `artifact_directory`

        Returns
        -------
        cls
            The deserialized object
        """

        with open(Path(artifact_directory) / "artifact.pkl", "rb") as fp:
            custom_task = pickle.load(fp)

        custom_task.estimator.model = keras.models.load_model(Path(artifact_directory) / "model")

        return custom_task

    def predict(self, X, **kwargs):
        """ This hook defines how DataRobot will use the trained object from fit() to score new data.
        DataRobot runs this hook when the task is used for scoring inside a blueprint.
        As an output, this hook is expected to return the predictions.
        The input parameters are passed by DataRobot based on dataset and blueprint configuration.

        Parameters
        ----------
        X: pd.DataFrame
            Data that DataRobot passes for scoring.

        Returns
        -------
        pd.DataFrame
            Returns a dataframe with one row of predictions per row of X.
        """

        return pd.DataFrame(data=self.estimator.predict(X))


There's a lot above, so don't worry about reading through it all now. The key idea is that we have several hooks: `fit`, `save`, `load`, and `predict`. DataRobot calls these hooks automatically to run our custom task. We also have two helper methods, `build_regressor` and `create_regression_model`. We could just as easily have defined these helpers in a separate file and imported them (which is actually what the example does [here]), but since we're in a notebook it makes more sense to lay out everything here.
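
To make the role of the four hooks easier to see without the Keras details, here is a minimal sketch of the same contract. Everything in it is an assumption for illustration: `MeanTask` and `run_hooks` are hypothetical names, the class does not subclass `RegressionEstimatorInterface`, plain lists stand in for DataFrames, and JSON is swapped in for pickle to keep the artifact readable. It only mirrors the order fit, save, load, predict.

```python
import json
import tempfile
from pathlib import Path


class MeanTask:
    """Toy stand-in for a custom task: always predicts the mean of the training target."""

    def fit(self, X, y, row_weights=None, **kwargs):
        # Learn a single number from the target and return self, like the real hook.
        self.mean_ = sum(y) / len(y)
        return self

    def save(self, artifact_directory):
        # Write everything needed for scoring into the artifact directory.
        with open(Path(artifact_directory) / "artifact.json", "w") as fp:
            json.dump({"mean": self.mean_}, fp)
        return self

    @classmethod
    def load(cls, artifact_directory):
        # Rebuild the trained object from the artifact directory alone.
        with open(Path(artifact_directory) / "artifact.json") as fp:
            state = json.load(fp)
        task = cls()
        task.mean_ = state["mean"]
        return task

    def predict(self, X, **kwargs):
        # One prediction per input row.
        return [self.mean_ for _ in X]


def run_hooks(task_cls, X_train, y_train, X_new):
    """Call the hooks in order: fit, save, load, predict."""
    with tempfile.TemporaryDirectory() as artifact_dir:
        task_cls().fit(X_train, y_train).save(artifact_dir)
        # Only the artifact directory is shared between training and scoring.
        restored = task_cls.load(artifact_dir)
        return restored.predict(X_new)


predictions = run_hooks(MeanTask, [[1], [2], [3]], [10.0, 20.0, 30.0], [[4], [5]])
print(predictions)
```

The point of routing everything through `artifact_dir` is that `predict` should depend only on what `save` wrote, which is the property the real `CustomTask` needs as well.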

Now let's actually use the class above. Since this is an ordinary Python class, all we need to do is create an instance, and then we can test it to make sure our methods work. First, let's grab a dataset and separate out the target column.

In [6]:
df = pd.read_csv("tests/testdata/juniors_3_year_stats_regression.csv")

y = df['Grade 2014']
X = df.drop(labels=['Grade 2014'], axis=1)

Now let's train our model!

In [36]:
task = CustomTask()

In [None]:
task = task.fit(X,y)

TODO: emphasize that save is critically important. Maybe walk through what DataRobot will do and what it needs to work ?

Since `fit` returns the `CustomTask` instance itself, we can call `save` on it directly. This creates a serialized version of our model. Within DataRobot, this is how the model is handed off between the `fit` and `predict` hooks, which actually run in separate containers. We can test that here by saving our model, then loading it and running `predict` to make sure it works.

In [38]:
task.save(".")

INFO:tensorflow:Assets written to: model/assets


<__main__.CustomTask at 0x1864ef450>

In [39]:
task = CustomTask.load(".")

In [40]:
task.predict(X)



| | 0 |
|---|---|
| 0 | 28.767523 |
| 1 | 27.769978 |
| 2 | 28.650490 |
| 3 | 25.763115 |
| 4 | 30.636791 |
| ... | ... |
| 1472 | 28.275587 |
| 1473 | 27.721256 |
| 1474 | 32.391720 |
| 1475 | 24.609760 |
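
Eyeballing the predictions shows that `load` produced something usable, but it does not show that the reloaded task predicts the same values as the one we trained. One way to check that is sketched below. `max_round_trip_diff` is a hypothetical helper, not part of DRUM or the DataRobot client; it assumes only that the task exposes `save`, `load`, and `predict` as defined above.

```python
import tempfile


def max_round_trip_diff(task, X, to_floats=list):
    """Save `task`, reload it, and return the largest absolute change in its predictions.

    `to_floats` converts whatever `predict` returns into a flat sequence of floats.
    """
    # Predict first: save() above sets estimator.model to None, so the
    # in-memory task may not be usable for scoring after it has been saved.
    before = to_floats(task.predict(X))
    with tempfile.TemporaryDirectory() as artifact_dir:
        task.save(artifact_dir)
        restored = type(task).load(artifact_dir)
        after = to_floats(restored.predict(X))
    if len(before) != len(after):
        raise ValueError("prediction count changed after the save/load round trip")
    return max((abs(b - a) for b, a in zip(before, after)), default=0.0)
```

With a freshly fitted task from this notebook, a call could look like `max_round_trip_diff(task, X, to_floats=lambda df: df[0].tolist())`, where column `0` is the single prediction column shown above. A result at or near zero suggests the artifact captures everything `predict` needs.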


TODO: mention what they'll need to copy into custom.py (have a separate folder for this example so they can see the difference between notebook land and custom.py)