# Standard Transformer Example

This notebook demonstrates how to deploy a PyFunc model together with a standard transformer. The PyFunc model is an echo model that simply returns the request it receives from the transformer. The transformer has preprocess and postprocess pipelines that include Feast feature retrieval, variable declaration, and table creation.

## Requirements

- Authenticated to gcloud (```gcloud auth application-default login```)

In [None]:
!pip install --upgrade -r requirements.txt > /dev/null

In [None]:
import warnings
warnings.filterwarnings('ignore')

## 1. Initialize Merlin

### 1.1 Set Merlin Server

In [None]:
import merlin
print(merlin.__version__)

merlin.set_url("https://my-merlin-domain/api/merlin")

### 1.2 Set Active Project

`project` represents a real-life project. A project may contain multiple models.

`merlin.set_project(<project-name>)` sets the active project to the one whose name matches the argument. You can only select an existing project. To create a new project, use the MLP UI.

In [None]:
PROJECT_NAME = "sample"

merlin.set_project(PROJECT_NAME)

### 1.3 Set Active Model

`model` represents an abstract ML model. Conceptually, a `model` in Merlin is similar to a class in a programming language. To instantiate a `model`, you create a `model_version`.

Each `model` has a type. Merlin currently supports the following model types: sklearn, xgboost, tensorflow, pytorch, and user-defined model (i.e. pyfunc model).

`model_version` represents a snapshot of a particular `model` iteration. You can attach information such as metrics and tags to a given `model_version`, and deploy it as a model service.

`merlin.set_model(<model_name>, <model_type>)` sets the active model to the given name. If no model with that name exists, a new one is created.

In [None]:
from merlin.model import ModelType

MODEL_NAME = "transformer-upi"

merlin.set_model(MODEL_NAME, ModelType.PYFUNC)

## 2. Create PyFunc Model

To create a PyFunc model, extend the `merlin.PyFuncModel` class and implement `initialize` and one of `infer` or `upiv1_infer`.

`initialize` is called once during model initialization. Its argument is a dictionary that maps each artifact name to its URL. The keys are the same as those passed to `log_pyfunc_model`.

`infer` is the prediction method to implement when the `HTTP_JSON` protocol is used (the default protocol). It accepts a dictionary that represents the incoming request body and should return a dictionary that corresponds to the response body of the prediction result.

`upiv1_infer` is the prediction method to implement when the `UPI_V1` protocol is used.

In the following example we create a PyFunc model called `SimpleForwarder`, which simply returns the `prediction_table` field of the request as the `prediction_result_table` field of the response.


In [None]:
import pandas as pd
import grpc
from caraml.upi.utils import df_to_table
from caraml.upi.v1 import upi_pb2
from caraml.upi.v1 import type_pb2, variable_pb2
from merlin.model import PyFuncModel

class SimpleForwarder(PyFuncModel):
    # Target name echoed back in every UPI response
    target_name = "probability"

    def infer(self, request: dict, **kwargs):
        # HTTP_JSON protocol: return the request body unchanged
        return request

    def upiv1_infer(self, request: upi_pb2.PredictValuesRequest,
                    context: grpc.ServicerContext) -> upi_pb2.PredictValuesResponse:
        # UPI_V1 protocol: forward the prediction table as the result table
        return upi_pb2.PredictValuesResponse(prediction_result_table=request.prediction_table,
                                             target_name=self.target_name)
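
`SimpleForwarder` has no artifacts, so it does not show how `initialize` and `infer` fit together. The sketch below illustrates the contract described above with a dependency-free stand-in. `RowSumModel`, the `instances` and `predictions` keys, and the artifact path are all hypothetical, and a real model would extend `PyFuncModel` instead of being a plain class.

```python
from typing import Any, Dict


class RowSumModel:
    """Hypothetical stand-in with the same method shapes as a PyFunc model.

    In a real notebook this class would extend PyFuncModel; it is kept
    dependency-free here so the sketch runs without Merlin installed.
    """

    def initialize(self, artifacts: Dict[str, str]) -> None:
        # Called once at start-up with {artifact name: artifact location}.
        # A real model would load files from these locations.
        self.artifacts = dict(artifacts)

    def infer(self, request: Dict[str, Any], **kwargs) -> Dict[str, Any]:
        # HTTP_JSON contract: request body in, response body out, both dicts.
        scores = [sum(row) for row in request["instances"]]
        return {"predictions": scores}


model = RowSumModel()
model.initialize({"weights": "/tmp/hypothetical/weights.bin"})
response = model.infer({"instances": [[1, 2], [3, 4]]})
print(response)
```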


## 3. Deploy Model

To deploy the model, we create an iteration of the model (by creating a `model_version`), upload the serialized model to MLP, and then deploy it.

### 3.1 Create Model Version and Upload

`merlin.new_model_version()` is a convenient way to create a model version and start its development process. It is equivalent to the following code:

```
v = model.new_model_version()
v.start()
v.log_pyfunc_model(model_instance=EnsembleModel(), 
                conda_env="env.yaml", 
                artifacts={"xgb_model": model_1_path, "sklearn_model": model_2_path})
v.finish()
```
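
The relationship between the explicit `start()`/`finish()` calls and the `with` form used later in this notebook can be illustrated with a toy class. This is a sketch only: `ToyVersion` and its `log` method are hypothetical stand-ins, not Merlin's classes, and it assumes the `with` form calls `start()` on entry and `finish()` on exit.

```python
class ToyVersion:
    """Hypothetical stand-in for a model version; not Merlin's class."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def log(self, name):
        self.events.append(f"log:{name}")

    def finish(self):
        self.events.append("finish")

    def __enter__(self):
        # Entering the with-block starts the version.
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the with-block finishes it, even if the body raised.
        self.finish()
        return False  # do not swallow exceptions


# Explicit form: every step is called by hand.
explicit = ToyVersion()
explicit.start()
explicit.log("model")
explicit.finish()

# Context-manager form: start and finish happen automatically.
with ToyVersion() as managed:
    managed.log("model")

print(explicit.events == managed.events)
```

The `with` form has the practical advantage that the finishing step still runs when the body raises an exception.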

To upload a PyFunc model, provide the following arguments:
1. `model_instance` is the instance of the PyFunc model. The model has to extend `merlin.PyFuncModel`.
2. `conda_env` is the path to the conda environment YAML file. The file must list all dependencies required by the PyFunc model.
3. (Optional) `artifacts` is a dictionary of additional artifacts that you want to include in the model.
4. (Optional) `code_dir` is a list of directories containing Python code that will be loaded during model initialization. This is required when `model_instance` depends on a local Python package.

In [None]:
import merlin
with merlin.new_model_version() as v:    
    merlin.log_pyfunc_model(model_instance=SimpleForwarder(),
                            conda_env="env.yaml",
                            artifacts={})

### 3.2 Deploy Model and Transformer

To deploy a model and its transformer, pass a `transformer` object to the `deploy()` function. Each deployed model version has its own generated URL. The `transformer` object is initialized from a YAML config file. For details on the transformer config, refer to [this documentation](https://go-jek.atlassian.net/wiki/spaces/DSP/pages/2158857203/Standard+Transformer+Design). You can also use the Merlin UI to create the transformer configuration.

In [None]:
!cat "config.yaml"

In [None]:
from merlin.transformer import StandardTransformer
from merlin.protocol import Protocol


# Create a transformer object from the YAML config file
transformer_config_path = "config.yaml"
transformer = StandardTransformer(config_file=transformer_config_path,
                                  enabled=True)

endpoint = merlin.deploy(v, transformer=transformer, protocol=Protocol.UPI_V1)

In [None]:
from caraml.upi.v1 import upi_pb2
from typing import List

def create_simple_forwarder_request() -> upi_pb2.PredictValuesRequest:
    target_name = SimpleForwarder.target_name
    cols = ["id", "name", "vehicle", "previous_vehicle", "rating", "test_time", "row_number"]
    indices = ["row1", "row2"]
    rows = [
        [1, "driver-1", "motorcycle", "suv", 4.0, 90, 0],
        [2, "driver-2", "sedan", "mpv", 3.0, 90, 1]
    ]
    driver_df = pd.DataFrame(columns=cols, data=rows, index=indices)
    driver_table = df_to_table(driver_df, "driver_table")
    variables: List[variable_pb2.Variable] = [
        variable_pb2.Variable(name="customer_id", type=type_pb2.TYPE_INTEGER, integer_value=1111)
    ]
    return upi_pb2.PredictValuesRequest(
        target_name=target_name,
        transformer_input=upi_pb2.TransformerInput(
            tables=[driver_table],
            variables=variables
        )
    )

def simple_forwarder_response() -> upi_pb2.PredictValuesResponse:
    target_name = SimpleForwarder.target_name
    cols = ["customer_id", "name", "rank", "rating", "vehicle", "previous_vehicle"]
    indices = ["row2", "row1"]
    rows = [
        [1111, "driver-2", 2.5, 0.5, 2, 3],
        [1111, "driver-1", -2.5, 0.75, 0, 1]
    ]
    response_df = pd.DataFrame(columns=cols, data=rows, index=indices)
    response_table = df_to_table(response_df, "transformed_driver_table")
    return upi_pb2.PredictValuesResponse(
        target_name=target_name,
        prediction_result_table=response_table
    )

### 3.3 Send Test Request

In [None]:
import grpc
from time import sleep

from caraml.upi.v1 import upi_pb2, upi_pb2_grpc

channel = grpc.insecure_channel(f"{endpoint.url}:80")
stub = upi_pb2_grpc.UniversalPredictionServiceStub(channel)

print(endpoint.url)
sleep(5)

request = create_simple_forwarder_request()
print(f"request ---------> {request}")


In [None]:
response = stub.PredictValues(request=request)
print(f"response ---------> {response}")
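
The fixed `sleep(5)` above gives the endpoint a moment to become reachable, but the first call can still fail if start-up takes longer. One possible alternative is to retry the call a few times. The helper below is a generic, standard-library sketch; `call_with_retry` and its parameters are hypothetical names, not part of Merlin or gRPC.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T],
                    attempts: int = 5,
                    delay_s: float = 1.0,
                    retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call fn() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_s)
    raise AssertionError("unreachable")


# Demo with a fake call that fails twice before succeeding.
calls = {"count": 0}


def flaky_call() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("endpoint not ready yet")
    return "ok"


result = call_with_retry(flaky_call, attempts=5, delay_s=0.01)
print(result, calls["count"])
```

In this notebook the wrapped call could be `lambda: stub.PredictValues(request=request)` with `retry_on=(grpc.RpcError,)`, since gRPC client errors are raised as `grpc.RpcError`.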

## 4. Clean Up

### 4.1 Delete Deployment

In [None]:
merlin.undeploy(v)