## 2. Define a Custom Model and log it to Snowflake Model Registry

In this notebook we will define a Custom Model that wraps the PyCaret model we trained in the previous step, and log it to the Snowflake Model Registry.

### Import Libraries

In [None]:
from snowflake.snowpark import Session
from snowflake.snowpark.version import VERSION

from snowflake.ml.registry import Registry
from snowflake.ml.model import custom_model
from snowflake.ml.model import model_signature

from pycaret.classification import predict_model, load_model

import pandas as pd
import json
import os
import shutil

# warning suppression
import warnings; warnings.simplefilter('ignore')

### Establish Secure Connection to Snowflake

*Other connection options include Username/Password, MFA, OAuth, Okta, SSO. For more information, refer to the [Python Connector](https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-example) documentation.*

In [None]:
# Make a Snowpark Connection

################################################################################################################
#  You can also use the SnowSQL Client to configure your connection params:
#  https://docs.snowflake.com/en/user-guide/snowsql-install-config.html
#
#  >>> from snowflake.ml.utils import connection_params
#  >>> session = Session.builder.configs(connection_params.SnowflakeLoginOptions()
#  >>> ).create()   
#
#  NOTE: If you have named connection params then specify the connection name
#  Example:
#  
#  >>> session = Session.builder.configs(
#  >>> connection_params.SnowflakeLoginOptions(connection_name='connections.snowml')
#  >>> ).create()
#
#################################################################################################################

# Edit connection.json before creating the session object below
# Create the Snowflake Session object (the file is closed again once it has been read)
with open('connection.json') as f:
    connection_parameters = json.load(f)
session = Session.builder.configs(connection_parameters).create()

snowflake_environment = session.sql('SELECT current_user(), current_version()').collect()
snowpark_version = VERSION

# Current Environment Details
print('\nConnection Established with the following parameters:')
print('User                        : {}'.format(snowflake_environment[0][0]))
print('Role                        : {}'.format(session.get_current_role()))
print('Database                    : {}'.format(session.get_current_database()))
print('Schema                      : {}'.format(session.get_current_schema()))
print('Warehouse                   : {}'.format(session.get_current_warehouse()))
print('Snowflake version           : {}'.format(snowflake_environment[0][1]))
print('Snowpark for Python version : {}.{}.{}'.format(snowpark_version[0],snowpark_version[1],snowpark_version[2]))
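
The contents of `connection.json` are not shown in this notebook. As a minimal sketch, assuming the file holds the usual Snowpark connection keys (`account`, `user`, `password`, `role`, `warehouse`, `database`, `schema`), a small helper could check the file before a session is created. The helper name `load_connection_parameters`, the list of required keys, and the placeholder values are illustrative assumptions, not part of the quickstart:

```python
import json
import os
import tempfile

# Assumed minimal set of keys; other authentication methods need other keys.
REQUIRED_KEYS = ("account", "user", "password")


def load_connection_parameters(path):
    """Load connection parameters from a JSON file and check the required keys."""
    with open(path, encoding="utf-8") as f:
        params = json.load(f)
    missing = [key for key in REQUIRED_KEYS if not params.get(key)]
    if missing:
        raise ValueError("Missing connection parameters: " + ", ".join(missing))
    return params


# Demonstration with a throwaway file that only holds placeholder values
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "connection.json")
with open(demo_path, "w", encoding="utf-8") as f:
    json.dump(
        {
            "account": "<account_identifier>",
            "user": "<user>",
            "password": "<password>",
            "role": "<role>",
            "warehouse": "<warehouse>",
            "database": "<database>",
            "schema": "<schema>",
        },
        f,
    )

demo_params = load_connection_parameters(demo_path)
print(sorted(demo_params))
```

The returned dictionary can be passed to `Session.builder.configs(...)` in the same way as `connection_parameters` above.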

The first step is to create a [CustomModel](https://docs.snowflake.com/en/developer-guide/snowpark-ml/snowpark-ml-mlops-custom-models) class that will be used in Snowflake when calling the methods/functions of the model.

In this quickstart we will only support the **predict** function, but if we want to support additional functions, we would specify those as methods of our class.

The **__init__** method is where we load the model. We also need to change the *memory* directory that the model uses to */tmp/*, because when the model runs in Snowflake it executes on the warehouse (WH) nodes, and on those we only have access to the */tmp* directory. **This is only needed for a PyCaret model; if you use another library it might not be needed.**

The **predict** method needs to accept a Pandas DataFrame as input and return a Pandas DataFrame. This is because, when running in Snowflake, the model is called through a [vectorized UDF](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch), where Snowflake converts the input rows to a Pandas DataFrame when calling the method and then converts the returned Pandas DataFrame back into rows.
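
The DataFrame-in, DataFrame-out contract can be illustrated locally. The following is only a sketch of the idea: the `predict` function is a hypothetical stand-in that applies a fixed threshold instead of a PyCaret model, and the row-to-DataFrame conversion is done by hand to mimic what Snowflake does for a vectorized UDF.

```python
import pandas as pd


def predict(X: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical rule in place of a real model: threshold on one column
    labels = (X["LoyalCH"] >= 0.5).map({True: "CH", False: "MM"})
    return pd.DataFrame({"prediction_label": labels})


# A batch of input rows, as a vectorized UDF would receive them
rows = [(1, 0.50), (2, 0.40), (3, 0.68)]

# Rows are converted to a single DataFrame, and predict is called once per batch
batch = pd.DataFrame(rows, columns=["Id", "LoyalCH"])
result = predict(batch)

# The returned DataFrame is converted back into one output row per input row
out_rows = list(result.itertuples(index=False, name=None))
print(out_rows)
```

The important property is that the returned DataFrame has one row per input row, in the same order.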


In [None]:
# Name of the class
class PyCaretModel(custom_model.CustomModel):
    # The init function is used to load the model file
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)
        # The model is saved with a .pkl suffix, and the filename will be part of the properties of the ModelContext
        # we create when logging it to Snowflake. Since the PyCaret load function adds the suffix itself, we
        # need to remove it from the name
        model_dir = self.context.path("model_file")[:-4]
        # Load the model
        self.model = load_model(model_dir, verbose=False)
        # When running this model in Snowflake it will use a WH and we do not have access to the /var/ directory on the nodes so
        # we need to change to a directory we have access to, in this case /tmp/
        self.model.memory='/tmp/' 

    @custom_model.inference_api
    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
        model_output = predict_model(self.model, data=X)
        # We will return both the predicted label and the score, as two separate columns
        res_df = pd.DataFrame({"prediction_label": model_output['prediction_label'], "prediction_score": model_output['prediction_score']})
        
        return res_df
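
The `[:-4]` slice in `__init__` assumes that the artifact path always ends in `.pkl`. As a sketch of a slightly more defensive alternative, the suffix can be checked before it is removed. The helper `strip_pkl_suffix` and the example path are hypothetical and not part of the quickstart:

```python
import os


def strip_pkl_suffix(path: str) -> str:
    """Return the path without its .pkl suffix, failing loudly on anything else."""
    root, ext = os.path.splitext(path)
    if ext.lower() != ".pkl":
        raise ValueError(f"Expected a .pkl file, got: {path}")
    return root


example_path = "/tmp/juice_best_model.pkl"  # illustrative path only
print(strip_pkl_suffix(example_path))

# For a path that does end in .pkl, the result matches the slice used in the class
assert strip_pkl_suffix(example_path) == example_path[:-4]
```

Inside the class this would replace the slice, for example `model_dir = strip_pkl_suffix(self.context.path("model_file"))`.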

We can now use this CustomModel class every time we want to log a PyCaret ClassificationExperiment to the Snowflake Model Registry.

Before logging the model we need to define the **ModelContext**, which holds references to the objects the model needs. In this case we add a reference to where the *juice_best_model.pkl* file is stored locally, so it can be uploaded to Snowflake when we log the model to the Snowflake Model Registry.

In [None]:
pycaret_mc = custom_model.ModelContext(
	models={ # This should be for models/objects that are supported by Model Registry OOTB.
	},
	artifacts={ # Everything not supported needs to be here
		'model_file': 'juice_best_model.pkl',
	}
)

We can now create a new object based on our **PyCaretModel** class and test it before logging it to the Snowflake Model Registry.

Since we also need to generate a **ModelSignature**, we can reuse the test data, as the **test_pd** DataFrame, for generating the input part.

In [None]:
test_data = [
    [1,237,1,1.75,1.99,0.00,0.00,0,0,0.500000,1.99,1.75,0.24,'No',0.000000,0.000000,0.24,1],
    [2,239,1,1.75,1.99,0.00,0.30,0,1,0.600000,1.69,1.75,-0.06,'No',0.150754,0.000000,0.24,1],
    [3,245,1,1.86,2.09,0.17,0.00,0,0,0.680000,2.09,1.69,0.40,'No',0.000000,0.091398,0.23,1],
    [4,227,1,1.69,1.69,0.00,0.00,0,0,0.400000,1.69,1.69,0.00,'No',0.000000,0.000000,0.00,1],
    [5,228,7,1.69,1.69,0.00,0.00,0,0,0.956535,1.69,1.69,0.00,'Yes',0.000000,0.000000,0.00,0]
]
col_nms = ['Id','WeekofPurchase','StoreID','PriceCH','PriceMM','DiscCH','DiscMM','SpecialCH','SpecialMM'
           ,'LoyalCH','SalePriceMM','SalePriceCH','PriceDiff','Store7','PctDiscMM','PctDiscCH','ListPriceDiff','STORE']

test_pd = pd.DataFrame(test_data, columns=col_nms)
test_pd

We will store the output from the **predict** call in a Pandas DataFrame so it can be used for generating the output part of the **ModelSignature**.

In [None]:
my_pycaret_model = PyCaretModel(pycaret_mc)
output_pd = my_pycaret_model.predict(test_pd)
output_pd

Before logging the model we need to provide a **Model Signature**. A **Model Signature** can be created using sample data for the input and output by using the *model_signature.infer_signature* function.

In this case we can use the **test_pd** Pandas DataFrame as the input_data and **output_pd** Pandas DataFrame as the output.

In [None]:
predict_sign = model_signature.infer_signature(input_data=test_pd, output_data=output_pd)
predict_sign
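
To make the role of the sample data more concrete: a signature essentially records the column names and the type of each column for the input and the output. The following is a conceptual sketch in plain pandas and does not reproduce the logic of `infer_signature`; the helper `describe_columns` and its coarse type names are hypothetical.

```python
import pandas as pd
from pandas.api import types as ptypes


def describe_columns(df: pd.DataFrame) -> list:
    """Return (column name, coarse type) pairs for a sample DataFrame."""
    described = []
    for name, dtype in df.dtypes.items():
        if ptypes.is_bool_dtype(dtype):
            kind = "boolean"
        elif ptypes.is_integer_dtype(dtype):
            kind = "integer"
        elif ptypes.is_float_dtype(dtype):
            kind = "float"
        else:
            kind = "non-numeric"
        described.append((name, kind))
    return described


# A reduced version of the test data, with one column per type
sample = pd.DataFrame({"Id": [1, 2], "PriceCH": [1.75, 1.86], "Store7": ["No", "Yes"]})
for name, kind in describe_columns(sample):
    print(f"{name}: {kind}")
```

This is also why the sample data should contain every column the model expects, with representative values: a column that is missing or has an unexpected type in the sample would be missing or mistyped in the inferred signature.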

We can now log the model, using the model signature for the predict function. To indicate which function of the model the signature applies to, we use the function name as the key in the **signatures** parameter.

There can be multiple models under the same **model_name** as long as the **version_name** is different. 
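
Because each version under a **model_name** needs its own **version_name**, it can help to derive the next name from the ones already in use. The sketch below assumes names of the form `version_<n>`, as used in this notebook. The helper `next_version_name` is hypothetical, and the list of existing names is passed in by hand; how it is obtained from the registry is left out. Matching is case-insensitive because Snowflake stores unquoted identifiers in upper case.

```python
import re


def next_version_name(existing_versions, prefix="version_"):
    """Return prefix + (highest existing number + 1), starting at 1."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$", re.IGNORECASE)
    numbers = []
    for name in existing_versions:
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return f"{prefix}{max(numbers, default=0) + 1}"


# Names that do not follow the pattern are ignored
print(next_version_name(["VERSION_1", "version_2", "baseline"]))
print(next_version_name([]))
```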

It is also possible to add metrics using the **metrics** parameter, but in this case we will not do that.

In [None]:
# Create a model registry connection using the Snowpark session object. The current database and schema are used for storing the model.
snowml_registry = Registry(session)

custom_mv = snowml_registry.log_model(
    my_pycaret_model,
    model_name="pycaret_juice",
    version_name="version_1",
    conda_dependencies=["pycaret==3.0.2", "scipy==1.11.4", "joblib==1.2.0"],
    options={"relax_version": False},
    signatures={"predict": predict_sign},
    comment = 'PyCaret ClassificationExperiment using the CustomModel API'
)
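
The versions in **conda_dependencies** are pinned by hand above. As a sketch, the pins could instead be read from the local environment with the standard library, so that the logged model uses the same versions it was tested with. The helper `pinned_requirements` is hypothetical. Note that it reports pip distribution names and versions, which are assumed here to match the conda package names; that is not guaranteed, and the chosen versions must also be available in the Snowflake conda channel.

```python
from importlib import metadata


def pinned_requirements(packages, lookup=metadata.version):
    """Return 'name==version' strings for the packages installed locally."""
    pins = []
    for name in packages:
        try:
            pins.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Skip packages that are not installed in this environment
            continue
    return pins


# Prints only the packages that are actually installed locally
print(pinned_requirements(["pycaret", "scipy", "joblib"]))
```

The resulting list could then be reviewed and passed as `conda_dependencies` to `log_model`.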

We can use **show_models** to check that the model is available in the Model Registry.

In [None]:
snowml_registry.show_models()

We can now use the logged model to do inference, using the model version object returned from **log_model**, **custom_mv**.

To test it, we can create a Snowpark DataFrame with the test data we defined previously and then use the **run** method of the model version to get back a Snowpark DataFrame with the predictions added (**run** can also take a Pandas DataFrame directly).

Using the **show_functions** method on the model version object will show us which methods the model supports and what the expected input and output are.

In [None]:
custom_mv.show_functions()

Create a Snowpark DataFrame and use the model on it.

In [None]:
snowpark_df = session.create_dataframe(test_data, schema=col_nms)

custom_mv.run(snowpark_df).show()

You have now successfully deployed a PyCaret model to Snowflake using the Model Registry. If you go to Snowsight (the Snowflake GUI), you should see the model under **AI & ML -> Models**. If you do not see it, make sure you are using the **ACCOUNTADMIN** role or the role you used to log the model.

If you want to use the model from SQL, you could use the following SQL:
```SQL
SELECT 
 pycaret_juice!predict(*) as predict_dict,
 predict_dict['prediction_label']::text as prediction_label,
 predict_dict['prediction_score']::double as prediction_score
from pycaret_input_data;
```
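
The statement above calls the default version of the model. As a sketch, the same query can be built in Python and optionally pinned to a specific version. This assumes the `WITH <alias> AS MODEL <model> VERSION <version>` form for selecting a version, which should be checked against the Snowflake documentation. The helper `build_predict_query` is hypothetical, and it does no validation or quoting of the identifiers it is given.

```python
def build_predict_query(model_name, table_name, version_name=None, method="predict"):
    """Build a SQL statement that calls a model method on every row of a table."""
    if version_name is None:
        # Call the method on the model's default version
        return f"SELECT {model_name}!{method}(*) AS predict_dict FROM {table_name}"
    # Bind a specific version to the alias mv, then call the method on it
    return (
        f"WITH mv AS MODEL {model_name} VERSION {version_name} "
        f"SELECT mv!{method}(*) AS predict_dict FROM {table_name}"
    )


query = build_predict_query("pycaret_juice", "pycaret_input_data", version_name="version_1")
print(query)
# With an active Snowpark session, the query could be run as: session.sql(query).show()
```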