In [20]:
#Determine which model to load
roberta=True
msmarco_distilbertv4 = False

## 2. Define a Custom Model and log it to Snowflake Model Registry

In this notebook we define a custom model that uses the PyCaret model we trained in the previous step, and log it to the Snowflake Model Registry.

### Import Libraries

In [21]:
!python3 -m pip --no-cache-dir install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
!python3 -m pip install sentence-transformers


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.1.1[0m[39;49m -> [0m[32;49m24.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [22]:
from snowflake.snowpark import Session
#Default SF connection details defined in ~/.config/snowflake/connection.toml
from transformers import AutoTokenizer, AutoModel
import torch

In [23]:
from snowflake.snowpark import Session
from snowflake.snowpark.version import VERSION

from snowflake.ml.registry import Registry
from snowflake.ml.model import custom_model
from snowflake.ml.model import model_signature


import pandas as pd
import json
import os
import shutil

# warning suppression
import warnings; warnings.simplefilter('ignore')

### Establish Secure Connection to Snowflake

*Other connection options include Username/Password, MFA, OAuth, Okta, SSO. For more information, refer to the [Python Connector](https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-example) documentation.*
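
One way to avoid hardcoding credentials in the notebook is to read them from environment variables. The sketch below is an illustration only: the `SNOWFLAKE_` prefix, the variable names, and the helper `connection_parameters_from_env` are hypothetical and not part of Snowpark. The resulting dict uses the same keys as the `connection_parameters` dict in this notebook.

```python
import os

REQUIRED_KEYS = ("account", "user", "password")
OPTIONAL_KEYS = ("role", "warehouse", "database", "schema")


def connection_parameters_from_env(env=os.environ, prefix="SNOWFLAKE_"):
    """Build a connection dict from variables such as SNOWFLAKE_ACCOUNT.

    The variable names are an assumption made for this sketch.
    """
    params = {}
    for key in REQUIRED_KEYS + OPTIONAL_KEYS:
        value = env.get(prefix + key.upper())
        if value:
            params[key] = value
    # Fail early if a mandatory setting is absent
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise KeyError(f"missing connection settings: {missing}")
    return params


# Example with an explicit mapping instead of the real environment
sample_env = {
    "SNOWFLAKE_ACCOUNT": "my_account",
    "SNOWFLAKE_USER": "my_user",
    "SNOWFLAKE_PASSWORD": "secret",
    "SNOWFLAKE_ROLE": "ACCOUNTADMIN",
}
params = connection_parameters_from_env(sample_env)
```

The returned dict could then be passed to `Session.builder.configs(...)` in the same way as the hand-written dict.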

In [24]:
# Make a Snowpark Connection

################################################################################################################
#  You can also use the SnowSQL Client to configure your connection params:
#  https://docs.snowflake.com/en/user-guide/snowsql-install-config.html
#
#  >>> from snowflake.ml.utils import connection_params
#  >>> session = Session.builder.configs(connection_params.SnowflakeLoginOptions()
#  >>> ).create()   
#
#  NOTE: If you have named connection params then specify the connection name
#  Example:
#  
#  >>> session = Session.builder.configs(
#  >>> connection_params.SnowflakeLoginOptions(connection_name='connections.snowml')
#  >>> ).create()
#
#################################################################################################################
connection_parameters = {
  "account": "",
  "user": "",
  "password": "",
  "role": "ACCOUNTADMIN",
  "warehouse": "MRCM_HOL_WH_SP",
  "database": "MRCM_HOL_DB",
  "schema": "MRCM_HOL_SCHEMA"
}
# Edit the connection.json before creating the session object below
# Create Snowflake Session object
# connection_parameters = json.load(open('connection.json'))
# session = Session.builder.configs(connection_parameters).create()
session = Session.builder.configs(connection_parameters).create()
snowflake_environment = session.sql('SELECT current_user(), current_version()').collect()
snowpark_version = VERSION

# Current Environment Details
print('\nConnection Established with the following parameters:')
print('User                        : {}'.format(snowflake_environment[0][0]))
print('Role                        : {}'.format(session.get_current_role()))
print('Database                    : {}'.format(session.get_current_database()))
print('Schema                      : {}'.format(session.get_current_schema()))
print('Warehouse                   : {}'.format(session.get_current_warehouse()))
print('Snowflake version           : {}'.format(snowflake_environment[0][1]))
print('Snowpark for Python version : {}.{}.{}'.format(snowpark_version[0],snowpark_version[1],snowpark_version[2]))


Connection Established with the following parameters:
User                        : 
Role                        : "ACCOUNTADMIN"
Database                    : "MRCM_HOL_DB"
Schema                      : "MRCM_HOL_SCHEMA"
Warehouse                   : "MRCM_HOL_WH_SP"
Snowflake version           : 8.24.1
Snowpark for Python version : 1.19.0


In [40]:
roberta_path = os.getcwd() + '/roberta_base/'
for file in os.listdir(roberta_path):
    print(file)

#Determine which model to load
if roberta:
    for file in os.listdir(roberta_path):
        try:
            print(f'{roberta_path}{file}')
            session.sql(f"PUT file://{roberta_path}{file} @MODELFILES/roberta AUTO_COMPRESS=False OVERWRITE=False").show()
        except Exception as e:
            print(f'passing {file}, {e}')

vocab.json
merges.txt
tokenizer.json
config.json
tokenizer_config.json
dict.txt
pytorch_model.bin
/home//GitRepos/snowconnect/sfguide-deploying-custom-models-snowflake-model-registry-main/roberta_base/vocab.json
--------------------------------------------------------------------------------------------------------------------------------
|"source"    |"target"    |"source_size"  |"target_size"  |"source_compression"  |"target_compression"  |"status"  |"message"  |
--------------------------------------------------------------------------------------------------------------------------------
|vocab.json  |vocab.json  |898823         |898832         |NONE                  |NONE                  |UPLOADED  |           |
--------------------------------------------------------------------------------------------------------------------------------

/home//GitRepos/snowconnect/sfguide-deploying-custom-models-snowflake-model-registry-main/roberta_base/merges.txt
----------------------------

In [41]:
if msmarco_distilbertv4:
    try:
        print(os.getcwd())
        file_pvar = os.getcwd() + "/msmarco_distilbertv4/pytorch_model.bin"
        print(file_pvar)
        session.sql(f"PUT file://{file_pvar} @MODELFILES AUTO_COMPRESS=False OVERWRITE=False").show()
    except Exception as e:
        # Report the failure instead of silently swallowing it
        print(f'passing, {e}')

In [42]:
#Setup the session
session.sql("USE ROLE ACCOUNTADMIN").show()
list_sql_commands=[
    "CREATE WAREHOUSE IF NOT EXISTS MRCM_HOL_WH_SP", 
    "CREATE DATABASE if not exists MRCM_HOL_DB ",
    "CREATE SCHEMA if not exists MRCM_HOL_SCHEMA "
]
for i in list_sql_commands:
    session.sql(i).show()

------------------------------------
|"status"                          |
------------------------------------
|Statement executed successfully.  |
------------------------------------

------------------------------------------------------
|"status"                                            |
------------------------------------------------------
|MRCM_HOL_WH_SP already exists, statement succee...  |
------------------------------------------------------

----------------------------------------------------
|"status"                                          |
----------------------------------------------------
|MRCM_HOL_DB already exists, statement succeeded.  |
----------------------------------------------------

------------------------------------------------------
|"status"                                            |
------------------------------------------------------
|MRCM_HOL_SCHEMA already exists, statement succe...  |
----------------------------------------------------

First step is to create a [CustomModel](https://docs.snowflake.com/en/developer-guide/snowpark-ml/model-registry/custom-models) class that will be used in Snowflake when calling the methods/functions of the model. 

In this quickstart we will only support the **predict** function, but if we want to support additional functions, we would specify those as methods of our class.

The **__init__** method is where we load the model. We also need to change the *memory* directory that the model uses to */tmp/*, because when the model runs in Snowflake it runs on the warehouse nodes, where */tmp* is the only directory we have access to. **This is only needed for a PyCaret model; if you use another library it might not be needed.**
 
The **predict** method needs to accept a Pandas DataFrame as input and also return a Pandas DataFrame. This is because, when running in Snowflake, a [vectorized UDF](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch) is used: Snowflake converts the input rows to a Pandas DataFrame when calling the method and then converts the returned Pandas DataFrame back into rows.
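
The DataFrame-in, DataFrame-out contract can be illustrated without Snowflake. The following is a minimal sketch using plain pandas; the function `predict`, the `text_length` output column, and the sample rows are placeholders made up for this illustration, and the row conversion only mimics what happens around a vectorized call.

```python
import pandas as pd


def predict(batch: pd.DataFrame) -> pd.DataFrame:
    """Toy stand-in for an inference method: one output row per input row."""
    lengths = batch["string_column"].str.len()
    return pd.DataFrame({"text_length": lengths}, index=batch.index)


# Rows as they might arrive from a table (hypothetical sample data)
rows = [("a", "Hello"), ("b", "Snowflake")]

# Step 1: the rows are gathered into one Pandas DataFrame
batch = pd.DataFrame(rows, columns=["id", "string_column"])

# Step 2: the method is called once for the whole batch
result = predict(batch)

# Step 3: the returned DataFrame is turned back into rows
result_rows = result.to_dict(orient="records")
```

The point of the sketch is that the method sees a whole batch at a time and must return exactly one output row per input row.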


In [43]:
if msmarco_distilbertv4:
    from sentence_transformers import SentenceTransformer
    modelPath = os.getcwd() + '/msmarco_distilbertv4/'

    model = SentenceTransformer('msmarco-distilbert-base-v4')
    model.save(modelPath)
    model = SentenceTransformer(modelPath)



In [44]:
if msmarco_distilbertv4:
    from sentence_transformers import SentenceTransformer

    # Custom model class wrapping the SentenceTransformer model
    class ST_Custom_Model(custom_model.CustomModel):
        # The init function is used to load the model files
        def __init__(self, context: custom_model.ModelContext) -> None:
            super().__init__(context)
            self.model = SentenceTransformer(modelPath)
            self.model.memory = '/tmp/'

        @custom_model.inference_api
        def predict(self, sentences_df: pd.DataFrame) -> pd.DataFrame:
            # Debug output: the incoming batch and its text column
            print(sentences_df)
            print(sentences_df.iloc[:, 1])
            data = {"id": [], "embeddings": []}
            # Encode one row at a time; ids are the 0-based row positions
            for counter, (_, row) in enumerate(sentences_df.iterrows()):
                print(row)
                print(type(row))
                data["id"].append(counter)
                # Each cell holds a one-element list, so encode() returns one
                # vector per element; keep the single embedding vector
                data["embeddings"].append(self.model.encode(row['string_column'])[0])
                print(row['string_column'])
                print('printed(j[string_column])')
            print('data dict being printed\n', data)
            res_df = pd.DataFrame.from_dict(data=data, orient='columns')
            res_df.set_index('id', inplace=True)
            print(res_df)
            return res_df

In [48]:
if roberta:
    from transformers import RobertaTokenizer, RobertaModel

    modelPath = os.getcwd() + '/roberta_base/'
    print(modelPath)
    # Sanity check: load the local files and embed a sample sentence
    tokenizer = RobertaTokenizer.from_pretrained(modelPath)
    model = RobertaModel.from_pretrained(modelPath)
    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='pt')
    output = model(**encoded_input)
    # Hidden state of the first token of the first sequence
    print(output[0][0][0])

    class Roberta_Custom_Model(custom_model.CustomModel):
        # The init function is used to load the tokenizer and model files
        def __init__(self, context: custom_model.ModelContext) -> None:
            super().__init__(context)
            self.tokenizer = RobertaTokenizer.from_pretrained(modelPath)
            self.model = RobertaModel.from_pretrained(modelPath)
            self.model.memory = '/tmp/'

        @custom_model.inference_api
        def predict(self, sentences_df: pd.DataFrame) -> pd.DataFrame:
            # Debug output: the incoming batch and its text column
            print(sentences_df)
            print(sentences_df.iloc[:, 1])
            data = {"id": [], "embeddings": []}
            for counter, (_, row) in enumerate(sentences_df.iterrows()):
                print(row)
                print(type(row))
                data["id"].append(counter)
                # Encode the text of the current row
                encoded_input = self.tokenizer(row['string_column'], return_tensors='pt')
                with torch.no_grad():
                    output = self.model(**encoded_input)
                # Assumption: use the first token's hidden state as the
                # embedding, matching the sanity check above
                data['embeddings'].append(output[0][0][0].numpy())
                print(row['string_column'])
                print('printed(j[string_column])')

            print('data dict being printed\n', data)
            res_df = pd.DataFrame.from_dict(data=data, orient='columns')
            res_df.set_index('id', inplace=True)
            print(res_df)
            return res_df

/home/gustav/GitRepos/snowconnect/sfguide-deploying-custom-models-snowflake-model-registry-main/roberta_base/
tensor([-1.1464e-01,  1.1033e-01, -1.4857e-02, -8.8033e-02,  1.1303e-01,
        -5.1192e-02, -3.0091e-04,  2.9412e-02,  2.0843e-02, -9.8690e-02,
        -4.2613e-02,  2.3971e-02,  2.4632e-02, -6.3275e-02,  6.4357e-02,
        -1.2229e-02, -8.5991e-02,  1.2847e-02, -6.8539e-03, -1.2506e-02,
        -1.1441e-01,  1.7809e-02,  1.0354e-02,  1.6203e-01, -3.5990e-02,
         8.6394e-02,  4.4544e-02,  9.8515e-02, -2.6844e-02,  4.7489e-03,
        -7.7530e-02, -9.4126e-02,  8.1290e-02,  1.6278e-02,  1.4995e-02,
         8.3578e-02,  1.0319e-02, -1.9454e-02, -2.4330e-02,  4.3773e-02,
         1.4145e-02,  1.7192e-01,  1.8987e-02,  1.9204e-03,  3.5254e-02,
         3.1255e-02,  1.0304e-02, -6.0790e-02, -3.0112e-02, -9.0907e-03,
         1.0809e-02,  8.7455e-02, -5.5545e-02,  5.9884e-02, -1.5576e-01,
         3.4279e-02,  7.7561e-02,  6.6867e-02,  5.7053e-02, -1.2535e-01,
        -7.654

In [133]:
# #Generate custom model class
# class Transformers_Custom_Model(custom_model.CustomModel):
#     # The init function is used to load the model file
#     def __init__(self, context: custom_model.ModelContext) -> None:
#         super().__init__(context)
#         self.model = SentenceTransformer(modelPath)
#         self.model.memory='/tmp/'
         
#     @custom_model.inference_api    
#     def embedding(self, sentences_df: pd.DataFrame) -> pd.DataFrame:
#         print(sentences_df)
#         print(sentences_df.iloc[:,1][0])
#         embeddings = model.encode(sentences_df.iloc[:,1][0])
#         res_df = pd.DataFrame({"embeddings": embeddings})
#         return res_df

In [134]:
# Assumes the PyCaret classification module, matching the ClassificationExperiment used in this quickstart
from pycaret.classification import load_model, predict_model

# Custom model class wrapping a PyCaret model
class PyCaretModel(custom_model.CustomModel):
    # The init function is used to load the model file
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)
        # The model is saved with a .pkl suffix, and the filename will be part of the properties of the ModelContext
        # we create when logging it to Snowflake. Since the PyCaret load function does not support the suffix, we
        # need to remove it from the name
        model_dir = self.context.path("model_file")[:-4]
        # Load the model
        self.model = load_model(model_dir, verbose=False)
        # When running this model in Snowflake it will use a WH and we do not have access to the /var/ directory on the nodes, so
        # we need to change to a directory we have access to, in this case /tmp/
        self.model.memory = '/tmp/'

    @custom_model.inference_api
    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
        model_output = predict_model(self.model, data=X)
        # We return both the predicted label and the score, as two separate columns
        res_df = pd.DataFrame({"prediction_label": model_output['prediction_label'], "prediction_score": model_output['prediction_score']})

        return res_df

We can now use this CustomModel class every time we want to log a PyCaret ClassificationExperiment to the Snowflake Model Registry.

Before logging the model we need to define the [ModelContext](https://docs.snowflake.com/en/developer-guide/snowpark-ml/reference/latest/api/model/snowflake.ml.model.custom_model.ModelContext), which holds references to the objects we need. In this case we need to add a reference to where we have stored the *juice_best_model.pkl* file locally, so it can be uploaded to Snowflake when we log the model to the Snowflake Model Registry.
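
When a model consists of several files, the artifact references can be collected programmatically. The sketch below is one possible approach and rests on assumptions: the helper `build_artifacts` and the `model_file` key prefix are made up for illustration, and it lists only regular files, leaving open how subdirectories should be handled.

```python
import os
import tempfile


def build_artifacts(directory: str, prefix: str = "model_file") -> dict:
    """Map a unique key to the path of every regular file in `directory`."""
    artifacts = {}
    names = sorted(os.listdir(directory))  # sorted for a stable key order
    files = [n for n in names if os.path.isfile(os.path.join(directory, n))]
    for position, name in enumerate(files):
        # First file gets the bare prefix, later files get a numeric suffix
        key = prefix if position == 0 else f"{prefix}{position}"
        artifacts[key] = os.path.join(directory, name)
    return artifacts


# Demonstrate on a throwaway directory with two placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("config.json", "vocab.txt"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "1_Pooling"))  # subdirectory, skipped by the helper
    demo = build_artifacts(tmp)
    demo_names = {key: os.path.basename(path) for key, path in demo.items()}
```

A dict built this way has the same shape as the `artifacts` argument used with `custom_model.ModelContext` in this notebook.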

In [135]:
model_files=os.listdir('msmarco_distilbertv4')
model_files


['vocab.txt',
 'config_sentence_transformers.json',
 'README.md',
 'tokenizer.json',
 '1_Pooling',
 'config.json',
 'modules.json',
 'sentence_bert_config.json',
 'tokenizer_config.json',
 'model.safetensors',
 'special_tokens_map.json',
 'pytorch_model.bin']

In [136]:
stcustom_mc = custom_model.ModelContext(
    models={ # This should be for models/objects that are supported by Model Registry OOTB.
    },
    artifacts={ # Everything that is not supported needs to be here
        'model_file': model_files,
    }
)

In [137]:
# Build one artifact entry per file: 'model_file', 'model_file1', 'model_file2', ...
data = {}
for counter, file in enumerate(model_files):
    key = f'model_file{counter}' if counter > 0 else 'model_file'
    data[key] = 'msmarco_distilbertv4/' + str(file)
print(data)

json.dumps(data)

{'model_file': 'msmarco_distilbertv4/vocab.txt', 'model_file1': 'msmarco_distilbertv4/config_sentence_transformers.json', 'model_file2': 'msmarco_distilbertv4/README.md', 'model_file3': 'msmarco_distilbertv4/tokenizer.json', 'model_file4': 'msmarco_distilbertv4/1_Pooling', 'model_file5': 'msmarco_distilbertv4/config.json', 'model_file6': 'msmarco_distilbertv4/modules.json', 'model_file7': 'msmarco_distilbertv4/sentence_bert_config.json', 'model_file8': 'msmarco_distilbertv4/tokenizer_config.json', 'model_file9': 'msmarco_distilbertv4/model.safetensors', 'model_file10': 'msmarco_distilbertv4/special_tokens_map.json', 'model_file11': 'msmarco_distilbertv4/pytorch_model.bin'}


'{"model_file": "msmarco_distilbertv4/vocab.txt", "model_file1": "msmarco_distilbertv4/config_sentence_transformers.json", "model_file2": "msmarco_distilbertv4/README.md", "model_file3": "msmarco_distilbertv4/tokenizer.json", "model_file4": "msmarco_distilbertv4/1_Pooling", "model_file5": "msmarco_distilbertv4/config.json", "model_file6": "msmarco_distilbertv4/modules.json", "model_file7": "msmarco_distilbertv4/sentence_bert_config.json", "model_file8": "msmarco_distilbertv4/tokenizer_config.json", "model_file9": "msmarco_distilbertv4/model.safetensors", "model_file10": "msmarco_distilbertv4/special_tokens_map.json", "model_file11": "msmarco_distilbertv4/pytorch_model.bin"}'

In [138]:
stcustom_mc = custom_model.ModelContext(
    models={ # This should be for models/objects that are supported by Model Registry OOTB.
    },
    artifacts={ # Everything that is not supported needs to be here
        'model_file': 'msmarco_distilbertv4/vocab.txt',
        'model_file1': 'msmarco_distilbertv4/config_sentence_transformers.json',
        'model_file2': 'msmarco_distilbertv4/README.md',
        'model_file3': 'msmarco_distilbertv4/tokenizer.json',
        'model_file4': 'msmarco_distilbertv4/1_Pooling',
        'model_file5': 'msmarco_distilbertv4/config.json',
        'model_file6': 'msmarco_distilbertv4/modules.json',
        'model_file7': 'msmarco_distilbertv4/sentence_bert_config.json',
        'model_file8': 'msmarco_distilbertv4/tokenizer_config.json',
        'model_file9': 'msmarco_distilbertv4/model.safetensors',
        'model_file10': 'msmarco_distilbertv4/special_tokens_map.json'
    }
)

We can now create a new object based on our custom model class and test it before logging it to the Snowflake Model Registry.

We also need to generate a [ModelSignature](https://docs.snowflake.com/en/developer-guide/snowpark-ml/reference/latest/api/model/snowflake.ml.model.model_signature.ModelSignature), which is used to infer the input and output columns for a model function (for example predict and predict_prob). We can reuse the **test_data** DataFrame for generating the input part.

In [139]:
test_data = [
    [1,237,1,1.75,1.99,0.00,0.00,0,0,0.500000,1.99,1.75,0.24,'No',0.000000,0.000000,0.24,1],
    [2,239,1,1.75,1.99,0.00,0.30,0,1,0.600000,1.69,1.75,-0.06,'No',0.150754,0.000000,0.24,1],
    [3,245,1,1.86,2.09,0.17,0.00,0,0,0.680000,2.09,1.69,0.40,'No',0.000000,0.091398,0.23,1],
    [4,227,1,1.69,1.69,0.00,0.00,0,0,0.400000,1.69,1.69,0.00,'No',0.000000,0.000000,0.00,1],
    [5,228,7,1.69,1.69,0.00,0.00,0,0,0.956535,1.69,1.69,0.00,'Yes',0.000000,0.000000,0.00,0]
]
col_nms = ['Id','WeekofPurchase','StoreID','PriceCH','PriceMM','DiscCH','DiscMM','SpecialCH','SpecialMM'
           ,'LoyalCH','SalePriceMM','SalePriceCH','PriceDiff','Store7','PctDiscMM','PctDiscCH','ListPriceDiff','STORE']

test_pd = pd.DataFrame(test_data, columns=col_nms)
test_pd

   Id  WeekofPurchase  StoreID  PriceCH  PriceMM  DiscCH  DiscMM  SpecialCH  SpecialMM   LoyalCH  SalePriceMM  SalePriceCH  PriceDiff  Store7  PctDiscMM  PctDiscCH  ListPriceDiff  STORE
0   1             237        1     1.75     1.99     0.0     0.0          0          0       0.5         1.99         1.75       0.24      No        0.0        0.0           0.24      1
1   2             239        1     1.75     1.99     0.0     0.3          0          1       0.6         1.69         1.75      -0.06      No   0.150754        0.0           0.24      1
2   3             245        1     1.86     2.09    0.17     0.0          0          0      0.68         2.09         1.69        0.4      No        0.0   0.091398           0.23      1
3   4             227        1     1.69     1.69     0.0     0.0          0          0       0.4         1.69         1.69        0.0      No        0.0        0.0            0.0      1
4   5             228        7     1.69     1.69     0.0     0.0          0          0  0.956535         1.69         1.69        0.0     Yes        0.0        0.0            0.0      0


In [140]:
#Generate dataframe with id and string column
data_dict = {
    "id": [1, 2, 3],
    "string_column": [["Hello",], ["World",], ["Python",]]
}

df = pd.DataFrame(data_dict)

print(df)



   id string_column
0   1       [Hello]
1   2       [World]
2   3      [Python]


We will store the output from the **predict** call in a Pandas DataFrame so it can be used for generating the output part of the [ModelSignature](https://docs.snowflake.com/en/developer-guide/snowpark-ml/reference/latest/api/model/snowflake.ml.model.model_signature.ModelSignature).

In [141]:
my_stcustom_model = ST_Custom_Model(stcustom_mc)
output_pd = my_stcustom_model.predict(df)
output_pd

   id string_column
0   1       [Hello]
1   2       [World]
2   3      [Python]
0     [Hello]
1     [World]
2    [Python]
Name: string_column, dtype: object
id                     1
string_column    [Hello]
Name: 0, dtype: object
<class 'pandas.core.series.Series'>
['Hello']
printed(j[string_column])
id                     2
string_column    [World]
Name: 1, dtype: object
<class 'pandas.core.series.Series'>
['World']
printed(j[string_column])
id                      3
string_column    [Python]
Name: 2, dtype: object
<class 'pandas.core.series.Series'>
['Python']
printed(j[string_column])
data dict being printed
 {'id': [0, 1, 2], 'embeddings': [array([-1.2736211 ,  0.5415507 ,  0.80620795, -0.5429207 , -0.5382189 ,
       -0.5652241 ,  0.25220737,  0.21757756, -0.64475125, -0.35152468,
       -0.95391375, -0.16957527,  0.11097623,  0.40696692, -0.31347537,
       -0.1854711 , -0.2573649 ,  0.16312367, -0.16383146,  0.7782914 ,
       -0.9097046 , -0.15215512,  0.19534326, -0.2082758 , 

                                           embeddings
id
0   [-1.2736211, 0.5415507, 0.80620795, -0.5429207...
1   [0.10775349, -0.33878946, 1.0631086, -0.027825...
2   [0.3051766, 0.6037056, -0.6238331, -0.6631124,...


Before logging the model we need to provide a **Model Signature**. A **Model Signature** can be created from sample input and output data by using the *model_signature.infer_signature* function.

In this case we use a list of sample sentences as the input_data and the **output_pd** Pandas DataFrame as the output_data.

In [142]:
list_sentence = ["Hello World", "Python is cool", "Snowflake is awesome"]
predict_sign = model_signature.infer_signature(input_data=list_sentence, output_data=output_pd)
predict_sign

ModelSignature(
                    inputs=[
                        FeatureSpec(dtype=DataType.STRING, name='input_feature_0')
                    ],
                    outputs=[
                        FeatureSpec(dtype=DataType.FLOAT, name='embeddings', shape=(768,))
                    ]
                )

We can now log the model using the [log_model](https://docs.snowflake.com/en/developer-guide/snowpark-ml/reference/latest/api/registry/snowflake.ml.registry.Registry) method, with the model signature we created for the predict function. To tell the registry which function of the model uses the signature, we use the function name as the key in the **signatures** parameter.

There can be multiple models under the same **model_name** as long as the **version_name** is different.

It is also possible to add metrics using the **metrics** parameter, but in this case we will not do that.

In [143]:
# Create a model registry connection using the Snowpark session object, we will use the current database and schema for storing the model.
snowml_registry = Registry(session)

custom_mv = snowml_registry.log_model(
    my_stcustom_model,
    model_name="my_stcustom_model",
    version_name="version_1",
    conda_dependencies=["sentence-transformers"],
    pip_requirements=None,
    options={"relax_version": False},
    signatures={"predict": predict_sign},
    comment = 'PyCaret ClassificationExperiment using the CustomModel API'
)

SnowparkSQLException: (1300) (1304): 100357 (P0000): Python Interpreter Error:
ModuleNotFoundError: No module named 'sentence_transformers.model_card' in function CreateModule-8c395451-82f4-41b7-9a85-02734658c13b with handler predict.infer

We can use **show_models** to check that the model is available in the Model Registry.

In [None]:
snowml_registry.show_models()

We can now use the logged model to do inference, using the model version object returned from **log_model**, **custom_mv**.

To test it, we can create a Snowpark DataFrame with the test data we defined previously and then use the **run** method of the model version to get back a Snowpark DataFrame with the predictions added (it can also use a Pandas DataFrame directly).

Using the **show_functions** method on the model version object shows us which methods the model supports and what the expected inputs and outputs are.

In [None]:
custom_mv.show_functions()

Create a Snowpark DataFrame and use the model on it.

In [None]:
snowpark_df = session.create_dataframe(test_data, schema=col_nms)

custom_mv.run(snowpark_df).show()

You have now successfully deployed a PyCaret model to Snowflake using the Model Registry. If you go to Snowsight (the Snowflake GUI), you should see the model under **AI & ML -> Models**. If you do not see it, make sure you are using the **ACCOUNTADMIN** role or the role you used to log the model.

If you want to use the model from SQL, you could use the following SQL:
```SQL
SELECT 
 pycaret_juice!predict(*) as predict_dict,
 predict_dict['prediction_label']::text as prediction_label,
 predict_dict['prediction_score']::double as prediction_score
from pycaret_input_data;
```