Two Ways to Use a Deployed CML Model in Development

There are two ways to interact with a CML-hosted model. One is through the cmlapi library, a Python wrapper for the APIv2 interface; the other is a direct HTTP request. Both options support authenticating the caller and use JSON as the message exchange format. This notebook combines the two methods: it gathers the necessary variables via cmlapi and then calls the model endpoint with the requests Python package.

### Launch CML client utility
For a complete reference, see the [cmlapi documentation](https://docs.cloudera.com/machine-learning/cloud/api/topics/ml-apiv2-usage-examples.html) examples.

In [None]:
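import cmlapi
import json
import os

# CDSW_API_URL points at the v1 API (".../api/v1"); cmlapi expects the workspace base URL,
# so the suffix is stripped. CDSW_APIV2_KEY holds the APIv2 key read from the session environment.
client = cmlapi.default_client(url=os.getenv("CDSW_API_URL").replace("/api/v1", ""), cml_api_key=os.getenv("CDSW_APIV2_KEY"))

# Quick connectivity check: list the projects visible to this user
client.list_projects()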
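The client constructor expects the workspace base URL, while `CDSW_API_URL` points at the v1 API path. As a minimal sketch, that derivation can be isolated in a function and checked without a live workspace. `api_v2_base_url` is a hypothetical helper, the example URL is a placeholder, and requiring a trailing `/api/v1` is an assumption about the variable's format.

```python
def api_v2_base_url(cdsw_api_url: str) -> str:
    """Return the workspace base URL to pass to cmlapi.default_client.

    Hypothetical helper: assumes CDSW_API_URL ends with "/api/v1", as the
    notebook's .replace("/api/v1", "") call implies.
    """
    suffix = "/api/v1"
    if not cdsw_api_url.endswith(suffix):
        # Fail loudly instead of silently passing a wrong URL to the client
        raise ValueError(f"expected a URL ending in {suffix!r}: {cdsw_api_url!r}")
    return cdsw_api_url[: -len(suffix)]


# Placeholder value; inside a CML session, os.getenv("CDSW_API_URL") supplies the real one
print(api_v2_base_url("https://ml-example.cloudera.site/api/v1"))
# prints https://ml-example.cloudera.site
```

Unlike `.replace`, which removes `/api/v1` wherever it appears, this version only strips a trailing suffix and raises otherwise, so a misconfigured variable surfaces immediately.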

### Locate and assign your CML project
The lab's ML workspace contains a separate project that hosts the model. We can find the project ID, along with other information, by searching for the project's name.

In [None]:
projects = client.list_projects(include_public_projects=True, search_filter=json.dumps({"name": "Shared LLM Model for Hands on Lab"}))
project = projects.projects[0] # assuming only one project is returned by the above query
print(project)
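The cell above indexes `[0]` on the assumption that the name search returns exactly one project. If the search matches several projects, or none, that indexing either picks an arbitrary match or fails with an `IndexError`. A minimal guard might look like the following; `pick_single` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for the cmlapi response so the sketch runs outside CML.

```python
from types import SimpleNamespace
from typing import Sequence, TypeVar

T = TypeVar("T")


def pick_single(items: Sequence[T], what: str) -> T:
    """Return the only element of items, or raise if there are zero or several.

    Hypothetical helper: replaces bare items[0] indexing, which silently picks
    the first match when a search is ambiguous.
    """
    if len(items) != 1:
        raise LookupError(f"expected exactly one {what}, found {len(items)}")
    return items[0]


# Stand-in for the cmlapi response; in the notebook, pass projects.projects instead
fake_response = SimpleNamespace(
    projects=[SimpleNamespace(id="abc-123", name="Shared LLM Model for Hands on Lab")]
)
demo_project = pick_single(fake_response.projects, "project")
print(demo_project.id)  # prints abc-123
```

In the notebook itself the call would be `pick_single(projects.projects, "project")`.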

### Locate the CML Model and Load Its Access Key into the Environment
Within the retrieved project, we use the model object to retrieve the model's access key for the request we make later.

In [None]:
## Assume only one model is deployed in the project; if there are several, adjust the index 0 below to select the right one
model = client.list_models(project_id=project.id)
selected_model = model.models[0]

## Store the model's access key in an environment variable for this notebook session
os.environ["MODEL_ACCESS_KEY"] = selected_model.access_key
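When a project hosts more than one model, the comment above suggests adjusting the index. Selecting by name is usually less fragile than a positional index. The sketch below assumes each model object exposes a `name` attribute alongside the `access_key` attribute used above; `find_model` and the model names are hypothetical, and `SimpleNamespace` objects stand in for the cmlapi results.

```python
from types import SimpleNamespace


def find_model(models, name):
    """Return the single model whose .name equals name (hypothetical helper).

    Assumes each element exposes a .name attribute, like the objects in
    client.list_models(...).models.
    """
    matches = [m for m in models if m.name == name]
    if len(matches) != 1:
        raise LookupError(f"expected one model named {name!r}, found {len(matches)}")
    return matches[0]


# Stand-in objects; in the notebook, pass model.models from the cell above
demo_models = [
    SimpleNamespace(name="embedding-model", access_key="key-1"),
    SimpleNamespace(name="llm-model", access_key="key-2"),
]
print(find_model(demo_models, "llm-model").access_key)  # prints key-2
```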

### Generate Model Endpoint URL for Request
We then build the URL used to call the model, using the model access key from the previous step.

In [None]:
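## Derive the model service URL from the workspace URL:
## https://<workspace-domain>/api/v1 becomes https://modelservice.<workspace-domain>/model?accessKey=
MODEL_ENDPOINT = os.getenv("CDSW_API_URL").replace("https://", "https://modelservice.").replace("/api/v1", "/model?accessKey=")

## Append the model access key and expose the full endpoint to later cells
MODEL_ENDPOINT = MODEL_ENDPOINT + os.environ["MODEL_ACCESS_KEY"]
os.environ["MODEL_ENDPOINT"] = MODEL_ENDPOINT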
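The string replacements above can also be expressed with `urllib.parse`, which keeps the host, path, and query as separate pieces and URL-encodes the access key. This is a sketch under the same assumptions as the cell: the model service lives on a `modelservice.` subdomain of the workspace host and answers on the `/model` path. `build_model_endpoint` and the example values are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_model_endpoint(cdsw_api_url: str, access_key: str) -> str:
    """Derive the model service URL from CDSW_API_URL (hypothetical helper).

    Mirrors the cell above: prefix the host with "modelservice.", replace the
    API path with "/model", and pass the key as the accessKey query parameter.
    Assumes the host has no user-info part (user@host).
    """
    parts = urlsplit(cdsw_api_url)
    host = "modelservice." + parts.netloc
    query = urlencode({"accessKey": access_key})  # encodes characters such as '&' or spaces
    return urlunsplit((parts.scheme, host, "/model", query, ""))


print(build_model_endpoint("https://ml-example.cloudera.site/api/v1", "abc123"))
# prints https://modelservice.ml-example.cloudera.site/model?accessKey=abc123
```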

### Create Request to Model
Finally, we use these variables to send a request to the model and print its response.

In [None]:
import requests
import json
import os

## Generation settings: a low temperature for near-deterministic answers, and a cap on generated tokens
temperature = 0.01
token_count = 150

## Write a question to ask the model
question = "What is Cloudera Data Platform?"

## Build a Llama 2 chat-style prompt: system message, instruction, then the user's question
llama_sys = "<<SYS>>\n You are a helpful and honest assistant. If you are unsure about an answer, truthfully say \"I don't know\".\n<</SYS>>\n\n"
llama_inst = "[INST]Use your knowledge to answer the user's question. [/INST]"
question_and_context = f"{llama_sys} {llama_inst} [INST] User: {question} [/INST]"

## The model endpoint receives the input wrapped in a top-level "request" object
data = {"request": {"prompt": question_and_context, "temperature": temperature, "max_new_tokens": token_count, "repetition_penalty": 1.0}}

r = requests.post(os.environ["MODEL_ENDPOINT"], data=json.dumps(data), headers={'Content-Type': 'application/json'})

# Logging
print(f"Request: {data} \n\n")
r.raise_for_status()  # surface HTTP errors instead of failing later inside r.json()
print(f"Response: {r.json()}")
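The prompt and payload construction in the cell above can be factored into functions so they are easier to reuse and test. This is a sketch: `build_prompt` and `build_payload` are hypothetical names, and their defaults simply copy the notebook's values.

```python
import json


def build_prompt(question: str) -> str:
    """Assemble the Llama 2 chat-style prompt, matching the string built in the cell above."""
    llama_sys = (
        "<<SYS>>\n You are a helpful and honest assistant. If you are unsure "
        "about an answer, truthfully say \"I don't know\".\n<</SYS>>\n\n"
    )
    llama_inst = "[INST]Use your knowledge to answer the user's question. [/INST]"
    return f"{llama_sys} {llama_inst} [INST] User: {question} [/INST]"


def build_payload(question: str, temperature: float = 0.01, max_new_tokens: int = 150) -> dict:
    """Wrap the generation settings in the {"request": {...}} envelope sent to the endpoint."""
    return {
        "request": {
            "prompt": build_prompt(question),
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
            "repetition_penalty": 1.0,
        }
    }


payload = build_payload("What is Cloudera Data Platform?")
print(json.dumps(payload, indent=2))
```

To send it, `requests.post(os.environ["MODEL_ENDPOINT"], json=payload, timeout=120)` matches the cell's call: the `json=` argument serializes the dictionary and sets the `Content-Type: application/json` header. The 120-second timeout is an arbitrary choice meant to stop a stalled request from hanging the notebook; tune it to the model's typical generation time.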


### Takeaways
* Models deployed in CML can be accessed via an API endpoint call with a JSON payload containing the request
* Models can run multiple replicas to handle the load a use case requires
* Authorization uses a model access key and, optionally, a user access key
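The last takeaway can be made concrete as request headers. The model access key already travels in the URL's `accessKey` parameter. For deployments that also require a user access key, this sketch assumes it is sent as a bearer token in the `Authorization` header; confirm the expected scheme against your workspace's model authentication settings. `model_request_headers` is a hypothetical helper.

```python
from typing import Optional


def model_request_headers(user_api_key: Optional[str] = None) -> dict:
    """Build headers for a model request (hypothetical helper).

    The model access key is carried in the endpoint URL, so only the content
    type is always set. The bearer-token scheme for the optional user key is
    an assumption, not confirmed by the notebook.
    """
    headers = {"Content-Type": "application/json"}
    if user_api_key:
        headers["Authorization"] = f"Bearer {user_api_key}"
    return headers


print(model_request_headers())
# prints {'Content-Type': 'application/json'}
print(model_request_headers("my-user-key"))
# prints {'Content-Type': 'application/json', 'Authorization': 'Bearer my-user-key'}
```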