# Invoking and Testing the Large Language Model Inference Service

Welcome to the fifth and final part of our tutorial series on building a question-answering application over a corpus of private documents using Large Language Models (LLMs). The previous Notebooks covered creating vector embeddings of our documents, deploying a Vector Store Inference Service, creating a Large Language Model Inference Service, and enriching user queries with relevant context using a transformer.

In this Notebook, we focus on invoking and testing the LLM Inference Service we've created. This step lets us validate that the service works end to end and get a first impression of how it performs in a practical setting.

Throughout this Notebook, we'll show you how to construct and send requests to the LLM Inference Service, interpret the responses, and handle issues that might arise. By the end, you will have practical experience working with the LLM Inference Service and be ready to integrate it into larger systems or applications.

Let's get started by importing the libraries you'll need:

In [None]:
import json
import requests
import ipywidgets as widgets

As before, you'll need to provide your credentials to get a token you can use to invoke the Inference Service. The process should look familiar by now:

In [None]:
# Add heading
heading = widgets.HTML("<h2>Serving Credentials</h2>")
display(heading)

ezaf_env_input = widgets.Text(description='EZAF Env:')
namespace_input = widgets.Text(description='Namespace:')
username_input = widgets.Text(description='Username:')
password_input = widgets.Password(description='Password:')
submit_button = widgets.Button(description='Submit')
success_message = widgets.Output()

ezaf_env = None
namespace = None
username = None
password = None

def submit_button_clicked(b):
    global ezaf_env, namespace, username, password
    ezaf_env = ezaf_env_input.value
    namespace = namespace_input.value
    username = username_input.value
    password = password_input.value
    with success_message:
        success_message.clear_output()
        print("Credentials submitted successfully!")
    submit_button.disabled = True

submit_button.on_click(submit_button_clicked)

# Set margin on the submit button
submit_button.layout.margin = '20px 0 20px 0'

# Display inputs and button
display(ezaf_env_input, namespace_input, username_input, password_input, submit_button, success_message)

In [None]:
EZAF_ENV = ezaf_env 
token_url = f"https://keycloak.{EZAF_ENV}.com/realms/UA/protocol/openid-connect/token"

data = {
    "username" : username,
    "password" : password,
    "grant_type" : "password",
    "client_id" : "ua-grant",
}

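# verify=False skips TLS certificate verification; acceptable only in trusted test environments
token_response = requests.post(token_url, data=data, allow_redirects=True, verify=False)
token_response.raise_for_status()  # fail early on bad credentials or a wrong URL

token = token_response.json()["access_token"]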
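
If the credentials are wrong, the token endpoint returns a JSON body without an `access_token` key, and the lookup above fails with a bare `KeyError`. The following is a minimal sketch of a friendlier check. The helper name `extract_access_token` is hypothetical, and the sketch assumes the endpoint follows the standard OAuth 2.0 error format, in which a failed request carries an `error` field and, optionally, an `error_description` field:

```python
def extract_access_token(payload: dict) -> str:
    """Return the access token from a decoded token-endpoint JSON payload.

    Hypothetical helper: raises ValueError with the server-provided reason
    when the payload does not contain a token.
    """
    token = payload.get("access_token")
    if token:
        return token
    # OAuth 2.0 error responses carry "error" and, optionally, "error_description".
    reason = payload.get("error_description") or payload.get("error") or "unknown error"
    raise ValueError(f"Token request failed: {reason}")
```

To use it, pass the decoded JSON body of the token request to the helper instead of indexing the dictionary directly.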

You are now ready to test your service. Send a question and get back an answer from the LLM Inference Service.

In [None]:
NAMESPACE = namespace
DEPLOYMENT_NAME = "llm"
MODEL_NAME = DEPLOYMENT_NAME
SVC = f'{DEPLOYMENT_NAME}-transformer-default.{NAMESPACE}.{EZAF_ENV}.com'
URL = f"https://{SVC}/v1/models/{MODEL_NAME}:predict"

print(URL)

In [None]:
data = {
  "instances": [{
      "question": "What are modern CPUs made of?"
  }]
}

headers = {"Authorization": f"Bearer {token}"}

response = requests.post(URL, json=data, headers=headers, verify=False)
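
Since part of the goal here is to get a feel for performance, it can help to repeat the request a few times and look at the spread of latencies. The following is a rough sketch, not a proper benchmark. The helper name `time_calls` and its `repeats` parameter are illustrative, and it measures wall-clock time on the client, so network latency is included:

```python
import statistics
import time


def time_calls(send, repeats=5):
    """Call `send` repeatedly and return client-side latency statistics in seconds.

    `send` is any zero-argument callable, for example a lambda wrapping requests.post.
    """
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        send()
        latencies.append(time.perf_counter() - start)
    return {
        "count": len(latencies),
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }
```

For example, `time_calls(lambda: requests.post(URL, json=data, headers=headers, verify=False))` would send the question five times and summarize how long each round trip took.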

In [None]:
response.text
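
Printing the raw text is enough for a quick look, but you will usually want to check the status code and decode the JSON body before using the answer. The following is a minimal sketch under one assumption: that the service replies in the KServe V1 format, where results sit under a top-level `predictions` key. Your transformer may shape the output differently, so adjust the key to match what you see in `response.text`. The helper name `parse_predictions` is hypothetical:

```python
import json


def parse_predictions(status_code: int, body: str) -> list:
    """Validate an inference response and return its predictions.

    Assumes a KServe V1 style body of the form {"predictions": [...]}.
    """
    if status_code != 200:
        # Typical causes: expired token (401/403), wrong URL (404), model error (500).
        raise RuntimeError(f"Inference request failed with HTTP {status_code}: {body[:200]}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Response is not valid JSON: {body[:200]}") from exc
    if not isinstance(payload, dict) or "predictions" not in payload:
        raise RuntimeError(f"No 'predictions' field in response: {body[:200]}")
    return payload["predictions"]
```

You could call it as `parse_predictions(response.status_code, response.text)` and then inspect the returned list.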