# Invoking and Testing the Vector Store Inference Service (Optional)

Welcome to the third part of our tutorial series on building a question-answering application over a corpus of private documents using Large Language Models (LLMs). In the previous Notebooks, you transformed unstructured text data into vector embeddings and deployed an Inference Service to serve the Vector Store that holds these embeddings.

In this optional Notebook, you will invoke the Vector Store Inference Service you created and test how it behaves. This step lets you verify that the service works and observe how it responds in practice. Throughout this Notebook, you will construct suitable requests, send them to the service, and interpret the responses.

By the end of this Notebook, you will have a practical understanding of how the Vector Store Inference Service works and will be ready to integrate it into a larger system, alongside the Large Language Model Inference Service that you will create in the subsequent Notebook.

Let's get started! As always, let's import the libraries you'll need:

In [None]:
import os
import json
import requests

Next, you need to construct the URL you'll send requests to and define the payload for the POST request. For this example, you'll be using the V1 inference protocol, which is described below:

| API         | Verb | Path                              | Request Payload        | Response Payload                            |
|-------------|------|-----------------------------------|------------------------|---------------------------------------------|
| List Models | GET  | `/v1/models`                      |                        | `{"models": [<model_name>]}`                |
| Model Ready | GET  | `/v1/models/<model_name>`         |                        | `{"name": <model_name>, "ready": $bool}`    |
| Predict     | POST | `/v1/models/<model_name>:predict` | `{"instances": []}` \* | `{"predictions": []}`                       |
| Explain     | POST | `/v1/models/<model_name>:explain` | `{"instances": []}` \* | `{"predictions": [], "explanations": []}`   |

\* Payload is optional
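
To make the path scheme in the table concrete, here is a minimal sketch that assembles each kind of URL. The helper name `v1_url`, the host `example.internal`, and the model name `my-model` are placeholders chosen for illustration; they are not part of the service you deployed.

```python
from typing import Optional


def v1_url(host: str, model_name: Optional[str] = None, verb: Optional[str] = None) -> str:
    """Build a V1 inference protocol URL for the given host (hypothetical helper)."""
    url = f"https://{host}/v1/models"      # List Models
    if model_name is not None:
        url += f"/{model_name}"            # Model Ready
        if verb is not None:
            url += f":{verb}"              # Predict or Explain
    return url


# "example.internal" and "my-model" are placeholder values, not real endpoints
host = "example.internal"
print(v1_url(host))                         # List Models (GET)
print(v1_url(host, "my-model"))             # Model Ready (GET)
print(v1_url(host, "my-model", "predict"))  # Predict (POST)
```

The verb is appended after a colon rather than as a separate path segment, which is why the helper only adds it when a model name is present.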

You want to invoke the `predict` API, so start by building its URL. You will then send a simple query to test the service:

In [None]:
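DOMAIN_NAME = "svc.cluster.local"  # change this to your domain for external access
NAMESPACE = os.environ["USER"]  # this code assumes your namespace matches your username
DEPLOYMENT_NAME = "vectorstore"
MODEL_NAME = DEPLOYMENT_NAME  # the model is addressed by the deployment's name
# Hostname of the predictor service, followed by the V1 predict endpoint
SVC = f"{DEPLOYMENT_NAME}-predictor-default.{NAMESPACE}.{DOMAIN_NAME}"
URL = f"https://{SVC}/v1/models/{MODEL_NAME}:predict"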

print(URL)

In [None]:
data = {
  "instances": [{
      "question": "Who's Ada Lovelace?"
  }]
}

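# Authenticate with the bearer token stored in the AUTH_TOKEN environment variable
headers = {"Authorization": f"Bearer {os.environ['AUTH_TOKEN']}"}

# verify=False skips TLS certificate verification; avoid it outside of testing
response = requests.post(URL, json=data, headers=headers, verify=False)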
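
Because `instances` is a list, the same payload shape can in principle carry more than one question. The sketch below shows one way to build such a payload. The helper `build_payload` is hypothetical, and whether this particular service accepts several instances in a single request is an assumption you would need to check.

```python
def build_payload(questions):
    """Wrap each question in an instance dict, mirroring the payload above (hypothetical helper)."""
    return {"instances": [{"question": q} for q in questions]}


# The second question is a made-up example used only to show the list shape
payload = build_payload(["Who's Ada Lovelace?", "Who's Charles Babbage?"])
print(payload)
```

You could pass the result as the `json=` argument of `requests.post` in place of `data`.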

In [None]:
response.text
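
According to the protocol table, a successful `predict` call returns a JSON object with a `predictions` list. As a minimal sketch of how you might interpret the raw text, the hypothetical helper below parses it and fails loudly if that key is missing. The sample string is made up for illustration; the structure of each prediction returned by your service is not specified here and may differ.

```python
import json


def extract_predictions(raw_text: str) -> list:
    """Parse a V1 predict response body and return its predictions (hypothetical helper)."""
    body = json.loads(raw_text)
    if not isinstance(body, dict) or "predictions" not in body:
        raise ValueError(f"unexpected response body: {raw_text[:200]}")
    return body["predictions"]


# Made-up response text for illustration; the real service output will differ
sample_text = '{"predictions": ["placeholder passage"]}'
for prediction in extract_predictions(sample_text):
    print(prediction)
```

With a live response, you would pass `response.text` to the helper after checking `response.status_code`.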

# Conclusion and Next Steps

Well done! In this Notebook, you've interacted with and tested the Vector Store Inference Service. You've learned how to construct and send requests to the service and how to read the responses it returns. This hands-on experience gives you a practical understanding of how the service operates, which you will need when integrating it into a larger application.

In the next Notebook, you will extend the question-answering system by creating an Inference Service for the Large Language Model (LLM). The LLM Inference Service will work in conjunction with the Vector Store Inference Service to answer user queries.