# Invoking and Testing the Large Language Model Inference Service

Welcome to the fifth and final part of our tutorial series on building a question-answering application over a corpus of private documents using Large Language Models (LLMs). In the previous Notebooks, we covered creating vector embeddings of our documents, deploying a Vector Store Inference Service, creating a Large Language Model Inference Service, and enriching user queries with relevant context using an Inference Service Transformer component.

In this Notebook, you invoke and test the LLM Inference Service you've created. This is an important step in the development process, as it lets you validate the functionality and performance of your service in a practical setting.

Throughout this Notebook, we'll guide you on how to construct and send requests to the LLM Inference Service, interpret the responses, and handle potential issues that might arise. By the end of this Notebook, you will have gained practical experience in working with the LLM Inference Service, preparing you to integrate it into larger systems or applications.

Let's get started by importing the libraries you'll need:

In [None]:
import os
import json
import requests

You are now ready to test your service. Provide your question and get back the answer from the LLM inference service.

In [None]:
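DOMAIN_NAME = "svc.cluster.local"  # change this to your domain for external access
NAMESPACE = os.environ["USER"]  # uses the value of the USER environment variable as the namespace
DEPLOYMENT_NAME = "llm"
MODEL_NAME = DEPLOYMENT_NAME  # the model is served under the deployment name
# Host name of the transformer component, which receives the request
SVC = f"{DEPLOYMENT_NAME}-transformer-default.{NAMESPACE}.{DOMAIN_NAME}"
URL = f"https://{SVC}/v1/models/{MODEL_NAME}:predict"

print(URL)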
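
To make the structure of the endpoint easier to see, the same URL can be assembled by a small helper. This is only a sketch: `build_predict_url` is a hypothetical name that is not part of the tutorial, and the namespace `alice` in the example call is a placeholder.

```python
def build_predict_url(deployment, namespace, domain="svc.cluster.local", model=None):
    """Compose the predict URL in the same way as the cell above.

    Hypothetical helper for illustration; not part of the tutorial code.
    """
    # The notebook serves the model under the deployment name by default.
    model = model or deployment
    # Host name of the transformer component inside the given namespace.
    host = f"{deployment}-transformer-default.{namespace}.{domain}"
    return f"https://{host}/v1/models/{model}:predict"


# "alice" is a placeholder namespace used only for this example.
print(build_predict_url("llm", "alice"))
```

For external access, pass your own domain as the `domain` argument instead of the in-cluster default.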

In [None]:
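# Request body: a list of instances, each carrying one question
data = {
  "instances": [{
      "question": "What are modern CPUs made of?"
  }]
}

# The bearer token is read from the AUTH_TOKEN environment variable
headers = {"Authorization": f"Bearer {os.environ['AUTH_TOKEN']}"}

# verify=False skips TLS certificate verification; avoid it outside of testing
response = requests.post(URL, json=data, headers=headers, verify=False)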

In [None]:
response.text
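
The raw text is easier to work with once it is parsed. The sketch below assumes the service replies with a JSON object that has a `predictions` field, which is a common shape for `:predict` endpoints but is not confirmed by this tutorial. The helper name `extract_predictions` and the sample body are hypothetical.

```python
import json


def extract_predictions(body_text):
    """Return the value under "predictions", or raise ValueError with context.

    Assumes a JSON object with a "predictions" field; adjust the key if your
    service returns a different shape.
    """
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError as exc:
        # Show the start of the body, which often contains an error page or message.
        raise ValueError(f"Response is not valid JSON: {body_text[:200]!r}") from exc
    if not isinstance(payload, dict) or "predictions" not in payload:
        raise ValueError(f"No 'predictions' field in response: {payload!r}")
    return payload["predictions"]


# Made-up body used only to show the call; a real reply will differ.
sample_body = '{"predictions": ["placeholder answer text"]}'
print(extract_predictions(sample_body))
```

With a live request, you could call `extract_predictions(response.text)` after checking that `response.status_code` is 200.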

If you're running this tutorial in an environment without access to a GPU device, the inference step may take considerably longer than usual. Allow approximately 7-8 minutes for the response. If you encounter a time-out error, send the request again.
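
Retrying after a time-out can be automated. The following is a minimal sketch of one way to do it, not part of the tutorial itself: `call_with_retry` is a hypothetical helper, and the attempt count and wait time are arbitrary defaults you should tune.

```python
import time


def call_with_retry(send, attempts=3, wait_seconds=5.0, retry_on=(TimeoutError,)):
    """Call send() until it returns or the attempts run out.

    send     -- zero-argument callable that performs the request
    retry_on -- exception types that count as a time-out
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last time-out
            time.sleep(wait_seconds)  # brief pause before trying again
```

To use it with the request above, wrap the call in a lambda and pass the `requests` time-out exception, for example `call_with_retry(lambda: requests.post(URL, json=data, headers=headers, verify=False, timeout=600), retry_on=(requests.exceptions.Timeout,))`. The `timeout=600` value is an assumption chosen to exceed the 7-8 minutes mentioned above.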

# Conclusion

Congratulations on reaching the finish line of this comprehensive tutorial! You've successfully developed an application capable of delivering responses to user queries in a natural language format. The journey has not only enhanced your understanding but also allowed you to acquire hands-on experience in various facets of Large Language Models.

Throughout this process, you've demystified the concept of a Vector Store, created custom predictor and transformer components, and learned to log artifacts with MLflow. Moreover, all these tasks have been accomplished within the comfortable and familiar confines of your JupyterLab environment.

In conclusion, you've taken significant strides toward mastering Large Language Models and building real-world applications with the EzUA platform. Happy coding, and until our next tutorial, keep learning and experimenting!