# AI Inference Test Framework
This framework provides a complete environment for testing, interacting with, and building agentic workflows using Large Language Models (LLMs).

# Introduction
The system integrates several components to deliver a flexible and extensible inference stack:

- Open WebUI: Chat frontend for interacting with LLMs
- n8n: Workflow automation and orchestration
- PostgreSQL + pgvector: Vector database for embedding storage and retrieval
- trt-llm: Optimized inference server for efficient model execution

# Usage
The framework is fully operational immediately after deployment.
Additional LLMs can be integrated into Open WebUI following the official guide: <br>
[👉 Open WebUI Quick Start Guide](https://docs.openwebui.com/getting-started/quick-start/starting-with-openai)

By default, the framework deploys the TinyLlama-1.1B-Chat-v1.0 model.
Optionally, you can deploy an NVIDIA NIM for GPU-accelerated inference using the steps below.

# (Optional) Deploying an NVIDIA NIM
Since this is a test environment, only one LLM should be deployed at a time.
Before deploying a new NIM model, the old one must be removed.

### Step 0 - Stop and Remove the Existing Container
Check for existing containers:

In [None]:
!docker ps -a

Stop and remove the current TRT-LLM container:

In [None]:
!docker kill trt-llm && docker rm trt-llm

Check that no LLM container is active anymore:

In [None]:
!docker ps -a

If trt-llm is not listed anymore, the environment is clean.
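
The same check can be scripted instead of reading the `docker ps -a` output by eye. The following is a minimal sketch, not part of the framework: it assumes the Docker CLI is on the `PATH`, and the helper names `parse_container_names` and `container_exists` are hypothetical.

```python
import subprocess


def parse_container_names(ps_output: str) -> set[str]:
    """Turn the output of `docker ps -a --format '{{.Names}}'` into a set of names."""
    return {line.strip() for line in ps_output.splitlines() if line.strip()}


def container_exists(name: str) -> bool:
    """Return True if a container with this exact name exists (running or stopped)."""
    output = subprocess.check_output(
        ["docker", "ps", "-a", "--format", "{{.Names}}"], text=True
    )
    return name in parse_container_names(output)
```

With this sketch, `container_exists("trt-llm")` returning `False` would correspond to a clean environment.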

### Step 1 - Add the API Key and log in to the container registry

1. Generate an NGC API key. A key is required to access NGC resources and can be generated here: [NGC Catalog](https://org.ngc.nvidia.com/setup/api-keys).

When creating an NGC API key, ensure that at least "NGC Catalog" is selected from the "Services Included" dropdown. <br>
Additional services can be included if the key will be reused for other purposes.

![Image](https://docs.nvidia.com/nim/large-language-models/latest/_images/personal-key.png)

2. Add the API key in the next cell:

In [None]:
import os
os.environ['NGC_API_KEY'] = "nvapi-<your-key>"  # Replace with your own NGC API key; do not share a notebook that contains a real key

3. Log in to the nvcr.io container registry:

In [None]:
!echo $NGC_API_KEY | docker login nvcr.io -u '$oauthtoken' --password-stdin
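
A key typed directly into a notebook cell is easy to leak when the notebook is shared. As an alternative, the key could be read interactively without being echoed or saved. This is a minimal sketch: the helper names are hypothetical, and the `nvapi-` prefix check is an assumption based on the key format shown above, not an official validation rule.

```python
import getpass
import os


def looks_like_ngc_key(key: str) -> bool:
    """Rough sanity check: starts with 'nvapi-' (assumed prefix), has a body, no whitespace."""
    prefix = "nvapi-"
    has_body = key.startswith(prefix) and len(key) > len(prefix)
    return has_body and not any(ch.isspace() for ch in key)


def set_ngc_key_interactively() -> None:
    """Prompt for the key without echoing it and export it for later cells."""
    key = getpass.getpass("NGC API key: ").strip()
    if not looks_like_ngc_key(key):
        raise ValueError("The value entered does not look like an NGC API key.")
    os.environ["NGC_API_KEY"] = key
```

Calling `set_ngc_key_interactively()` in place of the assignment cell, before the `docker login` cell, would keep the key out of the saved notebook.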

### Step 2 - Configure the local NIM cache

In [None]:
os.environ['LOCAL_NIM_CACHE'] = os.path.expanduser("~/.cache/nim")

In [None]:
!mkdir -p "$LOCAL_NIM_CACHE"

Verify that the directory was created:

In [None]:
!ls -lha ~/.cache/ | grep nim

If the nim folder exists, the setup is correct.
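
The container in Step 3 runs with the current user ID (`-u $(id -u)`) and mounts this directory as its cache, so the directory presumably needs to be writable by that user. Creation and the write check can also be done from Python. This is a minimal sketch, and the helper name `ensure_cache_dir` is hypothetical.

```python
import os
from pathlib import Path


def ensure_cache_dir(path: str) -> Path:
    """Create the directory (and any parents) if needed and confirm it is writable."""
    cache_dir = Path(path).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)
    if not os.access(cache_dir, os.W_OK):
        raise PermissionError(f"{cache_dir} is not writable by the current user")
    return cache_dir
```

For example, `ensure_cache_dir(os.environ["LOCAL_NIM_CACHE"])` could replace the `mkdir` and `ls` cells above.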

### Step 3 - Deploy NIM

Now the actual NIM will be deployed. This example uses GPT-OSS-20B from the NGC catalog:

In [None]:
!docker run -d --rm --name nvidia-nim --gpus all --shm-size=16GB -e NGC_API_KEY=$NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -u $(id -u) -p 8000:8000 nvcr.io/nim/openai/gpt-oss-20b:latest

The container has been created but needs some time to start up. Run the next cell to monitor the deployment.

Once startup is complete, the output ends with:
> Container is ready! <br>
> Continue with the next steps in the notebook.


In [None]:
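import subprocess
import time

container_name = "nvidia-nim"  # Adjust this name to your container
search_string = "Uvicorn running on http://0.0.0.0:8000"
poll_interval = 2  # Seconds to wait between log checks

print(f"Monitoring logs of container '{container_name}'...")

while True:
    try:
        # Pass the command as a list so no shell is involved; merge stderr into stdout
        logs = subprocess.check_output(
            ["docker", "logs", container_name], stderr=subprocess.STDOUT
        ).decode(errors="replace")
    except subprocess.CalledProcessError as e:
        # Raised when `docker logs` fails, for example if the container does not exist
        print(f"Error retrieving logs: {e}")
        time.sleep(poll_interval)
        continue

    # The server logs this line once it is listening on port 8000
    if search_string in logs:
        print("Container is ready!")
        break

    print(f"Not ready yet, waiting {poll_interval} seconds...")
    time.sleep(poll_interval)

print("Continue with the next steps in the notebook.")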


Once "Container is ready!" appears, the model is fully initialized and accessible via Open WebUI.<br>
You can now open your Brev instance, connect to the WebUI, and start interacting with the LLM.
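
Readiness can also be confirmed over HTTP rather than from the logs. This is a sketch under assumptions: it presumes the NIM exposes an OpenAI-compatible `/v1/models` route on the port published in Step 3 (`http://localhost:8000`), and the helper names `extract_model_ids` and `fetch_model_ids` are hypothetical.

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list[str]:
    """Pull the model IDs out of an OpenAI-style `/v1/models` response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_model_ids(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> list[str]:
    """Query the (assumed) OpenAI-compatible models route and return the served model IDs."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

If the server is up, `fetch_model_ids()` should return the model name to select in Open WebUI; a connection error suggests the server is not reachable yet.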

### Step 4 - Cleanup

If you want to stop the current LLM and deploy a different one, you can terminate the container using the command in the next cell.<br>
By modifying the docker run command from Step 3, you can deploy any other LLM available in the NGC catalog.

In [None]:
!docker kill nvidia-nim

If the nvidia-nim container is no longer listed, the cleanup was successful:

In [None]:
!docker ps -a
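
Switching models, as described above, only requires changing the image reference in the `docker run` command from Step 3. A minimal sketch of a parameterized version follows. The function names and parameters are hypothetical, and the flags mirror the Step 3 command; the only difference is that `-e NGC_API_KEY` is passed without a value, which tells Docker to forward the variable from the calling environment.

```python
import os
import subprocess


def build_nim_run_command(
    image: str,
    name: str = "nvidia-nim",
    port: int = 8000,
    cache_dir: str = "~/.cache/nim",
    shm_size: str = "16GB",
) -> list[str]:
    """Assemble the Step 3 `docker run` invocation as an argument list."""
    return [
        "docker", "run", "-d", "--rm",
        "--name", name,
        "--gpus", "all",
        f"--shm-size={shm_size}",
        "-e", "NGC_API_KEY",  # forwarded from the current environment
        "-v", f"{os.path.expanduser(cache_dir)}:/opt/nim/.cache",
        "-u", str(os.getuid()),  # same effect as `-u $(id -u)`
        "-p", f"{port}:8000",  # host port -> container port 8000
        image,
    ]


def deploy_nim(image: str) -> None:
    """Start the container; raises CalledProcessError if `docker run` fails."""
    subprocess.run(build_nim_run_command(image), check=True)
```

For example, `deploy_nim("nvcr.io/nim/openai/gpt-oss-20b:latest")` would reproduce the Step 3 deployment. Image references for other models have to be taken from the NGC catalog.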