<h1 style="text-align: center; font-size: 50px;">Multimodal RAG Chatbot with LangChain and MLflow Evaluation</h1>

Retrieval-Augmented Generation (RAG) is an architectural approach that can enhance the effectiveness of large language model (LLM) applications using customized data. In this example, we use LangChain, an orchestrator for language pipelines, to build a multimodal assistant that loads wiki pages and their images from a local JSON file and uses them to answer user questions. We also use MLflow to log, register, and evaluate the resulting chatbot.

# Notebook Overview
- Imports
- Configurations
- Verify Assets
- Data Loading & Cleaning
- Creation of Chunks
- Setup Embeddings & Vector Store
- Retrieval
- Model Setup
- Chain Creation
- MLflow Model Service

# Imports

Our Local GenAI workspace image comes with many of the libraries needed for RAG pre-installed. In our case, we only need to add the connector for working with PDF documents.

In [1]:
%pip install -r ../requirements.txt --quiet

Note: you may need to restart the kernel to use updated packages.


In [2]:
# === Standard Library Imports ===
from typing import List, Dict, Any
from datetime import datetime
import warnings
from pathlib import Path
import os
import sys
import logging
import json
from copy import deepcopy
from collections import defaultdict

# === Third-Party Data Utilities ===
import pandas as pd
from tqdm import tqdm

# === MLflow integration ===
import mlflow

# Add the project root (one level up from the current working directory) to sys.path so that `core` and `src` can be imported
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))
# === Import ChatbotService from project core ===
from core.chatbot_service.chatbot_service import ChatbotService

# === Third-Party Imports ===
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.embeddings import Embeddings
from langchain_core.documents import Document
from langchain_core.runnables import Runnable, RunnablePassthrough, RunnableLambda
from langchain.vectorstores import Chroma
from langchain_community.vectorstores.utils import filter_complex_metadata
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.document_loaders import WebBaseLoader, JSONLoader
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_huggingface import HuggingFaceEmbeddings

import promptquality as pq
import torch
from langchain_core.output_parsers import StrOutputParser
import base64, os, mimetypes
from chromadb.config import Settings
from transformers import SiglipProcessor, SiglipModel
from llama_cpp import Llama
from llama_cpp import llama_supports_gpu_offload

# === Project-Specific Imports (from src) ===
# The parent directory was already appended to sys.path above, so `src` is importable.
from src.local_genai_judge import LocalGenAIJudge
from src.utils import (
    load_config_and_secrets,
    configure_proxy,
    initialize_llm,
    configure_hf_cache,
    mlflow_evaluate_setup,
)


# Configurations

In [3]:
warnings.filterwarnings("ignore")

In [4]:
# Create logger
logger = logging.getLogger("multimodal_rag_logger")
logger.setLevel(logging.INFO)

formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s", 
                              datefmt="%Y-%m-%d %H:%M:%S") 

stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
logger.propagate = False

In [5]:
CONFIG_PATH = "../configs/config.yaml"
SECRETS_PATH = "../configs/secrets.yaml"
DATA_PATH = "../data"
IMAGE_DIR = os.path.join(DATA_PATH, "images")  # PNG/JPGs
MM_JSON = os.path.join(DATA_PATH, "wiki_flat_structure.json")

MLFLOW_EXPERIMENT_NAME = "AIStudio-Multimodal-Chatbot-Experiment"
MLFLOW_RUN_NAME = "AIStudio-Multimodal-Chatbot-Run"

LOCAL_MODEL_PATH = "/home/jovyan/datafabric/llama3.1-8b-instruct/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf"
INTERNVL_MODEL_PATH = "/home/jovyan/datafabric/InternVL3-8B-Instruct-Q8_0/InternVL3-8B-Instruct-Q8_0.gguf"
MM_PROJ_PATH = "/home/jovyan/datafabric/mmproj-InternVL3-8B-Instruct-Q8_0/mmproj-InternVL3-8B-Instruct-Q8_0.gguf"

DEMO_FOLDER = "../demo"
MLFLOW_MODEL_NAME = "AIStudio-Multimodal-Chatbot-Model"

In [6]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


In [7]:
print("GPU off-load supported:", llama_supports_gpu_offload())

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes


GPU off-load supported: True


In [8]:
logger.info('Notebook execution started.')

2025-07-11 17:33:49 - INFO - Notebook execution started.


## Configuration of HuggingFace caches

In the next cell, we configure the HuggingFace cache so that all downloaded models are persisted locally, even after the workspace is closed. This is a desired future feature for AI Studio and the GenAI addon.

In [9]:
# Configure HuggingFace cache
configure_hf_cache()

In [10]:
# Initialize HuggingFace Embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="intfloat/e5-large-v2",
    cache_folder="/tmp/hf_cache"
)

2025-07-11 17:33:50,173 - INFO - Use pytorch device_name: cuda:0
2025-07-11 17:33:50,173 - INFO - Load pretrained SentenceTransformer: intfloat/e5-large-v2


## Configuration and Secrets Loading

In this section, we load configuration parameters and API keys from separate YAML files. This separation helps maintain security by keeping sensitive information (API keys) separate from configuration settings.

- **config.yaml**: Contains non-sensitive configuration parameters like model sources and URLs
- **secrets.yaml**: Contains sensitive API keys for services like Galileo and HuggingFace

In [11]:
config, secrets = load_config_and_secrets(CONFIG_PATH, SECRETS_PATH)

# Verify Assets

In [12]:
def log_asset_status(asset_path: str, asset_name: str, success_message: str, failure_message: str) -> None:
    """
    Logs the status of a given asset based on its existence.

    Parameters:
        asset_path (str): File or directory path to check.
        asset_name (str): Name of the asset for logging context.
        success_message (str): Message to log if asset exists.
        failure_message (str): Message to log if asset does not exist.
    """
    if Path(asset_path).exists():
        logger.info(f"{asset_name} is properly configured. {success_message}")
    else:
        logger.warning(f"{asset_name} is not properly configured. {failure_message}")

log_asset_status(
    asset_path=CONFIG_PATH,
    asset_name="Config",
    success_message="",
    failure_message="Please check if the config.yaml was properly configured in your project on AI Studio."
)

log_asset_status(
    asset_path=SECRETS_PATH,
    asset_name="Secrets",
    success_message="",
    failure_message="Please check if the secrets.yaml was properly configured in your project on AI Studio."
)

log_asset_status(
    asset_path=INTERNVL_MODEL_PATH,
    asset_name="Local InternVL-8B model",
    success_message="",
    failure_message="Please create and download the required assets in your project on AI Studio if you want to use local model.")

log_asset_status(
    asset_path=MM_PROJ_PATH,
    asset_name="Vision projector (.gguf)",
    success_message="",
    failure_message="Download mmproj-InternVL3-8B-Instruct-Q8_0.gguf")

log_asset_status(
    asset_path=MM_JSON,
    asset_name="wiki_flat_structure.json",
    success_message="",
    failure_message="Place JSON Wiki Pages in data/")

2025-07-11 17:33:53 - INFO - Config is properly configured. 
2025-07-11 17:33:53 - INFO - Secrets is properly configured. 
2025-07-11 17:33:53 - INFO - Local InternVL-8B model is properly configured. 
2025-07-11 17:33:53 - INFO - Vision projector (.gguf) is properly configured. 
2025-07-11 17:33:53 - INFO - wiki_flat_structure.json is properly configured. 


# Data Loading & Cleaning

We load wiki pages from `wiki_flat_structure.json`, with two cleaning rules:
* Remove any image name that:
  - is empty / `None`
  - contains invalid characters (e.g. the `==image_0==` placeholders)
  - has an extension not in {png, jpg, jpeg, webp, gif}
  - points to a file that does **not** exist in `data/images/`
* Log every discarded image so we can fix the parser later.

In [13]:
VALID_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}

def load_mm_docs_clean(json_path: str, img_dir: Path) -> list[Document]:
    bad_imgs, docs = [], []
    with open(json_path, encoding="utf-8") as f:
        rows = json.load(f)

    for row in rows:
        images_ok: list[str] = []
        for name in row.get("images", []):
            # -- basic sanity checks --------------------------------------------------
            if (not name) or ("==" in name):
                bad_imgs.append((row["path"], name, "placeholder/empty"))
                continue
            ext = Path(name).suffix.lower()
            if ext not in VALID_EXTS:
                bad_imgs.append((row["path"], name, f"invalid ext {ext}"))
                continue
            img_path = img_dir / name
            if not img_path.is_file():
                bad_imgs.append((row["path"], name, "missing file"))
                continue
            images_ok.append(name)

        meta = {"source": row["path"], "images": images_ok}
        docs.append(Document(page_content=row["content"], metadata=meta))

    # -------- logging summary ------------------------------------------------------
    if bad_imgs:
        logger.warning("⚠️ %d broken image refs filtered out", len(bad_imgs))
        for src, name, reason in bad_imgs[:20]:                 # first 20 only
            logger.debug("  » %s → %s (%s)", src, name, reason)
    else:
        logger.info("✅ no invalid image refs found")

    return docs

mm_raw_docs = load_mm_docs_clean(MM_JSON, Path(IMAGE_DIR))

def log_stage(name: str, docs: List[Document]) -> None:
    # NOTE: "avg_tokens" is the mean length of page_content in characters, not a tokenizer-based count.
    logger.info(f"{name}: {len(docs)} docs, avg_tokens={sum(len(d.page_content) for d in docs)/len(docs):.0f}")

log_stage("Docs loaded", mm_raw_docs)


2025-07-11 17:33:54 - INFO - Docs loaded: 548 docs, avg_tokens=3076


# Creation of Chunks
Here, we split the loaded documents into chunks, so we have smaller and more specific texts to add to our vector database.

In [14]:
# === Initialize text splitter ===
# - chunk_size: Maximum number of characters per text chunk.
# - chunk_overlap: Number of overlapping characters between chunks.

def chunk_documents(docs, chunk_size=600, overlap=100):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
        separators=["\n## ", "\n# ", "\n\n", ".", "!", "?"]
    )
    splits = splitter.split_documents(docs)
    return splits

splits = chunk_documents(mm_raw_docs)
# log_stage is defined in the data-loading cell above; reuse it for the chunks
log_stage("Chunks created", splits)

2025-07-11 17:33:54 - INFO - Chunks created: 4199 docs, avg_tokens=413
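
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally (the real splitter also tries the listed separators before cutting); `sliding_window_chunks` is a hypothetical helper used only for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 600, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []

    step = chunk_size - overlap  # how far each new window advances
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


# Illustrative input: 1000 characters cycling through A..Z
sample = "".join(chr(65 + i % 26) for i in range(1000))
chunks = sliding_window_chunks(sample, chunk_size=600, overlap=100)
print(len(chunks), [len(c) for c in chunks])
```

With these values each chunk repeats the last 100 characters of the previous one, which helps a sentence that straddles a boundary remain retrievable from at least one chunk.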


# Setup Embeddings & Vector Store
Here we set up SigLIP for image embeddings and store the embeddings of our cleaned text chunks in Chroma.

In [15]:

CHROMA_SETTINGS = Settings(anonymized_telemetry=False)

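# --- 1) TEXT collection ----------------------------------------------------
# NOTE: filter_complex_metadata keeps only scalar metadata values, so the
# list-valued "images" field is dropped from every chunk stored in text_db.
clean_chunks = filter_complex_metadata(deepcopy(splits))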
text_db = Chroma.from_documents(
    documents       = clean_chunks,
    embedding       = embeddings,          # HF E5
    collection_name = "wiki_text_mm",
    client_settings = CHROMA_SETTINGS,
)
logger.info("Text collection ready: %d vectors", text_db._collection.count())

# --- 2) IMAGE collection ---------------------------------------------------
class SiglipEmbeddings(Embeddings):
    def __init__(self,
                 model_id: str = "google/siglip2-base-patch16-224",
                 device: str | None = None):
        from transformers import SiglipModel, SiglipProcessor
        import torch, PIL.Image as PILImage
        self.device    = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.model     = SiglipModel.from_pretrained(model_id).to(self.device)
        self.processor = SiglipProcessor.from_pretrained(model_id)
        self.torch     = torch
        self.PILImage  = PILImage

    def _embed_text(self, txts):  # list[str]
        inp = self.processor(text=txts, return_tensors="pt",
                             padding=True, truncation=True).to(self.device)
        with self.torch.no_grad():
            return self.model.get_text_features(**inp).cpu().numpy()

    def _embed_imgs(self, paths):  # list[str]
        imgs = [self.PILImage.open(p).convert("RGB") for p in paths]
        inp  = self.processor(images=imgs, return_tensors="pt").to(self.device)
        with self.torch.no_grad():
            return self.model.get_image_features(**inp).cpu().numpy()

    # LangChain API --------------------------------------------------------
    def embed_documents(self, docs):      # list[str]
        return self._embed_imgs(docs).tolist()

    def embed_query(self, txt):           # single str
        return self._embed_text([txt])[0].tolist()

siglip_embeddings = SiglipEmbeddings()

image_db = Chroma(
    collection_name    = "wiki_image_mm",
    embedding_function = siglip_embeddings,
    client_settings    = CHROMA_SETTINGS,
)

# --- populate image vectors (skip duplicate IDs) --------------------------
img_paths, img_ids, img_meta = [], [], []
seen_ids = set()
dup_count = 0

for doc in mm_raw_docs:
    src   = doc.metadata["source"]
    for name in set(doc.metadata["images"]):        # 1× per image per doc
        img_id = f"{src}::{name}"
        if img_id in seen_ids:                      # already queued
            dup_count += 1
            continue
        full = str(Path(IMAGE_DIR) / name)
        img_paths.append(full)
        img_ids.append(img_id)
        img_meta.append({"source": src, "image": name})
        seen_ids.add(img_id)

if dup_count:
    logger.info("Skipped %d duplicate image IDs", dup_count)

image_db.add_texts(
    texts     = img_paths,
    metadatas = img_meta,
    ids       = img_ids,
)
logger.info("Image collection ready: %d vectors", image_db._collection.count())


2025-07-11 17:33:54,929 - ERROR - Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
2025-07-11 17:33:54,935 - ERROR - Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given
2025-07-11 17:34:36 - INFO - Text collection ready: 4199 vectors
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
2025-07-11 17:34:40,716 - ERROR - Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
2025-07-11 17:34:40,718 - ERROR - Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given
2025-07-11 17:35:04 - INFO - Image co

# Retrieval

We transform the texts and images into embeddings and store them in a vector database. This lets us run similarity searches and retrieve the documents and images most relevant to a query.
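
As a rough intuition for what a similarity search does, the sketch below ranks a few toy vectors by cosine similarity to a query vector. It is illustrative only: `cosine` and `top_k` are hypothetical helpers, the two-dimensional vectors are made up, and the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:k]


# Toy "vector store": name -> embedding
toy_index = {
    "cleanup-guide": [1.0, 0.0],
    "k8s-logs": [0.0, 1.0],
    "env-reset": [0.9, 0.1],
}
nearest = top_k([1.0, 0.0], toy_index, k=2)
print(nearest)
```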

In [16]:
def retrieve_mm(query: str, *, k_txt: int = 4, k_img: int = 4,
                pad_with_similarity: bool = True) -> Dict[str, Any]:
    """
    Returns
      docs   – k_txt most-relevant text chunks
      images – up to k_img image paths
               • first: images explicitly referenced in `docs`
               • optional: extra similarity-based images, to reach k_img
    """
    # ---------- 1) text ---------------------------------------------------
    txt_docs = text_db.similarity_search(query, k=k_txt)

    # ---------- 2) images linked to those docs ----------------------------
    linked = [
        str(Path(IMAGE_DIR) / name)
        for doc in txt_docs
        for name in doc.metadata.get("images", [])
    ]
    # keep order, drop dups
    seen = set()
    linked = [p for p in linked if not (p in seen or seen.add(p))]

    # ---------- 3) (optional) semantic padding ----------------------------
    if pad_with_similarity and len(linked) < k_img:
        q_vec = siglip_embeddings.embed_query(query)
        extra = image_db.similarity_search_by_vector(q_vec, k=k_img * 2)
        for d in extra:
            p = d.page_content
            if p not in seen:
                linked.append(p)
                seen.add(p)
            if len(linked) >= k_img:
                break

    return {
        "docs":   txt_docs,
        "images": linked[:k_img]        # never exceed the requested cap
    }
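
The two list-handling steps in `retrieve_mm` (dropping duplicates while keeping order, then padding with extra candidates up to a cap) can be isolated in a small sketch. `dedupe_keep_order` and `pad_to_k` are hypothetical helpers written for illustration; they mirror the logic above but are not part of the notebook's code.

```python
def dedupe_keep_order(items: list[str]) -> list[str]:
    """Drop repeated items while preserving first-seen order."""
    # dict keys are unique and keep insertion order
    return list(dict.fromkeys(items))


def pad_to_k(primary: list[str], candidates: list[str], k: int) -> list[str]:
    """Keep `primary` items first, then fill with unseen `candidates` until k items."""
    merged = dedupe_keep_order(primary)
    seen = set(merged)
    for candidate in candidates:
        if len(merged) >= k:
            break
        if candidate not in seen:
            merged.append(candidate)
            seen.add(candidate)
    return merged[:k]  # never exceed the requested cap


linked = ["a.png", "b.png", "a.png"]      # images referenced by the text chunks
similar = ["b.png", "c.png", "d.png"]     # images returned by a similarity search
print(pad_to_k(linked, similar, k=3))
```

`dict.fromkeys` is an alternative to the `seen.add` idiom used in the cell above and behaves the same for hashable items.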


# Model Setup

In this notebook, we provide three different options for loading the model:
 * **local**: loads the InternVL3-8B-Instruct Q8_0 model from the asset downloaded to the project
 * **hugging-face-local**: downloads a DeepSeek model from Hugging Face and runs it locally
 * **hugging-face-cloud**: accesses the Mistral model through the Hugging Face cloud API (requires a Hugging Face API key saved in secrets.yaml)

This choice is set in the config.yaml file. The model deployed in the bottom cells of this notebook loads the choice from the config file.

In [17]:
# ## 🤖 LLM – InternVL-3 (8B Q8_0 + mmproj)

llm_mm = Llama(
    model_path   = INTERNVL_MODEL_PATH,
    mmproj_path  = MM_PROJ_PATH,
    chat_format  = "qwen",
    n_gpu_layers = -1,
    n_ctx        = 8192,
    n_batch      = 256,
    f16_kv       = True,
    verbose      = False,
)


llama_context: n_ctx_per_seq (8192) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_kv_cache_unified: LLAMA_SET_ROWS=0, using old ggml_cpy() method for backwards compatibility


# Chain Creation
In this part, we define a pipeline that receives a question, retrieves the relevant text chunks and images, formats them into chat messages, and uses the InternVL3 model (loaded with the Qwen chat format) to answer the question based on the provided context. The output is returned as a string for easy reading.

In [18]:
SYSTEM_PROMPT = (
    "You are an internal DevOps assistant. "
    "Use the context provided between <context> tags to answer the user’s question. "
    "If the answer isn’t in the context, say you don’t know.\n\n"
    "Context:\n<context>\n\n"
    "Answer clearly and concisely. Steps or bullet-points are fine when helpful."
)

def _b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

def build_messages(inp: Dict) -> List[Dict]:
    """
    Returns a ChatML-compatible list:
      • system  – plain text
      • user    – text + embedded screenshots in SAME message
    """
    # ---------- context ---------------------------------------------------
    context_txt = "\n\n".join(
        f"[{d.metadata['source']}]\n{d.page_content}"
        for d in inp["docs"]
    )

    # ---------- images ----------------------------------------------------
    def mime(p: str) -> str:
        from mimetypes import guess_type
        return guess_type(p)[0] or "image/png"

    image_blocks = [
        {"type": "image_url",
         "image_url": {"url": f"data:{mime(p)};base64,{_b64(p)}"}}
        for p in inp["images"]
    ]

    # ---------- messages list ---------------------------------------------
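    # Replace only the placeholder after "Context:" and wrap the text in real
    # <context> tags. A bare .replace("<context>", ...) would also overwrite the
    # "<context> tags" mention in the first sentence of the prompt.
    system_msg = {
        "role": "system",
        "content": SYSTEM_PROMPT.replace(
            "Context:\n<context>",
            f"Context:\n<context>\n{context_txt}\n</context>",
        ),
    }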

    # “content” can be a list ⇒ text first, then each image dict
    user_msg = {
        "role": "user",
        "content": [inp["query"], *image_blocks],
    }

    return [system_msg, user_msg]


def call_llm(msgs: List[Dict]) -> str:
    resp = llm_mm.create_chat_completion(messages=msgs)
    return resp["choices"][0]["message"]["content"]

mm_chain = (
    {
        "query":   RunnablePassthrough(),
        "results": RunnableLambda(lambda q: retrieve_mm(q)),
    }
    | RunnableLambda(lambda d: {"query": d["query"], **d["results"]})
    | RunnableLambda(build_messages)
    | RunnableLambda(call_llm)
    | StrOutputParser()
)
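
The data flow of `mm_chain` can be followed more easily without the LangChain wrappers. The sketch below chains the same four steps as plain function calls. `run_pipeline` and the `fake_*` stand-ins are hypothetical and exist only to show the shape of the data passed between steps; they do not call the retriever or the model.

```python
from typing import Any, Callable, Dict, List


def run_pipeline(
    query: str,
    retrieve: Callable[[str], Dict[str, Any]],
    build: Callable[[Dict[str, Any]], List[Dict[str, Any]]],
    generate: Callable[[List[Dict[str, Any]]], str],
) -> str:
    """Query -> retrieval results -> flattened payload -> chat messages -> answer."""
    results = retrieve(query)                 # {"docs": [...], "images": [...]}
    payload = {"query": query, **results}     # same flattening as the second chain step
    messages = build(payload)                 # list of chat messages
    return generate(messages)                 # final answer as a string


# Stand-ins for retrieve_mm, build_messages and call_llm (illustrative only)
def fake_retrieve(q: str) -> Dict[str, Any]:
    return {"docs": [f"doc about {q}"], "images": []}


def fake_build(p: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [
        {"role": "system", "content": "\n".join(p["docs"])},
        {"role": "user", "content": p["query"]},
    ]


def fake_generate(msgs: List[Dict[str, Any]]) -> str:
    return f"answered: {msgs[-1]['content']}"


answer = run_pipeline("cleanup", fake_retrieve, fake_build, fake_generate)
print(answer)
```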

In [19]:
query = "How do I manually clean my environment without hooh?"

results = retrieve_mm(query, k_txt=4, k_img=4)

# --- text context -------------------------------------------------
for i, doc in enumerate(results["docs"], 1):
    print(f"\n▶ Doc {i}  •  {doc.metadata['source']}")
    print(doc.page_content[:700], "…")

# --- images -------------------------------------------------------
print("\n▶ Images")
for p in results["images"]:
    print(p)


2025-07-11 17:35:53,942 - ERROR - Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
2025-07-11 17:35:53,993 - ERROR - Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given



▶ Doc 1  •  How%2Dto-articles/How-to-manually-clean--your-environment.md
This documentation is directed at users who don't have access to hooh for some reason. Hooh should handle these automatically when changing environments.

Access the AIStudio directory by typing `%localappdata%` in your explorer.

![image.png](/.attachments/image-18fc95b4-a25e-41d6-a85a-917ee67c75b1.png)

Next, go to the **HP** direcotory and then **AiStudio**


In the AIStudio direcotory, delete the directories that have either an **Account ID**, the **db** and **creds** direcotires, as circled below.
Also delete the userconfig file, as it stores environment specific variables. …

▶ Doc 2  •  Education%2DResources-and-Knowledge-Share/Learning-AI-Studio-Education-and-References/A-walk-through-of-getting-setup-(new-to-the-AI-Studio-team%3F).md
You need to launch Windows PowerShell (WPS) as an admin and then make the hooh command line call from within that PowerShell session. This will enable hooh at the admin leve

In [20]:
# ✅ Quick Test

question = "How do I manually clean my environment without hooh?"
print(mm_chain.invoke(question))


To manually clean your AI Studio environment when you don't have access to hooh, follow these steps:

1. **Access the AIStudio Directory:**
   - Open File Explorer and type `%localappdata%` in the address bar to navigate to the local app data directory.
   - Navigate to the **HP** directory and then to the **AiStudio** directory.

2. **Delete Specific Directories and Files:**
   - In the AIStudio directory, delete the directories that contain an **Account ID**, the **db** directory, and the **creds** directory. These are typically circled in the provided image.
   - Also, delete the **userconfig** file, as it stores environment-specific variables.

3. **Launch Windows PowerShell as Admin:**
   - Go to the file location for PowerShell.
   - Right-click on PowerShell and select "Run Self Elevated with Policy Pak" to launch it as an administrator.

4. **Run the hooh Command:**
   - Within the PowerShell session, run the following command to clean up AI Studio data:
     ```shell
     hooh

In [21]:
doc = text_db.similarity_search(query)[0]
print(doc.metadata)

{'source': 'How%2Dto-articles/How-to-manually-clean--your-environment.md'}
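
The metadata printed above contains only `source`. `filter_complex_metadata` keeps only scalar values (str, int, float, bool), so the list-valued `images` field is removed before the chunks reach Chroma, and `retrieve_mm` then finds no linked images for these chunks. One possible workaround, sketched here as an assumption rather than as part of the notebook, is to store the list as a JSON string and decode it at retrieval time. `pack_images` and `unpack_images` are hypothetical helpers.

```python
import json


def pack_images(metadata: dict) -> dict:
    """Return a copy of the metadata with the image list encoded as a JSON string."""
    packed = dict(metadata)
    packed["images"] = json.dumps(metadata.get("images", []))
    return packed


def unpack_images(metadata: dict) -> list[str]:
    """Decode the image list; tolerate a missing key or an already-decoded list."""
    raw = metadata.get("images", "[]")
    return json.loads(raw) if isinstance(raw, str) else list(raw)


original = {"source": "page.md", "images": ["a.png", "b.png"]}
stored = pack_images(original)        # scalar-only metadata, safe to index
restored = unpack_images(stored)      # back to a list when reading results
print(stored["images"], restored)
```

A string value survives the scalar-only filter, so applying something like `pack_images` to each chunk before indexing would keep the image names attached to their chunks.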


# MLflow Model Service

In this section, we demonstrate how to deploy a RAG-based chatbot service. The service provides a REST API endpoint that lets users query the knowledge base with natural language questions, upload new documents to the knowledge base, and manage conversation history, all with built-in safeguards against sensitive information and toxicity.

The service encapsulates the functionality developed in this notebook, including the document retrieval system, the RAG-based question answering capabilities, and the Galileo integration for protection, observation and evaluation. It uses the `ChatbotService` class imported from `core/chatbot_service`.
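
If the registered model is served with MLflow's standard scoring server, requests are sent as JSON to its `/invocations` endpoint. The sketch below only builds such a request body in the `dataframe_split` format. The input schema of `ChatbotService` is not shown in this notebook, so the column name `query` and the helper `build_invocation_payload` are assumptions made for illustration.

```python
import json


def build_invocation_payload(question: str, column: str = "query") -> str:
    """Build a JSON body in MLflow's `dataframe_split` format for one question.

    The column name is hypothetical; it must match the served model's input schema.
    """
    payload = {
        "dataframe_split": {
            "columns": [column],
            "data": [[question]],
        }
    }
    return json.dumps(payload)


body = build_invocation_payload("How do I manually clean my environment without hooh?")
print(body)
```

The resulting string would be sent with a `Content-Type: application/json` header to the serving URL, which depends on how and where the model is deployed.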

## Setup

In [22]:
mlflow_evaluate_setup(
    secrets,
    mlflow_tracking_uri="/phoenix/mlflow"
)

# === Set MLflow experiment context ===
mlflow.set_experiment(MLFLOW_EXPERIMENT_NAME)

# === Validate local model file path ===
if not os.path.exists(LOCAL_MODEL_PATH):
    logger.info(f"⚠️ Warning: Model file not found at {LOCAL_MODEL_PATH}. Please verify the path.")

✅ Environment ready for MLflow evaluation.


## Log & Register Model

In [23]:
# === Log and register model to MLflow ===
with mlflow.start_run(run_name=MLFLOW_RUN_NAME) as run:
    
    # Log model artifacts using custom ChatbotService
    ChatbotService.log_model(
        artifact_path=MLFLOW_MODEL_NAME,
        config_path=CONFIG_PATH,
        secrets_path=SECRETS_PATH,
        docs_path=DATA_PATH,
        model_path=LOCAL_MODEL_PATH,
        demo_folder=DEMO_FOLDER
    )

    # Construct the URI for the logged model
    model_uri = f"runs:/{run.info.run_id}/{MLFLOW_MODEL_NAME}"

2025-07-11 17:39:30,914 - INFO - Use pytorch device_name: cuda:0
2025-07-11 17:39:30,915 - INFO - Load pretrained SentenceTransformer: intfloat/e5-large-v2


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/732 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/49 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

 - PyPDF (current: uninstalled, required: PyPDF)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.
2025-07-11 17:41:40,585 - INFO - Model and artifacts successfully registered in MLflow.


In [24]:
# Register the model into MLflow Model Registry
mlflow.register_model(
    model_uri=model_uri,
    name=MLFLOW_MODEL_NAME
)

logger.info(f"✅ Model registered successfully with run ID: {run.info.run_id}")

Registered model 'AIStudio-Multimodal-Chatbot-Model' already exists. Creating a new version of this model...
Created version '8' of model 'AIStudio-Multimodal-Chatbot-Model'.
2025-07-11 17:41:41 - INFO - ✅ Model registered successfully with run ID: 61cbed93136e494791bedcdc3e18f632


## Evaluate Hallucination, Answer Relevance

In [25]:
model_source = config["model_source"]

In [26]:
%%time

llm = initialize_llm(model_source, secrets)

ValidationError: 1 validation error for LlamaCpp
  Value error, Could not load Llama model from path: /home/jovyan/datafabric/llama3.1-8b-instruct/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf. Received error Failed to load model from file: /home/jovyan/datafabric/llama3.1-8b-instruct/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf [type=value_error, input_value={'model_path': '/home/jov...: None, 'grammar': None}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error

In [None]:

def model(batch_df: pd.DataFrame) -> pd.DataFrame:
    preds, contexts = [], []
    for q in batch_df["questions"]:
        answer = mm_chain.invoke(q)
        preds.append(answer)

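        # `retriever` is not defined in this notebook; query the text store
        # directly, with the same k that retrieve_mm uses by default.
        docs = text_db.similarity_search(q, k=4)
        contexts.append(" ".join(d.page_content for d in docs))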

    # keep the incoming index so every batch’s rows stay unique
    return pd.DataFrame(
        {
            "result": preds,
            "source_documents": contexts,
        },
        index=batch_df.index,      #  ← key line
    )

# --- Evaluation dataset ---
eval_df = pd.DataFrame({"questions": [
    "What naming convention should I use for a new blueprint project folder?",
    "What is the first step in the standard blueprint testing workflow?",
    "How do I fetch logs from a running Kubernetes pod?",
]})

judge = LocalGenAIJudge(
    llm=llm
)

faithfulness_metric = judge.to_mlflow_metric("faithfulness")
relevance_metric = judge.to_mlflow_metric("relevance")

results = mlflow.evaluate(
    model,
    eval_df,
    predictions="result",
    evaluators="default",
    extra_metrics=[faithfulness_metric, relevance_metric],
    evaluator_config={
        "col_mapping": {
            "inputs": "questions",
            "context": "source_documents"
        }
    },
)
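
Faithfulness asks whether an answer is supported by the retrieved context, and relevance asks whether it addresses the question. How `LocalGenAIJudge` scores these is not shown in this notebook. As a rough intuition only, the sketch below computes a crude lexical proxy: the fraction of answer tokens that also occur in the context. `lexical_support` is a hypothetical helper and is not a substitute for an LLM-based judge.

```python
import re


def _tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_support(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the context (0.0 to 1.0)."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set(_tokens(context))
    hits = sum(1 for token in answer_tokens if token in context_tokens)
    return hits / len(answer_tokens)


score = lexical_support(
    "delete the db directory",
    "Delete the db and creds directories",
)
print(score)
```

A score like this ignores paraphrase and word forms ("directory" versus "directories" counts as a miss), which is one reason judge-based metrics are used for the actual evaluation.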


Built with ❤️ using Z by HP AI Studio.