# RAG with Galileo and LangChain
Retrieval-Augmented Generation (RAG) is an architectural approach that can enhance the effectiveness of large language model (LLM) applications using customized data. In this example, we use LangChain, an orchestrator for language pipelines, to build an assistant that loads information from a PDF document (or, optionally, a web page) and uses it to answer user questions.

## Step 0: Configuring the environment
By using our Local GenAI workspace image, many of the libraries needed for RAG come pre-installed. In our case, we only need to add the connector for PDF documents.

In [1]:
!pip install -r ../requirements.txt --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
promptquality 0.64.2 requires galileo-core<3.0.0,>=2.14.0, but you have galileo-core 3.26.0 which is incompatible.

In [2]:
import os
import sys
# Add the src directory to the path to import utils
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "../..")))

In [3]:
import uuid
import base64
import mlflow
import pandas as pd
from typing import List

# LangChain modules for LLM and document processing
from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader
from langchain_community.llms import LlamaCpp
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser, Document
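from langchain.schema.runnable import RunnablePassthrough, RunnableLambda, RunnableMap
# Callback classes used by the local LlamaCpp model in the service class below
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler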

# Hugging Face integrations
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline, HuggingFaceEndpoint
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# MLflow for model management
from mlflow.pyfunc import PythonModel
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec, ParamSchema, ParamSpec

# Galileo Protect and Observe for security and monitoring
import promptquality as pq
import galileo_protect as gp
from galileo_protect import ProtectTool, ProtectParser, Ruleset
from galileo_observe import GalileoObserveCallback

# Project-specific utility functions
from src.utils import (
    initialize_galileo_observer,
    initialize_galileo_protect,
    initialize_galileo_evaluator,
    setup_galileo_environment,
    configure_proxy,
    load_config_and_secrets,
    configure_hf_cache,
    initialize_llm
)

USER_AGENT environment variable not set, consider setting it to identify your requests.
2025-04-02 20:45:04.493214: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-02 20:45:04.500682: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1743626704.509625     286 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743626704.512425     286 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1743626704.519126     286 computation_place

### Configuration of the Hugging Face cache

In the next cell, we configure the Hugging Face cache so that all downloaded models are persisted locally, even after the workspace is closed. This is a desired future feature for AI Studio and the GenAI addon.

In [4]:
# Configure HuggingFace cache
configure_hf_cache()

### Configuration and Secrets Loading

In this section, we load configuration parameters and API keys from separate YAML files. This separation helps maintain security by keeping sensitive information (API keys) separate from configuration settings.

- **config.yaml**: Contains non-sensitive configuration parameters like model sources and URLs
- **secrets.yaml**: Contains sensitive API keys for services like Galileo and HuggingFace

In [5]:
config_path = "../../configs/config.yaml"
secrets_path = "../../configs/secrets.yaml"

config, secrets = load_config_and_secrets(config_path, secrets_path)
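
Before going further, it can help to confirm that the loaded secrets contain the keys used later in this notebook. The sketch below is a minimal check, assuming `secrets` behaves like a plain dictionary and that the key names are `GALILEO_API_KEY` and `HUGGINGFACE_API_KEY` (the names read by the service class near the end of this notebook). `missing_secret_keys` is a hypothetical helper for illustration, not part of `src.utils`.

```python
from typing import Dict, Iterable, List


def missing_secret_keys(secrets: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty, without exposing any values."""
    return [key for key in required if not secrets.get(key)]


# Example with placeholder values (never print real keys)
example_secrets = {"GALILEO_API_KEY": "placeholder", "HUGGINGFACE_API_KEY": ""}
print(missing_secret_keys(example_secrets, ["GALILEO_API_KEY", "HUGGINGFACE_API_KEY"]))
```

Reporting only the key names keeps the secret values out of the notebook output.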

### Proxy Configuration

In order to connect to the Galileo service, an SSH connection needs to be established. For certain enterprise networks, this might require an explicit proxy configuration. If this applies to you, set the "proxy" field in your config.yaml, and the following cell will configure the necessary environment variable.

In [6]:
configure_proxy(config)
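
The implementation of `configure_proxy` lives in `src.utils` and is not shown here. As a rough sketch of the idea, assuming it does something similar to the service class later in this notebook (which copies the "proxy" field into the `HTTPS_PROXY` environment variable), a hypothetical equivalent named `apply_proxy` could look like this:

```python
import os
from typing import Any, Dict, Optional


def apply_proxy(config: Dict[str, Any]) -> Optional[str]:
    """Set HTTPS_PROXY from config["proxy"] when present; return the value applied."""
    proxy = config.get("proxy")
    if proxy:
        # Libraries that honor HTTPS_PROXY will route their HTTPS requests through it
        os.environ["HTTPS_PROXY"] = proxy
        return proxy
    # No "proxy" field (or an empty one): leave the environment untouched
    return None
```

Calling `apply_proxy({})` changes nothing, while `apply_proxy({"proxy": "http://proxy.example.com:8080"})` (a placeholder address) sets the variable for the current process.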

## Step 1: Data Loading

In this step, we use the LangChain framework to extract the content from a local PDF file containing the product documentation. We have also left some commented-out examples showing how to use web loaders to load data from pages on the web.

In [7]:
data_path = "../data"

if not os.path.exists(data_path):
    raise FileNotFoundError(f"'data' folder not found in path: {os.path.abspath(data_path)}")

file_path = os.path.join(data_path, "AIStudioDoc.pdf")
pdf_loader = PyPDFLoader(file_path)
pdf_data = pdf_loader.load()

#loader1 = WebBaseLoader("https://www.hp.com/us-en/workstations/ai-studio.html") # If you want to change the knowledge base, just modify this link.
#data1 = loader1.load()

#loader2 = WebBaseLoader("https://zdocs.datascience.hp.com/docs/aistudio")
#data2 = loader2.load()

## Step 2: Creation of Chunks
Here, we split the loaded documents into chunks, so that we have smaller and more specific texts to add to our vector database.

In [8]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(pdf_data)
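
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a deliberately simplified sketch in plain Python. It cuts text into fixed character windows, which is not what `RecursiveCharacterTextSplitter` does internally (that splitter tries to break on separators such as paragraphs and newlines first). `split_fixed` is a hypothetical helper used only for illustration.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of chunk_size characters, each starting
    chunk_size - chunk_overlap characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
print(split_fixed(sample, chunk_size=4))                   # no overlap
print(split_fixed(sample, chunk_size=4, chunk_overlap=2))  # neighbours share 2 characters
```

With `chunk_overlap=0`, as in the cell above, no text is duplicated across chunks. A non-zero overlap repeats some text at chunk boundaries, which can help when an answer spans two chunks.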


## Step 3: Retrieval

We transform the texts into embeddings and store them in a vector database. This allows us to run similarity searches and retrieve the documents most relevant to a query.


In [9]:
embedding = HuggingFaceEmbeddings()

vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
retriever = vectordb.as_retriever()
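
As a rough picture of what similarity search means, the sketch below ranks a few toy three-dimensional vectors by cosine similarity to a query vector. This is an illustration only: real embeddings have hundreds of dimensions, the vectors and chunk names here are invented, and the distance metric Chroma actually uses depends on how the collection is configured.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], store: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(store, key=lambda name: cosine_similarity(query_vec, store[name]), reverse=True)
    return ranked[:k]


# Invented toy "embeddings" for three chunks
toy_store = {
    "chunk about workspaces": [0.9, 0.1, 0.0],
    "chunk about experiments": [0.1, 0.9, 0.0],
    "chunk about billing": [0.0, 0.1, 0.9],
}
print(top_k([0.8, 0.2, 0.0], toy_store, k=2))
```

The retriever plays the role of `top_k`: it embeds the question and returns the stored chunks closest to it.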


## Step 4: Model

In this notebook, we provide three different options for loading the model:
 * **local**: by loading the llama2-7b model from the asset downloaded in the project
 * **hugging-face-local**: by downloading a DeepSeek model from Hugging Face and running it locally
 * **hugging-face-cloud**: by accessing the Mistral model through the Hugging Face cloud API (requires a Hugging Face API key saved in secrets.yaml)

This choice can be set in the variable model_source below or as an entry in the config.yaml file. The model deployed on the bottom cells of this notebook will load the choice from the config file.

In [10]:
model_source = "local"
if "model_source" in config:
    model_source = config["model_source"]
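
Because a typo in `model_source` would only surface later, when the model is loaded, an early check can be useful. The following is a minimal sketch, assuming the three values listed above are the only valid ones. `resolve_model_source` and `VALID_MODEL_SOURCES` are hypothetical names, not part of the project utilities.

```python
VALID_MODEL_SOURCES = ("local", "hugging-face-local", "hugging-face-cloud")


def resolve_model_source(config: dict, default: str = "local") -> str:
    """Read model_source from the config, fall back to a default, and reject unknown values."""
    source = config.get("model_source", default)
    if source not in VALID_MODEL_SOURCES:
        raise ValueError(f"Unknown model_source {source!r}; expected one of {VALID_MODEL_SOURCES}")
    return source


print(resolve_model_source({}))                                       # falls back to the default
print(resolve_model_source({"model_source": "hugging-face-cloud"}))   # value taken from config
```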

In [11]:
llm = initialize_llm(model_source, secrets)



## Step 5: Chain
In this part, we define a pipeline that receives a question, retrieves the relevant context documents and formats them, and uses the LLM loaded in Step 4 to answer the question based on that context. The output is then parsed into a string for easy reading.

In [12]:
def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([d.page_content for d in docs])

template = """You are a virtual assistant for a Data Science platform called AI Studio. Answer the question based on the following context:

    {context}

    Question: {query}
    """
prompt = ChatPromptTemplate.from_template(template)

chain = {"context": retriever | format_docs, "query": RunnablePassthrough()} | prompt | llm | StrOutputParser()
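
The pipe syntax can hide what actually flows between the stages. The sketch below mimics the same data flow with plain Python functions and stand-ins for the retriever and the LLM. It is a conceptual illustration only and does not use LangChain: `fake_retriever`, `fake_llm`, `join_docs`, `run_chain` and `TEMPLATE` are all hypothetical names.

```python
from typing import Dict, List

TEMPLATE = "Context:\n{context}\n\nQuestion: {query}"


def fake_retriever(query: str) -> List[str]:
    """Stand-in for the retriever: returns fixed text instead of searching a vector store."""
    return ["AI Studio is a data science platform.", "Workspaces run locally."]


def join_docs(docs: List[str]) -> str:
    """Same idea as format_docs: merge the retrieved texts into one context string."""
    return "\n\n".join(docs)


def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM: echoes the last prompt line so the data flow is visible."""
    return f"[stub answer to: {prompt.splitlines()[-1]}]"


def run_chain(query: str) -> str:
    # Step 1: the dict stage sends the same input to both branches
    values: Dict[str, str] = {"context": join_docs(fake_retriever(query)), "query": query}
    # Step 2: fill the template; Step 3: call the model
    return fake_llm(TEMPLATE.format(**values))


print(run_chain("What is AI Studio?"))
```

In the real chain, the dictionary stage plays the role of Step 1, `prompt` the template fill, and `llm | StrOutputParser()` the model call and string conversion.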

## Step 6: Galileo Evaluate
Using Galileo's Prompt Quality library (`promptquality`), we log in to Galileo Evaluate with an API key generated in the Galileo console. To get your API key, use this link: https://console.hp.galileocloud.io/api-keys

Galileo Evaluate is a platform designed to optimize and simplify the experimentation and evaluation of generative AI systems, especially large language model (LLM) applications. Its goal is to facilitate the process of building AI systems with deep insights and collaborative tools, replacing fragmented experimentation in spreadsheets and notebooks with a more integrated approach.

You can log metrics in Galileo Evaluate and track all your experiments in one place. In our example, we logged several questions, selected specific metrics, and ran a batch of experiments to evaluate our chain. To learn more about the available metrics, see: [Galileo Guardrail Metrics](https://docs.rungalileo.io/galileo/gen-ai-studio-products/galileo-guardrail-metrics).

In [13]:
#########################################
# In order to connect to Galileo, create a secrets.yaml file in the same folder as this notebook
# This file should contain an entry called Galileo, with your personal Galileo API key
# Galileo API keys can be created on https://console.hp.galileocloud.io/settings/api-keys
#########################################

setup_galileo_environment(secrets)
pq.login(os.environ['GALILEO_CONSOLE_URL'])

👋 You have logged into 🔭 Galileo (https://console.hp.galileocloud.io/) as muhammed.turhan@hp.com.


Config(console_url=Url('https://console.hp.galileocloud.io/'), username=None, password=None, api_key=SecretStr('**********'), token=SecretStr('**********'), current_user='muhammed.turhan@hp.com', current_project_id=None, current_project_name=None, current_run_id=None, current_run_name=None, current_run_url=None, current_run_task_type=None, current_template_id=None, current_template_name=None, current_template_version_id=None, current_template_version=None, current_template=None, current_dataset_id=None, current_job_id=None, current_prompt_optimization_job_id=None, api_url=Url('https://api.hp.galileocloud.io/'))

In [14]:
# Create callback handler
prompt_handler = initialize_galileo_evaluator(
    project_name="Chatbot_template_demo",
    scorers=[pq.Scorers.context_adherence_luna, pq.Scorers.correctness, pq.Scorers.toxicity, pq.Scorers.sexist]
)

# Run your chain experiments across multiple inputs with the galileo callback
inputs = [
    "What is AI Studio?",
    "How to create projects in AI Studio?",
    "How to monitor experiments?",
    "What are the different workspaces available?",
    "What, exactly, is a workspace?",
    "How to share my experiments with my team?",
    "Can I access my Git repository?",
    "Do I have access to files on my local computer?",
    "How do I access files on the cloud?",
    "Can I invite more people to my team?"
]
chain.batch(inputs, config=dict(callbacks=[prompt_handler]))

# publish the results of your run
prompt_handler.finish()

Processing chain run...:   0%|          | 0/5 [00:00<?, ?it/s]

Initial job complete, executing scorers asynchronously. Current status:
rag_nli: Done ✅
cost: Done ✅
toxicity: Done ✅
sexist: Done ✅
pii: Done ✅
protect_status: Done ✅
latency: Done ✅
factuality: Failed ❌, error was: Executing this metric requires credentials for OpenAI, Azure OpenAI or Vertex to be set.
🔭 View your prompt run on the Galileo console at: https://console.hp.galileocloud.io/prompt/chains/4457d62f-0369-4293-a846-3e87aa5f9d53/f801306f-5184-4ee7-b920-2bca13c9fcf2?taskType=12


## Galileo Protect

Galileo Protect serves as a powerful tool for safeguarding AI model outputs by detecting and preventing the release of sensitive information like personal addresses or other PII. By integrating Galileo Protect into your AI pipelines, you can ensure that model responses comply with privacy and security guidelines in real-time.

Galileo Protect functions as an API that provides protection checks for your chain/LLM. To log into the Galileo console, it is necessary to integrate it with another service, such as Galileo Evaluate or Galileo Observe.

**Attention**: an integrated API within the Galileo console is required to perform this verification.

In [16]:
project, project_id, stage_id = initialize_galileo_protect('AIStudio_Chatbot_Protect5')

Galileo Protect works by creating rules that identify conditions such as Personally Identifiable Information (PII) and toxicity. It ensures that the prompt will not receive or respond to sensitive questions. In this example, we create a set of rules (ruleset) and a set of actions that return a pre-programmed response if a rule is triggered. Galileo Protect also offers a variety of other metrics to suit different protection needs. You can learn more about the available metrics here: [Supported Metrics and Operators](https://docs.rungalileo.io/galileo/gen-ai-studio-products/galileo-protect/how-to/supported-metrics-and-operators).

Additionally, it is possible to import rulesets directly from Galileo through stages. Learn more about this feature here: [Invoking Rulesets](https://docs.rungalileo.io/galileo/gen-ai-studio-products/galileo-protect/how-to/invoking-rulesets).


In [17]:
# Define a ruleset for PII detection (specifically SSN)
pii_ruleset = Ruleset(
    # Define the rules to check for potential issues
    rules=[
        {
            "metric": "pii",  # Using Personal Identifiable Information metric
            "operator": "contains",  # Check if PII contains specific type
            "target_value": "ssn",  # Looking for Social Security Numbers
        },
    ],
    # Define the action to take when rules are triggered
    action={
        "type": "OVERRIDE",  # Override the model response
        "choices": [
            "Personal Identifiable Information detected in the model output. Sorry, I cannot answer that question."
        ],
    }
)

# Create the protect tool with the ruleset
protect_tool = ProtectTool(stage_id=stage_id, prioritized_rulesets=[pii_ruleset], timeout=10)

# Create a protect parser for our chain
protect_parser = ProtectParser(chain=chain)

# Combine the protect tool with the parser to create a protected chain
protected_chain = protect_tool | protect_parser.parser

# Test the protected chain with a sample containing PII
protected_chain.invoke({"input": "What's my SSN? Hint: my SSN is 123-45-6789", "output": "Your SSN is 123-45-6789"})

'Personal Identifiable Information detected in the model output. Sorry, I cannot answer that question.'
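
To illustrate the rule-plus-override pattern without calling any external service, here is a small local sketch. It is not how Galileo Protect detects PII (Protect evaluates its own hosted metrics); it only shows the idea of replacing an output with a canned response when a condition matches. The regular expression and the names `SSN_PATTERN` and `apply_override` are assumptions made for this example.

```python
import re

# Simple pattern for US SSN-like strings such as 123-45-6789 (illustrative, not exhaustive)
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
OVERRIDE_MESSAGE = (
    "Personal Identifiable Information detected in the model output. "
    "Sorry, I cannot answer that question."
)


def apply_override(output: str) -> str:
    """Return the canned message if the text looks like it contains an SSN, else the text unchanged."""
    if SSN_PATTERN.search(output):
        return OVERRIDE_MESSAGE
    return output


print(apply_override("Your SSN is 123-45-6789"))
print(apply_override("AI Studio is a data science platform."))
```

The first call returns the override message, mirroring the result above; the second passes the text through untouched.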

## Galileo Observe

Galileo Observe helps you monitor your generative AI applications in production. With Observe, you can understand how your users are using your application and identify where things are going wrong. Keep tabs on your production system, instantly receive alerts when bad things happen, and perform deep root cause analysis through the Observe dashboard.

You can connect Galileo Observe to your LangChain chain to monitor metrics such as cost and guardrail indicators.

In [18]:
monitor_handler = initialize_galileo_observer(project_name="AIStudio_Chatbot_Observe")

example_query = "What is AI Studio?"

output = chain.invoke(
    example_query,
    config=dict(callbacks=[monitor_handler])
)

output

👋 You have logged into 🔭 Galileo (https://console.hp.galileocloud.io/) as muhammed.turhan@hp.com.


' Answer: AI Studio is a virtual assistant for a data science platform called Z by HP AI Studio. It is a standalone application specifically developed for data scientists and engineers to connect to multiple data-stores across local and cloud networks, so they can access the correct data and packages, wherever they are. AI Studio lets users perform all their computation locally without interruption to manage development, data, and model environments.'

### Galileo Protect + Evaluate

Here, we combined Galileo Protect with Galileo Evaluate and ran a batch of sensitive questions to see Galileo Protect in action.


In [19]:
prompt_handler = initialize_galileo_evaluator(
    project_name="test_galileo_evaluate",
    scorers=[pq.Scorers.context_adherence_luna, pq.Scorers.correctness, pq.Scorers.toxicity, pq.Scorers.sexist, pq.Scorers.pii]
)

# List of inputs with different variants of SSN queries
inputs = [
    {"input": "What's my SSN? Hint: my SSN is 123-45-6789", "output": "Your SSN is 123-45-6789"},
    {"input": "Can you confirm my SSN is 987-65-4321?", "output": "Your SSN is 987-65-4321"},
    {"input": "I'm trying to remember my SSN, I think it's 111-22-3333", "output": "Your SSN is 111-22-3333"},
    {"input": "Is 222-33-4444 my SSN?", "output": "Yes, that is your SSN."},
    {"input": "Could you look up my SSN: 555-66-7777?", "output": "Your SSN is 555-66-7777"},
]

# Running the batch chain with GalileoPromptCallback
protected_chain.batch(inputs, config=dict(callbacks=[prompt_handler]))

# Finalizing and publishing the results
prompt_handler.finish()

Processing chain run...:   0%|          | 0/5 [00:00<?, ?it/s]

Initial job complete, executing scorers asynchronously. Current status:
rag_nli: Done ✅
rouge: Done ✅
cost: Done ✅
bleu: Done ✅
toxicity: Done ✅
sexist: Done ✅
pii: Done ✅
protect_status: Done ✅
latency: Done ✅
factuality: Failed ❌, error was: Executing this metric requires credentials for OpenAI, Azure OpenAI or Vertex to be set.
🔭 View your prompt run on the Galileo console at: https://console.hp.galileocloud.io/prompt/chains/d2657078-7403-47ff-99b9-ebbb2658ac5c/492b34e5-0b4c-4a92-9085-6b95ca4b58ef?taskType=12


## Model service Galileo Protect + Observe

The code below is a lengthy class that defines a model that can be logged and registered in MLflow. The goal of this class is to create a service that packs the whole chain shown above, together with the connections to Galileo Observe and Protect, so that the user can keep track of how the deployed service is used and protect the chain against misuse. To register the model in MLflow, three methods are essential:
* **log_model**: Class method called to actually deploy the model. It receives all the relevant artifacts that need to be packed into the model, as well as the signature of the service to be called
* **load_context**: Method responsible for loading all the context (artifacts and relevant information) into the deployed model. In our scenario, we break this into several steps, to allow building all the components of our chain as well as the external connections:
  * **load_config**: Loads the configuration and the secrets from external files, including what kind of model should be deployed (local or Hugging Face)
  * Load environment variables
  * **load_model**: Loads the model according to the given configuration. Value of property *model_source* on config.yaml sets whether the model is:
    * **local**: Uses local llama2-7b model, loaded from S3 as an asset in AI Studio Project
    * **hugging-face-local**: Downloads a DeepSeek model with 1.5B parameters and performs the inference locally
    * **hugging-face-cloud**: Uses Hugging Face API to access a Mistral 7b model
  * **load_vector_database**: Loads the given documents into the vector database
  * **load_prompt**: Defines the default prompt that will be used by the chatbot
  * **load_chain**: Builds the overall RAG chain
  * **protect_chain**: Connects the chain to Galileo Protect
  * Setup of Galileo Observe callback
  * Setup of memory as empty
* **predict**: Method of the deployed model that responds to the REST calls to the model. Our service goes beyond performing a static RAG: it also allows the user to configure the prompt and the document base through calls to the service. This is provided by the following methods:
  * **get_prompt**: Allows the user to get the current prompt template of the chain
  * **set_prompt**: Allows the user to change the prompt template of the chain
  * **add_pdf**: Allows the user to increment the knowledge base of the retrieval by adding new PDF documents into the Vector Database
  * **reset_history**: Resets the history of previous conversations
  * **inference**: Performs the actual inference with the built chain
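
As a sketch of how a client might address these operations, the helper below builds a JSON request body with the three input columns (`query`, `prompt`, `document`) and the boolean flags (`add_pdf`, `get_prompt`, `set_prompt`, `reset_history`) declared in the model signature further down. The `"inputs"`/`"params"` envelope is an assumption about the scoring endpoint and may need adjusting to your MLflow version. `build_request` is a hypothetical helper, not part of the service.

```python
import base64
import json
from typing import Optional


def build_request(query: str = "", prompt: str = "",
                  pdf_bytes: Optional[bytes] = None, **flags: bool) -> str:
    """Build a JSON body with the three input columns and the boolean params."""
    # PDFs travel as base64 text, matching the base64 decoding done in add_pdf
    document = base64.b64encode(pdf_bytes).decode("ascii") if pdf_bytes else ""
    body = {
        "inputs": {"query": [query], "prompt": [prompt], "document": [document]},
        "params": {name: bool(value) for name, value in flags.items()},
    }
    return json.dumps(body)


# Plain question: no flags set, so the service falls through to inference
print(build_request(query="What is AI Studio?"))
# Clearing the conversation history
print(build_request(reset_history=True))
```

Only one flag should be set per request, since `predict` checks them in order and runs the first one that is true.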
  

In [20]:
def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([doc.page_content for doc in docs if isinstance(doc.page_content, str)])

class AIStudioChatbotService(PythonModel):

    def load_config(self, context):
        self.docs_path = context.artifacts["docs"]
        self.model_config = {
            "galileo_key": secrets.get("GALILEO_API_KEY", ""),
            "hf_key": secrets.get("HUGGINGFACE_API_KEY", ""),
            "galileo_url": config.get("galileo_url", "https://console.hp.galileocloud.io/"),
            "proxy": config.get("proxy", None),
            "model_source": config.get("model_source", "local"),
            "observe_project": "Deployed_Chatbot_Observations",
            "protect_project": "Deployed_Chatbot_Protection",
            "local_model_path": "/home/jovyan/datafabric/llama2-7b/ggml-model-f16-Q5_K_M.gguf"
        }
    
    def load_local_hf_model(self, context):
        model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id)
        pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=100, device=0)
        self.llm = HuggingFacePipeline(pipeline=pipe)        
        print("Using the local Deep Seek model downloaded from HuggingFace.")

    def load_cloud_hf_model(self, context):   
        self.llm = HuggingFaceEndpoint(
            huggingfacehub_api_token=self.model_config["hf_key"],
            repo_id="mistralai/Mistral-7B-Instruct-v0.2",
        )     
        print("Using the cloud Mistral model on HuggingFace.")
    
    def load_local_model(self, context):
        print("[INFO] Initializing local LlamaCpp model.")
        self.callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
        self.llm = LlamaCpp(
            model_path=context.artifacts["models"],
            n_gpu_layers=30,
            n_batch=512,
            n_ctx=4096,
            max_tokens=1024,
            f16_kv=True,
            callback_manager=self.callback_manager,
            verbose=False,
            stop=[],
            streaming=False,
            temperature=0.2,
        )
        print("Using the local LlamaCpp model.")

    def load_model(self, context):
        if self.model_config["model_source"] == "local":
            self.load_local_model(context)
        elif self.model_config["model_source"] == "hugging-face-local":
            self.load_local_hf_model(context)
        elif self.model_config["model_source"] == "hugging-face-cloud":
            self.load_cloud_hf_model(context)
        else:
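            # Fail early: without a valid source, self.llm would never be defined
            raise ValueError(f"Incorrect source informed for the model: {self.model_config['model_source']}")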

    def load_vector_database(self):
        pdf_path = os.path.join(self.docs_path, "AIStudioDoc.pdf")
        if not os.path.exists(pdf_path):
            raise FileNotFoundError(f"The file 'AIStudioDoc.pdf' was not found at: {pdf_path}")
        print(f"Reading and processing the PDF file: {pdf_path}")

        pdf_loader = PyPDFLoader(pdf_path)
        pdf_data = pdf_loader.load()
        for doc in pdf_data:
            if not isinstance(doc.page_content, str):
                doc.page_content = str(doc.page_content)

        text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
        splits = text_splitter.split_documents(pdf_data)
        print(f"PDF split into {len(splits)} parts.")

    
        self.embedding = HuggingFaceEmbeddings()
        self.vectordb = Chroma.from_documents(documents=splits, embedding=self.embedding)
        self.retriever = self.vectordb.as_retriever()
        print("Vector database created successfully.")

    def load_prompt(self):
        self.prompt_str = """You are a virtual assistant for a Data Science platform called AI Studio. Answer the question based on the following context:
            {context}
            Question: {input}
            """
        self.prompt = ChatPromptTemplate.from_template(self.prompt_str)

    def load_chain(self):
        input_normalizer = RunnableLambda(lambda x: {"input": x} if isinstance(x, str) else x)
        retriever_runnable = RunnableLambda(lambda x: self.retriever.get_relevant_documents(x["input"]))
        format_docs_r = RunnableLambda(format_docs)
        extract_input = RunnableLambda(lambda x: x["input"])

        self.chain = (
            input_normalizer
            | RunnableMap({
                "context": retriever_runnable | format_docs_r,
                "input": extract_input
            })
            | self.prompt
            | self.llm
            | StrOutputParser()
        )

    def protect_chain(self):
        # Set up Galileo Protect
        project = gp.create_project(self.model_config["protect_project"])
        project_id = project.id
        print(f"Project created in Galileo Protect. Project ID: {project_id}")

        stage = gp.create_stage(name=f"{self.model_config['protect_project']}_stage1", project_id=project_id)
        stage_id = stage.id
        print(f"Stage created in Galileo Protect. Stage ID: {stage_id}")

        ruleset = Ruleset(
            rules=[
                {
                    "metric": "pii",
                    "operator": "contains",
                    "target_value": "ssn",
                },
            ],
            action={
                "type": "OVERRIDE",
                "choices": [
                    "Personal Identifiable Information detected in the model output. Sorry, I cannot answer that question."
                ],
            }
        )
        protect_tool = ProtectTool(stage_id=stage_id, prioritized_rulesets=[ruleset], timeout=10)
        protect_parser = ProtectParser(chain=self.chain)
        self.protected_chain = protect_tool | protect_parser.parser

    def load_context(self, context):
        self.load_config(context)
        if self.model_config["proxy"] is not None:
            os.environ["HTTPS_PROXY"] = self.model_config["proxy"]
        os.environ["GALILEO_API_KEY"] = self.model_config["galileo_key"]
        os.environ["GALILEO_CONSOLE_URL"] = self.model_config["galileo_url"]

        self.load_model(context)
        self.load_vector_database()
        self.load_prompt()
        self.load_chain()
        self.protect_chain()
   
        self.monitor_handler = GalileoObserveCallback(project_name=self.model_config["observe_project"])
        print("Embeddings, vector database, LLM, Galileo Protect and Observer models successfully configured.")

        self.memory = []

    def add_pdf(self, base64_pdf):
        pdf_bytes = base64.b64decode(base64_pdf)
        temp_pdf_path = f"/tmp/{uuid.uuid4()}.pdf"
        with open(temp_pdf_path, "wb") as f:
            f.write(pdf_bytes)

        pdf_loader = PyPDFLoader(temp_pdf_path)
        pdf_data = pdf_loader.load()
        for doc in pdf_data:
            if not isinstance(doc.page_content, str):
                doc.page_content = str(doc.page_content)

        text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
        new_splits = text_splitter.split_documents(pdf_data)

        # Add the new chunks to the existing vector store so earlier documents stay searchable
        self.vectordb.add_documents(new_splits)
        self.retriever = self.vectordb.as_retriever()

        return {
            "chunks": [],
            "history": [],
            "prompt": self.prompt_str,
            "output": "",
            "success": True
        }

    def get_prompt_template(self):
        return {
            "chunks": [],
            "history": [],
            "prompt": self.prompt_str,
            "output": "",
            "success": True
        }

    def set_prompt_template(self, new_prompt):
        self.prompt_str = new_prompt
        self.prompt = ChatPromptTemplate.from_template(self.prompt_str)
        return {
            "chunks": [],
            "history": [],
            "prompt": self.prompt_str,
            "output": "",
            "success": True
        }

    def reset_history(self):
        self.memory = []
        return {
            "chunks": [],
            "history": [],
            "prompt": self.prompt_str,
            "output": "",
            "success": True
        }

    def inference(self, context, user_query):
        response = self.protected_chain.invoke(
            {"input": user_query, "output": ""},
            config=dict(callbacks=[self.monitor_handler])
        )
        relevant_docs = self.retriever.get_relevant_documents(user_query)
        chunks = [doc.page_content for doc in relevant_docs]
        self.memory.append({"role": "User", "content": user_query})
        self.memory.append({"role": "Assistant", "content": response})

        return {
            "chunks": chunks,
            "history": [f"<{m['role']}> {m['content']}\n" for m in self.memory],
            "prompt": self.prompt_str,
            "output": response,
            "success": True
        }

    def predict(self, context, model_input, params):
        if params.get("add_pdf", False):
            result = self.add_pdf(model_input['document'][0])
        elif params.get("get_prompt", False):
            result = self.get_prompt_template()
        elif params.get("set_prompt", False):
            result = self.set_prompt_template(model_input['prompt'][0])
        elif params.get("reset_history", False):
            result = self.reset_history() 
        else:
            result = self.inference(context, model_input['query'][0])

        return pd.DataFrame([result])

    @classmethod
    def log_model(cls, secrets_path, config_path, docs_path, model_folder=None, demo_folder="../demo"):
        if demo_folder and not os.path.exists(demo_folder):
            os.makedirs(demo_folder, exist_ok=True)

        input_schema = Schema([
            ColSpec("string", "query"),
            ColSpec("string", "prompt"),
            ColSpec("string", "document")
        ])
        output_schema = Schema([
            ColSpec("string", "chunks"),
            ColSpec("string", "history"),
            ColSpec("string", "prompt"),
            ColSpec("string", "output"),
            ColSpec("boolean", "success")
        ])
        param_schema = ParamSchema([
            ParamSpec("add_pdf", "boolean", False),
            ParamSpec("get_prompt", "boolean", False),
            ParamSpec("set_prompt", "boolean", False),
            ParamSpec("reset_history", "boolean", False)
        ])
        signature = ModelSignature(inputs=input_schema, outputs=output_schema, params=param_schema)

        artifacts = {"secrets": secrets_path, "config": config_path, "docs": docs_path, "demo": demo_folder}
        if model_folder:
            artifacts["models"] = model_folder

        mlflow.pyfunc.log_model(
            artifact_path="aistudio_chatbot_service",
            python_model=cls(),
            artifacts=artifacts,
            signature=signature,
            pip_requirements=[
                "PyPDF",
                "pyyaml",
                "tokenizers==0.20.3",
                "httpx==0.27.2",
            ]
        )
        print("Model and artifacts successfully registered in MLflow.")



In [21]:
print("Initializing experiment in MLflow.")
mlflow.set_experiment("AIStudioChatbot_Service")

model_folder = "/home/jovyan/datafabric/llama2-7b/ggml-model-f16-Q5_K_M.gguf"  
demo_folder = "../demo"   

# Ensure the demo folder exists before logging model
if demo_folder and not os.path.exists(demo_folder):
    os.makedirs(demo_folder, exist_ok=True)

with mlflow.start_run(run_name="AIStudioChatbot_Service_Run") as run:
    AIStudioChatbotService.log_model(
        secrets_path=secrets_path,
        config_path=config_path,
        docs_path=data_path,
        demo_folder=demo_folder,
        model_folder=model_folder
    )
    model_uri = f"runs:/{run.info.run_id}/aistudio_chatbot_service"
    mlflow.register_model(
        model_uri=model_uri,
        name="Chatbot-hf-cloud",
    )
    print(f"Registered model with execution ID: {run.info.run_id}")
    print(f"Model registered successfully. Run ID: {run.info.run_id}")

Initializing experiment in MLflow.





 - tokenizers (current: 0.20.1, required: tokenizers==0.20.3)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.


Model and artifacts successfully registered in MLflow.


Registered model 'Chatbot-hf-cloud' already exists. Creating a new version of this model...
Created version '5' of model 'Chatbot-hf-cloud'.


Registered model with execution ID: e52e71cfe6a8435fa2e11bb88ea72b3e
Model registered successfully. Run ID: e52e71cfe6a8435fa2e11bb88ea72b3e
