# LLM Inference with RAG and a Chatbot Interface

In [1]:
!pip install -qU \
    langchain==0.0.354\
    openai==1.6.1\
    pinecone-client==3.1.0\
    tiktoken==0.5.2

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nemoguardrails 0.16.0 requires langchain<0.4.0,>=0.2.14, but you have langchain 0.0.354 which is incompatible.
nemoguardrails 0.16.0 requires langchain-community<0.4.0,>=0.2.5, but you have langchain-community 0.0.20 which is incompatible.
nemoguardrails 0.16.0 requires langchain-core<0.4.0,>=0.2.14, but you have langchain-core 0.1.23 which is incompatible.
pinecone-plugin-assistant 1.8.0 requires packaging<25.0,>=24.2, but you have packaging 23.2 which is incompatible.
langchain-pinecone 0.2.12 requires langchain-core<1.0.0,>=0.3.34, but you have langchain-core 0.1.23 which is incompatible.
langchain-groq 0.3.8 requires langchain-core<1.0.0,>=0.3.75, but you have langchain-core 0.1.23 which is incompatible.
langchain-openai 0.3.33 requires langchain-core<1.0.0,>=0.3.76, but you have langchain-core 0.1.23 whi

In [2]:
!pip install -qU langchain-huggingface
!pip install -qU langchain-groq

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nemoguardrails 0.16.0 requires langchain<0.4.0,>=0.2.14, but you have langchain 0.0.354 which is incompatible.
nemoguardrails 0.16.0 requires langchain-community<0.4.0,>=0.2.5, but you have langchain-community 0.0.20 which is incompatible.
langchain 0.0.354 requires langchain-core<0.2,>=0.1.5, but you have langchain-core 0.3.76 which is incompatible.
langchain 0.0.354 requires langsmith<0.1.0,>=0.0.77, but you have langsmith 0.4.30 which is incompatible.
langchain-openai 0.3.33 requires openai<2.0.0,>=1.104.2, but you have openai 1.6.1 which is incompatible.
langchain-openai 0.3.33 requires tiktoken<1,>=0.7, but you have tiktoken 0.5.2 which is incompatible.
langchain-community 0.0.20 requires langchain-core<0.2,>=0.1.21, but you have langchain-core 0.3.76 which is incompatible.
langchain-community 0.0.20 req

In [3]:
!pip install -qU nemoguardrails

In [4]:
!pip install -qU "langchain[groq]"
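
The conflict messages above come from pinning older releases (for example `langchain==0.0.354`) next to newer `langchain-*` packages. A minimal sketch, using only the standard library, for listing what actually ended up installed before importing anything. The package list is an assumption based on the install cells above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the install cells above (adjust as needed).
PACKAGES = [
    "langchain",
    "langchain-core",
    "langchain-groq",
    "langchain-huggingface",
    "langchain-pinecone",
    "nemoguardrails",
    "openai",
    "tiktoken",
]

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:24s} {ver or 'not installed'}")
```

If the listed versions do not satisfy the ranges quoted in the error output, restarting the runtime after the final install is usually needed before the new versions are importable.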

<hr>

In [5]:
from google.colab import userdata
groq_api = userdata.get('GROQ_API_USPLACES')
# groq_api

In [6]:
import getpass
import os
os.environ['GROQ_API_KEY'] = groq_api.strip()
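
The cells above read the key from Colab's `userdata` and import `getpass` without using it. A minimal sketch of a loader that also works outside Colab, assuming the lookup order environment variable, then Colab secret, then interactive prompt. `load_secret` is a hypothetical helper, not part of any library.

```python
import getpass
import os

def load_secret(name: str) -> str:
    """Return a secret from the environment, Colab userdata, or a prompt."""
    value = os.environ.get(name)
    if value:
        return value.strip()
    try:
        # Only importable inside Google Colab; a missing secret also raises.
        from google.colab import userdata
        value = userdata.get(name)
    except Exception:
        value = None
    if not value:
        # Last resort: ask interactively without echoing the key.
        value = getpass.getpass(f"Enter {name}: ")
    return value.strip()
```

With this helper, the cell above could become `os.environ['GROQ_API_KEY'] = load_secret('GROQ_API_USPLACES')`, keeping the secret name used in this notebook.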

In [7]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq

In [None]:
# chat = ChatGroq(
#     temperature = 0,
#     model_name = "qwen-qwq32b"
# )

In [None]:
chat = ChatGroq(
    temperature = 0,
    model_name = "llama-3.3-70b-versatile"
)

In [9]:
system = "You are a helpful assistant."
human  = "{text}"
prompt = ChatPromptTemplate.from_messages([("system",system),("human",human)])

chain  = prompt | chat
chain.invoke({
    "text" : "Explain the importance of low latency LLMs."
})

AIMessage(content="Low latency Large Language Models (LLMs) are crucial for various applications, particularly those that require real-time or near-real-time interactions. Latency refers to the time it takes for a model to process input and generate output. Here are some reasons why low latency LLMs are important:\n\n1. **Improved User Experience**: Low latency LLMs enable faster response times, which is essential for applications like chatbots, virtual assistants, and conversational AI. Users expect quick and timely responses, and high latency can lead to frustration and abandonment.\n2. **Real-time Applications**: Low latency LLMs are necessary for real-time applications, such as:\n\t* Live language translation\n\t* Real-time text summarization\n\t* Live chat support\n\t* Virtual event moderation\n3. **Edge Computing**: With the increasing adoption of edge computing, low latency LLMs can be deployed on edge devices, reducing the need for cloud connectivity and enabling faster process

In [10]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

In [11]:
messages = [
    SystemMessage(content = "You are a helpful assistant."),
    HumanMessage(content  = "Hi AI, how are you today?"),
    AIMessage(content = "I'm great thank you. How can I help you?"),
    HumanMessage(content = "I'd like to understand string theory.")     # prompt engineering AI --> human
]

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")  # embeds text into vectors
# Queries must use the same embedding model that was used to build the vector database.
embed_model = embeddings

In [13]:
from google.colab import userdata
pinecone_api = userdata.get('PINECONE_API_KEY')

In [14]:
from pinecone import Pinecone

In [15]:
pc = Pinecone(api_key = pinecone_api)
index = pc.Index("us-places-ragv99")

In [16]:
def format_rag_results(documents):
    """
    Format RAG results into a clean, structured representation with preserved source URLs.
    Args:
        documents : List of Document objects from langchain with metadata and page_content
    Returns:
        str : A formatted string with the structured information
    """
    formatted_results = []
    for i, doc in enumerate(documents):
        # Extract metadata
        metadata = doc.metadata
        content = doc.page_content
        # Format each document
        doc_formatted = f"# Document {i+1} : {metadata.get('name','Unnamed Document')}\n\n"
        # Basic information section
        doc_formatted += "## Basic Information\n"
        for key in ['name','location','categories']:
            if key in metadata:
                value = metadata[key]
                if isinstance(value, list):
                    value = ", ".join(value)
                doc_formatted += f"- **{key.title()}**: {value}\n"
        if 'rating' in metadata:
            doc_formatted += f"- **Rating**: {metadata['rating']}/5\n"
        # Always include the source URL if available
        if 'url' in metadata:
            doc_formatted += f"- **Source URL**: {metadata['url']}\n"
        # Content section
        if content:
            doc_formatted += f"\n## Description\n{content}\n\n"
        # Reviews section if available
        if 'reviews' in metadata and metadata['reviews']:
            doc_formatted += "## Reviews\n"
            for j, review in enumerate(metadata['reviews']):
                # Try to extract a title if the review has a comma
                if ',' in review:
                    parts = review.split(",",1)
                    title = parts[0].strip()
                    review_text = parts[1].strip()
                    doc_formatted += f"### {title}\n{review_text}\n\n"
                else:
                    doc_formatted += f"### Review {j+1}\n{review}\n\n"
        formatted_results.append(doc_formatted)
    # Combine all formatted documents
    return "\n".join(formatted_results)

# Example usage:
# formatted_output = format_rag_results(documents)
# print(formatted_output)
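
In `format_rag_results`, a list-valued field is joined with commas, so an empty `categories` list renders as a blank bullet. A small sketch of one way to handle that, assuming a placeholder such as `N/A` is acceptable. `format_metadata_value` is a hypothetical helper and is not used elsewhere in this notebook.

```python
def format_metadata_value(value, placeholder="N/A"):
    """Render a metadata value as text; empty values become a placeholder."""
    if isinstance(value, (list, tuple)):
        # Cast each item so non-string entries do not break str.join.
        value = ", ".join(str(item) for item in value)
    text = str(value).strip() if value is not None else ""
    return text or placeholder

print(format_metadata_value(["Museums", "Gardens"]))  # Museums, Gardens
print(format_metadata_value([]))                      # N/A
print(format_metadata_value(4.5))                     # 4.5
```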


In [17]:
from langchain_huggingface import HuggingFaceEmbeddings
from pinecone import Pinecone
from google.colab import userdata
pinecone_key = pinecone_api

In [18]:
# Your existing code to set up embeddings and Pinecone
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
embed_model = embeddings
pc = Pinecone(api_key = pinecone_api)    # pc = Pinecone(api_key=pinecone_key)
index = pc.Index("us-places-ragv99")     # index = pc.Index("us-places-rag")
text_field = "text"

In [19]:
!pip install -U langchain-pinecone

Collecting openai<2.0.0,>=1.104.2 (from langchain-openai>=0.3.11->langchain-pinecone)
  Using cached openai-1.108.2-py3-none-any.whl.metadata (29 kB)
Collecting tiktoken<1,>=0.7 (from langchain-openai>=0.3.11->langchain-pinecone)
  Using cached tiktoken-0.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting packaging>=23.2 (from langchain-core<1.0.0,>=0.3.34->langchain-pinecone)
  Using cached packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
Using cached openai-1.108.2-py3-none-any.whl (948 kB)
Using cached packaging-24.2-py3-none-any.whl (65 kB)
Using cached tiktoken-0.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Installing collected packages: packaging, tiktoken, openai
  Attempting uninstall: packaging
    Found existing installation: packaging 23.2
    Uninstalling packaging-23.2:
      Successfully uninstalled packaging-23.2
  Attempting uninstall: tiktoken
    Found existing installation: tiktoken 0.5.2
    Unin

In [20]:
from pinecone import Pinecone
from langchain_pinecone import PineconeVectorStore


For example, replace imports like: `from langchain_core.pydantic_v1 import BaseModel`
with: `from pydantic import BaseModel`
or the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet. 	from pydantic.v1 import BaseModel

  from langchain_pinecone.vectorstores import Pinecone, PineconeVectorStore


In [21]:
# vectorstore = Pinecone(index, embed_model.embed_query, text_field)
vectorstore = PineconeVectorStore(
    index     = index,
    embedding = embed_model,
    text_key  = "text"
)

In [22]:
def augment_prompt(query:str):
    results = vectorstore.similarity_search(query, k=3)
    source_knowledge = "\n".join([x.page_content for x in results])
    augmented_prompt = f"""Using the contexts below, answer the query.
    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt, results     # Return the results as well
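
The triple-quoted f-string in `augment_prompt` carries the function's indentation into the prompt text. A minimal sketch of the same prompt assembled without leading spaces, working on plain strings so it can be tried without Pinecone. `build_augmented_prompt` and the passage numbering are assumptions for illustration.

```python
def build_augmented_prompt(query: str, contexts: list[str]) -> str:
    """Join retrieved passages and the query into one prompt string."""
    # Number each passage so the model can tell them apart.
    numbered = [f"[{i}] {text.strip()}" for i, text in enumerate(contexts, start=1)]
    source_knowledge = "\n\n".join(numbered)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{source_knowledge}\n\n"
        f"Query: {query}"
    )

example = build_augmented_prompt(
    "What is Vizcaya Museum and Gardens?",
    ["Built in 1916 as a winter retreat.", "Surrounded by formal gardens."],
)
print(example)
```

In the notebook, the `contexts` argument would be `[x.page_content for x in results]`.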

In [23]:
# New function to display formatted results along with the RAG response
def display_rag_results(query):
    augmented_prompt, documents = augment_prompt(query)
    # Get the LLM response
    prompt = HumanMessage(content=augmented_prompt)
    messages.append(prompt)
    res = chat.invoke(messages)  # calling the model directly is deprecated; use .invoke()
    # Format the retrieved documents
    formatted_docs = format_rag_results(documents)
    print("LLM RESPONSE:")
    print(res.content)
    print("\nRETRIEVED SOURCES:")
    print(formatted_docs)
    return res.content, formatted_docs

In [24]:
# Example usage
query = "What is Vizcaya Museum and Gardens?"
response, formatted_sources = display_rag_results(query)


LLM RESPONSE:
Based on the provided contexts, it appears that Vizcaya Museum and Gardens is a lavish villa built in 1916 as a winter retreat, inspired by the Italian Renaissance. The museum features original furnishings and artwork, and is surrounded by lush, formal gardens.

RETRIEVED SOURCES:
# Document 1 : Vizcaya Museum and Gardens/n/n## Basic Information
- **Name**: Vizcaya Museum and Gardens
- **Location**: Miami
- **Categories**: 
- **Rating**: 4.5/5
- **Source URL**: https://www.tripadvisor.com/Attraction_Review-g34438-d130345-Reviews-Vizcaya_Museum_and_Gardens-Miami_Florida.html

## Description
Built in 1916 as a winter retreat, this lavish villa is a tribute to the Italian Renaissance. The museum contains much of the original furnishings and artwork, and is surrounded by lush, formal gardens.


# Document 2 : Bellagio Conservatory & Botanical Garden/n/n## Basic Information
- **Name**: Bellagio Conservatory & Botanical Garden
- **Location**: Las Vegas
- **Categories**: 
- **Ra

In [25]:
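# The legacy `from langchain.vectorstores import Pinecone` import is not needed here and
# would shadow the Pinecone client class; PineconeVectorStore is used instead.
from langchain_pinecone import PineconeVectorStore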

In [26]:
text_field = "text"      # the metadata field that contains our text
# initialize the vector store object
vectorstore = PineconeVectorStore(
    index     = index,
    embedding = embed_model,
    text_key  = "text"
)

In [27]:
query = "What is Vizcaya Museum and Gardens?"
vectorstore.similarity_search(query, k=1)

[Document(id='Miami_Vizcaya_Museum_and_Gardens_0', metadata={'categories': [], 'location': 'Miami', 'name': 'Vizcaya Museum and Gardens', 'rating': 4.5, 'reviews': [], 'url': 'https://www.tripadvisor.com/Attraction_Review-g34438-d130345-Reviews-Vizcaya_Museum_and_Gardens-Miami_Florida.html'}, page_content='Built in 1916 as a winter retreat, this lavish villa is a tribute to the Italian Renaissance. The museum contains much of the original furnishings and artwork, and is surrounded by lush, formal gardens.')]

In [28]:
import re

In [29]:
def format_rag_results(documents):
    """
    Format RAG results into a clean, structured representation with preserved source URLs.
    """
    formatted_results = []
    for i, doc in enumerate(documents):
        # Extract metadata
        metadata = doc.metadata
        content = doc.page_content
        # Format each document
        doc_formatted = f"# Document {i+1} : {metadata.get('name','Unnamed Document')}\n\n"
        # Basic information section
        doc_formatted += "## Basic Information\n"
        for key in ['name', 'location','categories']:
            if key in metadata:
                value = metadata[key]
                if isinstance(value, list):
                    value = ", ".join(value)
                doc_formatted += f"- **{key.title()}**: {value}\n"
        if 'rating' in metadata:
            doc_formatted += f"- **Rating**: {metadata['rating']}/5\n"
        # Always include the source URL if available
        if 'url' in metadata:
            doc_formatted += f"- **Source URL**: {metadata['url']}\n"
        # Content section
        if content:
            doc_formatted += f"\n## Description\n{content}\n\n"
        # Reviews section if available
        if 'reviews' in metadata and metadata['reviews']:
            doc_formatted += "## Reviews\n"
            for j, review in enumerate(metadata['reviews']):
                # Try to extract a title if the review has a comma
                if ',' in review:
                    parts = review.split(",",1)
                    title = parts[0].strip()
                    review_text = parts[1].strip()
                    doc_formatted += f"### {title}\n{review_text}\n\n"
                else:
                    doc_formatted += f"### Review {j+1}\n{review}\n\n"
        formatted_results.append(doc_formatted)
    # Combine all formatted documents
    return "\n".join(formatted_results)

def augment_prompt(query:str):
    results = vectorstore.similarity_search(query,k=3)
    source_knowledge = "\n".join([x.page_content for x in results])
    augmented_prompt_text = f"""Using the contexts below, answer the query. If the context does not contain the answer, say that the provided context does not have the information.

    Contexts:
    {source_knowledge}

    Query: {query}
    """
    return augmented_prompt_text, results


# --- Main Interaction Function ---
messages_history = [
    SystemMessage(content="You are a helpful assistant that answers questions based on the provided context. Do not include any of your own thinking process or notes (e.g., <think>...</think> tags) in the final answer. If the context does not contain the answer, state that the provided context does not have the information.")
]

def remove_thinking_block(text:str) -> str:
    """Removes the <think>...</think> block from the text."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()

def get_groq_response_with_references(user_query:str):
    global messages_history
    augmented_prompt_text, retrieved_documents = augment_prompt(user_query)
    prompt = HumanMessage(content=augmented_prompt_text)
    # Use a copy of the history for the current call, and add the new user prompt
    current_conversation_messages = messages_history + [prompt]
    try:
        llm_response_obj = chat.invoke(current_conversation_messages)
        llm_content      = llm_response_obj.content
    except Exception as e:
        print(f"Error calling Groq API: {e}")
        return "Sorry, I encountered an error trying to get a response.", ""

    # Remove the <think> block from the LLM's content
    cleaned_llm_content = remove_thinking_block(llm_content)

    # Update history with the actual user query and the cleaned LLM response for context in future turns.
    # For simplicity, we're just adding the last user query and response.
    # More sophisticated history management might be needed for longer conversations.
    messages_history.append(HumanMessage(content=user_query))          # Add user's original query
    messages_history.append(AIMessage(content=cleaned_llm_content))    # Add cleaned AI response
    # Keep history to a manageable size (e.g., last N interactions)
    if len(messages_history) > 7: # System message + 3 pairs of Human/AI
        messages_history = [messages_history[0]] + messages_history[-6:]
    formatted_sources = format_rag_results(retrieved_documents)

    print("--- LLM RESPONSE ---")
    print(cleaned_llm_content)
    print("\n--- RETRIEVED SOURCES (REFERENCES) ---")
    print(formatted_sources)

    return cleaned_llm_content, formatted_sources

# --- Example Usage ---
if __name__ == "__main__":
    # query1 = "What is Vizcaya Museum and Gardens?"
    # print(f"Processing query: {query1}")
    # response1, sources1 = get_groq_response_with_references(query1)
    # print("\n" + "="*50 + "\n")

    query2 = "Tell me about the best vegetarian restaurants in Los Angeles with good reviews."
    print(f"Processing query: {query2}")
    response2, sources2 = get_groq_response_with_references(query2)
    print("\n" + "="*50 + "\n")

    # query3 = "Any recommendations for art museums in Boston?"
    # print(f"Processing query: {query3}")
    # response3, sources3 = get_groq_response_with_references(query3)

Processing query: Tell me about the best vegetarian restaurants in Los Angeles with good reviews.
--- LLM RESPONSE ---
The provided context does not have the information.
\n--- RETRIEVED SOURCES (REFERENCES) ---
# Document 1 : The Original Farmers Market\n\n## Basic Information
- **Name**: The Original Farmers Market
- **Location**: Los Angeles
- **Categories**: 
- **Rating**: 4.5/5
- **Source URL**: https://www.tripadvisor.com/Attraction_Review-g32655-d219483-Reviews-The_Original_Farmers_Market-Los_Angeles_California.html

## Description
The Original Farmers Market has been Los Angeles’ favorite destination since 1934. With more than 100 old-world grocers, an eclectic array of shops and dozens of restaurants serving cuisine from around the globe in an al fresco setting, the Market is a beloved town square for locals and one of the city’s top tourist attractions. Known for its variety of cuisine, the Market offers something for every taste - American, Brazilian, Cajun, Chinese, French,
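
The history handling in `get_groq_response_with_references` keeps the system message plus the last three Human/AI pairs. A minimal sketch of that trimming rule as a standalone function, using plain strings as stand-ins for message objects. `trim_history` and its `max_pairs` parameter are hypothetical names.

```python
def trim_history(history, max_pairs=3):
    """Keep the first (system) message plus the last `max_pairs` Human/AI pairs."""
    if not history:
        return []
    system, rest = history[0], history[1:]
    keep = max_pairs * 2
    if keep <= 0:
        # rest[-0:] would return everything, so handle this case explicitly.
        return [system]
    return [system] + rest[-keep:]

# Strings stand in for SystemMessage / HumanMessage / AIMessage objects.
history = ["system"] + [f"m{i}" for i in range(1, 11)]
print(trim_history(history))
# ['system', 'm5', 'm6', 'm7', 'm8', 'm9', 'm10']
```

Returning a new list instead of reassigning a global makes the rule easier to test in isolation.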

In [30]:

def remove_thinking_block(text: str) -> str:
    """Removes the <think>...</think> block from the text."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()

def format_rag_results(documents):
    """
    Format RAG results into a clean, structured representation with preserved source URLs.
    """
    formatted_results = []
    for i, doc in enumerate(documents):
        metadata = doc.metadata
        content = doc.page_content
        doc_formatted = f"# Document {i+1}: {metadata.get('name', 'Unnamed Document')}\n\n"
        doc_formatted += "## Basic Information\n"
        for key in ['name', 'location', 'categories']:
            if key in metadata:
                value = metadata[key]
                if isinstance(value, list):
                    value = ", ".join(value)
                doc_formatted += f"- **{key.title()}**: {value}\n"
        if 'rating' in metadata:
            doc_formatted += f"- **Rating**: {metadata['rating']}/5\n"
        if 'url' in metadata:
            doc_formatted += f"- **Source URL**: {metadata['url']}\n"
        if content:
            doc_formatted += f"\n## Description\n{content}\n\n"
        if 'reviews' in metadata and metadata['reviews']:
            doc_formatted += "## Reviews\n"
            for j, review in enumerate(metadata['reviews']):
                if ',' in review: # Simple check if review might have a title-like part
                    parts = review.split(',', 1)
                    title = parts[0].strip()
                    review_text = parts[1].strip()
                    doc_formatted += f"### {title}\n{review_text}\n\n"
                else:
                    doc_formatted += f"### Review {j+1}\n{review}\n\n"
        formatted_results.append(doc_formatted)
    return "\n".join(formatted_results)

# --- RAG Functions ---

def augment_prompt(query: str):
    results = vectorstore.similarity_search(query, k=3)
    source_knowledge = "\n".join([x.page_content for x in results])
    augmented_prompt_text = f"""Using the contexts below, answer the query. If the context does not contain the answer, say that the provided context does not have the information.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt_text, results

# Global message history for RAG (can be adjusted)
rag_messages_history = [
    SystemMessage(content="You are a helpful assistant that answers questions based on the provided context. Do not include any of your own thinking process or notes (e.g., <think>...</think> tags) in the final answer. If the context does not contain the answer, state that the provided context does not have the information.")
]

def get_llm_response_with_rag(user_query: str):
    global rag_messages_history
    augmented_prompt_text, retrieved_documents = augment_prompt(user_query)
    prompt = HumanMessage(content=augmented_prompt_text)
    current_conversation_messages = rag_messages_history + [prompt]

    try:
        llm_response_obj = chat.invoke(current_conversation_messages)
        llm_content = llm_response_obj.content
    except Exception as e:
        print(f"Error calling Groq API (with RAG): {e}")
        return "Sorry, I encountered an error trying to get a RAG response.", ""

    cleaned_llm_content = remove_thinking_block(llm_content)

    # Update RAG history (optional, for conversational context)
    # rag_messages_history.append(HumanMessage(content=user_query)) # Using augmented prompt might be too verbose for history
    # rag_messages_history.append(AIMessage(content=cleaned_llm_content))
    # if len(rag_messages_history) > 7:
    #     rag_messages_history = [rag_messages_history[0]] + rag_messages_history[-6:]

    formatted_sources = format_rag_results(retrieved_documents)
    return cleaned_llm_content, formatted_sources

# --- Non-RAG Function ---
non_rag_messages_history = [
    SystemMessage(content="You are a helpful assistant. Answer the user's query based on your general knowledge. Do not include any of your own thinking process or notes (e.g., <think>...</think> tags) in the final answer.")
]

def get_llm_response_without_rag(user_query: str):
    global non_rag_messages_history
    prompt = HumanMessage(content=user_query)
    current_conversation_messages = non_rag_messages_history + [prompt]

    try:
        llm_response_obj = chat.invoke(current_conversation_messages)
        llm_content = llm_response_obj.content
    except Exception as e:
        print(f"Error calling Groq API (without RAG): {e}")
        return "Sorry, I encountered an error trying to get a non-RAG response."

    cleaned_llm_content = remove_thinking_block(llm_content)

    # Update Non-RAG history (optional)
    # non_rag_messages_history.append(prompt)
    # non_rag_messages_history.append(AIMessage(content=cleaned_llm_content))
    # if len(non_rag_messages_history) > 7:
    #    non_rag_messages_history = [non_rag_messages_history[0]] + non_rag_messages_history[-6:]

    return cleaned_llm_content

# --- Main Execution for Comparison ---
if __name__ == "__main__":
    queries_to_test = [
        "What is Vizcaya Museum and Gardens?",
        "Tell me about the best vegetarian restaurants in Los Angeles with good reviews.",
        "Any recommendations for art museums in Boston?"
    ]

    for query in queries_to_test:
        print(f"Processing query: \"{query}\"")
        print("-" * 70)

        # 1. Get response WITHOUT RAG
        print("--- LLM RESPONSE (Without RAG) ---")
        response_no_rag = get_llm_response_without_rag(query)
        print(response_no_rag)
        print("\n" + "~"*70 + "\n")

        # 2. Get response WITH RAG
        print("--- LLM RESPONSE (With RAG) ---")
        response_with_rag, sources = get_llm_response_with_rag(query)
        print(response_with_rag)
        print("\n--- RETRIEVED SOURCES (REFERENCES FOR RAG) ---")
        print(sources)
        print("\n" + "="*70 + "\n")

Processing query: "What is Vizcaya Museum and Gardens?"
----------------------------------------------------------------------
--- LLM RESPONSE (Without RAG) ---
Vizcaya Museum and Gardens is a National Historic Landmark located in Coconut Grove, Miami, Florida. It was built in the early 1900s as a winter residence for James Deering, a wealthy industrialist. The estate features an impressive European-inspired mansion with 54 rooms, surrounded by 50 acres of beautifully landscaped gardens, including a variety of plants, trees, and sculptures.

The mansion was designed by architect F. Burrall Hoffman and decorator Paul Chalfin, and its construction took over 1,000 workers and 4 years to complete. The estate's design was influenced by Mediterranean Revival architecture, with a mix of French, Italian, and Spanish styles.

Vizcaya Museum and Gardens is now owned by Miami-Dade County and is open to the public for tours. The estate showcases an extensive collection of European art and furnish
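
`remove_thinking_block` relies on a non-greedy pattern with `re.DOTALL` so that a multi-line `<think>...</think>` block is removed as one unit. A short sketch showing its effect on sample strings; the sample text is made up for illustration.

```python
import re

# Non-greedy match; DOTALL lets "." span newlines inside the block.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def remove_thinking_block(text: str) -> str:
    """Remove every complete <think>...</think> block from the text."""
    return THINK_BLOCK.sub("", text).strip()

raw = (
    "<think>\nThe user asks about Miami.\nCheck the context.\n</think>\n\n"
    "Vizcaya is a villa in Miami."
)
print(remove_thinking_block(raw))
# Vizcaya is a villa in Miami.

# An unclosed tag does not match the pattern, so the text is left as-is.
print(remove_thinking_block("<think>unfinished thought"))
# <think>unfinished thought
```

If a model's output can be cut off mid-thought, the unclosed case may need separate handling.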

<hr>

### Colang

In [31]:
config_py_content = """
from langchain_groq import ChatGroq
from nemoguardrails.llm.providers import register_llm_provider

# Register Groq as a custom LLM provider
register_llm_provider("groq", ChatGroq)
"""

with open("config.py","w") as f:
    f.write(config_py_content)

print("config.py created successfully.")

config.py created successfully.


- We can also use Guardrails as the component that passes data on to another function.

In [32]:
from langchain.schema import HumanMessage, SystemMessage # Ensure SystemMessage is imported

# Make sure your other necessary functions and variables are defined and accessible:
# - chat: Your initialized ChatGroq model
# - augment_prompt(user_query: str) -> (str, list_of_documents):
#   This function should return the augmented prompt string for the LLM
#   and the list of raw retrieved document objects.
# - remove_thinking_block(text: str) -> str: Your function to clean LLM output.
# - format_rag_results(documents: list_of_documents) -> str:
#   Your existing function that creates a detailed, formatted string of the RAG sources.
#   We will also add a more concise formatting for direct inclusion in the bot response.

def rag_query_action(user_query: str):
    """
    This function is registered as an action with Nemo Guardrails.
    It performs the RAG process, aims for a more detailed answer, and includes sources.
    """
    print(f"\n[Nemo Action LOG] Received query for RAG: {user_query}")

    # 1. Augment the prompt and get retrieved documents
    # This function (augment_prompt) is from your notebook.
    # It constructs the prompt that includes the retrieved context and the original query.
    augmented_prompt_text, retrieved_documents = augment_prompt(user_query)

    # 2. Prepare messages for the LLM call
    # We'll add a SystemMessage to encourage more detailed and synthesized answers.
    # The augmented_prompt_text itself (from your augment_prompt function) already
    # contains the core instruction, context, and query.
    rag_call_messages = [
        SystemMessage(content=(
            "You are a helpful and informative assistant. Based on the provided 'Contexts', "
            "please provide a comprehensive and detailed answer to the 'Query'. "
            "Synthesize the information from all relevant parts of the contexts to create a cohesive response. "
            "If the contexts describe a place or thing that matches the query's subject, even if the exact name isn't repeated in the context, try to provide the relevant information. "
            "If the contexts truly do not contain relevant information to answer the query, clearly state that."
        )),
        HumanMessage(content=augmented_prompt_text) # This contains: "Contexts: ..." and "Query: ..."
    ]

    print(f"[Nemo Action LOG] Augmented prompt text being sent to LLM (first 200 chars): {augmented_prompt_text[:200]}...")

    # 3. Call the LLM
    try:
        llm_response_obj = chat.invoke(rag_call_messages)
        llm_content = llm_response_obj.content
    except Exception as e:
        print(f"[Nemo Action ERROR] Error calling LLM for RAG: {e}")
        # Fallback message if the LLM call fails
        return "My apologies, I encountered an issue while trying to retrieve detailed information. Please try again later."

    # 4. Clean the LLM's response
    cleaned_llm_content = remove_thinking_block(llm_content)

    # 5. Format and include references/sources
    formatted_sources_text = ""
    if retrieved_documents:
        # Option A: Using your existing detailed format_rag_results (if you want a very verbose source list)
        # This might be too long for a single bot utterance.
        # detailed_sources = format_rag_results(retrieved_documents)
        # formatted_sources_text = f"\n\n--- Details from Retrieved Context ---\n{detailed_sources}"

        # Option B: A more concise list of sources for the bot's response
        concise_sources_list = []
        for i, doc in enumerate(retrieved_documents):
            # Prefer 'name' metadata for the source title
            source_title = doc.metadata.get('name', f"Retrieved Context {i+1}")
            # Include URL if available
            source_url = doc.metadata.get('url', '')
            source_entry = f"- {source_title}"
            if source_url:
                source_entry += f" (Source: {source_url})"
            concise_sources_list.append(source_entry)

        if concise_sources_list:
            formatted_sources_text = "\n\n**Here are some sources I used:**\n" + "\n".join(concise_sources_list)
    else:
        formatted_sources_text = "\n\n(No specific documents were retrieved to answer this query.)"

    # 6. Combine the LLM's answer with the formatted sources
    final_response_with_sources = f"{cleaned_llm_content}{formatted_sources_text}"

    print(f"[Nemo Action LOG] Cleaned LLM Content: {cleaned_llm_content}")
    if retrieved_documents:
        print(f"[Nemo Action LOG] Concise Sources: {formatted_sources_text}")
    print(f"[Nemo Action LOG] Final response being returned to Guardrails (first 300 chars): {final_response_with_sources[:300]}...")

    return final_response_with_sources
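
Step 5 of `rag_query_action` builds a short bullet list of sources from document metadata. A minimal sketch of that step as a standalone function, with `SimpleNamespace` objects standing in for LangChain documents. The de-duplication of repeated sources is an addition not present in the cell above, and the `example.com` URL is a placeholder.

```python
from types import SimpleNamespace

def concise_sources(documents):
    """Return one '- title (Source: url)' line per distinct source."""
    lines = []
    seen = set()
    for i, doc in enumerate(documents, start=1):
        title = doc.metadata.get("name", f"Retrieved Context {i}")
        url = doc.metadata.get("url", "")
        key = (title, url)
        if key in seen:
            # Skip chunks that come from a source already listed (assumed behavior).
            continue
        seen.add(key)
        lines.append(f"- {title} (Source: {url})" if url else f"- {title}")
    return "\n".join(lines)

# Stand-in documents; only the .metadata attribute is needed here.
docs = [
    SimpleNamespace(metadata={"name": "Place A", "url": "https://example.com/a"}),
    SimpleNamespace(metadata={"name": "Place A", "url": "https://example.com/a"}),
    SimpleNamespace(metadata={}),
]
print(concise_sources(docs))
```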

In [33]:
yaml_content_for_rag = """
models:
- type: main
  engine: groq
  model: llama-3.3-70b-versatile # Same Groq model as the `chat` instance; or your preferred Groq model
"""

- This is the Guardrails definition; it works much like prompt engineering.

In [34]:
colang_content_for_rag = """
# Define a custom action for RAG queries
# The name "rag_query_action" must match the name used when registering the Python function.

# --- Greetings and Basic Interaction ---
define user express greeting
    "hello"
    "hi"
    "hey there"

define bot express greeting
    "Hello! I'm your US travel assistant. How can I help you today?"

define flow greeting
    user express greeting
    bot express greeting
    bot offer further assistance

# --- RAG-Specific Flows ---
define user ask about_us_place
    # Example intents that should trigger the RAG action
    "What is Vizcaya Museum and Gardens?"
    "Tell me about the Original Farmers Market in Los Angeles."
    "Any recommendations for art museums in Boston?"
    "What can you tell me about {entity} in {location}?"
    "Describe {place_name}."

define flow handle_us_place_query_with_rag
    user ask about_us_place
    bot inform "Let me check my information about that..."
    # Execute the custom Python action for RAG
    $rag_answer = execute rag_query_action(user_query=$last_user_message)
    bot $rag_answer
    bot offer further assistance

# --- Standard Guardrails (e.g., for out-of-scope topics) ---
define user ask politics
    "what are your political beliefs?"
    "thoughts on the president?"
    "left wing"
    "right wing"

define bot refuse politics
    "As a US travel assistant, I focus on providing information about places and travel. I don't discuss politics."

define flow politics
    user ask politics
    bot refuse politics
    bot offer further assistance

# --- Fallback for unhandled queries ---
# You might want this to also try the RAG, or give a "cannot help" message
define flow unhandled_query_fallback
    user ...
    # For this example, let's try the RAG as a general fallback if appropriate
    # Or, you could have a simpler LLM call here if RAG is too resource-intensive for all fallbacks
    bot inform "I'll try to find information on that using my knowledge base."
    $fallback_rag_answer = execute rag_query_action(user_query=$last_user_message)
    bot $fallback_rag_answer
    bot offer further assistance

# --- Reusable bot utterances ---
define bot offer further assistance
    "Is there anything else I can help you with regarding US travel or specific places?"
"""

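Because the action name in Colang must match the name used at registration, a quick text check can catch a mismatch before the rails are built. The following is a rough sketch using only a regular expression; it is not a Colang parser, and `find_executed_actions` and `registered_names` are hypothetical names introduced here.

```python
import re


def find_executed_actions(colang_text):
    """Return the set of action names that follow 'execute' in Colang 1.0 text."""
    # Matches lines such as:  $x = execute some_action(...)  or  execute some_action(...)
    pattern = re.compile(r"^\s*(?:\$\w+\s*=\s*)?execute\s+(\w+)", re.MULTILINE)
    return set(pattern.findall(colang_text))


sample_colang = """
define flow handle_us_place_query_with_rag
    user ask about_us_place
    $rag_answer = execute rag_query_action(user_query=$last_user_message)
    bot $rag_answer
"""

registered_names = {"rag_query_action"}  # names you plan to pass to register_action
missing = find_executed_actions(sample_colang) - registered_names
print("Actions used but not registered:", missing or "none")
```

In the notebook, the same check could be run on `colang_content_for_rag` instead of the short sample string.
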
In [37]:
# from langchain_groq import ChatGroq
# from nemoguardrails.llm.providers import register_llm_provider

# register_llm_provider("groq", ChatGroq)
!pip install -qU nemoguardrails langchain-groq

from langchain_groq import ChatGroq
from nemoguardrails.llm.helpers import get_llm_instance_wrapper
from nemoguardrails.llm.providers import register_llm_provider

# 1) Make a LangChain instance
chat = ChatGroq(model_name="llama-3.3-70b-versatile", temperature=0)

# 2) Wrap it into a provider class Guardrails understands
GroqLCWrapped = get_llm_instance_wrapper(llm_instance=chat, llm_type="groq")

# 3) Register the wrapped class
register_llm_provider("groq", GroqLCWrapped)

# 4) Use engine: groq in YAML
yaml_content_for_rag = """
models:
- type: main
  engine: groq
  model: llama-3.3-70b-versatile
"""


In [38]:
from nemoguardrails import LLMRails, RailsConfig
# Ensure 'chat' (Groq LLM) and RAG helper functions are defined and available in this scope.
# Also, 'config_py_content' should have been written to config.py

# 1. Initialize RailsConfig
# from_content() builds the config from the strings passed in below; the custom
# action is registered separately in step 3.
config = RailsConfig.from_content(
    colang_content=colang_content_for_rag,  # Use the new Colang content
    yaml_content=yaml_content_for_rag      # Use the YAML for Groq
)

# 2. Create LLMRails instance
rails = LLMRails(config)

# 3. Register your custom RAG action
# The name "rag_query_action" here MUST match the name used in your Colang file.
rails.register_action(action=rag_query_action, name="rag_query_action")

print("Nemo LLMRails initialized with Groq and custom RAG action ('rag_query_action') registered.")

Nemo LLMRails initialized with Groq and custom RAG action ('rag_query_action') registered.
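
The Colang flows call the action as `execute rag_query_action(user_query=$last_user_message)`, so whatever function is registered under that name needs to accept a `user_query` keyword argument and return the text the bot should say. To exercise the flows without the vector store or the LLM, a stand-in like the one below might be enough. `stub_rag_query_action` is a hypothetical name, and the commented registration line assumes a plain synchronous function is accepted as an action.

```python
def stub_rag_query_action(user_query: str) -> str:
    """Hypothetical stand-in for rag_query_action: no retrieval, no LLM call."""
    return f"(stub) You asked: {user_query}"


# Same keyword argument that the Colang 'execute' line passes in.
print(stub_rag_query_action(user_query="Describe Vizcaya Museum and Gardens."))

# To try it with the rails object from the cell above (assumption, see lead-in):
# rails.register_action(action=stub_rag_query_action, name="rag_query_action")
```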


In [39]:
import asyncio

async def get_response(query):
    response = await rails.generate_async(messages=[{"role": "user", "content": query}])
    return response.get("content")  # with messages=..., the reply is a dict such as {"role": "assistant", "content": "..."}

async def main_test():
    # Test a query that should trigger the RAG action
    query1 = "What is Vizcaya Museum and Gardens?"
    print(f"\nUser Query: {query1}")
    bot_response1 = await get_response(query1)
    print(f"Bot Response: {bot_response1}")

    # Test a query that should trigger the politics guardrail
    query2 = "Is the President of Thailand a good person?"
    print(f"\nUser Query: {query2}")
    bot_response2 = await get_response(query2)
    print(f"Bot Response: {bot_response2}")

    # Test a general greeting
    query3 = "Hello"
    print(f"\nUser Query: {query3}")
    bot_response3 = await get_response(query3)
    print(f"Bot Response: {bot_response3}")

# Run the async test.
# asyncio.run() works as-is in a plain Python script. In Jupyter/Colab an event loop
# is already running, so asyncio.run() raises RuntimeError; the except branch below
# then applies nest_asyncio and retries.
if __name__ == "__main__":
    try:
        asyncio.run(main_test())
    except RuntimeError as e:
        if "asyncio.run() cannot be called from a running event loop" in str(e) and 'google.colab' in str(type(get_ipython())):
            print("Detected running in Colab/Jupyter. Applying nest_asyncio.")
            import nest_asyncio
            nest_asyncio.apply()
            asyncio.run(main_test())
        else:
            raise

Detected running in Colab/Jupyter. Applying nest_asyncio.
\nUser Query: What is Vizcaya Museum and Gardens?

[Nemo Action LOG] Received query for RAG: What is Vizcaya Museum and Gardens?
[Nemo Action LOG] Augmented prompt text being sent to LLM (first 200 chars): Using the contexts below, answer the query. If the context does not contain the answer, say that the provided context does not have the information.

    Contexts:
    Built in 1916 as a winter retrea...




[Nemo Action LOG] Cleaned LLM Content: The provided context does not have the information to answer the query about Vizcaya Museum and Gardens. Although the contexts describe various establishments with gardens, such as the Conservatory & Botanical Gardens and a Victorian-era greenhouse, none of them mention Vizcaya Museum and Gardens specifically. The first context describes a lavish villa built in 1916 as a winter retreat, which might be related to Vizcaya Museum and Gardens, but without a direct mention, it's impossible to confirm.
[Nemo Action LOG] Concise Sources: 

**Here are some sources I used:**
- Vizcaya Museum and Gardens (Source: https://www.tripadvisor.com/Attraction_Review-g34438-d130345-Reviews-Vizcaya_Museum_and_Gardens-Miami_Florida.html)
- Bellagio Conservatory & Botanical Garden (Source: https://www.tripadvisor.com/Attraction_Review-g45963-d625114-Reviews-Bellagio_Conservatory_Botanical_Garden-Las_Vegas_Nevada.html)
- Conservatory of Flowers (Source: https://www.trip

  asyncio.run(main_test())
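
The cell above falls back to `nest_asyncio` when an event loop is already running. As an alternative that needs only the standard library, the sketch below runs the coroutine on a fresh event loop in a worker thread whenever a loop is detected. This is not what the notebook does; `run_blocking` and `coro_factory` are hypothetical names. Passing a function that creates the coroutine, rather than the coroutine itself, avoids leaving an un-awaited coroutine behind.

```python
import asyncio
import concurrent.futures


def run_blocking(coro_factory):
    """Run coro_factory() to completion, whether or not a loop is already running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread (plain script): asyncio.run() is safe.
        return asyncio.run(coro_factory())
    # A loop is already running (e.g. Jupyter/Colab): use a fresh loop in a worker thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(lambda: asyncio.run(coro_factory())).result()


async def demo():
    await asyncio.sleep(0)
    return "done"


print(run_blocking(demo))
```

One caveat: objects that are tied to the notebook's own event loop may not behave well when driven from another thread, so this is only a starting point. In a notebook, the simplest option is often to write `await main_test()` directly in a cell, since IPython supports top-level `await`.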
