# Enterprise Chat Example

This notebook builds an example LLM agent that answers user questions about an internal company knowledge base, such as:

- "Do I need to use PTO if I am sick?"
- "What if I am sick for a long time?"

The notebook uses: 
- NVIDIA NIMs for foundational LLMs and embedding models
- Glean for storing the corporate knowledge base
- Glean Search API for accessing the Glean knowledge base
- Chroma DB for storing cached query results and performing RAG
- LangGraph for creating an agent

Best of all, because both Glean and NVIDIA NIMs can be deployed in your private environment, it is possible to create this type of enterprise chatbot without any data leaving your control.

To get started with this notebook, set the following environment variables. You will need a Glean deployment, a Glean API key, and an [NVIDIA API key](https://build.nvidia.com).

In [None]:
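import os
# `!export` runs in a subshell and would not persist, so set the variables in-process.
os.environ["GLEAN_API_KEY"] = "YOUR GLEAN API KEY"
os.environ["GLEAN_API_BASE_URL"] = "https://your-org.glean.com/rest/api/v1"
os.environ["NVIDIA_API_KEY"] = "nvapi-YOUR NVIDIA API KEY"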
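Before building the agent, it can help to confirm that the three variables are actually visible to the Python kernel. The helper below is a small, hypothetical sanity check (it is not part of `glean_example`). It reports any variable that is unset or empty instead of failing later with an authentication error:

```python
import os
from typing import Iterable, List, Mapping, Optional

# The variable names used by this notebook.
REQUIRED_VARS = ("GLEAN_API_KEY", "GLEAN_API_BASE_URL", "NVIDIA_API_KEY")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```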

We start by instantiating the LLM and embedding model. You can update this code to use different foundational LLMs, or add the `base_url` parameter if you are using private NVIDIA NIM microservices.

In [None]:
import os
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

model = ChatNVIDIA(
    model="meta/llama-3.3-70b-instruct", api_key=os.getenv("NVIDIA_API_KEY")
)
embeddings = NVIDIAEmbeddings(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    api_key=os.getenv("NVIDIA_API_KEY"),
    truncate="NONE",
)

We test the foundational LLM by asking it a simple question:

In [None]:
response = model.invoke("do I need to use PTO if I am sick?")
print(response.content)

While the model is able to interpret our question and formulate a response, it does not have access to any information about company-specific policies. To add this type of information, we follow a multi-step process:

1. Have the LLM interpret the user's question and add any relevant context. Most free-form questions can be passed directly to the Glean Search API.
2. Add relevant context about the user, then query the Glean knowledge base through the Glean Search API to retrieve the most relevant supporting documents.
3. Embed those supporting documents into a local vector DB.
4. Use a retriever to fetch the supporting document most relevant to the user's original question.
5. Add that document to the LLM's prompt (RAG).
6. Ask the LLM to summarize the results and answer the user's question using this new context.
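In the actual agent, steps 3 to 5 are handled by Chroma DB and the NVIDIA embedding model. To show the idea without those dependencies, the sketch below swaps in a brute-force cosine-similarity search over hand-written toy vectors: score each document embedding against the question embedding, keep the best match, and place it in the prompt. The vectors, document labels, and helper names are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_relevant(query_vec: Sequence[float], docs: List[Tuple[str, Sequence[float]]]) -> str:
    """Return the text of the document whose embedding is closest to the query (step 4)."""
    best_text, _ = max(docs, key=lambda doc: cosine_similarity(query_vec, doc[1]))
    return best_text


# Hypothetical toy embeddings (step 3); a real pipeline would get these from NVIDIAEmbeddings.
docs = [
    ("<placeholder: sick leave policy document>", [0.9, 0.1, 0.0]),
    ("<placeholder: expense policy document>", [0.0, 0.2, 0.9]),
]
question_vec = [0.8, 0.2, 0.1]

# Step 5: inject the best match into the prompt.
answer_candidate = most_relevant(question_vec, docs)
prompt = (
    "Question: do I need to use PTO if I am sick?\n"
    f"Most relevant document: {answer_candidate}"
)
print(prompt)
```

A vector DB such as Chroma performs the same nearest-neighbor lookup, but it uses an index so it does not need to compare the query against every stored document.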

To help organize these steps, we use a LangGraph agent. The full implementation of the agent is available in the file `glean_example/src/agent.py`. The following code samples explain some core concepts of that code.

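```python
from typing import Any, List, Optional, Tuple

from langgraph.graph import END, START, StateGraph
from pydantic import BaseModel, Field

# The node functions (determine_user_intent, call_glean, add_embeddings,
# answer_candidates, summarize_answer) and the router route_glean are
# defined in glean_example/src/agent.py.


class InfoBotState(BaseModel):
    # Default to an empty list (not None) so nodes can append to it safely.
    messages: List[Tuple[str, str]] = Field(default_factory=list)
    glean_query_required: Optional[bool] = None
    glean_results: Optional[List[str]] = None
    db: Optional[Any] = None
    answer_candidate: Optional[str] = None


graph = StateGraph(InfoBotState)
graph.add_node("determine_user_intent", determine_user_intent)
graph.add_node("call_glean", call_glean)
graph.add_node("add_embeddings", add_embeddings)
graph.add_node("answer_candidates", answer_candidates)
graph.add_node("summarize_answer", summarize_answer)
graph.add_edge(START, "determine_user_intent")
# route_glean returns "call_glean" or "summarize_answer", so the Glean search can be skipped.
graph.add_conditional_edges(
    "determine_user_intent",
    route_glean,
    {"call_glean": "call_glean", "summarize_answer": "summarize_answer"},
)
graph.add_edge("call_glean", "add_embeddings")
graph.add_edge("add_embeddings", "answer_candidates")
graph.add_edge("answer_candidates", "summarize_answer")
graph.add_edge("summarize_answer", END)
agent = graph.compile()
```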

This code creates the agent. Each node is a function that implements part of the six-step process above, and the conditional edge lets `route_glean` send the request either to `call_glean` or straight to `summarize_answer`. `InfoBotState` is a Pydantic model that holds all of the information the agent needs as it moves through each step of the process.
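The conditional edge works because the router returns one of the keys in the mapping passed to `add_conditional_edges`. The actual `route_glean` lives in `agent.py`. The following is a hypothetical sketch that assumes `determine_user_intent` stores its decision in `glean_query_required`. It uses a plain dataclass (`ToyState`) in place of the Pydantic model so it runs without extra dependencies:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyState:
    """Hypothetical stand-in for InfoBotState with only the fields routing needs."""
    messages: List[Tuple[str, str]] = field(default_factory=list)
    glean_query_required: Optional[bool] = None


def route_glean_sketch(state: ToyState) -> str:
    """Return the name of the next node; must match a key in the conditional-edge mapping."""
    if state.glean_query_required:
        return "call_glean"
    # Unset (None) or False both fall through to answering directly.
    return "summarize_answer"


print(route_glean_sketch(ToyState(glean_query_required=True)))
```

Treating `None` the same as `False` is a design choice in this sketch: if the intent step fails to set the flag, the agent answers from the conversation alone rather than calling Glean.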

The source of each function is also available in `glean_example/src/agent.py`. For example, the implementation of `summarize_answer` is:

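```python
def summarize_answer(state: InfoBotState):
    """the main agent responsible for taking all the context and answering the question"""
    # PROMPT_ANSWER, model, and logger are module-level objects defined in the agent source.
    logger.info("Generate final answer")

    # Pipe the prompt template into the LLM to form a runnable chain.
    llm = PROMPT_ANSWER | model

    # Fill the prompt's placeholders from the agent state and call the LLM.
    response = llm.invoke(
        {
            "messages": state.messages,
            "glean_search_result_documents": state.glean_results,
            "answer_candidate": state.answer_candidate,
        }
    )
    # Record the answer in the conversation history.
    state.messages.append(("agent", response.content))
    return state
```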

This function pipes a prompt template into the NVIDIA NIM LLM and invokes the resulting chain with information from the agent state. The prompt tells the LLM what to do and injects the relevant fields from the state. You can see the prompts in the file `glean_example/src/prompts.py`. For example, `PROMPT_ANSWER` is:

```raw
You are the final part of an agent graph. Your job is to answer the user's question based on the information below. Include a url citation in your answer.

Message History: {messages}

All Supporting Documents from Glean: 

{glean_search_result_documents}

Content from the most relevant document that you should prioritize: 

{answer_candidate}

Answer: 

Citation Url: 
```
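LangChain prompt templates use `{name}` placeholders by default, filled much like Python's `str.format`. As a rough, dependency-free illustration (not the actual `PROMPT_ANSWER` object), the following fills an abridged copy of the template with hypothetical state values, using the same keys that `summarize_answer` passes to `invoke`:

```python
# Abridged copy of the PROMPT_ANSWER text; the real template lives in prompts.py.
ANSWER_TEMPLATE = (
    "Message History: {messages}\n\n"
    "All Supporting Documents from Glean:\n\n{glean_search_result_documents}\n\n"
    "Content from the most relevant document that you should prioritize:\n\n"
    "{answer_candidate}\n"
)

# Hypothetical values standing in for the InfoBotState fields.
filled = ANSWER_TEMPLATE.format(
    messages=[("user", "do I need to use PTO if I am sick?")],
    glean_search_result_documents=["doc 1 text", "doc 2 text"],
    answer_candidate="doc 1 text",
)
print(filled)
```

Note that Python lists and tuples are rendered with their `str()` representation, so the LLM sees the message history as a literal list of `(role, text)` pairs.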

A central step of this agent is the call to the Glean Search API. This RESTful request is implemented in the file `glean_example/src/glean_utils/utils.py`:

```python
import requests


def glean_search(query, api_key, base_url, **kwargs):
    """POST a query to the Glean Search API and return the key fields of the response.

    Optional kwargs: page_size, cursor, facet_filters, timeout_millis.
    Any requests.exceptions.RequestException (including HTTP errors) propagates to the caller.
    """
    endpoint = f"{base_url}/search"

    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

    payload = {
        "query": query,
        "pageSize": kwargs.get("page_size", 10),
        "requestOptions": {},
    }

    # Add optional parameters
    if "cursor" in kwargs:
        payload["cursor"] = kwargs["cursor"]  # continue from a previous page

    if "facet_filters" in kwargs:
        payload["requestOptions"]["facetFilters"] = kwargs["facet_filters"]

    if "timeout_millis" in kwargs:
        payload["timeoutMillis"] = kwargs["timeout_millis"]

    response = requests.post(endpoint, json=payload, headers=headers)
    # Raise on 4xx/5xx responses instead of parsing an error body.
    response.raise_for_status()

    data = response.json()

    return {
        "status_code": response.status_code,
        "request_id": data.get("requestID"),
        "results": data.get("results", []),
        "facet_results": data.get("facetResults", []),
        "cursor": data.get("cursor"),
        "has_more_results": data.get("hasMoreResults", False),
        "tracking_token": data.get("trackingToken"),
        "backend_time_millis": data.get("backendTimeMillis"),
    }
```
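`glean_search` returns a single page of results together with a `cursor` and a `has_more_results` flag. If more than `page_size` documents are needed, a caller could loop over pages as in the hypothetical helper below (it is not part of `utils.py`). It takes the search function as a parameter so it can be tried with a stub, and the `max_pages` cap is an added assumption to avoid an unbounded loop:

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_glean_results(
    search_fn: Callable[..., Dict[str, Any]],
    query: str,
    max_pages: int = 5,
    **kwargs: Any,
) -> Iterator[Any]:
    """Yield results page by page, following `cursor` while `has_more_results` is True."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        if cursor is not None:
            kwargs["cursor"] = cursor  # ask for the page after the previous one
        page = search_fn(query, **kwargs)
        yield from page["results"]
        cursor = page.get("cursor")
        if not page.get("has_more_results") or cursor is None:
            break
```

With the real function, a call might look like `iter_glean_results(glean_search, "sick leave", api_key=os.getenv("GLEAN_API_KEY"), base_url=os.getenv("GLEAN_API_BASE_URL"), page_size=10)`, since `glean_search` accepts `api_key` and `base_url` as keyword arguments too.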

To try out the agent, we can load the `glean_example` source code and invoke the full agent:

In [None]:
from glean_example.src.agent import agent

msg = "What's the latest on the new API project?"
history = []
history.append(("user", msg))
response = agent.invoke({"messages": history})
print(response["messages"][-1][1])