<img src="./images/logo.png" alt="Drawing" style="width: 500px;"/>

<div class="alert alert-block alert-danger">
<b>Important:</b> This exercise requires the completion of <a href="./03.explore_data_with_spark.ipynb"><b>Exercise 3:</b> Explore Retail Data with Apache Spark</a>.</div>


# **Exercise 4:** Creating an AI Agent to analyze the data.

In this exercise, you'll explore how to use HPE Private Cloud AI’s **NVIDIA Inference Microservices (NIM)**, featuring Meta's **Llama 3.1 8B Instruct**, to create your very own **AI-powered Data Analyst Agent**. This agent will interact with your prepared data and help you analyze, summarize, and derive insights, all in natural language.

HPE PCAI provides scalable, containerized access to state-of-the-art models like Llama 3.1, enabling low-latency, high-throughput inferencing. This makes it well suited to building intelligent agents that can reason over structured and unstructured data.

Your journey in this exercise will include:
- Integrating your previously prepared datasets with the inference workflow.
- Configuring your AI agent so that it uses Llama 3.1 8B Instruct via NVIDIA Inference Microservices.
- Crafting prompts and building logic for your AI agent to act like a data analyst.
- Interacting with your AI agent using natural language within a Jupyter notebook.

By the end of this exercise, you’ll be able to prototype a lightweight, intelligent AI assistant that can query, explain, and generate insights—turning raw data into valuable knowledge with just a few prompts.

Let’s get started and build your first Data Analyst AI Agent!

## **1. Agent Configuration**

This section covers the configuration of the agent, including:  
* Defining the data context that the agent will interact with  
* Setting up the routine the agent will follow as a system prompt (embedding the data context)  
* Establishing the list of tools available for the agent to complete its tasks  

<div class="alert alert-block alert-danger">
    <b>Important:</b> Set your <b>Username</b>, your <b>Domain</b>, and the name of your <b>Presto connection</b> (catalog) here!
</div>

In [None]:
USERNAME=""
DOMAIN=""
CATALOG=f"delta{USERNAME}"
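
Because `CATALOG` is derived from `USERNAME`, leaving either setting empty only shows up later as a confusing connection error. The sketch below illustrates an early check; `check_config` is a hypothetical helper that is not part of the workshop code, and it assumes the `delta<username>` naming convention used in the cell above.

```python
def check_config(username: str, domain: str) -> str:
    """Return the derived catalog name, or raise if a setting is missing.

    Hypothetical helper: mirrors the CATALOG=f"delta{USERNAME}" convention.
    """
    missing = [
        name
        for name, value in (("USERNAME", username), ("DOMAIN", domain))
        if not value.strip()
    ]
    if missing:
        raise ValueError(f"Please set: {', '.join(missing)}")
    return f"delta{username}"
```

Calling `check_config(USERNAME, DOMAIN)` right after setting the values would fail fast with a readable message instead of a later connection error.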

In [None]:
# 0. Import Libraries
import os
from pathlib import Path
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA
from llama_index.embeddings.nvidia import NVIDIAEmbedding
import json
import inspect
from pandas import DataFrame
SCHEMA="default"

This function will retrieve and refresh the NVIDIA JWT authentication token from a secure file path, as the token expires every 30 minutes and must be updated regularly to maintain API access.

In [None]:
# 1. Read JWT Token
def get_nvidia_auth_token():
    %update_token
    token_path = Path("/etc/secrets/ezua/.auth_token")
    if token_path.exists():
        with open(token_path, "r") as f:
            return f.read().strip()
    raise ValueError("NVIDIA auth token not found at /etc/secrets/ezua/.auth_token")

nvidia_api_key = get_nvidia_auth_token()
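
Since the token is short-lived, it can help to know how much time is left before refreshing it. The following is a minimal sketch, assuming the token is a standard three-part JWT that carries an `exp` claim (this is not confirmed by the notebook). `seconds_until_expiry` is a hypothetical helper; it only decodes the payload and does not verify the signature.

```python
import base64
import json
import time

def seconds_until_expiry(jwt_token: str, now=None):
    """Return the seconds left before the token's `exp` claim, or None if absent."""
    payload_b64 = jwt_token.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = payload.get("exp")
    if exp is None:
        return None
    return exp - (time.time() if now is None else now)
```

A small or negative value from `seconds_until_expiry(nvidia_api_key)` would suggest calling `get_nvidia_auth_token()` again before the next request.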

Initialize an NVIDIA LLM client with specific parameters (like model, URL, and API key) and assign it to `Settings.llm` for use throughout the application.

<div class="alert alert-block alert-danger">
    <b>Important:</b> Set your <b>base_url</b> here!
</div>

In [None]:
# 3. NVIDIA NIM Setup
llm = NVIDIA(
    base_url="https://llama-3-1-8b-6efc4543-predictor-ezai-services.hpepcai-ingress.pcai.hpecic.net/v1",
    model="meta/llama-3.1-8b-instruct",
    api_key=nvidia_api_key,
    temperature=0.1,
    max_tokens=1024
)
Settings.llm = llm

In [None]:
from pyhive import presto
from pandas import DataFrame
import json

This function establishes and returns a Presto connection using the configured host, port, catalog, and schema. The connection uses HTTPS with SSL certificate verification disabled (`'verify': False`), so use this setting only in a trusted lab environment.

In [None]:
def get_presto_connection():
    return presto.connect(
        host=f"ezpresto.{DOMAIN}",
        port=443,
        catalog=CATALOG,
        schema=SCHEMA,
        protocol='https',
        requests_kwargs={
            'verify': False
        }
    )


This function queries Presto's `information_schema` for the column metadata of the configured schema, loads the results into a DataFrame, and returns them as a JSON string. The connection is always closed, and any failure is returned as a JSON error message instead of raising.

In [None]:
# 5. Delta Table Schema Query
def query_delta_dictionary():
    query = f'''
    SELECT 
        table_schema as "DatabaseName",
        table_name as "TableName", 
        column_name as "ColumnName",
        data_type as "ColumnType"
    FROM {CATALOG}.information_schema.columns
    WHERE table_schema NOT IN ('information_schema', 'sys')
      AND table_schema = '{SCHEMA}'
    '''
    
    conn = None
    try:
        conn = get_presto_connection()
        cursor = conn.cursor()
        cursor.execute(query)
        results = cursor.fetchall()
        table_dictionary = DataFrame(
            results, 
            columns=["DatabaseName", "TableName", "ColumnName", "ColumnType"]
        )
        # to_json() already returns a JSON string, so no extra json.dumps() is needed
        return table_dictionary.to_json(orient="records")
    except Exception as e:
        return json.dumps({"error": f"Connection failed: {str(e)}"})
    finally:
        if conn:
            conn.close()
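
The metadata is later embedded in the system prompt, so a compact layout saves tokens. As a sketch of one possible alternative shape, the rows can be grouped per table. `group_columns_by_table` and the sample table names are hypothetical and serve only as illustration.

```python
import json
from collections import defaultdict

def group_columns_by_table(rows):
    """Group (schema, table, column, type) rows into {table: {column: type}}."""
    tables = defaultdict(dict)
    for _schema, table, column, col_type in rows:
        tables[table][column] = col_type
    return dict(tables)

# Illustrative rows in the same column order as the metadata query returns.
sample_rows = [
    ("default", "orders", "order_id", "bigint"),
    ("default", "orders", "order_date", "date"),
    ("default", "products", "product_id", "bigint"),
]
print(json.dumps(group_columns_by_table(sample_rows)))
```

Each table name appears once in this layout, rather than once per column.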

We define a detailed system prompt that guides the LLM to act as a data analyst, using Delta table metadata and Presto SQL to answer business questions with optimized read-only queries and clear, user-friendly responses.

In [None]:
# 6. System Prompt Setup
db_dictionary = query_delta_dictionary()

system_prompt = f"""
You are an advanced data analyst for a retailer company, specializing in analyzing data from our Delta Lake tables accessed via Presto. Your primary responsibility is to assist users by answering business-related questions using SQL queries. Follow these steps:

1. Understanding User Requests
   - Users provide business questions in plain English.
   - Extract relevant data points needed to construct a meaningful response.

2. Generating SQL Queries
   - Construct an optimized Presto SQL query to retrieve the necessary data from Delta tables.
   - The query must be a **single-line string** without carriage returns or line breaks.
   - Ensure the query uses proper catalog.schema.table references (format: {CATALOG}.{SCHEMA}.table_name)
   - The metadata of available tables and columns is in this json structure: 
     {db_dictionary}
   - Apply appropriate filtering, grouping, and ordering to enhance performance.
   - Presto-specific considerations:
     * Use `DATE()` for date casting instead of `::date`
     * String concatenation uses `||` not `+`
     * For approximate counts, consider `approx_distinct()` 
   - Don't display the SQL queries unless specifically asked

3. Executing the Query
   - Run the SQL query on our Presto system and retrieve the results efficiently.

4. Responding to the User
   - Convert the query results into a **concise, insightful, and plain-English response**.
   - Present the information in a clear, structured, and user-friendly manner.
   - For large results, consider summarizing trends instead of listing all data points.

You have access to these tools:
- `query_delta_database`: For executing Presto SQL queries on Delta tables
- `query_delta_dictionary`: For fetching metadata about tables and columns

Always use `query_delta_database` when the user asks for data stored in our Delta tables.
Important: Never suggest queries that would modify data - we only allow read operations.
"""

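This function normalizes a Presto SQL query (trimming whitespace and flattening it to a single line), adds the catalog and schema prefix when the table after the first `FROM` is unqualified, executes the query, and returns the results as JSON. The connection is always closed, and errors are returned as a JSON error message.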

In [None]:
# 7. Query Delta Tables
def query_delta_database(sql_statement):
    try:
        query_statement = sql_statement.strip().replace('\n', ' ')
        
        # Auto-add catalog.schema prefix if missing
        if 'FROM ' in query_statement and '.' not in query_statement.split('FROM ')[1].split()[0]:
            table_ref = query_statement.split('FROM ')[1].split()[0]
            query_statement = query_statement.replace(
                f'FROM {table_ref}', 
                f'FROM {CATALOG}.{SCHEMA}.{table_ref}'
            )
        
        conn = None
        try:
            conn = get_presto_connection()
            cursor = conn.cursor()
            cursor.execute(query_statement)
            
            if cursor.description:
                columns = [desc[0] for desc in cursor.description]
                data = cursor.fetchall()
                df = DataFrame(data, columns=columns)
                return json.dumps(df.to_dict(orient='records'))
            else:
                return json.dumps({"message": "Query executed successfully"})
        finally:
            if conn:
                conn.close()
    except Exception as e:
        return json.dumps({"error": str(e)})
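
The system prompt asks the model to stay read-only, but nothing in `query_delta_database` enforces it. The sketch below shows a simple guard that could run before execution. `is_read_only` and its keyword list are assumptions made for illustration; the check is a heuristic and no substitute for read-only database permissions.

```python
import re

# Statement types treated as read-only in this sketch (an assumption; adjust as needed).
READ_ONLY_PREFIXES = ("select", "with", "show", "describe")

def is_read_only(sql: str) -> bool:
    """Return True if `sql` is a single statement starting with a read-only keyword."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    first_word = re.match(r"[A-Za-z]+", statement)
    return bool(first_word) and first_word.group(0).lower() in READ_ONLY_PREFIXES
```

The check is deliberately conservative: a query that contains a semicolon inside a string literal is rejected too.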

This function will send the system prompt you just created and user query as messages to the LLM, then return its chat-based response as a string.

In [None]:
# 8. Agent Conversation Function
def run_agent_conversation(user_query):
    from llama_index.core.llms import ChatMessage
    
    messages = [
        ChatMessage(role="system", content=system_prompt),
        ChatMessage(role="user", content=user_query)
    ]
    
    response = llm.chat(messages)
    return str(response)  # String form of the chat response

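Now we send a natural language question to the model together with the system prompt and print the response. No tools are wired in at this stage, so the model cannot execute SQL yet; expect it to describe its approach or draft a query. Tool execution is added in the Agent Runtime section.
Let's try it out!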

In [None]:
# 9. Example Usage
response = run_agent_conversation("What are the top 5 selling products by revenue?")
print(response)

## **2. Agent Runtime**

This section covers the code executed while the agent is in action, including:
* Preparing the tools for use by the agent
* The agent's runtime function

This function will convert a Python function's signature and docstring into an NVIDIA-compatible tool schema by extracting parameter types, requirements, and descriptions.

In [None]:
from typing import Dict, Any, Callable
import inspect
import json

def function_to_schema(func: Callable) -> Dict[str, Any]:
    """Convert a Python function to a tool schema compatible with NVIDIA LLM
    
    Args:
        func: The Python function to convert
        
    Returns:
        Dictionary containing the function schema in NVIDIA-compatible format
    """
    sig = inspect.signature(func)
    docstring = inspect.getdoc(func) or ""
    
    # Extract parameter information
    parameters = {
        "type": "object",
        "properties": {},
        "required": []
    }
    
    for name, param in sig.parameters.items():
        if name == "self":
            continue
            
        param_type = "string"  # default type
        if param.annotation != inspect.Parameter.empty:
            if param.annotation == str:
                param_type = "string"
            elif param.annotation == int:
                param_type = "integer"
            elif param.annotation == float:
                param_type = "number"
            elif param.annotation == bool:
                param_type = "boolean"
        
        parameters["properties"][name] = {
            "type": param_type,
            "description": ""  # Can be enhanced with parameter-specific docs
        }
        
        if param.default == inspect.Parameter.empty:
            parameters["required"].append(name)
    
    return {
        "name": func.__name__,
        "description": docstring,
        "parameters": parameters
    }
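
OpenAI-compatible chat endpoints usually expect each function schema wrapped in a `{"type": "function", ...}` envelope. The sketch below shows that wrapping on a hand-written schema with the same shape that `function_to_schema` returns. `to_openai_tools` is a hypothetical helper, and whether this deployment accepts a `tools` argument in this form is an assumption to verify.

```python
from typing import Any, Dict, List

def to_openai_tools(schemas: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap bare function schemas in the OpenAI-style `tools` envelope."""
    return [{"type": "function", "function": schema} for schema in schemas]

# Hand-written example with the same shape that function_to_schema produces.
example_schema = {
    "name": "query_delta_database",
    "description": "Run a read-only Presto SQL query and return rows as JSON.",
    "parameters": {
        "type": "object",
        "properties": {"sql_statement": {"type": "string", "description": ""}},
        "required": ["sql_statement"],
    },
}
openai_tools = to_openai_tools([example_schema])
```

The description is taken from the function's docstring, so a tool without one reaches the model with an empty description. A one-line docstring on each tool gives the model something to go on.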

First, we need to prepare the agent tools by creating tool schemas, mapping functions to their names, and defining a mechanism to execute tool calls with the appropriate arguments.

In [None]:
# 10. Prepare Tools for Agent
from llama_index.core.llms import ChatMessage
from typing import List, Dict, Any

tools = [query_delta_database]
tool_schemas = [function_to_schema(tool) for tool in tools]
tools_map = {tool.__name__: tool for tool in tools}

def execute_tool_call(tool_call, tools_map):
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)

    print(f"Assistant: {name}({args})")

    # call corresponding function with provided arguments
    return tools_map[name](**args)
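
A model can request a tool that does not exist or send arguments that are not valid JSON. The sketch below is a defensive variant of the dispatcher that reports such problems as JSON error strings instead of raising. `safe_execute_tool_call`, `echo_sql`, and the `SimpleNamespace` stand-ins are hypothetical, and the sketch assumes tool calls expose `function.name` and `function.arguments` as in the cell above.

```python
import json
from types import SimpleNamespace

def safe_execute_tool_call(tool_call, tools_map):
    """Run a tool call, returning a JSON error string instead of raising."""
    name = tool_call.function.name
    if name not in tools_map:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(tool_call.function.arguments)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"Invalid arguments for {name}: {exc}"})
    return tools_map[name](**args)

# Stand-in tool and tool call for demonstration; real calls come from the LLM response.
def echo_sql(sql_statement):
    return json.dumps({"received": sql_statement})

fake_call = SimpleNamespace(
    function=SimpleNamespace(name="echo_sql", arguments='{"sql_statement": "SELECT 1"}')
)
print(safe_execute_tool_call(fake_call, {"echo_sql": echo_sql}))
```

Returning the error as a string lets the conversation continue, so the model can correct its own request on the next turn.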

Then, we convert a dictionary message into a LlamaIndex `ChatMessage` object, including optional additional arguments.

In [None]:
def convert_to_chat_message(message: Dict[str, Any]) -> ChatMessage:
    """Convert dictionary message to LlamaIndex ChatMessage"""
    return ChatMessage(
        role=message["role"],
        content=message["content"],
        additional_kwargs=message.get("additional_kwargs", {})
    )

Finally, we handle a full conversation turn by streaming and displaying tokenized responses from the LLM, processing tool calls if needed, and appending the results to the message history for further interaction.

In [None]:
from IPython.display import display, Markdown
import time

def run_full_turn(system_message: str, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    chat_messages = [convert_to_chat_message(msg) for msg in messages]
    
    while True:
        # Initialize streaming
        full_response = []
        response_buffer = ""
        out = display(Markdown(""), display_id=True)
        
        # Get streaming response with token awareness
        response_stream = llm.stream_chat(
            chat_messages,
            max_tokens=4096,  # Adjust based on your model's limits
            temperature=0.1
        )
        
        # Process the stream, keeping the last chunk so its metadata can be read afterwards
        last_chunk = None
        for chunk in response_stream:
            last_chunk = chunk
            content = chunk.delta
            if content:
                response_buffer += content
                full_response.append(content)
                
                # Refresh the display at natural breaks or roughly every 20 words
                if len(response_buffer.split()) >= 20 or content.endswith(('\n', '.', '!', '?')):
                    out.update(Markdown("".join(full_response)))
                    response_buffer = ""
                    time.sleep(0.05)  # Brief pause so updates stay readable
        
        # Final update to ensure complete display
        out.update(Markdown("".join(full_response)))
        
        # Store the complete response. Tool calls, if any, are carried by the
        # streamed message's additional_kwargs, not by the generator object.
        response_dict = {
            "role": "assistant",
            "content": "".join(full_response),
            "additional_kwargs": last_chunk.message.additional_kwargs if last_chunk else {}
        }
        messages.append(response_dict)
        chat_messages.append(convert_to_chat_message(response_dict))
        
        # Stop when the model did not request any tool
        tool_calls = response_dict["additional_kwargs"].get("tool_calls")
        if not tool_calls:
            break
        
        # Run each requested tool and feed the result back to the model.
        # Assumes OpenAI-style tool call objects, as execute_tool_call does.
        for tool_call in tool_calls:
            result = execute_tool_call(tool_call, tools_map)
            result_message = {
                "role": "tool",
                "content": result,
                "additional_kwargs": {
                    "tool_call_id": tool_call.id,
                    "name": tool_call.function.name
                }
            }
            messages.append(result_message)
            chat_messages.append(convert_to_chat_message(result_message))
    
    return messages

## **3. Running the Agent**

Congratulations! You've created an AI agent!

Now, let's try it out!

### Sample Questions:
1. What are our top-selling products by revenue and quantity sold?
2. Who are our top 10 customers by total spend and order frequency?
3. Which products have the lowest stock levels relative to their sales velocity?
4. Which product categories generate the highest profit margins?
5. What is our order fulfillment rate and average time to fulfill orders?
6. How has our customer base grown over time?
7. What are the seasonal trends in our product categories?
8. What products are frequently purchased together?
9. What percentage of customers make repeat purchases?
10. Which customer segments are most profitable when considering acquisition cost and lifetime value?

In [None]:
# Imports
from typing import AsyncIterator, Iterator
import sys
import time

# Prepare tools for the agent (same setup as in the Agent Runtime section)
tools = [query_delta_database]
tool_schemas = [function_to_schema(tool) for tool in tools]
tools_map = {tool.__name__: tool for tool in tools}

# Interactive agent loop with streaming output
def run_agent_interaction():
    messages = [{"role": "system", "content": system_prompt}]
    
    while True:
        user_input = input("\nUser (type 'exit' to quit): ")
        if user_input.lower() == 'exit':
            break
            
        messages.append({"role": "user", "content": user_input})
        messages = run_full_turn(system_prompt, messages)
        
        # Display any tool results
        for msg in reversed(messages):
            if msg.get("role") == "tool" and "content" in msg:
                print(f"\n[Database Result]: {msg['content']}")
                break

if __name__ == "__main__":
    run_agent_interaction()

# **Conclusion**

Great job building your first **AI-powered Data Analyst Agent** with **HPE Private Cloud AI** and **NVIDIA Inference Microservices** (NIM) featuring Meta’s **Llama 3.1 8B Instruct**!

In this exercise, you successfully:
- Integrated your datasets with the AI inference pipeline
- Configured and deployed a responsive, containerized LLM-powered agent
- Designed smart prompting strategies for data analysis and summarization
- Interacted with the agent in natural language via Jupyter notebooks
- Transformed complex datasets into clear, actionable insights with ease

By leveraging PCAI’s scalable infrastructure and the reasoning power of Llama 3.1, you've prototyped a modern, intelligent assistant capable of bridging the gap between data and decisions.

From prompt to insight, you’ve created a data analyst that works at the speed of thought.

Next: <a href="./05.cleanup.ipynb" style="color: black"><b style="color: #01a982;">Cleanup</b></a>

Previous: <a href="./03.explore_data_with_spark.ipynb" style="color: black"><b style="color: #01a982;">Exercise 3:</b> Explore Retail Data with Apache Spark</a>