# People Analytics Agent - Agentic AI Workshop

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DougTrajano/agentic-ai-workshop/blob/main/people_analytics_agent.ipynb)

This Jupyter Notebook demonstrates how to build an intelligent People Analytics Agent using LangChain and Google's Gemini model.

The agent can:
- **Query HR databases** using natural language questions
- **Execute SQL queries** automatically based on user requests
- **Perform calculations** and statistical analysis
- **Generate visualizations** using Plotly for data insights
- **Provide structured responses** with summaries, SQL queries, datasets, and charts

This notebook showcases an agentic workflow that combines SQL database interaction with AI-powered analysis to answer complex HR and people analytics questions.

## Set up Environment

### Install Dependencies

Install all required Python packages for building the People Analytics Agent, including LangChain for agent orchestration, Google Generative AI for the LLM, and data manipulation libraries.

In [17]:
%pip install -q "pydantic>=2.11" "pydantic-settings>=2.11" "faker>=37.12" \
    "datasets>=4.0" "huggingface_hub>=0.35" "sqlalchemy>=2.0.44" "sqlparse>=0.5" \
    "numexpr>=2.14" "numpy>=2.0" "pandas>=2.2" "plotly>=5.24" "langchain>=1.0" \
    "langgraph>=1.0" "langchain-community>=0.4" "langchain-google-genai>=3.0" \
    "chainlit>=2.8" "pyngrok>=7.4"

Note: you may need to restart the kernel to use updated packages.


### Define Parameters

Configure the agent settings including:
- **GOOGLE_API_KEY**: Authentication for Google AI Studio (Gemini models)
- **HF_TOKEN**: Hugging Face authentication for accessing datasets
- **HF_DATASET_NAME**: The synthetic HR database we generated in the previous notebook

These parameters ensure secure access to both the AI model and the HR dataset.

In [18]:
from pydantic import BaseModel, Field, SecretStr, field_validator
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """People Analytics Agent Settings."""

    GOOGLE_API_KEY: SecretStr = Field(
        ...,
        description="Google API key for accessing Google AI Studio.",
    )

    HF_TOKEN: SecretStr = Field(
        ...,
        description="Hugging Face API token for accessing private resources.",
    )

    HF_DATASET_NAME: str = Field(
        default="dougtrajano/hr-synthetic-database",
        description="The name of the Hugging Face dataset to load.",
    )

In [19]:
import os


# Detect if running on Google Colab
try:
    import google.colab

    IN_COLAB = True
except ImportError:
    IN_COLAB = False

# Load environment variables based on environment
if IN_COLAB:
    # Running on Google Colab - use userdata
    from google.colab import userdata

    os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')
    os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')


settings = Settings()
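
`Settings()` fails with a validation error when a required variable is missing. As a minimal sketch, a pre-check could report which variables are absent before the settings object is built. The helper `missing_env_vars` and the example environment below are hypothetical and not part of the notebook.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Hypothetical usage: check the two required secrets before calling Settings().
required = ["GOOGLE_API_KEY", "HF_TOKEN"]
example_env = {"GOOGLE_API_KEY": "dummy-key"}  # made-up environment for illustration
missing = missing_env_vars(required, example_env)
print(missing)
```

Calling `missing_env_vars(required)` without a second argument checks the real process environment.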

## Load HR Synthetic Database

Load the synthetic HR database that was generated in the `hr_synthetic_database.ipynb` notebook from Hugging Face Datasets.

This dataset contains:
- **business_units**: Top-level organizational divisions
- **departments**: Functional units within business units
- **jobs**: Job position definitions and classifications
- **employees**: Employee personal and demographic information
- **compensations**: Employee compensation packages

We'll use the [load_dataset()](https://huggingface.co/docs/datasets/v4.3.0/en/package_reference/loading_methods#datasets.load_dataset) function from the [datasets](https://huggingface.co/docs/datasets/) library to load each table separately.

In [20]:
from datasets import load_dataset


business_units = load_dataset(settings.HF_DATASET_NAME, 'business_units')
departments = load_dataset(settings.HF_DATASET_NAME, 'departments')
jobs = load_dataset(settings.HF_DATASET_NAME, 'jobs')
employees = load_dataset(settings.HF_DATASET_NAME, 'employees')
compensations = load_dataset(settings.HF_DATASET_NAME, 'compensations')

### Convert to Pandas DataFrames

Convert the Hugging Face datasets to Pandas DataFrames for easier manipulation and loading into SQLite.

We also clean the data by replacing Hugging Face's string representations of NULL values (like 'None' or empty strings) with proper `np.nan` values for correct database handling.

In [21]:
import numpy as np
import pandas as pd


def clean_huggingface_nulls(df: pd.DataFrame) -> pd.DataFrame:
    """
    Clean Hugging Face dataset string representations of NULL values.

    Hugging Face datasets may contain string 'None' or empty strings instead
    of actual NULL values. This function replaces them with np.nan for proper
    NULL handling in SQLite.

    Parameters:
    df (pd.DataFrame): Input DataFrame with potential string NULL values.

    Returns:
    pd.DataFrame: Cleaned DataFrame with proper NULL values.
    """
    df_clean = df.copy()
    for col in df_clean.columns:
        df_clean[col] = df_clean[col].replace(['None', ''], np.nan)
    return df_clean
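
To see why this cleaning step matters once the data reaches a SQL database, the sketch below uses only the standard library. It loads made-up rows into an in-memory SQLite table and counts NULLs before and after the same kind of replacement. The column name `termination_date` is an assumption chosen for illustration, not a documented column of the dataset.

```python
import sqlite3

# Made-up rows: None is a real NULL, while 'None' and '' are ordinary strings.
rows = [(1, None), (2, "None"), (3, ""), (4, "2024-01-31")]


def count_nulls(values):
    """Load (id, termination_date) rows into SQLite and count NULL dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER, termination_date TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", values)
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM employees WHERE termination_date IS NULL"
    ).fetchone()
    conn.close()
    return count


# Same idea as clean_huggingface_nulls, applied to plain tuples.
cleaned = [(i, None if d in ("None", "") else d) for i, d in rows]

raw_nulls = count_nulls(rows)
cleaned_nulls = count_nulls(cleaned)
print(raw_nulls, cleaned_nulls)
```

Without cleaning, an `IS NULL` filter sees only the one real NULL. After cleaning, it also catches the two rows that carried placeholder strings.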

In [22]:
# Convert Hugging Face datasets to Pandas DataFrames
df_business_units = clean_huggingface_nulls(
    business_units['train'].to_pandas()
)
df_departments = clean_huggingface_nulls(
    departments['train'].to_pandas()
)
df_jobs = clean_huggingface_nulls(
    jobs['train'].to_pandas()
)
df_employees = clean_huggingface_nulls(
    employees['train'].to_pandas()
)
df_compensations = clean_huggingface_nulls(
    compensations['train'].to_pandas()
)


print(f"Business Units: {df_business_units.shape}")
print(f"Departments: {df_departments.shape}")
print(f"Jobs: {df_jobs.shape}")
print(f"Employees: {df_employees.shape}")
print(f"Compensations: {df_compensations.shape}")

Business Units: (4, 4)
Departments: (17, 5)
Jobs: (94, 7)
Employees: (15700, 12)
Compensations: (15700, 7)


### Load DataFrames into SQLite

Create a file-based SQLite database (`hr_synthetic_database.db`) and load all HR tables into it.

This approach provides:
- **Fast querying**: SQLite handles a dataset of this size with little overhead
- **SQL compatibility**: Standard SQL interface for the agent to query
- **Simple persistence**: A single local file that the Chainlit app later in this notebook can reopen
- **LangChain integration**: Works with LangChain's SQL tools

We also create a SQLAlchemy engine for compatibility with LangChain's SQL toolkit.

In [33]:
import sqlite3

from sqlalchemy import create_engine


# Create a SQLite file-based connection
conn = sqlite3.connect('hr_synthetic_database.db', check_same_thread=False)

# Create SQLAlchemy engine for the SQL toolkit
db_engine = create_engine('sqlite:///hr_synthetic_database.db', creator=lambda: conn)

# Create tables in SQLite from the DataFrames
df_business_units.to_sql('business_units', conn, index=False, if_exists='replace')
df_departments.to_sql('departments', conn, index=False, if_exists='replace')
df_jobs.to_sql('jobs', conn, index=False, if_exists='replace')
df_employees.to_sql('employees', conn, index=False, if_exists='replace')
df_compensations.to_sql('compensations', conn, index=False, if_exists='replace')

# Verify the tables were created
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
print("Tables in SQLite:")
print(cursor.fetchall())

Tables in SQLite:
[('business_units',), ('departments',), ('jobs',), ('employees',), ('compensations',)]


## Build People Analytics Agent

In this section, we'll construct our intelligent agent using LangChain's agent framework.

The agent will combine:
- **Language Model**: Google's Gemini for natural language understanding
- **Tools**: SQL querying and mathematical calculations
- **System Prompt**: Instructions for proper database interaction
- **Structured Output**: Consistent response format with summaries and visualizations

### Load the Language Model

Initialize Google's Gemini Flash model as our agent's reasoning engine.

Gemini Flash offers:
- **Fast inference**: Quick responses for interactive analysis
- **Cost-effective**: Efficient for high-volume queries
- **Strong reasoning**: Capable of complex SQL generation and data analysis

In [24]:
from langchain_google_genai import ChatGoogleGenerativeAI


llm = ChatGoogleGenerativeAI(model="models/gemini-flash-latest")

### Define Agent Tools

Equip the agent with tools for data analysis and computation:

1. **Calculator Tool**: Performs mathematical calculations using numexpr for statistical analysis
2. **SQL Database Tools**: LangChain's SQL toolkit provides:
   - `sql_db_list_tables`: List all available database tables
   - `sql_db_schema`: Get table schemas and column information
   - `sql_db_query`: Execute SQL queries and retrieve results
   - `sql_db_query_checker`: Validate SQL syntax before execution

These tools enable the agent to autonomously explore the database schema, construct appropriate SQL queries, and perform calculations on the results.
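
The exploration steps can be approximated with the standard `sqlite3` module. The sketch below is not how LangChain's toolkit is implemented. It only illustrates the kind of information a list-tables step and a schema step return, using two made-up tables and the hypothetical helpers `list_tables` and `table_columns`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_units (id TEXT, name TEXT)")
conn.execute("CREATE TABLE employees (id TEXT, business_unit_id TEXT)")


def list_tables(connection):
    """Return table names, roughly what a list-tables tool reports."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def table_columns(connection, table):
    """Return (column, type) pairs for one table via PRAGMA table_info."""
    # PRAGMA statements cannot take bound parameters, so only pass trusted names.
    rows = connection.execute(f"PRAGMA table_info({table})")
    return [(row[1], row[2]) for row in rows]


tables = list_tables(conn)
columns = table_columns(conn, "employees")
print(tables)
print(columns)
```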

In [None]:
import math

import numexpr
from langchain_core.tools import tool


@tool
def calculator(expression: str) -> str:
    """Calculate expression using Python's numexpr library.

    Expression should be a single line mathematical expression
    that solves the problem.

    Examples:
        "37593 * 67" for "37593 times 67"
        "37593**(1/5)" for "37593^(1/5)"
    """
    local_dict = {'pi': math.pi, 'e': math.e}
    return str(
        numexpr.evaluate(
            expression.strip(),
            global_dict={},  # restrict access to globals
            local_dict=local_dict,  # expose the constants pi and e
        )
    )

In [26]:
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase


db = SQLDatabase(engine=db_engine)
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
sql_tools = toolkit.get_tools()

print(f"Available tables: {db.get_usable_table_names()}")

Available tables: ['business_units', 'compensations', 'departments', 'employees', 'jobs']


In [27]:
tools = [calculator] + sql_tools

### Define System Prompt

Create the system prompt that guides the agent's behavior when interacting with the SQL database.

The prompt instructs the agent to:
- **Explore first**: Always check available tables and schemas before querying
- **Write safe queries**: Create syntactically correct, read-only SQL (no statements that modify data or schema)
- **Limit results**: Return at most 5 results by default (configurable)
- **Verify queries**: Double-check SQL before execution
- **Handle errors**: Rewrite and retry queries if they fail
- **Be efficient**: Only query relevant columns, not all columns

This prompt ensures the agent follows best practices for database interaction.

In [28]:
def get_system_prompt(dialect: str, top_k: int = 5) -> str:
    """Get the system prompt for the agent.

    Args:
        dialect (str): The SQL dialect of the database.
        top_k (int): The maximum number of results to return.

    Returns:
        str: The system prompt.
    """

    system_prompt = """
    You are an agent designed to interact with a SQL database.
    Given an input question, create a syntactically correct {dialect} query to run,
    then look at the results of the query and return the answer. Unless the user
    specifies a specific number of examples they wish to obtain, always limit your
    query to at most {top_k} results.

    You can order the results by a relevant column to return the most interesting
    examples in the database. Never query for all the columns from a specific table,
    only ask for the relevant columns given the question.

    You MUST double check your query before executing it. If you get an error while
    executing a query, rewrite the query and try again.

    DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
    database.

    To start you should ALWAYS look at the tables in the database to see what you
    can query. Do NOT skip this step.

    Then you should query the schema of the most relevant tables.
    """

    return system_prompt.format(
        dialect=dialect,
        top_k=top_k,
    )


system_prompt = get_system_prompt(dialect=db.dialect, top_k=5)
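
The prompt asks the model not to modify data, but a prompt alone does not enforce that. One possible safeguard, shown here as a sketch and not something this notebook does, is to open the SQLite file in read-only mode through a URI so that write statements fail at the database level. The file name `demo.db` and the `jobs` rows are made up for the example.

```python
import os
import sqlite3
import tempfile

# Create a throwaway database file (hypothetical name) with one table.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE jobs (id INTEGER, title TEXT)")
writer.execute("INSERT INTO jobs VALUES (1, 'Analyst')")
writer.commit()
writer.close()

# Reopen the same file read-only: mode=ro is part of SQLite's URI filename syntax.
reader = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
titles = [title for (title,) in reader.execute("SELECT title FROM jobs")]

try:
    reader.execute("DELETE FROM jobs")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite rejects writes on a read-only connection.
    write_blocked = True
finally:
    reader.close()

print(titles, write_blocked)
```

Reads keep working on the read-only connection while the `DELETE` is rejected. How such a connection would be wired into the SQLAlchemy engine depends on the engine setup and is left out here.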

### Define Response Format

Define a structured output format using Pydantic to ensure consistent, high-quality agent responses.

The `AgentOutput` model contains:
- **summary**: Natural language answer with key findings and insights
- **sql_query**: The SQL query executed (formatted for readability)
- **dataset**: JSON representation of query results in pandas-compatible format
- **plotly_json_fig**: Optional Plotly chart for data visualization

This structure ensures every response is:
- **Informative**: Contains both explanation and raw data
- **Transparent**: Shows the SQL query used
- **Visual**: Includes charts when appropriate
- **Machine-readable**: JSON format for downstream processing

In [None]:
import sqlparse


class AgentOutput(BaseModel):
    """Agent response containing analysis summary, SQL query, dataset, and an optional Plotly JSON Chart."""

    summary: str = Field(
        ...,
        description=(
            'Markdown-based short answer to the user question. '
            "Example: 'The Sales department has 150 employees.'"
        ),
    )

    sql_query: str | None = Field(
        default=None,
        description=(
            'SQL query executed to retrieve data. '
            "Example: 'SELECT AVG(Salary), MIN(Salary), MAX(Salary) FROM employees;'"
        ),
    )

    dataset: str | None = Field(
        default=None,
        description=(
            'A JSON-serializable representation of the dataset returned by the SQL query. '
            'This should be compatible with pandas DataFrame construction (data and columns). '
            'Include this field when the SQL query returns tabular data that supports the answer. '
            'Example: \'{"data": [[120000, 70000, 210000]], "columns": ["average_salary", "min_salary", "max_salary"]}\''
        ),
    )

    plotly_json_fig: str | None = Field(
        default=None,
        description=(
            'A Plotly JSON figure representation for visualizing the dataset. '
            'Include this field when a graphical representation of the data is helpful. '
            "IMPORTANT: Always include data labels on the chart by setting 'text' in the trace "
            "and 'textposition' to display values on the bars/points. "
            "For bar charts, use 'textposition': 'auto' or 'outside'. "
            "For scatter plots, use 'mode': 'markers+text'. "
            'Example: \'{"data": [{"type": "bar", "x": ["A", "B"], "y": [10, 20], '
            '"text": [10, 20], "textposition": "auto"}], '
            '"layout": {"title": "Sample Bar Chart"}}\''
        ),
    )

    @field_validator('sql_query')
    @classmethod
    def format_sql_query(cls, v: str | None) -> str | None:
        """Format the SQL query with sqlparse: reindent and uppercase keywords."""
        if v is None:
            return None
        return sqlparse.format(v, reindent=True, keyword_case='upper', indent_width=4).strip()

    def get_message(self) -> str:
        """Get a user-friendly message summarizing the agent's response."""
        message = self.summary
        if self.sql_query:
            message += f'\n\nSQL Query Executed:\n```sql\n{self.sql_query}\n```'
        return message
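
The `dataset` field is a JSON string with `data` and `columns` keys. As a minimal sketch of how a consumer might unpack it without pandas, the hypothetical helper `dataset_to_records` below turns the example payload from the field description into a list of dictionaries.

```python
import json

# Example payload copied from the `dataset` field description.
dataset = (
    '{"data": [[120000, 70000, 210000]], '
    '"columns": ["average_salary", "min_salary", "max_salary"]}'
)


def dataset_to_records(payload):
    """Turn a {"data": [...], "columns": [...]} JSON string into a list of dicts."""
    parsed = json.loads(payload)
    columns = parsed["columns"]
    records = []
    for row in parsed["data"]:
        if len(row) != len(columns):
            raise ValueError("row length does not match number of columns")
        records.append(dict(zip(columns, row)))
    return records


records = dataset_to_records(dataset)
print(records)
```

The same two keys can be passed straight to `pd.DataFrame(**json.loads(payload))`, which is why the field description asks for exactly this shape.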


### Create the Agent

Instantiate the LangChain agent with all components:
- **Model**: Gemini Flash for reasoning and query generation
- **Tools**: SQL database tools + calculator
- **System Prompt**: Database interaction guidelines
- **Response Format**: Structured output schema

The agent uses a ReAct (Reasoning + Acting) pattern to:
1. **Reason** about the user's question
2. **Act** by using tools (query DB, calculate, etc.)
3. **Observe** the results
4. **Repeat** until the question is fully answered

This creates an autonomous agent that can handle complex multi-step analytics questions.
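
The reason, act, observe, repeat loop can be sketched in plain Python. This is an illustration of the pattern only, not LangChain's implementation. The "model" is a scripted stand-in function, and the tools read from a made-up dictionary instead of a database.

```python
def scripted_model(history):
    """Stand-in for an LLM: pick the next action from how many tool results exist."""
    observations = [m for m in history if m["role"] == "tool"]
    if len(observations) == 0:
        return {"action": "list_tables", "input": ""}
    if len(observations) == 1:
        return {"action": "count_rows", "input": "employees"}
    return {
        "action": "finish",
        "input": f"employees has {observations[-1]['content']} rows",
    }


# Toy tools backed by a dict instead of a real database.
FAKE_DB = {"employees": 3, "jobs": 2}
TOOLS = {
    "list_tables": lambda _: ", ".join(sorted(FAKE_DB)),
    "count_rows": lambda table: str(FAKE_DB[table]),
}


def run_agent(question, max_steps=5):
    """Reason, act, observe, and repeat until the model finishes or steps run out."""
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = scripted_model(history)  # reason
        if decision["action"] == "finish":
            return decision["input"], history
        result = TOOLS[decision["action"]](decision["input"])  # act
        history.append({"role": "tool", "content": result})  # observe
    raise RuntimeError("step limit reached without a final answer")


answer, trace = run_agent("How many employees are there?")
print(answer)
```

The `max_steps` guard plays a role similar to a recursion limit: it stops a loop that never reaches a final answer.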

In [30]:
from langchain.agents import create_agent


agent = create_agent(
    model=llm,
    tools=tools,
    system_prompt=system_prompt,
    response_format=AgentOutput,
)

### Test the Agent

Run a test query to verify the agent is working correctly.

This example asks the agent to "show the number of employees by business unit", which requires:
1. **Discovering tables**: Agent explores the database schema
2. **Understanding relationships**: Identifies the connection between employees and business_units
3. **Writing SQL**: Constructs an appropriate GROUP BY query
4. **Formatting output**: Returns results as a structured response with visualization

The streaming output shows the agent's tool calls and intermediate results in real time.

In [31]:
from langchain_core.runnables import RunnableConfig


inputs = {
    "messages": [
        {
            "role": "user",
            "content": "show the number of employees by business unit",
        }
    ]
}

user_config = RunnableConfig(
    configurable={'thread_id': 'user_thread_id_1993'},
    recursion_limit=40,
)

for step in agent.stream(inputs, config=user_config, stream_mode="values"):
    step["messages"][-1].pretty_print()


show the number of employees by business unit
Tool Calls:
  sql_db_list_tables (78c6e216-02d4-4ae0-9af4-fa7d9c68460e)
 Call ID: 78c6e216-02d4-4ae0-9af4-fa7d9c68460e
  Args:
    tool_input:
Name: sql_db_list_tables

business_units, compensations, departments, employees, jobs
Tool Calls:
  sql_db_schema (76dd605f-af47-4015-816a-f12a5699f737)
 Call ID: 76dd605f-af47-4015-816a-f12a5699f737
  Args:
    table_names: employees, business_units
Name: sql_db_schema


CREATE TABLE business_units (
	id TEXT, 
	name TEXT, 
	description TEXT, 
	director_job_id TEXT
)

/*
3 rows from business_units table:
id	name	description	director_job_id
b1a0e1f2-1111-4c1a-9a10-0001d0b00101	Domestic Retail Division	Oversees all retail formats in the home market, including supercenters, supermarkets, and discount s	d1b2c3d4-0001-4e01-9000-000000000001
b1a0e1f2-2222-4c1a-9a10-0001d0b00102	International Markets Division	Manages international retail formats and regional marketplaces in EMEA, APAC, and LATAM.	d1b2c3d4

In [32]:
import plotly.io as pio


if "structured_response" in step:
    response = AgentOutput.model_validate(step["structured_response"])

    print(f"Agent Response:\n{response.get_message()}")

    if response.plotly_json_fig:
        fig = pio.from_json(response.plotly_json_fig, skip_invalid=True)
        fig.update_layout(template='plotly_dark')
        fig.show()

Agent Response:
The number of employees by business unit is:
- Domestic Retail Division: 10,795 employees
- International Markets Division: 4,746 employees
- Membership-Based Wholesale Club Division: 93 employees
- Shared Services: 66 employees

SQL Query Executed:
```sql
SELECT T2.name,
       COUNT(T1.id) AS num_employees
FROM employees AS T1
INNER JOIN business_units AS T2 ON T1.business_unit_id = T2.id
GROUP BY T2.name
ORDER BY num_employees DESC
```
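
To check what a query of this shape returns, the sketch below runs it with the standard `sqlite3` module on a tiny made-up dataset. The table and column names follow the query above. The rows are invented and unrelated to the real headcounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE business_units (id TEXT, name TEXT);
    CREATE TABLE employees (id TEXT, business_unit_id TEXT);
    INSERT INTO business_units VALUES ('bu1', 'Retail'), ('bu2', 'Shared Services');
    INSERT INTO employees VALUES ('e1', 'bu1'), ('e2', 'bu1'), ('e3', 'bu2');
    """
)

# Same shape as the agent's query: join, group by unit name, sort by headcount.
query = """
    SELECT T2.name, COUNT(T1.id) AS num_employees
    FROM employees AS T1
    INNER JOIN business_units AS T2 ON T1.business_unit_id = T2.id
    GROUP BY T2.name
    ORDER BY num_employees DESC
"""
headcount = conn.execute(query).fetchall()
conn.close()
print(headcount)
```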


## Next Steps: Chainlit Integration

The agent can be integrated with Chainlit to create an interactive conversational UI for People Analytics.

Chainlit provides:
- **Chat interface**: User-friendly UI for natural language queries
- **Conversation history**: Maintains context across multiple questions
- **Real-time streaming**: Shows agent progress as it thinks and acts
- **Visualization support**: Renders Plotly charts inline

This would enable HR professionals and analysts to interact with the HR database conversationally, asking follow-up questions and exploring data without writing SQL.

### Write Chainlit App

In the next cell, we will use the magic command `%%writefile app.py` to create a Chainlit application that leverages the People Analytics Agent we built in this notebook.

This app will allow us to interact with the agent through a web-based chat interface.

In [None]:
%%writefile app.py
import json
import math

import chainlit as cl
import numexpr
import pandas as pd
import plotly.io as pio
import sqlparse
from langchain.agents import create_agent
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_core.runnables.config import RunnableConfig
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.pregel import Pregel
from pydantic import BaseModel, Field, field_validator
from sqlalchemy import create_engine


@tool
def calculator(expression: str) -> str:
    """Calculate expression using Python's numexpr library.

    Expression should be a single line mathematical expression
    that solves the problem.

    Examples:
        "37593 * 67" for "37593 times 67"
        "37593**(1/5)" for "37593^(1/5)"
    """
    local_dict = {'pi': math.pi, 'e': math.e}
    return str(
        numexpr.evaluate(
            expression.strip(),
            global_dict={},  # restrict access to globals
            local_dict=local_dict,  # expose the constants pi and e
        )
    )


class AgentOutput(BaseModel):
    """Agent response containing analysis summary, SQL query, dataset, and an optional Plotly JSON Chart."""

    summary: str = Field(
        ...,
        description=(
            'Markdown-based short answer to the user question. '
            "Example: 'The Sales department has 150 employees.'"
        ),
    )

    sql_query: str | None = Field(
        default=None,
        description=(
            'SQL query executed to retrieve data. '
            "Example: 'SELECT AVG(Salary), MIN(Salary), MAX(Salary) FROM employees;'"
        ),
    )

    dataset: str | None = Field(
        default=None,
        description=(
            'A JSON-serializable representation of the dataset returned by the SQL query. '
            'This should be compatible with pandas DataFrame construction (data and columns). '
            'Include this field when the SQL query returns tabular data that supports the answer. '
            'Example: \'{"data": [[120000, 70000, 210000]], "columns": ["average_salary", "min_salary", "max_salary"]}\''
        ),
    )

    plotly_json_fig: str | None = Field(
        default=None,
        description=(
            'A Plotly JSON figure representation for visualizing the dataset. '
            'Include this field when a graphical representation of the data is helpful. '
            "IMPORTANT: Always include data labels on the chart by setting 'text' in the trace "
            "and 'textposition' to display values on the bars/points. "
            "For bar charts, use 'textposition': 'auto' or 'outside'. "
            "For scatter plots, use 'mode': 'markers+text'. "
            'Example: \'{"data": [{"type": "bar", "x": ["A", "B"], "y": [10, 20], '
            '"text": [10, 20], "textposition": "auto"}], '
            '"layout": {"title": "Sample Bar Chart"}}\''
        ),
    )


    @field_validator('sql_query')
    @classmethod
    def format_sql_query(cls, v: str | None) -> str | None:
        """Format the SQL query with sqlparse: reindent and uppercase keywords."""
        if v is None:
            return None
        return sqlparse.format(v, reindent=True, keyword_case='upper', indent_width=4).strip()

    def get_message(self) -> str:
        """Get a user-friendly message summarizing the agent's response."""
        message = self.summary
        if self.sql_query:
            message += f'\n\nSQL Query Executed:\n```sql\n{self.sql_query}\n```'
        return message


def get_system_prompt(dialect: str, top_k: int = 5) -> str:
    """Get the system prompt for the agent.

    Args:
        dialect (str): The SQL dialect of the database.
        top_k (int): The maximum number of results to return.

    Returns:
        str: The system prompt.
    """

    system_prompt = """
    You are an agent designed to interact with a SQL database.
    Given an input question, create a syntactically correct {dialect} query to run,
    then look at the results of the query and return the answer. Unless the user
    specifies a specific number of examples they wish to obtain, always limit your
    query to at most {top_k} results.

    You can order the results by a relevant column to return the most interesting
    examples in the database. Never query for all the columns from a specific table,
    only ask for the relevant columns given the question.

    You MUST double check your query before executing it. If you get an error while
    executing a query, rewrite the query and try again.

    DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
    database.

    To start you should ALWAYS look at the tables in the database to see what you
    can query. Do NOT skip this step.

    Then you should query the schema of the most relevant tables.
    """

    return system_prompt.format(
        dialect=dialect,
        top_k=top_k,
    )


# Load LLM model
llm = ChatGoogleGenerativeAI(model="models/gemini-flash-latest")

# Load SQLite database and get SQL toolkit
db_engine = create_engine('sqlite:///hr_synthetic_database.db')
db = SQLDatabase(engine=db_engine)
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
sql_tools = toolkit.get_tools()
tools = [calculator] + sql_tools

# Get system prompt
system_prompt = get_system_prompt(dialect=db.dialect, top_k=5)


@cl.set_starters
async def set_starters(user: cl.User | None = None) -> list[cl.Starter]:
    """Set the starters for the chat application."""
    return [
        cl.Starter(
            label='Headcount by Business Unit',
            message='Show me the headcount by business unit.',
        ),
        cl.Starter(
            label='Headcount by Gender',
            message='Show me the headcount by gender.',
        ),
        cl.Starter(
            label='Headcount by Generation',
            message='Show me the headcount by generation.',
        ),
        cl.Starter(
            label='Average Salary by Job Title',
            message='What is the average salary for each job title?',
        ),
        cl.Starter(
            label='Total Compensation by Department',
            message='What is the total compensation for each department?',
        ),
    ]


@cl.on_chat_start
async def on_chat_start():
    """Handle the chat start event."""
    agent = create_agent(
        model=llm,
        tools=tools,
        system_prompt=system_prompt,
        response_format=AgentOutput,
    )
    cl.user_session.set('agent', agent)


@cl.on_message
async def on_message(msg: cl.Message):
    """Handle the message event."""

    # Load the agent from the user session
    print('Retrieving agent from user session.')
    agent = cl.user_session.get('agent')
    if not isinstance(agent, Pregel):
        print('Failed to retrieve a valid agent from user session.')
        await cl.Message(content='Agent not initialized.').send()
        raise ValueError('Agent not initialized.')

    config = RunnableConfig(
        configurable={'thread_id': cl.context.session.thread_id},
        recursion_limit=40,
        # callbacks=[cl.LangchainCallbackHandler()],
    )

    # Get response from the agent
    print("Invoking agent with user's message.")
    response = agent.invoke({'messages': [HumanMessage(content=msg.content)]}, config=config)
    print({'response': response})
    response = AgentOutput.model_validate(response['structured_response'])

    elements = []
    if response.plotly_json_fig:
        fig = pio.from_json(response.plotly_json_fig, skip_invalid=True)
        fig.update_layout(template='plotly_dark')
        elements.append(cl.Plotly(name='plot', figure=fig, display='inline'))
    if response.dataset:
        df = pd.DataFrame(**json.loads(response.dataset))
        elements.append(cl.Dataframe(name='DataFrame', data=df, display='inline'))
    print({"elements": elements})
    await cl.Message(content=response.get_message(), elements=elements or None).send()

### Run Chainlit App

In the next cell, we run the Chainlit application in the background with `chainlit run app.py -w` and redirect its output to `/content/logs.txt` (a Colab path).

In [None]:
!chainlit run app.py -w &>/content/logs.txt &

### Start pyngrok Tunnel

With pyngrok, we can expose our local Chainlit app to the internet, allowing others to access it via a public URL.

Replace the placeholder `YOUR_NGROK_AUTH_TOKEN` with your actual ngrok authtoken to enable the tunnel.

To get one, sign up or log in at https://dashboard.ngrok.com/get-started/your-authtoken and copy your authtoken.

**Reference:** [Integration Examples - Google Colab — pyngrok documentation](https://pyngrok.readthedocs.io/en/latest/integrations.html#google-colaboratory)

In [None]:
!ngrok config add-authtoken YOUR_NGROK_AUTH_TOKEN

In [None]:
from pyngrok import ngrok


ngrok_tunnel = ngrok.connect(8000)
print('Public URL:', ngrok_tunnel.public_url)

### Kill ngrok Tunnel

In the next cell, we will stop the ngrok tunnel to close public access to our Chainlit app.

In [36]:
# Get user confirmation before killing ngrok
confirm = input("Are you sure you want to stop the ngrok tunnel? (yes/y): ")
if confirm.lower() in ['yes', 'y']:
    ngrok.kill()
    print("Ngrok tunnel has been stopped.")
else:
    print("Ngrok tunnel is still running.")

Ngrok tunnel is still running.


## 🎉 Lab Complete

Congratulations! You've successfully completed the **People Analytics Agent** workshop! 🎊

### What You've Accomplished

Throughout this notebook, you've built a sophisticated AI-powered analytics system from scratch:

- ✅ **Loaded a synthetic HR database** with realistic employee, compensation, and organizational data
- ✅ **Created a SQLite database** that the agent can query with standard SQL
- ✅ **Built an intelligent agent** using LangChain and Google's Gemini model
- ✅ **Equipped the agent with tools** for SQL queries and mathematical calculations
- ✅ **Defined structured outputs** with summaries, SQL queries, datasets, and visualizations
- ✅ **Tested the agent** with real-world people analytics questions
- ✅ **Created a Chainlit web application** for interactive conversational analytics
- ✅ **Exposed the app** through an ngrok tunnel for public access

### Key Takeaways

- **Agentic AI workflows** combine reasoning (LLMs) with actions (tools) to solve complex problems autonomously
- **Structured outputs** ensure consistent, high-quality responses that combine explanation with data
- **SQL + AI integration** enables natural language querying of databases without manual SQL writing
- **Visualization integration** makes insights more accessible and actionable
- **Interactive interfaces** (like Chainlit) democratize data analytics for non-technical users

You now have the foundation to build intelligent data analysis agents for various domains beyond HR—from finance to operations to customer analytics!

Check out the [Governing AI Agents - DeepLearning.AI](https://www.deeplearning.ai/short-courses/governing-ai-agents/) course, which also covers an HR analytics agent in the Databricks ecosystem.

**Well done! 🚀**