# <a id="top">Lab 1: Contextual Grounding with Agentic AI Tools</a>
## Mitigating Hallucinations Through RAG and Bedrock Guardrails

In this notebook, we'll explore how to reduce LLM hallucinations using **Retrieval-Augmented Generation (RAG)** and **Amazon Bedrock Guardrails**. We'll demonstrate how providing relevant context to LLMs and validating their outputs can significantly improve factual accuracy, and how the RAG pattern relates to more general tool use in **AI Agents**.

##### Notebook Kernel
If `Python3` is not already selected, choose it as the kernel type in the top-right corner of the notebook.

<div style="border: 4px solid coral; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px">
    <h4>💡 Key Learning Objectives</h4>
    <ul>
        <li>Understand why LLMs hallucinate without proper context</li>
        <li>Learn how RAG (Retrieval-Augmented Generation) reduces hallucinations</li>
        <li>Integrate MCP (Model Context Protocol) servers with Strands agents</li>
        <li>Configure Amazon Bedrock Guardrails for contextual grounding checks</li>
        <li>Build production-ready agents with automatic hallucination detection</li>
    </ul>
</div>
<br/>

## Use-Case Overview

**The Problem:** LLMs are trained on static data and cannot access real-time or proprietary information. When asked about specific details they don't know, they often "hallucinate" plausible-sounding but incorrect answers.

**The Solution:** Combine two techniques:
- **RAG (Retrieval-Augmented Generation)**: Provide relevant context from authoritative sources
- **Contextual Grounding Checks**: Validate that responses are supported by the provided context

Together, these can substantially reduce hallucinations and improve factual accuracy.


## Sections

This notebook has the following sections:

1. [Dependencies and Setup](#1.-Dependencies-and-Setup)
2. [Baseline: LLMs Without Context (Hallucinations)](#2.-Baseline:-LLMs-Without-Context-(Hallucinations))
3. [RAG Helps: Adding Context with MCP Servers](#3.-RAG-Helps:-Adding-Context-with-MCP-Servers)
4. [Bedrock Guardrails: Contextual Grounding Validation](#4.-Bedrock-Guardrails:-Contextual-Grounding-Validation)
5. [Production Integration: Agents with Automatic Detection](#5.-Production-Integration:-Agents-with-Automatic-Detection)
6. [Conclusion and Best Practices](#6.-Conclusion-and-Best-Practices)

Please work through the sections in order. Later cells depend on imports, variables, and resources created in earlier ones, so skipping ahead can cause errors.

----

## 1. Dependencies and Setup
(<a href="#top">Go to top</a>)

**If you haven't already** installed the workshop's dependencies (from [pyproject.toml](./pyproject.toml)), you can un-comment (remove `# `) and run the cell below to do so. It is commented out by default, on the assumption that you already ran it at the start of lab 0:

In [None]:
# %pip install -e .

<div style="border: 4px solid coral; text-align: left; margin: auto; padding-left: 20px; padding-right: 20px; padding-top: 15px;">
    <h4>🔄 Restart the kernel after installing</h4>
    <p>
        <strong>IF</strong> you ran the above install command cell, you'll need to restart the
        notebook kernel afterwards for the installations to take full effect.
    </p>
    <p>
        Note that you may see some error notices about dependency conflicts in SageMaker Studio
        environments, but this is okay as long as the installations are completed.
    </p>
</div>
<br/>

With the installation complete, you're ready to import the libraries we'll use in the notebook and set up some basic configurations including:

- **Langfuse:** See the ["Getting started" section](https://catalog.us-east-1.prod.workshops.aws/workshops/1fa309f2-c771-42d5-87bc-e8f919e7bcc9/en-US/10-setup/03-langfuse) of the accompanying workshop guide for how to find your Langfuse keys and host name
    - If you chose not to set up Langfuse, you can ignore the error here or comment-out the `set_up_notebook_langfuse()` call
- **Model:** We'll use an API-based foundation model on Amazon Bedrock for this lab.

> ⚠️ **Note:** To avoid re-prompting you after every notebook or kernel restart, the `set_up_notebook_langfuse()` utility will store your Langfuse credentials **unencrypted** in a local file called `.env` (hidden in JupyterLab folder explorer, but visible and deletable via the terminal).

In [None]:
%load_ext autoreload
%autoreload 2

# External Dependencies:
import boto3
from mcp import StdioServerParameters, stdio_client
from mcp.client.streamable_http import streamablehttp_client
from strands import Agent
from strands.types.exceptions import EventLoopException
from strands.models.bedrock import BedrockModel
from strands.tools.mcp import MCPClient
from strands_tools import calculator

# Local Dependencies:
from hallucination_utils.bedrock_guardrails import (
    BedrockGuardrailHook,
    GuardrailFailedError,
)
from hallucination_utils.mcp import get_aws_mcp_env, mcp_all_tools
from hallucination_utils.tracing import set_up_notebook_langfuse

set_up_notebook_langfuse(
    ## Un-comment the below to force asking for the credentials again:
    # refresh=True
)

In [None]:
model = BedrockModel(
    model_id="global.anthropic.claude-haiku-4-5-20251001-v1:0",
)

## 2. Baseline: LLMs Without Context (Hallucinations)
(<a href="#top">Go to top</a>)

First, let's demonstrate a hallucination that occurs when an LLM is asked a question without access to external knowledge. We'll create a basic agent and ask it a specific question about Amazon SageMaker that requires up-to-date documentation.

**Desired Behavior**: The correct answer to the below question is **v2.98.0+** as stated on [this page](https://docs.aws.amazon.com/sagemaker/latest/dg/train-heterogeneous-cluster.html).

**Expected Hallucination**: Here, the model is likely to give a plausible-sounding but incorrect answer, because it lacks access to the specific documentation needed.

In [None]:
agent = Agent(model=model)

In [None]:
agent("What version of SageMaker SDK do I need to start heterogeneous training jobs?")

## 3. RAG Helps: Adding Context with MCP Servers
(<a href="#top">Go to top</a>)

In this section, you'll build an intelligent agent using the [Strands Agents SDK](https://strandsagents.com/latest/) (an open-source framework for building agentic AI applications) and [Model Context Protocol (MCP)](https://modelcontextprotocol.io/docs/getting-started/intro).

> **Note**: This lab covers contextual grounding using MCP tool outputs as reference sources. Lab 4 (optional) demonstrates traditional RAG-based contextual grounding using Amazon Bedrock Knowledge Bases.

### Understanding RAG

[Retrieval-Augmented Generation (RAG)](https://aws.amazon.com/what-is/retrieval-augmented-generation/) reduces LLM hallucinations by grounding responses in trusted sources. Instead of relying solely on the model's training data, we:

1. **Retrieve** relevant documents from a knowledge base based on the user's query
2. **Augment** the LLM prompt with retrieved context alongside the original question
3. **Generate** an answer grounded in the provided sources

Traditional RAG requires building and maintaining a static knowledge base through data ingestion pipelines, vectorization, and indexing.
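
The three steps above can be sketched in a few lines of plain Python. This is a toy illustration only: the `DOCS` list, the keyword-overlap scoring, and the prompt template are all assumptions made for this example (a real system would typically use embeddings and a vector index), and the final generation step is left as a comment.

```python
import re

# Hypothetical mini knowledge base (illustrative text, not quoted from real docs)
DOCS = [
    "Heterogeneous clusters let a training job use multiple instance groups.",
    "Guardrails can apply contextual grounding checks to model responses.",
    "Model Context Protocol servers expose tools to AI agents.",
]


def tokenize(text):
    """Lower-case the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, docs, top_k=1):
    """Step 1: rank documents by how many tokens they share with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_tokens & tokenize(d)), reverse=True)
    return ranked[:top_k]


def augment(query, passages):
    """Step 2: place the retrieved passages in the prompt, ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


query = "How do heterogeneous clusters work in a training job?"
passages = retrieve(query, DOCS)
prompt = augment(query, passages)
print(prompt)

# Step 3 (generate) would send `prompt` to an LLM, for example agent(prompt)
```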

### Why MCP over Traditional RAG?

MCP extends the RAG pattern with dynamic, standardized tool access:

- **Zero setup**: Pre-built tools (e.g., AWS Documentation MCP) work immediately, with no ingestion pipeline required
- **Live data**: Real-time access to APIs and documentation vs. static snapshots
- **Broader ecosystem**: Beyond documents: APIs, databases, calculators, and custom tools
- **Dynamic routing**: The LLM selects which tools to invoke based on the query

### MCP as a Standard

[MCP](https://modelcontextprotocol.io/docs/getting-started/intro) provides a standardized interface for [AI Agents](https://aws.amazon.com/what-is/ai-agents/) to connect with multiple knowledge sources and APIs, enabling tool reusability across different agent frameworks.

---

Below, let's connect our agent to real-time AWS Documentation search using the [AWS Documentation MCP Server from AWSLabs](https://github.com/awslabs/mcp). With this access, the agent should be able to look up the correct **v2.98.0+** answer and reflect that in its response:

In [None]:
aws_docs_mcp = MCPClient(
    lambda: streamablehttp_client(url="https://knowledge-mcp.global.api.aws")
)

with mcp_all_tools(aws_docs_mcp) as tools:
    rag_agent = Agent(
        model=model,
        tools=tools,
    )
    rag_agent(
        "What version of SageMaker SDK do I need to start heterogeneous training jobs?"
    )

### RAG Improves Accuracy, But Isn't Perfect

While RAG significantly reduces hallucinations, errors can still occur:

- **Relevance Issues**: The agent might retrieve relevant content but generate an answer that doesn't actually address the user's specific question
- **Grounding Issues**: The agent might generate claims that aren't fully supported by the retrieved documentation

Let's consider a more complex query:

> *"By default, Can an AgentCore agent have more aliases than I can count on one hand?"*

By default at the time of writing, this limit is 10 as documented [here in the Developer Guide](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/bedrock-agentcore-limits.html#runtime-service-limits). The model and agent we're using will generally give a confident "yes" based on an **assumption** that you, the user, can count to exactly 5 on one hand - predicated on you having four fingers and a thumb, and you **not** being familiar with alternative systems that can count to 10 or more (like [Chinese](https://en.wikipedia.org/wiki/Chinese_number_gestures) or [other systems](https://en.wikipedia.org/wiki/Finger-counting#By_country_or_region)).

These assumptions may be reasonable in many contexts, but the query still provides a simple toy example of how vagueness or assumed external knowledge in a question can reintroduce the risk of hallucination in retrieval-augmented systems:

In [None]:
with mcp_all_tools(aws_docs_mcp) as tools:
    rag_agent = Agent(
        model=model,
        tools=tools,
    )
    rag_agent("By default, Can an AgentCore agent have more aliases than I can count on one hand?")

## 4. Bedrock Guardrails: Contextual Grounding Validation
(<a href="#top">Go to top</a>)

While MCP tools provide rich information access, we need to ensure responses remain factually grounded and relevant. To address the remaining accuracy issues, you'll implement Amazon Bedrock Guardrails with contextual grounding checks that validate against tool-provided context.

### Contextual Grounding Beyond Traditional RAG

Contextual grounding extends beyond static document retrieval:

- **Tool-based grounding**: Use information retrieved by MCP tools as grounding sources
- **Multi-source validation**: Combine multiple tool outputs for comprehensive grounding
- **Dynamic context**: Grounding sources change based on which tools the agent invokes
- **Real-time validation**: Ensure responses reflect the most current information

### How Bedrock Guardrails Works

Amazon Bedrock Guardrails validates responses across two dimensions:

1. **Grounding**: The model's response is factually supported by tool-provided information
2. **Relevance**: The response directly addresses the user's query

Guardrails analyzes both the retrieved context and the model's response, assigning confidence scores for each dimension. If either score falls below your configured threshold, the guardrail blocks the response.
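
As a rough illustration of that blocking rule, the sketch below compares hypothetical scores against per-dimension thresholds. The function name, score values, and return labels are invented for this example; the real scoring happens inside the Bedrock Guardrails service.

```python
def grounding_decision(scores, thresholds):
    """Return ("blocked" or "passed", list of dimensions that fell below threshold)."""
    failed = [
        name
        for name, threshold in thresholds.items()
        # A missing score is treated as 0.0, so it fails any positive threshold
        if scores.get(name, 0.0) < threshold
    ]
    return ("blocked" if failed else "passed", failed)


thresholds = {"GROUNDING": 0.8, "RELEVANCE": 0.8}

# Hypothetical scores: on-topic, but weakly supported by the retrieved context
print(grounding_decision({"GROUNDING": 0.62, "RELEVANCE": 0.91}, thresholds))

# Hypothetical scores: both dimensions clear their thresholds
print(grounding_decision({"GROUNDING": 0.93, "RELEVANCE": 0.88}, thresholds))
```

Raising a threshold makes the check stricter: more responses are blocked, including some that a human reviewer might have accepted.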

---

Let's create a guardrail with contextual grounding enabled:

In [None]:
guardrail_name = "context-guardrail"
print(f"guardrail_name: {guardrail_name}")

bedrock = boto3.client("bedrock")
try:
    create_guardrail_resp = bedrock.create_guardrail(
        name=guardrail_name,
        description="Guardrail for AWS assistant with contextual grounding",
        contextualGroundingPolicyConfig={
            "filtersConfig": [
                {
                    "type": "GROUNDING",
                    "threshold": 0.8,  # 80% confidence threshold for grounding
                },
                {
                    "type": "RELEVANCE",
                    "threshold": 0.8,  # 80% confidence threshold for relevance
                },
            ]
        },
        blockedInputMessaging="Sorry I cannot help with this request. Please ask a question about AWS",
        blockedOutputsMessaging="Sorry, I wasn't able to generate you a confident answer on this topic.",
    )

    guardrail_id = create_guardrail_resp["guardrailId"]
    guardrail_version = create_guardrail_resp["version"]

    print(f"✓ Created guardrail ID: {guardrail_id}")

    ## Note that you can version-control your guardrails, but for the purposes of this workshop
    ## we'll skip that:
    # version_response = bedrock.create_guardrail_version(
    #     guardrailIdentifier=guardrail_id,
    #     description="Initial version with contextual grounding"
    # )
    # guardrail_version = version_response['version']
    guardrail_version = "DRAFT"

except bedrock.exceptions.ConflictException as e:
    found = False
    for page in bedrock.get_paginator("list_guardrails").paginate():
        for guardrail in page["guardrails"]:
            if guardrail["name"] == guardrail_name:
                found = True
                guardrail_id = guardrail["id"]
                guardrail_version = "DRAFT"
                print(f"WARNING - Pre-existing guardrail ID {guardrail_id}")
                break
        if found:
            break
    if not found:
        raise ValueError(
            "Got ConflictException but couldn't find matching guardrail"
        ) from e

print(f"✓ Using guardrail version: {guardrail_version}")

You can also find your created guardrail and explore other configuration options in the [AWS Console for Amazon Bedrock Guardrails](https://console.aws.amazon.com/bedrock/home?#/guardrails).

## 5. Production Integration: Agents with Automatic Detection
(<a href="#top">Go to top</a>)

In this section, you'll integrate guardrails with Strands agents using custom hooks. Since direct BedrockModel-based integration doesn't work with all agent frameworks, you'll implement wrapper patterns that apply contextual grounding checks to tool outputs.

### Custom Integration Approach

We've created a custom [Strands Hook](https://strandsagents.com/latest/documentation/docs/api-reference/hooks/#) (callback) implementation that:

- **Wraps agent responses**: Intercepts responses before they reach the user
- **Applies contextual grounding**: Uses MCP tool outputs as grounding sources for validation
- **Validates concurrently**: Checks both grounding and relevance against configured thresholds
- **Blocks hallucinations**: Raises an exception if either check fails, preventing ungrounded responses from reaching users

You can explore the implementation in the [`hallucination_utils`](./hallucination_utils) folder.
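
For orientation before opening that folder, here is a minimal sketch of how a contextual grounding check can be requested through the `ApplyGuardrail` operation of the boto3 `bedrock-runtime` client. It is not the workshop's hook implementation: the helper names `build_grounding_content` and `guardrail_intervened` are hypothetical, and the sketch assumes the tool output, user question, and draft answer are already available as plain strings.

```python
def build_grounding_content(grounding_source, query, answer):
    """Tag each text block with its role in the contextual grounding check."""
    return [
        # Reference material the answer must be supported by (e.g. MCP tool output)
        {"text": {"text": grounding_source, "qualifiers": ["grounding_source"]}},
        # The user's question, used for the relevance check
        {"text": {"text": query, "qualifiers": ["query"]}},
        # The draft model response to be validated
        {"text": {"text": answer, "qualifiers": ["guard_content"]}},
    ]


def guardrail_intervened(runtime_client, guardrail_id, guardrail_version, content):
    """Apply the guardrail to a model output and report whether it was blocked.

    `runtime_client` is assumed to be `boto3.client("bedrock-runtime")`.
    """
    response = runtime_client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source="OUTPUT",  # we are validating a response, not a user input
        content=content,
    )
    return response["action"] == "GUARDRAIL_INTERVENED"
```

In a hook, a check like this would typically run once the agent has produced its final answer, with the collected tool results joined into the grounding source.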

### Practical Implementation

You'll build an agent that:

1. **Configures AWS Documentation MCP server** for immediate use without ingestion
2. **Answers AWS questions** using live documentation via MCP tools
3. **Validates responses** with contextual grounding checks using tool outputs as reference sources
4. **Tests and optimizes** grounding and relevance thresholds across various query types

If our agent generates a response that fails the grounding or relevance checks, the guardrail will raise an exception - preventing the hallucinated response from reaching the user.

In [None]:
with mcp_all_tools(aws_docs_mcp) as tools:
    guarded_agent = Agent(
        model=model,
        tools=tools,
        hooks=[
            BedrockGuardrailHook(
                guardrail_id=guardrail_id,
                guardrail_version=guardrail_version,
            )
        ],
    )
    try:
        guarded_agent("By default, Can an AgentCore agent have more aliases than I can count on one hand?")
    except (GuardrailFailedError, EventLoopException) as e:
        e_original = getattr(e, "original_exception", e)
        if not isinstance(e_original, GuardrailFailedError):
            raise  # Re-raise any other EventLoopException not caused by the guardrail
        print("Guardrail intervened!")
        print(e_original.guardrail_response.model_dump_json(indent=2))

Because we set the guardrail thresholds quite high, you should see the guardrail intervene in this example - usually due to `GROUNDING` because of the external assumptions introduced by the question.

If you've enabled Langfuse tracing, you can open up the most recent trace in the Langfuse UI to further explore the detailed findings of the Bedrock Guardrail API operation that intervened in the generation, as shown below. In this case, we've set up our hook to log the full details provided by the [Bedrock ApplyGuardrail API](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-use-independent-api.html).

![](./img/langfuse-lab1.png "Screenshot of Langfuse web UI showing the details of a recorded trace. After an initial event loop cycle, the agent has errored due to Bedrock Guardrail intervening. In the span details pane on the right hand side, additional details of the Bedrock ApplyGuardrail API response are available")

You can also refer to the messages history of the agent, which does *not* include the final output message that triggered the guardrail intervention:

In [None]:
guarded_agent.messages

🎯 **A small challenge:** Can you think of other example questions that illustrate this guardrail in action? Some scenarios to explore:
- Eliciting **irrelevant** responses with questions that *seem* to overlap with material in the AWS documentation (for example, because of similar names) but are actually asking about something else
- Eliciting **ungrounded** responses by asking the agent to relate AWS information to details that are *not* in the documentation visible to it, such as general world knowledge, out-of-scope technologies, or even competitor services

## 6. Conclusion and Best Practices
(<a href="#top">Go to top</a>)

### What We Learned

In situations where trusted source data is available, inserting relevant content into the input prompt greatly reduces the likelihood of LLM hallucinations. This "Retrieval-Augmented Generation" pattern can be used in deterministic flows, but also applied to AI Agents which might work with multiple knowledge repositories or other API tools and decide when to call each one.

However, **RAG isn't perfect**: Our knowledge base might not contain the information needed to answer a question, or the wording of the content or the question could make it difficult for search engines to retrieve the necessary data. In these cases:
1. The LLM might generate answers that are **irrelevant** to the user's actual question - distracted by irrelevant search results
2. We might still generate **ungrounded** answers that aren't fully faithful to the source content - especially if its relationship to the question is complex.

To address these risks, contextual grounding checks such as those in Amazon Bedrock Guardrails add an extra layer of safety for high-performing RAG-like systems, including AI Agents that use MCP tools.


### Best Practices for Production

When building production agents with hallucination mitigation:

- ✅ **Always use RAG** for domain-specific or up-to-date information
- ✅ **Add guardrails** for high-stakes applications where accuracy is critical
- ✅ **Configure appropriate thresholds** based on your use case (stricter for medical/legal, more relaxed for general queries)
- ✅ **Monitor guardrail interventions** to identify areas where your knowledge base may need improvement
- ✅ **Provide fallback responses** when guardrails block a response, guiding users on how to rephrase their query
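
The last two practices can be combined in a thin wrapper around the agent call. The sketch below is illustrative only: `GuardrailBlocked` is a hypothetical stand-in for whatever exception your guardrail integration raises, `ask` is any callable that takes a question and returns an answer, and the in-memory counter stands in for a real metrics or tracing backend.

```python
from collections import Counter


class GuardrailBlocked(Exception):
    """Hypothetical stand-in for the exception raised by a guardrail integration."""


FALLBACK_MESSAGE = (
    "I couldn't find a well-supported answer to that. "
    "Try rephrasing with the specific service or feature name."
)

# Simple in-memory tally of interventions, keyed by the reason given in the exception
interventions = Counter()


def answer_with_fallback(ask, question):
    """Return the agent's answer, or a fallback message if the guardrail blocks it."""
    try:
        return ask(question)
    except GuardrailBlocked as err:
        # Record the intervention so recurring gaps can be reviewed later
        interventions[str(err) or "unknown"] += 1
        return FALLBACK_MESSAGE
```

Reviewing the tallied reasons over time can show whether blocks cluster around particular topics, which may point to gaps in the underlying knowledge sources or to thresholds that are too strict.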

### Additional Guardrail Features

Beyond contextual grounding, Amazon Bedrock Guardrails also supports:
- **Content filtering**: Block harmful, inappropriate, or sensitive content
- **PII detection**: Prevent leakage of personally identifiable information
- **Topic blocking**: Restrict conversations to approved topics
- **Custom word filters**: Block specific terms or phrases

Explore the [Bedrock Guardrails documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html) to learn more about these features.