Agentic RAG - Web Search with Accuracy and Hallucination Controls

Boost RAG with an Agentic Layer

Route: Checks the RAG context for relevance to the query and adds live web search if the context is thin
Evaluate: Checks responses for relevance and accuracy, flags hallucinations
Iterate: Goes through multiple evaluation and generation cycles

Table Extraction and Validation (Extended)

Extract: Uploads DOCX/PDF documents and converts each table to CALS XML, preserving spans, column widths, and cell text
Validate Content: Cross-checks every extracted cell value against two independent PDF parsers (pdfplumber + Camelot) and marks cells verify="ok" or "unconfirmed"
Annotate Styles: Sends original page images to a local vision LLM (Qwen2.5-VL via LM Studio) to detect bold formatting and indent levels, capturing the visual hierarchy of the source document
Compare Snapshots: (Planned) TEDS-based tree edit distance comparison between the original snapshot and any re-transformed output (re-exported PDF, HTML, iXBRL)

Modify Agentic RAG

Edit Prompts: Customize results through your own prompts
Change Parameters: Adjust agent behavior through parameters and runtime variables
Look and Feel: Change the agent and UI by editing the code yourself

Inference Your Way

Free Endpoints: use free endpoints on build.nvidia.com
Self-Hosted: Point to Ollama or NIM on your own GPUs
Local VLM: Point to a self-hosted LM Studio instance for offline vision LLM annotation

Get Started

This README has three modes:

Easy Mode: Use the application
Intermediate Mode: Modify the application
Advanced Mode: Self-host GPUs for inference

Prerequisites - AI Workbench and an Internet Connection

You can run Agentic RAG without Workbench, but this README requires NVIDIA AI Workbench installed. See how to install it here.

You need internet because Agentic RAG uses an NVIDIA endpoint for document embedding.

Table extraction and VLM annotation work fully offline once LM Studio is running locally — no NVIDIA API key required for those features.

Easy Mode (< 5 minutes if Workbench installed)

Get NVIDIA and Tavily API keys:
- NVIDIA_API_KEY → Generate See instructions here.
- TAVILY_API_KEY → Generate
Clone this repo with AI Workbench > configure the keys when prompted.
Click Open Chat > Go to the Document tab in the web app > Click Add to Context.
Type in your question > Hit enter - answers come from free cloud endpoints.

Using the Table Browser (optional)

Upload a DOCX containing financial or structured tables via the Document tab.
Switch to the Table Browser tab to inspect the extracted CALS XML, interactive HTML rendering, and per-cell verification status.
Click Re-annotate with VLM to run Qwen2.5-VL (via a local LM Studio instance at http://localhost:1234/v1) and write bold and indent attributes onto each cell.
Annotated snapshots are persisted to data/table_catalog.json after each table so progress survives interruption.

LM Studio requirement: launch LM Studio with the --no-sandbox flag and load the Qwen2.5-VL-72B model before clicking Re-annotate.

Details for the README Modes

Click to Expand Easy Mode

Clone Project > Start Chat > Create Context > Ask Questions

Steps	What can go wrong	Screen shot
1. Open the Desktop App > Select local.	Probably a Docker Desktop issue (if selected on install). Fix: See troubleshooting here
2. Click Clone Project > Paste repository URL > Clone	Incorrect URL. Fix: use the correct URL.
3. Click Resolve Now > Enter NVIDIA and Tavily API keys.	You don't see the banner. Fix: go to Project Container > Variables > Configure for API keys. See docs here
4. Click Open Chat.	Very little can go wrong here
5. Click Documents > Create Context.	Incorrect API key. Fix per Step 3 above.
6. Type question > Hit enter.	Incorrect API key. Fix per Step 3 above.

Clear Context > Change URLs > Create Context > Ask Questions

Use these steps when you want to work with your own documents and your own prompts.

Steps	What can go wrong	Screen shot
1. Click Documents > Clear Context.	Very little.	Vector DB reset.
2. Delete the URLs > Add your own > Click Add to Context.	URLs that can't be resolved. Fix: Enter appropriate URLs	New context.
3. Type question > Hit enter.	Incorrect API key. Fix: Fix per Step 3 in table above.	Triggers the agent.

Click to Expand Intermediate Mode

Intermediate Mode

See Full Intermediate Mode Instructions Here

This application is a quick prototype and not a robust piece of software. So there are many opportunities to improve it.

Fork this project to your own GitHub account. Then clone it in Workbench
Add VS Code to the project
Create an experiment branch to protect main
Open VS Code from the Desktop App and edit the application code
- Change recursion limit, number of web sites returned by Tavily, whether previous searches are saved
- Add new endpoints from build.nvidia.com
- Change the look and feel of the Gradio app or add new features
- Modify the agent
- Extend the table extraction pipeline in code/chatui/utils/database.py:
  - _load_docx_direct() — DOCX → CALS XML with spans and column widths
  - verify_table() — cross-checks cell values against pdfplumber + Camelot
  - _annotate_entry_styles_with_vlm() — VLM bold/indent detection
  - _cals_to_fop_pdf() / _cals_to_interactive_html() — table rendering
- See agentic-rag-docs/table-validation-approach.md for the full validation design
- Fix any bugs you find

Click to Expand Advanced Mode

Advanced Mode

See Full Advanced Mode Instructions Here.

Use these details if you want to modify the application, e.g. by configuring prompts, adding your own endpoints, changing the Gradio app or whatever else occurs to you.

Set up a Linux box with an NVIDIA GPU and Docker.
Deploy an Ollama container or an NVIDIA NIM on that host.
Configure the chat app to use the self-hosted endpoint.

Self-hosting the VLM for table annotation

Install LM Studio on a machine with a compatible GPU.
Load model Qwen2.5-VL-72B (or any OpenAI-compatible vision model).
Start the local server: ./LM-Studio-*.AppImage --no-sandbox and enable the API server at port 1234.
The app auto-detects the first available model via client.models.list(); no extra configuration required.

License

This NVIDIA AI Workbench example project is under the Apache 2.0 License

This project may utilize additional third-party open source software projects. Review the license terms of these open source projects before use. Third party components used as part of this project are subject to their separate legal notices or terms that accompany the components. You are responsible for confirming compliance with third-party component license terms and requirements.

❓ Have Questions?
Please direct any issues, fixes, suggestions, and discussion on this project to the DevZone Members Only Forum thread here

Other Resources

⬇️ Download AI Workbench | 📖 User Guide |📂 Other Projects | 🚨 User Forum

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
.project		.project
agentic-rag-docs		agentic-rag-docs
code		code
models		models
tools		tools
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
README.md		README.md
apt.txt		apt.txt
compose.yaml		compose.yaml
key		key
postBuild.bash		postBuild.bash
preBuild.bash		preBuild.bash
requirements.txt		requirements.txt
start_app.sh		start_app.sh
variables.env		variables.env

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agentic RAG - Web Search with Accuracy and Hallucination Controls

Boost RAG with an Agentic Layer

Table Extraction and Validation (Extended)

Modify Agentic RAG

Inference Your Way

Get Started

This README has three modes:

Prerequisites - AI Workbench and an Internet Connection

Easy Mode (< 5 minutes if Workbench installed)

Using the Table Browser (optional)

Details for the README Modes

Clone Project > Start Chat > Create Context > Ask Questions

Clear Context > Change URLs > Create Context > Ask Questions

Intermediate Mode

See Full Intermediate Mode Instructions Here

Advanced Mode

See Full Advanced Mode Instructions Here.

Self-hosting the VLM for table annotation

License

Other Resources

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Agentic RAG - Web Search with Accuracy and Hallucination Controls

Boost RAG with an Agentic Layer

Table Extraction and Validation (Extended)

Modify Agentic RAG

Inference Your Way

Get Started

This README has three modes:

Prerequisites - AI Workbench and an Internet Connection

Easy Mode (< 5 minutes if Workbench installed)

Using the Table Browser (optional)

Details for the README Modes

Clone Project > Start Chat > Create Context > Ask Questions

Clear Context > Change URLs > Create Context > Ask Questions

Intermediate Mode

See Full Intermediate Mode Instructions Here

Advanced Mode

See Full Advanced Mode Instructions Here.

Self-hosting the VLM for table annotation

License

Other Resources

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages