
# 🤖 Why AI Agents Fail (And How to Fix Them)


Research-backed solutions to the three critical failure modes that break AI agents in production: hallucinations, timeouts, and memory loss.

⭐ Star this repository


## 🎯 Learning Path: Understand → Prevent → Scale

This repository demonstrates research-backed techniques for preventing AI agent failures with working code examples.

| 🚨 Failure Mode | 💡 Solution Approach | 📊 Projects | ⏱️ Total Time |
|---|---|---|---|
| Hallucinations | Detection and mitigation through 4 techniques | 4 demos | 2 hours |
| Timeouts | Context management and async patterns | Coming soon | - |
| Memory Loss | Persistent memory and context retrieval | Coming soon | - |

## 🎭 Stop AI Agent Hallucinations

The Problem: Agents fabricate statistics, choose wrong tools, ignore business rules, and claim success when operations fail.

The Solution: 4 research-backed techniques that detect, contain, and mitigate hallucinations before they cause damage.

### 📓 Hallucination Prevention Demos

| 📓 Demo | 🎯 Focus & Key Learning | ⏱️ Time | 📊 Level |
|---|---|---|---|
| 01 - Graph-RAG vs Traditional RAG | Structured data retrieval: compare RAG vs Graph-RAG on 300 hotel FAQs, Neo4j knowledge graph with automatic entity extraction, eliminate statistical hallucinations | 30 min | Intermediate |
| 02 - Semantic Tool Selection | Intelligent tool filtering: filter 31 tools down to the top 3 relevant ones, reduce errors and token costs, dynamic tool swapping | 45 min | Intermediate |
| 03 - Multi-Agent Validation Pattern | Cross-validation workflows: an Executor → Validator → Critic pattern catches hallucinations, Strands Swarm orchestration | 30 min | Intermediate |
| 04 - Neurosymbolic Guardrails for AI Agents | Symbolic validation: compare prompt engineering vs symbolic rules, business rule compliance the LLM cannot bypass | 20 min | Intermediate |
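The tool-filtering idea from Demo 02 can be sketched without any of the demo's dependencies: embed each tool description and the query, then expose only the top-k most similar tools to the agent. The toy vectors and tool names below are invented for illustration; the actual demo uses SentenceTransformers embeddings and FAISS over 31 tools.

```python
import math

# Hypothetical toy embeddings standing in for SentenceTransformers vectors.
TOOL_EMBEDDINGS = {
    "book_room":      [0.9, 0.1, 0.0],
    "cancel_booking": [0.8, 0.2, 0.1],
    "get_weather":    [0.0, 0.1, 0.9],
    "send_invoice":   [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_tools(query_embedding, k=3):
    """Return the k tools most similar to the query; the rest stay hidden from the agent."""
    ranked = sorted(TOOL_EMBEDDINGS,
                    key=lambda name: cosine(query_embedding, TOOL_EMBEDDINGS[name]),
                    reverse=True)
    return ranked[:k]

# A booking-flavored query vector surfaces the booking tools first.
print(select_tools([0.85, 0.15, 0.05], k=2))
```

Restricting the agent to a short, relevant tool list is what cuts both wrong-tool hallucinations and per-query token cost.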

### 📊 Key Results

| 🎯 Technique | 📈 Improvement | 🔍 Metric |
|---|---|---|
| Graph-RAG | Accuracy | Precise queries on 300 hotel FAQs via knowledge graph |
| Semantic Tool Selection | Reduced errors and token costs | Tool selection hallucination detection (research validated), token cost per query |
| Neurosymbolic Rules | Compliance | Business rule enforcement the LLM cannot bypass |
| Multi-Agent Validation | Error detection | Invalid operation detection before reaching users |
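The multi-agent validation result can be sketched with plain functions standing in for Strands Swarm agents: the Executor's claimed result must pass a Validator check and a Critic gate before anything reaches the user. The refund scenario and the limit are invented for illustration, not taken from the demo.

```python
# Minimal Executor -> Validator -> Critic sketch (illustrative, not the demo's API).

def executor(task):
    # Pretend the agent "executed" the task and produced a claimed result.
    return {"task": task, "result": "refund issued", "amount": 120}

def validator(output):
    # Cross-check the claimed operation against a hard constraint.
    if output["amount"] > 100:
        return {"valid": False, "reason": "amount exceeds refund limit"}
    return {"valid": True, "reason": ""}

def critic(output, verdict):
    # Final gate: only validated results are released to the user.
    if not verdict["valid"]:
        return f"BLOCKED: {verdict['reason']}"
    return output["result"]

out = executor("refund order #42")
print(critic(out, validator(out)))  # the invalid refund never reaches the user
```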

→ Explore hallucination prevention demos
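In the same spirit, the neurosymbolic guardrail of Demo 04 reduces to a small idea: the agent's proposed action is checked against hard symbolic rules that no prompt wording can talk around. The rule set and action format below are illustrative assumptions, not the demo's actual API.

```python
# Symbolic business rules as (name, predicate) pairs; hypothetical examples.
RULES = [
    ("discount <= 20", lambda a: a.get("discount", 0) <= 20),
    ("checkout not before checkin", lambda a: a.get("checkout", 0) >= a.get("checkin", 0)),
]

def enforce(action):
    """Return (allowed, violated_rule_names). Runs outside the LLM,
    so no amount of prompt engineering can bypass it."""
    violated = [name for name, check in RULES if not check(action)]
    return (len(violated) == 0, violated)

# An over-generous discount is rejected regardless of how it was phrased.
print(enforce({"discount": 35, "checkin": 3, "checkout": 5}))
```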


## Why Your Agent Times Out

(Coming soon)


## Your Agent Doesn't Remember You

(Coming soon)


## 🔧 Technologies Used

| 🔧 Technology | 🎯 Purpose | ⚡ Key Capabilities |
|---|---|---|
| Strands Agents | AI agent framework | Dynamic tool swapping, multi-agent orchestration, conversation memory, hooks system |
| Amazon Bedrock | LLM access | Claude 3 Haiku/Sonnet for agent reasoning and tool calling |
| Neo4j | Graph database | Relationship-aware queries, precise aggregations, multi-hop traversal |
| FAISS | Vector search | Semantic similarity, tool filtering, efficient nearest-neighbor search |
| SentenceTransformers | Embeddings | Text embeddings for semantic tool selection and memory retrieval |
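To see why a graph store earns its place in this stack, here is a toy adjacency-list "knowledge graph" (hotel names and amenities are made up): aggregate questions are answered by exact edge traversal rather than fuzzy text matching, which is the property Neo4j provides at scale.

```python
# Edges keyed by (source node, relationship), values are target nodes.
GRAPH = {
    ("Hotel A", "HAS_AMENITY"): ["pool", "gym"],
    ("Hotel B", "HAS_AMENITY"): ["gym"],
    ("Hotel C", "HAS_AMENITY"): ["pool", "spa"],
}

def hotels_with(amenity):
    """Exact traversal: follow HAS_AMENITY edges instead of matching FAQ text."""
    return sorted(hotel for (hotel, rel), targets in GRAPH.items()
                  if rel == "HAS_AMENITY" and amenity in targets)

print(hotels_with("pool"))      # exact answer set, no statistical guessing
print(len(hotels_with("gym")))  # precise aggregation
```

A vector store would retrieve FAQ passages that *mention* pools; the graph answers "which hotels have a pool" exactly, which is why the Graph-RAG demo eliminates statistical hallucinations on counting-style questions.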

## Prerequisites

Before You Begin:

- Python 3.9+ installed locally
- LLM access: OpenAI (default), AWS Bedrock, Anthropic, or Ollama
- `OPENAI_API_KEY` environment variable (for the default setup)
- AWS CLI configured if using Bedrock (`aws configure`)
- Basic understanding of AI agents and tool calling

Model Configuration: All demos use OpenAI with GPT-4o-mini by default. You can swap to any provider supported by Strands; see Strands Model Providers for configuration.

AWS Credentials Setup (if using Bedrock): Follow the AWS credentials configuration guide to configure your environment.


## 🚀 Quick Start Guide

### 1. Clone Repository

```bash
git clone https://github.com/aws-samples/sample-why-agents-fail
cd sample-why-agents-fail
```

### 2. Start with Hallucinations

```bash
cd stop-ai-agent-hallucinations
```

### 3. Explore All Techniques

Each demo folder contains detailed README files and working code examples.


## 💰 Cost Estimation

| 💰 Service | 💵 Approximate Cost | 📊 Usage Pattern | 🔗 Pricing Link |
|---|---|---|---|
| OpenAI GPT-4o-mini | ~$0.15 per 1M input tokens | Agent reasoning and tool calling | OpenAI Pricing |
| Amazon Bedrock (Claude) | ~$0.25 per 1M input tokens | Alternative LLM provider | Bedrock Pricing |
| Neo4j (local) | Free | Graph database for demos | Neo4j Community |
| FAISS (local) | Free | Vector search library | Open source |
| SentenceTransformers | Free | Local embeddings | Open source |

💡 All demos can run locally at minimal cost. OpenAI GPT-4o-mini is the most cost-effective option for testing.
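For a back-of-envelope check against the per-million-token prices above (the token counts below are an assumption about a typical demo run, not a measurement):

```python
# USD per 1M input tokens, taken from the cost table above.
PRICE_PER_M_INPUT = {"gpt-4o-mini": 0.15, "bedrock-claude": 0.25}

def input_cost(model, tokens):
    """Approximate input-token cost in USD for a run consuming `tokens` tokens."""
    return tokens / 1_000_000 * PRICE_PER_M_INPUT[model]

# e.g. a hypothetical demo pass consuming ~200k input tokens:
print(round(input_cost("gpt-4o-mini", 200_000), 4))  # about $0.03
```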


## 📖 Additional Learning Resources


⭐ Star this repository • 📖 Start Learning


## 🤝 Contributing

Contributions are welcome! See CONTRIBUTING for more information.


## 📄 License

This library is licensed under the MIT-0 License. See the LICENSE file for details.
