🐢 Open-Source Evaluation & Testing for AI & LLM systems
Evaluation and Tracking for LLM Experiments
A single interface to use and evaluate different agent frameworks
🤖 A curated list of resources for testing AI agents - frameworks, methodologies, benchmarks, tools, and best practices for ensuring reliable, safe, and effective autonomous AI systems
Visual dashboard to evaluate multi-agent & RAG-based AI apps. Compare models on accuracy, latency, token usage, and trust metrics - powered by NVIDIA AgentIQ
Train a reinforcement learning agent with PPO to balance a pole on a cart in the CartPole-v0 environment, built with Gymnasium and Stable-Baselines3. Includes model training, evaluation, and rendering in Python and Jupyter Notebook; a minimal sketch of that workflow follows below.
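The sketch below shows one way such a training-evaluation-rendering loop could look with Gymnasium and Stable-Baselines3; the repository's actual notebook, hyperparameters, and timestep budget may differ.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Create the CartPole environment
env = gym.make("CartPole-v0")

# Train a PPO agent with a simple MLP policy (timestep budget is illustrative)
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)

# Evaluate the trained policy over several episodes
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"Mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")

# Render a short rollout with the trained agent
render_env = gym.make("CartPole-v0", render_mode="human")
obs, _ = render_env.reset()
for _ in range(500):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = render_env.step(action)
    if terminated or truncated:
        obs, _ = render_env.reset()
render_env.close()
```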