Point it at a URL. It explores the app, generates a test plan, runs it, and reports failed scenarios, visual regressions, accessibility violations, and REST API contract findings.
Status: beta, live on PyPI as sentinel-agent (latest 0.1.x). Standalone: zero runtime dependency on any other ThinkNext package. Web functional testing, self-healing, visual regression, WCAG 2.1 AA accessibility, and REST API contract tests all ship today. Mobile (React Native) is planned for a future release.
Install: pip install 'sentinel-agent[anthropic]' (or [claude-code], [openai], [google], [all]). Repo: GitHub. Issues: file one.
Point Sentinel at a URL:
sentinel run https://your-app.com
In one command, the agent:
- Opens the URL in headless Chromium
- Reads the rendered HTML + visible text
- Asks the LLM to generate a focused test plan (2-5 scenarios, 3-8 steps each)
- Runs the plan in fresh browser sessions per scenario
- Captures screenshots and compares against baselines (visual regression)
- Scans each page state for WCAG 2.1 AA violations (axe-core)
- Reports findings: failed scenarios, visual diffs, accessibility issues, with cost
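The plan-size limits in the list above (2-5 scenarios, 3-8 steps each) can be illustrated with a small validation step. This is a minimal sketch, not Sentinel's actual data model: the `Step` and `Scenario` classes, all field names, and the `validate_plan` helper are assumptions made for illustration; only the numeric limits come from this README.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str        # hypothetical action name, e.g. "click" or "assert_text"
    target: str = ""   # hypothetical selector or URL the action applies to


@dataclass
class Scenario:
    name: str
    steps: list[Step] = field(default_factory=list)


@dataclass
class TestPlan:
    scenarios: list[Scenario] = field(default_factory=list)


def validate_plan(plan: TestPlan) -> list[str]:
    """Return a list of problems; an empty list means the plan fits the limits."""
    problems = []
    if not 2 <= len(plan.scenarios) <= 5:
        problems.append(f"expected 2-5 scenarios, got {len(plan.scenarios)}")
    for scenario in plan.scenarios:
        if not 3 <= len(scenario.steps) <= 8:
            problems.append(
                f"{scenario.name!r}: expected 3-8 steps, got {len(scenario.steps)}"
            )
    return problems
```

A check like this would let a runner reject an oversized or empty plan before opening any browser session, rather than discovering the problem mid-run.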
The same teams that need Cascade (meeting-to-PR) and Relay (issue-to-PR) need a way to verify that the PRs those agents produce actually work. Hand-writing Playwright tests for every feature is the bottleneck. Sentinel removes the bottleneck: generate tests with the same LLM that writes the code.
Sentinel is fully standalone. It carries its own LLM-client layer and config so it does not depend on any other ThinkNext package at runtime.
# Core install + the LLM provider you want:
pip install 'sentinel-agent[anthropic]' # Anthropic Claude
pip install 'sentinel-agent[openai]' # OpenAI
pip install 'sentinel-agent[google]' # Google Gemini
pip install 'sentinel-agent[claude-code]' # Local Claude Code subscription, no API key
pip install 'sentinel-agent[all]' # All providers
# One-time: install the Chromium binary Playwright needs
playwright install chromium
# Set up an LLM provider. Credentials live at ~/.config/sentinel/config.yaml.
sentinel configure llm anthropic --key sk-ant-xxx --set-default
# Or, if you have Claude Code installed locally (no API key needed):
sentinel configure llm claude_code --set-default
If you want a project-local config (highly recommended; it lets you set viewport, baseline directory, and accessibility thresholds):
sentinel init
This scaffolds sentinel.yaml with sensible defaults you can edit.
sentinel run https://cascadeagent.dev
# Output (truncated):
# ✓ 3/3 scenarios passed, 0 visual diff(s), 2 a11y violation(s)
#
# ✓ Homepage loads and primary CTA is visible (1.42s)
# ✓ Get-started link navigates to /getting-started/ (1.83s)
# ✓ Docs sidebar contains all expected sections (2.10s)
#
# Accessibility violations:
# [moderate] color-contrast: Elements must meet minimum color contrast...
# sample: .text-slate-500
# (3 node(s) affected)
# [minor] image-alt: Images must have alt text...
# sample: img.hero-illustration
# (1 node(s) affected)
#
# cost: $0.04 (5,210 in / 980 out tokens)
| Capability | Module |
|---|---|
| Web testing via Playwright | sentinel.browser, sentinel.runner |
| LLM-driven test plan generation | sentinel.planner |
| Self-healing tests (LLM re-plan on failed step + retry once) | sentinel.planner.regenerate_step |
| Multi-page exploration (up to 4 same-origin links) | sentinel.agent |
| Visual regression (PIL pixel diff) | sentinel.visual |
| Accessibility scan (axe-core 4.10, WCAG 2.1 AA) | sentinel.a11y |
| REST API contract testing (OpenAPI + URL-probe modes) | sentinel.api_* |
| Multi-LLM (Anthropic / OpenAI / Google / Claude Code / Ollama) | sentinel.llm |
| Mobile (React Native via Detox) | planned for a future release |
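The multi-page exploration row above limits the agent to 4 same-origin links. One way such a filter could work is sketched below, using only the standard library. The function names are hypothetical and this is not the code in `sentinel.agent`; it also compares scheme and host literally, so an explicit default port (`:443`) would count as a different origin in this simplified version.

```python
from urllib.parse import urljoin, urlsplit

MAX_LINKS = 4  # limit taken from the capability table


def origin(url: str) -> tuple[str, str]:
    """Scheme + host[:port], lowercased, as a simple stand-in for a web origin."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower())


def pick_same_origin_links(page_url: str, hrefs: list[str], limit: int = MAX_LINKS) -> list[str]:
    """Resolve hrefs against page_url and keep the first `limit` same-origin pages."""
    base = origin(page_url)
    current = page_url.split("#", 1)[0]
    seen = set()
    picked = []
    for href in hrefs:
        if len(picked) >= limit:
            break
        # Resolve relative links and drop the fragment so "#top" is not a new page.
        absolute = urljoin(page_url, href).split("#", 1)[0]
        if origin(absolute) != base:
            continue  # external site, mailto:, etc.
        if absolute == current or absolute in seen:
            continue  # the page itself or a duplicate
        seen.add(absolute)
        picked.append(absolute)
    return picked
```

For example, given the page `https://example.com/` and the hrefs `/docs`, `https://other.com/x`, and `#top`, only `https://example.com/docs` would be kept.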
| | Playwright Codegen | Pytest + Playwright | Percy / Chromatic | Sentinel |
|---|---|---|---|---|
| Generates tests from a URL | partial (record/replay) | ❌ | ❌ | ✅ |
| Self-hosted | ✅ | ✅ | ❌ | ✅ |
| Bring your own LLM | n/a | n/a | n/a | ✅ |
| Visual regression | ❌ | ❌ | ✅ | ✅ |
| Accessibility scan | ❌ | partial (plugin) | ❌ | ✅ |
| Open source | ✅ | ✅ | ❌ | ✅ |
Sentinel is for teams who want test coverage without spending the engineering hours to author it. The trade-off is that AI-generated tests have failure modes hand-written tests do not: for example, the LLM may pick a fragile selector. The self-healing path addresses this: when a step fails, the runner sends the failure context to the LLM, asks for a more specific selector, and retries the step once.
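The retry-once behaviour described above can be sketched as a small wrapper around step execution. This is an illustration under assumptions, not the runner's real code: `StepFailed`, `run_with_self_heal`, the dict-shaped step, and the `execute` / `regenerate` callables are all hypothetical stand-ins (the README only names `sentinel.planner.regenerate_step`, whose actual signature is not shown here).

```python
class StepFailed(Exception):
    """Raised by the (hypothetical) step executor when an action fails."""


def run_with_self_heal(step: dict, execute, regenerate) -> dict:
    """Run a step; on failure, ask for a repaired step and retry exactly once.

    execute(step)            -> None, raises StepFailed on failure
    regenerate(step, error)  -> a new step dict (stand-in for the LLM re-plan)
    """
    try:
        execute(step)
        return {"status": "passed", "healed": False, "step": step}
    except StepFailed as first_error:
        # Pass the failure context along so the re-plan can pick a better selector.
        repaired = regenerate(step, str(first_error))

    try:
        execute(repaired)
        return {"status": "passed", "healed": True, "step": repaired}
    except StepFailed as second_error:
        # No second repair attempt: the step is reported as failed.
        return {
            "status": "failed",
            "healed": False,
            "step": repaired,
            "error": str(second_error),
        }
```

Capping the loop at a single retry keeps LLM cost bounded per step and avoids masking a genuinely broken page behind repeated repairs.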
sentinel.yaml (after sentinel init):
version: 1
agent:
  provider: anthropic
  model: claude-opus-4-7
  temperature: 0.2
browser:
  headless: true
  viewport_width: 1280
  viewport_height: 720
  timeout_ms: 30000
visual:
  enabled: true
  baseline_dir: sentinel-baselines
  diff_threshold_percent: 0.5
a11y:
  enabled: true
  fail_on:
    - critical
    - serious
sentinel run <url>
       │
       ▼
┌──────────────┐
│ explore page │  Playwright opens URL, grabs HTML + visible text
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ planner      │  LLM produces TestPlan (2-5 scenarios, 3-8 steps each)
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ runner       │  Fresh browser session per scenario
│              │  Each step is one Playwright action
│              │  screenshot steps → visual regression check
│              │  a11y_scan steps → axe-core injection
└──────┬───────┘
       │
       ▼
┌────────────────┐
│ SentinelReport │  Scenarios + visual diffs + a11y violations + cost
└────────────────┘
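To make the last stage concrete, here is a rough sketch of how a report might be checked against the `diff_threshold_percent` and `fail_on` settings from sentinel.yaml. The `Violation` and `Report` classes, their fields, and `failure_reasons` are hypothetical and do not reflect the real `SentinelReport` structure; treating a diff as failing only when it is strictly greater than the threshold is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Violation:
    rule_id: str   # e.g. "color-contrast"
    impact: str    # axe-core impact: "minor", "moderate", "serious", "critical"


@dataclass
class Report:
    scenarios_failed: int = 0
    visual_diff_percents: list[float] = field(default_factory=list)
    violations: list[Violation] = field(default_factory=list)


def failure_reasons(
    report: Report,
    diff_threshold_percent: float = 0.5,
    fail_on: tuple[str, ...] = ("critical", "serious"),
) -> list[str]:
    """Return the reasons a run should be treated as failed (empty list = pass)."""
    reasons = []
    if report.scenarios_failed:
        reasons.append(f"{report.scenarios_failed} scenario(s) failed")

    over = [p for p in report.visual_diff_percents if p > diff_threshold_percent]
    if over:
        reasons.append(
            f"{len(over)} screenshot(s) differ by more than {diff_threshold_percent}%"
        )

    blocking = [v for v in report.violations if v.impact in fail_on]
    if blocking:
        reasons.append(f"{len(blocking)} blocking a11y violation(s)")
    return reasons
```

Under these defaults, a run with only moderate and minor accessibility findings, like the sample output earlier in this README, would produce no failure reasons.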
| Version | Status | Highlights |
|---|---|---|
| 0.1.0a1 → 0.1.0a3 | Shipped 2026-05-26 | Web testing via Playwright, visual regression (PIL), WCAG 2.1 AA scan via axe-core, multi-page exploration, self-healing tests, REST API contract testing (OpenAPI + URL-probe modes) |
| 0.1.0 | Shipped 2026-05-26 | Standalone release: vendored own LLM client + config layer, zero runtime dependency on any other ThinkNext package. Per-provider install extras |
| 0.1.1 → 0.1.8 | Shipped 2026-05-26 | Eight dogfooding-driven patches against a real Next.js 15 app: asyncio fix for self-heal inside Playwright; wait_for_url event listener for SPA navigation (the polling approach never saw the URL update); regex support in assert_url and assert_text; url= routing for wait_for; glob-to-regex escape correctness; longer reasoning length for repair envelopes |
| v0.2 | Planned Q4 2026 | CI integration (GitHub Actions / GitLab CI / Bitbucket Pipelines / Azure Pipelines), parallel scenario execution, storage-state seeding for authenticated flows |
| v0.3 | Planned Q1 2027 | Mobile (React Native via Detox or Maestro), cross-browser (Firefox / WebKit), test-history dashboard |
| v1.0 | Planned mid-2027 | Stable API, full coverage of web + API + mobile + visual + a11y, baselined against real-world OSS apps |
Roadmap is directional. The 0.1.1 → 0.1.8 series is a strong signal that the LLM-prompt / runner contract is still being discovered in the wild; we expect more patches as Sentinel meets non-Next.js frameworks, auth-gated flows, iframes, and complex multi-step forms. File issues against the current 0.1.x line.
MIT. See LICENSE.
Built and maintained by ThinkNext Software Solutions, alongside our other open-source projects Cascade (meeting-to-PR) and Relay (issue-to-PR).
Follow along: @ThinkNextHQ · LinkedIn · Blog