Sentinel

Point it at a URL. It explores the app, generates a test plan, runs it, and reports failed scenarios, visual regressions, accessibility violations, and REST API contract findings.

Status: beta, live on PyPI as sentinel-agent (latest 0.1.x). Standalone: zero runtime dependency on any other ThinkNext package. Web functional testing + self-healing + visual regression + WCAG 2.1 AA accessibility + REST API contract tests all ship today. Mobile (React Native) is planned for a future release.

Install: pip install 'sentinel-agent[anthropic]' (or [claude-code], [openai], [google], [all]). Repo: GitHub. Issues: file one.

What it does

Point Sentinel at a URL:

sentinel run https://your-app.com

In one command, the agent:

1. Opens the URL in headless Chromium
2. Reads the rendered HTML + visible text
3. Asks the LLM to generate a focused test plan (2-5 scenarios, 3-8 steps each)
4. Runs the plan in fresh browser sessions per scenario
5. Captures screenshots and compares against baselines (visual regression)
6. Scans each page state for WCAG 2.1 AA violations (axe-core)
7. Reports findings: failed scenarios, visual diffs, accessibility issues, with cost

Why this exists

The same teams that need Cascade (meeting-to-PR) and Relay (issue-to-PR) need a way to verify that the PRs those agents produce actually work. Hand-writing Playwright tests for every feature is the bottleneck. Sentinel removes the bottleneck: generate tests with the same LLM that writes the code.

Sentinel is fully standalone. It carries its own LLM-client layer and config so it does not depend on any other ThinkNext package at runtime.

Install

# Core install + the LLM provider you want:
pip install 'sentinel-agent[anthropic]'        # Anthropic Claude
pip install 'sentinel-agent[openai]'           # OpenAI
pip install 'sentinel-agent[google]'           # Google Gemini
pip install 'sentinel-agent[claude-code]'      # Local Claude Code subscription, no API key
pip install 'sentinel-agent[all]'              # All providers

# One-time: install the Chromium binary Playwright needs
playwright install chromium

Configure

# Set up an LLM provider. Credentials live at ~/.config/sentinel/config.yaml.
sentinel configure llm anthropic --key sk-ant-xxx --set-default

# Or, if you have Claude Code installed locally (no API key needed):
sentinel configure llm claude_code --set-default

For a project-local config (recommended: it lets you set the viewport, baseline directory, and accessibility thresholds), run:

sentinel init

This scaffolds sentinel.yaml with sensible defaults you can edit.

Run

sentinel run https://cascadeagent.dev

# Output (truncated):
#   ✓  3/3 scenarios passed, 0 visual diff(s), 2 a11y violation(s)
#
#   ✓  Homepage loads and primary CTA is visible  (1.42s)
#   ✓  Get-started link navigates to /getting-started/  (1.83s)
#   ✓  Docs sidebar contains all expected sections  (2.10s)
#
#   Accessibility violations:
#     [moderate] color-contrast: Elements must meet minimum color contrast...
#       sample: .text-slate-500
#       (3 node(s) affected)
#     [minor] image-alt: Images must have alt text...
#       sample: img.hero-illustration
#       (1 node(s) affected)
#
#   cost:    $0.04 (5,210 in / 980 out tokens)
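For CI scripting, the summary line in the output above can be turned into numbers. This is a minimal sketch, assuming the summary wording stays exactly as shown in this README (a beta tool may change it). `parse_summary` and `RunSummary` are hypothetical helpers for illustration, not part of Sentinel.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern for the summary line as printed in the sample output above.
# Assumption: the wording is stable; adjust it if your Sentinel version differs.
SUMMARY_RE = re.compile(
    r"(?P<passed>\d+)/(?P<total>\d+) scenarios passed, "
    r"(?P<visual>\d+) visual diff\(s\), "
    r"(?P<a11y>\d+) a11y violation\(s\)"
)


@dataclass
class RunSummary:
    passed: int
    total: int
    visual_diffs: int
    a11y_violations: int

    @property
    def all_passed(self) -> bool:
        return self.passed == self.total


def parse_summary(output: str) -> Optional[RunSummary]:
    """Return the first summary found in `output`, or None if there is none."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return RunSummary(
        passed=int(match["passed"]),
        total=int(match["total"]),
        visual_diffs=int(match["visual"]),
        a11y_violations=int(match["a11y"]),
    )


sample = "✓  3/3 scenarios passed, 0 visual diff(s), 2 a11y violation(s)"
summary = parse_summary(sample)
print(summary)
```

A CI job could then fail the build when `summary.all_passed` is false or when `summary.visual_diffs` is non-zero, depending on team policy.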

What ships in v0.1.0

Capability	Module
Web testing via Playwright	`sentinel.browser`, `sentinel.runner`
LLM-driven test plan generation	`sentinel.planner`
Self-healing tests (LLM re-plan on failed step + retry once)	`sentinel.planner.regenerate_step`
Multi-page exploration (up to 4 same-origin links)	`sentinel.agent`
Visual regression (PIL pixel diff)	`sentinel.visual`
Accessibility scan (axe-core 4.10, WCAG 2.1 AA)	`sentinel.a11y`
REST API contract testing (OpenAPI + URL-probe modes)	`sentinel.api_*`
Multi-LLM (Anthropic / OpenAI / Google / Claude Code / Ollama)	`sentinel.llm`
Mobile (React Native via Detox)	planned for a future release
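The "up to 4 same-origin links" rule in the table can be illustrated with the standard library alone. This is a sketch of one plausible way to pick links, not Sentinel's actual code: `pick_same_origin_links` and `origin` are hypothetical names, and the deduplication and fragment handling are assumptions.

```python
from urllib.parse import urljoin, urlsplit

MAX_LINKS = 4  # the table above states "up to 4 same-origin links"
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Scheme, host, and port together define a web origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def pick_same_origin_links(base_url: str, hrefs: list, limit: int = MAX_LINKS) -> list:
    """Resolve hrefs against base_url; keep the first `limit` unique same-origin pages."""
    base_origin = origin(base_url)
    seen = {base_url.split("#")[0]}  # never revisit the start page
    picked = []
    for href in hrefs:
        absolute = urljoin(base_url, href).split("#")[0]  # drop fragments
        if origin(absolute) != base_origin or absolute in seen:
            continue
        seen.add(absolute)
        picked.append(absolute)
        if len(picked) == limit:
            break
    return picked


links = pick_same_origin_links(
    "https://example.com/",
    ["/docs", "https://example.com/docs#intro", "https://other.com/x",
     "mailto:a@b.c", "/a", "/b", "/c", "/d"],
)
print(links)
```

Cross-origin URLs and non-HTTP schemes such as `mailto:` fall out naturally because their origin tuple differs from the start page's.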

How it differs from existing tools

Feature	Playwright Codegen	Pytest + Playwright	Percy / Chromatic	Sentinel
Generates tests from a URL	partial (record/replay)	❌	❌	✅
Self-hosted	✅	✅	❌	✅
Bring your own LLM	n/a	n/a	n/a	✅
Visual regression	❌	❌	✅	✅
Accessibility scan	❌	partial (plugin)	❌	✅
Open source	✅	✅	❌	✅

Sentinel is for teams who want test coverage without spending the engineering hours to author it. The trade-off is that AI-generated tests have failure modes hand-written tests do not (e.g. the LLM picks a fragile selector). Self-healing addresses this: when a step fails, the runner sends the failure context to the LLM, asks for a more specific selector, and retries once.
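The retry-once behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not Sentinel's implementation: `run_with_self_heal`, `StepFailed`, the step dict shape, and the two injected callables are all hypothetical, and `regenerate` merely stands in for whatever the planner does when asked to repair a step.

```python
from typing import Callable, Optional


class StepFailed(Exception):
    """Raised by the (hypothetical) step executor when an action fails."""


def run_with_self_heal(
    step: dict,
    execute: Callable[[dict], None],
    regenerate: Callable[[dict, str], Optional[dict]],
) -> dict:
    """Run `step`; on failure ask for a repaired step and retry exactly once."""
    try:
        execute(step)
        return {"status": "passed", "healed": False, "step": step}
    except StepFailed as first_error:
        # Hand the failure context to the repair function (an LLM call in practice).
        repaired = regenerate(step, str(first_error))
        if repaired is None:
            return {"status": "failed", "healed": False, "error": str(first_error)}
        try:
            execute(repaired)
            return {"status": "passed", "healed": True, "step": repaired}
        except StepFailed as second_error:
            # One retry only: a second failure is reported, not repaired again.
            return {"status": "failed", "healed": False, "error": str(second_error)}


# Fake executor and repair function, for demonstration only.
def fake_execute(step: dict) -> None:
    if step["selector"] == "button":
        raise StepFailed("selector 'button' matched 3 elements")


def fake_regenerate(step: dict, failure_context: str) -> dict:
    return {**step, "selector": "button[data-testid='signup']"}


result = run_with_self_heal(
    {"action": "click", "selector": "button"}, fake_execute, fake_regenerate
)
print(result["status"], result["healed"])
```

Capping the loop at one retry keeps cost and run time bounded: a step that fails twice is more likely a real defect than a selector problem.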

Configuration

sentinel.yaml (after sentinel init):

version: 1

agent:
  provider: anthropic
  model: claude-opus-4-7
  temperature: 0.2

browser:
  headless: true
  viewport_width: 1280
  viewport_height: 720
  timeout_ms: 30000

visual:
  enabled: true
  baseline_dir: sentinel-baselines
  diff_threshold_percent: 0.5

a11y:
  enabled: true
  fail_on:
    - critical
    - serious
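To make the two thresholds concrete, here is a sketch of how `a11y.fail_on` and `visual.diff_threshold_percent` might gate a run. It assumes that violations carry an axe-core style `impact` field and that a visual diff fails when the share of changed pixels exceeds the threshold. Both functions are hypothetical, and the config is written as a plain dict mirroring the YAML so that no YAML parser is needed.

```python
# Mirrors the relevant keys of the sentinel.yaml shown above.
CONFIG = {
    "visual": {"enabled": True, "diff_threshold_percent": 0.5},
    "a11y": {"enabled": True, "fail_on": ["critical", "serious"]},
}


def blocking_violations(violations: list, config: dict) -> list:
    """Return only the violations whose impact level is listed in a11y.fail_on."""
    a11y = config.get("a11y", {})
    if not a11y.get("enabled", False):
        return []
    fail_on = set(a11y.get("fail_on", []))
    return [v for v in violations if v.get("impact") in fail_on]


def visual_diff_fails(changed_pixels: int, total_pixels: int, config: dict) -> bool:
    """Assumed semantics: fail when the changed-pixel percentage exceeds the threshold."""
    visual = config.get("visual", {})
    if not visual.get("enabled", False) or total_pixels == 0:
        return False
    percent_changed = 100.0 * changed_pixels / total_pixels
    return percent_changed > visual["diff_threshold_percent"]


# The two violations from the sample run output earlier in this README.
sample_violations = [
    {"id": "color-contrast", "impact": "moderate"},
    {"id": "image-alt", "impact": "minor"},
]
print(blocking_violations(sample_violations, CONFIG))
print(visual_diff_fails(10_000, 1280 * 720, CONFIG))
```

Under these assumptions, the sample run would not fail on accessibility, since neither `moderate` nor `minor` appears in `fail_on`.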

Architecture

   sentinel run <url>
          │
          ▼
   ┌──────────────┐
   │ explore page │  Playwright opens URL, grabs HTML + visible text
   └──────┬───────┘
          │
          ▼
   ┌──────────────┐
   │   planner    │  LLM produces TestPlan (2-5 scenarios, 3-8 steps each)
   └──────┬───────┘
          │
          ▼
   ┌──────────────┐
   │    runner    │  Fresh browser session per scenario
   │              │  Each step is one Playwright action
   │              │  screenshot steps → visual regression check
   │              │  a11y_scan steps → axe-core injection
   └──────┬───────┘
          │
          ▼
   ┌────────────────┐
   │ SentinelReport │  Scenarios + visual diffs + a11y violations + cost
   └────────────────┘
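The four stages in the diagram can be sketched as a small pipeline with injected stage functions. Everything here is hypothetical scaffolding for illustration: `Plan` stands in for the TestPlan named in the diagram, `Report` for SentinelReport, and the stub lambdas replace the real browser and LLM calls. Only the size limits (2-5 scenarios, 3-8 steps each) come from this README.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scenario:
    name: str
    steps: List[dict]


@dataclass
class Plan:
    scenarios: List[Scenario]

    def validate(self) -> None:
        """Enforce the plan shape stated above: 2-5 scenarios, 3-8 steps each."""
        if not 2 <= len(self.scenarios) <= 5:
            raise ValueError(f"expected 2-5 scenarios, got {len(self.scenarios)}")
        for scenario in self.scenarios:
            if not 3 <= len(scenario.steps) <= 8:
                raise ValueError(
                    f"{scenario.name!r}: expected 3-8 steps, got {len(scenario.steps)}"
                )


@dataclass
class Report:
    results: List[dict] = field(default_factory=list)

    @property
    def passed(self) -> int:
        return sum(1 for r in self.results if r["passed"])


def run_pipeline(
    url: str,
    explore: Callable[[str], str],
    make_plan: Callable[[str], Plan],
    run_scenario: Callable[[Scenario], bool],
) -> Report:
    page = explore(url)              # stage 1: HTML + visible text
    plan = make_plan(page)           # stage 2: LLM-produced plan
    plan.validate()
    report = Report()
    for scenario in plan.scenarios:  # stage 3: one fresh session per scenario
        report.results.append(
            {"name": scenario.name, "passed": run_scenario(scenario)}
        )
    return report                    # stage 4: aggregated findings


# Stubs in place of Playwright and the LLM, for demonstration only.
three_steps = [{"action": "goto"}, {"action": "click"}, {"action": "assert_text"}]
demo_plan = Plan([Scenario("homepage loads", three_steps),
                  Scenario("docs link works", three_steps)])
report = run_pipeline(
    "https://example.com",
    explore=lambda url: "<html>stub</html>",
    make_plan=lambda page: demo_plan,
    run_scenario=lambda scenario: True,
)
print(f"{report.passed}/{len(report.results)} scenarios passed")
```

Passing the stages in as callables keeps the sketch testable without a browser, and it mirrors the diagram's separation between exploring, planning, and running.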

Roadmap

Version	Status	Highlights
0.1.0a1 → 0.1.0a3	Shipped 2026-05-26	Web testing via Playwright, visual regression (PIL), WCAG 2.1 AA scan via axe-core, multi-page exploration, self-healing tests, REST API contract testing (OpenAPI + URL-probe modes)
0.1.0	Shipped 2026-05-26	Standalone release: vendored own LLM client + config layer, zero runtime dependency on any other ThinkNext package. Per-provider install extras
0.1.1 → 0.1.8	Shipped 2026-05-26	Eight dogfooding-driven patches against a real Next.js 15 app: asyncio fix for self-heal inside Playwright; `wait_for_url` event listener for SPA navigation (the polling approach never saw the URL update); regex support in `assert_url` and `assert_text`; `url=` routing for `wait_for`; glob-to-regex escape correctness; longer reasoning length for repair envelopes
v0.2	Planned Q4 2026	CI integration (GitHub Actions / GitLab CI / Bitbucket Pipelines / Azure Pipelines), parallel scenario execution, storage-state seeding for authenticated flows
v0.3	Planned Q1 2027	Mobile (React Native via Detox or Maestro), cross-browser (Firefox / WebKit), test-history dashboard
v1.0	Planned mid-2027	Stable API, full coverage of web + API + mobile + visual + a11y, baselined against real-world OSS apps

Roadmap is directional. The 0.1.1 → 0.1.8 series is a strong signal that the LLM-prompt / runner contract is still being discovered in the wild; we expect more patches as Sentinel meets non-Next.js frameworks, auth-gated flows, iframes, and complex multi-step forms. File issues against the current 0.1.x line.
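One of the 0.1.x patches listed above concerns glob-to-regex escape correctness. The sketch below shows why escaping matters when a URL glob is converted to a regex: an unescaped `.` would match any character. `glob_to_regex` is a hypothetical helper, and the wildcard semantics (`*` stays within one path segment, `**` crosses segments) are an assumption for illustration, not Sentinel's documented behaviour.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Convert a URL glob to an anchored regex, escaping every literal character."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")       # '**' may cross '/' boundaries
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")    # '*' stays within one path segment
            i += 1
        else:
            parts.append(re.escape(glob[i]))  # '.', '?', '+' etc. become literals
            i += 1
    return re.compile("^" + "".join(parts) + "$")


pattern = glob_to_regex("https://app.example.com/users/*/settings")
print(bool(pattern.match("https://app.example.com/users/42/settings")))
print(bool(pattern.match("https://appXexample.com/users/42/settings")))
```

Without the `re.escape` call, the second URL would match because `.` in `app.example.com` would act as a wildcard.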

License

MIT. See LICENSE.

About

Built and maintained by ThinkNext Software Solutions, alongside our other open-source projects Cascade (meeting-to-PR) and Relay (issue-to-PR).

Follow along: @ThinkNextHQ · LinkedIn · Blog
