Vision
Make `oasis` as approachable as the Ollama CLI. Type `oasis` with no arguments and get a guided, interactive experience — no flags, no docs reading, no setup friction. The tool meets you where you are.
This is how we believe open-source AI security benchmarking should work: accessible to everyone, not just people who read man pages.
Interactive Mode
Running `oasis` with no arguments launches an interactive menu:
$ oasis
╔══════════════════════════════════════╗
║   OASIS — AI Security Benchmarking   ║
╚══════════════════════════════════════╝
? What would you like to do?
> Run Benchmark
  View Results
  Configure API Keys
  Advanced Mode (CLI flags)
  Exit
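A minimal sketch of what the menu loop could look like with `@inquirer/prompts` (listed under Dependencies below); the action values and the placeholder handler are illustrative, not the actual implementation:

```ts
import { select } from '@inquirer/prompts';

// Hypothetical top-level menu loop. The real flows (Run Benchmark,
// View Results, ...) would replace the console.log placeholder.
async function mainMenu(): Promise<void> {
  for (;;) {
    const action = await select({
      message: 'What would you like to do?',
      choices: [
        { name: 'Run Benchmark', value: 'run' },
        { name: 'View Results', value: 'results' },
        { name: 'Configure API Keys', value: 'keys' },
        { name: 'Advanced Mode (CLI flags)', value: 'advanced' },
        { name: 'Exit', value: 'exit' },
      ],
    });
    if (action === 'exit') break;
    console.log(`TODO: launch the "${action}" flow`);
  }
}

mainMenu();
```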
Run Benchmark Flow
Selecting Run Benchmark walks you through everything step by step:
1. Choose a challenge:
? Select a challenge:
── Official Registry (ghcr.io/kryptsec) ──
> Gatekeeper web · easy · SQL injection + privesc
  Lockpick   web · medium · JWT auth bypass
  Shadowgate net · hard · Network pivoting
── Custom ──
  Load local challenge (docker-compose)
2. Choose the model to benchmark:
? Select the model to benchmark:
> Claude Sonnet 4.5 (Anthropic)
  GPT-4o (OpenAI)
  Grok 3 (xAI)
  Custom model...
Enter your Anthropic API key: sk-ant-•••••••••
✓ Key validated
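Behind the key-validated step, one cheap check is a single authenticated request. The sketch below assumes Anthropic's list-models endpoint and treats any 2xx as a valid key; the endpoint choice and error handling are illustrative, not the actual implementation:

```ts
// Hypothetical key check: one cheap authenticated request.
// Assumes the Anthropic list-models endpoint; a 401 means a bad key.
async function validateAnthropicKey(apiKey: string): Promise<boolean> {
  const res = await fetch('https://api.anthropic.com/v1/models', {
    headers: {
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
    },
  });
  return res.ok;
}
```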
3. Configure analysis:
? Analysis model (recommended: Claude Sonnet 4.5 for standardized results):
> Use same key (Claude Sonnet 4.5)
  Different model/key
  Skip analysis
We recommend Anthropic Sonnet 4.5 as the analysis model for standardized, comparable results across the community — but any model works.
4. Environment setup with live progress:
⠋ Pulling challenge: ghcr.io/kryptsec/oasis-gatekeeper:latest
████████████████████░░░░ 78% — Pulling target image...
✓ Challenge ready
⠋ Pulling kali environment: ghcr.io/kryptsec/oasis-kali:latest
✓ Kali environment ready
⠋ Starting containers...
✓ Environment ready — target: 10.0.0.2, kali: 10.0.0.3
5. Live benchmark execution:
═══ Benchmark Running ═══
Model: claude-sonnet-4-5 | Challenge: gatekeeper | Max: 45 iterations
[1/45] Reconnaissance — nmap -sV target...
[2/45] Found HTTP:5000, MySQL:3306
[3/45] Exploring web app — curl http://target:5000...
...
[8/45] ✓ Flag captured: KX{a3f8b2c1}
6. Results + analysis:
═══ Results ═══
Status: SUCCESS | Time: 47s | Iterations: 8/45 | Tokens: 15,700
⠋ Running analysis (Claude Sonnet 4.5)...
✓ Analysis complete
KSS: 94.2 | Methodology: 92 | Efficiency: 96
MITRE ATT&CK: T1592 → T1190 → T1078 → T1068
Results saved: results/a1b2c3d4.json
Full report: oasis report a1b2c3d4
Challenge Registry
Official Registry (ghcr.io/kryptsec)
Challenges are published as container images to GitHub Container Registry:
ghcr.io/kryptsec/oasis-gatekeeper:latest # Target image
ghcr.io/kryptsec/oasis-kali:latest # Shared Kali attacker image
Each challenge also has a `challenge.json` manifest (either baked into the image or fetched from a registry index).
Registry index concept:
- A public JSON endpoint or GitHub repo that lists all available challenges with metadata (name, difficulty, category, description, image refs)
- The CLI fetches this index on `oasis launch` (or `oasis challenges --remote`) to show what's available (a sketch of the index shape follows this list)
- Challenges are pulled on-demand — nothing pre-installed
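A possible shape for a registry index entry, mirroring the metadata fields listed above (the field names are a sketch, not a settled schema):

```ts
// Hypothetical registry.json schema; fields mirror the metadata above.
interface ChallengeEntry {
  id: string;                      // e.g. "gatekeeper"
  name: string;                    // e.g. "Gatekeeper"
  category: string;                // e.g. "web", "net"
  difficulty: 'easy' | 'medium' | 'hard';
  description: string;             // e.g. "SQL injection + privesc"
  image: string;                   // e.g. "ghcr.io/kryptsec/oasis-gatekeeper:latest"
}

interface RegistryIndex {
  version: number;                 // bump on breaking schema changes
  challenges: ChallengeEntry[];
}
```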
Custom / Local Challenges
Users who build their own challenges can point to a local directory:
> Load local challenge (docker-compose)
? Path to challenge directory: ./my-challenge/
✓ Found challenge.json — "My Custom SQLi Lab" (easy)
✓ Found docker-compose.yml — 2 services
This preserves the existing `oasis run -c <id>` workflow for power users and challenge developers.
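A sketch of the directory check behind that prompt, assuming `challenge.json` carries at least a name and a difficulty (as the mock output above implies):

```ts
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical loader for a local challenge directory: both files must
// be present before the interactive flow continues.
function loadLocalChallenge(dir: string) {
  const manifestPath = join(dir, 'challenge.json');
  const composePath = join(dir, 'docker-compose.yml');
  if (!existsSync(manifestPath)) throw new Error(`No challenge.json in ${dir}`);
  if (!existsSync(composePath)) throw new Error(`No docker-compose.yml in ${dir}`);
  const manifest = JSON.parse(readFileSync(manifestPath, 'utf8'));
  console.log(`✓ Found challenge.json — "${manifest.name}" (${manifest.difficulty})`);
  return { manifest, composePath };
}
```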
Advanced Mode
The full CLI with flags remains available for power users, CI/CD, and scripting:
oasis run -c gatekeeper -m claude-sonnet-4-5-20250929 -p anthropic --analyze --report
Selecting "Advanced Mode" from the interactive menu drops you into a help screen showing all available commands and flags.
Implementation Notes
Dependencies
- Interactive prompts: `@inquirer/prompts` or `prompts`
- Progress bars: could extend the current `ora` usage or add a progress-bar library
- Docker pull progress: parse the `docker pull` output stream for layer progress (see the sketch after this list)
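One way the pull parsing could work, assuming non-TTY output where each layer prints status lines like `<id>: Pulling fs layer` and `<id>: Pull complete` (formats vary across Docker versions, so treat the patterns as approximate):

```ts
import { spawn } from 'node:child_process';

// Rough pull progress: track layer ids as they appear and count the
// ones that finish. Status-line formats are approximate.
function pullWithProgress(image: string, onProgress: (pct: number) => void): Promise<void> {
  return new Promise((resolve, reject) => {
    const proc = spawn('docker', ['pull', image]);
    const layers = new Set<string>();
    const finished = new Set<string>();
    proc.stdout.on('data', (chunk: Buffer) => {
      for (const line of chunk.toString().split('\n')) {
        const m = /^([0-9a-f]{8,}): (.+)$/.exec(line.trim());
        if (!m) continue;
        layers.add(m[1]);
        if (m[2] === 'Pull complete' || m[2] === 'Already exists') finished.add(m[1]);
        onProgress(Math.round((finished.size / layers.size) * 100));
      }
    });
    proc.on('close', (code) =>
      code === 0 ? resolve() : reject(new Error(`docker pull exited with code ${code}`)),
    );
  });
}
```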
Key Design Decisions
- `oasis` with no args = interactive mode (current behavior shows help)
- `oasis <command>` = direct CLI mode (unchanged, backwards compatible)
- API keys are saved to `~/.config/oasis/credentials.json` after first entry; they are not asked for again (a persistence sketch follows this list)
- Challenge images are cached locally after first pull
- The Kali image (`ghcr.io/kryptsec/oasis-kali`) is shared across all challenges
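A sketch of the credential persistence described above; the path comes from the list, while the file shape (one key per provider) and the 0600 mode are assumptions:

```ts
import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { join } from 'node:path';

const CONFIG_DIR = join(homedir(), '.config', 'oasis');
const CREDS_PATH = join(CONFIG_DIR, 'credentials.json');

// Assumed shape: one API key per provider name.
type Credentials = Record<string, string>;

function loadCredentials(): Credentials {
  try {
    return JSON.parse(readFileSync(CREDS_PATH, 'utf8'));
  } catch {
    return {}; // first run: no file yet
  }
}

function saveCredential(provider: string, apiKey: string): void {
  const creds = { ...loadCredentials(), [provider]: apiKey };
  mkdirSync(CONFIG_DIR, { recursive: true });
  // 0600: API keys are secrets, keep the file owner-readable only.
  writeFileSync(CREDS_PATH, JSON.stringify(creds, null, 2), { mode: 0o600 });
}
```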
Challenge Registry Architecture
- Option A: Static JSON file in a GitHub repo (e.g., `kryptsec/oasis-challenges/registry.json`) — simple, versioned, PR-based contributions
- Option B: GitHub Container Registry labels/tags with a discovery API
- Option C: Simple API endpoint that returns the challenge index
Option A is likely best for v1 — transparent, community-contributable, no infrastructure needed.
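Under Option A, discovery could be a single GET against the raw file. The URL and branch below are hypothetical, and `RegistryIndex` is the schema sketched earlier:

```ts
// Hypothetical Option A discovery: fetch the raw registry file from GitHub.
const REGISTRY_URL =
  'https://raw.githubusercontent.com/kryptsec/oasis-challenges/main/registry.json';

async function fetchRegistry(): Promise<RegistryIndex> {
  const res = await fetch(REGISTRY_URL);
  if (!res.ok) throw new Error(`Registry fetch failed: HTTP ${res.status}`);
  return (await res.json()) as RegistryIndex;
}
```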
Why This Matters
The current CLI works great for developers who are comfortable with flags and config files. But the mission of OASIS is to give the entire security community visibility into how AI performs offensive security. That means:
- A pentester who's never used Node.js should be able to run `npx @kryptsec/oasis` and benchmark a model in 2 minutes
- A security team evaluating AI tools should be able to compare models without writing scripts
- A researcher should be able to reproduce any benchmark with zero configuration beyond an API key
The interactive mode is how we get there.