Interactive CLI mode + Challenge Registry #6

@Treelovah

Vision

Make oasis as approachable as the Ollama CLI. Type oasis with no arguments and get a guided, interactive experience — no flags, no docs reading, no setup friction. The tool meets you where you are.

This is how we believe open-source AI security benchmarking should work: accessible to everyone, not just people who read man pages.


Interactive Mode

Running oasis with no arguments launches an interactive menu:

$ oasis

  ╔══════════════════════════════════════╗
  ║   OASIS — AI Security Benchmarking   ║
  ╚══════════════════════════════════════╝

  ? What would you like to do?

  > Run Benchmark
    View Results
    Configure API Keys
    Advanced Mode (CLI flags)
    Exit
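
A minimal sketch of the no-argument entry point, assuming @inquirer/prompts (listed under Implementation Notes below); the file name and the dispatch are illustrative only, not the final implementation:

  // cli.ts: hypothetical entry point, assuming @inquirer/prompts
  import { select } from '@inquirer/prompts';

  async function mainMenu(): Promise<void> {
    const action = await select({
      message: 'What would you like to do?',
      choices: [
        { name: 'Run Benchmark', value: 'run' },
        { name: 'View Results', value: 'results' },
        { name: 'Configure API Keys', value: 'keys' },
        { name: 'Advanced Mode (CLI flags)', value: 'advanced' },
        { name: 'Exit', value: 'exit' },
      ],
    });
    // Each value would dispatch into the matching flow (benchmark wizard, results viewer, key config, help)
    console.log(`selected: ${action}`);
  }

  // No arguments beyond the binary name: interactive mode. Any argument: existing flag-based CLI (unchanged).
  if (process.argv.length <= 2) {
    mainMenu().catch(console.error);
  }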

Run Benchmark Flow

Selecting Run Benchmark walks you through everything step by step:

1. Choose a challenge:

  ? Select a challenge:

    ── Official Registry (ghcr.io/kryptsec) ──
  > Gatekeeper         web  · easy   · SQL injection + privesc
    Lockpick           web  · medium · JWT auth bypass
    Shadowgate         net  · hard   · Network pivoting
    ── Custom ──
    Load local challenge (docker-compose)

2. Choose the model to benchmark:

  ? Select the model to benchmark:

  > Claude Sonnet 4.5 (Anthropic)
    GPT-4o (OpenAI)
    Grok 3 (xAI)
    Custom model...

  Enter your Anthropic API key: sk-ant-•••••••••
  ✓ Key validated

3. Configure analysis:

  ? Analysis model (recommended: Claude Sonnet 4.5 for standardized results):
  > Use same key (Claude Sonnet 4.5)
    Different model/key
    Skip analysis

We recommend Claude Sonnet 4.5 (Anthropic) as the analysis model for standardized, comparable results across the community, but any model works.

4. Environment setup with live progress:

  ⠋ Pulling challenge: ghcr.io/kryptsec/oasis-gatekeeper:latest
    ████████████████████░░░░  78% — Pulling target image...
  ✓ Challenge ready
  ⠋ Pulling kali environment: ghcr.io/kryptsec/oasis-kali:latest
  ✓ Kali environment ready
  ⠋ Starting containers...
  ✓ Environment ready — target: 10.0.0.2, kali: 10.0.0.3

5. Live benchmark execution:

  ═══ Benchmark Running ═══
  Model: claude-sonnet-4-5  |  Challenge: gatekeeper  |  Max: 45 iterations

  [1/45] Reconnaissance — nmap -sV target...
  [2/45] Found HTTP:5000, MySQL:3306
  [3/45] Exploring web app — curl http://target:5000...
  ...
  [8/45] ✓ Flag captured: KX{a3f8b2c1}

6. Results + analysis:

  ═══ Results ═══
  Status: SUCCESS  |  Time: 47s  |  Iterations: 8/45  |  Tokens: 15,700

  ⠋ Running analysis (Claude Sonnet 4.5)...
  ✓ Analysis complete

  KSS: 94.2  |  Methodology: 92  |  Efficiency: 96
  MITRE ATT&CK: T1592 → T1190 → T1078 → T1068

  Results saved: results/a1b2c3d4.json
  Full report: oasis report a1b2c3d4

Challenge Registry

Official Registry (ghcr.io/kryptsec)

Challenges are published as container images to GitHub Container Registry:

ghcr.io/kryptsec/oasis-gatekeeper:latest     # Target image
ghcr.io/kryptsec/oasis-kali:latest            # Shared Kali attacker image

Each challenge also has a challenge.json manifest (either baked into the image or fetched from a registry index).

Registry index concept:

  • A public JSON endpoint or GitHub repo that lists all available challenges with metadata (name, difficulty, category, description, image refs)
  • The CLI fetches this on oasis launch (or oasis challenges --remote) to show what's available
  • Challenges are pulled on-demand — nothing pre-installed
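
For reference, a rough sketch of what one index entry could carry; the field names are illustrative, not a finalized schema:

  // registry.ts: illustrative types for the registry index (field names are assumptions)
  interface ChallengeEntry {
    id: string;                         // e.g. "gatekeeper"
    name: string;
    category: string;                   // e.g. "web", "net"
    difficulty: 'easy' | 'medium' | 'hard';
    description: string;
    image: string;                      // target image ref, e.g. "ghcr.io/kryptsec/oasis-gatekeeper:latest"
  }

  interface RegistryIndex {
    version: number;
    challenges: ChallengeEntry[];
  }

  // Example entry matching the challenge menu above
  const gatekeeper: ChallengeEntry = {
    id: 'gatekeeper',
    name: 'Gatekeeper',
    category: 'web',
    difficulty: 'easy',
    description: 'SQL injection + privesc',
    image: 'ghcr.io/kryptsec/oasis-gatekeeper:latest',
  };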

Custom / Local Challenges

Users who build their own challenges can point to a local directory:

  > Load local challenge (docker-compose)
  ? Path to challenge directory: ./my-challenge/
  ✓ Found challenge.json — "My Custom SQLi Lab" (easy)
  ✓ Found docker-compose.yml — 2 services

This preserves the existing oasis run -c <id> workflow for power users and challenge developers.
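
A possible shape for the local loader, assuming the manifest and compose file sit side by side in the chosen directory; the function and file names are placeholders:

  // local.ts: hypothetical loader for a docker-compose challenge directory
  import { readFile, access } from 'node:fs/promises';
  import { join } from 'node:path';

  async function loadLocalChallenge(dir: string) {
    // challenge.json carries the metadata shown in the prompt (name, difficulty, ...)
    const manifest = JSON.parse(await readFile(join(dir, 'challenge.json'), 'utf8'));
    // the compose file defines the target service(s); throws if missing
    const composeFile = join(dir, 'docker-compose.yml');
    await access(composeFile);
    return { manifest, composeFile };
  }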


Advanced Mode

The full CLI with flags remains available for power users, CI/CD, and scripting:

oasis run -c gatekeeper -m claude-sonnet-4-5-20250929 -p anthropic --analyze --report

Selecting "Advanced Mode" from the interactive menu drops you into a help screen showing all available commands and flags.


Implementation Notes

Dependencies

  • Interactive prompts: @inquirer/prompts or prompts
  • Progress bars: extend the current ora spinner usage or add a dedicated progress-bar library
  • Docker pull progress: Parse docker pull output stream for layer progress
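
One possible approach to the last item, sketched with the dockerode client rather than shelling out to docker pull (an assumption; parsing the CLI output stream is the alternative named above). The progress event shape is Docker's own pull status stream:

  // pull.ts: sketch using dockerode to surface per-layer pull progress
  import Docker from 'dockerode';

  const docker = new Docker();

  function pullWithProgress(image: string): Promise<void> {
    return new Promise((resolve, reject) => {
      docker.pull(image, (err: any, stream: any) => {
        if (err) return reject(err);
        docker.modem.followProgress(
          stream,
          (finishErr: any) => (finishErr ? reject(finishErr) : resolve()),
          (event: any) => {
            // events look like { status: 'Downloading', id: '<layer>', progressDetail: { current, total } }
            if (event.progressDetail?.total) {
              const pct = Math.round((100 * event.progressDetail.current) / event.progressDetail.total);
              process.stdout.write(`\r${event.id} ${event.status} ${pct}%   `);
            }
          }
        );
      });
    });
  }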

Key Design Decisions

  • oasis with no args = interactive mode (current behavior shows help)
  • oasis <command> = direct CLI mode (unchanged, backwards compatible)
  • API keys are saved to ~/.config/oasis/credentials.json after the first entry and never asked for again (see the sketch after this list)
  • Challenge images are cached locally after first pull
  • The kali image (ghcr.io/kryptsec/oasis-kali) is shared across all challenges
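
A sketch of the credentials store described above; the provider-keyed layout and the 0600 file mode are assumptions:

  // credentials.ts: hypothetical key store; path follows the design note above
  import { mkdir, readFile, writeFile } from 'node:fs/promises';
  import { homedir } from 'node:os';
  import { join } from 'node:path';

  const CRED_DIR = join(homedir(), '.config', 'oasis');
  const CRED_FILE = join(CRED_DIR, 'credentials.json');

  type Credentials = Record<string, string>; // provider -> API key

  async function loadKeys(): Promise<Credentials> {
    try {
      return JSON.parse(await readFile(CRED_FILE, 'utf8'));
    } catch {
      return {}; // first run: no credentials file yet
    }
  }

  async function saveKey(provider: string, key: string): Promise<void> {
    await mkdir(CRED_DIR, { recursive: true });
    const current = await loadKeys();
    // restrict permissions since this holds secrets
    await writeFile(CRED_FILE, JSON.stringify({ ...current, [provider]: key }, null, 2), { mode: 0o600 });
  }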

Challenge Registry Architecture

  • Option A: Static JSON file in a GitHub repo (e.g., kryptsec/oasis-challenges/registry.json) — simple, versioned, PR-based contributions
  • Option B: GitHub Container Registry labels/tags with a discovery API
  • Option C: Simple API endpoint that returns the challenge index

Option A is likely best for v1 — transparent, community-contributable, no infrastructure needed.
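
Under Option A the fetch itself is trivial; the raw URL below is a placeholder built from the example repo path above, not a confirmed location:

  // fetch-registry.ts: Option A sketch; returns the parsed index (shape per the registry types sketched earlier)
  const REGISTRY_URL =
    'https://raw.githubusercontent.com/kryptsec/oasis-challenges/main/registry.json';

  async function fetchRegistry(): Promise<unknown> {
    const res = await fetch(REGISTRY_URL); // global fetch, Node 18+
    if (!res.ok) throw new Error(`registry fetch failed: ${res.status}`);
    return res.json();
  }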


Why This Matters

The current CLI works great for developers who are comfortable with flags and config files. But the mission of OASIS is to give the entire security community visibility into how AI performs offensive security. That means:

  • A pentester who's never used Node.js should be able to run npx @kryptsec/oasis and benchmark a model in 2 minutes
  • A security team evaluating AI tools should be able to compare models without writing scripts
  • A researcher should be able to reproduce any benchmark with zero configuration beyond an API key

The interactive mode is how we get there.
