Vision
Make `oasis` as approachable as the Ollama CLI. Type `oasis` with no arguments and get a guided, interactive experience — no flags, no docs reading, no setup friction. The tool meets you where you are.
This is how we believe open-source AI security benchmarking should work: accessible to everyone, not just people who read man pages.
Interactive Mode
Running `oasis` with no arguments launches an interactive menu:
$ oasis
╔══════════════════════════════════════╗
║   OASIS — AI Security Benchmarking   ║
╚══════════════════════════════════════╝
? What would you like to do?
> Run Benchmark
  View Results
  Configure API Keys
  Advanced Mode (CLI flags)
  Exit
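A minimal sketch of what the menu loop could look like with `@inquirer/prompts` (listed under Dependencies below); the action values and the placeholder handler are illustrative, not the actual implementation:

```ts
import { select } from '@inquirer/prompts';

// Hypothetical top-level menu loop. The real flows (Run Benchmark,
// View Results, ...) would replace the console.log placeholder.
async function mainMenu(): Promise<void> {
  for (;;) {
    const action = await select({
      message: 'What would you like to do?',
      choices: [
        { name: 'Run Benchmark', value: 'run' },
        { name: 'View Results', value: 'results' },
        { name: 'Configure API Keys', value: 'keys' },
        { name: 'Advanced Mode (CLI flags)', value: 'advanced' },
        { name: 'Exit', value: 'exit' },
      ],
    });
    if (action === 'exit') break;
    console.log(`TODO: launch the "${action}" flow`);
  }
}

mainMenu();
```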
Run Benchmark Flow
Selecting Run Benchmark walks you through everything step by step:
1. Choose a challenge:
? Select a challenge:
── Official Registry (ghcr.io/kryptsec) ──
> Gatekeeper web · easy · SQL injection + privesc
  Lockpick   web · medium · JWT auth bypass
  Shadowgate net · hard · Network pivoting
── Custom ──
  Load local challenge (docker-compose)
2. Choose the model to benchmark:
? Select the model to benchmark:
> Claude Sonnet 4.5 (Anthropic)
  GPT-4o (OpenAI)
  Grok 3 (xAI)
  Custom model...
Enter your Anthropic API key: sk-ant-•••••••••
✓ Key validated
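Behind the key-validated step, one cheap check is a single authenticated request. The sketch below assumes Anthropic's list-models endpoint and treats any 2xx as a valid key; the endpoint choice and error handling are illustrative, not the actual implementation:

```ts
// Hypothetical key check: one cheap authenticated request.
// Assumes the Anthropic list-models endpoint; a 401 means a bad key.
async function validateAnthropicKey(apiKey: string): Promise<boolean> {
  const res = await fetch('https://api.anthropic.com/v1/models', {
    headers: {
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
    },
  });
  return res.ok;
}
```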
3. Configure analysis:
? Analysis model (recommended: Claude Sonnet 4.5 for standardized results):
> Use same key (Claude Sonnet 4.5)
  Different model/key
  Skip analysis
We recommend Anthropic Sonnet 4.5 as the analysis model for standardized, comparable results across the community — but any model works.
4. Environment setup with live progress:
⠋ Pulling challenge: ghcr.io/kryptsec/oasis-gatekeeper:latest
████████████████████░░░░ 78% — Pulling target image...
✓ Challenge ready
⠋ Pulling kali environment: ghcr.io/kryptsec/oasis-kali:latest
✓ Kali environment ready
⠋ Starting containers...
✓ Environment ready — target: 10.0.0.2, kali: 10.0.0.3
5. Live benchmark execution:
═══ Benchmark Running ═══
Model: claude-sonnet-4-5 | Challenge: gatekeeper | Max: 45 iterations
[1/45] Reconnaissance — nmap -sV target...
[2/45] Found HTTP:5000, MySQL:3306
[3/45] Exploring web app — curl http://target:5000...
...
[8/45] ✓ Flag captured: KX{a3f8b2c1}
6. Results + analysis:
═══ Results ═══
Status: SUCCESS | Time: 47s | Iterations: 8/45 | Tokens: 15,700
⠋ Running analysis (Claude Sonnet 4.5)...
✓ Analysis complete
KSS: 94.2 | Methodology: 92 | Efficiency: 96
MITRE ATT&CK: T1592 → T1190 → T1078 → T1068
Results saved: results/a1b2c3d4.json
Full report: oasis report a1b2c3d4
Challenge Registry
Official Registry (ghcr.io/kryptsec)
Challenges are published as container images to GitHub Container Registry:
ghcr.io/kryptsec/oasis-gatekeeper:latest # Target image
ghcr.io/kryptsec/oasis-kali:latest # Shared Kali attacker image
Each challenge also has a `challenge.json` manifest (either baked into the image or fetched from a registry index).
Registry index concept:
- A public JSON endpoint or GitHub repo that lists all available challenges with metadata (name, difficulty, category, description, image refs)
- The CLI fetches this index on `oasis launch` (or `oasis challenges --remote`) to show what's available (a sketch of the index shape follows this list)
- Challenges are pulled on-demand — nothing pre-installed
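A possible shape for a registry index entry, mirroring the metadata fields listed above (the field names are a sketch, not a settled schema):

```ts
// Hypothetical registry.json schema; fields mirror the metadata above.
interface ChallengeEntry {
  id: string;                      // e.g. "gatekeeper"
  name: string;                    // e.g. "Gatekeeper"
  category: string;                // e.g. "web", "net"
  difficulty: 'easy' | 'medium' | 'hard';
  description: string;             // e.g. "SQL injection + privesc"
  image: string;                   // e.g. "ghcr.io/kryptsec/oasis-gatekeeper:latest"
}

interface RegistryIndex {
  version: number;                 // bump on breaking schema changes
  challenges: ChallengeEntry[];
}
```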
Custom / Local Challenges
Users who build their own challenges can point to a local directory:
> Load local challenge (docker-compose)
? Path to challenge directory: ./my-challenge/
✓ Found challenge.json — "My Custom SQLi Lab" (easy)
✓ Found docker-compose.yml — 2 services
This preserves the existing `oasis run -c <id>` workflow for power users and challenge developers.
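A sketch of the directory check behind that prompt, assuming `challenge.json` carries at least a name and a difficulty (as the mock output above implies):

```ts
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical loader for a local challenge directory: both files must
// be present before the interactive flow continues.
function loadLocalChallenge(dir: string) {
  const manifestPath = join(dir, 'challenge.json');
  const composePath = join(dir, 'docker-compose.yml');
  if (!existsSync(manifestPath)) throw new Error(`No challenge.json in ${dir}`);
  if (!existsSync(composePath)) throw new Error(`No docker-compose.yml in ${dir}`);
  const manifest = JSON.parse(readFileSync(manifestPath, 'utf8'));
  console.log(`✓ Found challenge.json — "${manifest.name}" (${manifest.difficulty})`);
  return { manifest, composePath };
}
```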
Advanced Mode
The full CLI with flags remains available for power users, CI/CD, and scripting:
oasis run -c gatekeeper -m claude-sonnet-4-5-20250929 -p anthropic --analyze --report
Selecting "Advanced Mode" from the interactive menu drops you into a help screen showing all available commands and flags.
Implementation Notes
Dependencies
- Interactive prompts: `@inquirer/prompts` or `prompts`
- Progress bars: could extend the current `ora` usage or add a progress-bar library
- Docker pull progress: parse the `docker pull` output stream for layer progress (see the sketch after this list)
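One way the pull parsing could work, assuming non-TTY output where each layer prints status lines like `<id>: Pulling fs layer` and `<id>: Pull complete` (formats vary across Docker versions, so treat the patterns as approximate):

```ts
import { spawn } from 'node:child_process';

// Rough pull progress: track layer ids as they appear and count the
// ones that finish. Status-line formats are approximate.
function pullWithProgress(image: string, onProgress: (pct: number) => void): Promise<void> {
  return new Promise((resolve, reject) => {
    const proc = spawn('docker', ['pull', image]);
    const layers = new Set<string>();
    const finished = new Set<string>();
    proc.stdout.on('data', (chunk: Buffer) => {
      for (const line of chunk.toString().split('\n')) {
        const m = /^([0-9a-f]{8,}): (.+)$/.exec(line.trim());
        if (!m) continue;
        layers.add(m[1]);
        if (m[2] === 'Pull complete' || m[2] === 'Already exists') finished.add(m[1]);
        onProgress(Math.round((finished.size / layers.size) * 100));
      }
    });
    proc.on('close', (code) =>
      code === 0 ? resolve() : reject(new Error(`docker pull exited with code ${code}`)),
    );
  });
}
```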
Key Design Decisions
- `oasis` with no args = interactive mode (current behavior shows help)
- `oasis <command>` = direct CLI mode (unchanged, backwards compatible)
- API keys are saved to `~/.config/oasis/credentials.json` after first entry; they are not asked for again (a persistence sketch follows this list)
- Challenge images are cached locally after first pull
- The Kali image (`ghcr.io/kryptsec/oasis-kali`) is shared across all challenges
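A sketch of the credential persistence described above; the path comes from the list, while the file shape (one key per provider) and the 0600 mode are assumptions:

```ts
import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { join } from 'node:path';

const CONFIG_DIR = join(homedir(), '.config', 'oasis');
const CREDS_PATH = join(CONFIG_DIR, 'credentials.json');

// Assumed shape: one API key per provider name.
type Credentials = Record<string, string>;

function loadCredentials(): Credentials {
  try {
    return JSON.parse(readFileSync(CREDS_PATH, 'utf8'));
  } catch {
    return {}; // first run: no file yet
  }
}

function saveCredential(provider: string, apiKey: string): void {
  const creds = { ...loadCredentials(), [provider]: apiKey };
  mkdirSync(CONFIG_DIR, { recursive: true });
  // 0600: API keys are secrets, keep the file owner-readable only.
  writeFileSync(CREDS_PATH, JSON.stringify(creds, null, 2), { mode: 0o600 });
}
```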
Challenge Registry Architecture
- Option A: Static JSON file in a GitHub repo (e.g., `kryptsec/oasis-challenges/registry.json`) — simple, versioned, PR-based contributions
- Option B: GitHub Container Registry labels/tags with a discovery API
- Option C: Simple API endpoint that returns the challenge index
Option A is likely best for v1 — transparent, community-contributable, no infrastructure needed.
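Under Option A, discovery could be a single GET against the raw file. The URL and branch below are hypothetical, and `RegistryIndex` is the schema sketched earlier:

```ts
// Hypothetical Option A discovery: fetch the raw registry file from GitHub.
const REGISTRY_URL =
  'https://raw.githubusercontent.com/kryptsec/oasis-challenges/main/registry.json';

async function fetchRegistry(): Promise<RegistryIndex> {
  const res = await fetch(REGISTRY_URL);
  if (!res.ok) throw new Error(`Registry fetch failed: HTTP ${res.status}`);
  return (await res.json()) as RegistryIndex;
}
```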
Why This Matters
The current CLI works great for developers who are comfortable with flags and config files. But the mission of OASIS is to give the entire security community visibility into how AI performs offensive security. That means:
- A pentester who's never used Node.js should be able to run `npx @kryptsec/oasis` and benchmark a model in 2 minutes
- A security team evaluating AI tools should be able to compare models without writing scripts
- A researcher should be able to reproduce any benchmark with zero configuration beyond an API key
The interactive mode is how we get there.