In [None]:
#| default_exp core

In [None]:
#| export
import subprocess,json,shutil
from safecmd.bashxtract import *
from fastcore.utils import *

# core
> Core API for safecmd

## Introduction

`safecmd.core` provides a safe execution layer for shell commands. It's designed for situations where you need to run bash commands from untrusted sources—such as LLM-generated commands—while ensuring they can't modify your system in dangerous ways.

The module builds on top of `safecmd.bashxtract` (which parses bash into an AST and extracts commands) to validate commands against an allowlist before execution. The key insight is that rather than trying to blocklist dangerous commands (which is error-prone), we allowlist a generous set of read-only and easily-reverted commands that are safe to run.

The core workflow is:

1. Parse the bash command string using `extract_commands()` from bashxtract
2. Check each extracted command against `ok_cmds` (the allowlist). Commands inside substitutions (`$(...)`), subshells, pipelines, etc are extracted recursively, so nested commands are also validated.
3. Check that only safe operators are used (pipes, semicolons, and so on, but not output redirects like `>`)
4. If everything passes, execute the command and return the result

This approach handles complex bash syntax correctly—pipelines, command substitutions, subshells, and more—because it uses a proper bash parser rather than regex or string splitting.

The allowlist (`ok_cmds`) uses **prefix matching** on whole words to determine if a command is permitted. A simple entry like `'ls'` matches any command whose first word is `ls`, so `ls`, `ls -la`, and `ls /home/user` are all allowed, while `lsof` is not. A multi-word entry like `'git status'` only matches commands whose first two words are `git` and `status`, so `git status` and `git status --short` are allowed, but `git push` is not.

This prefix approach lets you be precise about which subcommands are safe. For instance, you might allow `git log`, `git status`, and `git diff` (all read-only) while blocking `git push` and `git reset` (which modify state).
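The word-level prefix rule can be sketched in a few lines. This is only an illustration: `matches_prefix` is a hypothetical helper, and `shlex.split` stands in for the bash parser that safecmd actually uses.

```python
import shlex

def matches_prefix(command, entry):
    "Hypothetical helper: True if the leading words of `command` equal the words of `entry`"
    toks = shlex.split(command)   # stand-in tokenizer; safecmd itself relies on a bash parser
    prefix = entry.split()
    return toks[:len(prefix)] == prefix

for command, entry in [('ls -la', 'ls'), ('lsof -i', 'ls'),
                       ('git status --short', 'git status'), ('git push', 'git status')]:
    print(f'{command!r} vs {entry!r}:', matches_prefix(command, entry))
```

Because whole words are compared rather than raw string prefixes, `lsof -i` does not match the `ls` entry.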

Some commands are mostly safe but have a few dangerous flags. For example, `find` is useful for searching files, but its `-exec` flag can run arbitrary commands—which defeats our safety guarantees. For these cases, you can specify a **denied list** of flags that will cause the command to be rejected. So we allow `find . -name '*.py'` but block `find . -exec rm {} \;` because `-exec` is in the denied list.
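A minimal sketch of the denied-flag check, assuming commands have already been split into tokens. The function `allowed` is a hypothetical stand-in for an allowlist entry; the set intersection mirrors the check that `CmdSpec` performs in the API section.

```python
def allowed(toks, name, denied=()):
    "Hypothetical stand-in for an allowlist entry with denied flags"
    prefix = name.split()
    if list(toks[:len(prefix)]) != prefix: return False   # the prefix must match first
    return not (set(denied) & set(toks))                  # any denied token rejects the command

denied = ['-exec', '-delete']
print(allowed(['find', '.', '-name', '*.py'], 'find', denied))
print(allowed(['find', '.', '-exec', 'rm', '{}', ';'], 'find', denied))
# Tokens are compared exactly, so -executable is a different token from -exec
print(allowed(['find', '.', '-executable'], 'find', denied))
```

Since tokens are compared exactly, each dangerous flag has to be listed on its own; a flag that merely shares a prefix with a denied one is not rejected.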

The operators in a command are also checked. By default, pipes (`|`), logical operators (`&&`, `||`), semicolons (`;`), and input redirection (`<`) are allowed. But output redirection (`>`, `>>`) is blocked by default since it writes to files.
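The operator check amounts to a set difference between the operators found in the command and the allowed set. A rough sketch, where `DEFAULT_OPS` and `disallowed_ops` are illustrative names rather than part of safecmd:

```python
DEFAULT_OPS = {'|', '<', '&&', '||', ';'}   # same members as the default set described above

def disallowed_ops(used_ops, allowed=DEFAULT_OPS):
    "Hypothetical helper: operators that were used but are not allowed"
    return set(used_ops) - set(allowed)

print(disallowed_ops({'|', '&&'}))   # nothing left over, so the command passes
print(disallowed_ops({'|', '>'}))    # '>' is left over, so the command is rejected
```

An empty result means every operator in the command is permitted.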

## How to use

The simplest way to use safecmd is to call `safe_run()` with a bash command string. This function validates the command against the built-in allowlist and executes it if safe, returning the combined stdout/stderr output as a string. If the command fails, it raises an `IOError`. If a command or operator isn't allowed, it raises `DisallowedCmd` or `DisallowedOps` respectively, both of which are subclasses of `PermissionError`.

For example: `safe_run('ls -la | grep py')` will execute and return the filtered directory listing, while `safe_run('rm -rf /')` will raise a `DisallowedCmd` exception before anything dangerous happens.

The module comes with a predefined set of safe commands in `ok_cmds`. This includes common read-only utilities like `cat`, `grep`, `ls`, and `diff`, as well as safe git subcommands like `git log`, `git status`, and `git diff`. The `find` command is included with a denied list that blocks `-exec`, `-delete`, and similar dangerous flags.

If you want to start with a clean slate, call `clear_cmds()` to empty the allowlist. Then use `add_cmds()` to add your own commands. You can pass simple command names as strings (e.g., `add_cmds('cat', 'ls')`), multi-word prefixes as space-separated strings (e.g., `add_cmds('git log', 'git status')`), or `CmdSpec` objects for commands that need denied flags (e.g., `add_cmds(CmdSpec('find', denied=['-exec', '-delete']))`).

You can also customize the allowed operators by passing an `ops` parameter to `safe_run()`. The default set is `ok_ops = {'|', '<', '&&', '||', ';'}`, which allows pipes, input redirection, logical operators, and command sequences, but blocks output redirection. If you want to allow writing to files, you could call `safe_run(cmd, ops=ok_ops | {'>', '>>'})`.

## API

### Helpers

In [None]:
#| export
def run(cmd, ignore_ex=False):
    "Run `cmd` in shell; return stdout (+ stderr if any); raise IOError on failure"
    res = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    out = res.stdout.strip()
    if res.stderr: out += ('\n' if out else '') + res.stderr.strip()
    if ignore_ex: return (res.returncode, out)
    if res.returncode: raise IOError(out)
    return out

Executes a shell command and returns its stdout, with any stderr appended on a new line (both stripped of surrounding whitespace). If `ignore_ex=True`, returns a tuple of `(returncode, output)` instead of raising on failure. This is the low-level execution function and does no safety checking.

In [None]:
from fastcore.test import test_fail,test_eq

In [None]:
test_eq(run('echo hello'), 'hello')
test_eq(run('echo out; echo err >&2'), 'out\nerr')
test_eq(run('exit 1', ignore_ex=True), (1, ''))
test_eq(run('echo fail >&2; exit 1', ignore_ex=True), (1,'fail'))
test_fail(lambda: run('exit 1'))

### Command Specifications

In [None]:
#| export
class CmdSpec(BasicRepr):
    "An allowed command prefix, with optional `denied` flags that cause rejection"
    def __init__(self,
        name,  # the command (str, will be split into tuple)
        denied=None):  # if set, these flags blocked
        self.name = tuple(name.split())
        self.denied = set(denied or [])

    def __hash__(self): return hash(self.name)
    def __eq__(self, b): return isinstance(b, CmdSpec) and self.name==b.name
    
    def __repr__(self):
        s = ' '.join(self.name)
        if self.denied: s += f' !{self.denied}'
        return s
    
    def __call__(self, toks):
        "Returns True if allowed, False if no match or denied flag found"
        if tuple(toks[:len(self.name)]) != self.name: return False
        return not (self.denied and self.denied & set(toks))

`CmdSpec` represents an allowed command with optional denied flags. The `name` is stored as a tuple for prefix matching—so `CmdSpec('git log')` matches `git log`, `git log --oneline`, etc. The `denied` set contains flags that will cause the command to be rejected even if the prefix matches.

In [None]:
find = CmdSpec('find', denied=['-exec', '-delete'])
find

find !{'-delete', '-exec'}

In [None]:
assert find(['find', '.', '-name', '*.py'])
assert not find(['find', '.', '-exec', 'rm'])
assert not find(['ls', '-la'])

In [None]:
#| export
def add_cmds(*cmds):
    "Add `cmds` (strings or `CmdSpec` objects) to `ok_cmds`"
    ok_cmds.update(c if isinstance(c, CmdSpec) else CmdSpec(c) for c in cmds)

`add_cmds` is a convenience function for populating `ok_cmds`. You can pass strings (which become `CmdSpec` objects) or `CmdSpec` instances directly for commands that need denied flags.
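One consequence worth noting: `CmdSpec` hashes and compares by `name` only, and `add_cmds` uses `set.update`, so adding a spec whose name is already present appears to leave the existing entry (and its denied flags) in place. The sketch below shows the underlying set behavior with `Spec`, a hypothetical stand-in class that copies only the hashing and equality of `CmdSpec` so it runs without safecmd installed.

```python
class Spec:
    "Stand-in with the same name-only hashing and equality as `CmdSpec`"
    def __init__(self, name, denied=None):
        self.name, self.denied = tuple(name.split()), set(denied or [])
    def __hash__(self): return hash(self.name)
    def __eq__(self, other): return self.name == other.name

cmds = {Spec('find')}                            # existing entry with no denied flags
stricter = Spec('find', denied=['-exec'])

cmds.update([stricter])                          # equal by name, so the set keeps the old entry
before = [s.denied for s in cmds]
print(before)

cmds.discard(stricter); cmds.add(stricter)       # remove the old entry first, then add the new one
after = [s.denied for s in cmds]
print(after)
```

Under this assumption, changing the denied flags of a command that is already allowed means discarding the old entry before adding the new one.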

### Default Allowlists

In [None]:
#| export
cmd_groups = {
    'File viewing': ['cat', 'head', 'tail', 'less', 'more', 'bat'],
    'Directory listing': ['ls', 'tree', 'locate'],
    'Search': ['grep', 'rg', 'ag', 'ack', 'fgrep', 'egrep'],
    'Text processing': ['cut', 'sort', 'uniq', 'wc', 'tr', 'column'],
    'File info': ['file', 'stat', 'du', 'df', 'which', 'whereis', 'type'],
    'Comparison': ['diff', 'cmp', 'comm'],
    'Archives': ['tar', 'unzip', 'gunzip', 'bunzip2', 'unrar'],
    'Network': ['curl', 'wget', 'ping', 'dig', 'nslookup', 'host'],
    'System info': ['date', 'cal', 'uptime', 'whoami', 'hostname', 'uname', 'env', 'printenv'],
    'Utilities': ['echo', 'printf', 'yes', 'seq', 'basename', 'dirname', 'realpath'],
    'Git (read-only)': ['git log', 'git show', 'git diff', 'git status', 'git branch', 'git tag', 'git remote', 'git stash list', 'git blame', 'git shortlog', 'git describe', 'git rev-parse', 'git ls-files', 'git ls-tree', 'git cat-file', 'git config --get', 'git config --list'],
    'Git (workspace)': ['git fetch', 'git add', 'git commit', 'git switch', 'git checkout'],
}

ok_cmds = set()

for v in cmd_groups.values(): add_cmds(*v)
find_spec = CmdSpec('find', denied=['-exec', '-execdir', '-delete', '-ok', '-okdir'])
add_cmds(find_spec)

`ok_cmds` contains a generous set of read-only and easily-reverted commands, including some git workspace operations. Note that `find` uses a `CmdSpec` to block dangerous flags like `-exec`. Full list:

In [None]:
for k,v in cmd_groups.items(): print(k, ':', '; '.join(v))

File viewing : cat; head; tail; less; more; bat
Directory listing : ls; tree; locate
Search : grep; rg; ag; ack; fgrep; egrep
Text processing : cut; sort; uniq; wc; tr; column
File info : file; stat; du; df; which; whereis; type
Comparison : diff; cmp; comm
Archives : tar; unzip; gunzip; bunzip2; unrar
Network : curl; wget; ping; dig; nslookup; host
System info : date; cal; uptime; whoami; hostname; uname; env; printenv
Utilities : echo; printf; yes; seq; basename; dirname; realpath
Git (read-only) : git log; git show; git diff; git status; git branch; git tag; git remote; git stash list; git blame; git shortlog; git describe; git rev-parse; git ls-files; git ls-tree; git cat-file; git config --get; git config --list
Git (workspace) : git fetch; git add; git commit; git switch; git checkout


In addition, `find` is allowed by default, with a list of denied flags:

In [None]:
find_spec

find !{'-exec', '-ok', '-execdir', '-delete', '-okdir'}

In [None]:
#| export
ok_ops = {'|', '<', '&&', '||', ';'}

`ok_ops` permits pipes, input redirection, and logical/sequential operators—but blocks output redirection by default. Use standard set operations to clear, add, or remove items.
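As a small illustration of those set operations, the sketch below works on a local copy of the default set rather than the module-level `ok_ops`. The variable names `with_output` and `no_chaining` are only for illustration.

```python
ok_ops = {'|', '<', '&&', '||', ';'}       # local copy of the default, for illustration

with_output = ok_ops | {'>', '>>'}         # new set; ok_ops is left untouched
no_chaining = ok_ops - {';', '&&', '||'}   # new set without sequencing operators
print(sorted(with_output))
print(sorted(no_chaining))

ok_ops.discard('<')                        # in-place change to the set itself
print(sorted(ok_ops))
```

The non-mutating forms (`|`, `-`) suit one-off calls such as `safe_run(cmd, ops=...)`, whereas in-place changes to the module-level set affect every later call that relies on the default.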

In [None]:
print(ok_ops)

{';', '||', '<', '&&', '|'}


### Safe Execution

In [None]:
#| export
def validate_cmd(toks, cmds=None):
    "Check if toks matches an allowed command; returns False if denied flags present"
    if cmds is None: cmds = ok_cmds
    return any(spec(toks) for spec in cmds)

`validate_cmd` checks whether a tokenized command matches any entry in the allowlist by calling each `CmdSpec` until one returns `True`.

In [None]:
assert validate_cmd(['ls', '-la'])
assert validate_cmd(['git', 'status'])
assert validate_cmd(['find', '.', '-name', '*.py'])
assert not validate_cmd(['find', '.', '-exec', 'rm'])
assert not validate_cmd(['rm', '-rf', '/'])
assert not validate_cmd(['git', 'push'])

In [None]:
#| export
class DisallowedOps(PermissionError):
    def __init__(self, ops): super().__init__(f"Disallowed operators: {ops}")

class DisallowedCmd(PermissionError):
    def __init__(self, cmd): super().__init__(f"Disallowed command: {' '.join(cmd)}")

def safe_run(cmd, cmds=None, ops=None):
    "Run `cmd` in shell if all commands and operators are in allowlists, else raise"
    if ops is None: ops = ok_ops
    commands, used_ops = extract_commands(cmd)
    if bad_ops := used_ops - ops: raise DisallowedOps(bad_ops)
    for c in commands:
        if not validate_cmd(c, cmds): raise DisallowedCmd(c)
    return run(cmd)

`safe_run` is the main entry point. It parses the bash command, validates all extracted commands and operators against the allowlists, and only executes if everything passes. `DisallowedOps` and `DisallowedCmd` are raised for violations, giving clear error messages about what was blocked.
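When handling these errors, the order of `except` clauses matters: in Python 3, `IOError` is an alias of `OSError`, and `PermissionError` is a subclass of it, so a handler for `IOError` placed first would also catch `DisallowedCmd` and `DisallowedOps`. A sketch of one way to separate the two cases, where `try_run` is a hypothetical wrapper and `fake_runner` is a stand-in for `safe_run` so the example runs without safecmd installed:

```python
def try_run(runner, cmd):
    "Hypothetical wrapper: return (ok, message) instead of raising"
    try: return True, runner(cmd)
    except PermissionError as e: return False, f'blocked: {e}'   # must come before OSError
    except OSError as e: return False, f'failed: {e}'            # IOError is an alias of OSError

def fake_runner(cmd):
    "Stand-in for `safe_run`, used only to exercise the wrapper"
    if cmd.startswith('rm'): raise PermissionError(f'Disallowed command: {cmd}')
    if cmd == 'false': raise IOError('exit status 1')
    return 'hello'

print(try_run(fake_runner, 'echo hello'))
print(try_run(fake_runner, 'rm -rf /'))
print(try_run(fake_runner, 'false'))
```

With the real module, passing `safe_run` as `runner` should behave the same way, since both custom exceptions derive from `PermissionError`.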

In [None]:
test_eq(safe_run('ls'), run('ls'))
test_eq(safe_run('echo hello | cat'), 'hello')
test_fail(lambda: safe_run('rm -rf /'), contains='Disallowed command')
test_fail(lambda: safe_run('echo hi > file'), contains='Disallowed operators')
test_fail(lambda: safe_run('find . -exec rm'), contains='Disallowed command')

In [None]:
#| export
def clear_cmds():
    "Remove all commands from ok_cmds"
    ok_cmds.clear()

`clear_cmds` empties the allowlist, useful if you want to start fresh and define your own set of permitted commands.