In [None]:
#| default_exp safecmd.bashxtract

In [None]:
!cat 00_bashxtract.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a58dff87",
   "metadata": {
    "time_run": "2025-12-20T09:57:49.390199+00:00"
   },
   "outputs": [],
   "source": [
    "#| default_exp safecmd.bashxtract"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bce08b53",
   "metadata": {
    "time_run": "2025-12-20T09:57:49.805410+00:00"
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "import shlex,subprocess,json,shutil\n",
    "from fastcore.utils import *"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fbfc9f1",
   "metadata": {},
   "source": [
    "# bashxtract API\n",
    "> Extract commands used from bash command lines"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3654af97",
   "metadata": {},
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c94eb28",
   "metadata": {},
   "source": [
    "`safecmd.bashxtract` 

In [None]:
#| export
import shlex,subprocess,json,shutil
from fastcore.utils import *

# core
> Core API for safecmd

## Introduction

`safecmd.core` provides a safe execution layer for shell commands. It's designed for situations where you need to run bash commands from untrusted sources, such as LLM-generated commands, while ensuring they can't modify your system in dangerous ways.

The module builds on top of `safecmd.bashxtract` (which parses bash into an AST and extracts commands) to validate commands against an allowlist before execution. The key insight is that rather than trying to blacklist dangerous commands (which is error-prone), we whitelist a generous set of read-only commands that are safe to run.

The core workflow is:

1. Parse the bash command string using `extract_commands()` from bashxtract
2. Check each extracted command against `ok_cmds` (the allowlist)
3. Check that only safe operators are used (pipes, semicolons, and so on, but not output redirects like `>`)
4. If everything passes, execute the command and return the result

This approach handles complex bash syntax correctly (pipelines, command substitutions, subshells, and more) because it uses a proper bash parser rather than regex or string splitting.
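To see why string splitting is not enough, here is a small sketch of a naive checker that looks only at the first word of a command. The `ALLOW` set and the `naive_ok` helper are hypothetical and exist only for this illustration; they are not part of safecmd.

```python
import shlex

ALLOW = {'echo', 'cat', 'ls'}  # hypothetical allowlist, for illustration only

def naive_ok(cmd):
    "Naive check: split on whitespace and look only at the first word"
    toks = shlex.split(cmd)
    return bool(toks) and toks[0] in ALLOW

# Both commands pass the naive check even though they would run `rm` or `sh`
print(naive_ok('echo $(rm -rf build)'))  # True
print(naive_ok('cat notes.txt | sh'))    # True
```

In both cases the dangerous command is hidden from a first-word check, inside a command substitution or after a pipe. Catching it requires knowing every command the line would run, which is what a real parser provides.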

The allowlist (`ok_cmds`) uses **prefix matching** on a command's words to determine if it is permitted. A simple entry like `'ls'` matches any command whose first word is `ls`, so `ls`, `ls -la`, and `ls /home/user` are all allowed. A multi-word entry like `'git status'` only matches commands that start with both those words, so `git status` and `git status --short` are allowed, but `git push` is not.

This prefix approach lets you be precise about which subcommands are safe. For instance, you might allow `git log`, `git status`, and `git diff` (all read-only) while blocking `git push` and `git reset` (which modify state).
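As a minimal sketch of this prefix rule, the helper below compares the leading words of a tokenized command with the words of an allowlist entry. The names `matches_prefix`, `allow`, and `allowed` are made up for this example; the module's own matching lives in `CmdSpec`.

```python
def matches_prefix(toks, entry):
    "True if the token list `toks` starts with every word of the allowlist `entry`"
    prefix = tuple(entry.split())
    return tuple(toks[:len(prefix)]) == prefix

allow = ['ls', 'git status', 'git log']  # hypothetical allowlist for the example

def allowed(toks):
    "True if any allowlist entry is a word-wise prefix of `toks`"
    return any(matches_prefix(toks, e) for e in allow)

print(allowed(['ls', '-la']))                 # True
print(allowed(['git', 'status', '--short']))  # True
print(allowed(['git', 'push']))               # False
print(allowed(['lsof']))                      # False: whole words are compared, not characters
```

Because whole words are compared, an entry such as `'ls'` does not accidentally permit `lsof`.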

Some commands are mostly safe but have a few dangerous flags. For example, `find` is useful for searching files, but its `-exec` flag can run arbitrary commands, which defeats our safety guarantees. For these cases, you can specify a **denied list** of flags that will cause the command to be rejected. So we allow `find . -name '*.py'` but block `find . -exec rm {} \;` because `-exec` is in the denied list.
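The denied-list idea can be sketched as a set intersection between the command's tokens and the denied flags. The helper `allowed` below is hypothetical, and it tokenizes with `shlex.split` only to keep the example self-contained; the module itself works on tokens produced by its bash parser.

```python
import shlex

def allowed(cmd, name, denied=()):
    "Allow `cmd` only if it starts with `name` and contains none of the `denied` flags"
    toks = shlex.split(cmd)            # simplified tokenizing, an assumption of this sketch
    prefix = tuple(name.split())
    if tuple(toks[:len(prefix)]) != prefix: return False  # not this command at all
    return not (set(denied) & set(toks))                  # any denied flag rejects it

denied = ['-exec', '-delete']
print(allowed("find . -name '*.py'", 'find', denied))     # True
print(allowed(r"find . -exec rm {} \;", 'find', denied))  # False
```

Note that this check compares whole tokens, so a flag is only caught when it appears as its own word on the command line.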

The operators in a command are also checked. By default, pipes (`|`), logical operators (`&&`, `||`), semicolons (`;`), and input redirection (`<`) are allowed. But output redirection (`>`, `>>`) is blocked by default since it writes to files.

## How to use

The simplest way to use safecmd is to call `safe_run()` with a bash command string. This function validates the command against the built-in allowlist and executes it if safe, returning the combined stdout/stderr output as a string. If the command fails, it raises an `IOError`. If the command or operators aren't allowed, it raises either `DisallowedCmd` or `DisallowedOps`.

For example: `safe_run('ls -la | grep py')` will execute and return the filtered directory listing, while `safe_run('rm -rf /')` will raise a `DisallowedCmd` exception before anything dangerous happens.
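A caller usually wants to tell "rejected by policy" apart from "ran but failed". The sketch below copies the two exception classes defined later in this notebook and pairs them with a hypothetical `gate` function that stands in for the validation half of `safe_run` (it takes already-parsed commands and operators, and neither parses nor executes anything).

```python
class DisallowedOps(PermissionError):
    def __init__(self, ops): super().__init__(f"Disallowed operators: {ops}")

class DisallowedCmd(PermissionError):
    def __init__(self, cmd): super().__init__(f"Disallowed command: {' '.join(cmd)}")

def gate(commands, used_ops, allowed_cmds, allowed_ops):
    "Hypothetical stand-in for the validation step: raise if anything is not allowed"
    if bad := set(used_ops) - set(allowed_ops): raise DisallowedOps(bad)
    for c in commands:
        if c[0] not in allowed_cmds: raise DisallowedCmd(c)

def describe(commands, used_ops):
    "Report whether the parsed command would pass validation"
    try: gate(commands, used_ops, {'ls', 'grep'}, {'|'})
    except PermissionError as e: return f'rejected: {e}'
    return 'ok'

print(describe([['ls', '-la'], ['grep', 'py']], {'|'}))  # ok
print(describe([['rm', '-rf', '/']], set()))             # rejected: Disallowed command: rm -rf /
print(describe([['echo', 'hi']], {'>'}))                 # rejected: Disallowed operators: {'>'}
```

One detail worth keeping in mind: in Python 3, `IOError` is an alias of `OSError`, and `PermissionError` is a subclass of it. An `except IOError` clause therefore also catches `DisallowedCmd` and `DisallowedOps`, so put the `except PermissionError` clause first when the two cases need different handling.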

The module comes with a predefined set of safe commands in `ok_cmds`. This includes common read-only utilities like `cat`, `grep`, `ls`, `diff`, as well as safe git subcommands like `git log`, `git status`, and `git diff`. The `find` command is included with a denied list that blocks `-exec`, `-delete`, and similar dangerous flags.

If you want to start with a clean slate, call `clear_cmds()` to empty the allowlist. Then use `add_cmds()` to add your own commands. You can pass simple command names as strings (e.g., `add_cmds('cat', 'ls')`), multi-word prefixes as space-separated strings (e.g., `add_cmds('git log', 'git status')`), or `CmdSpec` objects for commands that need denied flags (e.g., `add_cmds(CmdSpec('find', denied=['-exec', '-delete']))`).

You can also customize the allowed operators by passing an `ops` parameter to `safe_run()`. The default set is `ok_ops = {'|', '<', '&&', '||', ';'}`, which allows pipes, input redirection, logical operators, and command sequences, but blocks output redirection. If you want to allow writing to files, you could call `safe_run(cmd, ops=ok_ops | {'>', '>>'})`.
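The operator check amounts to a set difference: anything used by the command but missing from the allowed set is a violation. Here is a minimal sketch, where `disallowed_ops` is a hypothetical helper and `ok_ops` repeats the default set quoted above.

```python
ok_ops = {'|', '<', '&&', '||', ';'}  # default set, as quoted in the text above

def disallowed_ops(used_ops, ops=ok_ops):
    "Return the operators in `used_ops` that are not in the allowed set `ops`"
    return set(used_ops) - set(ops)

print(disallowed_ops({'|', '&&'}))                  # set()
print(disallowed_ops({'|', '>'}))                   # {'>'}
print(disallowed_ops({'>'}, ok_ops | {'>', '>>'}))  # set()
```

Using `ok_ops | {'>', '>>'}` builds a new set for a single call and leaves the default `ok_ops` unchanged, so other calls keep blocking output redirection.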

In [None]:
def run(cmd, ignore_ex=False):
    "Run `cmd` in shell; return stdout (+ stderr if any); raise IOError on failure"
    res = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    out = res.stdout.strip()
    if res.stderr: out += ('\n' if out else '') + res.stderr.strip()
    if ignore_ex: return (res.returncode, out)
    if res.returncode: raise IOError(out)
    return out

In [None]:
from fastcore.test import test_fail,test_eq

In [None]:
test_eq(run('echo hello'), 'hello')
test_eq(run('echo out; echo err >&2'), 'out\nerr')
test_eq(run('exit 1', ignore_ex=True), (1, ''))
test_eq(run('echo fail >&2; exit 1', ignore_ex=True), (1,'fail'))
test_fail(lambda: run('exit 1'))

In [None]:
def add_cmds(*cmds):
    "Add commands to `ok_cmds`; strings are split into tuples, CmdSpec passed through"
    ok_cmds.update(tuple(c.split()) if isinstance(c,str) else c for c in cmds)

In [None]:
class CmdSpec(BasicRepr):
    def __init__(self,
        name,  # the command (str, will be split into tuple)
        denied=None):  # if set, these flags blocked
        self.name = tuple(name.split())
        self.denied = set(denied or [])

    def __hash__(self): return hash(self.name)
    def __eq__(self, b): return isinstance(b, CmdSpec) and self.name==b.name
    
    def __repr__(self):
        s = ' '.join(self.name)
        if self.denied: s += f' !{self.denied}'
        return s
    
    def __call__(self, toks):
        "Return True if `toks` matches this command and uses no denied flag, else False"
        if tuple(toks[:len(self.name)]) != self.name: return False
        return not (self.denied and self.denied & set(toks))

In [None]:
find = CmdSpec('find', denied=['-exec', '-delete'])
find

find !{'-delete', '-exec'}

In [None]:
assert find(['find', '.', '-name', '*.py'])
assert not find(['find', '.', '-exec', 'rm'])
assert not find(['ls', '-la'])

In [None]:
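def add_cmds(*cmds):
    "Add commands to `ok_cmds`; strings are wrapped in `CmdSpec`, `CmdSpec` objects are passed through"
    ok_cmds.update(c if isinstance(c, CmdSpec) else CmdSpec(c) for c in cmds)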

In [None]:
ok_ops = {'|', '<', '&&', '||', ';'}

ok_cmds = set()

add_cmds(
    'cat', 'head', 'tail', 'less', 'more', 'bat',
    'ls', 'tree', 'locate',
    'grep', 'rg', 'ag', 'ack', 'fgrep', 'egrep',
    'cut', 'sort', 'uniq', 'wc', 'tr', 'column',
    'file', 'stat', 'du', 'df', 'which', 'whereis', 'type',
    'diff', 'cmp', 'comm',
    'tar', 'unzip', 'gunzip', 'bunzip2', 'unrar',
    'curl', 'wget', 'ping', 'dig', 'nslookup', 'host',
    'date', 'cal', 'uptime', 'whoami', 'hostname', 'uname', 'env', 'printenv',
    'echo', 'printf', 'yes', 'seq', 'basename', 'dirname', 'realpath',
    CmdSpec('find', denied=['-exec', '-execdir', '-delete', '-ok', '-okdir']),
    'git log', 'git show', 'git diff', 'git status', 'git branch',
    'git tag', 'git remote', 'git stash list', 'git blame', 'git shortlog',
    'git describe', 'git rev-parse', 'git ls-files', 'git ls-tree',
    'git cat-file', 'git config --get', 'git config --list',
    'git fetch', 'git add', 'git commit', 'git switch', 'git checkout',
)

In [None]:
def validate_cmd(toks, cmds=None):
    "Check if toks matches an allowed command"
    if cmds is None: cmds = ok_cmds
    return any(spec(toks) for spec in cmds)

In [None]:
assert validate_cmd(['ls', '-la'])
assert validate_cmd(['git', 'status'])
assert validate_cmd(['find', '.', '-name', '*.py'])
assert not validate_cmd(['find', '.', '-exec', 'rm'])
assert not validate_cmd(['rm', '-rf', '/'])
assert not validate_cmd(['git', 'push'])

In [None]:
class DisallowedOps(PermissionError):
    def __init__(self, ops): super().__init__(f"Disallowed operators: {ops}")

class DisallowedCmd(PermissionError):
    def __init__(self, cmd): super().__init__(f"Disallowed command: {' '.join(cmd)}")

def safe_run(cmd, cmds=None, ops=None):
    "Run `cmd` in shell if all commands and operators are in allowlists, else raise"
    if ops is None: ops = ok_ops
    commands, used_ops = extract_commands(cmd)
    if bad_ops := used_ops - ops: raise DisallowedOps(bad_ops)
    for c in commands:
        if not validate_cmd(c, cmds): raise DisallowedCmd(c)
    return run(cmd)

In [None]:
test_eq(safe_run('ls'), run('ls'))
test_eq(safe_run('echo hello | cat'), 'hello')
test_fail(lambda: safe_run('rm -rf /'), contains='Disallowed command')
test_fail(lambda: safe_run('echo hi > file'), contains='Disallowed operators')
test_fail(lambda: safe_run('find . -exec rm'), contains='Disallowed command')

NameError: name 'extract_commands' is not defined

In [None]:
def clear_cmds():
    "Remove all commands from ok_cmds"
    ok_cmds.clear()

At the top of this dialog I've printed the json nb source of the module we use here for parsing. Let's start writing our notebook to export now, following this stylistic lead. Create messages at the bottom of this dialog with an H2 header 'introduction', and then explain what this module does. Note there's a separate index page explaining the overall lib, so we're just documenting this module. Be generous and thorough in your explanations.

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠



<details class='tool-usage-details'>

```json
{
  "id": "toolu_01SXhLUEzWi77SkLykHvmXs4",
  "call": {
    "function": "add_msg",
    "arguments": {
      "content": "## Introduction",
      "placement": "at_end"
    }
  },
  "result": "'_e89821a4'"
}
```

</details>



<details class='tool-usage-details'>

```json
{
  "id": "toolu_01DjWHfVjceiGdX7E8V267fi",
  "call": {
    "function": "add_msg",
    "arguments": {
      "content": "`safecmd.core` provides a safe execution layer for shell commands. It's designed for situations where you need to run bash commands from untrusted sources\u2014such as LLM-generated commands\u2014while ensuring they can't modify your system in dangerous ways.\n\nThe module builds on top of `safecmd.bashxtract` (which parses bash into an AST and extracts commands) to validate commands against an allowlist before execution. The key insight is that rather than trying to blacklist dangerous commands (which is error-prone), we whitelist a generous set of read-only commands that are safe to run.\n\nThe core workf<TRUNCATED>",
      "placement": "at_end"
    }
  },
  "result": "'_25dc51ea'"
}
```

</details>

I've added the Introduction section. Shall I continue with the next section, perhaps documenting `run()` and `CmdSpec`?