44 commits
8da6764
Implement awk phase 4 practical features
matt-dz May 8, 2026
a1a2bfe
Add awk user-defined functions
matt-dz May 8, 2026
03de854
Add awk output command pipes
matt-dz May 8, 2026
7fd4a3b
Expand enabled awk scenario coverage
matt-dz May 8, 2026
65e924e
Drop diagnostic-only awk oracle scenario
matt-dz May 8, 2026
b5e7b0f
Preserve awk gsub anchors across empty matches
matt-dz May 8, 2026
29b3a4d
Decode awk octal string escapes
matt-dz May 8, 2026
70421f3
Decode awk regex octal escapes
matt-dz May 8, 2026
2d70556
Preserve awk regex high-bit octal bytes
matt-dz May 8, 2026
0599d1e
Handle awk dynamic regex high-bit bytes
matt-dz May 8, 2026
628b168
Encode mixed awk regex byte-mode patterns
matt-dz May 8, 2026
1435540
Map awk byte regex match offsets to runes
matt-dz May 8, 2026
1489cdd
Fix awk parameter alias and pipe close semantics
matt-dz May 8, 2026
e5d1505
Fix awk ternary assignments and shared params
matt-dz May 8, 2026
870dd10
Mark awk parameter scalar reads
matt-dz May 8, 2026
c5e78b9
Fix awk length of lazy ENVIRON
matt-dz May 8, 2026
73ef1b1
Reserve awk special names in functions
matt-dz May 9, 2026
5d628d8
Reject awk function names as variables
matt-dz May 9, 2026
03c6ea3
Reject awk calls through shadowing params
matt-dz May 9, 2026
51e472c
Reject awk loop control outside lexical loops
matt-dz May 9, 2026
18da798
docs(awk): expand help profile
matt-dz May 11, 2026
05c6376
feat(awk): support getline input streams
matt-dz May 11, 2026
a8b450c
fix(awk): pass stdin to command getline pipes
matt-dz May 11, 2026
27744bf
fix(awk): preserve stdin for file getline pipes
matt-dz May 11, 2026
a952ae6
ci: trigger phase 4 checks
matt-dz May 11, 2026
6f18714
fix(awk): allow command pipe stdin reader
matt-dz May 11, 2026
5e9908c
fix(awk): stabilize rewritten getline tests
matt-dz May 11, 2026
9acf7c3
fix(awk): avoid disallowed prefix helper
matt-dz May 11, 2026
02812ef
docs(awk): clarify help sandbox wording
matt-dz May 11, 2026
632d9c7
docs(awk): refine file redirection help
matt-dz May 11, 2026
5bd542f
fix(awk): address qa investigation gaps
matt-dz May 13, 2026
2b542d5
fix(awk): align match captures and strtonum prefixes
matt-dz May 14, 2026
85c6e3b
fix(awk): parse invalid octal strtonum prefixes
matt-dz May 14, 2026
5b3fbfe
fix(awk): preserve output pipe ordering
matt-dz May 14, 2026
3182fdb
fix(awk): keep output pipes open across empty stdout writes
matt-dz May 14, 2026
e8540cc
fix(awk): avoid disallowed context fallback
matt-dz May 14, 2026
00b2733
fix(awk): keep reused output pipes open across stdout
matt-dz May 14, 2026
2e88b73
fix(awk): keep loop-reused output pipes open
matt-dz May 14, 2026
e77a79b
fix(awk): preserve pipe context across functions
matt-dz May 14, 2026
47847af
fix(awk): close inputs on runtime errors
matt-dz May 14, 2026
01543d9
fix(awk): honor ignorecase truthiness in sorting
matt-dz May 14, 2026
9c15197
fix(awk): delay stdout around reused command pipes
matt-dz May 14, 2026
798982c
fix(awk): keep command pipes open across records
matt-dz May 14, 2026
77f2b4e
fix(awk): handle dynamic command pipe close lookahead
matt-dz May 14, 2026
2 changes: 1 addition & 1 deletion SHELL_FEATURES.md
@@ -7,7 +7,7 @@ The in-shell `help` command mirrors these feature categories: run `help` for a c

## Builtins

- ✅ `awk [-F SEP] [-v NAME=VALUE] ['PROGRAM'|-f PROGRAM-FILE] [FILE]...` — pattern scanning and text processing; supports BEGIN/main/END rules, fields and field mutation (`$0`, `$1`, `$NF`), `NF`/`NR`/`FNR`/`FILENAME`, `FS`/`OFS`/`ORS`, regex `FS`, `print`, `printf`, scalar and associative array assignment, `split`, `in`, `delete`, `for`, `while`, `break`, `continue`, range patterns, arithmetic/comparison/boolean expressions, regex patterns and `~`/`!~`, string concatenation, `if`/`else`, `next`, `ENVIRON`, and scalar builtins (`length`, `substr`, `index`, `tolower`, `toupper`, `int`); `system()`, command pipes, output redirection, `getline`, user-defined functions, and many POSIX/GNU awk builtins remain rejected or deferred
- ✅ `awk [-F SEP] [-v NAME=VALUE] ['PROGRAM'|-f PROGRAM-FILE] [FILE]...` — pattern scanning and text processing; supports BEGIN/main/END rules, fields and field mutation (`$0`, `$1`, `$NF`), `NF`/`NR`/`FNR`/`FILENAME`, `FS`/`RS`/`OFS`/`ORS`/`SUBSEP`, `RSTART`/`RLENGTH`, regex `FS`, single-character `RS`, `IGNORECASE`, `print`, `printf`, `sprintf`, scalar and associative array assignment, composite array keys, `split`, `sub`, `gsub`, `gensub`, `match` with capture arrays, `strtonum`, `asorti`, `in`, `delete`, `for`, `while`, `break`, `continue`, `exit`, range patterns, arithmetic/comparison/boolean/ternary expressions, regex patterns and `~`/`!~`, string concatenation, `if`/`else`, `next`, `ENVIRON`, user-defined functions with `return` and scalar or array parameters, current/file/command-pipe `getline`, output command pipes through rshell builtins, and scalar builtins (`length`, `substr`, `index`, `tolower`, `toupper`, `int`); `system()`, file output redirection, ARGV/ARGC mutation, BEGINFILE/ENDFILE, `nextfile`, include/load, namespaces, indirect calls, FIELDWIDTHS/FPAT/CSV mode, PROCINFO/SYMTAB/FUNCTAB, extension loading, and many POSIX/GNU awk utility builtins remain rejected or deferred
- ✅ `break` — exit the innermost `for` loop
- ✅ `cat [-AbeEnstTuv] [FILE]...` — concatenate files to stdout; supports line numbering, blank squeezing, and non-printing character display
- ✅ `continue` — skip to the next iteration of the innermost `for` loop
4 changes: 4 additions & 0 deletions analysis/symbols_builtins.go
@@ -29,6 +29,9 @@ package analysis
var builtinPerCommandSymbols = map[string][]string{
"awk": {
"bufio.NewScanner", // 🟢 line-by-line record reading; no write or exec capability.
"bufio.Scanner", // 🟢 scanner type retained for incremental getline state; no write or exec capability.
"bytes.Buffer", // 🟢 in-memory command pipe buffer; no filesystem/network/exec side effects.
"bytes.NewReader", // 🟢 wraps buffered command-pipe bytes as stdin; pure in-memory, no I/O.
"context.Context", // 🟢 deadline/cancellation plumbing; pure interface, no side effects.
"errors.Is", // 🟢 error comparison; pure function, no I/O.
"errors.New", // 🟢 creates a simple error value; pure function, no I/O.
@@ -37,6 +40,7 @@ var builtinPerCommandSymbols = map[string][]string{
"io.EOF", // 🟢 sentinel error value; pure constant.
"io.NopCloser", // 🟢 wraps a Reader with a no-op Close; no side effects.
"io.ReadCloser", // 🟢 interface type; no side effects.
"io.Reader", // 🟢 interface type for command-pipe stdin; no side effects.
"math/big.Float", // 🟢 arbitrary-precision float type used to convert large awk printf integers; pure in-memory arithmetic.
"math/big.Int", // 🟢 arbitrary-precision integer type used for large awk printf integers; pure in-memory arithmetic.
"math/big.NewInt", // 🟢 constructs an in-memory integer value; pure function, no I/O.
79 changes: 67 additions & 12 deletions builtins/awk/ast.go
@@ -6,7 +6,14 @@
package awk

type program struct {
rules []rule
rules []rule
functions map[string]*functionDef
}

type functionDef struct {
name string
params []string
body []stmt
}

type ruleKind int
@@ -29,12 +36,14 @@ type stmt interface {

type printStmt struct {
args []expr
pipe expr
}

func (*printStmt) stmtNode() {}

type printfStmt struct {
args []expr
pipe expr
}

func (*printfStmt) stmtNode() {}
@@ -43,6 +52,7 @@ type ifStmt struct {
cond expr
thenStmts []stmt
elseStmts []stmt
endsBlock bool
}

func (*ifStmt) stmtNode() {}
@@ -51,22 +61,25 @@ type forInStmt struct {
varName string
arrayName string
body []stmt
endsBlock bool
}

func (*forInStmt) stmtNode() {}

type forStmt struct {
init expr
cond expr
post expr
body []stmt
init expr
cond expr
post expr
body []stmt
endsBlock bool
}

func (*forStmt) stmtNode() {}

type whileStmt struct {
cond expr
body []stmt
cond expr
body []stmt
endsBlock bool
}

func (*whileStmt) stmtNode() {}
@@ -75,6 +88,18 @@ type nextStmt struct{}

func (*nextStmt) stmtNode() {}

type exitStmt struct {
status expr
}

func (*exitStmt) stmtNode() {}

type returnStmt struct {
value expr
}

func (*returnStmt) stmtNode() {}

type breakStmt struct{}

func (*breakStmt) stmtNode() {}
@@ -84,9 +109,9 @@ type continueStmt struct{}
func (*continueStmt) stmtNode() {}

type deleteStmt struct {
name string
index expr
all bool
name string
indices []expr
all bool
}

func (*deleteStmt) stmtNode() {}
@@ -127,12 +152,18 @@ type varExpr struct {
func (*varExpr) exprNode() {}

type arrayRefExpr struct {
name string
index expr
name string
indices []expr
}

func (*arrayRefExpr) exprNode() {}

type compositeExpr struct {
parts []expr
}

func (*compositeExpr) exprNode() {}

type fieldExpr struct {
index expr
}
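The change from a single `index` to `indices []expr`, together with `compositeExpr`, corresponds to awk's composite subscripts such as `arr[i, j]`, which awk stores under one string key joined by `SUBSEP` (default `"\034"`). A minimal sketch of that joining step, assuming the subscript expressions have already been evaluated to strings; `compositeKey` is a hypothetical name, not a function from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// subsep is awk's default SUBSEP value, "\034" in octal.
const subsep = "\x1c"

// compositeKey (hypothetical helper) joins evaluated subscript values into
// the single string key that an awk associative array stores.
func compositeKey(parts []string, sep string) string {
	return strings.Join(parts, sep)
}

func main() {
	arr := map[string]string{}
	arr[compositeKey([]string{"row1", "col2"}, subsep)] = "x"

	// The same element is reachable through the manually joined key.
	_, ok := arr["row1"+subsep+"col2"]
	fmt.Println(ok) // true
}
```

A single-element subscript joins to itself, so the same path can serve `arr[i]` and `arr[i, j]`.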
@@ -160,6 +191,14 @@ type binaryExpr struct {

func (*binaryExpr) exprNode() {}

type ternaryExpr struct {
cond expr
then expr
els expr
}

func (*ternaryExpr) exprNode() {}

type rangeExpr struct {
start expr
end expr
@@ -189,3 +228,19 @@ type callExpr struct {
}

func (*callExpr) exprNode() {}

type getlineSourceKind int

const (
getlineMain getlineSourceKind = iota
getlineFile
getlineCommand
)

type getlineExpr struct {
target expr
source expr
kind getlineSourceKind
}

func (*getlineExpr) exprNode() {}
42 changes: 34 additions & 8 deletions builtins/awk/awk.go
@@ -14,15 +14,22 @@
// This implements a practical, intentionally restricted awk profile: program
// loading from an inline argument or -f files, -F field
// separators, -v scalar variables, BEGIN/main/END rules, print and printf,
// scalar and associative array assignment, if/else, for/while loops, next,
// arithmetic/comparison/boolean expressions, regex patterns and match
// operators, regex field separators, string concatenation, scalar built-in
// functions, split, delete, ENVIRON, and field/built-in variables such as $0,
// $1, NF, NR, FNR, FILENAME, FS, OFS, and ORS.
// scalar and associative array assignment, composite array keys, if/else,
// for/while loops, next, exit, arithmetic/comparison/boolean/ternary
// expressions, regex patterns and match operators, regex field separators,
// string concatenation, scalar built-in functions, split, sub, gsub, gensub,
// match, sprintf, strtonum, asorti, delete, ENVIRON, IGNORECASE,
// user-defined functions with return and scalar or
// array parameters, and field/built-in variables such as $0, $1, NF, NR, FNR,
// FILENAME, FS, RS, OFS, ORS, SUBSEP, RSTART, and RLENGTH.
//
// Blocked or deferred features include system(), command pipes, output
// redirection, getline, user-defined functions, and many additional POSIX/GNU
// awk builtins.
// Command strings in awk pipes are parsed and executed by rshell under the
// active sandbox. Blocked or deferred features include system(), awk file
// output redirection, ARGV/ARGC mutation, BEGINFILE/ENDFILE, nextfile,
// include/load, namespaces, FIELDWIDTHS/FPAT/CSV mode, introspection
// variables such as PROCINFO/SYMTAB/FUNCTAB, indirect calls, and many
// additional POSIX/GNU awk builtins.
package awk

import (
@@ -139,7 +146,26 @@ func registerFlags(fs *builtins.FlagSet) builtins.HandlerFunc {
func printHelp(callCtx *builtins.CallContext, fs *builtins.FlagSet) {
callCtx.Out("Usage: awk [OPTION]... 'program' [FILE]...\n")
callCtx.Out("Pattern scanning and text processing.\n")
callCtx.Out("This is a practical rshell awk profile, not a full GNU awk clone.\n")
callCtx.Out("With no FILE, or when FILE is -, read standard input.\n\n")

callCtx.Out("Supported profile:\n")
callCtx.Out(" - Inline programs, -f program files, -F separators, -v assignments, FILE args, and - for stdin.\n")
callCtx.Out(" - BEGIN/main/END rules; regex, comparison, boolean, and range patterns.\n")
callCtx.Out(" - Fields and records: $0, $1..$NF, NF, NR, FNR, FILENAME, FS, RS, OFS, ORS, SUBSEP, RSTART, RLENGTH.\n")
callCtx.Out(" - Scalars, associative arrays, composite keys, ENVIRON, IGNORECASE, arithmetic, comparisons, regex match, ternary, and string concatenation.\n")
callCtx.Out(" - if/else, for, for-in, while, break, continue, next, exit, and user-defined functions with return.\n")
callCtx.Out(" - print, printf, sprintf, length, substr, index, tolower, toupper, int, split, sub, gsub, gensub, match, strtonum, asorti, delete, and close.\n")
callCtx.Out(" - Output command pipes such as print x | \"sort\" and rshell command strings such as print x | \"cat | sort\".\n")
callCtx.Out(" - getline, getline var, getline var < file, and \"cmd\" | getline var; file reads use rshell path policy and command strings run through rshell.\n\n")

callCtx.Out("Not supported:\n")
callCtx.Out(" - system(). Use supported awk command pipes/getline pipes instead; command strings run through rshell and its active sandbox.\n")
callCtx.Out(" - print/printf file output redirection to file targets, such as print x > \"file\" or printf ... >> \"file\". Output command pipes remain supported and their command strings follow normal rshell policy.\n")
callCtx.Out(" - ARGV/ARGC mutation, BEGINFILE/ENDFILE, nextfile, do/while, switch, include/load, namespaces, and indirect function calls.\n")
callCtx.Out(" - GNU awk CSV mode, FIELDWIDTHS, FPAT, PROCINFO, SYMTAB, FUNCTAB, typed regexps, and extension loading.\n")
callCtx.Out(" - Many GNU/POSIX utility builtins are intentionally absent, including asort, patsplit, math/time/random helpers, bitwise, typeof, and i18n functions.\n\n")

fs.SetOutput(callCtx.Stdout)
fs.PrintDefaults()
}