feat(awk): add sandboxed awk builtin with review-fix-loop tool by AlexandreYang · Pull Request #218 · DataDog/rshell

AlexandreYang · 2026-05-04T08:42:40Z

What does this PR do?

Implements awk as a sandboxed builtin command in the restricted shell interpreter. Also adds a review-fix-loop developer tool that automates iterative PR review cycles.

awk builtin

Full POSIX awk: lexer, parser, AST, interpreter, and printf formatting
Sandboxed by design: blocks getline, system(), close(), pipe redirects (|), file redirects (>, >>, <), fflush(), user-defined functions, and ENVIRON access
Memory-safe: bounded record size, output size cap, OFS/ORS length caps, printf precision cap
Arithmetic, string, array, pattern/action, BEGIN/END, built-in functions (split, sub, gsub, match, sprintf, etc.)
-F field separator, -v variable assignments, -f program file
GNU compat tests, fuzz tests, hardening tests, pentest integration tests
~100 scenario tests covering arithmetic, arrays, builtins, patterns, printf, security blocks, and edge cases

review-fix-loop tool (`tools/review-fix-loop/`)

Go CLI that drives an automated review → fix → re-review loop on a GitHub PR
Calls the alex-code-review and address-pr-comments Claude Code skills in sequence
Converges when review finds no new issues (configurable clean-streak threshold)
Loop guard prevents infinite cycles; verbose log written to temp file
Unit-tested pure functions

Testing

Unit tests: builtins/tests/awk/
Fuzz tests: builtins/tests/awk/awk_fuzz_test.go
Hardening tests: builtins/tests/awk/awk_hardening_test.go
Pentest integration: interp/builtin_awk_pentest_test.go
Scenario tests: tests/scenarios/cmd/awk/ (~100 YAML files, bash-compared)

Checklist

Tests added/updated
Documentation updated (SHELL_FEATURES.md)

AlexandreYang · 2026-05-04T10:50:46Z

@codex review this PR

chatgpt-codex-connector · 2026-05-04T10:51:53Z

Codex Review: Didn't find any major issues. Can't wait for the next one!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

AlexandreYang

Review Summary

This PR implements a sandboxed awk builtin — a full custom lexer, parser, and interpreter written in Go with no external awk binary dependency. The implementation is large (~2500 lines across 9 files) and well-structured. Security blocking (system(), getline, redirects, ENVIRON) is done at parse time, which is the correct approach. Memory limits, loop iteration caps, and context cancellation are properly threaded through. The RE2 regex engine is used throughout, eliminating ReDoS risk.

Overall assessment: needs fixes — two P1 correctness/safety issues should be addressed before merge.

Findings Summary

#	File	Finding
1	`builtins/awk/printf.go:145`	`printf` literal-width cap is `> 1<<30` — a width of exactly `1<<30` (1 GiB) is allowed and causes `strings.Repeat` to allocate ~1 GiB before the 1 MiB output cap fires
2	`builtins/awk/printf.go:48-55`	``-width from awk expression has no cap — `printf "%s\n", 2000000000, "x"` causes a ~2 GiB `strings.Repeat` allocation
3	`builtins/awk/interp.go:96+431`	`-F'\t'` (command-line backslash-t) does not split on tab — bash compat divergence
4	`builtins/awk/builtins.go:166`	`index(s, "")` returns `0` but GNU awk/gawk (bash) returns `1`
5	`tests/scenarios/cmd/awk/flags/`	Missing scenario test for `-F'\t'` (command-line tab separator)
6	`builtins/awk/interp.go:502`	`StatFile` failure causes the file to be treated as non-regular (capped at 256 MiB read) — safe but undocumented behaviour

Detailed Findings

Finding 1 — Printf literal width cap allows ~1 GiB allocation

Location: builtins/awk/printf.go, parsePrintfSpec, width-parsing loop (line ~145)

Description: The guard if spec.width > 1<<30 is exclusive, so width == 1<<30 (1,073,741,824) is accepted. padString/padNumber then call strings.Repeat(" ", padCount) where padCount can be up to ~1 GiB. This strings.Repeat call allocates the full padding string before the sb.Len() > MaxStringBytes guard in awkSprintf ever fires.

Proof-of-concept:

awk 'BEGIN { printf "%1073741824s\n", "x" }'
# Attempts to allocate ~1 GiB of spaces

Remediation: Lower the width cap to MaxStringBytes (or a practical maximum like 8192):

if spec.width > MaxStringBytes {
    return j, spec, fmt.Errorf("printf: width %d exceeds maximum %d", spec.width, MaxStringBytes)
}

Finding 2 — `*`-width from awk expression has no upper bound

Location: builtins/awk/printf.go, awkSprintf, *-width resolution (lines 48–55)

Description: When width is taken from a * placeholder, nextInt returns int(v.toNumber()) with no cap. A large positive value flows directly into padString/padNumber → strings.Repeat.

Proof-of-concept:

awk 'BEGIN { printf "%*s\n", 2000000000, "x" }'
# Tries to allocate ~2 GiB

Remediation: Clamp spec.width after the *-width resolution:

spec.width = w
if w < 0 {
    spec.flagMinus = true
    spec.width = -w
}
if spec.width > MaxStringBytes {
    spec.width = MaxStringBytes
}

Finding 3 — `-F'\t'` does not split on tab

Location: builtins/awk/interp.go, setFS (line 96) + splitFields (line 431)

Description: When the user passes -F'\t', the shell delivers the 2-char string \t (backslash + t) to setFS. setFS special-cases s != "\\t" to avoid compiling it as a regex, but stores it verbatim as r.fs = "\\t". splitFields checks r.fs == "\t" (1-byte actual tab), which does not match the 2-byte "\\t", so it falls through to strings.Split(rec, "\\t") — splitting on a literal \t string instead of tab.

Bash awk (gawk) expands backslash-escapes in FS:

printf 'a\tb\tc\n' | awk -F'\t' '{print $2}'  # → "b"

Our implementation produces $1 = "a\tb\tc" (no split).

Remediation: Expand common backslash escapes in setFS before storing:

func (r *runtime) setFS(s string) error {
    // Expand common backslash escapes (gawk/mawk behaviour for command-line FS).
    switch s {
    case "\\t":
        s = "\t"
    case "\\n":
        s = "\n"
    case "\\r":
        s = "\r"
    }
    // ... rest unchanged

Note: the existing splitFields special-case for r.fs == "\t" already handles the corrected value correctly. Add a scenario test in tests/scenarios/cmd/awk/flags/ (without skip_assert_against_bash) to lock this in.

Finding 4 — `index(s, "")` returns 0 instead of 1

Location: builtins/awk/builtins.go, bIndex (line 166)

Description: Our code returns 0 for an empty needle, matching the POSIX "unspecified" allowance. However GNU awk (the reference used by bash) returns 1:

docker run --rm debian:bookworm-slim bash -c 'awk "BEGIN { print index(\"abc\", \"\") }"'
# → 1

Since scenario tests run against bash by default, any future scenario test covering this edge case will fail.

Remediation:

if t == "" {
    return numValue(1), nil
}

Finding 5 — Missing scenario test for `-F'\t'`

Location: tests/scenarios/cmd/awk/flags/

Description: fs_tab.yaml covers FS = "\t" set inside the program (which goes through the awk lexer), but there is no test for -F'\t' on the command line (which bypasses the lexer and hits setFS directly). Once Finding 3 is fixed, this path should be regression-tested.

Remediation: Add a scenario tests/scenarios/cmd/awk/flags/fs_tab_flag.yaml:

description: -F'\t' splits tab-separated input correctly.
setup:
  files:
    - path: tsv.txt
      content: "name\tage\tcity\nalice\t30\tNYC\n"
input:
  allowed_paths: ["$DIR"]
  script: |+
    awk -F'\t' 'NR==2 {print $2}' tsv.txt
expect:
  stdout: |+
    30
  stderr: ""
  exit_code: 0

Test Coverage Summary

Code path	Scenario test	Go test	Status
`system()` blocked	`security/system_blocked.yaml`	`TestAwkPentestSystemCaseVariations`	Covered
`print > file` blocked	`security/redirect_write_blocked.yaml`	`TestAwkPentestRedirectVariations`	Covered
`getline` blocked	`security/getline_blocked.yaml`	`TestAwkPentestGetlineVariations`	Covered
`ENVIRON` blocked	`security/environ_blocked.yaml`	—	Covered
Loop iteration cap	—	`TestAwkPentestInfiniteWhile`	Covered
Array entry cap	—	`TestHardeningArrayEntryLimit`	Covered
String length cap	—	`TestHardeningStringConcatLimit`	Covered
Record size cap	—	`TestAwkPentestLineBeyondCap`	Covered
Deep nesting rejected	—	`TestAwkPentestDeepNestingRejected`	Covered
Printf large literal width OOM	—	—	Missing
Printf `*`-width OOM	—	—	Missing
`-F'\t'` command-line tab	—	—	Missing
`index(s, "")` empty needle	—	—	Missing

Coverage verdict: Gaps found (P1 and P2 paths untested).

Positive Observations

Parse-time blocking for all dangerous constructs is correct and comprehensive.
RE2 engine throughout eliminates ReDoS risk.
Context cancellation is threaded through every loop and I/O read.
callCtx.OpenFile used exclusively — sandbox invariant is intact.
floatToInt64Safe correctly handles NaN/Inf before int64 cast.
capReader properly bounds non-regular-file reads at 256 MiB.
Import allowlist in analysis/symbols_builtins.go is complete and accurate.
Fuzz tests cover program text, input data, field separators, and variable assignments.

AlexandreYang · 2026-05-04T10:57:48Z

Iteration 1 self-review result

Findings by severity:

P1: 2 (printf width OOM — literal-width guard off-by-one allows ~1 GiB allocation; *-width placeholder unbounded)
P2: 3 (-F'\\t' from CLI not splitting on tab; index(s,"") returns 0 instead of 1; missing scenario test for CLI -F'\\t')
P3: 1 (StatFile failure silently degrades to capped reads)

Summary: Two resource-exhaustion bugs (printf width OOM) must be fixed. Tab-FS handling and index() edge case are bash-compatibility issues.

Implements a complete awk interpreter (lexer, parser, evaluator) as a native Go builtin — no external binary. Covers arithmetic, arrays, patterns, printf, and all standard built-in functions, with security restrictions on getline, system, pipe, redirect, and ENVIRON.

…edle - Cap printf literal width at MaxStringBytes (was > 1<<30, off by one allowing 1 GiB allocation before output guard fires) - Clamp *-width after arg resolution to MaxStringBytes (prevents ~2 GiB allocation from large awk expression values) - Expand \t/\n/\r backslash escapes in setFS so -F'\t' splits on actual tab (matches gawk/mawk behaviour) - Return 1 for index(s,"") empty needle (GNU awk compatibility) - Add fs_tab_flag.yaml scenario test for command-line -F'\t' path

AlexandreYang · 2026-05-04T11:01:01Z

[Claude Sonnet 4.6] Addressed all review findings from self-review (review #4219389257):

Finding 1 (P1) — Printf literal-width cap: Changed guard from > 1<<30 to > MaxStringBytes (1 MiB) in parsePrintfSpec. The off-by-one allowed a 1 GiB width to slip through.
Finding 2 (P1) — *-width OOM: Added if spec.width > MaxStringBytes { spec.width = MaxStringBytes } after *-width resolution in awkSprintf. Clamps large awk expression values before they reach strings.Repeat.
Finding 3 (P2) — -F'\t' command-line tab: Added backslash-escape expansion (\t→\t, \n→\n, \r→\r) in setFS before storing, matching gawk/mawk behaviour. Removed the now-unnecessary s != "\\t" special-case in the regex-compile guard.
Finding 4 (P2) — index(s, "") empty needle: Changed return value from 0 to 1 to match GNU awk behaviour.
Finding 5 (P2) — Missing -F'\t' scenario test: Added tests/scenarios/cmd/awk/flags/fs_tab_flag.yaml covering the command-line -F'\t' path (not skip_assert_against_bash).
Finding 6 (P3) — StatFile failure silent degradation: Acknowledged — this is safe (falls back to bounded 256 MiB cap) and the code path is non-user-controllable. No change needed.

All tests pass.

AlexandreYang · 2026-05-04T11:42:04Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 294aa22842

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

AlexandreYang · 2026-05-04T11:49:15Z

@codex review this PR

AlexandreYang · 2026-05-04T11:52:52Z

@codex review this PR

AlexandreYang · 2026-05-04T11:53:58Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5ec6f1d9a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…ess visibility

AlexandreYang · 2026-05-04T12:00:26Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f67e090753

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

AlexandreYang · 2026-05-04T12:10:28Z

@codex review this PR

AlexandreYang · 2026-05-04T12:16:45Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b4f165a5b4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…unds

AlexandreYang · 2026-05-04T12:18:13Z

@codex review this PR

… --verbose flag

AlexandreYang

Review Summary

Scope: Full sandboxed awk builtin (~10 K lines across lexer, parser, AST, interpreter, printf engine, value system, hardening tests, ~100 scenario tests, pentest integration tests) plus a review-fix-loop developer tool and a minor tail cap reduction.

Overall assessment: safe to merge. The sandbox is correctly implemented — no direct filesystem access bypasses callCtx.OpenFile, all dangerous constructs (system(), getline, ENVIRON, redirects, pipes, user functions) are blocked at parse time, and memory/loop/output caps are thorough. The implementation is impressive in scope and quality. The findings below are all P2/P3.

Finding Summary

#	File	Finding
1	`builtins/awk/builtins.go:331`	`gsub` match-count cap not covered by a test
2	`tests/scenarios/cmd/awk/security/`	No scenario test for `-f progfile` being rejected
3	`builtins/awk/interp.go:604`	`splitFields` silently truncates to `MaxFields` without an error
4	`builtins/awk/interp.go:75,77`	`callDepth` is shared between `execStmt` and `evalExpr`, halving the effective recursion budget

Test Coverage Table

Code path	Scenario test	Go test	Status
`system()` blocked	`security/system_blocked.yaml`	`TestAwkPentestSystemCaseVariations`	Covered
`getline` blocked	`security/getline_blocked.yaml`	`TestAwkPentestGetlineVariations`	Covered
`ENVIRON` blocked (all forms)	6× `environ_*.yaml`	—	Covered
redirects/pipes blocked	`redirect_*.yaml`, `pipe_blocked.yaml`	`TestAwkPentestRedirectVariations`	Covered
user-defined functions blocked	`function_blocked.yaml`	—	Covered
`-f progfile` blocked	none	`TestAwkPentestUnknownFlags` (covers `--no-such` but not `-f`)	Missing scenario
Record cap (1 MiB)	—	`TestAwkPentestLineBeyondCap`	Covered
Array entry cap	—	`TestAwkPentestArrayCap`	Covered
Array total byte cap	—	(implicit via above)	Covered
String concat cap	—	`TestAwkPentestStringDoublingCap`	Covered
Loop iteration cap	—	`TestAwkPentestInfiniteWhile`	Covered
gsub match-count cap (>1M matches)	none	none	Missing
`/dev/zero` read cap	—	`TestAwkPentestDevZeroInfiniteStream`	Covered
Deep nesting rejected	—	`TestAwkPentestDeepNestingRejected`	Covered
Printf precision cap	`security/printf_precision_cap.yaml`	—	Covered
Path traversal	—	`TestHardeningPathTraversal`	Covered

Positive Observations

No direct filesystem access — every file open in the awk implementation goes through callCtx.OpenFile, which enforces AllowedPaths. The single "os" import in interp.go is used exclusively for the os.O_RDONLY constant. Excellent discipline.
Parse-time blocking — system(), getline, ENVIRON, redirects, pipes, and user-defined functions are all rejected at the lexer/parser level before any runtime code runs, preventing any runtime bypass via creative expression evaluation.
RE2 for all regex — regexp.Compile (RE2 engine) is used throughout; ReDoS attacks are structurally impossible.
Thorough memory caps — MaxRecordBytes, MaxStringBytes, MaxArrayEntries, MaxArrayTotalBytes, MaxLoopIterations, MaxTotalReadBytes, MaxFields all have independent guards with clear error messages.
floatToInt64Safe — NaN/±Inf are handled before every int64() cast, preventing UB from the Go spec's implementation-defined conversion behaviour.
TOCTOU race documented — the race between StatFile and OpenFile in openInput is clearly documented with correct analysis: it only affects the non-regular-file cap (defence-in-depth), not the sandbox itself.
SetInterspersed(false) — correctly prevents flag injection via arguments after the program text.
Exit code mod-256 wrapping — uint8(n & 0xFF) matches POSIX/mawk semantics for negative exit codes.
Deterministic for (k in arr) iteration — sorted key snapshot prevents nondeterministic output across runs.

AlexandreYang · 2026-05-05T03:37:28Z

[AI Generated] Self-review iteration 24 complete — see inline review comments for findings.

AlexandreYang · 2026-05-05T03:44:48Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14de466c5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-05T03:50:29Z

+	case 'g', 'G', 'f', 'F', 'e', 'E':
+		if prec < 0 {
+			prec = 6
+		}
+		return strconv.FormatFloat(f, byte(verb), prec, 64), true


Map %F before formatting OFMT/CONVFMT numbers

When a script sets OFMT or CONVFMT to %F, numeric conversion now returns the literal string %F because strconv.FormatFloat does not implement an uppercase F verb. This affects ordinary awk programs such as awk 'BEGIN{OFMT="%F"; print 1.25}', which GNU/mawk print as 1.250000 but this implementation prints %F; fold F to f here as awkSprintf already does for printf.

Useful? React with 👍 / 👎.

AlexandreYang

Review Summary

This PR introduces a sandboxed awk builtin with a full POSIX-subset implementation (lexer → parser → AST → interpreter), plus a developer review-fix-loop automation tool. I reviewed the full diff — all new files plus the tail.go doc-fix.

Overall Assessment: ✅ Safe to merge

The security architecture is sound. All dangerous awk constructs (system(), getline, redirects, pipes, ENVIRON, user-defined functions, -f progfile) are blocked at parse time via blockedNames, before any execution occurs. File access is exclusively routed through callCtx.OpenFile/callCtx.StatFile — no direct os.Open/os.Stat calls exist in the awk package. Memory safety is well-guarded with MaxRecordBytes, MaxStringBytes, MaxArrayEntries, MaxArrayTotalBytes, and MaxLoopIterations. RE2 (linear-time, no backtracking) eliminates ReDoS. Test coverage is excellent: ~100 scenario YAML tests, fuzz corpus, hardening tests, and pentest integration tests.

Only minor P3 observations follow.

Findings Summary

#	File	Finding
1	`tools/review-fix-loop/loop.go:82–84`	Informal "no data race today" comment — correct but slightly misleading
2	`builtins/awk/interp.go:1097–1109`	`indexKey` counts N SUBSEP separators instead of N-1 (documented, conservative, safe)
3	`tests/scenarios/cmd/awk/`	No scenario test for `delete arr` (whole-array delete) — covered by Go test `TestArrayDeleteAll`
4	`tests/scenarios/cmd/awk/security/`	No scenario test for `ARGV`/`ARGC` in program text (only positional `ENVIRON=` is tested)

Positive Observations

Sandbox integrity: Every file read goes through callCtx.OpenFile; no os.Open/os.Stat anywhere in the awk package. The single os import is for the O_RDONLY constant only.
Parse-time rejection: system(), getline, ENVIRON, ARGV, ARGC, close, fflush, user functions, and all I/O redirects are rejected in the parser before any runtime execution — defense-in-depth against obfuscated inputs.
RE2 regex: compileERE routes through regexp.Compile (Go's RE2 engine), giving linear-time guarantees. ReDoS is structurally impossible.
Memory caps: The layered approach (per-record, per-string, per-array-entry, per-array-total, per-loop-iteration) is well-thought-out. The arrayTotalBytes accounting correctly handles large-key attacks that stay within the entry count limit.
gsub match-count cap: FindAllStringSubmatchIndex result count is checked against MaxLoopIterations before the replacement loop, preventing DoS via dense-match patterns on large strings.
TOCTOU race in openInput between StatFile and OpenFile is explicitly documented with a clear justification — the AllowedPaths check inside OpenFile is the true sandbox guard; the isRegular flag only controls the 256 MiB streaming cap, whose bypass carries no security consequence.
Intentional divergences from bash (division-by-zero as error, sorted for-in, RS="" paragraph mode fallback, byte-indexed string functions) are all properly marked skip_assert_against_bash: true with matching documentation in SHELL_FEATURES.md.
review-fix-loop tool: All shell commands use exec.Command with argument slices (no shell injection). The prRef integer is formatted with %d. File paths use filepath.Join. The concurrent code-review + triggerCodex goroutines are race-free because triggerCodex is a standalone subprocess that never touches the shared agent state.

AlexandreYang · 2026-05-05T03:54:12Z

[AI Generated] Self-review iteration 25 complete — see inline review comments for findings.

AlexandreYang · 2026-05-05T03:58:59Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14de466c5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-05T04:05:23Z

+		// as non-clean: if a review or fix agent fails, we cannot be confident the
+		// iteration is valid even when thread counts and CI appear clean.
+		agentErr := reviewAgentErr != nil || addrAgentErr != nil || fixCIAgentErr != nil
+		iterClean := reviewFindings == 0 && unresolved == 0 && ciClean && unresolvedErr == nil && ciErr == nil && !agentErr


Treat failed review counts as non-clean

When either the pre- or post-review countUnresolvedThreads call fails (for example due to a transient GitHub GraphQL/rate-limit/auth error), the zero values for beforeReview/afterReview still produce reviewFindings == 0, and this clean check only gates on the later unresolvedErr. If the final unresolved lookup and CI pass, the loop can advance the clean streak and eventually report CLEAN without ever measuring whether the review step opened new threads; include beforeErr/afterErr in the non-clean condition so unknown review results cannot be treated as clean.

Useful? React with 👍 / 👎.

AlexandreYang

Review Summary

This PR introduces a sandboxed awk builtin (plus a review-fix-loop development tool). The awk implementation is large (~1,800 LOC interpreter + parser + lexer + printf engine) with extensive test coverage (scenario YAML tests, Go unit/hardening/fuzz/pentest tests). Overall the security posture is strong.

Overall Assessment: ✅ Safe to merge — with minor notes

Security verdict: No exploitable vulnerabilities found. The sandbox perimeter is intact:

system(), ENVIRON, getline, | pipes, and all file redirections are rejected at parse time — no runtime bypass possible.
All file I/O goes through callCtx.OpenFile (AllowedPaths enforced); no direct os.Open/os.Stat calls outside the sandbox API.
RE2 engine prevents ReDoS.
Memory bounds are comprehensive: per-record cap (1 MiB), string cap (1 MiB), array entries cap (1 M), array total-bytes cap (256 MiB), loop iteration cap (1 M).
Context cancellation is checked every loop iteration, every statement, and every record.

Summary of Findings

#	File	Finding
1	`builtins/awk/interp.go:374`	Misleading comment: "gawk exits 0 instead" — gawk actually uses END's explicit `exit()` code (which may be 0, 5, etc.); gawk doesn't always exit 0
2	`builtins/awk/interp.go:459`	TOCTOU between `StatFile` and `OpenFile`: acknowledged in comments but `regular=true` is still possible via symlink swap, bypassing the 256 MiB `capReader` guard
3	`tests/scenarios/cmd/awk/errors/missing_file_continues.yaml`	Comment says "mawk stops; rshell follows gawk", but tested mawk on debian:bookworm-slim also stops at the first missing file — comment may be inaccurate about gawk difference
4	`builtins/awk/interp.go:459`	`capReader.remaining` is initialised as `MaxTotalReadBytes - r.totalReadBytes` but `r.totalReadBytes` starts at 0 for each new runtime; for multiple non-regular files the cumulative cap is shared correctly — no bug, but the field name `remaining` is misleading since it tracks the running window, not global remaining
5	`builtins/awk/interp.go`	No explicit cap on the number of rules in a program; a maximally crafted program with many rules could cause O(rules × records) work

Positive Observations

The blocked-names list (system, ENVIRON, getline, function, close, fflush, ARGV, ARGC) is checked in every parse path: parsePrimary, parseDelete, parseForIn, parseCall, parseInExpr, parsePrimary (tkIdent). Comprehensive.
The bSub (gsub) zero-width match handling is correct: Go RE2's FindAllStringSubmatchIndex returns non-overlapping matches at distinct byte positions, so the last cursor advances correctly between empty matches.
rebuildRecord pre-checks OFS × NF before calling strings.Join to prevent a transient allocation of O(ofsLen × NF) bytes before the cap fires.
floatToInt64Safe correctly handles NaN/±Inf before any int64 cast.
All intentional divergences from bash (for-in sorted order, division-by-zero fatal, RS="" paragraph mode, byte-indexed string functions, index(s,"")=1) are documented in SHELL_FEATURES.md and covered by skip_assert_against_bash: true scenario tests.
The deterministicRand (xorshift64*) avoids math/rand global state and is clearly scoped per-runtime.
Fuzz corpus seeds are committed (builtins/tests/awk/testdata/fuzz/).
interp/builtin_awk_pentest_test.go is thorough: exercises /dev/zero, deep nesting, NF overflow, string doubling, path traversal, and all security-blocked forms.

AlexandreYang · 2026-05-05T04:11:22Z

[AI Generated] Self-review iteration 26 complete — see inline review comments for findings.

AlexandreYang · 2026-05-05T04:16:15Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14de466c5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-05T04:21:34Z

+}
+
+func (p *parser) parseAssignment() (expr, error) {
+	lhs, err := p.parseTernary()


Parse assignments on boolean RHS

For expressions like ok && x = 1 or cond ? x = 1 : 0, mawk/gawk parse the assignment as part of the RHS/branch; this implementation parses the whole ternary/logical expression as lhs before looking for =, so awk 'BEGIN{ if (1&&x=1) print x }' is rejected with invalid assignment target instead of printing 1. This breaks awk idioms that assign inside conditions unless users add extra parentheses.

Useful? React with 👍 / 👎.

AlexandreYang

Review Summary

This PR implements a sandboxed awk builtin (~10 400 lines added) and a review-fix-loop developer tool (~1 400 lines). The review focused primarily on sandbox integrity, memory safety, bash compatibility, and correctness of the awk implementation.

Overall Assessment: ✅ Safe to merge with minor improvements

The security model is sound. The critical invariant — all file access goes through callCtx.OpenFile (not raw os.Open/os.Stat) — is correctly maintained throughout. The parser-level blocking of system(), getline, ENVIRON, file redirects, and pipe redirects is robust and tested. Memory bounds (record size, string size, array entries, array total bytes, loop iterations, gsub match count) are comprehensive and correctly enforced. The RE2 engine eliminates ReDoS risk.

Four findings are raised: two P2 (unintentional bash-compat divergences) and two P3 (code quality). No P0 or P1 issues found.

Findings Summary

#	File	Finding
1	`builtins/awk/builtins.go:253`	`split()` with empty-matching regex diverges from mawk behavior
2	`builtins/awk/interp.go:1294`	`NF=-N` silently clamps to 0 (mawk errors; gawk clamps — diverges from bash/mawk)
3	`builtins/awk/interp.go:1021`	Variable `v` shadows outer `*deleteStmt v` in delete-array loop
4	`tests/scenarios/cmd/awk/`	Missing scenario tests for `NF=0` clearing `$0`, and `NF=-N` behavior

Positive Observations

Sandbox file access: All file I/O routes through callCtx.OpenFile / callCtx.StatFile. No direct os.Open, os.Stat, or ioutil calls anywhere in the awk package. This is the most critical invariant and it is correctly maintained. ✅
Parser-level security blocks: system(), ENVIRON, getline (all forms), file redirects (>, >>, |), function, close, fflush are rejected at parse time — before any runtime execution — making them immune to any runtime bypass attempts. ✅
Multiple memory caps working in tandem: per-record (1 MiB), per-string (1 MiB), per-array (1 M entries + 256 MiB total bytes), gsub match count (1 M), loop iterations (1 M), and total non-regular-file reads (256 MiB). The caps are independently enforced and tested. ✅
RE2 / linear-time regex: All regex compilation uses Go's regexp package (RE2), making ReDoS impossible. ✅
TOCTOU race acknowledged and acceptable: The StatFile → OpenFile race for the isRegular flag is well-commented. The worst case is losing the 256 MiB cap for that file; AllowedPaths inside OpenFile still enforces path boundaries. ✅
Parse-depth cap: maxParseDepth = 256 in the recursive-descent parser prevents stack-overflow DoS from deeply nested programs. Tested in both the hardening tests and pentest suite. ✅
Deterministic PRNG: Custom xorshift64* ensures rand()/srand() are reproducible in tests without exposing system state. ✅
Comprehensive test suite: ~100 YAML scenario tests, fuzz corpus, hardening tests, GNU compat tests, and a full pentest integration test covering all blocked constructs, integer edge cases, resource exhaustion, and binary inputs. ✅

AlexandreYang · 2026-05-05T04:25:09Z

[AI Generated] Self-review iteration 27 complete — see inline review comments for findings.

AlexandreYang · 2026-05-05T04:39:49Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14de466c5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-05T04:43:42Z

+	arr := r.arrays[v.arrayVar]
+	if arr == nil {
+		return numValue(0), nil


Preserve scalar-as-array errors for membership tests

When the right-hand side of an in expression has already been used as a scalar, this returns 0 instead of the same fatal scalar-as-array error enforced by index assignment, delete, and for (k in x). For example, awk 'BEGIN { x = 1; print ("a" in x); print "after" }' now prints 0 and continues, while awk treats this as an invalid scalar/array namespace reuse; scripts that rely on that error will silently take the false branch and keep running.

Useful? React with 👍 / 👎.

AlexandreYang

Review Summary

This PR adds a sandboxed awk builtin to the restricted shell plus a developer review-fix-loop tool. The awk implementation is a substantial piece of work: a hand-written lexer, recursive-descent parser, tree-walking interpreter, and a custom printf engine — all within the sandbox invariant.

Overall assessment: Safe to merge with minor fixes. No P0 sandbox-escape or P1 security/correctness issues found. The sandbox invariants are correctly upheld (all file access goes through callCtx.OpenFile/callCtx.StatFile; all dangerous awk constructs are blocked at parse time with multiple defence-in-depth checks; resource limits are consistently applied). The main findings are two P2 unintentional bash-divergence issues and a pair of P3 concerns.

Findings Summary

#	File	Finding
1	`builtins/awk/interp.go:339–395`	`next` in BEGIN/END propagates raw sentinel error instead of a proper runtime error message
2	`builtins/awk/interp.go:359–393`	Directory-as-input exits 1 while mawk exits 2 (unintentional bash divergence, no `skip_assert_against_bash`)
3	`builtins/awk/printf.go:195–210`	`printf "%.*d", MaxStringBytes, 1` without trailing `\n` allocates exactly 1 MiB and bypasses the cap check
4	`tests/scenarios/cmd/awk/printf/unsigned_negative_clamp.yaml`	Description says "matching gawk" but the behaviour actually matches mawk

Coverage Table

Code path	Scenario test	Go test	Status
`system()` / `getline` / `ENVIRON` / redirects blocked at parse	security/*.yaml	awk_test.go	Covered
`print > "file"` / `>> "file"` / `\| cmd` blocked	security/*.yaml	awk_test.go	Covered
MaxRecordBytes per-record cap	pentest (lineBeyondCap)	awk_test.go:TestLongLineRejected	Covered
MaxArrayEntries / MaxArrayTotalBytes caps	hardening	awk_hardening_test.go	Covered
MaxLoopIterations per-loop cap	awk_test.go:TestLoopIterationLimit	pentest	Covered
MaxStringBytes concat cap	hardening	awk_hardening_test.go	Covered
MaxGsubMatches cap	—	—	Missing scenario test
gsub empty-pattern (zero-width match)	builtins_fn/gsub_empty_pattern.yaml	—	Covered
/dev/zero infinite stream → capReader	—	pentest:TestAwkPentestDevZeroInfiniteStream	Covered
`for (k in ENVIRON)` blocked	security/environ_for_in_blocked.yaml	awk_test.go	Covered
`delete ENVIRON` blocked	security/environ_delete_blocked.yaml	—	Covered
`(k in ENVIRON)` blocked	security/environ_in_expr_blocked.yaml	—	Covered
Deeply-nested expr parse-depth cap	—	pentest + hardening	Covered
TOCTOU StatFile→OpenFile (accepted risk)	—	—	Documented in code
`next` in BEGIN → propagates as error	edge/next_in_begin.yaml	—	Covered (skip_assert)
Directory-as-input → exit 1 vs mawk's 2	—	hardening + pentest	P2 divergence
`printf "%.*d", MaxStringBytes, 1` cap	security/printf_precision_cap.yaml (has \n)	pentest	Partially covered

Positive Observations

Complete sandbox coverage: every dangerous awk construct (system(), getline, ENVIRON, file redirects, pipes, user-defined functions) is rejected at parse time with clear error messages. Multiple defence-in-depth checks (blockedNames, parsePrintLike/checkNoRedirect, storeScalar) ensure blocked operations cannot slip through.
RE2 safety: all regexes compile through Go's regexp (RE2 engine), making ReDoS attacks structurally impossible.
Tight resource caps: per-record 1 MiB, per-string 1 MiB, arrays 1 M entries + 256 MiB total bytes, 1 M loop iterations, 1 M gsub matches, 256 MiB total non-regular-file read, parser depth 256, runtime depth 1024.
No direct os.* file access: all file I/O routes through callCtx.OpenFile/callCtx.StatFile — the single most critical invariant is upheld throughout.
Context cancellation: every tight loop checks ctx.Err() at least every 1024 iterations; the gsub match-processing loop also checks it; processFile's record loop checks it each iteration.
Deterministic for (k in a) iteration: sorted keys prevent output non-determinism in scripts.

AlexandreYang · 2026-05-05T04:45:56Z

[AI Generated] Self-review iteration 28 complete — see inline review comments for findings.

AlexandreYang · 2026-05-05T04:55:25Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14de466c5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-05T05:01:11Z

+	arr := r.arrays[v.arrayVar]
+	if arr == nil {
+		return numValue(0), nil


Evaluate in keys before checking the array map

When the right-hand array has not been materialized yet, this returns before evaluating the left-hand key expression. In awk the key expression still runs, so cases like awk 'BEGIN { print (++i in a), i }' should print 0 1; this implementation prints 0 with i still uninitialized. The same early return also skips the scalar/array conflict that other array paths enforce for x=1; (1 in x), so moving key evaluation and scalar checks before the nil-array fast path would avoid silently changing program behavior.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-05T05:01:11Z

+		if fs.Changed("field-separator") {
+			if err := runtime.setFS(*fieldSep); err != nil {
+				callCtx.Errf("awk: -F: %s\n", err)
+				return builtins.Result{Code: 1}
+			}
+		}
+		for _, a := range assignments {


Preserve command-line order between -F and -v

Applying -F before all -v assignments loses awk's left-to-right option semantics when both write FS. I checked GNU awk locally: awk -v FS=: -F, 'BEGIN{print FS}' prints ,, while awk -F, -v FS=: 'BEGIN{print FS}' prints :; this implementation prints : for both because the -v loop always runs after setFS, which also makes field splitting use the wrong separator for scripts that rely on option order.

Useful? React with 👍 / 👎.

AlexandreYang

Security Review — PR #218: `feat(awk): add sandboxed awk builtin`

Overall assessment: ✅ Safe to merge — no P0 or P1 findings. The implementation shows careful, layered security design with comprehensive testing. A few P2/P3 observations are noted below.

Summary Table

#	File	Finding
1	`builtins/awk/interp.go:773`	Known TOCTOU race between `StatFile` and `OpenFile`
2	`builtins/awk/interp.go:536`	`strings.Contains(scErr.Error(), "is a directory")` — fragile string match
3	`builtins/awk/ast.go:30,212`	`numExpr.src` and `regexExpr.src` fields are stored but never read
4	`builtins/awk/interp.go`	`for-in` over sorted key snapshot does not check `ctx.Err()` during key snapshot build
5	`tests/scenarios/cmd/awk/`	Missing scenario: OFMT/CONVFMT set to a string with very large `%d` width

Findings

Known TOCTOU race between `StatFile` and `OpenFile`

Severity: P2 · Security · Correctness
Location: builtins/awk/interp.go ~L773 (openInput)

The code calls StatFile to determine whether an input is a regular file (to skip the 256 MiB cumulative cap), then calls OpenFile separately. A file could be swapped to a symlink or special file between these two calls, causing isRegular=true and bypassing the read cap for that input.

The comment in the code correctly analyses this and concludes the risk is accepted:

AllowedPaths in OpenFile is the primary sandbox guard — it still fires.
The 256 MiB cap is defense-in-depth only; bypassing it does not unlock new capabilities.
Exploitation requires the attacker to also control the AllowedPaths directory (typically read-only in production).

This analysis is sound. Flagging as P2 only to ensure future readers are aware and so the acceptance is explicit in the review record. No code change required; the existing comment is sufficient documentation.

Fragile `"is a directory"` string match in scanner error path

Severity: P2 · Correctness · Portability
Location: builtins/awk/interp.go ~L536

if errors.Is(scErr, os.ErrInvalid) || strings.Contains(scErr.Error(), "is a directory") {
    return &fileOpenError{...}
}

The strings.Contains(scErr.Error(), "is a directory") match relies on the OS-specific error message string. On Linux this message comes from the kernel and is stable, but on macOS it is "is a directory" (lowercase) and on Windows it would be something different entirely. If the message ever diverges (e.g. a future OS, a translated locale, or a non-standard file system), this path silently falls through to the return fmt.Errorf(...) line, which is treated as a fatal runtime error (exit 1) rather than a non-fatal file-open error (exit 2), changing the observed exit code.

Remediation: Use os.Lstat or inspect the file mode before opening, or wrap the os.Open error with a syscall.EISDIR check:

var pathErr *os.PathError
if errors.As(scErr, &pathErr) && errors.Is(pathErr.Err, syscall.EISDIR) {
    return &fileOpenError{...}
}

Alternatively, since callCtx.StatFile is already called in openInput, you could stat the file and detect directories before scanning starts. This is a P2 (not P1) because the current behavior still fails safely — it just exits with the wrong code on non-Linux platforms.

Dead fields: `numExpr.src` and `regexExpr.src`

Severity: P3 · Code Quality
Location: builtins/awk/ast.go L30 and L212

type numExpr struct {
    val float64
    src string // original lexeme, preserved so toString can use it for OFMT
}

type regexExpr struct {
    re  *regexp.Regexp
    src string
}

Both src fields are set at parse time but never read anywhere in the codebase. The comment on numExpr.src suggests they were intended for OFMT formatting, but formatNumber / printableString use v.f directly. This is harmless but wastes a small amount of memory per AST node. Consider removing them or adding the planned usage.

Key snapshot in `for-in` not context-checked

Severity: P3 · DoS (very minor)
Location: builtins/awk/interp.go (forInStmt case)

keys := make([]string, 0, len(arr))
for k := range arr {
    keys = append(keys, k)
}
slices.Sort(keys)

For an array with MaxArrayEntries (1 M) keys, the snapshot build + sort takes O(N log N) time. ctx.Err() is not checked during the snapshot or sort phases. In practice this is fast (well under 1 s for 1 M string keys on modern hardware), and ctx.Err() is checked on each body iteration immediately after, so execution will be cancelled promptly on the next iteration. The comment in the code correctly notes this. Flagging as P3 only for completeness.

Missing scenario: very-large OFMT/CONVFMT width

Severity: P3 · Test Coverage
Location: tests/scenarios/cmd/awk/

The formatFloatWithFmt function in value.go only handles format strings of the form %[.prec][gGfeEd] and has a precision cap of 64. It silently falls back to %.6g for anything outside that subset (including strings with a width like %1024g or %s). This is safe, but there is no scenario test that exercises OFMT/CONVFMT set to an unusual value (e.g. %1024g, %s, empty string).

Suggested scenario:

description: OFMT set to a non-standard format string falls back to default %.6g.
skip_assert_against_bash: true
input:
  script: |
    awk -v OFMT="%1024g" 'BEGIN { x = 3.14159; print x }'
expect:
  stdout: "3.14159\n"
  exit_code: 0

Positive Observations

🔒 Layered sandbox enforcement: All I/O-capable awk constructs (system(), getline, print > file, print | cmd, close(), fflush(), ENVIRON, user-defined functions) are rejected at parse time — before any code runs. This is the correct approach: a single point of enforcement that cannot be bypassed by runtime tricks.

🔒 RE2 regex engine: All regex compilation goes through Go's regexp package (RE2), which guarantees linear-time matching with no backtracking. ReDoS attacks are structurally impossible.

🔒 Comprehensive resource caps: Record size (1 MiB), string length (1 MiB), array entries (1 M), array total bytes (256 MiB), loop iterations per construct (1 M), gsub matches (1 M), and total reads from non-regular files (256 MiB) — all independently bounded.

🔒 Context propagation: ctx.Err() is checked at every loop iteration in the main record loop, every statement in execBlock, and every iteration of while/for/do-while/for-in body execution. The 30 s shell execution timeout is reliably honoured.

🔒 Parse-time depth limiting: maxParseDepth = 256 prevents stack-overflow attacks via deeply nested expressions. The runtime adds a second, independent maxRuntimeDepth = 1024 guard.

🔒 Integer safety: floatToInt64Safe correctly maps NaN→0 and clamps ±Inf to int64 bounds before conversion, avoiding undefined behaviour from out-of-range float→int casts.

🔒 Array byte budget: The arrayTotalBytes counter is decremented correctly in delete and reset in split, preventing the subtle large-key DoS that entry-count limits alone cannot stop.

🔒 Fuzz + pentest coverage: The PR ships fuzz seeds, a hardening test file, and a full integration pentest suite covering integer edges, path traversal, resource exhaustion, and all blocked language constructs.

Coverage Summary

Code path	Scenario test	Go test	Status
All blocked constructs (system, getline, ENVIRON, functions, pipes, redirects)	`tests/scenarios/cmd/awk/security/`	pentest_test.go	✅ Covered
Record size cap	—	`awk_hardening_test.go` / pentest	✅ Covered
Array entry + memory cap	—	`awk_hardening_test.go` / pentest	✅ Covered
Loop iteration cap	—	`awk_hardening_test.go` / pentest	✅ Covered
Division by zero (intentional divergence)	`arithmetic/division_by_zero.yaml`	—	✅ Covered
`for-in` sorted order (intentional divergence)	`loops/for_in_sorted.yaml`	—	✅ Covered
RS="" paragraph mode not implemented (intentional divergence)	`edge/rs_empty_paragraph.yaml`	—	✅ Covered
OFMT/CONVFMT unusual format strings	—	—	⚠️ Missing (P3)
`strings.Contains("is a directory")` on non-Linux	—	—	⚠️ Not portable (P2)

AlexandreYang · 2026-05-05T05:04:33Z

[AI Generated] Self-review iteration 29 complete — see inline review comments for findings.

AlexandreYang · 2026-05-05T05:08:22Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14de466c5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-05T05:14:23Z

+		if fs.Changed("field-separator") {
+			if err := runtime.setFS(*fieldSep); err != nil {


Preserve command-line ordering for -F and -v

When a caller passes -v FS=... before -F..., GNU awk applies these options in argv order, so the later -F value should win. This handler always applies -F here and then replays all -v assignments afterward, making -v FS=... win regardless of the original order; for example, awk -v FS=, -F: '{print $1}' will split on commas instead of colons. Please preserve the relative order of -F and -v assignments when initializing the runtime.

Useful? React with 👍 / 👎.

…f age

… and Codex

AlexandreYang · 2026-05-05T05:52:17Z

@codex review this PR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca312898b0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-05T05:56:11Z

+	if i != len(spec)-1 {
+		return "", false


Honor OFMT/CONVFMT width and flags

When users set OFMT or CONVFMT to a normal awk printf format with width or flags, such as OFMT="%2.2f" or CONVFMT="%08.2f", this parser returns false and formatNumber silently falls back to %.6g. That makes common awk programs print or concatenate the wrong numeric text; for example awk 'BEGIN{OFMT="%2.2f"; print 1.5}' should print 1.50, but this builtin prints 1.5.

Useful? React with 👍 / 👎.

empty

62ea21b

AlexandreYang commented May 4, 2026

View reviewed changes

AlexandreYang added 2 commits May 4, 2026 12:58

AlexandreYang added the verified/analysis Human-reviewed static analysis changes label May 4, 2026

feat(tools): add review-fix-loop Go CLI with deterministic loop control

6cb3502

AlexandreYang added 2 commits May 4, 2026 13:42

fmt: format skills.go

294aa22

feat(tools/review-fix-loop): reduce verbosity, add --verbose flag

db75833

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/awk/awk.go Outdated

Comment thread builtins/awk/interp.go

Comment thread tests/scenarios/cmd/awk/arithmetic/basic_ops.yaml Outdated

test(tools/review-fix-loop): add unit tests for pure functions

f5ec6f1

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/awk/parser.go Outdated

Comment thread builtins/awk/lexer.go

feat(tools/review-fix-loop): always show bash command lines for progr…

f67e090

…ess visibility

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/awk/parser.go Outdated

Comment thread builtins/awk/awk.go Outdated

feat(review-fix-loop): add colors, dot heartbeats, and temp log file

b4f165a

chatgpt-codex-connector Bot reviewed May 4, 2026

View reviewed changes

Comment thread builtins/awk/interp.go Outdated

Comment thread builtins/awk/parser.go Outdated

Comment thread builtins/awk/lexer.go

fix(review-fix-loop): accumulate dots on one line across tool-call ro…

a4171a0

…unds

AlexandreYang added 2 commits May 4, 2026 14:20

feat(review-fix-loop): write verbose output to log file regardless of…

0278817

… --verbose flag

feat(review-fix-loop): log model name at startup

c93fab9

AlexandreYang commented May 5, 2026

View reviewed changes

Comment thread builtins/awk/builtins.go

Comment thread tests/scenarios/cmd/awk/security/system_blocked.yaml

Comment thread builtins/awk/interp.go

Comment thread builtins/awk/interp.go

chatgpt-codex-connector Bot reviewed May 5, 2026

View reviewed changes

AlexandreYang commented May 5, 2026

View reviewed changes

Comment thread tools/review-fix-loop/loop.go

Comment thread builtins/awk/interp.go

Comment thread tests/scenarios/cmd/awk/arrays/delete.yaml

Comment thread tests/scenarios/cmd/awk/security/environ_argv_blocked.yaml

chatgpt-codex-connector Bot reviewed May 5, 2026

View reviewed changes

AlexandreYang commented May 5, 2026

View reviewed changes

Comment thread builtins/awk/interp.go

Comment thread builtins/awk/interp.go

Comment thread tests/scenarios/cmd/awk/errors/missing_file_continues.yaml

Comment thread builtins/awk/interp.go

Comment thread builtins/awk/interp.go

chatgpt-codex-connector Bot reviewed May 5, 2026

View reviewed changes

AlexandreYang commented May 5, 2026

View reviewed changes

Comment thread builtins/awk/builtins.go

Comment thread builtins/awk/interp.go

Comment thread builtins/awk/interp.go

Comment thread tests/scenarios/cmd/awk/fields/nf.yaml

chatgpt-codex-connector Bot reviewed May 5, 2026

View reviewed changes

AlexandreYang commented May 5, 2026

View reviewed changes

Comment thread builtins/awk/interp.go

Comment thread builtins/awk/interp.go

Comment thread builtins/awk/printf.go

Comment thread tests/scenarios/cmd/awk/printf/unsigned_negative_clamp.yaml

chatgpt-codex-connector Bot reviewed May 5, 2026

View reviewed changes

AlexandreYang commented May 5, 2026

View reviewed changes

Comment thread builtins/awk/interp.go

Comment thread builtins/awk/ast.go

chatgpt-codex-connector Bot reviewed May 5, 2026

View reviewed changes

AlexandreYang added 2 commits May 5, 2026 07:46

fix(address-pr-comments): process all unresolved threads regardless o…

411643f

…f age

feat(review-fix-loop): gate streak on P0/P1 only for both self-review…

ca31289

… and Codex

chatgpt-codex-connector Bot reviewed May 5, 2026

View reviewed changes

AlexandreYang closed this May 5, 2026

AlexandreYang deleted the alex/impl-awk branch May 5, 2026 06:45

		if fs.Changed("field-separator") {
		if err := runtime.setFS(*fieldSep); err != nil {

Conversation

AlexandreYang commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

awk builtin

review-fix-loop tool (tools/review-fix-loop/)

Testing

Checklist

Uh oh!

AlexandreYang commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 4, 2026

Uh oh!

AlexandreYang left a comment

Choose a reason for hiding this comment

Review Summary

Findings Summary

Detailed Findings

Finding 1 — Printf literal width cap allows ~1 GiB allocation

Finding 2 — *-width from awk expression has no upper bound

Finding 3 — -F'\t' does not split on tab

Finding 4 — index(s, "") returns 0 instead of 1

Finding 5 — Missing scenario test for -F'\t'

Test Coverage Summary

Positive Observations

Uh oh!

AlexandreYang commented May 4, 2026

Iteration 1 self-review result

Uh oh!

AlexandreYang commented May 4, 2026

Uh oh!

AlexandreYang commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexandreYang commented May 4, 2026

Uh oh!

AlexandreYang commented May 4, 2026

Uh oh!

AlexandreYang commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

AlexandreYang commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

AlexandreYang commented May 4, 2026

Uh oh!

AlexandreYang commented May 4, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexandreYang commented May 4, 2026

Uh oh!

AlexandreYang left a comment

Choose a reason for hiding this comment

Review Summary

Finding Summary

Test Coverage Table

Positive Observations

Uh oh!

AlexandreYang commented May 4, 2026 •

edited

Loading

review-fix-loop tool (`tools/review-fix-loop/`)

Finding 2 — `*`-width from awk expression has no upper bound

Finding 3 — `-F'\t'` does not split on tab

Finding 4 — `index(s, "")` returns 0 instead of 1

Finding 5 — Missing scenario test for `-F'\t'`