fix(controller): cap JSON parser depth/elements and tighten request-boundary state by andypost · Pull Request #25 · andypost/unit

andypost · 2026-05-11T21:16:39Z

Summary

Audit-driven controller robustness + request-boundary state pass (PR-H from `security-audit.md` / PR #10). Five findings; no protocol or config-surface changes. Closes the last actionable audit slot — after this, the only remaining items are the two excluded by the maintainer's DoS policy.

Findings addressed

V4 [High] Unbounded JSON recursion (`src/nxt_conf.c`) — cap depth at 100 (counter checked at the `{` / `[` arms of `parse_value`, the only recursion sites).
V4 [High] Unbounded JSON array/object element count (`src/nxt_conf.c`) — cap at 100k elements per array/object.
V4 [Medium] Validator trust-model annotation (`src/nxt_conf_validation.c`) — code comment documenting that allowing arbitrary `executable` and `isolation = false` is intentional given the control-socket SO_PEERCRED check landed in PR fix(security): tighten privilege boundaries on cgroup/mount/IPC/control-socket #14.
V7 [Medium] PHP TrueAsync EG exception scrubbing (`src/nxt_php_sapi.c`) — `zend_clear_exception()` before the prototype-fork so workers don't inherit a stale exception.
V8 [Medium] Ruby per-request IO NULL-deref hardening (`src/ruby/nxt_ruby_stream_io.c`) — NULL-guard `rctx->req` in `gets` / `read` / `s_write` so apps that capture `rack.input` / `rack.errors` across the request boundary don't crash.

Notes on departures from the audit

V8 Ruby — the audit framed this as "buffers from a prior request remain accessible". Reading the code shows the IO object holds no buffers — every read goes through `rctx->req` directly. The real hazard is NULL-deref because `rctx->req` is cleared between requests (`src/ruby/nxt_ruby.c:657`). Fix covers the actual shape; commit message documents the re-classification.
V4 Medium (validator) — the audit allowed for either schema tightening or comment-only. Picked comment-only because executable path policy is a deployment concern, not a config-schema one. The SO_PEERCRED check from PR fix(security): tighten privilege boundaries on cgroup/mount/IPC/control-socket #14 is the real privilege boundary.
V7 PHP TrueAsync — the audit allowed for either explicit EG-state scrub or comment-only-with-TODO if the read showed the issue was outside the scope of a single-file diff. The exception scrub is small and well-contained; the other globals the audit mentioned (output buffers, `error_reporting`, symbol table) are reset by the worker's per-request init so they don't actually leak. Documented in the commit body.

Conflicts with parallel PRs

PR-G (`fix(sapi): bounds-check app-supplied arguments across language bindings #23`) touches `src/nxt_php_sapi.c` at lines ~884, ~1473, ~1796. This PR touches lines ~2390. Disjoint ranges; whichever lands first, the other will rebase cleanly.

Files changed

5 files, +120 / -3.

Test plan

`./configure --openssl && make -j unitd` clean.
`pytest-3 test/test_configuration.py test/test_php_application.py test/test_ruby_application.py` requires root.

Upstream

Same fixes apply to `freeunitorg/freeunit`; will forward after merge here.

Generated by Claude Code

gemini-code-assist

Code Review

This pull request introduces several security and stability improvements, including hard caps on JSON parser depth and element counts to prevent stack or heap exhaustion. It also addresses a bug in PHP where stale exception states were inherited by worker processes and adds NULL-deref guards for Ruby stream IO handles accessed outside the request lifetime. A review comment highlights that the file-static variable used for tracking JSON recursion depth is not thread-safe, which could be problematic if the code is executed in a multi-threaded context like the router process.

gemini-code-assist · 2026-05-11T21:18:43Z

+ * file-static suffices.  Reset on every entry to nxt_conf_json_parse()
+ * so a stale value from an aborted prior call cannot leak.
+ */
+static nxt_uint_t  nxt_conf_json_parse_depth;


The use of a file-static variable for recursion depth is not thread-safe. While the Unit controller process is currently single-threaded, this file (src/nxt_conf.c) is part of the core library and is also linked into the router process, which supports multi-threading (e.g., via the listen_threads setting). If nxt_conf_json_parse is ever called from a multi-threaded context in the future, this will lead to race conditions and incorrect depth tracking. Consider using thread-local storage (e.g., __thread or _Thread_local) or passing the depth as an argument through the recursive calls.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cef9c8df0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-13T14:08:48Z

+    if (req == NULL) {
+        return Qnil;
+    }


Prevent stale rack.input from reading the next request

When a Ruby app keeps a rack.input object and reads it while the same worker context is handling a later request, this guard does not fire because rctx->req has been repointed to that new request in nxt_ruby_request_handler_gvl. Since the IO object is reused from rctx->io_input in the shared env, a stale handle can still read another client's request body rather than returning nil; the boundary check needs to bind the IO object to the request it was handed out for, not just check whether the context is currently idle.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-13T14:08:48Z

+    if (rctx->req == NULL) {
+        nxt_unit_alert(NULL, "Ruby: %s", RSTRING_PTR(val));
+        return RSTRING_LEN(val);


Avoid attributing stale rack.errors writes to later requests

If an app keeps rack.errors and writes to it while this same Ruby worker context is already serving a later request, this NULL-only check also does not fire because rctx->req now points at that later request. The write then falls through to nxt_unit_req_error(rctx->req, ...), so background/late logs from a completed request are emitted under another client's request stream; the IO object needs request ownership/lifetime tracking rather than only the idle-context guard.

Useful? React with 👍 / 👎.

andypost · 2026-05-13T14:47:38Z

CI fix in 7cef9c8d: test_ruby_application_at_exit regressed because my NULL-guard silently dropped rack.errors writes from at_exit handlers (which run after rctx->req is cleared in nxt_ruby.c:657). Replaced the silent drop with a fallback through nxt_unit_alert(NULL, ...) so writes during process shutdown still land in the unit log. The NULL-deref protection is preserved.

The clang-ast failure on that run was an unrelated infrastructure error (tomcat-servlet-api-9.0.110.jar SHA512 mismatch in the Maven proxy) — should clear on re-run.

…oundary state Audit-driven controller robustness + request-boundary state pass (PR-H from security-audit.md / #10). Five findings; no protocol or config-surface changes. * V4 [High] Unbounded JSON recursion (nxt_conf.c) A POST /config with deeply-nested JSON like "[[[[...]]]]" recursed through nxt_conf_json_parse_value → parse_object / parse_array with no depth cap, blowing the controller's stack. Add a file-static depth counter checked at the '{' / '[' arms of parse_value (the only recursion sites); cap at 100, well above any legitimate Unit config (nesting > 6 levels does not occur in practice). Reset at every entry to nxt_conf_json_parse() so a stale value from an aborted prior call cannot leak. * V4 [High] Unbounded JSON array/object element count (nxt_conf.c) parse_object and parse_array looped with `count++` and no cap; a flat [1,2,...,1e6] passes a billion-byte allocation request to nxt_mp_get() at the end. Cap at 100k elements (well above any real config — the largest production configs we've seen have a few thousand routes). Reject with a clean parse error. * V4 [Medium] Validator trust-model annotation (nxt_conf_validation.c) Document at nxt_conf_vldt_app_isolation_members that allowing arbitrary "executable" and isolation = false is intentional: the privilege boundary lives at the control-socket layer (SO_PEERCRED, landed in #14) and not in this schema validator. Allow-listing executables here would be a deployment policy decision, not a config-schema concern. * V7 [Medium] PHP TrueAsync EG exception scrubbing (nxt_php_sapi.c) The TrueAsync entrypoint intentionally skips php_request_shutdown() so the callback zval persists across the prototype → worker fork. But if the entrypoint script raised an exception that wasn't caught before HttpServer-> onRequest() stored the callback, every forked worker would inherit the exception on its first request. Add a zend_clear_exception() before the function returns. Other EG globals (output buffers, error_reporting, symbol table) are reset implicitly by the worker's per-request init; the exception state is the one that early-exits php_execute_script() on the next request. * V8 [Medium] Ruby per-request IO NULL-deref hardening (nxt_ruby_stream_io.c) rctx->req is cleared after each request handler runs (src/ruby/nxt_ruby.c:657). Apps that capture rack.input or rack.errors across the request boundary — background threads, cached IO handles — would NULL-deref rctx->req when they called gets / read / puts / write. Add a NULL guard at each public entry point; gets/read return Qnil, write silently drops. The audit framed this as "buffers from a prior request remain accessible"; the actual hazard is NULL-deref, but the fix covers both shapes (all reads go through rctx->req, no buffers are held inside the IO object). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

andypost · 2026-05-14T00:22:06Z

Codex P1 (src/ruby/nxt_ruby_stream_io.c:128) and P2 (:297) — cross-request rack.input / rack.errors bleed — addressed in 534b2748.

Root cause. The old NULL-guard only caught the case where a captured handle was used after a request fully ended (rctx->req == NULL). It did not fire when the same worker context (rctx) was already serving a later request: rctx->req != NULL, so a cached rack.input happily read the body of an unrelated request, and a cached rack.errors write got attributed to it.

Fix. The IO handle is now per-instance bound to its originating request via a (rctx, req_seq) snapshot:

nxt_ruby_ctx_t gets a monotonic req_seq, bumped just before rctx->req = req in nxt_ruby_request_handler_gvl.
nxt_ruby_stream_io.c now wraps a small heap-allocated nxt_ruby_io_bind_t { rctx, req_seq } (TypedData_Wrap_Struct with a dfree that calls nxt_free). Each construction snapshots rctx->req_seq.
rack.input / rack.errors are minted per request in nxt_ruby_rack_app_run after rb_hash_dup(rctx->env), replacing the (now-omitted) template entries. rctx only retains the class references (io_input_class / io_error_class) for rb_funcall-free instantiation.
nxt_ruby_bind_req() is the single gate: it returns NULL when rctx->req is NULL or when rctx->req_seq != bind->req_seq. gets / read return Qnil, s_write routes the message through the NULL-ctx nxt_unit_log(NULL, NXT_UNIT_LOG_ERR, …) so at_exit + cross-request writes still produce a log line without escalating to ALERT.

ABA-safe. Using req_seq (uint64_t, bumped per request) rather than the raw req pointer eliminates the pointer-reuse hazard the Codex P1 specifically highlighted.

Existing test coverage preserved. test_ruby_application_at_exit still hits the NULL-ctx log path — same code, just now keyed off bind_req() == NULL instead of rctx->req == NULL.

Local sanity: ./configure --openssl && make unitd clean. Ruby module build deferred to CI (no ruby-dev on this dev host).

gemini-code-assist Bot reviewed May 11, 2026

View reviewed changes

andypost force-pushed the claude/audit-pr-h-controller-robust branch from b4fac9c to 7cef9c8 Compare May 13, 2026 14:01

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

andypost force-pushed the claude/audit-pr-h-controller-robust branch from 7cef9c8 to ce68dbe Compare May 13, 2026 16:27

andypost force-pushed the claude/audit-pr-h-controller-robust branch from ce68dbe to 534b274 Compare May 14, 2026 00:21

andypost mentioned this pull request May 14, 2026

docs(security): add 14-vector audit + remediation tracker #10

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(controller): cap JSON parser depth/elements and tighten request-boundary state#25

fix(controller): cap JSON parser depth/elements and tighten request-boundary state#25
andypost wants to merge 1 commit into
masterfrom
claude/audit-pr-h-controller-robust

andypost commented May 11, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Uh oh!

andypost commented May 13, 2026

Uh oh!

andypost commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

andypost commented May 11, 2026

Summary

Findings addressed

Notes on departures from the audit

Conflicts with parallel PRs

Files changed

Test plan

Upstream

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

andypost commented May 13, 2026

Uh oh!

andypost commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant