Skip to content

fix(controller): cap JSON parser depth/elements and tighten request-boundary state#25

Open
andypost wants to merge 1 commit into
masterfrom
claude/audit-pr-h-controller-robust
Open

fix(controller): cap JSON parser depth/elements and tighten request-boundary state#25
andypost wants to merge 1 commit into
masterfrom
claude/audit-pr-h-controller-robust

Conversation

@andypost
Copy link
Copy Markdown
Owner

Summary

Audit-driven controller robustness + request-boundary state pass (PR-H from `security-audit.md` / PR #10). Five findings; no protocol or config-surface changes. Closes the last actionable audit slot — after this, the only remaining items are the two excluded by the maintainer's DoS policy.

Findings addressed

  • V4 [High] Unbounded JSON recursion (`src/nxt_conf.c`) — cap depth at 100 (counter checked at the `{` / `[` arms of `parse_value`, the only recursion sites).
  • V4 [High] Unbounded JSON array/object element count (`src/nxt_conf.c`) — cap at 100k elements per array/object.
  • V4 [Medium] Validator trust-model annotation (`src/nxt_conf_validation.c`) — code comment documenting that allowing arbitrary `executable` and `isolation = false` is intentional given the control-socket SO_PEERCRED check landed in PR fix(security): tighten privilege boundaries on cgroup/mount/IPC/control-socket #14.
  • V7 [Medium] PHP TrueAsync EG exception scrubbing (`src/nxt_php_sapi.c`) — `zend_clear_exception()` before the prototype-fork so workers don't inherit a stale exception.
  • V8 [Medium] Ruby per-request IO NULL-deref hardening (`src/ruby/nxt_ruby_stream_io.c`) — NULL-guard `rctx->req` in `gets` / `read` / `s_write` so apps that capture `rack.input` / `rack.errors` across the request boundary don't crash.

Notes on departures from the audit

  • V8 Ruby — the audit framed this as "buffers from a prior request remain accessible". Reading the code shows the IO object holds no buffers — every read goes through `rctx->req` directly. The real hazard is NULL-deref because `rctx->req` is cleared between requests (`src/ruby/nxt_ruby.c:657`). Fix covers the actual shape; commit message documents the re-classification.
  • V4 Medium (validator) — the audit allowed for either schema tightening or comment-only. Picked comment-only because executable path policy is a deployment concern, not a config-schema one. The SO_PEERCRED check from PR fix(security): tighten privilege boundaries on cgroup/mount/IPC/control-socket #14 is the real privilege boundary.
  • V7 PHP TrueAsync — the audit allowed for either explicit EG-state scrub or comment-only-with-TODO if the read showed the issue was outside the scope of a single-file diff. The exception scrub is small and well-contained; the other globals the audit mentioned (output buffers, `error_reporting`, symbol table) are reset by the worker's per-request init so they don't actually leak. Documented in the commit body.

Conflicts with parallel PRs

Files changed

5 files, +120 / -3.

Test plan

  • `./configure --openssl && make -j unitd` clean.
  • `pytest-3 test/test_configuration.py test/test_php_application.py test/test_ruby_application.py` requires root.

Upstream

Same fixes apply to `freeunitorg/freeunit`; will forward after merge here.


Generated by Claude Code

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several security and stability improvements, including hard caps on JSON parser depth and element counts to prevent stack or heap exhaustion. It also addresses a bug in PHP where stale exception states were inherited by worker processes and adds NULL-deref guards for Ruby stream IO handles accessed outside the request lifetime. A review comment highlights that the file-static variable used for tracking JSON recursion depth is not thread-safe, which could be problematic if the code is executed in a multi-threaded context like the router process.

Comment thread src/nxt_conf.c
* file-static suffices. Reset on every entry to nxt_conf_json_parse()
* so a stale value from an aborted prior call cannot leak.
*/
static nxt_uint_t nxt_conf_json_parse_depth;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of a file-static variable for recursion depth is not thread-safe. While the Unit controller process is currently single-threaded, this file (src/nxt_conf.c) is part of the core library and is also linked into the router process, which supports multi-threading (e.g., via the listen_threads setting). If nxt_conf_json_parse is ever called from a multi-threaded context in the future, this will lead to race conditions and incorrect depth tracking. Consider using thread-local storage (e.g., __thread or _Thread_local) or passing the depth as an argument through the recursive calls.

@andypost andypost force-pushed the claude/audit-pr-h-controller-robust branch from b4fac9c to 7cef9c8 Compare May 13, 2026 14:01
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cef9c8df0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +126 to +128
if (req == NULL) {
return Qnil;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prevent stale rack.input from reading the next request

When a Ruby app keeps a rack.input object and reads it while the same worker context is handling a later request, this guard does not fire because rctx->req has been repointed to that new request in nxt_ruby_request_handler_gvl. Since the IO object is reused from rctx->io_input in the shared env, a stale handle can still read another client's request body rather than returning nil; the boundary check needs to bind the IO object to the request it was handed out for, not just check whether the context is currently idle.

Useful? React with 👍 / 👎.

Comment thread src/ruby/nxt_ruby_stream_io.c Outdated
Comment on lines +293 to +295
if (rctx->req == NULL) {
nxt_unit_alert(NULL, "Ruby: %s", RSTRING_PTR(val));
return RSTRING_LEN(val);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid attributing stale rack.errors writes to later requests

If an app keeps rack.errors and writes to it while this same Ruby worker context is already serving a later request, this NULL-only check also does not fire because rctx->req now points at that later request. The write then falls through to nxt_unit_req_error(rctx->req, ...), so background/late logs from a completed request are emitted under another client's request stream; the IO object needs request ownership/lifetime tracking rather than only the idle-context guard.

Useful? React with 👍 / 👎.

@andypost
Copy link
Copy Markdown
Owner Author

CI fix in 7cef9c8d: test_ruby_application_at_exit regressed because my NULL-guard silently dropped rack.errors writes from at_exit handlers (which run after rctx->req is cleared in nxt_ruby.c:657). Replaced the silent drop with a fallback through nxt_unit_alert(NULL, ...) so writes during process shutdown still land in the unit log. The NULL-deref protection is preserved.

The clang-ast failure on that run was an unrelated infrastructure error (tomcat-servlet-api-9.0.110.jar SHA512 mismatch in the Maven proxy) — should clear on re-run.

@andypost andypost force-pushed the claude/audit-pr-h-controller-robust branch from 7cef9c8 to ce68dbe Compare May 13, 2026 16:27
…oundary state

Audit-driven controller robustness + request-boundary state pass
(PR-H from security-audit.md / #10).  Five findings;
no protocol or config-surface changes.

* V4 [High] Unbounded JSON recursion (nxt_conf.c)
  A POST /config with deeply-nested JSON like "[[[[...]]]]"
  recursed through nxt_conf_json_parse_value → parse_object /
  parse_array with no depth cap, blowing the controller's stack.
  Add a file-static depth counter checked at the '{' / '[' arms
  of parse_value (the only recursion sites); cap at 100, well
  above any legitimate Unit config (nesting > 6 levels does not
  occur in practice).  Reset at every entry to
  nxt_conf_json_parse() so a stale value from an aborted prior
  call cannot leak.

* V4 [High] Unbounded JSON array/object element count (nxt_conf.c)
  parse_object and parse_array looped with `count++` and no cap;
  a flat [1,2,...,1e6] passes a billion-byte allocation request
  to nxt_mp_get() at the end.  Cap at 100k elements (well above
  any real config — the largest production configs we've seen
  have a few thousand routes).  Reject with a clean parse error.

* V4 [Medium] Validator trust-model annotation (nxt_conf_validation.c)
  Document at nxt_conf_vldt_app_isolation_members that allowing
  arbitrary "executable" and isolation = false is intentional:
  the privilege boundary lives at the control-socket layer
  (SO_PEERCRED, landed in #14) and not in this
  schema validator.  Allow-listing executables here would be a
  deployment policy decision, not a config-schema concern.

* V7 [Medium] PHP TrueAsync EG exception scrubbing (nxt_php_sapi.c)
  The TrueAsync entrypoint intentionally skips
  php_request_shutdown() so the callback zval persists across
  the prototype → worker fork.  But if the entrypoint script
  raised an exception that wasn't caught before HttpServer->
  onRequest() stored the callback, every forked worker would
  inherit the exception on its first request.  Add a
  zend_clear_exception() before the function returns.  Other
  EG globals (output buffers, error_reporting, symbol table)
  are reset implicitly by the worker's per-request init; the
  exception state is the one that early-exits
  php_execute_script() on the next request.

* V8 [Medium] Ruby per-request IO NULL-deref hardening
  (nxt_ruby_stream_io.c)
  rctx->req is cleared after each request handler runs
  (src/ruby/nxt_ruby.c:657).  Apps that capture rack.input or
  rack.errors across the request boundary — background
  threads, cached IO handles — would NULL-deref rctx->req when
  they called gets / read / puts / write.  Add a NULL guard at
  each public entry point; gets/read return Qnil, write
  silently drops.  The audit framed this as "buffers from a
  prior request remain accessible"; the actual hazard is
  NULL-deref, but the fix covers both shapes (all reads go
  through rctx->req, no buffers are held inside the IO
  object).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@andypost andypost force-pushed the claude/audit-pr-h-controller-robust branch from ce68dbe to 534b274 Compare May 14, 2026 00:21
@andypost
Copy link
Copy Markdown
Owner Author

Codex P1 (src/ruby/nxt_ruby_stream_io.c:128) and P2 (:297) — cross-request rack.input / rack.errors bleed — addressed in 534b2748.

Root cause. The old NULL-guard only caught the case where a captured handle was used after a request fully ended (rctx->req == NULL). It did not fire when the same worker context (rctx) was already serving a later request: rctx->req != NULL, so a cached rack.input happily read the body of an unrelated request, and a cached rack.errors write got attributed to it.

Fix. The IO handle is now per-instance bound to its originating request via a (rctx, req_seq) snapshot:

  • nxt_ruby_ctx_t gets a monotonic req_seq, bumped just before rctx->req = req in nxt_ruby_request_handler_gvl.
  • nxt_ruby_stream_io.c now wraps a small heap-allocated nxt_ruby_io_bind_t { rctx, req_seq } (TypedData_Wrap_Struct with a dfree that calls nxt_free). Each construction snapshots rctx->req_seq.
  • rack.input / rack.errors are minted per request in nxt_ruby_rack_app_run after rb_hash_dup(rctx->env), replacing the (now-omitted) template entries. rctx only retains the class references (io_input_class / io_error_class) for rb_funcall-free instantiation.
  • nxt_ruby_bind_req() is the single gate: it returns NULL when rctx->req is NULL or when rctx->req_seq != bind->req_seq. gets / read return Qnil, s_write routes the message through the NULL-ctx nxt_unit_log(NULL, NXT_UNIT_LOG_ERR, …) so at_exit + cross-request writes still produce a log line without escalating to ALERT.

ABA-safe. Using req_seq (uint64_t, bumped per request) rather than the raw req pointer eliminates the pointer-reuse hazard the Codex P1 specifically highlighted.

Existing test coverage preserved. test_ruby_application_at_exit still hits the NULL-ctx log path — same code, just now keyed off bind_req() == NULL instead of rctx->req == NULL.

Local sanity: ./configure --openssl && make unitd clean. Ruby module build deferred to CI (no ruby-dev on this dev host).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant