[RPC][Tracker] Bound msg_size to MAX_TRACKER_MSG_BYTES to prevent unbounded buffer growth#19586
[RPC][Tracker] Bound msg_size to MAX_TRACKER_MSG_BYTES to prevent unbounded buffer growth#19586bl4cksku11 wants to merge 2 commits into
Conversation
…ounded buffer growth
There was a problem hiding this comment.
Code Review
This pull request introduces a 1 MiB limit on RPC tracker messages to prevent unbounded memory consumption from malformed or oversized size headers. The message handling logic was updated to validate the size header and immediately close connections that violate the constraint. Additionally, a regression test was added to ensure oversized messages trigger a connection closure. Feedback was provided regarding a missing try...except block near a pylint suppression, which should be implemented to safely handle potential errors during JSON parsing or message handling.
| # pylint: disable=broad-except | ||
| self.call_handler(json.loads(msg)) |
There was a problem hiding this comment.
The pylint: disable=broad-except comment suggests that an exception handler was intended here, but the try...except block is missing. Malformed JSON input or errors in call_handler could cause the tracker to raise an unhandled exception, potentially leading to an ungraceful connection termination or affecting the event loop. Given the focus on robustness in this PR, it would be better to wrap this call in a try...except block and close the connection on failure.
| # pylint: disable=broad-except | |
| self.call_handler(json.loads(msg)) | |
| try: | |
| self.call_handler(json.loads(msg)) | |
| except Exception: # pylint: disable=broad-except | |
| logger.warning("Error in handling message from %s", self.name()) | |
| self.close() | |
| return |
There was a problem hiding this comment.
Fair. Wrapped json.loads + call_handler in try/except and added exc_info=True so the traceback gets logged for diagnostics. Pushed as 8f6875a.
…ormed JSON Addresses gemini-code-assist review: the dangling pylint disable=broad-except hinted at a missing handler. Wrap json.loads + call_handler so a malformed frame or buggy handler logs and closes the connection instead of crashing the tornado event loop.
Fixes #.
Reads of
_msg_sizefrom the tracker socket are now bounded toMAX_TRACKER_MSG_BYTES = 1 MiB, and the 4-byte size header isconsumed at read time. Without these checks, a single TCP connection
from a peer can grow the tracker process buffer until OOM, and a wire
size of 0 starves the parser without ever freeing the bytes.
Per the TVM security model the tracker is deployed on trusted networks,
so this is filed as a robustness defect, not a security advisory.
Apache security team triage (private thread, 2026-05-17) confirmed this
is the right channel.
Test
Added regression test in tests/python/contrib/test_rpc_tracker.py that
completes the magic handshake, sends an oversized msg_size header
(0x7FFFFFFF), and asserts the tracker closes the connection.
Changes
_msg_sizeto (0, MAX_TRACKER_MSG_BYTES], consume size header on read.