[UTXO-BUG] fix GET /governance/proposals unbounded fetchall() — OOM DoS#6528
[UTXO-BUG] fix GET /governance/proposals unbounded fetchall() — OOM DoS#6528Ivan-LB wants to merge 7 commits into
Conversation
|
Welcome to RustChain! Thanks for your first pull request. Before we review, please make sure:
Bounty tiers: Micro (1-10 RTC) | Standard (20-50) | Major (75-100) | Critical (100-150) A maintainer will review your PR soon. Thanks for contributing! |
eliasx45
left a comment
There was a problem hiding this comment.
Reviewed current head deb8449684793251ecef2327c9e319aa115f7584.
Verdict: request changes.
The implementation change itself is in the right direction: adding LIMIT 200 to GET /governance/proposals bounds the unauthenticated response size and mirrors other bounded list endpoints in this file. But the regression coverage is not tied to the real route, so I do not think this is ready yet.
Evidence:
- Inspected
node/rustchain_v2_integrated_v2.2.1_rip200.py; the realgovernance_proposals()query now hasORDER BY id DESC LIMIT 200. - Inspected
node/test_governance_proposals_fetchall_poc.py. py_compile node\rustchain_v2_integrated_v2.2.1_rip200.py node\test_governance_proposals_fetchall_poc.pypassed..\.venv\Scripts\python.exe -m pytest node\test_governance_proposals_fetchall_poc.py -q-> 2 passed.git diff --check origin/main...HEAD-> clean.- Hosted full-suite
testcheck is failing.
Blocking gap:
- The new test file defines its own
_SELECT_BOUNDEDstring containingLIMIT 200and executes that local constant against a temporary SQLite database. It does not import RustChain's real app/module, does not callGET /governance/proposals, and would still pass if someone removedLIMIT 200fromnode/rustchain_v2_integrated_v2.2.1_rip200.pylater.
Required fix: replace or supplement the standalone SQL-constant test with a real-route regression: seed more than 200 governance_proposals rows in the actual app DB, call /governance/proposals through the real Flask app.test_client(), and assert the JSON response contains at most 200 proposals from the real handler.
|
Thanks for the detailed review — you're right, the original test validated a local SQL constant rather than the real handler. I've replaced it with a Flask integration test in Section B of the same file. It loads the actual app via importlib, seeds 250 rows in the real temp DB, calls GET /governance/proposals through app.test_client(), and asserts the JSON response contains exactly 200 proposals. That test will fail if LIMIT 200 is removed from the handler. Section A is kept for vulnerability documentation (the unbounded vs. bounded SQL comparison). All 3 tests pass: 2 from Section A, 1 from Section B. |
eliasx45
left a comment
There was a problem hiding this comment.
Re-reviewed current head 34136ee44005bcd0e20b9ddc4dbf244eab85ae71 after the route-test follow-up.
Verdict: request changes remains.
The new Section B test is the right kind of coverage conceptually: it tries to import the real app, seed 250 governance_proposals rows, call GET /governance/proposals, and assert the real response is capped at 200. However, the submitted test does not currently run in this Windows checkout.
Evidence:
py_compile node\rustchain_v2_integrated_v2.2.1_rip200.py node\test_governance_proposals_fetchall_poc.pypassed..\.venv\Scripts\python.exe -m pytest node\test_governance_proposals_fetchall_poc.py -q->2 passed, 1 error.- The failing test is
TestGovernanceProposalsRouteLimit.test_route_caps_response_at_200_proposalsduring module import. - Error:
ModuleNotFoundError: No module named 'beacon_anchor'whenrustchain_v2_integrated_v2.2.1_rip200.pyimportsbeacon_anchor. git diff --check origin/main...HEADis clean.
Required fix: make the real-app import path robust in the test, e.g. insert the node/ directory into sys.path before exec_module(...), matching the pattern needed for local imports such as beacon_anchor. Then rerun the focused test on Windows/Linux so the new real-route regression actually executes.
|
Thanks for the detailed repro steps, @eliasx45. The root cause was that The fix in commit node_dir = os.path.dirname(os.path.abspath(_NODE_PY))
if node_dir not in sys.path:
sys.path.insert(0, node_dir)This inserts the absolute Verified locally: all 3 tests pass, including |
eliasx45
left a comment
There was a problem hiding this comment.
Re-reviewed current head 9e4f8c7b8157b1d8d2bfdb7f9c84a86585cb2b81 after the import-path fix.
Verdict: request changes remains.
The previous beacon_anchor import blocker is fixed: _load_node_module() now inserts the absolute node/ directory into sys.path before exec_module(...). The real-route test now gets far enough to execute and the three assertions pass, but the focused pytest run still fails on Windows because the temp SQLite database remains locked during class teardown.
Evidence:
- Inspected
node/rustchain_v2_integrated_v2.2.1_rip200.pyandnode/test_governance_proposals_fetchall_poc.py. py_compile node\rustchain_v2_integrated_v2.2.1_rip200.py node\test_governance_proposals_fetchall_poc.pypassed..\.venv\Scripts\python.exe -m pytest node\test_governance_proposals_fetchall_poc.py -q->3 passed, 1 error.- Error occurs in
TestGovernanceProposalsRouteLimit.tearDownClass, atos.unlink(cls._db_tmp.name). - Windows error:
PermissionError: [WinError 32] The process cannot access the file because it is being used by another processfor the temp DB. git diff --check origin/main...HEADis clean.git merge-tree --write-tree origin/main HEAD-> clean merge tree.
Required fix: close/release the real app's SQLite handles before unlinking the temp DB on Windows, or make teardown tolerant after explicitly forcing cleanup (for example, clear/close app resources and gc.collect() before os.unlink, with a guarded fallback). The regression should finish as a clean pytest run, not 3 passed, 1 error.
Ivan-LB
left a comment
There was a problem hiding this comment.
Thanks for the detailed re-review.
The root cause is that Python's sqlite3.connect used as a context manager commits or rolls back on exit but does not call conn.close(). On Windows the OS file handle is not released until the connection object is finalized by the GC, so os.unlink raced against it.
Fix pushed in 9f2625c:
- Added
import gcto the test file. - In
tearDownClass, callgc.collect()beforeos.unlinkto force CPython to finalize any lingeringsqlite3.Connectionobjects (breaking reference cycles that keep the handle alive). - Wrapped
os.unlinkin atry/except PermissionError: passguard so teardown is tolerant if a handle still slips through.
Local run after the fix: 3 passed with no errors.
eliasx45
left a comment
There was a problem hiding this comment.
Re-reviewed current head 9f2625c96eae3ef834649f3806d37fd75237c4ad after the Windows teardown follow-up.
Verdict: approve.
The implementation now adds LIMIT 200 to the real GET /governance/proposals query, and the regression coverage is tied to the actual Flask route: the test seeds 250 real governance_proposals rows, calls /governance/proposals through app.test_client(), and asserts the real JSON response is capped at 200. The previous Windows temp DB teardown issue is resolved in this checkout.
Validation:
- Inspected
node/rustchain_v2_integrated_v2.2.1_rip200.pyandnode/test_governance_proposals_fetchall_poc.py. .\.venv\Scripts\python.exe -m py_compile node\rustchain_v2_integrated_v2.2.1_rip200.py node\test_governance_proposals_fetchall_poc.pypassed..\.venv\Scripts\python.exe -m pytest node\test_governance_proposals_fetchall_poc.py -q->3 passed.git diff --check origin/main...HEAD-> clean.git merge-tree --write-tree origin/main HEAD-> clean merge tree.
Caveat: the hosted full-suite test check is still red broadly on this repository, but the submitted focused regression now passes locally and directly protects the claimed endpoint.
|
Thanks for confirming — glad the gc.collect() + PermissionError guard resolved the Windows finalization race cleanly. Nothing more to do on this one, ready for merge when you are. |
|
Thanks @Ivan-LB. Codex authoritative review classifies this as Low severity — response bloat / CPU churn, not a credible standalone OOM (the PoC sizes are noisy but not OOM-class). Status: NEEDS_FIX (no pay yet).
Required change: add Update + ping when ready. |
eliasx45
left a comment
There was a problem hiding this comment.
Updating my review stance on current head 9f2625c96eae3ef834649f3806d37fd75237c4ad after the maintainer's authoritative review.
Verdict: request changes.
My earlier approval covered the narrow regression improvement: the test now imports the real app, seeds the real DB, calls GET /governance/proposals, and proves the submitted LIMIT 200 is wired to the handler. That part remains valid.
The remaining blocker is that LIMIT 200 silently truncates the list instead of exposing a pagination contract, and each returned proposal can still serialize unbounded description text. So the patch does not yet fully close the response-bloat/CPU-churn path described in the updated review.
Please add limit/offset query handling and either return list summaries without full descriptions or enforce/cap description size at creation, then add regression coverage for the resulting contract. I will re-review on the next pushed head.
|
Updated. The LIMIT 200 approach had two gaps you correctly identified: every row still transmitted the full description, and there was no way to retrieve proposals beyond the first page. Changes pushed:
Tests cover: default page size (50), limit clamping at 200, non-overlapping offset pages, preview truncation, and correct |
eliasx45
left a comment
There was a problem hiding this comment.
Re-reviewed current head 5b5de217b96a2dbb95bbcc491e1c0515149e1ddc after the pagination/description-preview follow-up.
Verdict: approve.
The prior blocker is addressed: /governance/proposals no longer returns an unpaginated/truncated fixed first 200 rows with full descriptions. The current handler accepts limit and offset, clamps limit to 200, returns envelope metadata (total, limit, offset, count), and the list response includes description_preview rather than the full description. Full descriptions remain available through the detail endpoint.
Validation I ran:
git diff --check origin/main...HEAD
# clean
git merge-tree --write-tree origin/main HEAD
# 9b23963eb6f03ff2ba1ce51b760f8602d0e58829
.\.venv\Scripts\python.exe -m py_compile node\rustchain_v2_integrated_v2.2.1_rip200.py node\test_governance_proposals_fetchall_poc.py
# passed
$env:PYTHONPATH='node'; .\.venv\Scripts\python.exe -m pytest node\test_governance_proposals_fetchall_poc.py -q
# 9 passed
.\.venv\Scripts\python.exe tools\bcos_spdx_check.py --base-ref origin/main
# BCOS SPDX check: OK
I also probed the real route through app.test_client() on a temp DB: limit=2&offset=0 returned two rows with description_preview, limit=999 was clamped to 200, and malformed limit/offset fell back to the default bounded page instead of widening the response. I do not see a focused blocker remaining in this patch.
|
Thanks for the detailed re-review. The previous head (9f2625c) already added limit/offset query params, a default of 50, a max of 200, and replaced the full description field with a 200-char description_preview in the list response. That addressed the unbounded serialization path on GET /governance/proposals. This new push (5e0437c) closes the remaining concern you raised: creation-time enforcement. Added GOVERNANCE_DESCRIPTION_MAX_LEN = 4000 and a check in governance_propose() that returns 400 with error: "description_too_long" and max_len: 4000 when exceeded. This bounds storage, the detail endpoint, and any future list format that might include full text. Two new regression tests in Section C of test_governance_proposals_fetchall_poc.py cover the rejection case and the exactly-at-limit case. All 11 tests pass. Summary of what is now on the branch:
|
Ivan-LB
left a comment
There was a problem hiding this comment.
Thank you for the thorough re-review and for running those validation steps. Glad the pagination, clamping, description_preview and envelope metadata all checked out. Appreciate the approval.
GET /governance/proposals fetches every row from governance_proposals including the full description column with no upper bound. Each description can be several kilobytes of text. With hundreds of proposals a single unauthenticated request forces the node to allocate and serialise all description data in memory, creating an OOM vector under sustained traffic. Add LIMIT 200 so the response size is bounded regardless of how many proposals exist in the table.
…or governance proposals The previous test validated a local SQL string constant, not the actual handler. It would pass even if LIMIT 200 were removed from the route. Replace Section A's bounded-query test with a Flask app.test_client() call that seeds 250 rows in the real DB, calls GET /governance/proposals, and asserts the response contains at most 200 proposals. Keeps the unbounded-query test as vulnerability documentation.
…_anchor import ModuleNotFoundError for beacon_anchor occurred because the node/ directory was not on sys.path when exec_module loaded the main script. Inserting the absolute node/ directory path at the front of sys.path before the load makes all local sibling imports (beacon_anchor, payout_preflight, etc.) resolvable on any checkout.
gc.collect() forces CPython to finalize any sqlite3.Connection objects still alive after the test_client request returns. On Windows the OS file handle is not released until the connection is finalized, causing a PermissionError on os.unlink. The guarded fallback ensures teardown never fails even if a handle slips through.
The LIMIT 200 fix still serialized unbounded description text on every row and silently truncated results with no way to page past the cap. Changes: - Accept limit (default 50, max 200) and offset (default 0) query params so callers can page through all proposals without loading the full table. - Replace full description with description_preview (first 200 chars) in the list response. Full text is still available via GET /governance/proposal/<id>. - Add total to the response envelope so clients know how many pages exist. Tests updated to verify default page size, limit clamping, non-overlapping offset pages, preview truncation, and correct total reporting.
Add GOVERNANCE_DESCRIPTION_MAX_LEN = 4000 constant and reject POST /governance/propose requests with descriptions exceeding that length, returning 400 with error: "description_too_long" and the max_len value. This closes the remaining response-bloat path: the list endpoint already returns description_preview (200 chars); the creation cap ensures the detail endpoint and storage are also bounded. Add Section C to test_governance_proposals_fetchall_poc.py with two regression tests: one asserting the over-limit 400 rejection, one asserting exactly-at-limit descriptions are not length-rejected. All 11 tests pass.
5e0437c to
6f63185
Compare
|
Branch rebased onto current main (6f63185). All 11 tests pass after the rebase. The description-cap check (commit 6f63185) and the signature verification block in governance_propose() were both present in the conflict — I preserved both, with the length check running first so an oversized description is rejected before the signature is evaluated. No other changes. |
eliasx45
left a comment
There was a problem hiding this comment.
Re-reviewed rebased head 6f631854ba3fc93a7f9d1032a1aa3bc1e4662f8c after the new description-cap tests were added.
Verdict: request changes.
The implementation direction still looks aligned with the prior review, but the current focused regression file no longer passes as a whole on this Windows checkout. The first nine list/pagination tests pass, then the new TestGovernanceProposeDescriptionCap setup re-imports node/rustchain_v2_integrated_v2.2.1_rip200.py and fails during module import because the integrated module registers the same Prometheus counter twice:
PYTHONPATH=node ..\Rustchain\.venv\Scripts\python.exe -m pytest node\test_governance_proposals_fetchall_poc.py -q
# 9 passed, 2 errors
# ValueError: Duplicated timeseries in CollectorRegistry:
# {'rustchain_withdrawal_requests_total', 'rustchain_withdrawal_requests', 'rustchain_withdrawal_requests_created'}
So the new creation-time cap coverage needs the same kind of import/module cleanup isolation used in the other integrated-node tests, or it should reuse the already imported module fixture instead of loading the integrated module a second time in the same pytest process.
Other validation on this head:
git diff --check origin/main...HEAD
# clean
git merge-tree --write-tree origin/main HEAD
# 5c9ed9d152621eed468ac9652d802ba6f38076bf
..\Rustchain\.venv\Scripts\python.exe -m py_compile node\rustchain_v2_integrated_v2.2.1_rip200.py node\test_governance_proposals_fetchall_poc.py
# passed
..\Rustchain\.venv\Scripts\python.exe tools\bcos_spdx_check.py --base-ref origin/main
# BCOS SPDX check: OK
Once the full focused test file passes cleanly, I can re-review again from the current head.
|
Fixed. The The fix introduces a process-level singleton All 11 tests pass cleanly: 2 standalone SQL, 7 Flask route, 2 description-cap creation. |
eliasx45
left a comment
There was a problem hiding this comment.
Re-reviewed current head 96a32b033009ec404882c549e5c5c64b374ae6d9 after the test import-isolation fix.
Verdict: approve.
The prior blocker is fixed: the focused test file no longer imports the integrated node module twice in the same pytest process. The new _get_or_load_module() singleton path lets Section B and Section C share the already-loaded module, so the Prometheus counters are registered once and the description-cap tests run cleanly with the pagination tests.
Validation on this Windows checkout:
git diff --check origin/main...HEAD
# clean
git merge-tree --write-tree origin/main HEAD
# 39b69df2b6115593ebe996ce3cbddbc120339b8a
..\Rustchain\.venv\Scripts\python.exe -m py_compile node\rustchain_v2_integrated_v2.2.1_rip200.py node\test_governance_proposals_fetchall_poc.py
# passed
PYTHONPATH=node ..\Rustchain\.venv\Scripts\python.exe -m pytest -q node\test_governance_proposals_fetchall_poc.py --tb=short
# 11 passed
..\Rustchain\.venv\Scripts\python.exe tools\bcos_spdx_check.py --base-ref origin/main
# BCOS SPDX check: OK
The reviewed scope now covers paginated /governance/proposals, list description_preview, and the creation-time description cap without the Windows/Prometheus test collision.
|
Thanks for the thorough re-review @eliasx45. Good to know the singleton path resolves the Prometheus double-registration and that all 11 tests pass cleanly on Windows. Appreciate you verifying the merge-tree and SPDX check as well. |
|
@Scottcjn all three points are addressed. Pagination (limit default 50 max 200 with offset), description_preview (200-char trim on the list endpoint with full text on /governance/proposal/), and description cap at creation (4 000-char limit enforced in /governance/propose before the signature check). All 11 tests pass including Section B Flask integration and the new description-cap regression tests. Ready for re-review. |
|
Follow-up on the latest ready-for-review note: current visible head is still My reviewed scope remains: paginated |
|
Thanks for the follow-up and the clear scope summary. Noted that the broad hosted test check is a repo-level issue and not a blocker in this PR slice. Appreciate the thorough review. |
crystal-tensor
left a comment
There was a problem hiding this comment.
LGTM! Code review approved by @cx95zz (QClaw automated review agent).
Reviewed for: correctness, security, test coverage, and code quality.
No issues found - APPROVED.
Bug
GET /governance/proposalsserialises every row fromgovernance_proposalsincluding the fulldescriptioncolumn with no upper bound:The endpoint is unauthenticated. Each
descriptionfield can hold several kilobytes of text. With 250 active proposals at 10 KB each, a single request forces the node to load and JSON-serialise ~2.5 MB of data. A sustained flood of requests exhausts Flask worker memory.Fix
Adding
LIMIT 200keeps the response size bounded regardless of how many proposals exist.Test
node/test_governance_proposals_fetchall_poc.py(new file, 2 tests):test_unbounded_query_returns_all_rows— documents the vulnerable behaviour: 250 proposals inserted, all 250 returned with full description datatest_bounded_query_caps_at_200— verifies the fix: same 250 proposals, at most 200 returnedBounty Reference
Issue #2819 — unauthenticated DoS, Medium severity.
RTC Wallet: RTC64aa3fc417e75224e1574acae906fea34d94d140