fix: validate /api/set-workspace path — reject traversal, symlink escape, non-existent (closes #15)#16
Conversation
…ape, non-existent (closes cppalliance#15) POST /api/set-workspace accepted any string in body.path, ran tilde expansion, and stored it in a module-global override consumed by every subsequent file lookup. Anyone reaching the endpoint (including a hostile JS payload — see XSS issue cppalliance#11) could repoint the app at /etc, /, ~/.ssh, or any other directory; the dashboard would then serve files from there as if they were Cursor chat data. Symlink-based escape (e.g. /tmp/cursor-link → /) bypassed any naive startswith-style check. New utils/path_validation.py with validate_workspace_path(raw): - Expands ~/. - os.path.realpath() — collapses `..` AND resolves symlinks in one step. Both classes of escape become equivalent to the canonical real path on disk; downstream marker check then operates on the truth. - Rejects with WorkspacePathError if the path doesn't exist, isn't a directory, or contains no immediate subdirectory with a state.vscdb marker (the same heuristic /api/validate-path already uses). - Returns the canonical real path so the override is stored in canonical form, not whatever the caller sent. api/config_api.py:set_workspace now calls the validator and returns 400 { error: "<reason>" } on rejection (was silently 200), and stores the canonical path on success: 200 { success: true, path: "<canonical>" }. Lives outside api/ so the test suite can import without Flask in scope (tests/test_cli_args.py convention). Tests: tests/test_workspace_path_validation.py — 11 cases covering: - happy path with marker file present - canonical path returned (`..` collapsed) - empty / whitespace / non-string rejected - non-existent / file-not-directory rejected - directory without Cursor markers rejected - traversal lands outside workspace → rejected on markers - symlink → / rejected on markers (POSIX-only) - symlink → real workspace canonicalised + accepted (POSIX-only) Full suite: 148/148 OK (was 137; +11 new). Live HTTP smoke against the running app verified all 9 documented behaviours (200 with canonical path on accept; 400 with the documented reason on each reject).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds a canonicalizing workspace path validator and integrates it into the ChangesWorkspace Path Validation & API Integration
sequenceDiagram
participant Client
participant Server as API (/api/set-workspace)
participant Validator as validate_workspace_path
participant FS as Filesystem
Client->>Server: POST /api/set-workspace { "path": "<input>" }
Server->>Validator: validate_workspace_path("<input>")
Validator->>FS: expanduser + realpath + stat checks
FS-->>Validator: existence/type/marker results
alt validation OK
Validator-->>Server: return canonical_path
Server->>Server: set_workspace_path_override(canonical_path)
Server-->>Client: 200 {"success": true, "path": canonical_path}
else validation fails (WorkspacePathError)
Validator-->>Server: raise WorkspacePathError(reason)
Server-->>Client: 400 {"error": reason}
else unexpected error
Server-->>Client: 500 {"error": "internal server error"}
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
utils/path_validation.py (1)
30-37: ⚡ Quick winUnify the workspace-marker contract with discovery code.
This validator accepts based on
state.vscdb, whileapi/workspaces.pydiscovery currently keys offworkspace.json. That split can validate a path successfully but still yield no workspaces later. Please centralize one shared marker predicate (or shared marker set) and reuse it in both places.Also applies to: 77-81
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@utils/path_validation.py` around lines 30 - 37, The validator _has_cursor_workspace_markers currently checks for state.vscdb while api/workspaces.py discovery looks for workspace.json, causing inconsistent acceptance; refactor by extracting a shared predicate or marker set (e.g., WORKSPACE_MARKERS or a function like is_workspace_marker_present(path: str)) and replace the ad-hoc checks in _has_cursor_workspace_markers and the discovery code in api/workspaces.py to both call that single helper so both validation and discovery use the same marker(s) (ensure the helper checks immediate subdirectories the same way _has_cursor_workspace_markers intended and include both state.vscdb and workspace.json).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/config_api.py`:
- Around line 79-80: The current code assigns body =
request.get_json(silent=True) or {} then does raw = body.get("path", "") which
breaks if body is a non-dict JSON (array/string/number); update the handler to
check isinstance(body, dict) before accessing .get(): if not a dict, raise
WorkspacePathError so the existing exception handling returns a 400. Modify the
block that uses request.get_json, the body variable and raw assignment (and
mirror the pattern used in validate_workspace_path()) to ensure only dicts are
.get()-accessed.
In `@tests/test_workspace_path_validation.py`:
- Around line 101-106: The test test_traversal_into_non_workspace_is_rejected
relies on resolving to the system /tmp which is nondeterministic; change the
constructed escape target so it stays inside the test-owned temp tree: when
creating escape from storage (created by _make_cursor_workspace_dir(self.tmp)),
join ".." segments that resolve back into a new directory under self.tmp (e.g.,
create a sibling/parent folder inside self.tmp and use os.path.join(storage,
"..", "..", "<inside-temp-name>") or explicitly create an outside_dir under
self.tmp and point escape at that), then assert validate_workspace_path(escape)
raises WorkspacePathError; update any setup so the escaped target is created
under self.tmp to make the test deterministic.
---
Nitpick comments:
In `@utils/path_validation.py`:
- Around line 30-37: The validator _has_cursor_workspace_markers currently
checks for state.vscdb while api/workspaces.py discovery looks for
workspace.json, causing inconsistent acceptance; refactor by extracting a shared
predicate or marker set (e.g., WORKSPACE_MARKERS or a function like
is_workspace_marker_present(path: str)) and replace the ad-hoc checks in
_has_cursor_workspace_markers and the discovery code in api/workspaces.py to
both call that single helper so both validation and discovery use the same
marker(s) (ensure the helper checks immediate subdirectories the same way
_has_cursor_workspace_markers intended and include both state.vscdb and
workspace.json).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 31572f67-cd0b-4b53-9fbe-9953b971112f
📒 Files selected for processing (3)
api/config_api.pytests/test_workspace_path_validation.pyutils/path_validation.py
…n PR cppalliance#16) Two CodeRabbit follow-ups: 1. api/config_api.py:set_workspace — when get_json(silent=True) returned a non-dict (JSON array, string, number, null), the truthy fallback `or {}` was bypassed and `body.get("path", "")` raised AttributeError, which the outer Exception handler mis-reported as a 500 server error. Added isinstance(body, dict) guard that returns 400 { error: "request body must be a JSON object" } directly. Diverged from CodeRabbit's literal patch in one way: they had it raise WorkspacePathError("path is required"), but the actual problem here is a malformed body shape — the error message should match the cause so the client can fix it. 2. tests/test_workspace_path_validation.py — the traversal test escaped to /tmp itself, which is shared and could be flipped by any other test / process creating <something>/state.vscdb there. Pinned the escape target to an isolated root inside self.tmp. Also added 4 API-layer regression tests (TestSetWorkspaceApi) using Flask test_client: JSON array / string / number return 400 (not 500), plus a sanity end-to-end with a valid {path} body returning 200 and the canonical realpath. Full suite: 152/152 OK (was 148; +4 new API tests).
|
Closed in favour of the in-repo equivalent so CI can run. |
Avoid AttributeError on truthy JSON scalars/arrays (same class as PR #16). Return valid:false + workspaceCount:0 to match validation error shape. Co-authored-by: Cursor <cursoragent@cursor.com>
…ape, non-existent (closes #15) (#22) * fix: validate /api/set-workspace path — reject traversal, symlink escape, non-existent (closes #15) POST /api/set-workspace accepted any string in body.path, ran tilde expansion, and stored it in a module-global override consumed by every subsequent file lookup. Anyone reaching the endpoint (including a hostile JS payload — see XSS issue #11) could repoint the app at /etc, /, ~/.ssh, or any other directory; the dashboard would then serve files from there as if they were Cursor chat data. Symlink-based escape (e.g. /tmp/cursor-link → /) bypassed any naive startswith-style check. New utils/path_validation.py with validate_workspace_path(raw): - Expands ~/. - os.path.realpath() — collapses `..` AND resolves symlinks in one step. Both classes of escape become equivalent to the canonical real path on disk; downstream marker check then operates on the truth. - Rejects with WorkspacePathError if the path doesn't exist, isn't a directory, or contains no immediate subdirectory with a state.vscdb marker (the same heuristic /api/validate-path already uses). - Returns the canonical real path so the override is stored in canonical form, not whatever the caller sent. api/config_api.py:set_workspace now calls the validator and returns 400 { error: "<reason>" } on rejection (was silently 200), and stores the canonical path on success: 200 { success: true, path: "<canonical>" }. Lives outside api/ so the test suite can import without Flask in scope (tests/test_cli_args.py convention). Tests: tests/test_workspace_path_validation.py — 11 cases covering: - happy path with marker file present - canonical path returned (`..` collapsed) - empty / whitespace / non-string rejected - non-existent / file-not-directory rejected - directory without Cursor markers rejected - traversal lands outside workspace → rejected on markers - symlink → / rejected on markers (POSIX-only) - symlink → real workspace canonicalised + accepted (POSIX-only) Full suite: 148/148 OK (was 137; +11 new). Live HTTP smoke against the running app verified all 9 documented behaviours (200 with canonical path on accept; 400 with the documented reason on each reject). * fix: harden /api/set-workspace handler + traversal test (CodeRabbit on PR #16) Two CodeRabbit follow-ups: 1. api/config_api.py:set_workspace — when get_json(silent=True) returned a non-dict (JSON array, string, number, null), the truthy fallback `or {}` was bypassed and `body.get("path", "")` raised AttributeError, which the outer Exception handler mis-reported as a 500 server error. Added isinstance(body, dict) guard that returns 400 { error: "request body must be a JSON object" } directly. Diverged from CodeRabbit's literal patch in one way: they had it raise WorkspacePathError("path is required"), but the actual problem here is a malformed body shape — the error message should match the cause so the client can fix it. 2. tests/test_workspace_path_validation.py — the traversal test escaped to /tmp itself, which is shared and could be flipped by any other test / process creating <something>/state.vscdb there. Pinned the escape target to an isolated root inside self.tmp. Also added 4 API-layer regression tests (TestSetWorkspaceApi) using Flask test_client: JSON array / string / number return 400 (not 500), plus a sanity end-to-end with a valid {path} body returning 200 and the canonical realpath. Full suite: 152/152 OK (was 148; +4 new API tests). * fix(set-workspace): wrap override persistence to keep 500 as JSON (CodeRabbit on PR #22) validate_workspace_path() failures were already returning structured JSON, but set_workspace_path_override(canonical) was outside the try block — a persistence failure would have surfaced as Flask's HTML 500 page instead of {"error": "..."}. Wraps the call in its own try/except so the response shape stays structured for any consumer parsing JSON. * test(set-workspace): segment-aware `..` assert + reset override after API tests Two test-hygiene follow-ups from a structured re-review of PR #22; no production code changes. 1. tests/test_workspace_path_validation.py — test_returns_canonical_path_collapsing_dotdot The canonical-path assertion was a substring check against `..` on the raw realpath string. That would spuriously fail if the OS-supplied tempdir name ever embedded `..` in a folder component — substring presence is the wrong primitive for the invariant we actually care about, which is "no `..` *segment* in the canonical path." Switched to `Path(result).parts`, which handles `\` vs `/` natively and asserts on path components. 2. tests/test_workspace_path_validation.py — TestSetWorkspaceApi.setUp The 200-path test mutates the module-global _workspace_path_override in utils/workspace_path.py via the API, and the tempdir it then points at is rmtree'd by the existing cleanup. Without an explicit reset, the global stays pinned at a now-deleted path across tests. Added addCleanup(set_workspace_path_override, None) so any future sibling test inspecting the override sees a clean None. Full suite: 152/152 OK (skipped=2 POSIX-only symlink tests on Windows). No behaviour change; ReadLints clean. Co-authored-by: Cursor <cursoragent@cursor.com> * Align /api/validate-path with validate_workspace_path (PR #22) - POST /api/validate-path now uses the same realpath + marker checks as set-workspace; returns canonical path and structured errors on failure. - README documents WORKSPACE_PATH as trusted-operator tilde expansion only. - Config page shows server error text when validation fails. - Docstrings + symlink-test CI note; TOCTOU comment after realpath. Co-authored-by: Cursor <cursoragent@cursor.com> * validate-path: reject non-object JSON before body.get Avoid AttributeError on truthy JSON scalars/arrays (same class as PR #16). Return valid:false + workspaceCount:0 to match validation error shape. --------- Co-authored-by: yu-med <clean6378@gmail.com> Co-authored-by: Brad <headit74@hotmail.com>
Problem
POST /api/set-workspaceaccepted any string inbody.path, did~/expansion, and stored the result in a module-global consumed by every subsequent file lookup. A client (anyone who reaches the dashboard, including a hostile JS payload — see related XSS issue #11) could post{ "path": "/etc" }or{ "path": "/home/user/.ssh" }and the app would then serve those files as Cursor data. Symlink escape (e.g./tmp/cursor-link → /) bypassed any naivestartswithcheck.Change
utils/path_validation.py(new) —validate_workspace_path(raw)withWorkspacePathError.os.path.realpath()collapses..AND resolves symlinks in one step; rejects non-string/empty, non-existent, non-directory, no-Cursor-markers; returns the canonical real path so the override is stored canonically, not as the caller sent it. Lives outsideapi/so tests don't pull Flask into scope (existing test-suite convention).api/config_api.py— handler calls the validator. Returns400 { error: "<reason>" }on rejection (was silently200); on accept returns200 { success: true, path: "<canonical>" }.tests/test_workspace_path_validation.py(new) — 11 regression cases.Test plan
Unit (
python3 -m unittest discover tests): 148/148 OK (was 137; +11 new).Live HTTP smoke against the running app, all 9 cases:
400 path is required400 path is required400 path does not exist/etc/passwd400 path is not a directory/etc(no markers)400 no Cursor workspaceStorage…/storage/../..lands at/tmp400(..collapsed, then markers reject)/400(realpath unmasked it)state.vscdbin subdir)200 { path: "<canonical>", success: true }200 { path: "<realpath>" }(canonicalised — input was the symlink path)Case I is the meaningful one: even though the operator sent
…/good-link, the override stored is the realpath. Every subsequent read goes through canonical, not the link — a future swap of the link target can't redirect reads.Closes #15.
Summary by CodeRabbit
New Features
Tests