Skip to content

fix: validate /api/set-workspace path — reject traversal, symlink escape, non-existent (closes #15)#16

Closed
timon0305 wants to merge 2 commits intocppalliance:masterfrom
timon0305:fix/set-workspace-path-validation-15
Closed

fix: validate /api/set-workspace path — reject traversal, symlink escape, non-existent (closes #15)#16
timon0305 wants to merge 2 commits intocppalliance:masterfrom
timon0305:fix/set-workspace-path-validation-15

Conversation

@timon0305
Copy link
Copy Markdown
Collaborator

@timon0305 timon0305 commented May 5, 2026

Problem

POST /api/set-workspace accepted any string in body.path, did ~/ expansion, and stored the result in a module-global consumed by every subsequent file lookup. A client (anyone who reaches the dashboard, including a hostile JS payload — see related XSS issue #11) could post { "path": "/etc" } or { "path": "/home/user/.ssh" } and the app would then serve those files as Cursor data. Symlink escape (e.g. /tmp/cursor-link → /) bypassed any naive startswith check.

Change

  • utils/path_validation.py (new) — validate_workspace_path(raw) with WorkspacePathError. os.path.realpath() collapses .. AND resolves symlinks in one step; rejects non-string/empty, non-existent, non-directory, no-Cursor-markers; returns the canonical real path so the override is stored canonically, not as the caller sent it. Lives outside api/ so tests don't pull Flask into scope (existing test-suite convention).
  • api/config_api.py — handler calls the validator. Returns 400 { error: "<reason>" } on rejection (was silently 200); on accept returns 200 { success: true, path: "<canonical>" }.
  • tests/test_workspace_path_validation.py (new) — 11 regression cases.

Test plan

Unit (python3 -m unittest discover tests): 148/148 OK (was 137; +11 new).

Live HTTP smoke against the running app, all 9 cases:

Case Payload Expected Got
A empty body 400 path is required
B empty path string 400 path is required
C non-existent path 400 path does not exist
D file /etc/passwd 400 path is not a directory
E dir /etc (no markers) 400 no Cursor workspaceStorage
F traversal …/storage/../.. lands at /tmp 400 (.. collapsed, then markers reject)
G symlink → / 400 (realpath unmasked it)
H real fake workspace (state.vscdb in subdir) 200 { path: "<canonical>", success: true }
I symlink → real workspace 200 { path: "<realpath>" } (canonicalised — input was the symlink path)

Case I is the meaningful one: even though the operator sent …/good-link, the override stored is the realpath. Every subsequent read goes through canonical, not the link — a future swap of the link target can't redirect reads.

Closes #15.

Summary by CodeRabbit

  • New Features

    • Workspace path setting now strictly validates input, rejects non-object JSON with HTTP 400, enforces workspace heuristics, and stores the canonical (realpath-resolved) directory on success with a success response.
    • Validation failures return clear HTTP 400 error messages; other unexpected failures map to HTTP 500.
  • Tests

    • Added thorough tests covering path validation, canonicalization, symlink behavior, traversal protection, and API regression cases for input types and success responses.

…ape, non-existent (closes cppalliance#15)

POST /api/set-workspace accepted any string in body.path, ran tilde
expansion, and stored it in a module-global override consumed by every
subsequent file lookup. Anyone reaching the endpoint (including a
hostile JS payload — see XSS issue cppalliance#11) could repoint the app at /etc,
/, ~/.ssh, or any other directory; the dashboard would then serve files
from there as if they were Cursor chat data. Symlink-based escape
(e.g. /tmp/cursor-link → /) bypassed any naive startswith-style check.

New utils/path_validation.py with validate_workspace_path(raw):

- Expands ~/.
- os.path.realpath() — collapses `..` AND resolves symlinks in one
  step. Both classes of escape become equivalent to the canonical real
  path on disk; downstream marker check then operates on the truth.
- Rejects with WorkspacePathError if the path doesn't exist, isn't a
  directory, or contains no immediate subdirectory with a state.vscdb
  marker (the same heuristic /api/validate-path already uses).
- Returns the canonical real path so the override is stored in
  canonical form, not whatever the caller sent.

api/config_api.py:set_workspace now calls the validator and returns
400 { error: "<reason>" } on rejection (was silently 200), and stores
the canonical path on success: 200 { success: true, path: "<canonical>" }.

Lives outside api/ so the test suite can import without Flask in scope
(tests/test_cli_args.py convention).

Tests: tests/test_workspace_path_validation.py — 11 cases covering:
  - happy path with marker file present
  - canonical path returned (`..` collapsed)
  - empty / whitespace / non-string rejected
  - non-existent / file-not-directory rejected
  - directory without Cursor markers rejected
  - traversal lands outside workspace → rejected on markers
  - symlink → / rejected on markers (POSIX-only)
  - symlink → real workspace canonicalised + accepted (POSIX-only)

Full suite: 148/148 OK (was 137; +11 new). Live HTTP smoke against the
running app verified all 9 documented behaviours (200 with canonical
path on accept; 400 with the documented reason on each reject).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2ee93769-57f9-432e-a484-3df450f245b9

📥 Commits

Reviewing files that changed from the base of the PR and between 75468a2 and a63297f.

📒 Files selected for processing (2)
  • api/config_api.py
  • tests/test_workspace_path_validation.py

📝 Walkthrough

Walkthrough

Adds a canonicalizing workspace path validator and integrates it into the /api/set-workspace endpoint: the validator resolves ~ and symlinks via os.path.realpath(), enforces existence/directory-ness and a Cursor workspace marker (state.vscdb), returns the canonical path on success, and maps validation failures to HTTP 400.

Changes

Workspace Path Validation & API Integration

Layer / File(s) Summary
Exception & Helper
utils/path_validation.py
Adds WorkspacePathError and _has_cursor_workspace_markers() to detect Cursor workspaces by checking for state.vscdb in immediate subdirectories.
Core Validator
utils/path_validation.py
Adds validate_workspace_path(raw_path: str) -> str: rejects non-strings/empty, expands ~, canonicalizes with os.path.realpath(), verifies existence and directory-ness, enforces Cursor marker heuristic, returns canonical path or raises WorkspacePathError.
API Wiring / Error Mapping
api/config_api.py
/api/set-workspace now requires JSON object bodies, calls validate_workspace_path(), stores the returned canonical path via set_workspace_path_override(...), returns {"success": true, "path": canonical}, maps WorkspacePathError → HTTP 400 with {"error": ...}, other exceptions → HTTP 500.
Tests
tests/test_workspace_path_validation.py
Adds _make_cursor_workspace_dir() and tests: TestValidateWorkspacePath (multiple acceptance/rejection cases, canonicalization, POSIX symlink cases) and TestSetWorkspaceApi (API returns 400 for non-dict JSON, 200 with canonical path for valid dict).
sequenceDiagram
    participant Client
    participant Server as API (/api/set-workspace)
    participant Validator as validate_workspace_path
    participant FS as Filesystem

    Client->>Server: POST /api/set-workspace { "path": "<input>" }
    Server->>Validator: validate_workspace_path("<input>")
    Validator->>FS: expanduser + realpath + stat checks
    FS-->>Validator: existence/type/marker results
    alt validation OK
        Validator-->>Server: return canonical_path
        Server->>Server: set_workspace_path_override(canonical_path)
        Server-->>Client: 200 {"success": true, "path": canonical_path}
    else validation fails (WorkspacePathError)
        Validator-->>Server: raise WorkspacePathError(reason)
        Server-->>Client: 400 {"error": reason}
    else unexpected error
        Server-->>Client: 500 {"error": "internal server error"}
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Poem

🐇 Through symlinks, tildes, and dot‑dot fray,
I chase the real path, keep stray bytes at bay.
State markers guide my nimble feet,
Canonical trails make workspaces neat.
Hop on, safe paths — the rabbit says hooray!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main change: adding path validation to /api/set-workspace to reject traversal, symlink escapes, and non-existent paths, addressing the security vulnerability in issue #15.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
utils/path_validation.py (1)

30-37: ⚡ Quick win

Unify the workspace-marker contract with discovery code.

This validator accepts based on state.vscdb, while api/workspaces.py discovery currently keys off workspace.json. That split can validate a path successfully but still yield no workspaces later. Please centralize one shared marker predicate (or shared marker set) and reuse it in both places.

Also applies to: 77-81

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@utils/path_validation.py` around lines 30 - 37, The validator
_has_cursor_workspace_markers currently checks for state.vscdb while
api/workspaces.py discovery looks for workspace.json, causing inconsistent
acceptance; refactor by extracting a shared predicate or marker set (e.g.,
WORKSPACE_MARKERS or a function like is_workspace_marker_present(path: str)) and
replace the ad-hoc checks in _has_cursor_workspace_markers and the discovery
code in api/workspaces.py to both call that single helper so both validation and
discovery use the same marker(s) (ensure the helper checks immediate
subdirectories the same way _has_cursor_workspace_markers intended and include
both state.vscdb and workspace.json).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/config_api.py`:
- Around line 79-80: The current code assigns body =
request.get_json(silent=True) or {} then does raw = body.get("path", "") which
breaks if body is a non-dict JSON (array/string/number); update the handler to
check isinstance(body, dict) before accessing .get(): if not a dict, raise
WorkspacePathError so the existing exception handling returns a 400. Modify the
block that uses request.get_json, the body variable and raw assignment (and
mirror the pattern used in validate_workspace_path()) to ensure only dicts are
.get()-accessed.

In `@tests/test_workspace_path_validation.py`:
- Around line 101-106: The test test_traversal_into_non_workspace_is_rejected
relies on resolving to the system /tmp which is nondeterministic; change the
constructed escape target so it stays inside the test-owned temp tree: when
creating escape from storage (created by _make_cursor_workspace_dir(self.tmp)),
join ".." segments that resolve back into a new directory under self.tmp (e.g.,
create a sibling/parent folder inside self.tmp and use os.path.join(storage,
"..", "..", "<inside-temp-name>") or explicitly create an outside_dir under
self.tmp and point escape at that), then assert validate_workspace_path(escape)
raises WorkspacePathError; update any setup so the escaped target is created
under self.tmp to make the test deterministic.

---

Nitpick comments:
In `@utils/path_validation.py`:
- Around line 30-37: The validator _has_cursor_workspace_markers currently
checks for state.vscdb while api/workspaces.py discovery looks for
workspace.json, causing inconsistent acceptance; refactor by extracting a shared
predicate or marker set (e.g., WORKSPACE_MARKERS or a function like
is_workspace_marker_present(path: str)) and replace the ad-hoc checks in
_has_cursor_workspace_markers and the discovery code in api/workspaces.py to
both call that single helper so both validation and discovery use the same
marker(s) (ensure the helper checks immediate subdirectories the same way
_has_cursor_workspace_markers intended and include both state.vscdb and
workspace.json).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 31572f67-cd0b-4b53-9fbe-9953b971112f

📥 Commits

Reviewing files that changed from the base of the PR and between f8b3cb3 and 75468a2.

📒 Files selected for processing (3)
  • api/config_api.py
  • tests/test_workspace_path_validation.py
  • utils/path_validation.py

Comment thread api/config_api.py Outdated
Comment thread tests/test_workspace_path_validation.py
…n PR cppalliance#16)

Two CodeRabbit follow-ups:

1. api/config_api.py:set_workspace — when get_json(silent=True) returned
   a non-dict (JSON array, string, number, null), the truthy fallback
   `or {}` was bypassed and `body.get("path", "")` raised AttributeError,
   which the outer Exception handler mis-reported as a 500 server error.
   Added isinstance(body, dict) guard that returns
   400 { error: "request body must be a JSON object" } directly. Diverged
   from CodeRabbit's literal patch in one way: they had it raise
   WorkspacePathError("path is required"), but the actual problem here is
   a malformed body shape — the error message should match the cause so
   the client can fix it.

2. tests/test_workspace_path_validation.py — the traversal test escaped
   to /tmp itself, which is shared and could be flipped by any other
   test / process creating <something>/state.vscdb there. Pinned the
   escape target to an isolated root inside self.tmp.

Also added 4 API-layer regression tests (TestSetWorkspaceApi) using
Flask test_client: JSON array / string / number return 400 (not 500),
plus a sanity end-to-end with a valid {path} body returning 200 and the
canonical realpath.

Full suite: 152/152 OK (was 148; +4 new API tests).
@timon0305
Copy link
Copy Markdown
Collaborator Author

Closed in favour of the in-repo equivalent so CI can run.

@timon0305 timon0305 closed this May 7, 2026
bradjin8 added a commit that referenced this pull request May 8, 2026
Avoid AttributeError on truthy JSON scalars/arrays (same class as PR #16).
Return valid:false + workspaceCount:0 to match validation error shape.

Co-authored-by: Cursor <cursoragent@cursor.com>
bradjin8 added a commit that referenced this pull request May 8, 2026
…ape, non-existent (closes #15) (#22)

* fix: validate /api/set-workspace path — reject traversal, symlink escape, non-existent (closes #15)

POST /api/set-workspace accepted any string in body.path, ran tilde
expansion, and stored it in a module-global override consumed by every
subsequent file lookup. Anyone reaching the endpoint (including a
hostile JS payload — see XSS issue #11) could repoint the app at /etc,
/, ~/.ssh, or any other directory; the dashboard would then serve files
from there as if they were Cursor chat data. Symlink-based escape
(e.g. /tmp/cursor-link → /) bypassed any naive startswith-style check.

New utils/path_validation.py with validate_workspace_path(raw):

- Expands ~/.
- os.path.realpath() — collapses `..` AND resolves symlinks in one
  step. Both classes of escape become equivalent to the canonical real
  path on disk; downstream marker check then operates on the truth.
- Rejects with WorkspacePathError if the path doesn't exist, isn't a
  directory, or contains no immediate subdirectory with a state.vscdb
  marker (the same heuristic /api/validate-path already uses).
- Returns the canonical real path so the override is stored in
  canonical form, not whatever the caller sent.

api/config_api.py:set_workspace now calls the validator and returns
400 { error: "<reason>" } on rejection (was silently 200), and stores
the canonical path on success: 200 { success: true, path: "<canonical>" }.

Lives outside api/ so the test suite can import without Flask in scope
(tests/test_cli_args.py convention).

Tests: tests/test_workspace_path_validation.py — 11 cases covering:
  - happy path with marker file present
  - canonical path returned (`..` collapsed)
  - empty / whitespace / non-string rejected
  - non-existent / file-not-directory rejected
  - directory without Cursor markers rejected
  - traversal lands outside workspace → rejected on markers
  - symlink → / rejected on markers (POSIX-only)
  - symlink → real workspace canonicalised + accepted (POSIX-only)

Full suite: 148/148 OK (was 137; +11 new). Live HTTP smoke against the
running app verified all 9 documented behaviours (200 with canonical
path on accept; 400 with the documented reason on each reject).

* fix: harden /api/set-workspace handler + traversal test (CodeRabbit on PR #16)

Two CodeRabbit follow-ups:

1. api/config_api.py:set_workspace — when get_json(silent=True) returned
   a non-dict (JSON array, string, number, null), the truthy fallback
   `or {}` was bypassed and `body.get("path", "")` raised AttributeError,
   which the outer Exception handler mis-reported as a 500 server error.
   Added isinstance(body, dict) guard that returns
   400 { error: "request body must be a JSON object" } directly. Diverged
   from CodeRabbit's literal patch in one way: they had it raise
   WorkspacePathError("path is required"), but the actual problem here is
   a malformed body shape — the error message should match the cause so
   the client can fix it.

2. tests/test_workspace_path_validation.py — the traversal test escaped
   to /tmp itself, which is shared and could be flipped by any other
   test / process creating <something>/state.vscdb there. Pinned the
   escape target to an isolated root inside self.tmp.

Also added 4 API-layer regression tests (TestSetWorkspaceApi) using
Flask test_client: JSON array / string / number return 400 (not 500),
plus a sanity end-to-end with a valid {path} body returning 200 and the
canonical realpath.

Full suite: 152/152 OK (was 148; +4 new API tests).

* fix(set-workspace): wrap override persistence to keep 500 as JSON (CodeRabbit on PR #22)

validate_workspace_path() failures were already returning structured
JSON, but set_workspace_path_override(canonical) was outside the try
block — a persistence failure would have surfaced as Flask's HTML 500
page instead of {"error": "..."}. Wraps the call in its own try/except
so the response shape stays structured for any consumer parsing JSON.

* test(set-workspace): segment-aware `..` assert + reset override after API tests

Two test-hygiene follow-ups from a structured re-review of PR #22; no
production code changes.

1. tests/test_workspace_path_validation.py — test_returns_canonical_path_collapsing_dotdot
   The canonical-path assertion was a substring check against `..` on
   the raw realpath string. That would spuriously fail if the OS-supplied
   tempdir name ever embedded `..` in a folder component — substring
   presence is the wrong primitive for the invariant we actually care
   about, which is "no `..` *segment* in the canonical path." Switched
   to `Path(result).parts`, which handles `\` vs `/` natively and
   asserts on path components.

2. tests/test_workspace_path_validation.py — TestSetWorkspaceApi.setUp
   The 200-path test mutates the module-global _workspace_path_override
   in utils/workspace_path.py via the API, and the tempdir it then
   points at is rmtree'd by the existing cleanup. Without an explicit
   reset, the global stays pinned at a now-deleted path across tests.
   Added addCleanup(set_workspace_path_override, None) so any future
   sibling test inspecting the override sees a clean None.

Full suite: 152/152 OK (skipped=2 POSIX-only symlink tests on Windows).
No behaviour change; ReadLints clean.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Align /api/validate-path with validate_workspace_path (PR #22)

- POST /api/validate-path now uses the same realpath + marker checks as
  set-workspace; returns canonical path and structured errors on failure.
- README documents WORKSPACE_PATH as trusted-operator tilde expansion only.
- Config page shows server error text when validation fails.
- Docstrings + symlink-test CI note; TOCTOU comment after realpath.

Co-authored-by: Cursor <cursoragent@cursor.com>

* validate-path: reject non-object JSON before body.get

Avoid AttributeError on truthy JSON scalars/arrays (same class as PR #16).
Return valid:false + workspaceCount:0 to match validation error shape.


---------

Co-authored-by: yu-med <clean6378@gmail.com>
Co-authored-by: Brad <headit74@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: validate /api/set-workspace path (reject traversal, symlink escape, non-existent paths)

1 participant