
feat: cloud backend API — Supabase schema, FastAPI sync endpoint, Railway deploy#22

Merged
Gradata merged 13 commits into main from feat/cloud-backend
Apr 10, 2026

Conversation


@Gradata Gradata commented Apr 10, 2026

Summary

  • Supabase schema: 8 tables (workspaces, brains, corrections, lessons, meta_rules, events, rule_patches, workspace_members) with RLS policies, indexes, auto-create workspace trigger
  • FastAPI server: POST /api/v1/sync (receives brain events from SDK), POST /api/v1/brains/connect, GET /api/v1/brains, GET /health
  • Dual auth: SDK API key (gd_xxx) + Supabase JWT (dashboard frontend)
  • Pydantic models: Full validation — severity enum, confidence bounds, event type required
  • Railway deployment: Dockerfile + railway.toml with health check
  • 17 tests, 6 commits, zero SDK dependencies
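The dual-auth scheme above can be sketched as a small credential router (a hypothetical `classify_credential` helper, not code from this PR): gd_-prefixed tokens are treated as SDK API keys, and three-segment tokens are handed to JWT verification.

```python
def classify_credential(token: str) -> str:
    """Route a bearer credential: SDK API keys carry a gd_ prefix,
    Supabase JWTs are three base64url segments joined by dots."""
    if token.startswith("gd_"):
        return "api_key"
    if token.count(".") == 2:
        return "jwt"
    raise ValueError("unrecognized credential format")
```

The prefix check must come first so an API key is never mistakenly parsed as a JWT.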

Architecture

SDK (CloudClient) --POST /api/v1/sync--> FastAPI (Railway) --PostgREST--> Supabase Postgres
Dashboard (Next.js) --JWT auth----------> FastAPI (Railway) ---------------^
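On the SDK side of this flow, the sync payload is validated by the Pydantic models listed in the summary; a minimal client-side sketch of building one lesson row (field names are assumptions inferred from the model list, not the actual SDK API):

```python
def lesson_payload(description: str, confidence: float, state: str = "active") -> dict:
    """Sketch of one lesson row for POST /api/v1/sync.

    Mirrors the stated server-side constraint: confidence is bounded
    to [0, 1] before the request is ever sent."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    return {"description": description, "confidence": confidence, "state": state}
```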

Test plan

  • 17 tests pass (auth: 5, models: 5, sync: 5, brains: 2)
  • Schema deployed to Supabase (run 001_initial_schema.sql in SQL Editor)
  • FastAPI starts locally with real Supabase credentials
  • Health endpoint responds at /health
  • Railway deployment succeeds

Generated with Gradata

Greptile Summary

This PR introduces the full cloud backend for Gradata: a Supabase schema (8 tables, RLS policies, auto-create workspace trigger), a FastAPI server with dual auth (SDK API keys + Supabase JWT), and Railway deployment config. The architecture is clean and the overall design is solid, but there are three concrete bugs that need to be addressed before the endpoints work correctly against a real Supabase instance.

Key findings:

  • P0 — GET /api/v1/brains will always 400 in production: list_brains queries columns brain_name, domain, and manifest which do not exist in the schema (the column is name). The mock-based tests pass because MockSupabaseClient.select ignores the columns argument.
  • P1 — Lessons are never actually upserted: db.upsert() sends Prefer: resolution=merge-duplicates but omits the on_conflict query parameter. PostgREST defaults to the primary key (id), and since new lesson rows have no id, every sync inserts fresh duplicates. The migration also has no unique constraint on (brain_id, description).
  • P1 — JWT audience not validated: jose_jwt.decode is called without audience=\"authenticated\", so any JWT signed with the same HMAC key passes regardless of which service issued it.

Confidence Score: 3/5

Not safe to merge — the brains list endpoint will 400 in production and every sync creates duplicate lesson rows.

The P0 and two P1 bugs make two of the four API endpoints either broken or silently incorrect against a real database. The column-name mismatch in list_brains is straightforward to fix, as is the upsert issue (though it also requires a migration change). The JWT audience omission is a security gap. All three are concrete, verified problems.

Files needing changes: cloud/app/routes/brains.py (wrong column names), cloud/app/db.py (missing on_conflict), cloud/app/auth.py (missing audience check), and cloud/migrations/001_initial_schema.sql (missing unique constraint on lessons).

Security Review

  • JWT audience bypass (cloud/app/auth.py:33): jose_jwt.decode omits audience=\"authenticated\", accepting any token signed with the Supabase HMAC key regardless of its intended purpose. Fix: pass audience=\"authenticated\" to decode().

Important Files Changed

  • cloud/app/routes/brains.py (P0 bug): list_brains queries columns brain_name, domain, manifest which don't exist in the schema; PostgREST will 400 on every real call.
  • cloud/app/db.py (P1 bug): upsert lacks an on_conflict query param, so lessons are always inserted as new rows rather than merged; defeats idempotency.
  • cloud/app/auth.py (P1 security): JWT decoded without audience validation; otherwise well-structured dual-auth (API key + JWT) dependency.
  • cloud/app/routes/sync.py: Sync handler correctly writes corrections, events, meta-rules, and updates last_sync_at; upsert idempotency depends on fixing db.upsert.
  • cloud/migrations/001_initial_schema.sql: Well-structured schema with RLS policies; handle_new_user trigger creates workspace + brain on signup; missing unique constraint on (brain_id, description) needed for lesson upsert.
  • cloud/app/main.py: Clean app factory with lifespan handler for httpx client cleanup, docs disabled in production.
  • cloud/app/config.py: Settings are cached with lru_cache, env prefix correctly scoped to GRADATA_.
  • cloud/app/models.py: Solid Pydantic v2 models with enums, confidence bounds validation, and required event type validator.
  • cloud/tests/conftest.py: Well-implemented MockSupabaseClient with monkeypatch injection; test env vars correctly set to non-real placeholders.

Sequence Diagram

sequenceDiagram
    participant SDK as SDK (CloudClient)
    participant Dashboard as Dashboard (Next.js)
    participant FastAPI as FastAPI (Railway)
    participant Supabase as Supabase (PostgREST)

    SDK->>FastAPI: POST /api/v1/sync with SDK API key
    FastAPI->>Supabase: SELECT brains WHERE api_key matches
    Supabase-->>FastAPI: brain record
    FastAPI->>Supabase: INSERT corrections / events / meta_rules
    FastAPI->>Supabase: UPSERT lessons
    FastAPI->>Supabase: PATCH brains SET last_sync_at
    FastAPI-->>SDK: status ok with counts

    Dashboard->>FastAPI: GET /api/v1/brains with Supabase JWT
    FastAPI->>FastAPI: verify_jwt extracts user_id
    FastAPI->>Supabase: SELECT brains WHERE user_id matches
    Supabase-->>FastAPI: brain rows
    FastAPI-->>Dashboard: list of brains

    SDK->>FastAPI: POST /api/v1/brains/connect with SDK API key
    FastAPI->>Supabase: SELECT brains WHERE api_key matches
    Supabase-->>FastAPI: brain record
    FastAPI-->>SDK: brain_id and connected status


Prompt To Fix All With AI
This is a comment left during a code review.
Path: cloud/app/routes/brains.py
Line: 42-46

Comment:
**Non-existent columns queried — will 400 in production**

The `columns` string references `brain_name`, `domain`, and `manifest`, but none of those columns exist in the `brains` table. The schema defines the column as `name`, with no `domain` or `manifest` columns at all. PostgREST will return a `400 Bad Request` for every real call to this endpoint, even though the mocked test passes (the mock ignores the `columns` argument).

```suggestion
    rows = await db.select(
        "brains",
        columns="id,user_id,name,last_sync_at,created_at",
        filters={"user_id": brain["user_id"]},
    )
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: cloud/app/db.py
Line: 50-55

Comment:
**Upsert has no `on_conflict` target — lessons will always be duplicated**

`resolution=merge-duplicates` tells PostgREST to do an upsert, but without an `on_conflict` query parameter it defaults to the primary key (`id`). Since lesson rows have no pre-set `id`, there is never a conflict and every row is inserted fresh. Every `sync()` call appends duplicate lesson records — breaking the stated "idempotent by description" behaviour.

Two things are needed:
1. A unique constraint on `(brain_id, description)` in the migration.
2. Pass `on_conflict` as a query parameter in `db.upsert()`.

```python
async def upsert(
    self, table: str, data: dict | list[dict], on_conflict: str | None = None,
) -> list[dict]:
    headers = {**self._headers, "Prefer": "return=representation,resolution=merge-duplicates"}
    params: dict[str, str] = {}
    if on_conflict:
        params["on_conflict"] = on_conflict
    resp = await self._http.post(f"/{table}", json=data, headers=headers, params=params)
    resp.raise_for_status()
    return resp.json()
```

Then in `sync.py`:
```python
await db.upsert("lessons", rows, on_conflict="brain_id,description")
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: cloud/app/auth.py
Line: 33-35

Comment:
**JWT `aud` claim not validated**

`jose_jwt.decode` is called without an `audience` argument. Supabase JWTs carry `aud: "authenticated"`, and python-jose skips audience validation when the parameter is omitted. This means any JWT signed with the same HMAC key passes regardless of which service issued it.

```suggestion
        claims = jose_jwt.decode(
            signed_jwt, key, algorithms=["HS256"], audience="authenticated"
        )
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (3): Last reviewed commit: "merge: resolve conflicts with main after..."

Greptile also left 3 inline comments on this PR.

Oliver Le and others added 6 commits April 9, 2026 21:39
Co-Authored-By: Gradata <noreply@gradata.ai>
8 tables (workspaces, workspace_members, brains, corrections, lessons,
meta_rules, events, rule_patches), indexes, RLS policies for data
isolation, and auto-create-workspace trigger on user signup.

Co-Authored-By: Gradata <noreply@gradata.ai>
Async httpx wrapper for PostgREST (select/insert/upsert), API key
lookup for SDK clients, HS256 JWT verification for dashboard users,
and 5 passing tests with MockSupabaseClient fixture.

Co-Authored-By: Gradata <noreply@gradata.ai>
SyncRequest, SyncResponse, CorrectionPayload, LessonPayload,
EventPayload, MetaRulePayload with Severity/LessonState enums,
confidence bounds validation, and 5 passing tests.

Co-Authored-By: Gradata <noreply@gradata.ai>
CORS + request logging middleware, GET /health endpoint, route
aggregation with stub sync/brains routers for Tasks 6-7.

Co-Authored-By: Gradata <noreply@gradata.ai>
Implements POST /api/v1/sync (corrections, lessons, events, meta-rules),
POST /api/v1/brains/connect, GET /api/v1/brains, and Railway deployment
config. All 17 tests pass (auth: 5, models: 5, sync: 5, brains: 2).

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai

coderabbitai Bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new Gradata Cloud FastAPI service (config, settings, auth), an async Supabase REST client, middleware and app factory, sync/brains/health routes, DB migration with RLS/trigger, hook base utilities, packaging/deploy files, and a pytest suite with a Supabase mock. Also many small formatting/refactor edits across src/.

Changes

  • Cloud config & packaging (cloud/.env.example, cloud/pyproject.toml, cloud/railway.toml): Env template for Supabase and app settings; package metadata and deps; Railway build/deploy + healthcheck config.
  • App factory & middleware (cloud/app/__init__.py, cloud/app/main.py, cloud/app/middleware.py): FastAPI create_app with lifespan cleanup, logging, conditional docs disabling, CORS and request-timing middleware.
  • Auth & settings (cloud/app/auth.py, cloud/app/config.py): Auth dependency validating gd_ API keys or HS256 Supabase JWTs; Pydantic Settings loader using GRADATA_ prefix and cached accessor.
  • Supabase async DB client (cloud/app/db.py): SupabaseClient wrapper (async httpx) with select/insert/upsert/update/close and cached get_db().
  • API models (cloud/app/models.py): Pydantic enums and payload/request/response models for the sync API with validators and bounds.
  • Routes (cloud/app/routes/__init__.py, cloud/app/routes/brains.py, cloud/app/routes/health.py, cloud/app/routes/sync.py): Route aggregation; /brains/connect, /brains list, /health, and /sync handler that inserts/upserts corrections, lessons, events, meta-rules and updates brain last_sync_at.
  • DB migration (cloud/migrations/001_initial_schema.sql): Initial Supabase schema: workspaces, workspace_members, brains, corrections, lessons, meta_rules, events, rule_patches; indexes, RLS policies, and trigger to provision workspace/brain on auth.users insert.
  • Hook base utilities (src/gradata/hooks/_base.py): New hook helpers: profile gating, stdin JSON parsing, brain resolution/loading, safe run_hook orchestration and result/block output.
  • Tests & fixtures (cloud/tests/__init__.py, cloud/tests/conftest.py, cloud/tests/test_auth.py, cloud/tests/test_brains.py, cloud/tests/test_models.py, cloud/tests/test_sync.py): Pytest fixtures including MockSupabaseClient, TestClient, auth headers; tests for auth, models, brains, and sync flows.
  • CI workflow (.github/workflows/sdk-ci.yml): New CI for the SDK: lint, typecheck, bandit, pytest matrix builds and wheel artifact upload.
  • Core code edits & formatting (src/..., many files, e.g. src/gradata/brain.py, src/gradata/enhancements/behavioral_extractor.py, src/gradata/hooks/...): Wide sweep of mostly EOF/newline fixes and small refactors; notable functional additions: Brain.patch_rule and behavioral extractor logic changes.

Sequence Diagram(s)

sequenceDiagram
  participant SDK as SDK Client
  participant API as Gradata Cloud API
  participant Auth as Auth Verifier
  participant DB as Supabase REST

  SDK->>API: POST /api/v1/sync (Authorization, payload)
  API->>Auth: validate token (gd_* API key or HS256 JWT)
  Auth-->>API: resolved brain record / user_id
  API->>DB: select brain by api_key or user_id
  DB-->>API: brain row
  API->>DB: insert corrections/events/meta_rules
  API->>DB: upsert lessons (merge duplicates)
  DB-->>API: operation results
  API->>DB: update brains set last_sync_at = now()
  API-->>SDK: 200 { status: "ok", ...counts }

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: Passed. The title clearly describes the main change: introducing a cloud backend API with Supabase schema, FastAPI sync endpoint, and Railway deployment configuration.
  • Description check: Passed. The description is comprehensive and directly related to the changeset, covering the Supabase schema, FastAPI endpoints, dual authentication, Pydantic models, deployment configuration, and test coverage.
  • Docstring coverage: Passed. Docstring coverage is 80.00%, which meets the required threshold of 80.00%.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the feature label Apr 10, 2026
Comment thread cloud/railway.toml
@@ -0,0 +1,9 @@
[build]
builder = "DOCKERFILE"
dockerfilePath = "Dockerfile"

P0 Missing Dockerfile

railway.toml points to Dockerfile but no Dockerfile is included in this PR. Railway will refuse to build and the deployment goal stated in the PR description cannot be met until this file is added.

A minimal working example:

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY pyproject.toml .
RUN pip install --no-cache-dir .
COPY app/ ./app/
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Comment thread cloud/app/routes/brains.py Outdated
Comment thread cloud/app/routes/sync.py
Comment thread cloud/app/config.py
Comment thread cloud/app/main.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 12

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/.env.example`:
- Around line 2-5: The .env example has Supabase keys out of the expected
dotenv-linter order; reorder the environment variables so their keys follow the
linter's configured order (use the unique keys GRADATA_SUPABASE_URL,
GRADATA_SUPABASE_ANON_KEY, GRADATA_SUPABASE_SERVICE_KEY, and
GRADATA_SUPABASE_JWT_KEY) — update the sequence in cloud/.env.example to the
proposed canonical ordering so dotenv-linter no longer flags lines 3-5.

In `@cloud/app/config.py`:
- Around line 24-26: get_settings currently claims to be a "Cached settings
loader" but returns a new Settings() on every call; modify get_settings (and/or
introduce a module-level cached instance) to return a single cached Settings
object instead of recreating it each call—e.g., use functools.lru_cache on
get_settings or create a private _settings: Optional[Settings] variable and
initialize it once and return it thereafter; ensure references to Settings and
the get_settings function are updated so callers receive the same cached
Settings instance.
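The cached-loader pattern this comment asks for is just functools.lru_cache over a zero-argument factory; a sketch with a stand-in Settings class (the real one is a Pydantic Settings subclass):

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass
class Settings:
    # Stand-in fields; the real class loads these from GRADATA_-prefixed env vars.
    environment: str = "development"
    log_level: str = "INFO"


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # Constructed once; every subsequent call returns the same instance.
    return Settings()
```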

In `@cloud/app/main.py`:
- Around line 15-19: The FastAPI app instantiation currently only disables
Swagger UI via docs_url but leaves the OpenAPI JSON and ReDoc exposed; update
the FastAPI(...) call (the app variable) to set openapi_url=None and
redoc_url=None as well when settings.environment == "production" (i.e., set
docs_url=None, redoc_url=None, openapi_url=None in that branch) so that app =
FastAPI(...) fully disables OpenAPI, ReDoc, and Swagger in production.

In `@cloud/app/middleware.py`:
- Around line 25-31: The middleware log_requests currently only logs after a
successful call_next; modify log_requests to wrap await call_next(request) in
try/except/finally so that if call_next raises we still compute elapsed time and
emit a log (use _log.error or _log.exception) including request.method,
request.url.path, elapsed and the exception details, then re-raise the
exception; in the success path keep the existing info log with
response.status_code and elapsed. Ensure you reference the existing function
name log_requests and variables start, elapsed, request, call_next, and response
when adding the try/except/finally.
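The try/finally shape this comment describes, reduced to a synchronous sketch (a hypothetical `timed` wrapper, not the FastAPI middleware itself): elapsed time and outcome are logged even when the wrapped call raises.

```python
import logging
import time

_log = logging.getLogger("request")


def timed(handler, method: str, path: str) -> int:
    """Run handler, logging method, path, status and elapsed ms even on failure."""
    start = time.monotonic()
    status = 500  # assume server error unless the handler returns normally
    try:
        status = handler()
        return status
    finally:
        elapsed = (time.monotonic() - start) * 1000
        _log.info("%s %s -> %d (%.0fms)", method, path, status, elapsed)
```

The finally block runs on both paths, so a raised exception still produces a log line before it propagates.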

In `@cloud/app/routes/brains.py`:
- Around line 28-35: The connect_brain handler currently only logs and echoes
brain["id"] without creating or upserting a row in the brains table and it
depends on get_current_brain which prevents first-time JWT flows; modify
connect_brain to accept a possibly-missing brain (allow None from
get_current_brain or remove the strict dependency), then perform an upsert: if
brain is None create a new brains row (populate owner info from the auth
token/API key, set brain_name from ConnectRequest) and return its id; if brain
exists update its name/metadata as needed and return the existing id; use your
DB session helper (e.g., get_db or existing ORM utilities) and ensure the
handler still returns ConnectResponse and preserves logging.
- Around line 38-42: The endpoint list_brains currently returns raw brain rows
(via db.select("brains") which defaults to select=*) and only filters by
brain["user_id"], exposing secrets like api_key and omitting brains shared via
workspace membership; update list_brains to (1) explicitly pass a select list of
safe public fields (e.g., id, name, workspace_id, created_at, updated_at) to
db.select instead of the default *, and (2) expand the filter logic to return
brains where user_id == current_user_id OR workspace_id IN (workspaces where
current_user_id is a member) by querying workspace_members (or using a
join/subquery) before calling db.select so shared-workspace brains are included.
Ensure to reference the current user ID from the brain/dependency
(get_current_brain) and use the "brains" table and "workspace_members" relation
names when implementing.

In `@cloud/app/routes/health.py`:
- Around line 9-11: The health endpoint hardcodes the service version in the
health() function; update health() to read the version from a centralized source
(e.g., your configuration object or build metadata) instead of the literal
"0.1.0": import/use the app settings/config (for example a Settings class or env
var like settings.VERSION or os.getenv/BUILD_VERSION or app.state.version
depending on your project conventions), return that value in the payload and
include a sensible fallback if the config value is missing; keep the endpoint
signature (health) and response keys unchanged so callers are not broken.
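The centralized-version idea can be sketched as an env-var lookup with a fallback (the BUILD_VERSION name here is illustrative, not from the PR):

```python
import os


def service_version(default: str = "0.0.0") -> str:
    """Prefer an injected build version; fall back to a safe default."""
    return os.getenv("BUILD_VERSION", default)


def health() -> dict:
    # Same response keys as the existing endpoint, version no longer hardcoded.
    return {"status": "ok", "version": service_version()}
```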

In `@cloud/app/routes/sync.py`:
- Around line 35-97: The sync handler is performing independent writes
(body.corrections, body.lessons, body.events, body.meta_rules) using
db.insert/db.upsert which is non-idempotent and not atomic; wrap all writes in a
single database transaction (use your db.transaction or equivalent) and add a
per-sync idempotency key to each row (e.g., idempotency_key derived from the
request) so retries can be deduplicated; change inserts/upserts to include the
idempotency_key column for corrections/events/meta_rules and use upsert/ON
CONFLICT logic keyed on (idempotency_key, brain_id, <natural_key>) or check
existence within the transaction before inserting, then commit or rollback the
whole unit and set corrections_count/lessons_count/events_count/meta_rules_count
after the transaction completes.
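Deriving the per-sync idempotency key this comment suggests can be as simple as hashing the stable fields of each row (the exact field choice is an assumption, not part of the PR):

```python
import hashlib
import json


def idempotency_key(brain_id: str, row: dict) -> str:
    """Stable key for deduplicating retried sync writes: the same brain plus
    the same payload always hashes to the same value."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{brain_id}:{canonical}".encode()).hexdigest()
```

Storing this value in a uniquely-constrained column lets a retried request conflict with the original insert instead of duplicating it.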

In `@cloud/migrations/001_initial_schema.sql`:
- Around line 109-110: Remove the redundant non-unique index creation for
idx_brains_api_key: delete the CREATE INDEX idx_brains_api_key ON
brains(api_key); line from the migration so you rely on the UNIQUE btree index
already created by the api_key TEXT NOT NULL UNIQUE column (do not remove the
UNIQUE constraint itself).
- Around line 212-227: The SECURITY DEFINER function handle_new_user uses
unqualified table names (workspaces, workspace_members) so you must pin the
search_path to a safe value to avoid name-resolution hijacking; modify
handle_new_user to set a fixed search_path at start (for example SET LOCAL
search_path = pg_catalog, public) so all subsequent INSERTs target the intended
schema, and keep the function declared LANGUAGE plpgsql SECURITY DEFINER as
before.
- Around line 28-36: The brains table claims "one per user per workspace" but
only enforces uniqueness on api_key; add a uniqueness constraint on the pair
(workspace_id, user_id) to prevent duplicate brains for the same user/workspace.
Update the CREATE TABLE for brains (or add an ALTER TABLE) to add a UNIQUE
constraint named e.g. unique_brains_workspace_user covering workspace_id and
user_id so lookups by (workspace_id, user_id) are unambiguous.

In `@cloud/tests/conftest.py`:
- Around line 30-31: The mock select method currently ignores the filters
parameter; update async def select(self, table: str, columns: str = "*",
filters: dict | None = None) -> list[dict]: to fetch rows =
self._responses[table].get("select", []) and, if filters is not None, return
only rows that match all key==value pairs in filters (simple equality match on
dict keys), otherwise return rows unchanged; preserve return type list[dict] and
leave columns handling as-is. This ensures filter-dependent tests (auth/data
scoping) are exercised using the existing self._responses data.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a3d50ebc-2df4-41fa-8a15-ffef45a920ad

📥 Commits

Reviewing files that changed from the base of the PR and between b24950e and f72d0df.

📒 Files selected for processing (21)
  • cloud/.env.example
  • cloud/app/__init__.py
  • cloud/app/auth.py
  • cloud/app/config.py
  • cloud/app/db.py
  • cloud/app/main.py
  • cloud/app/middleware.py
  • cloud/app/models.py
  • cloud/app/routes/__init__.py
  • cloud/app/routes/brains.py
  • cloud/app/routes/health.py
  • cloud/app/routes/sync.py
  • cloud/migrations/001_initial_schema.sql
  • cloud/pyproject.toml
  • cloud/railway.toml
  • cloud/tests/__init__.py
  • cloud/tests/conftest.py
  • cloud/tests/test_auth.py
  • cloud/tests/test_brains.py
  • cloud/tests/test_models.py
  • cloud/tests/test_sync.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
🪛 dotenv-linter (4.0.0)
cloud/.env.example

[warning] 3-3: [UnorderedKey] The GRADATA_SUPABASE_ANON_KEY key should go before the GRADATA_SUPABASE_URL key

(UnorderedKey)


[warning] 4-4: [UnorderedKey] The GRADATA_SUPABASE_SERVICE_KEY key should go before the GRADATA_SUPABASE_URL key

(UnorderedKey)


[warning] 5-5: [UnorderedKey] The GRADATA_SUPABASE_JWT_KEY key should go before the GRADATA_SUPABASE_SERVICE_KEY key

(UnorderedKey)

🪛 Ruff (0.15.9)
cloud/app/middleware.py

[warning] 26-26: Missing return type annotation for private function log_requests

(ANN202)

cloud/app/routes/sync.py

[warning] 20-20: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


[error] 64-64: Ambiguous variable name: l

(E741)

cloud/tests/conftest.py

[warning] 23-23: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


[warning] 30-30: Unused method argument: columns

(ARG002)


[warning] 30-30: Unused method argument: filters

(ARG002)


[warning] 33-33: Unused method argument: table

(ARG002)


[warning] 38-38: Unused method argument: table

(ARG002)

cloud/app/auth.py

[warning] 38-38: Consider moving this statement to an else block

(TRY300)


[warning] 44-44: Do not perform function call Security in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

cloud/app/routes/brains.py

[warning] 31-31: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


[warning] 39-39: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

cloud/app/db.py

[warning] 18-18: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


[warning] 61-61: Dynamically typed expressions (typing.Any) are disallowed in get_db

(ANN401)


[warning] 63-63: Using the global statement to update _client is discouraged

(PLW0603)

cloud/app/models.py

[warning] 10-10: Class Severity inherits from both str and enum.Enum

Inherit from enum.StrEnum

(UP042)


[warning] 18-18: Class LessonState inherits from both str and enum.Enum

Inherit from enum.StrEnum

(UP042)


[warning] 61-61: Avoid specifying long messages outside the exception class

(TRY003)

🪛 SQLFluff (4.1.0)
cloud/migrations/001_initial_schema.sql

[error] 28-231: Line 28, Position 1: Found unparsable section: 'CREATE TABLE brains (\n id UUID PRIMAR...'

(PRS)

🔇 Additional comments (10)
cloud/app/__init__.py (1)

1-1: Looks good.

Clear package-level module docstring; no concerns.

cloud/tests/__init__.py (1)

1-1: Looks good.

Test package docstring is clear and harmless.

cloud/.env.example (1)

7-9: App defaults look good.

GRADATA_ENVIRONMENT and GRADATA_LOG_LEVEL defaults are sensible for local setup.

cloud/railway.toml (1)

1-9: Railway config is consistent with app routes.

healthcheckPath = "/health" matches cloud/app/routes/health.py and router registration in cloud/app/main.py.

cloud/app/routes/__init__.py (1)

6-11: Good router aggregation.

Route composition is straightforward and keeps API module boundaries clear.

cloud/tests/test_auth.py (2)

10-37: API key auth tests are solid.

Good coverage of success path and both key failure modes (missing + unknown).


39-61: JWT tests are well-targeted.

Valid-token extraction and expired-token rejection are both verified.

cloud/tests/test_brains.py (1)

7-35: Good endpoint-level coverage for brains connect/list flows.

These tests correctly stub auth lookup and verify the response contract for both POST /brains/connect and GET /brains.

cloud/tests/test_models.py (1)

10-63: Model validation tests are solid and target the right constraints.

Enum rejection, confidence bounds, and non-empty event type checks are all covered in focused unit tests.

cloud/tests/test_sync.py (1)

7-103: Nice baseline coverage for sync happy-path + auth/validation failures.

The suite exercises core counters and key rejection paths (401/403, 422) for POST /sync.

Comment thread cloud/.env.example Outdated
Comment on lines +2 to +5
GRADATA_SUPABASE_URL=https://YOURS.supabase.co
GRADATA_SUPABASE_ANON_KEY=YOURS
GRADATA_SUPABASE_SERVICE_KEY=YOURS
GRADATA_SUPABASE_JWT_KEY=YOURS

⚠️ Potential issue | 🟡 Minor

Reorder Supabase keys to satisfy dotenv-linter.

Lines 3-5 currently violate the configured key ordering checks. Please reorder these keys to avoid lint noise/failures.

♻️ Proposed reorder
-GRADATA_SUPABASE_URL=https://YOURS.supabase.co
 GRADATA_SUPABASE_ANON_KEY=YOURS
-GRADATA_SUPABASE_SERVICE_KEY=YOURS
 GRADATA_SUPABASE_JWT_KEY=YOURS
+GRADATA_SUPABASE_SERVICE_KEY=YOURS
+GRADATA_SUPABASE_URL=https://YOURS.supabase.co

Comment thread cloud/app/config.py
Comment thread cloud/app/main.py
Comment thread cloud/app/middleware.py Outdated
Comment on lines +25 to +31
```python
@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.monotonic()
    response = await call_next(request)
    elapsed = (time.monotonic() - start) * 1000
    _log.info("%s %s -> %d (%.0fms)", request.method, request.url.path, response.status_code, elapsed)
    return response
```

⚠️ Potential issue | 🟠 Major

Error-path requests are not logged.

When call_next raises, this middleware exits before logging. That drops observability for failing requests, which is high-impact during incidents.

🛠️ Proposed fix
     @app.middleware("http")
     async def log_requests(request: Request, call_next):
         start = time.monotonic()
-        response = await call_next(request)
-        elapsed = (time.monotonic() - start) * 1000
-        _log.info("%s %s -> %d (%.0fms)", request.method, request.url.path, response.status_code, elapsed)
-        return response
+        status_code = 500
+        try:
+            response = await call_next(request)
+            status_code = response.status_code
+            return response
+        finally:
+            elapsed = (time.monotonic() - start) * 1000
+            _log.info("%s %s -> %d (%.0fms)", request.method, request.url.path, status_code, elapsed)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@app.middleware("http")
async def log_requests(request: Request, call_next):
start = time.monotonic()
response = await call_next(request)
elapsed = (time.monotonic() - start) * 1000
_log.info("%s %s -> %d (%.0fms)", request.method, request.url.path, response.status_code, elapsed)
return response
@app.middleware("http")
async def log_requests(request: Request, call_next):
start = time.monotonic()
status_code = 500
try:
response = await call_next(request)
status_code = response.status_code
return response
finally:
elapsed = (time.monotonic() - start) * 1000
_log.info("%s %s -> %d (%.0fms)", request.method, request.url.path, status_code, elapsed)
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 26-26: Missing return type annotation for private function log_requests

(ANN202)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/middleware.py` around lines 25 - 31, The middleware log_requests
currently only logs after a successful call_next; modify log_requests to wrap
await call_next(request) in try/except/finally so that if call_next raises we
still compute elapsed time and emit a log (use _log.error or _log.exception)
including request.method, request.url.path, elapsed and the exception details,
then re-raise the exception; in the success path keep the existing info log with
response.status_code and elapsed. Ensure you reference the existing function
name log_requests and variables start, elapsed, request, call_next, and response
when adding the try/except/finally.

Comment on lines +28 to +35
@router.post("/brains/connect", response_model=ConnectResponse)
async def connect_brain(
body: ConnectRequest,
brain: dict = Depends(get_current_brain),
) -> ConnectResponse:
"""Register/reconnect a brain. Called by CloudClient.connect()."""
_log.info("Brain connected: %s (name=%s)", brain["id"], body.brain_name)
return ConnectResponse(brain_id=brain["id"])

⚠️ Potential issue | 🟠 Major

Implement actual connect/create behavior here.

This handler never inserts or upserts a brains row. Because it depends on app.auth.get_current_brain, JWT callers with no existing brain will fail before reaching the handler, so the first-time connect flow cannot succeed. For API-key callers, this endpoint is effectively just echoing the current brain id.

🧰 Tools
🪛 Ruff (0.15.9)

[warning] 31-31: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/brains.py` around lines 28 - 35, The connect_brain handler
currently only logs and echoes brain["id"] without creating or upserting a row
in the brains table and it depends on get_current_brain which prevents
first-time JWT flows; modify connect_brain to accept a possibly-missing brain
(allow None from get_current_brain or remove the strict dependency), then
perform an upsert: if brain is None create a new brains row (populate owner info
from the auth token/API key, set brain_name from ConnectRequest) and return its
id; if brain exists update its name/metadata as needed and return the existing
id; use your DB session helper (e.g., get_db or existing ORM utilities) and
ensure the handler still returns ConnectResponse and preserves logging.
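A minimal sketch of the create-or-update semantics both reviewers are asking for, using an in-memory dict as a stand-in for the brains table (the helper name and row shape are illustrative, not the project's actual models; real code would express this as an `ON CONFLICT (user_id) DO UPDATE` upsert through the Supabase client):

```python
import uuid


def upsert_brain(store: dict, user_id: str, brain_name: str) -> str:
    """Create-or-update a brain row keyed by user_id and return its id.

    First-time connect creates the row; reconnect refreshes the name
    but keeps the existing id, so the endpoint is safe to call repeatedly.
    """
    row = store.get(user_id)
    if row is None:
        row = {"id": str(uuid.uuid4()), "user_id": user_id, "name": brain_name}
        store[user_id] = row
    else:
        row["name"] = brain_name  # reconnect: refresh metadata, keep id
    return row["id"]
```

Calling this from the handler (instead of depending on `get_current_brain`) is what makes the first-time JWT connect flow succeed.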

Comment thread cloud/app/routes/sync.py
Comment on lines +35 to +97
# Insert corrections
if body.corrections:
rows = [
{
"brain_id": brain_id,
"session": c.session,
"category": c.category,
"severity": c.severity.value,
"description": c.description,
"draft_preview": c.draft_preview[:500],
"final_preview": c.final_preview[:500],
}
for c in body.corrections
]
await db.insert("corrections", rows)
corrections_count = len(rows)

# Upsert lessons (idempotent by description)
if body.lessons:
rows = [
{
"brain_id": brain_id,
"category": l.category,
"description": l.description,
"state": l.state.value,
"confidence": float(l.confidence),
"fire_count": l.fire_count,
"recurrence_days": l.recurrence_days,
}
for l in body.lessons
]
await db.upsert("lessons", rows)
lessons_count = len(rows)

# Insert events
if body.events:
rows = [
{
"brain_id": brain_id,
"type": e.type,
"source": e.source,
"data": e.data,
"tags": e.tags,
"session": e.session,
}
for e in body.events
]
await db.insert("events", rows)
events_count = len(rows)

# Insert meta-rules
if body.meta_rules:
rows = [
{
"brain_id": brain_id,
"title": m.title,
"description": m.description,
"source_lesson_ids": [], # Resolved server-side later
}
for m in body.meta_rules
]
await db.insert("meta_rules", rows)
meta_rules_count = len(rows)

⚠️ Potential issue | 🔴 Critical

Sync persistence is non-idempotent and non-atomic.

Current flow issues independent writes for corrections/lessons/events/meta-rules without a transaction or dedupe key. Retries can duplicate records, and mid-flight failures can persist partial state.

Please add an idempotency key (per sync batch) and execute all section writes in a single transactional unit (or single RPC) so retries are safe and failures roll back consistently.

🧰 Tools
🪛 Ruff (0.15.9)

[error] 64-64: Ambiguous variable name: l

(E741)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/sync.py` around lines 35 - 97, The sync handler is
performing independent writes (body.corrections, body.lessons, body.events,
body.meta_rules) using db.insert/db.upsert which is non-idempotent and not
atomic; wrap all writes in a single database transaction (use your
db.transaction or equivalent) and add a per-sync idempotency key to each row
(e.g., idempotency_key derived from the request) so retries can be deduplicated;
change inserts/upserts to include the idempotency_key column for
corrections/events/meta_rules and use upsert/ON CONFLICT logic keyed on
(idempotency_key, brain_id, <natural_key>) or check existence within the
transaction before inserting, then commit or rollback the whole unit and set
corrections_count/lessons_count/events_count/meta_rules_count after the
transaction completes.
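One hedged sketch of the deduplication half: a deterministic key derived from the brain id and the batch payload, which the handler would persist alongside each row under a unique index on `(brain_id, idempotency_key)` so a retried batch collides instead of duplicating. The transaction wrapper itself depends on the project's db helper and is not shown here.

```python
import hashlib
import json


def sync_idempotency_key(brain_id: str, payload: dict) -> str:
    """Derive a stable key for one sync batch so retries can be deduplicated.

    Canonical JSON (sorted keys, no whitespace) makes the hash independent
    of dict ordering, so the same logical batch always yields the same key.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{brain_id}:{canonical}".encode()).hexdigest()
```

Alternatively, the client could send an explicit `sync_id` and the server would store it in a `sync_requests` table, trading hashing for one extra round trip.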

Comment thread cloud/migrations/001_initial_schema.sql
Comment thread cloud/migrations/001_initial_schema.sql Outdated
Comment thread cloud/migrations/001_initial_schema.sql Outdated
Comment thread cloud/tests/conftest.py Outdated
- Cache get_settings() with @lru_cache(maxsize=1)
- Disable openapi_url and redoc_url in production
- Add asynccontextmanager lifespan to clean up httpx client
- Wrap log_requests middleware in try/except so errors get logged
- Project out api_key column from brains list endpoint
- Update last_sync_at after successful sync
- Reorder .env.example vars alphabetically
- Add update() method to SupabaseClient and mock

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai coderabbitai Bot added security and removed feature labels Apr 10, 2026
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (5)
cloud/app/middleware.py (1)

30-33: ⚠️ Potential issue | 🟠 Major

Include exception details in error-path request logs.

Line 30-Line 33 logs only a generic ERROR line; traceback/context is dropped. Switch to _log.exception(...) (or exc_info=True) before re-raising.

Proposed fix
         except Exception:
             elapsed = (time.monotonic() - start) * 1000
-            _log.error("%s %s -> ERROR (%.0fms)", request.method, request.url.path, elapsed)
+            _log.exception("%s %s -> ERROR (%.0fms)", request.method, request.url.path, elapsed)
             raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/middleware.py` around lines 30 - 33, The except block in
middleware.py currently calls _log.error(...) and re-raises, which loses the
traceback; change that logging call to include exception details by using
_log.exception("%s %s -> ERROR (%.0fms)", request.method, request.url.path,
elapsed) (or _log.error(..., exc_info=True)) so the stacktrace and exception
context are recorded before the existing raise; keep the computed elapsed and
the re-raise behavior intact.
cloud/app/routes/brains.py (2)

28-35: ⚠️ Potential issue | 🟠 Major

POST /brains/connect still does not create or upsert a brain record.

The handler only echoes the current brain id, and Depends(get_current_brain) still blocks first-time JWT connect flows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/brains.py` around lines 28 - 35, The connect_brain handler
currently just echoes the existing brain id and uses Depends(get_current_brain),
which blocks first-time JWT connect flows and never creates/upserts a brain
record; change connect_brain to accept the ConnectRequest and extract/validate
the JWT (or use a non-blocking/optional dependency rather than
get_current_brain), derive the brain id from the token, then perform a
create-or-update (upsert) into your brains store using the brain id and
body.brain_name (and any metadata), and finally return
ConnectResponse(brain_id=brain_id); remove or replace the hard dependency on
get_current_brain so the endpoint allows initial connections and ensures a brain
record is created/updated on every call.

39-46: ⚠️ Potential issue | 🟠 Major

Accessible-brains contract does not match query behavior.

Line 40 says “accessible,” but the query only returns brains where user_id == current_user_id; shared-workspace brains are excluded.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/brains.py` around lines 39 - 46, The docstring for
list_brains claims it returns "accessible" brains but the db.select call only
filters by user_id (brain["user_id"]) which excludes shared/workspace brains;
update list_brains to actually return brains accessible to the current user by
either expanding the filters passed to db.select to include OR conditions for
workspace/shared membership (e.g., include workspace_id in a list of workspaces
the user belongs to or join against the memberships table), or call a helper
that computes accessible brain IDs and query with filters={"id": {"$in":
accessible_ids}}; locate the logic in list_brains (and the use of
get_current_brain and db.select) and implement the appropriate filter/join so
shared-workspace brains are included.
cloud/tests/conftest.py (1)

30-31: ⚠️ Potential issue | 🟠 Major

Mock select() should apply filters.

Line 30-Line 31 ignores filters, so auth/scoping tests don’t exercise filter-dependent behavior.

Proposed fix
     async def select(self, table: str, columns: str = "*", filters: dict | None = None) -> list[dict]:
-        return self._responses[table].get("select", [])
+        rows = self._responses[table].get("select", [])
+        if not filters:
+            return rows
+        return [row for row in rows if all(row.get(k) == v for k, v in filters.items())]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/tests/conftest.py` around lines 30 - 31, The mock select method
currently ignores the filters parameter; update async def select(self, table:
str, columns: str = "*", filters: dict | None = None) to return only rows from
self._responses[table].get("select", []) that match the provided filters (if
filters is None return all rows). Implement simple key->value exact-match
filtering across each row dict (skip rows missing a filter key or with non-equal
values), and ensure safe handling when table or "select" key is absent by
returning an empty list. Use the existing select method and self._responses
identifiers when making the change.
cloud/app/routes/sync.py (1)

36-105: ⚠️ Potential issue | 🔴 Critical

Make sync persistence atomic and idempotent.

The handler still does independent writes without a transactional unit or request idempotency key. Retries can duplicate data, and failures can persist partial batches.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/sync.py` around lines 36 - 105, The sync handler currently
performs independent writes (db.insert on "corrections","events","meta_rules",
db.upsert on "lessons", and db.update on "brains") which can produce
partial/duplicate state on retries; wrap all persistence in a single transaction
(use the DB's transaction API, e.g. db.transaction / begin/commit/rollback) so
either all inserts/upserts and the last_sync_at update succeed or all are rolled
back, and add request-level idempotency by accepting a stable sync identifier
(e.g. body.sync_id or request_id) and persist it with each row or into a
"sync_requests" table (or add a unique constraint+upsert key) to detect/reject
duplicates so repeated requests do not re-insert the same
corrections/events/meta_rules; ensure corrections/events/meta_rules insertion
uses the idempotency key or unique constraints and lessons continue to use
upsert by description.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/app/routes/sync.py`:
- Around line 39-47: The payload dicts built for corrections/events drop the
caller timestamp so timeline fidelity is lost; include the
SyncRequest.created_at value when constructing each row (i.e., add a
"created_at": c.created_at entry alongside brain_id, session, category,
severity.value, description, draft_preview, final_preview) and do the same in
the other similar payload block so both correction and event rows carry the
original timestamp (preserve the value/format expected downstream).
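A sketch of the timestamp-preserving payload builder, with `c` as a plain-dict stand-in for the Pydantic correction model; the fallback to server time for rows arriving without a timestamp is an assumption, not behavior stated in the PR:

```python
from datetime import datetime, timezone


def correction_row(brain_id: str, c: dict) -> dict:
    """Insert payload for one correction, carrying the caller's created_at."""
    return {
        "brain_id": brain_id,
        "session": c["session"],
        "category": c["category"],
        "severity": c["severity"],
        "description": c["description"],
        "draft_preview": c["draft_preview"][:500],
        "final_preview": c["final_preview"][:500],
        # Preserve the SDK-side timestamp; fall back to server time if absent.
        "created_at": c.get("created_at") or datetime.now(timezone.utc).isoformat(),
    }
```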

---

Duplicate comments:
In `@cloud/app/middleware.py`:
- Around line 30-33: The except block in middleware.py currently calls
_log.error(...) and re-raises, which loses the traceback; change that logging
call to include exception details by using _log.exception("%s %s -> ERROR
(%.0fms)", request.method, request.url.path, elapsed) (or _log.error(...,
exc_info=True)) so the stacktrace and exception context are recorded before the
existing raise; keep the computed elapsed and the re-raise behavior intact.

In `@cloud/app/routes/brains.py`:
- Around line 28-35: The connect_brain handler currently just echoes the
existing brain id and uses Depends(get_current_brain), which blocks first-time
JWT connect flows and never creates/upserts a brain record; change connect_brain
to accept the ConnectRequest and extract/validate the JWT (or use a
non-blocking/optional dependency rather than get_current_brain), derive the
brain id from the token, then perform a create-or-update (upsert) into your
brains store using the brain id and body.brain_name (and any metadata), and
finally return ConnectResponse(brain_id=brain_id); remove or replace the hard
dependency on get_current_brain so the endpoint allows initial connections and
ensures a brain record is created/updated on every call.
- Around line 39-46: The docstring for list_brains claims it returns
"accessible" brains but the db.select call only filters by user_id
(brain["user_id"]) which excludes shared/workspace brains; update list_brains to
actually return brains accessible to the current user by either expanding the
filters passed to db.select to include OR conditions for workspace/shared
membership (e.g., include workspace_id in a list of workspaces the user belongs
to or join against the memberships table), or call a helper that computes
accessible brain IDs and query with filters={"id": {"$in": accessible_ids}};
locate the logic in list_brains (and the use of get_current_brain and db.select)
and implement the appropriate filter/join so shared-workspace brains are
included.

In `@cloud/app/routes/sync.py`:
- Around line 36-105: The sync handler currently performs independent writes
(db.insert on "corrections","events","meta_rules", db.upsert on "lessons", and
db.update on "brains") which can produce partial/duplicate state on retries;
wrap all persistence in a single transaction (use the DB's transaction API, e.g.
db.transaction / begin/commit/rollback) so either all inserts/upserts and the
last_sync_at update succeed or all are rolled back, and add request-level
idempotency by accepting a stable sync identifier (e.g. body.sync_id or
request_id) and persist it with each row or into a "sync_requests" table (or add
a unique constraint+upsert key) to detect/reject duplicates so repeated requests
do not re-insert the same corrections/events/meta_rules; ensure
corrections/events/meta_rules insertion uses the idempotency key or unique
constraints and lessons continue to use upsert by description.

In `@cloud/tests/conftest.py`:
- Around line 30-31: The mock select method currently ignores the filters
parameter; update async def select(self, table: str, columns: str = "*",
filters: dict | None = None) to return only rows from
self._responses[table].get("select", []) that match the provided filters (if
filters is None return all rows). Implement simple key->value exact-match
filtering across each row dict (skip rows missing a filter key or with non-equal
values), and ensure safe handling when table or "select" key is absent by
returning an empty list. Use the existing select method and self._responses
identifiers when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6cd85c0d-36b7-4f51-a8fd-d4c2cc6f3d09

📥 Commits

Reviewing files that changed from the base of the PR and between f72d0df and ebb7966.

📒 Files selected for processing (8)
  • cloud/.env.example
  • cloud/app/config.py
  • cloud/app/db.py
  • cloud/app/main.py
  • cloud/app/middleware.py
  • cloud/app/routes/brains.py
  • cloud/app/routes/sync.py
  • cloud/tests/conftest.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
🪛 Ruff (0.15.9)
cloud/app/middleware.py

[warning] 26-26: Missing return type annotation for private function log_requests

(ANN202)

cloud/app/main.py

[warning] 14-14: Unused function argument: app

(ARG001)

cloud/app/routes/sync.py

[warning] 21-21: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


[error] 65-65: Ambiguous variable name: l

(E741)


[warning] 103-103: Use datetime.UTC alias

Convert to datetime.UTC alias

(UP017)

cloud/tests/conftest.py

[warning] 23-23: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


[warning] 30-30: Unused method argument: columns

(ARG002)


[warning] 30-30: Unused method argument: filters

(ARG002)


[warning] 33-33: Unused method argument: table

(ARG002)


[warning] 38-38: Unused method argument: table

(ARG002)


[warning] 43-43: Unused method argument: table

(ARG002)


[warning] 43-43: Unused method argument: filters

(ARG002)

cloud/app/routes/brains.py

[warning] 31-31: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


[warning] 39-39: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


[warning] 47-47: Unnecessary assignment to rows before return statement

Remove unnecessary assignment

(RET504)

cloud/app/db.py

[warning] 18-18: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


[warning] 73-73: Dynamically typed expressions (typing.Any) are disallowed in get_db

(ANN401)


[warning] 75-75: Using the global statement to update _client is discouraged

(PLW0603)

🔇 Additional comments (4)
cloud/.env.example (1)

2-9: Env template looks correct and complete.

Supabase/app keys are present and ordered consistently; this addresses prior dotenv ordering noise.

cloud/app/config.py (1)

26-29: Settings caching fix is correctly applied.

get_settings() now returns a cached singleton instance, matching its contract.

cloud/app/main.py (1)

28-35: Production docs/schema gating is correctly implemented.

docs_url, openapi_url, and redoc_url are all disabled in production, and lifespan wiring is in place.

cloud/app/db.py (1)

26-68: Supabase client request flow looks solid.

Timeouts and raise_for_status() are consistently applied across CRUD methods.

Comment thread cloud/app/routes/sync.py
Comment on lines +28 to +35
@router.post("/brains/connect", response_model=ConnectResponse)
async def connect_brain(
body: ConnectRequest,
brain: dict = Depends(get_current_brain),
) -> ConnectResponse:
"""Register/reconnect a brain. Called by CloudClient.connect()."""
_log.info("Brain connected: %s (name=%s)", brain["id"], body.brain_name)
return ConnectResponse(brain_id=brain["id"])

P1 connect_brain can never register a new brain

This endpoint is documented as "Register/reconnect a brain", but its Depends(get_current_brain) dependency already requires a valid brain to exist in the database before the handler runs. For the API-key path, verify_api_key looks up brains by api_key and raises 401 if not found; for the JWT path, get_current_brain returns 404 if no brain exists for the user.

Because the handle_new_user() trigger only creates a workspace and a membership (not a brain), a brand-new user calling POST /api/v1/brains/connect will always receive a 404 before the endpoint body executes. There is no code path that creates the initial brain row.

The fix is to upsert the brain inside the handler instead of relying on the dependency to have already resolved one:

@router.post("/brains/connect", response_model=ConnectResponse)
async def connect_brain(
    body: ConnectRequest,
    credentials: HTTPAuthorizationCredentials = Security(_bearer),
) -> ConnectResponse:
    user_id = verify_jwt(credentials.credentials)
    db = get_db()
    rows = await db.upsert("brains", {"user_id": user_id, "name": body.brain_name})
    brain_id = rows[0]["id"]
    return ConnectResponse(brain_id=brain_id)

Same fixes as other branches: trailing newlines, import sorting,
raise-from, collapsible-if, contextlib.suppress, pyright type ignore.
Added ruff>=0.4 to dev deps and ignore rules for intentional patterns.

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai coderabbitai Bot added feature and removed security labels Apr 10, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
src/gradata/_doctor.py (1)

236-264: ⚠️ Potential issue | 🟠 Major

Guideline violation: Use logging instead of print() statements.

The function uses multiple print() statements (lines 246, 247, 249, 253, 255, 257, 259, 260, 262, 263, 264), which violates the coding guideline for src/gradata/**/*.py that requires using logging instead of print().

While this is a diagnostic CLI tool designed for human-readable output, the guideline is explicit. Consider one of these alternatives:

  • Refactor to use logging with a StreamHandler configured for stdout and appropriate formatters for the CLI output.
  • Move to a CLI-specific module outside src/gradata/ if print() is essential for this tool's UX.
  • Use sys.stdout.write() if direct stdout output is required while maintaining a distinction from debug print statements.

As per coding guidelines: "no print() statements (use logging)".

♻️ Example refactor using logging
 def print_diagnosis(report: dict) -> None:
-    """Print human-readable diagnosis report."""
+    """Print human-readable diagnosis report via logging."""
+    import logging
+    logger = logging.getLogger(__name__)
+    
     status = report["status"]
     icons = {"ok": "+", "skip": "-", "missing": "!", "warn": "!", "fail": "X", "error": "X"}
     labels = {
         "healthy": "HEALTHY",
         "degraded": "DEGRADED",
         "broken": "BROKEN",
     }
 
-    print(f"\n  Gradata Doctor — {labels.get(status, status.upper())}")
-    print(f"  {'=' * 46}")
+    logger.info(f"\n  Gradata Doctor — {labels.get(status, status.upper())}")
+    logger.info(f"  {'=' * 46}")
 
     for check in report["checks"]:
         icon = icons.get(check["status"], "?")
         name = check["name"].replace("_", " ").title()
         detail = check["detail"]
-        print(f"  [{icon}] {name:<25} {detail}")
+        logger.info(f"  [{icon}] {name:<25} {detail}")
 
-    print()
+    logger.info("")
     if status == "healthy":
-        print("  All checks passed. Brain environment is ready.")
+        logger.info("  All checks passed. Brain environment is ready.")
     elif status == "degraded":
-        print("  Some optional dependencies missing. Core functionality may be limited.")
-        print("  Run the suggested pip install commands above to fix.")
+        logger.info("  Some optional dependencies missing. Core functionality may be limited.")
+        logger.info("  Run the suggested pip install commands above to fix.")
     else:
-        print("  Critical issues found. Brain cannot operate correctly.")
-        print("  Fix the [X] items above before continuing.")
+        logger.info("  Critical issues found. Brain cannot operate correctly.")
+        logger.info("  Fix the [X] items above before continuing.")
-    print()
+    logger.info("")

Note: You'll need to configure logging with an appropriate handler in the CLI entry point to ensure output goes to stdout with the desired format.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_doctor.py` around lines 236 - 264, The function print_diagnosis
uses multiple print() calls which violate the project's "no print()" guideline;
replace prints with logging (import logging, get a module logger via
logging.getLogger(__name__)) and change each print(...) to logger.info(...) (use
logger.error or logger.warning where appropriate for non-healthy statuses),
preserving the exact formatted strings and the use of report/status/icons/labels
in print_diagnosis; do not configure handlers here—ensure the CLI entry point
configures a StreamHandler that writes to stdout with the desired format so
output remains human-readable (alternatively, move print_diagnosis to a CLI-only
module if direct stdout semantics are required).
src/gradata/enhancements/self_improvement.py (1)

748-755: 🧹 Nitpick | 🔵 Trivial

Update the comment to match the implementation.

The comment at line 748 states "ONE_OFF scoped lessons never graduate past INSTINCT", but the code at line 754 blocks promotion for lessons in both INSTINCT and PATTERN states. This is correct defensive behavior (handles cases where a one_off lesson might already be in PATTERN state due to manual intervention or legacy data), but the comment should reflect this.

📝 Suggested comment update
-        # ONE_OFF scoped lessons never graduate past INSTINCT
+        # ONE_OFF scoped lessons never promote beyond their current state (INSTINCT or PATTERN)
         block_promotion = False
         if lesson.scope_json:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/self_improvement.py` around lines 748 - 755, Update
the comment above the block_promotion logic to accurately describe current
behavior: it should state that ONE_OFF (correction_scope == "one_off") lessons
are prevented from being promoted when they are in either LessonState.INSTINCT
or LessonState.PATTERN (to defensively handle cases where a one_off lesson may
already be in PATTERN due to manual/legacy changes), matching the condition that
checks lesson.scope_json, parses _scope, and tests
_scope.get("correction_scope") == "one_off" and lesson.state in
(LessonState.INSTINCT, LessonState.PATTERN).
src/gradata/_fact_extractor.py (1)

271-271: ⚠️ Potential issue | 🟡 Minor

Add the required future import for type safety.

The type annotation BrainContext | None uses the union syntax that requires Python 3.10+ without the future import. To ensure compatibility with Python 3.9+ and maintain consistency with the codebase, add the required import at the top of the file.

As per coding guidelines: "from __future__ import annotations required" for files matching src/gradata/**/*.py.

🔧 Proposed fix

Add this import at the beginning of the file (after the docstring, before other imports):

 """
 Structured fact extraction pipeline.
 =====================================
 Reads brain entity files (prospects/, candidates/, etc.) and extracts
 structured facts. Domain-agnostic: entity directory and fact types are
 configurable.
 """
+from __future__ import annotations
 
 import json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_fact_extractor.py` at line 271, The function get_stats uses the
PEP 604 union syntax (BrainContext | None) which requires postponed evaluation
of annotations for Python 3.9 compatibility; add the required future import by
inserting from __future__ import annotations at the top of this module
(immediately after the module docstring and before any other imports) so type
annotations like BrainContext | None are safe across supported Python versions.
src/gradata/_context_compile.py (3)

8-14: ⚠️ Potential issue | 🔴 Critical

Missing required import for type safety.

The file is missing from __future__ import annotations at the top, which is required for type safety. As per coding guidelines, this import is mandatory for all files in src/gradata/**/*.py.

🔧 Proposed fix
 """
 Context Compiler.
 ===================
 Extracts entities from a user message, queries the brain,
 returns formatted context injection.
 """
 
+from __future__ import annotations
+
 import re
 from typing import TYPE_CHECKING
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_context_compile.py` around lines 8 - 14, Add the required future
import at the top of the gradata._context_compile module: insert "from
__future__ import annotations" as the very first statement in
src/gradata/_context_compile.py (before any other imports) to enable postponed
evaluation of annotations for type safety and to comply with the project coding
guideline.

90-99: ⚠️ Potential issue | 🟠 Major

Pass the ctx parameter to build_packet.

The build_packet function accepts a ctx: BrainContext | None parameter (defined in src/gradata/_context_packet.py line 342) and uses it in multiple helper function calls (_load_user_scope, _load_prospect_context, _load_drafting_context, _load_debug_context, _load_audit_context, _load_wrapup_context). The call on lines 92-96 does not pass the ctx parameter that is available in the compile_context function, which breaks the expected data access pattern.

Update needed
packet = build_packet(
    prospect=entities["prospect"],
    task_type=entities["task_type"],
    topic=entities["topic"],
    ctx=ctx,
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_context_compile.py` around lines 90 - 99, The call to
build_packet inside compile_context is missing the ctx argument, which prevents
build_packet (and its helpers like _load_user_scope, _load_prospect_context,
_load_drafting_context, _load_debug_context, _load_audit_context,
_load_wrapup_context) from receiving the BrainContext; update the build_packet
invocation in src/gradata/_context_compile.py to pass ctx=ctx along with
prospect, task_type, and topic so the packet construction uses the provided
context.

76-88: ⚠️ Potential issue | 🟡 Minor

Pass ctx parameter to brain_search call on line 78.

The compile_context function receives a ctx parameter and correctly passes it to extract_entities on line 69, but fails to pass it to brain_search on line 78. Since brain_search accepts ctx: "BrainContext | None" and internally uses it for database access, it should receive the context parameter here as well for consistency and proper context handling.

Suggested fix:
results = brain_search(message[:100], top_k=3, mode="keyword", ctx=ctx)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_context_compile.py` around lines 76 - 88, The brain_search call
in compile_context does not pass the ctx parameter, so brain_search (which
accepts ctx: "BrainContext | None") won't receive the request context; update
the call in src/gradata/_context_compile.py where brain_search is invoked
(currently: brain_search(message[:100], top_k=3, mode="keyword")) to include the
ctx argument (i.e., pass ctx=ctx) so it becomes brain_search(message[:100],
top_k=3, mode="keyword", ctx=ctx), ensuring consistency with extract_entities
and correct DB/context usage.
src/gradata/contrib/patterns/human_loop.py (1)

496-512: ⚠️ Potential issue | 🟠 Major

The context parameter is unused — risk assessment ignores caller-supplied context.

The check method accepts a context parameter (line 499) but never uses it. Line 503 calls gate(action) without passing context, which means gate will internally call assess_risk(action) without context. Any caller passing context (e.g., to set risk_override or specify affected entities) will have it silently ignored.

🔧 Proposed fix to honor the context parameter
 def check(
     self,
     action: str,
     context: dict | None = None,
     approver: Callable | None = None,
 ) -> ApprovalResult:
     """Full gate check: assess risk, request approval if needed."""
-    request = gate(action)
+    risk = assess_risk(action, context)
+    request = gate(action, risk=risk)
     if request is None:
         return ApprovalResult(
             approved=True, feedback="auto_approved_low_risk"
         )
     if approver is not None:
         return approver(request)
     return ApprovalResult(
         approved=False, feedback="requires_human_review"
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/contrib/patterns/human_loop.py` around lines 496 - 512, The check
method’s context parameter is ignored — update check to forward the
caller-supplied context into the risk request path by calling gate with the
context (e.g., gate(action, context) or gate(action, context=context) to match
gate's signature) so that assess_risk receives the context and
risk_override/affected-entities are honored; also ensure any downstream use of
request (and approver(request)) remains compatible with the request object that
now may include context-derived fields.
src/gradata/onboard.py (2)

306-306: 🧹 Nitpick | 🔵 Trivial

Replace print() statements with logging.

The file contains multiple print() calls for interactive wizard output. As per coding guidelines, the core SDK should use logging instead of print() statements. Consider using a logger with a StreamHandler for CLI output while maintaining programmatic access.

Example refactor pattern
import logging

logger = logging.getLogger(__name__)

# In interactive sections:
logger.info("=== Gradata — Onboarding Wizard ===")
logger.info("Brain '%s' created at %s", name, brain_dir)

Configure a StreamHandler in CLI entry points to display these messages to users.

As per coding guidelines: "This is the core SDK. Check for: type safety (from __future__ import annotations required), no print() statements (use logging)".

Also applies to: 324-327, 334-334, 339-340, 347-348, 477-503

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/onboard.py` at line 306, Replace all print() calls in the
onboarding interactive code with logging calls: add "import logging" and a
module logger (logger = logging.getLogger(__name__)) and change prints like
print("\n=== Gradata — Onboarding Wizard ===\n") to logger.info("=== Gradata —
Onboarding Wizard ===") and other user-facing prints to
logger.info/logger.warning as appropriate; keep programmatic return values
unchanged and ensure the CLI entrypoint configures a StreamHandler to emit
logger messages to stdout/stderr so interactive output still appears in the
terminal (update functions/methods in this module that contain print() calls
such as the onboarding wizard routines to use logger.* instead).

237-247: ⚠️ Potential issue | 🟡 Minor

Accept optional BrainContext parameter for consistency with DB access pattern.

The _create_db() function performs database initialization without accepting a BrainContext parameter, which deviates from the established codebase pattern. Other DB access functions throughout the SDK (e.g., _get_db(), get_stats(), validate_brain()) accept an optional ctx: "BrainContext | None" = None parameter for dependency injection and multi-brain support.

While this is initialization code that runs before BrainContext would normally exist, adding optional context support would improve consistency:

def _create_db(db_path: Path, ctx: "BrainContext | None" = None) -> None:

This allows future multi-brain scenarios and aligns with the SDK's dependency injection pattern without breaking the current onboarding flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/onboard.py` around lines 237 - 247, Update the _create_db
function signature to accept an optional BrainContext parameter for consistency
with other DB functions: change def _create_db(db_path: Path) -> None: to def
_create_db(db_path: Path, ctx: "BrainContext | None" = None) -> None; keep the
implementation unchanged (it doesn't need to use ctx now), add the forward-ref
type annotation (string literal) and ensure imports/type hints follow the
project's pattern (no runtime dependency), and update any callers to pass their
ctx where available (or rely on the default None) so the API matches _get_db,
get_stats, validate_brain.
src/gradata/brain.py (1)

669-680: ⚠️ Potential issue | 🔴 Critical

Avoid acquiring lessons_lock() twice in _resolve_pending().

write_lessons_safe() takes lessons_lock() internally. Calling it while the outer with lessons_lock(lessons_path): is still active can block on your own lock and time out instead of resolving the approval.

🔓 Proposed fix
         with lessons_lock(lessons_path):
             lessons = parse_lessons(lessons_path.read_text(encoding="utf-8"))
             matched = False
             for lesson in lessons:
                 if (lesson.category == cat and lesson.description[:100] == desc[:100]
                         and lesson.pending_approval):
                     mutator(lesson)
                     matched = True
                     break
             if matched:
-                from gradata._db import write_lessons_safe
-                write_lessons_safe(lessons_path, format_lessons(lessons))
+                lessons_path.write_text(format_lessons(lessons), encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/brain.py` around lines 669 - 680, In _resolve_pending(), don't
call write_lessons_safe() while holding lessons_lock() because
write_lessons_safe() acquires the same lock internally; instead, inside the with
lessons_lock(lessons_path): find the matching lesson, call mutator(lesson), set
matched=True and compute formatted = format_lessons(lessons) (but do not call
write_lessons_safe), then exit the with block and only if matched call
write_lessons_safe(lessons_path, formatted). This avoids re-acquiring
lessons_lock() and potential self-deadlock.
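The lock-then-write pattern the prompt describes can be sketched with stand-ins: a plain `threading.Lock` plays the role of `lessons_lock()`, and `mutator` is simplified to a line transform. The real `parse_lessons`/`format_lessons` helpers are not reproduced here.

```python
from __future__ import annotations

import threading
from pathlib import Path

_lock = threading.Lock()

def lessons_lock():
    return _lock

def write_lessons_safe(path: Path, text: str) -> None:
    with lessons_lock():  # re-acquires the same (non-reentrant) lock
        path.write_text(text, encoding="utf-8")

def resolve_pending(path: Path, mutator) -> None:
    formatted = None
    with lessons_lock():
        lines = path.read_text(encoding="utf-8").splitlines()
        # ...find and mutate the matching lesson (simplified here)...
        formatted = "\n".join(mutator(line) for line in lines)
    # Lock released: now it's safe to call the helper that re-acquires it.
    if formatted is not None:
        write_lessons_safe(path, formatted)
```

Because `threading.Lock` is not reentrant, moving the `write_lessons_safe` call inside the `with` block would deadlock on the caller's own lock — exactly the failure mode flagged above.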
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/_encryption.py`:
- Line 119: The RuntimeError raised in _encryption.py should be converted to a
custom exception for clarity and static-analysis friendliness: define a
module-level class EncryptionError(RuntimeError) with a short docstring, then
replace the current raise RuntimeError("Encryption produced empty or missing
file — plaintext preserved") with raise EncryptionError("Encryption produced
empty or missing file — plaintext preserved") (reference the module-level
EncryptionError class and the raise in the encryption routine).

In `@src/gradata/benchmarks/swe_bench.py`:
- Around line 268-271: The multiline ImportError raised in the SWE-bench loader
should be shortened to a single-line message to satisfy Ruff TRY003 while
preserving exception chaining; locate the raise ImportError in
src.gradata.benchmarks.swe_bench (the raise ImportError ... from e statement)
and replace the multi-line string with a concise one-line message such as
"datasets package required; pip install datasets" (or raise a small custom
exception) but keep the "from e" suffix so the original exception is chained.

In `@src/gradata/brain.py`:
- Around line 356-364: The call to self.emit in brain.patch_rule can raise and
make the caller think the patch failed even though
write_lessons_safe(format_lessons(lessons)) already persisted changes; to fix,
wrap the self.emit(...) call in a try/except that catches all exceptions, logs
the emit failure (including exception details) via the module/class logger
(e.g., self.logger.error) or a safe logging helper, and does not re-raise so the
successful patch is not reported as failed; keep the same payload and tags but
ensure any emit failure is non-fatal.
- Around line 342-356: The patch_rule implementation currently reads and mutates
lessons.md before acquiring the lessons_lock, risking races; change patch_rule
to acquire the lessons_lock() for the entire read→mutate→write cycle: inside the
lessons_lock context call self._find_lessons_path(), validate the path, read
file text, parse_lessons(...), call apply_patch(...), and if patched call
write_lessons_safe(...) with format_lessons(...), and return the same
error/success dicts while still inside the lock so no external writer can
interleave between parse/apply/write; keep helper calls (parse_lessons,
apply_patch, format_lessons, write_lessons_safe) and early-return semantics
identical.
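The non-fatal-emit fix can be sketched like this, with `emit` passed in as a stand-in for `self.emit` and the persisted-patch step elided — names and return shape are assumptions, not the project's actual signatures.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("gradata.brain")

def patch_rule_then_emit(emit, payload: dict) -> dict:
    # ...rule already persisted via write_lessons_safe(...) at this point...
    try:
        emit("rule_patched", payload)
    except Exception:
        # An emit failure must not make an already-persisted patch look failed.
        logger.exception("emit failed after successful rule patch")
    return {"ok": True}
```

The key property: the success dict is returned whether or not the event bus is healthy, so callers never retry a patch that was actually written.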

In `@src/gradata/contrib/patterns/orchestrator.py`:
- Around line 526-528: Define a Protocol (e.g., SpawnQueueProtocol) that
declares a spawn_queue method with the exact signature used (e.g., def
spawn_queue(self, *, tasks: Sequence[Task], worker: WorkerType, max_concurrent:
int) -> Any), update the brain parameter/type from object | None to
Optional[SpawnQueueProtocol] (and add/replace any WorkerType/Task types as
appropriate), replace the unsafe cast/_sq assignment so mypy knows _sq is the
callable (e.g., _sq: Callable[..., Any] = brain.spawn_queue) and remove the
type: ignore[union-attr] on that line; ensure imports from typing (Protocol,
Optional, Callable) are added and adjust surrounding annotations accordingly so
spawn_queue is fully typed and type-checking passes.
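A sketch of the `Protocol` approach; `Task` and `WorkerType` are simplified to `Any` here since their real definitions aren't shown in the prompt, and `run_queue` is a hypothetical caller.

```python
from __future__ import annotations

from typing import Any, Optional, Protocol, Sequence

class SpawnQueueProtocol(Protocol):
    def spawn_queue(
        self, *, tasks: Sequence[Any], worker: Any, max_concurrent: int
    ) -> Any: ...

def run_queue(brain: Optional[SpawnQueueProtocol], tasks: Sequence[Any]) -> Any:
    if brain is None:
        return None
    # mypy now knows spawn_queue exists — no cast or type: ignore needed.
    return brain.spawn_queue(tasks=tasks, worker="default", max_concurrent=4)
```

Because `Protocol` uses structural subtyping, any brain object with a matching `spawn_queue` method satisfies the annotation without inheriting from it.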

In `@src/gradata/enhancements/behavioral_extractor.py`:
- Around line 426-430: The current check treats any 3+ word sentence starting
with a capitalized token as actionable, letting declarative sentences slip
through; update the logic so only explicit prefix-rule archetypes (e.g.,
PREFIX_INSTRUCTION / Archetype.PREFIX_INSTRUCTION) or clear imperative starters
(first_word in _IMPERATIVE_STARTERS) mark an instruction actionable—remove the
capitalization+word-count heuristic (the return using instruction[0].isupper()
and len(instruction.split()) >= 3) and only return True for the known
verbatim-rule case or when first_word matches _IMPERATIVE_STARTERS in the same
function where first_word, _IMPERATIVE_STARTERS and PREFIX_INSTRUCTION are
referenced.
- Around line 33-37: The fallback _FACTUAL_RE in the try/except should be
expanded to match the same common factual tokens the real
gradata.enhancements.edit_classifier._FACTUAL_RE recognizes (ISO dates,
percentages, and dollar-prefixed amounts) so REPLACEMENT_FACTUAL and downstream
archetype logic behave the same when the import fails; replace the current
fallback regex with a combined pattern that keeps the existing numeric/date/URL
matches and adds ISO dates (e.g. YYYY-MM-DD), percent values (e.g. 12% or 12.5%)
and dollar amounts (e.g. $123.45), ensuring the new regex is assigned to
_FACTUAL_RE in the except block referenced in the try/except around the import
of gradata.enhancements.edit_classifier._FACTUAL_RE.

In `@src/gradata/enhancements/rule_evolution.py`:
- Around line 281-284: Cache the result of calling _detect_opposite_direction to
avoid calling it twice: compute a local boolean (e.g., is_opposite =
_detect_opposite_direction(new_desc, best_rule.description)) once when best_rule
is not None and reuse it in the two conditionals that return
RuleRelation.UPDATES and RuleRelation.EXTENDS, keeping the comparisons against
best_similarity, update_threshold, and extend_threshold unchanged and
referencing best_rule, best_similarity, update_threshold, extend_threshold,
new_desc, and _detect_opposite_direction.
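The caching fix can be sketched as below. The condition shapes and string return values are simplified stand-ins for `RuleRelation` — only the "call `_detect_opposite_direction` once, reuse the boolean twice" structure is the point.

```python
def classify(new_desc, best_rule, best_similarity,
             update_threshold, extend_threshold, detect):
    if best_rule is None:
        return "NEW"
    # Compute the (potentially expensive) direction check exactly once.
    is_opposite = detect(new_desc, best_rule)
    if best_similarity >= update_threshold and is_opposite:
        return "UPDATES"
    if best_similarity >= extend_threshold and not is_opposite:
        return "EXTENDS"
    return "NEW"
```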

In `@src/gradata/hooks/_base.py`:
- Around line 51-57: The resolve_brain_dir helper currently falls back to a
hardcoded path (~/.gradata/brain); remove that fallback so it only uses
environment variables and validates them. Update resolve_brain_dir to read
GRADATA_BRAIN_DIR or BRAIN_DIR, return the path string if the env var is set and
exists (else None), and eliminate any use of Path.home() or the default
".gradata/brain" logic so no hardcoded filesystem path remains. Ensure the
function signature and return types (resolve_brain_dir) are preserved.
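The env-only resolver described above could look like this — a sketch matching the prompt's stated behavior (check both variables, require the path to exist, return `None` otherwise), not the repository's actual implementation.

```python
from __future__ import annotations

import os
from pathlib import Path

def resolve_brain_dir() -> str | None:
    for var in ("GRADATA_BRAIN_DIR", "BRAIN_DIR"):
        value = os.environ.get(var)
        if value and Path(value).exists():
            return value
    return None  # no hardcoded ~/.gradata/brain fallback
```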

In `@src/gradata/integrations/crewai_adapter.py`:
- Around line 101-104: The exception handlers in get_rules() (where it returns
self.brain.apply_brain_rules(task, context)) and in search() currently swallow
errors silently; update those except Exception blocks to log the exception
details using the same pattern as save() (e.g., logger.debug or
logger.exception) so failures include context and the exception info;
specifically modify the except blocks surrounding
self.brain.apply_brain_rules(...) and the search() implementation to call
logger.debug/exception with a brief message and the caught exception.
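A sketch of the logged-instead-of-silent pattern; the free-function shape and the empty-list fallback are assumptions (the real method lives on the adapter class and its fallback value isn't shown).

```python
from __future__ import annotations

import logging

logger = logging.getLogger("gradata.integrations.crewai_adapter")

def get_rules(brain, task, context):
    try:
        return brain.apply_brain_rules(task, context)
    except Exception:
        # Match save(): record the failure with traceback instead of swallowing it.
        logger.exception("apply_brain_rules failed for task %r", task)
        return []
```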

In `@src/gradata/integrations/embeddings.py`:
- Around line 158-160: The handlers registered with bus.on (those with
async_handler=True) call _embed_and_cache which mutates the shared dict
_embedding_cache without synchronization, causing races; fix by adding a
module-level lock (e.g., _embedding_cache_lock = threading.Lock()) and acquire
it around all mutations of _embedding_cache inside _embed_and_cache (use a
context manager to ensure release on exceptions), and ensure any other code
paths that write to _embedding_cache also use the same lock so all concurrent
async_handler invocations are serialized.

In `@src/gradata/integrations/langchain_adapter.py`:
- Line 113: The clear() method contains an unnecessary trailing "pass" after its
docstring which triggers Ruff PIE790; remove the redundant "pass" from the
clear() method (leave the existing docstring in place) so the function body is
just the docstring and no no-op statement remains.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c65400f4-2d73-4f7c-b3ae-0e648dc0d7d4

📥 Commits

Reviewing files that changed from the base of the PR and between ebb7966 and e2beb6e.

📒 Files selected for processing (94)
  • pyproject.toml
  • src/gradata/__init__.py
  • src/gradata/_brain_manifest.py
  • src/gradata/_context_compile.py
  • src/gradata/_context_packet.py
  • src/gradata/_core.py
  • src/gradata/_db.py
  • src/gradata/_doctor.py
  • src/gradata/_embed.py
  • src/gradata/_encryption.py
  • src/gradata/_fact_extractor.py
  • src/gradata/_installer.py
  • src/gradata/_manifest_helpers.py
  • src/gradata/_manifest_quality.py
  • src/gradata/_math.py
  • src/gradata/_migrations.py
  • src/gradata/_paths.py
  • src/gradata/_query.py
  • src/gradata/_stats.py
  • src/gradata/_tag_taxonomy.py
  • src/gradata/_validator.py
  • src/gradata/audit.py
  • src/gradata/benchmarks/swe_bench.py
  • src/gradata/brain.py
  • src/gradata/brain_inspection.py
  • src/gradata/cli.py
  • src/gradata/context_wrapper.py
  • src/gradata/contrib/enhancements/eval_benchmark.py
  • src/gradata/contrib/enhancements/install_manifest.py
  • src/gradata/contrib/enhancements/quality_gates.py
  • src/gradata/contrib/patterns/context_brackets.py
  • src/gradata/contrib/patterns/evaluator.py
  • src/gradata/contrib/patterns/execute_qualify.py
  • src/gradata/contrib/patterns/guardrails.py
  • src/gradata/contrib/patterns/human_loop.py
  • src/gradata/contrib/patterns/loop_detection.py
  • src/gradata/contrib/patterns/middleware.py
  • src/gradata/contrib/patterns/orchestrator.py
  • src/gradata/contrib/patterns/parallel.py
  • src/gradata/contrib/patterns/pipeline.py
  • src/gradata/contrib/patterns/q_learning_router.py
  • src/gradata/contrib/patterns/reconciliation.py
  • src/gradata/contrib/patterns/reflection.py
  • src/gradata/contrib/patterns/task_escalation.py
  • src/gradata/contrib/patterns/tree_of_thoughts.py
  • src/gradata/correction_detector.py
  • src/gradata/daemon.py
  • src/gradata/detection/__init__.py
  • src/gradata/detection/addition_pattern.py
  • src/gradata/detection/correction_conflict.py
  • src/gradata/detection/mode_classifier.py
  • src/gradata/enhancements/behavioral_extractor.py
  • src/gradata/enhancements/cluster_manager.py
  • src/gradata/enhancements/edit_classifier.py
  • src/gradata/enhancements/git_backfill.py
  • src/gradata/enhancements/instruction_cache.py
  • src/gradata/enhancements/lesson_discriminator.py
  • src/gradata/enhancements/memory_taxonomy.py
  • src/gradata/enhancements/meta_rules.py
  • src/gradata/enhancements/meta_rules_storage.py
  • src/gradata/enhancements/observation_hooks.py
  • src/gradata/enhancements/pattern_extractor.py
  • src/gradata/enhancements/pattern_integration.py
  • src/gradata/enhancements/quality_monitoring.py
  • src/gradata/enhancements/reporting.py
  • src/gradata/enhancements/router_warmstart.py
  • src/gradata/enhancements/rule_evolution.py
  • src/gradata/enhancements/rule_verifier.py
  • src/gradata/enhancements/self_improvement.py
  • src/gradata/enhancements/similarity.py
  • src/gradata/enhancements/super_meta_rules.py
  • src/gradata/events_bus.py
  • src/gradata/graph.py
  • src/gradata/hooks/_base.py
  • src/gradata/hooks/auto_correct.py
  • src/gradata/hooks/claude_code.py
  • src/gradata/inspection.py
  • src/gradata/integrations/anthropic_adapter.py
  • src/gradata/integrations/crewai_adapter.py
  • src/gradata/integrations/embeddings.py
  • src/gradata/integrations/langchain_adapter.py
  • src/gradata/integrations/openai_adapter.py
  • src/gradata/mcp_server.py
  • src/gradata/mcp_tools.py
  • src/gradata/onboard.py
  • src/gradata/rules/__init__.py
  • src/gradata/rules/rule_engine.py
  • src/gradata/rules/rule_graph.py
  • src/gradata/rules/rule_tracker.py
  • src/gradata/rules/scope.py
  • src/gradata/security/__init__.py
  • src/gradata/security/correction_provenance.py
  • src/gradata/security/manifest_signing.py
  • src/gradata/sidecar/watcher.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from __future__ import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/enhancements/instruction_cache.py
  • src/gradata/inspection.py
  • src/gradata/enhancements/git_backfill.py
  • src/gradata/contrib/patterns/parallel.py
  • src/gradata/contrib/enhancements/install_manifest.py
  • src/gradata/_core.py
  • src/gradata/contrib/patterns/guardrails.py
  • src/gradata/enhancements/pattern_integration.py
  • src/gradata/brain_inspection.py
  • src/gradata/contrib/patterns/evaluator.py
  • src/gradata/contrib/patterns/pipeline.py
  • src/gradata/enhancements/edit_classifier.py
  • src/gradata/_paths.py
  • src/gradata/_installer.py
  • src/gradata/_embed.py
  • src/gradata/sidecar/watcher.py
  • src/gradata/enhancements/cluster_manager.py
  • src/gradata/_validator.py
  • src/gradata/enhancements/memory_taxonomy.py
  • src/gradata/enhancements/lesson_discriminator.py
  • src/gradata/_doctor.py
  • src/gradata/security/correction_provenance.py
  • src/gradata/detection/__init__.py
  • src/gradata/detection/mode_classifier.py
  • src/gradata/_context_packet.py
  • src/gradata/detection/addition_pattern.py
  • src/gradata/contrib/patterns/middleware.py
  • src/gradata/contrib/patterns/loop_detection.py
  • src/gradata/events_bus.py
  • src/gradata/_encryption.py
  • src/gradata/_query.py
  • src/gradata/audit.py
  • src/gradata/contrib/patterns/q_learning_router.py
  • src/gradata/_stats.py
  • src/gradata/integrations/anthropic_adapter.py
  • src/gradata/enhancements/super_meta_rules.py
  • src/gradata/detection/correction_conflict.py
  • src/gradata/security/__init__.py
  • src/gradata/rules/scope.py
  • src/gradata/correction_detector.py
  • src/gradata/contrib/patterns/context_brackets.py
  • src/gradata/_tag_taxonomy.py
  • src/gradata/contrib/patterns/orchestrator.py
  • src/gradata/enhancements/meta_rules.py
  • src/gradata/integrations/langchain_adapter.py
  • src/gradata/contrib/enhancements/quality_gates.py
  • src/gradata/daemon.py
  • src/gradata/_manifest_helpers.py
  • src/gradata/contrib/patterns/human_loop.py
  • src/gradata/graph.py
  • src/gradata/_fact_extractor.py
  • src/gradata/hooks/auto_correct.py
  • src/gradata/contrib/patterns/tree_of_thoughts.py
  • src/gradata/cli.py
  • src/gradata/enhancements/router_warmstart.py
  • src/gradata/enhancements/rule_verifier.py
  • src/gradata/rules/rule_graph.py
  • src/gradata/integrations/crewai_adapter.py
  • src/gradata/rules/rule_tracker.py
  • src/gradata/contrib/patterns/reflection.py
  • src/gradata/mcp_server.py
  • src/gradata/contrib/patterns/execute_qualify.py
  • src/gradata/integrations/embeddings.py
  • src/gradata/_db.py
  • src/gradata/integrations/openai_adapter.py
  • src/gradata/hooks/claude_code.py
  • src/gradata/enhancements/pattern_extractor.py
  • src/gradata/enhancements/similarity.py
  • src/gradata/onboard.py
  • src/gradata/security/manifest_signing.py
  • src/gradata/_math.py
  • src/gradata/enhancements/rule_evolution.py
  • src/gradata/mcp_tools.py
  • src/gradata/enhancements/self_improvement.py
  • src/gradata/contrib/patterns/task_escalation.py
  • src/gradata/enhancements/observation_hooks.py
  • src/gradata/enhancements/reporting.py
  • src/gradata/contrib/enhancements/eval_benchmark.py
  • src/gradata/__init__.py
  • src/gradata/rules/rule_engine.py
  • src/gradata/enhancements/meta_rules_storage.py
  • src/gradata/benchmarks/swe_bench.py
  • src/gradata/brain.py
  • src/gradata/enhancements/behavioral_extractor.py
  • src/gradata/_context_compile.py
  • src/gradata/contrib/patterns/reconciliation.py
  • src/gradata/hooks/_base.py
  • src/gradata/context_wrapper.py
  • src/gradata/_brain_manifest.py
  • src/gradata/enhancements/quality_monitoring.py
  • src/gradata/_migrations.py
  • src/gradata/rules/__init__.py
  • src/gradata/_manifest_quality.py
src/gradata/integrations/**

⚙️ CodeRabbit configuration file

src/gradata/integrations/**: Framework integrations and nervous system modules. Check for: SSRF protection on any URL from env vars,
thread safety on shared state, embedding cache must be bounded, event bus handlers must never raise.

Files:

  • src/gradata/integrations/anthropic_adapter.py
  • src/gradata/integrations/langchain_adapter.py
  • src/gradata/integrations/crewai_adapter.py
  • src/gradata/integrations/embeddings.py
  • src/gradata/integrations/openai_adapter.py
src/gradata/hooks/**

⚙️ CodeRabbit configuration file

src/gradata/hooks/**: JavaScript hooks for Claude Code integration. Check for: no shell injection (no execSync with user
input), temp files must use per-user subdirectory, HTTP calls must have timeouts, errors must be silent (never block
the tool chain).

Files:

  • src/gradata/hooks/auto_correct.py
  • src/gradata/hooks/claude_code.py
  • src/gradata/hooks/_base.py
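A minimal sketch of the hook contract above: every HTTP call carries a timeout, and every error is swallowed so the tool chain is never blocked. `post_event` and its endpoint are hypothetical, not gradata's real hook API:

```python
import json
import urllib.request


def post_event(url: str, payload: dict, timeout: float = 2.0) -> bool:
    """Send a hook event; never raises, per the hook guidelines."""
    try:
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # Timeout bounds how long the hook can stall the tool chain.
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except Exception:
        return False  # silent by design: hooks must never raise
```

Returning a boolean instead of raising lets callers log a failure at debug level without ever interrupting the surrounding tool invocation.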
🪛 Ruff (0.15.9)
src/gradata/_encryption.py

[warning] 119-119: Avoid specifying long messages outside the exception class

(TRY003)

src/gradata/enhancements/meta_rules.py

[warning] 497-497: Avoid specifying long messages outside the exception class

(TRY003)

src/gradata/integrations/langchain_adapter.py

[warning] 113-113: Unnecessary pass statement

Remove unnecessary pass

(PIE790)

src/gradata/_db.py

[warning] 96-98: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 108-110: Avoid specifying long messages outside the exception class

(TRY003)

src/gradata/__init__.py

[warning] 145-145: Avoid specifying long messages outside the exception class

(TRY003)

src/gradata/benchmarks/swe_bench.py

[warning] 268-271: Avoid specifying long messages outside the exception class

(TRY003)

src/gradata/brain.py

[warning] 966-966: Avoid specifying long messages outside the exception class

(TRY003)

src/gradata/hooks/_base.py

[warning] 39-39: Do not catch blind exception: Exception

(BLE001)


[warning] 69-69: Missing return type annotation for private function get_brain

(ANN202)


[warning] 80-80: Do not catch blind exception: Exception

(BLE001)


[warning] 96-96: Do not catch blind exception: Exception

(BLE001)

else:
    tmp_path.unlink(missing_ok=True)
    raise RuntimeError("Encryption produced empty or missing file — plaintext preserved")

🧹 Nitpick | 🔵 Trivial

LGTM — formatting fix applied.

The trailing newline addition aligns with codebase style conventions and has no functional impact.

Optional: The static analysis tool flags TRY003 here, suggesting that the exception message could be extracted to a custom exception class for better maintainability. This is a minor style improvement and not required.

♻️ Optional refactor (custom exception class)

Define a custom exception at the module level:

class EncryptionError(RuntimeError):
    """Raised when encryption operations fail."""
    pass

Then update line 119:

-        raise RuntimeError("Encryption produced empty or missing file — plaintext preserved")
+        raise EncryptionError("Encryption produced empty or missing file — plaintext preserved")
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 119-119: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_encryption.py` at line 119, The RuntimeError raised in
_encryption.py should be converted to a custom exception for clarity and
static-analysis friendliness: define a module-level class
EncryptionError(RuntimeError) with a short docstring, then replace the current
raise RuntimeError("Encryption produced empty or missing file — plaintext
preserved") with raise EncryptionError("Encryption produced empty or missing
file — plaintext preserved") (reference the module-level EncryptionError class
and the raise in the encryption routine).

Comment on lines 268 to +271
     raise ImportError(
         "SWE-bench data loading requires the 'datasets' package.\n"
         "Install with: pip install datasets"
-    )
+    ) from e

🧹 Nitpick | 🔵 Trivial

Shorten inline ImportError message to clear Ruff TRY003

Lines 268-271 still trigger TRY003. Keep the cause chaining, but move to a concise one-line message (or a custom exception) to satisfy the lint cleanly.

Proposed tweak
-        raise ImportError(
-            "SWE-bench data loading requires the 'datasets' package.\n"
-            "Install with: pip install datasets"
-        ) from e
+        raise ImportError(
+            "Missing optional dependency 'datasets'. Install with: pip install datasets."
+        ) from e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    raise ImportError(
-        "SWE-bench data loading requires the 'datasets' package.\n"
-        "Install with: pip install datasets"
-    ) from e
+    raise ImportError(
+        "Missing optional dependency 'datasets'. Install with: pip install datasets."
+    ) from e
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 268-271: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/benchmarks/swe_bench.py` around lines 268 - 271, The multiline
ImportError raised in the SWE-bench loader should be shortened to a single-line
message to satisfy Ruff TRY003 while preserving exception chaining; locate the
raise ImportError in src.gradata.benchmarks.swe_bench (the raise ImportError ...
from e statement) and replace the multi-line string with a concise one-line
message such as "datasets package required; pip install datasets" (or raise a
small custom exception) but keep the "from e" suffix so the original exception
is chained.

Comment thread src/gradata/brain.py
Comment thread src/gradata/brain.py
Comment on lines 526 to 528
     if brain and hasattr(brain, "spawn_queue"):
-        _sq = getattr(brain, "spawn_queue")
+        _sq = brain.spawn_queue  # type: ignore[union-attr]
         result = _sq(tasks=tasks, worker=worker, max_concurrent=max_concurrent)

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

# First, let's check if the file exists and read the relevant lines
head -30 src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 1481


🏁 Script executed:

# Get the line count to understand file size
wc -l src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 107


🏁 Script executed:

# Read the area around lines 526-528
sed -n '500,550p' src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 1823


🏁 Script executed:

# Check imports at the top of the file
sed -n '1,50p' src/gradata/contrib/patterns/orchestrator.py | grep -E "^from|^import|^__future__"

Repository: Gradata/gradata

Length of output: 93


🏁 Script executed:

# Get the function signature and definition
sed -n '480,530p' src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 1967


🏁 Script executed:

# Check for any existing type hints or BrainContext references in the file
rg -n "BrainContext|Protocol|SupportsSpawnQueue" src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

# Check what's imported at the top, looking for typing imports
sed -n '1,100p' src/gradata/contrib/patterns/orchestrator.py | grep -E "^from typing|^import typing|from collections|Protocol"

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

# Check if there are print() statements in the file (should use logging per guidelines)
rg -n "print\(" src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 494


🏁 Script executed:

# Check if logging is used appropriately
rg -n "logging\.|import logging" src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 136


🏁 Script executed:

# Let's check the context around those print statements to see if they're in docstrings
sed -n '35,50p' src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 506


🏁 Script executed:

# Check the entire imports section thoroughly
head -80 src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 3110


🏁 Script executed:

# Check if there are any hardcoded paths
rg -n "\"\/|'\/|pathlib|Path\(" src/gradata/contrib/patterns/orchestrator.py | head -20

Repository: Gradata/gradata

Length of output: 148


🏁 Script executed:

# Check the full typing imports to understand what's available
sed -n '1,15p' src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 730


🏁 Script executed:

# Check if BrainContext is defined in the codebase
rg -n "class BrainContext|BrainContext" --type py | head -20

Repository: Gradata/gradata

Length of output: 1873


🏁 Script executed:

# Look at the rest of the imports to understand the typing situation
sed -n '1,120p' src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 4339


🏁 Script executed:

# Check if there are any type annotations issues or if Protocol is already imported
rg -n "from typing|Protocol" src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

# Check the actual docstring for the function to understand its purpose
sed -n '488,510p' src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 794


🏁 Script executed:

# Verify the exact signature of spawn_queue calls in the codebase
rg -n "spawn_queue" src/gradata/contrib/patterns/orchestrator.py -B 2 -A 2

Repository: Gradata/gradata

Length of output: 880


🏁 Script executed:

# Check if there are other usages of brain.spawn_queue or similar patterns
rg -n "spawn_queue" --type py | grep -v test | head -10

Repository: Gradata/gradata

Length of output: 911


🏁 Script executed:

# Verify if Callable and Any are imported, or need to be added
grep -n "Callable\|Any" src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 108


🏁 Script executed:

# Check if execute_orchestrated or related functions access DB and should use BrainContext
sed -n '488,542p' src/gradata/contrib/patterns/orchestrator.py

Repository: Gradata/gradata

Length of output: 2223


🏁 Script executed:

# Check what Brain.spawn_queue returns to verify the return type annotation
sed -n '1110,1145p' src/gradata/brain.py

Repository: Gradata/gradata

Length of output: 1833


🏁 Script executed:

# Verify the complete typing imports needed for the suggested refactor
grep -n "^from typing\|^import typing" src/gradata/brain.py | head -5

Repository: Gradata/gradata

Length of output: 94


Use a Protocol to eliminate the type: ignore and improve type safety.

Lines 489–491 have overly generic type annotations (worker: object, brain: object | None), causing the type: ignore[union-attr] on line 527. Replace them with proper type hints using a Protocol that describes the spawn_queue interface.

Suggested refactor
+from typing import Any, Callable, Protocol
+
+class SupportsSpawnQueue(Protocol):
+    def spawn_queue(
+        self,
+        tasks: list[str],
+        worker: Callable[[str], Any],
+        *,
+        max_concurrent: int = 3,
+    ) -> dict: ...
+
 def execute_orchestrated(
     tasks: list[str],
-    worker: object,
-    brain: object | None = None,
+    worker: Callable[[str], Any],
+    brain: SupportsSpawnQueue | None = None,
     *,
     max_concurrent: int = 3,
 ) -> dict:
@@
-    if brain and hasattr(brain, "spawn_queue"):
-        _sq = brain.spawn_queue  # type: ignore[union-attr]
+    if brain is not None:
+        _sq = brain.spawn_queue
         result = _sq(tasks=tasks, worker=worker, max_concurrent=max_concurrent)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/contrib/patterns/orchestrator.py` around lines 526 - 528, Define
a Protocol (e.g., SpawnQueueProtocol) that declares a spawn_queue method with
the exact signature used (e.g., def spawn_queue(self, *, tasks: Sequence[Task],
worker: WorkerType, max_concurrent: int) -> Any), update the brain
parameter/type from object | None to Optional[SpawnQueueProtocol] (and
add/replace any WorkerType/Task types as appropriate), replace the unsafe
cast/_sq assignment so mypy knows _sq is the callable (e.g., _sq: Callable[...,
Any] = brain.spawn_queue) and remove the type: ignore[union-attr] on that line;
ensure imports from typing (Protocol, Optional, Callable) are added and adjust
surrounding annotations accordingly so spawn_queue is fully typed and
type-checking passes.

Comment on lines +281 to +284
if best_rule is not None and best_similarity > update_threshold and _detect_opposite_direction(new_desc, best_rule.description):
return (RuleRelation.UPDATES, best_rule)
if best_rule is not None and best_similarity > extend_threshold and not _detect_opposite_direction(new_desc, best_rule.description):
return (RuleRelation.EXTENDS, best_rule)

🧹 Nitpick | 🔵 Trivial

LGTM! Clean refactoring that improves readability.

The consolidation of nested if blocks into single conditions preserves the original logic while reducing indentation. The behavior remains correct: UPDATES is returned when similarity exceeds the update threshold and opposite direction is detected; EXTENDS is returned when similarity exceeds the extend threshold without opposite direction.

♻️ Optional: Cache direction check to avoid redundant calls

When best_similarity > update_threshold but direction is not opposite, _detect_opposite_direction will be called again for the extend check. Consider caching the result:

     if rule.category == new_lesson.category:
         rule_keywords = _extract_keywords(rule.description)
         if new_keywords & rule_keywords:
             category_cluster.append(rule)
-    if best_rule is not None and best_similarity > update_threshold and _detect_opposite_direction(new_desc, best_rule.description):
+    
+    is_opposite = _detect_opposite_direction(new_desc, best_rule.description) if best_rule is not None else False
+    
+    if best_rule is not None and best_similarity > update_threshold and is_opposite:
         return (RuleRelation.UPDATES, best_rule)
-    if best_rule is not None and best_similarity > extend_threshold and not _detect_opposite_direction(new_desc, best_rule.description):
+    if best_rule is not None and best_similarity > extend_threshold and not is_opposite:
         return (RuleRelation.EXTENDS, best_rule)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/rule_evolution.py` around lines 281 - 284, Cache the
result of calling _detect_opposite_direction to avoid calling it twice: compute
a local boolean (e.g., is_opposite = _detect_opposite_direction(new_desc,
best_rule.description)) once when best_rule is not None and reuse it in the two
conditionals that return RuleRelation.UPDATES and RuleRelation.EXTENDS, keeping
the comparisons against best_similarity, update_threshold, and extend_threshold
unchanged and referencing best_rule, best_similarity, update_threshold,
extend_threshold, new_desc, and _detect_opposite_direction.

Comment on lines +51 to +57
def resolve_brain_dir() -> str | None:
"""Resolve brain directory from env vars or default location."""
brain_dir = os.environ.get("GRADATA_BRAIN_DIR") or os.environ.get("BRAIN_DIR")
if brain_dir:
return brain_dir if Path(brain_dir).exists() else None
default = Path.home() / ".gradata" / "brain"
return str(default) if default.exists() else None

🛠️ Refactor suggestion | 🟠 Major

Remove the built-in ~/.gradata/brain fallback.

This helper hardcodes a filesystem location into hook resolution, so nonstandard installs silently fall back to one local path convention instead of the path source configured by the SDK.

🧭 Proposed fix
 def resolve_brain_dir() -> str | None:
     """Resolve brain directory from env vars or default location."""
     brain_dir = os.environ.get("GRADATA_BRAIN_DIR") or os.environ.get("BRAIN_DIR")
-    if brain_dir:
-        return brain_dir if Path(brain_dir).exists() else None
-    default = Path.home() / ".gradata" / "brain"
-    return str(default) if default.exists() else None
+    if not brain_dir:
+        return None
+    return brain_dir if Path(brain_dir).exists() else None
As per coding guidelines, `src/gradata/**/*.py`: "no hardcoded paths."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/hooks/_base.py` around lines 51 - 57, The resolve_brain_dir
helper currently falls back to a hardcoded path (~/.gradata/brain); remove that
fallback so it only uses environment variables and validates them. Update
resolve_brain_dir to read GRADATA_BRAIN_DIR or BRAIN_DIR, return the path string
if the env var is set and exists (else None), and eliminate any use of
Path.home() or the default ".gradata/brain" logic so no hardcoded filesystem
path remains. Ensure the function signature and return types (resolve_brain_dir)
are preserved.

Comment on lines 101 to +104
    try:
        return self.brain.apply_brain_rules(task, context)
    except Exception:
        return ""

🧹 Nitpick | 🔵 Trivial

Consider adding debug logging to the exception handler for consistency.

The save() method logs exceptions with logger.debug(), but get_rules() (and search()) silently swallow errors. Adding consistent logging would improve debuggability when these methods fail.

📋 Proposed enhancement for consistency
     try:
         return self.brain.apply_brain_rules(task, context)
-    except Exception:
+    except Exception as e:
+        logger.debug("CrewAI get_rules failed: %s", e)
         return ""

Consider the same pattern for search() at lines 88-89.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    try:
-        return self.brain.apply_brain_rules(task, context)
-    except Exception:
-        return ""
+    try:
+        return self.brain.apply_brain_rules(task, context)
+    except Exception as e:
+        logger.debug("CrewAI get_rules failed: %s", e)
+        return ""
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 103-103: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/integrations/crewai_adapter.py` around lines 101 - 104, The
exception handlers in get_rules() (where it returns
self.brain.apply_brain_rules(task, context)) and in search() currently swallow
errors silently; update those except Exception blocks to log the exception
details using the same pattern as save() (e.g., logger.debug or
logger.exception) so failures include context and the exception info;
specifically modify the except blocks surrounding
self.brain.apply_brain_rules(...) and the search() implementation to call
logger.debug/exception with a brief message and the caught exception.

Comment on lines 158 to +160
bus.on("correction.created", lambda p: _embed_and_cache(p, "description", "text"), async_handler=True)
bus.on("lesson.graduated", lambda p: _embed_and_cache(p, "description"), async_handler=True)
bus.on("meta_rule.created", lambda p: _embed_and_cache(p, "description", "rule"), async_handler=True)

⚠️ Potential issue | 🟠 Major

Protect embedding cache mutations in async handlers.

At Line 158–Line 160, handlers run with async_handler=True and call _embed_and_cache, which mutates _embedding_cache without synchronization. This is a race on shared state.

Proposed thread-safety patch
+import threading
 ...
 _embedding_cache: OrderedDict[str, list[float]] = OrderedDict()
+_embedding_cache_lock = threading.RLock()
 ...
 def get_client():
     global _default_client
-    if _default_client is None:
-        _default_client = EmbeddingClient(
-            api_url=os.environ.get("GRADATA_API_URL"),
-            api_token=os.environ.get("GRADATA_API_TOKEN"),
-        )
+    with _embedding_cache_lock:
+        if _default_client is None:
+            _default_client = EmbeddingClient(
+                api_url=os.environ.get("GRADATA_API_URL"),
+                api_token=os.environ.get("GRADATA_API_TOKEN"),
+            )
     return _default_client
 ...
     def _embed_and_cache(payload, *keys):
         try:
             ...
-            if desc and desc not in _embedding_cache:
-                vec = get_client().embed(desc)
-                if vec:
-                    _embedding_cache[desc] = vec
-                    while len(_embedding_cache) > _CACHE_MAX_SIZE:
-                        _embedding_cache.popitem(last=False)
+            if desc:
+                with _embedding_cache_lock:
+                    if desc in _embedding_cache:
+                        return
+                vec = get_client().embed(desc)
+                if vec:
+                    with _embedding_cache_lock:
+                        _embedding_cache[desc] = vec
+                        while len(_embedding_cache) > _CACHE_MAX_SIZE:
+                            _embedding_cache.popitem(last=False)
         except Exception:
             logger.warning("Failed to embed payload", exc_info=True)

As per coding guidelines, "src/gradata/integrations/**: ... check for ... thread safety on shared state ...".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/integrations/embeddings.py` around lines 158 - 160, The handlers
registered with bus.on (those with async_handler=True) call _embed_and_cache
which mutates the shared dict _embedding_cache without synchronization, causing
races; fix by adding a module-level lock (e.g., _embedding_cache_lock =
threading.Lock()) and acquire it around all mutations of _embedding_cache inside
_embed_and_cache (use a context manager to ensure release on exceptions), and
ensure any other code paths that write to _embedding_cache also use the same
lock so all concurrent async_handler invocations are serialized.

def clear(self) -> None:
    """No-op for persistent brain memory."""
    pass

🧹 Nitpick | 🔵 Trivial

Remove redundant pass in clear().

The method already has a docstring body, so pass is unnecessary and triggers Ruff PIE790.

♻️ Proposed fix
     def clear(self) -> None:
         """No-op for persistent brain memory."""
-        pass
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 113-113: Unnecessary pass statement

Remove unnecessary pass

(PIE790)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/integrations/langchain_adapter.py` at line 113, The clear()
method contains an unnecessary trailing "pass" after its docstring which
triggers Ruff PIE790; remove the redundant "pass" from the clear() method (leave
the existing docstring in place) so the function body is just the docstring and
no no-op statement remains.

Same CI fixes as PR #20 and #21 branches.

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/sdk-ci.yml:
- Around line 3-15: The CI currently only triggers SDK paths in the sdk-ci
workflow and the main ci workflow runs pytest against the SDK tests but not the
cloud backend; update CI so cloud backend tests run by either adding a step that
executes pytest cloud/tests/ in the existing ci workflow's test job (the job
that currently runs pytest tests/ and type-checks src/) or create a new workflow
triggered on changes under the cloud path that runs pytest cloud/tests/ (and
migrations checks if present), ensuring the cloud test invocation is added to
the test job or new workflow job name so backend changes are validated.
- Around line 36-40: Replace the floating action tag and ephemeral uv version
with pinned values: change the GitHub Action usage from astral-sh/setup-uv@v4 to
a specific commit SHA (e.g.,
astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57) and set the uv
input version from "latest" to a tested fixed version (e.g., "0.4.25") so the
step that uses the setup-uv action and the with: version setting are both
deterministic.

In `@pyproject.toml`:
- Around line 105-118: The pyproject.toml currently lists a global Bandit
"skips" array that mutes 105 real violations; remove the broad skips for
high/medium risk rules and convert them to targeted suppressions: keep only
"B110" in the global skips entry (since defensive try/except/pass is pervasive),
remove "B603","B607","B608","B310","B105","B615" from the skips list, run Bandit
to locate the specific occurrences of those removed rules, and add inline "#
nosec: <rule> — rationale" comments at each justified site (for example next to
subprocess calls, SQL schema code, URL-open or constant password literals) with
a brief explanation referencing the code context; ensure any remaining
true-security issues are fixed rather than suppressed and update
pyproject.toml's skips array to reflect only the accepted global exemptions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a68ad716-b2b1-4efb-8423-8c8e09ae723e

📥 Commits

Reviewing files that changed from the base of the PR and between e2beb6e and 7c952b7.

📒 Files selected for processing (2)
  • .github/workflows/sdk-ci.yml
  • pyproject.toml
📜 Review details
🔇 Additional comments (1)
pyproject.toml (1)

40-40: Good addition: Ruff is now explicitly installed in dev environments.

This aligns local tooling with the configured lint pipeline.

Comment on lines +3 to +15
on:
push:
paths:
- "src/gradata/**"
- "tests/**"
- "pyproject.toml"
- ".github/workflows/sdk-ci.yml"
pull_request:
paths:
- "src/gradata/**"
- "tests/**"
- "pyproject.toml"
- ".github/workflows/sdk-ci.yml"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Workflows present ==="
fd -t f .github/workflows

echo "=== Workflow path filters mentioning cloud/backend/deploy ==="
rg -n -C2 'paths:|cloud/\*\*|cloud/|railway|Dockerfile|migrations' .github/workflows || true

Repository: Gradata/gradata

Length of output: 475


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Workflows present ==="
fd . '.github/workflows' -t f

echo ""
echo "=== Workflow contents and path filters ==="
for workflow in $(fd . '.github/workflows' -t f); do
  echo "--- File: $workflow ---"
  cat -n "$workflow" | head -40
done

echo ""
echo "=== Search for cloud/deploy/backend references across workflows ==="
rg 'paths:|cloud/|migrations|Dockerfile|railway|deploy' '.github/workflows' -n -A2 || echo "No matches found"

Repository: Gradata/gradata

Length of output: 5930


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Repository root structure ==="
ls -la

echo ""
echo "=== Check for cloud directory ==="
fd -t d cloud . 2>/dev/null || echo "No 'cloud' directory found"

echo ""
echo "=== Check for migration-related files ==="
fd -t f migration . 2>/dev/null | head -20 || echo "No migration files found"

echo ""
echo "=== Check for Dockerfile ==="
fd -t f Dockerfile . 2>/dev/null || echo "No Dockerfile found"

echo ""
echo "=== List main directories at root ==="
fd -d 1 -t d . | grep -v '^\.$' | sort

Repository: Gradata/gradata

Length of output: 1355


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Cloud directory structure ==="
find cloud/ -type f | head -20

echo ""
echo "=== Cloud directory contents (tree view) ==="
tree cloud/ -L 3 2>/dev/null || find cloud/ -type f | sed 's|[^/]*/| |g'

echo ""
echo "=== Check if cloud code has tests ==="
fd test . cloud/ || echo "No test files in cloud/"

echo ""
echo "=== Sample cloud file to understand what it is ==="
head -20 cloud/*/*.py | head -50 || head -30 $(find cloud/ -type f -name "*.py" | head -1) 2>/dev/null || echo "No Python files found"

Repository: Gradata/gradata

Length of output: 3920


Cloud backend code lacks CI validation coverage.

The cloud/** directory (with app code, migrations, and its own test suite in cloud/tests/) is not tested by any workflow:

  • sdk-ci.yml (lines 5–15) explicitly limits triggers to SDK paths only, excluding cloud/**
  • ci.yml has no path filters and runs on all changes, but its test job only executes pytest tests/ (SDK tests) and type-checks src/ — it does not run cloud/tests/

Backend changes to migrations, API routes, or auth logic can merge to main without validation. Add a workflow step to run pytest cloud/tests/ in ci.yml, or create a separate workflow that triggers on cloud/** changes.
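
A separate workflow along these lines would close the gap. This is a sketch only — the job layout, Python setup, and the `cloud/requirements.txt` path are assumptions, not taken from the repo:

```yaml
# Hypothetical .github/workflows/cloud-ci.yml — names and paths are illustrative.
name: Cloud CI
on:
  pull_request:
    paths:
      - "cloud/**"
  push:
    branches: [main]
    paths:
      - "cloud/**"

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - run: uv pip install --system -r cloud/requirements.txt pytest
      - run: pytest cloud/tests/
```

The `paths:` filter keeps the job from running on SDK-only changes, mirroring how sdk-ci.yml limits itself to SDK paths.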

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/sdk-ci.yml around lines 3 - 15, The CI currently only
triggers SDK paths in the sdk-ci workflow and the main ci workflow runs pytest
against the SDK tests but not the cloud backend; update CI so cloud backend
tests run by either adding a step that executes pytest cloud/tests/ in the
existing ci workflow's test job (the job that currently runs pytest tests/ and
type-checks src/) or create a new workflow triggered on changes under the cloud
path that runs pytest cloud/tests/ (and migrations checks if present), ensuring
the cloud test invocation is added to the test job or new workflow job name so
backend changes are validated.

Comment on lines +36 to +40
      - name: Install uv
        uses: astral-sh/setup-uv@v4
        with:
          version: "latest"


🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🌐 Web query:

What is the recommended pinning strategy for astral-sh/setup-uv in GitHub Actions, and should version: "latest" be avoided for reproducible CI?

💡 Result:

Recommended pinning for astral-sh/setup-uv

  • Pin the action itself to an immutable reference, ideally a full commit SHA (best), or at least an immutable semver tag like v8.0.0 (not v8). The project explicitly recommends the SHA, and it has stopped publishing major/minor floating tags (so @v8 / @v8.0 won’t be available). (github.com)
  • This also matches GitHub’s guidance for third-party actions: pin to a full-length commit SHA for immutability/security. (docs.github.com)

Example:

- uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0

(github.com)

Should version: "latest" be avoided for reproducible CI?

Yes—avoid version: "latest" (and relying on the default “latest”) if you want reproducible CI, because the action will install whatever uv version is latest at runtime when no required version is found. Prefer pinning with: version: "X.Y.Z" or setting a required-version in uv.toml / pyproject.toml so installs don’t drift over time. (github.com)

Pin uv to a fixed version instead of latest.

Using version: "latest" introduces nondeterministic CI breakages. Pin a tested version (e.g., version: "0.4.25") for reproducible builds. Also pin the action itself to a commit SHA rather than the floating v4 tag (e.g., astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/sdk-ci.yml around lines 36 - 40, Replace the floating
action tag and ephemeral uv version with pinned values: change the GitHub Action
usage from astral-sh/setup-uv@v4 to a specific commit SHA (e.g.,
astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57) and set the uv
input version from "latest" to a tested fixed version (e.g., "0.4.25") so the
step that uses the setup-uv action and the with: version setting are both
deterministic.
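
Applying both pins from the suggestion above, the step would look like this (version values are the ones proposed in the review, not verified against the repo's toolchain):

```yaml
- name: Install uv
  uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57  # v8.0.0
  with:
    version: "0.4.25"
```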

Comment thread pyproject.toml
Comment on lines +105 to +118
skips = [
    "B101",  # allow assert in non-test code
    "B110",  # try/except/pass — SDK uses defensive coding with logged fallbacks
    "B112",  # try/except/continue — same pattern
    "B311",  # pseudo-random — used for non-security shuffling (rule ordering)
    "B324",  # weak hash — SHA256 used for content hashing, not security
    "B404",  # subprocess import — SDK shells out to qmd/git intentionally
    "B603",  # subprocess without shell — intentional CLI tool invocations
    "B607",  # partial path — CLI tools resolved via PATH
    "B608",  # SQL string construction — SDK builds SQLite queries with parameterized inputs
    "B310",  # url open audit — SDK opens URLs from validated config, not user input
    "B105",  # hardcoded password — false positive on string constants like 'pass', 'False'
    "B615",  # huggingface download — benchmark code, not production
]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python -m pip install --quiet "bandit>=1.7,<2" "tomli>=2,<3"

echo "=== Current configured skips ==="
python - <<'PY'
import tomli, pathlib
cfg = tomli.loads(pathlib.Path("pyproject.toml").read_text())
print(cfg["tool"]["bandit"].get("skips", []))
PY

echo "=== Bandit with current config ==="
bandit -r src/gradata/ -c pyproject.toml -f json -o /tmp/bandit_config.json || true
python - <<'PY'
import json
d=json.load(open("/tmp/bandit_config.json"))
print("Issues (with config):", len(d.get("results", [])))
PY

echo "=== Bandit baseline without custom config ==="
bandit -r src/gradata/ -f json -o /tmp/bandit_baseline.json || true
python - <<'PY'
import json
d=json.load(open("/tmp/bandit_baseline.json"))
print("Issues (baseline):", len(d.get("results", [])))
sev={}
for r in d.get("results", []):
    sev[r["issue_severity"]] = sev.get(r["issue_severity"],0)+1
print("Baseline severity counts:", sev)
PY

Repository: Gradata/gradata

Length of output: 699


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Fix PATH for bandit and run comprehensive analysis
export PATH="/home/jailuser/.local/bin:$PATH"

echo "=== Verifying bandit is accessible ==="
which bandit || echo "Bandit not found"

# Re-run bandit with proper PATH
echo "=== Bandit with current config (pyproject.toml) ==="
bandit -r src/gradata/ -c pyproject.toml -f json -o /tmp/bandit_config.json 2>&1 || EXIT_CODE=$?
if [ -f /tmp/bandit_config.json ]; then
  python3 - <<'PY'
import json
d = json.load(open("/tmp/bandit_config.json"))
print(f"Issues found (with config): {len(d.get('results', []))}")
for result in d.get('results', [])[:10]:
    print(f"  - {result['test_id']}: {result['issue_text']} (severity: {result['issue_severity']})")
PY
else
  echo "Config file not generated"
fi

echo "=== Bandit baseline (no custom config) ==="
bandit -r src/gradata/ -f json -o /tmp/bandit_baseline.json 2>&1 || EXIT_CODE=$?
if [ -f /tmp/bandit_baseline.json ]; then
  python3 - <<'PY'
import json
d = json.load(open("/tmp/bandit_baseline.json"))
print(f"Issues found (baseline): {len(d.get('results', []))}")
sev = {}
for r in d.get('results', []):
    sev[r['issue_severity']] = sev.get(r['issue_severity'], 0) + 1
print(f"Severity breakdown: {sev}")

# Count by test_id
test_counts = {}
for r in d.get('results', []):
    tid = r['test_id']
    test_counts[tid] = test_counts.get(tid, 0) + 1
print("\nTop issues by test_id:")
for tid, count in sorted(test_counts.items(), key=lambda x: -x[1])[:15]:
    print(f"  {tid}: {count}")
PY
else
  echo "Baseline file not generated"
fi

Repository: Gradata/gradata

Length of output: 1154


🏁 Script executed:

#!/bin/bash

# Search for actual subprocess calls and patterns mentioned in the skips
echo "=== Searching for subprocess patterns (B603, B607, B404) ==="
rg "subprocess|import os|shell\s*=" src/gradata/ -A 2 -B 1 | head -50

echo -e "\n=== Searching for SQL construction patterns (B608) ==="
rg "SELECT|INSERT|UPDATE|DELETE|\.execute\(" src/gradata/ -A 2 -B 1 | head -50

echo -e "\n=== Searching for URL opening (B310) ==="
rg "urllib|urlopen|requests\." src/gradata/ -A 2 -B 1 | head -50

Repository: Gradata/gradata

Length of output: 9692


🏁 Script executed:

#!/bin/bash

# Check the actual structure and assess if nosec would be practical
echo "=== Number of files in src/gradata ==="
find src/gradata -type f -name "*.py" | wc -l

echo "=== Lines of code in src/gradata ==="
find src/gradata -type f -name "*.py" -exec wc -l {} + | tail -1

echo "=== Check if there are already nosec comments in use ==="
rg "# nosec" src/gradata/ | wc -l

echo "=== Current test exclusion ==="
rg "exclude_dirs.*tests" pyproject.toml -A 2

Repository: Gradata/gradata

Length of output: 448


Global Bandit skips suppress 105 real violations, including 3 HIGH and 11 MEDIUM severity issues—consider targeted suppression for high-risk checks.

Lines 105–118 globally disable security checks for the entire codebase. With current configuration, 0 issues are flagged; without it, the baseline shows 105 violations (3 HIGH, 11 MEDIUM, 91 LOW). Most concerning:

  • B110 (61 occurrences of try/except/pass) is genuinely pervasive—global skip is pragmatic here
  • B603/B607 (subprocess without shell, partial path) appear 11 times combined in controlled contexts (git operations with list-format commands)
  • B608 (SQL string construction) occurs 6 times, mostly for static schema operations with parameterized inputs
  • B310 (URL opening) occurs 4 times with SSRF validation logic present
  • B105 (hardcoded password) flags 6 false positives on string constants

The review comment's core concern is valid: global skips mask violations entirely and weaken enforcement. However, the practical path forward is more nuanced:

  1. Keep B110 global (61 occurrences of defensive try/except patterns are too numerous to annotate)
  2. Convert HIGH/MEDIUM violations to targeted # nosec comments (14 issues) with inline rationale
  3. Audit false positives (B105, B615) separately

Currently, zero # nosec comments exist in the codebase. Implementing targeted suppression is feasible for the high-risk checks but requires code changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 105 - 118, The pyproject.toml currently lists a
global Bandit "skips" array that mutes 105 real violations; remove the broad
skips for high/medium risk rules and convert them to targeted suppressions: keep
only "B110" in the global skips entry (since defensive try/except/pass is
pervasive), remove "B603","B607","B608","B310","B105","B615" from the skips
list, run Bandit to locate the specific occurrences of those removed rules, and
add inline "# nosec: <rule> — rationale" comments at each justified site (for
example next to subprocess calls, SQL schema code, URL-open or constant password
literals) with a brief explanation referencing the code context; ensure any
remaining true-security issues are fixed rather than suppressed and update
pyproject.toml's skips array to reflect only the accepted global exemptions.
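
A targeted `# nosec` annotation (step 2 above) would look roughly like this. The helper name is hypothetical; the point is the inline suppression with a rationale at the exact call site, instead of a global skip in pyproject.toml:

```python
import subprocess


def run_tool(args: list[str]) -> str:
    """Invoke a CLI tool with a fixed, list-form argv (no shell).

    Bandit's B603 flags every subprocess call; with the global skip removed,
    a targeted annotation documents why this specific site is safe.
    """
    result = subprocess.run(  # nosec B603 — list argv, no user-controlled shell string
        args, capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


print(run_tool(["echo", "hello"]))
```

Bandit reads the `# nosec <rule>` comment on the flagged line and skips only that check there, so the rule stays enforced everywhere else.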

- SQL: add UNIQUE(workspace_id, user_id) on brains table
- SQL: drop duplicate idx_brains_api_key (UNIQUE already creates index)
- SQL: pin search_path on SECURITY DEFINER trigger function
- SQL: create default brain in handle_new_user trigger
- brain.py: wrap patch_rule read-mutate-write under lessons_lock
- brain.py: protect emit from making successful patch look failed
- behavioral_extractor.py: align fallback _FACTUAL_RE with edit_classifier
- behavioral_extractor.py: remove capitalization bypass in _is_actionable
- conftest.py: mock select() now respects filters
- sync.py: preserve created_at from SDK payload on corrections/events

Co-Authored-By: Gradata <noreply@gradata.ai>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/gradata/brain.py (1)

703-713: ⚠️ Potential issue | 🟡 Minor

Ensure DB connection closes on error paths in review_pending().

At Line 707, connection close happens only on success. If query/fetch throws before Line 712, the connection is left open.

🔧 Proposed fix
         try:
             import sqlite3

             from gradata._db import get_connection

-            conn = get_connection(self.db_path)
-            conn.row_factory = sqlite3.Row
-            rows = conn.execute(
-                "SELECT * FROM pending_approvals WHERE resolution IS NULL "
-                "ORDER BY created_at DESC").fetchall()
-            conn.close()
+            with contextlib.closing(get_connection(self.db_path)) as conn:
+                conn.row_factory = sqlite3.Row
+                rows = conn.execute(
+                    "SELECT * FROM pending_approvals WHERE resolution IS NULL "
+                    "ORDER BY created_at DESC").fetchall()
             return [dict(r) for r in rows]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/brain.py` around lines 703 - 713, The review_pending() function
opens a DB connection via get_connection(self.db_path) and currently calls
conn.close() only after a successful execute/fetchall; wrap the query logic in a
try/finally (or use a context manager) to ensure conn.close() is always called
even if conn.execute(...) or fetchall() raises. Specifically, ensure the conn
object created in review_pending() is closed in the finally block (or by using a
with-style context) around the execute/fetchall and row_factory usage so no
connections are leaked.
♻️ Duplicate comments (2)
src/gradata/enhancements/behavioral_extractor.py (1)

422-430: ⚠️ Potential issue | 🟡 Minor

Fix ruff SIM103: return the condition directly.

The logic correctly addresses the past review finding (removes capitalization bypass), but the pipeline fails due to the verbose return pattern.

🔧 Proposed fix to satisfy ruff SIM103
 def _is_actionable(instruction: str) -> bool:
     if not instruction or len(instruction) < 5:
         return False
     first_word = instruction.split()[0].lower().removesuffix("'t")
-    if first_word in _IMPERATIVE_STARTERS:
-        return True
-    # Accept only known PREFIX_INSTRUCTION patterns (explicit user rules like
-    # "User corrected: ...") — don't treat arbitrary capitalized sentences as actionable
-    return False
+    return first_word in _IMPERATIVE_STARTERS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/behavioral_extractor.py` around lines 422 - 430, In
_is_actionable, simplify the final check by returning the boolean expression
directly instead of the verbose if/return pattern: compute first_word as
currently done (using instruction.split()[0].lower().removesuffix("'t")) and
then return first_word in _IMPERATIVE_STARTERS; keep the early
empty/short-string guard intact and preserve the comment about
PREFIX_INSTRUCTION patterns.
cloud/app/routes/sync.py (1)

59-66: 🧹 Nitpick | 🔵 Trivial

Avoid single-letter variable l—easily confused with 1.

Using l as an iterator variable is flagged by linters (E741) because it's visually ambiguous with the digit 1.

♻️ Rename to `lesson`
         rows = [
             {
                 "brain_id": brain_id,
-                "category": l.category,
-                "description": l.description,
-                "state": l.state.value,
-                "confidence": float(l.confidence),
-                "fire_count": l.fire_count,
-                "recurrence_days": l.recurrence_days,
+                "category": lesson.category,
+                "description": lesson.description,
+                "state": lesson.state.value,
+                "confidence": float(lesson.confidence),
+                "fire_count": lesson.fire_count,
+                "recurrence_days": lesson.recurrence_days,
             }
-            for l in body.lessons
+            for lesson in body.lessons
         ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/sync.py` around lines 59 - 66, The comprehension is using
the ambiguous single-letter iterator `l`; rename it to `lesson` everywhere in
that expression (e.g., change "for l in body.lessons" to "for lesson in
body.lessons" and update all uses like l.category, l.description, l.state,
l.confidence, l.fire_count, l.recurrence_days to lesson.category,
lesson.description, lesson.state, lesson.confidence, lesson.fire_count,
lesson.recurrence_days) so linters won't flag E741 and readability improves.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/app/routes/sync.py`:
- Around line 88-100: The insert currently drops SDK-provided lesson refs by
always writing source_lesson_ids: []; update the rows construction to populate
source_lesson_ids from the incoming payload (use m.source_lesson_ids or the
appropriate field on body.meta_rules items, falling back to [] when missing) so
db.insert("meta_rules", rows) persists SDK-provided references; keep existing
brain_id/title/description handling and update meta_rules_count accordingly, and
if resolution is intentionally deferred add a TODO comment referencing
server-side resolution to make the behavior explicit.
- Around line 54-69: The upsert into the lessons table via db.upsert(rows) is
not idempotent because Postgres needs a unique constraint or an explicit
conflict target to merge duplicates; ensure conflicts are resolved by either
adding a UNIQUE constraint on (brain_id, description) in the DB schema/migration
for the lessons table, or change the upsert call to include an explicit conflict
target/ON CONFLICT clause (so db.upsert("lessons", rows,
conflict_target=["brain_id","description"]) or equivalent) so that db.upsert and
the INSERT/merge will update existing rows rather than inserting duplicates;
reference the lessons table and the db.upsert usage in the sync handler that
constructs rows with brain_id and description.

In `@cloud/migrations/001_initial_schema.sql`:
- Around line 199-206: The policy rule_patches_via_lesson on rule_patches only
allows access when b.user_id = auth.uid(), excluding workspace members; update
the USING clause to mirror the other table policies by permitting access if the
auth user is the brain owner OR is a workspace member of that brain (e.g., add
the same JOIN/EXISTS logic used in corrections/lessons/meta_rules/events to
check workspace_members for b.id and auth.uid()), so rule_patches_via_lesson
grants access to both brain owners and workspace members for lessons-linked
patches.
- Around line 218-234: The trigger body currently INSERTs into workspaces and
then re-queries that table twice with SELECT ... ORDER BY created_at DESC LIMIT
1; change this to capture the inserted workspace id directly (use INSERT ...
RETURNING id INTO a local variable or DECLARE a workspace_id and use RETURNING
INTO) immediately after the INSERT INTO workspaces (referencing the same INSERT
statement in the BEGIN block), then use that workspace_id variable for the
subsequent INSERT INTO workspace_members and INSERT INTO brains calls instead of
the two SELECT subqueries to avoid race conditions and improve efficiency.

In `@cloud/tests/conftest.py`:
- Around line 49-50: The mock update(self, table: str, data: dict, filters: dict
| None = None) currently ignores filters and always returns [data]; change it to
respect filters or at least record them so tests can assert correct filter
usage: implement filtering logic similar to the mock select() (apply filters to
the stored rows for the given table and return only matching rows after applying
the updates) or append each call to a list like self._updates.append({"table":
table, "data": data, "filters": filters}) so tests can assert filter values;
update the update method and any test fixtures that rely on it (refer to update
and select in conftest.py).

In `@src/gradata/brain.py`:
- Around line 1149-1150: The "Re-export Pipeline type" block currently does
nothing; replace the no-op inside the contextlib.suppress(ImportError): block
with an actual re-export of the Pipeline type (e.g., import Pipeline from the
module that defines it and bind it in this module's namespace), and ensure the
symbol is included in this module's export list (__all__) if present; make the
import happen inside the existing contextlib.suppress(ImportError) so import
failures remain tolerated (refer to the existing contextlib.suppress usage and
the symbol name Pipeline to find where to add the import/re-export).
- Around line 346-357: After locating lessons_path via
self._find_lessons_path(), avoid mutating external/fallback lessons files by
adding a safety check before entering lessons_lock and calling
write_lessons_safe: verify lessons_path is inside the repository-controlled
location (or not equal to the known external fallback/read-only path) and if it
is external, return {"patched": False, "error": "readonly: external lessons
file"} instead of writing; update the code around _find_lessons_path,
lessons_lock, apply_patch, and write_lessons_safe to perform this guard so
apply_patch and format_lessons are only used when writes are permitted.

---

Outside diff comments:
In `@src/gradata/brain.py`:
- Around line 703-713: The review_pending() function opens a DB connection via
get_connection(self.db_path) and currently calls conn.close() only after a
successful execute/fetchall; wrap the query logic in a try/finally (or use a
context manager) to ensure conn.close() is always called even if
conn.execute(...) or fetchall() raises. Specifically, ensure the conn object
created in review_pending() is closed in the finally block (or by using a
with-style context) around the execute/fetchall and row_factory usage so no
connections are leaked.

---

Duplicate comments:
In `@cloud/app/routes/sync.py`:
- Around line 59-66: The comprehension is using the ambiguous single-letter
iterator `l`; rename it to `lesson` everywhere in that expression (e.g., change
"for l in body.lessons" to "for lesson in body.lessons" and update all uses like
l.category, l.description, l.state, l.confidence, l.fire_count,
l.recurrence_days to lesson.category, lesson.description, lesson.state,
lesson.confidence, lesson.fire_count, lesson.recurrence_days) so linters won't
flag E741 and readability improves.

In `@src/gradata/enhancements/behavioral_extractor.py`:
- Around line 422-430: In _is_actionable, simplify the final check by returning
the boolean expression directly instead of the verbose if/return pattern:
compute first_word as currently done (using
instruction.split()[0].lower().removesuffix("'t")) and then return first_word in
_IMPERATIVE_STARTERS; keep the early empty/short-string guard intact and
preserve the comment about PREFIX_INSTRUCTION patterns.
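
The RETURNING INTO rewrite described in the migration prompt above can be sketched as follows. Table and column names (owner_id, role, name) are assumptions based on the review discussion, not the actual 001_initial_schema.sql:

```sql
-- Sketch only — verify column names against the real migration before use.
CREATE OR REPLACE FUNCTION handle_new_user()
RETURNS TRIGGER
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = public
AS $$
DECLARE
    ws_id UUID;
BEGIN
    -- Capture the new workspace id directly instead of re-querying with
    -- SELECT ... ORDER BY created_at DESC LIMIT 1 (race-prone under
    -- concurrent signups).
    INSERT INTO workspaces (name, owner_id)
    VALUES ('My Workspace', NEW.id)
    RETURNING id INTO ws_id;

    INSERT INTO workspace_members (workspace_id, user_id, role)
    VALUES (ws_id, NEW.id, 'owner');

    INSERT INTO brains (workspace_id, user_id, name)
    VALUES (ws_id, NEW.id, 'default');

    RETURN NEW;
END;
$$;
```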
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fcd68b89-b423-4da0-8269-bdf96f6f7e4f

📥 Commits

Reviewing files that changed from the base of the PR and between 7c952b7 and 01e8f73.

📒 Files selected for processing (5)
  • cloud/app/routes/sync.py
  • cloud/migrations/001_initial_schema.sql
  • cloud/tests/conftest.py
  • src/gradata/brain.py
  • src/gradata/enhancements/behavioral_extractor.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/enhancements/behavioral_extractor.py
  • src/gradata/brain.py
🪛 GitHub Actions: SDK CI
src/gradata/enhancements/behavioral_extractor.py

[error] 426-426: ruff check failed: SIM103 Return the condition directly. Inline condition.

🪛 SQLFluff (4.1.0)
cloud/migrations/001_initial_schema.sql

[error] 28-242: Line 28, Position 1: Found unparsable section: 'CREATE TABLE brains (\n id UUID PRIMAR...'

(PRS)

🔇 Additional comments (12)
src/gradata/enhancements/behavioral_extractor.py (4)

33-37: LGTM — fallback regex now covers common factual patterns.

The expanded _FACTUAL_RE fallback now properly matches ISO dates, percentages, dollar amounts, URLs, and multi-digit numbers, aligning with the canonical edit_classifier._FACTUAL_RE behavior.


204-205: LGTM — strict=True documents the length invariant.

Both iterables derive from the same source list, so lengths are guaranteed equal. The explicit strict=True serves as a correctness assertion.


208-222: LGTM — word-boundary matching prevents false positives.

Using re.search(r'\b...\b', ...) on precomputed lowercase strings ensures case-insensitive matching while avoiding substring false positives (e.g., "must" in "mustard").


405-407: LGTM — extended imperative vocabulary improves extraction coverage.

The additional verbs (prioritize, emphasize, focus, simplify, etc.) are common action words that expand the quality gate's acceptance of valid instructions.

src/gradata/brain.py (3)

350-368: Good hardening of patch flow and event emission.

Keeping parse→apply→write under lessons_lock (Line 350) and making emit failures non-fatal (Lines 359-368) correctly avoids race-related clobbers and false failure reporting.


653-659: sqlite3.Row setup in _resolve_pending looks correct.

The scoped sqlite3 import and conn.row_factory = sqlite3.Row at Line 659 keep row access consistent and explicit.


970-970: Exception chaining in export() is correct.

raise RuntimeError(...) from e at Line 970 preserves root cause while presenting a clearer API-level error.

cloud/app/routes/sync.py (2)

36-100: Partial failure leaves orphaned data—consider documenting or mitigating.

As noted in a previous review, the sync endpoint performs multiple independent writes. If, for example, db.insert("events", ...) fails after corrections and lessons succeed, the client receives an error but partial data remains committed. PostgREST doesn't support multi-statement transactions via REST.

Options to consider:

  1. Document the behavior so SDK clients know to retry the full sync on failure.
  2. Add idempotency keys to correction/event rows so retries can be deduplicated.
  3. Use a Postgres function (RPC) to wrap all inserts in a single transaction.
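
Option 2 can be sketched with a deterministic key derived from the payload. The function name and key scheme are illustrative, not part of the SDK:

```python
import hashlib
import json


def idempotency_key(brain_id: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) makes the key deterministic,
    # so a retried sync produces the same key and a UNIQUE index on
    # (brain_id, idempotency_key) can deduplicate it server-side.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{brain_id}:{payload}".encode()).hexdigest()


key = idempotency_key("brain-1", {"event_type": "correction", "severity": 0.4})
print(len(key))  # → 64 (SHA-256 hex digest)
```

Because the key depends only on the row content, a full-sync retry after a partial failure re-sends the same keys and the duplicates are rejected (or merged) rather than inserted twice.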

38-52: created_at preservation now implemented—previous concern addressed.

The correction and event rows now conditionally include created_at from the payload (Lines 47 and 81), preserving SDK-provided timestamps.

Also applies to: 72-86

cloud/migrations/001_initial_schema.sql (1)

28-37: Previous review findings have been addressed.

The UNIQUE (workspace_id, user_id) constraint is now in place (Line 36), and the SET search_path = public has been added to the trigger function (Line 216). The duplicate idx_brains_api_key index has also been removed.

cloud/tests/conftest.py (2)

30-37: Previous review finding addressed—filter logic now implemented.

The select() method correctly filters rows by matching all key-value pairs when filters is provided.


53-58: No action required—the monkeypatch approach is correct.

The fixture properly targets app.db._client, which is a module-level singleton initialized to None. When tests run, the monkeypatch intercepts this variable before get_db() is called. The code structure (singleton pattern with lazy initialization) makes the monkeypatch effective, not problematic. The inline comment "Replaced by mock in tests" confirms this approach is intentional.

> Likely an incorrect or invalid review comment.

Comment thread cloud/app/routes/sync.py
Comment on lines +54 to +69
    # Upsert lessons (idempotent by description)
    if body.lessons:
        rows = [
            {
                "brain_id": brain_id,
                "category": l.category,
                "description": l.description,
                "state": l.state.value,
                "confidence": float(l.confidence),
                "fire_count": l.fire_count,
                "recurrence_days": l.recurrence_days,
            }
            for l in body.lessons
        ]
        await db.upsert("lessons", rows)
        lessons_count = len(rows)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify lessons table constraints in the migration
rg -A 15 "CREATE TABLE lessons" cloud/migrations/001_initial_schema.sql

Repository: Gradata/gradata

Length of output: 746


🏁 Script executed:

# Check for any indexes or constraints defined on the lessons table
rg "lessons" cloud/migrations/001_initial_schema.sql -A 2 -B 2

Repository: Gradata/gradata

Length of output: 1541


🏁 Script executed:

# Look for the db.upsert implementation to understand its behavior
fd -t f -e py | xargs rg -l "def upsert" | head -5

Repository: Gradata/gradata

Length of output: 425


🏁 Script executed:

# Read the db.upsert implementation
rg -A 20 "def upsert" cloud/app/db.py

Repository: Gradata/gradata

Length of output: 954


Lessons upsert lacks a unique constraint—db.upsert will insert duplicates.

The comment says "idempotent by description," but the lessons table schema has no UNIQUE constraint on (brain_id, description) or similar. PostgREST's upsert (via Prefer: resolution=merge-duplicates) requires a unique constraint to resolve conflicts; without one, the auto-generated UUID primary key is the only conflict point, so each sync inserts new rows instead of updating existing lessons with matching brain_id and description.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/sync.py` around lines 54 - 69, The upsert into the lessons
table via db.upsert(rows) is not idempotent because Postgres needs a unique
constraint or an explicit conflict target to merge duplicates; ensure conflicts
are resolved by either adding a UNIQUE constraint on (brain_id, description) in
the DB schema/migration for the lessons table, or change the upsert call to
include an explicit conflict target/ON CONFLICT clause (so db.upsert("lessons",
rows, conflict_target=["brain_id","description"]) or equivalent) so that
db.upsert and the INSERT/merge will update existing rows rather than inserting
duplicates; reference the lessons table and the db.upsert usage in the sync
handler that constructs rows with brain_id and description.
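
The schema-side fix would be a small follow-up migration; the constraint name here is illustrative:

```sql
-- Hypothetical follow-up migration for cloud/migrations/.
ALTER TABLE lessons
    ADD CONSTRAINT lessons_brain_id_description_key
    UNIQUE (brain_id, description);
```

With this in place, PostgREST's `Prefer: resolution=merge-duplicates` (or an explicit `on_conflict=brain_id,description` query parameter) has a conflict target to merge on, so repeated syncs update existing lessons instead of inserting duplicates.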

Comment thread cloud/app/routes/sync.py
Comment on lines +88 to +100
    # Insert meta-rules
    if body.meta_rules:
        rows = [
            {
                "brain_id": brain_id,
                "title": m.title,
                "description": m.description,
                "source_lesson_ids": [],  # Resolved server-side later
            }
            for m in body.meta_rules
        ]
        await db.insert("meta_rules", rows)
        meta_rules_count = len(rows)

⚠️ Potential issue | 🟡 Minor

source_lesson_ids hardcoded to empty—SDK payload field ignored.

The MetaRulePayload likely includes source_lesson_ids, but the insert always sets it to []. If the SDK sends lesson references, they're discarded. If resolution is truly deferred ("Resolved server-side later"), consider adding a TODO issue or documenting the expected follow-up.

Would you like me to open an issue to track server-side resolution of source_lesson_ids?


Comment on cloud/migrations/001_initial_schema.sql, lines +199 to +206
CREATE POLICY rule_patches_via_lesson ON rule_patches
    FOR ALL USING (
        lesson_id IN (
            SELECT l.id FROM lessons l
            JOIN brains b ON b.id = l.brain_id
            WHERE b.user_id = auth.uid()
        )
    );
⚠️ Potential issue | 🟡 Minor

rule_patches policy is inconsistent with sibling tables—workspace members excluded.

The policies for corrections, lessons, meta_rules, and events all grant access to both brain owners and workspace members via a JOIN. However, rule_patches_via_lesson only checks brain ownership (b.user_id = auth.uid()), excluding team members who can view the parent lesson but not its patches.

🔧 Proposed fix to align with other policies
 CREATE POLICY rule_patches_via_lesson ON rule_patches
     FOR ALL USING (
         lesson_id IN (
             SELECT l.id FROM lessons l
             JOIN brains b ON b.id = l.brain_id
             WHERE b.user_id = auth.uid()
         )
+        OR lesson_id IN (
+            SELECT l.id FROM lessons l
+            JOIN brains b ON b.id = l.brain_id
+            JOIN workspace_members wm ON wm.workspace_id = b.workspace_id
+            WHERE wm.user_id = auth.uid()
+        )
     );

Comment on cloud/migrations/001_initial_schema.sql, lines +218 to +234
BEGIN
    INSERT INTO workspaces (name, owner_id)
    VALUES (COALESCE(NEW.raw_user_meta_data->>'full_name', 'My Workspace'), NEW.id);

    INSERT INTO workspace_members (workspace_id, user_id, role)
    VALUES (
        (SELECT id FROM workspaces WHERE owner_id = NEW.id ORDER BY created_at DESC LIMIT 1),
        NEW.id,
        'owner'
    );

    INSERT INTO brains (workspace_id, user_id, name)
    VALUES (
        (SELECT id FROM workspaces WHERE owner_id = NEW.id ORDER BY created_at DESC LIMIT 1),
        NEW.id,
        'default'
    );
🧹 Nitpick | 🔵 Trivial

Trigger re-queries workspaces twice—use RETURNING or a variable instead.

The function inserts a workspace, then immediately runs two separate SELECT ... ORDER BY created_at DESC LIMIT 1 subqueries to retrieve the same workspace ID. This is fragile if concurrent inserts occur (unlikely but possible) and inefficient.

♻️ Proposed refactor using a local variable
 CREATE OR REPLACE FUNCTION handle_new_user()
 RETURNS trigger
 LANGUAGE plpgsql
 SECURITY DEFINER
 SET search_path = public
 AS $$
+DECLARE
+    new_workspace_id UUID;
 BEGIN
-    INSERT INTO workspaces (name, owner_id)
-    VALUES (COALESCE(NEW.raw_user_meta_data->>'full_name', 'My Workspace'), NEW.id);
+    INSERT INTO workspaces (name, owner_id)
+    VALUES (COALESCE(NEW.raw_user_meta_data->>'full_name', 'My Workspace'), NEW.id)
+    RETURNING id INTO new_workspace_id;
 
-    INSERT INTO workspace_members (workspace_id, user_id, role)
-    VALUES (
-        (SELECT id FROM workspaces WHERE owner_id = NEW.id ORDER BY created_at DESC LIMIT 1),
-        NEW.id,
-        'owner'
-    );
+    INSERT INTO workspace_members (workspace_id, user_id, role)
+    VALUES (new_workspace_id, NEW.id, 'owner');
 
-    INSERT INTO brains (workspace_id, user_id, name)
-    VALUES (
-        (SELECT id FROM workspaces WHERE owner_id = NEW.id ORDER BY created_at DESC LIMIT 1),
-        NEW.id,
-        'default'
-    );
+    INSERT INTO brains (workspace_id, user_id, name)
+    VALUES (new_workspace_id, NEW.id, 'default');
 
     RETURN NEW;
 END;
 $$;

Comment thread cloud/tests/conftest.py
Comment on lines +49 to +50
async def update(self, table: str, data: dict, filters: dict | None = None) -> list[dict]:
    return [data]
🧹 Nitpick | 🔵 Trivial

Mock update() ignores filters, may mask bugs in filter-dependent logic.

The mock returns [data] regardless of the filters parameter. While simple, this could mask issues where the wrong filter is passed (e.g., filtering by a non-existent ID). Consider returning filtered rows similar to select() or at least storing the filters for assertion.

♻️ Suggested improvement for better test fidelity
+    def __init__(self):
+        self._responses: dict[str, dict[str, list[Any]]] = defaultdict(lambda: defaultdict(list))
+        self._inserts: list[dict] = []
+        self._updates: list[tuple[str, dict, dict | None]] = []  # (table, data, filters)
+
     async def update(self, table: str, data: dict, filters: dict | None = None) -> list[dict]:
+        self._updates.append((table, data, filters))
         return [data]

Tests can then assert mock._updates contains expected filter values.


Comment thread src/gradata/brain.py
Comment on lines 1149 to +1150
with contextlib.suppress(ImportError):
    pass
    pass
⚠️ Potential issue | 🟠 Major

Pipeline re-export is currently a no-op.

The block labeled “Re-export Pipeline type” only executes pass (Line 1150), so the advertised re-export is missing and may break import expectations.

🔧 Proposed fix
 # Re-export Pipeline type
 with contextlib.suppress(ImportError):
-    pass
+    from gradata.contrib.patterns.pipeline import Pipeline
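As a standalone illustration of the intended pattern: the import attempt sits inside the suppress block, and a sentinel makes the optional dependency detectable. The module path is hypothetical, standing in for wherever Pipeline actually lives.

```python
import contextlib

# Re-export Pipeline if the optional module is installed; tolerate its absence.
# The module path below is a hypothetical stand-in.
Pipeline = None
with contextlib.suppress(ImportError):
    from gradata.contrib.patterns.pipeline import Pipeline

# Callers can feature-detect rather than catching ImportError themselves.
if Pipeline is None:
    print("Pipeline extra not installed")
```

If the symbol is listed in the module's `__all__`, it should be appended there only when the import succeeds.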

Oliver Le and others added 3 commits April 10, 2026 00:03
Co-Authored-By: Gradata <noreply@gradata.ai>
Same fix as PR #20/#21 — 12 corrections across 3 sessions.

@Gradata Gradata merged commit 46d7103 into main Apr 10, 2026
6 of 7 checks passed
Comment on cloud/app/routes/brains.py, lines +42 to +46
    rows = await db.select(
        "brains",
        columns="id,user_id,brain_name,domain,manifest,last_sync_at,created_at",
        filters={"user_id": brain["user_id"]},
    )
P0 Non-existent columns queried — will 400 in production

The columns string references brain_name, domain, and manifest, but none of those columns exist in the brains table. The schema defines the column as name, with no domain or manifest columns at all. PostgREST will return a 400 Bad Request for every real call to this endpoint, even though the mocked test passes (the mock ignores the columns argument).

Suggested change
     rows = await db.select(
         "brains",
-        columns="id,user_id,brain_name,domain,manifest,last_sync_at,created_at",
+        columns="id,user_id,name,last_sync_at,created_at",
         filters={"user_id": brain["user_id"]},
     )
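One way to keep mocks from masking this class of bug is to validate requested columns against a per-table allowlist, so the mocked test fails the same way PostgREST would. The column set below mirrors this PR's schema and is an assumption of this sketch:

```python
# Hypothetical schema-aware check: reject column lists PostgREST would 400 on.
TABLE_COLUMNS = {
    "brains": {"id", "workspace_id", "user_id", "name", "last_sync_at", "created_at"},
}

def validate_columns(table: str, columns: str) -> None:
    requested = {c.strip() for c in columns.split(",")}
    unknown = requested - TABLE_COLUMNS[table]
    if unknown:
        raise ValueError(f"unknown columns for {table}: {sorted(unknown)}")

validate_columns("brains", "id,user_id,name,last_sync_at,created_at")  # passes
try:
    validate_columns("brains", "id,brain_name,domain,manifest")
except ValueError as exc:
    print(exc)
```

Wiring a check like this into `MockSupabaseClient.select` would have turned this P0 into a failing unit test.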

Comment thread cloud/app/db.py
Comment on lines +50 to +55
async def upsert(self, table: str, data: dict | list[dict]) -> list[dict]:
    """UPSERT rows (INSERT ... ON CONFLICT DO UPDATE)."""
    headers = {**self._headers, "Prefer": "return=representation,resolution=merge-duplicates"}
    resp = await self._http.post(f"/{table}", json=data, headers=headers)
    resp.raise_for_status()
    return resp.json()
P1 Upsert has no on_conflict target — lessons will always be duplicated

resolution=merge-duplicates tells PostgREST to do an upsert, but without an on_conflict query parameter it defaults to the primary key (id). Since lesson rows have no pre-set id, there is never a conflict and every row is inserted fresh. Every sync() call appends duplicate lesson records — breaking the stated "idempotent by description" behaviour.

Two things are needed:

  1. A unique constraint on (brain_id, description) in the migration.
  2. Pass on_conflict as a query parameter in db.upsert().
async def upsert(
    self, table: str, data: dict | list[dict], on_conflict: str | None = None,
) -> list[dict]:
    headers = {**self._headers, "Prefer": "return=representation,resolution=merge-duplicates"}
    params: dict[str, str] = {}
    if on_conflict:
        params["on_conflict"] = on_conflict
    resp = await self._http.post(f"/{table}", json=data, headers=headers, params=params)
    resp.raise_for_status()
    return resp.json()

Then in sync.py:

await db.upsert("lessons", rows, on_conflict="brain_id,description")

Comment thread cloud/app/auth.py
Comment on lines +33 to +35
try:
    claims = jose_jwt.decode(signed_jwt, key, algorithms=["HS256"])
    user_id = claims.get("sub")
P1 security JWT aud claim not validated

jose_jwt.decode is called without an audience argument. Supabase JWTs carry aud: "authenticated", and python-jose skips audience validation when the parameter is omitted. This means any JWT signed with the same HMAC key passes regardless of which service issued it.

Suggested change
     try:
-        claims = jose_jwt.decode(signed_jwt, key, algorithms=["HS256"])
+        claims = jose_jwt.decode(
+            signed_jwt, key, algorithms=["HS256"], audience="authenticated"
+        )
         user_id = claims.get("sub")
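To make the aud check concrete, here is a stdlib-only sketch of HS256 verification with an explicit audience comparison. Production code should keep using python-jose with `audience="authenticated"`; this is purely illustrative, and it simplifies the claim to a single string.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256(claims: dict, key: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def decode_hs256(token: str, key: bytes, audience: str) -> dict:
    header, payload, sig = token.split(".")
    expected = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    # Simplified: real tokens may carry aud as a list; python-jose handles that.
    if claims.get("aud") != audience:
        raise ValueError("invalid audience")
    return claims

token = sign_hs256({"sub": "u1", "aud": "authenticated"}, b"secret")
print(decode_hs256(token, b"secret", "authenticated")["sub"])  # u1
```

A token signed with the same key but minted for a different audience now fails closed instead of being accepted.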
