feat: cloud backend API — Supabase schema, FastAPI sync endpoint, Railway deploy #22
Conversation
Co-Authored-By: Gradata <noreply@gradata.ai>
8 tables (workspaces, workspace_members, brains, corrections, lessons, meta_rules, events, rule_patches), indexes, RLS policies for data isolation, and auto-create-workspace trigger on user signup.
Async httpx wrapper for PostgREST (select/insert/upsert), API key lookup for SDK clients, HS256 JWT verification for dashboard users, and 5 passing tests with MockSupabaseClient fixture.
SyncRequest, SyncResponse, CorrectionPayload, LessonPayload, EventPayload, MetaRulePayload with Severity/LessonState enums, confidence bounds validation, and 5 passing tests.
CORS + request logging middleware, GET /health endpoint, route aggregation with stub sync/brains routers for Tasks 6-7.
Implements POST /api/v1/sync (corrections, lessons, events, meta-rules), POST /api/v1/brains/connect, GET /api/v1/brains, and Railway deployment config. All 17 tests pass (auth: 5, models: 5, sync: 5, brains: 2).
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough
Adds a new Gradata Cloud FastAPI service (config, settings, auth), an async Supabase REST client, middleware and app factory, sync/brains/health routes, a DB migration with RLS policies and a signup trigger, hook base utilities, packaging/deploy files, and a pytest suite with a Supabase mock. Also many small formatting/refactor edits across src/.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant SDK as SDK Client
    participant API as Gradata Cloud API
    participant Auth as Auth Verifier
    participant DB as Supabase REST
    SDK->>API: POST /api/v1/sync (Authorization, payload)
    API->>Auth: validate token (gd_* API key or HS256 JWT)
    Auth-->>API: resolved brain record / user_id
    API->>DB: select brain by api_key or user_id
    DB-->>API: brain row
    API->>DB: insert corrections/events/meta_rules
    API->>DB: upsert lessons (merge duplicates)
    DB-->>API: operation results
    API->>DB: update brains set last_sync_at = now()
    API-->>SDK: 200 { status: "ok", ...counts }
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
```diff
@@ -0,0 +1,9 @@
+[build]
+builder = "DOCKERFILE"
+dockerfilePath = "Dockerfile"
```
railway.toml points to Dockerfile but no Dockerfile is included in this PR. Railway will refuse to build and the deployment goal stated in the PR description cannot be met until this file is added.
A minimal working example:
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY pyproject.toml .
RUN pip install --no-cache-dir .
COPY app/ ./app/
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Prompt To Fix With AI
This is a comment left during a code review.
Path: cloud/railway.toml
Line: 3
Comment:
**Missing Dockerfile**
`railway.toml` points to `Dockerfile` but no `Dockerfile` is included in this PR. Railway will refuse to build and the deployment goal stated in the PR description cannot be met until this file is added.
A minimal working example:
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY pyproject.toml .
RUN pip install --no-cache-dir .
COPY app/ ./app/
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
How can I resolve this? If you propose a fix, please make it concise.
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cloud/.env.example`:
- Around line 2-5: The .env example has Supabase keys out of the expected
dotenv-linter order; reorder the environment variables so their keys follow the
linter's configured order (use the unique keys GRADATA_SUPABASE_URL,
GRADATA_SUPABASE_ANON_KEY, GRADATA_SUPABASE_SERVICE_KEY, and
GRADATA_SUPABASE_JWT_KEY) — update the sequence in cloud/.env.example to the
proposed canonical ordering so dotenv-linter no longer flags lines 3-5.
In `@cloud/app/config.py`:
- Around line 24-26: get_settings currently claims to be a "Cached settings
loader" but returns a new Settings() on every call; modify get_settings (and/or
introduce a module-level cached instance) to return a single cached Settings
object instead of recreating it each call—e.g., use functools.lru_cache on
get_settings or create a private _settings: Optional[Settings] variable and
initialize it once and return it thereafter; ensure references to Settings and
the get_settings function are updated so callers receive the same cached
Settings instance.
In `@cloud/app/main.py`:
- Around line 15-19: The FastAPI app instantiation currently only disables
Swagger UI via docs_url but leaves the OpenAPI JSON and ReDoc exposed; update
the FastAPI(...) call (the app variable) to set openapi_url=None and
redoc_url=None as well when settings.environment == "production" (i.e., set
docs_url=None, redoc_url=None, openapi_url=None in that branch) so that app =
FastAPI(...) fully disables OpenAPI, ReDoc, and Swagger in production.
In `@cloud/app/middleware.py`:
- Around line 25-31: The middleware log_requests currently only logs after a
successful call_next; modify log_requests to wrap await call_next(request) in
try/except/finally so that if call_next raises we still compute elapsed time and
emit a log (use _log.error or _log.exception) including request.method,
request.url.path, elapsed and the exception details, then re-raise the
exception; in the success path keep the existing info log with
response.status_code and elapsed. Ensure you reference the existing function
name log_requests and variables start, elapsed, request, call_next, and response
when adding the try/except/finally.
In `@cloud/app/routes/brains.py`:
- Around line 28-35: The connect_brain handler currently only logs and echoes
brain["id"] without creating or upserting a row in the brains table and it
depends on get_current_brain which prevents first-time JWT flows; modify
connect_brain to accept a possibly-missing brain (allow None from
get_current_brain or remove the strict dependency), then perform an upsert: if
brain is None create a new brains row (populate owner info from the auth
token/API key, set brain_name from ConnectRequest) and return its id; if brain
exists update its name/metadata as needed and return the existing id; use your
DB session helper (e.g., get_db or existing ORM utilities) and ensure the
handler still returns ConnectResponse and preserves logging.
- Around line 38-42: The endpoint list_brains currently returns raw brain rows
(via db.select("brains") which defaults to select=*) and only filters by
brain["user_id"], exposing secrets like api_key and omitting brains shared via
workspace membership; update list_brains to (1) explicitly pass a select list of
safe public fields (e.g., id, name, workspace_id, created_at, updated_at) to
db.select instead of the default *, and (2) expand the filter logic to return
brains where user_id == current_user_id OR workspace_id IN (workspaces where
current_user_id is a member) by querying workspace_members (or using a
join/subquery) before calling db.select so shared-workspace brains are included.
Ensure to reference the current user ID from the brain/dependency
(get_current_brain) and use the "brains" table and "workspace_members" relation
names when implementing.
In `@cloud/app/routes/health.py`:
- Around line 9-11: The health endpoint hardcodes the service version in the
health() function; update health() to read the version from a centralized source
(e.g., your configuration object or build metadata) instead of the literal
"0.1.0": import/use the app settings/config (for example a Settings class or env
var like settings.VERSION or os.getenv/BUILD_VERSION or app.state.version
depending on your project conventions), return that value in the payload and
include a sensible fallback if the config value is missing; keep the endpoint
signature (health) and response keys unchanged so callers are not broken.
In `@cloud/app/routes/sync.py`:
- Around line 35-97: The sync handler is performing independent writes
(body.corrections, body.lessons, body.events, body.meta_rules) using
db.insert/db.upsert which is non-idempotent and not atomic; wrap all writes in a
single database transaction (use your db.transaction or equivalent) and add a
per-sync idempotency key to each row (e.g., idempotency_key derived from the
request) so retries can be deduplicated; change inserts/upserts to include the
idempotency_key column for corrections/events/meta_rules and use upsert/ON
CONFLICT logic keyed on (idempotency_key, brain_id, <natural_key>) or check
existence within the transaction before inserting, then commit or rollback the
whole unit and set corrections_count/lessons_count/events_count/meta_rules_count
after the transaction completes.
In `@cloud/migrations/001_initial_schema.sql`:
- Around line 109-110: Remove the redundant non-unique index creation for
idx_brains_api_key: delete the CREATE INDEX idx_brains_api_key ON
brains(api_key); line from the migration so you rely on the UNIQUE btree index
already created by the api_key TEXT NOT NULL UNIQUE column (do not remove the
UNIQUE constraint itself).
- Around line 212-227: The SECURITY DEFINER function handle_new_user uses
unqualified table names (workspaces, workspace_members) so you must pin the
search_path to a safe value to avoid name-resolution hijacking; modify
handle_new_user to set a fixed search_path at start (for example SET LOCAL
search_path = pg_catalog, public) so all subsequent INSERTs target the intended
schema, and keep the function declared LANGUAGE plpgsql SECURITY DEFINER as
before.
- Around line 28-36: The brains table claims "one per user per workspace" but
only enforces uniqueness on api_key; add a uniqueness constraint on the pair
(workspace_id, user_id) to prevent duplicate brains for the same user/workspace.
Update the CREATE TABLE for brains (or add an ALTER TABLE) to add a UNIQUE
constraint named e.g. unique_brains_workspace_user covering workspace_id and
user_id so lookups by (workspace_id, user_id) are unambiguous.
In `@cloud/tests/conftest.py`:
- Around line 30-31: The mock select method currently ignores the filters
parameter; update async def select(self, table: str, columns: str = "*",
filters: dict | None = None) -> list[dict]: to fetch rows =
self._responses[table].get("select", []) and, if filters is not None, return
only rows that match all key==value pairs in filters (simple equality match on
dict keys), otherwise return rows unchanged; preserve return type list[dict] and
leave columns handling as-is. This ensures filter-dependent tests (auth/data
scoping) are exercised using the existing self._responses data.
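The caching suggested for `get_settings` in the config.py note above can be sketched with the standard library alone; the `Settings` fields below are placeholders, not the PR's actual config:

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    # Placeholder fields; the real Settings in cloud/app/config.py differ.
    environment: str = "development"
    log_level: str = "INFO"


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Cached settings loader: Settings() is constructed once, then reused."""
    return Settings()


# Every caller receives the same instance.
a = get_settings()
b = get_settings()
```

With `lru_cache(maxsize=1)` on a zero-argument function, the first call builds the object and every later call returns the same one, which is what the "Cached settings loader" docstring promises.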
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a3d50ebc-2df4-41fa-8a15-ffef45a920ad
📒 Files selected for processing (21)
cloud/.env.example, cloud/app/__init__.py, cloud/app/auth.py, cloud/app/config.py, cloud/app/db.py, cloud/app/main.py, cloud/app/middleware.py, cloud/app/models.py, cloud/app/routes/__init__.py, cloud/app/routes/brains.py, cloud/app/routes/health.py, cloud/app/routes/sync.py, cloud/migrations/001_initial_schema.sql, cloud/pyproject.toml, cloud/railway.toml, cloud/tests/__init__.py, cloud/tests/conftest.py, cloud/tests/test_auth.py, cloud/tests/test_brains.py, cloud/tests/test_models.py, cloud/tests/test_sync.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
🪛 dotenv-linter (4.0.0)
cloud/.env.example
[warning] 3-3: [UnorderedKey] The GRADATA_SUPABASE_ANON_KEY key should go before the GRADATA_SUPABASE_URL key
(UnorderedKey)
[warning] 4-4: [UnorderedKey] The GRADATA_SUPABASE_SERVICE_KEY key should go before the GRADATA_SUPABASE_URL key
(UnorderedKey)
[warning] 5-5: [UnorderedKey] The GRADATA_SUPABASE_JWT_KEY key should go before the GRADATA_SUPABASE_SERVICE_KEY key
(UnorderedKey)
🪛 Ruff (0.15.9)
cloud/app/middleware.py
[warning] 26-26: Missing return type annotation for private function log_requests
(ANN202)
cloud/app/routes/sync.py
[warning] 20-20: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
[error] 64-64: Ambiguous variable name: l
(E741)
cloud/tests/conftest.py
[warning] 23-23: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 30-30: Unused method argument: columns
(ARG002)
[warning] 30-30: Unused method argument: filters
(ARG002)
[warning] 33-33: Unused method argument: table
(ARG002)
[warning] 38-38: Unused method argument: table
(ARG002)
cloud/app/auth.py
[warning] 38-38: Consider moving this statement to an else block
(TRY300)
[warning] 44-44: Do not perform function call Security in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
cloud/app/routes/brains.py
[warning] 31-31: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
[warning] 39-39: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
cloud/app/db.py
[warning] 18-18: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 61-61: Dynamically typed expressions (typing.Any) are disallowed in get_db
(ANN401)
[warning] 63-63: Using the global statement to update _client is discouraged
(PLW0603)
cloud/app/models.py
[warning] 10-10: Class Severity inherits from both str and enum.Enum
Inherit from enum.StrEnum
(UP042)
[warning] 18-18: Class LessonState inherits from both str and enum.Enum
Inherit from enum.StrEnum
(UP042)
[warning] 61-61: Avoid specifying long messages outside the exception class
(TRY003)
🪛 SQLFluff (4.1.0)
cloud/migrations/001_initial_schema.sql
[error] 28-231: Line 28, Position 1: Found unparsable section: 'CREATE TABLE brains (\n id UUID PRIMAR...'
(PRS)
🔇 Additional comments (10)
cloud/app/__init__.py (1)
1-1: Looks good. Clear package-level module docstring; no concerns.
cloud/tests/__init__.py (1)
1-1: Looks good. Test package docstring is clear and harmless.
cloud/.env.example (1)
7-9: App defaults look good. `GRADATA_ENVIRONMENT` and `GRADATA_LOG_LEVEL` defaults are sensible for local setup.
cloud/railway.toml (1)
1-9: Railway config is consistent with app routes. `healthcheckPath = "/health"` matches `cloud/app/routes/health.py` and the router registration in `cloud/app/main.py`.
cloud/app/routes/__init__.py (1)
6-11: Good router aggregation. Route composition is straightforward and keeps API module boundaries clear.
cloud/tests/test_auth.py (2)
10-37: API key auth tests are solid. Good coverage of the success path and both key failure modes (missing + unknown).
39-61: JWT tests are well-targeted. Valid-token extraction and expired-token rejection are both verified.
cloud/tests/test_brains.py (1)
7-35: Good endpoint-level coverage for brains connect/list flows. These tests correctly stub the auth lookup and verify the response contract for both `POST /brains/connect` and `GET /brains`.
cloud/tests/test_models.py (1)
10-63: Model validation tests are solid and target the right constraints. Enum rejection, confidence bounds, and non-empty event type checks are all covered in focused unit tests.
cloud/tests/test_sync.py (1)
7-103: Nice baseline coverage for sync happy-path + auth/validation failures. The suite exercises core counters and key rejection paths (401/403, 422) for `POST /sync`.
```
GRADATA_SUPABASE_URL=https://YOURS.supabase.co
GRADATA_SUPABASE_ANON_KEY=YOURS
GRADATA_SUPABASE_SERVICE_KEY=YOURS
GRADATA_SUPABASE_JWT_KEY=YOURS
```
Reorder Supabase keys to satisfy dotenv-linter.
Lines 3-5 currently violate the configured key ordering checks. Please reorder these keys to avoid lint noise/failures.
♻️ Proposed reorder

```diff
-GRADATA_SUPABASE_URL=https://YOURS.supabase.co
 GRADATA_SUPABASE_ANON_KEY=YOURS
-GRADATA_SUPABASE_SERVICE_KEY=YOURS
 GRADATA_SUPABASE_JWT_KEY=YOURS
+GRADATA_SUPABASE_SERVICE_KEY=YOURS
+GRADATA_SUPABASE_URL=https://YOURS.supabase.co
```

🧰 Tools
🪛 dotenv-linter (4.0.0)
[warning] 3-3: [UnorderedKey] The GRADATA_SUPABASE_ANON_KEY key should go before the GRADATA_SUPABASE_URL key
(UnorderedKey)
[warning] 4-4: [UnorderedKey] The GRADATA_SUPABASE_SERVICE_KEY key should go before the GRADATA_SUPABASE_URL key
(UnorderedKey)
[warning] 5-5: [UnorderedKey] The GRADATA_SUPABASE_JWT_KEY key should go before the GRADATA_SUPABASE_SERVICE_KEY key
(UnorderedKey)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/.env.example` around lines 2 - 5, The .env example has Supabase keys
out of the expected dotenv-linter order; reorder the environment variables so
their keys follow the linter's configured order (use the unique keys
GRADATA_SUPABASE_URL, GRADATA_SUPABASE_ANON_KEY, GRADATA_SUPABASE_SERVICE_KEY,
and GRADATA_SUPABASE_JWT_KEY) — update the sequence in cloud/.env.example to the
proposed canonical ordering so dotenv-linter no longer flags lines 3-5.
```python
@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.monotonic()
    response = await call_next(request)
    elapsed = (time.monotonic() - start) * 1000
    _log.info("%s %s -> %d (%.0fms)", request.method, request.url.path, response.status_code, elapsed)
    return response
```
Error-path requests are not logged.
When call_next raises, this middleware exits before logging. That drops observability for failing requests, which is high-impact during incidents.
🛠️ Proposed fix

```diff
 @app.middleware("http")
 async def log_requests(request: Request, call_next):
     start = time.monotonic()
-    response = await call_next(request)
-    elapsed = (time.monotonic() - start) * 1000
-    _log.info("%s %s -> %d (%.0fms)", request.method, request.url.path, response.status_code, elapsed)
-    return response
+    status_code = 500
+    try:
+        response = await call_next(request)
+        status_code = response.status_code
+        return response
+    finally:
+        elapsed = (time.monotonic() - start) * 1000
+        _log.info("%s %s -> %d (%.0fms)", request.method, request.url.path, status_code, elapsed)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.monotonic()
    status_code = 500
    try:
        response = await call_next(request)
        status_code = response.status_code
        return response
    finally:
        elapsed = (time.monotonic() - start) * 1000
        _log.info("%s %s -> %d (%.0fms)", request.method, request.url.path, status_code, elapsed)
```
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 26-26: Missing return type annotation for private function log_requests
(ANN202)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/app/middleware.py` around lines 25 - 31, The middleware log_requests
currently only logs after a successful call_next; modify log_requests to wrap
await call_next(request) in try/except/finally so that if call_next raises we
still compute elapsed time and emit a log (use _log.error or _log.exception)
including request.method, request.url.path, elapsed and the exception details,
then re-raise the exception; in the success path keep the existing info log with
response.status_code and elapsed. Ensure you reference the existing function
name log_requests and variables start, elapsed, request, call_next, and response
when adding the try/except/finally.
```python
@router.post("/brains/connect", response_model=ConnectResponse)
async def connect_brain(
    body: ConnectRequest,
    brain: dict = Depends(get_current_brain),
) -> ConnectResponse:
    """Register/reconnect a brain. Called by CloudClient.connect()."""
    _log.info("Brain connected: %s (name=%s)", brain["id"], body.brain_name)
    return ConnectResponse(brain_id=brain["id"])
```
Implement actual connect/create behavior here.
This handler never inserts or upserts a brains row. Because it depends on app.auth.get_current_brain, JWT callers with no existing brain will fail before reaching the handler, so the first-time connect flow cannot succeed. For API-key callers, this endpoint is effectively just echoing the current brain id.
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 31-31: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/app/routes/brains.py` around lines 28 - 35, The connect_brain handler
currently only logs and echoes brain["id"] without creating or upserting a row
in the brains table and it depends on get_current_brain which prevents
first-time JWT flows; modify connect_brain to accept a possibly-missing brain
(allow None from get_current_brain or remove the strict dependency), then
perform an upsert: if brain is None create a new brains row (populate owner info
from the auth token/API key, set brain_name from ConnectRequest) and return its
id; if brain exists update its name/metadata as needed and return the existing
id; use your DB session helper (e.g., get_db or existing ORM utilities) and
ensure the handler still returns ConnectResponse and preserves logging.
```python
# Insert corrections
if body.corrections:
    rows = [
        {
            "brain_id": brain_id,
            "session": c.session,
            "category": c.category,
            "severity": c.severity.value,
            "description": c.description,
            "draft_preview": c.draft_preview[:500],
            "final_preview": c.final_preview[:500],
        }
        for c in body.corrections
    ]
    await db.insert("corrections", rows)
    corrections_count = len(rows)

# Upsert lessons (idempotent by description)
if body.lessons:
    rows = [
        {
            "brain_id": brain_id,
            "category": l.category,
            "description": l.description,
            "state": l.state.value,
            "confidence": float(l.confidence),
            "fire_count": l.fire_count,
            "recurrence_days": l.recurrence_days,
        }
        for l in body.lessons
    ]
    await db.upsert("lessons", rows)
    lessons_count = len(rows)

# Insert events
if body.events:
    rows = [
        {
            "brain_id": brain_id,
            "type": e.type,
            "source": e.source,
            "data": e.data,
            "tags": e.tags,
            "session": e.session,
        }
        for e in body.events
    ]
    await db.insert("events", rows)
    events_count = len(rows)

# Insert meta-rules
if body.meta_rules:
    rows = [
        {
            "brain_id": brain_id,
            "title": m.title,
            "description": m.description,
            "source_lesson_ids": [],  # Resolved server-side later
        }
        for m in body.meta_rules
    ]
    await db.insert("meta_rules", rows)
    meta_rules_count = len(rows)
```
Sync persistence is non-idempotent and non-atomic.
Current flow issues independent writes for corrections/lessons/events/meta-rules without a transaction or dedupe key. Retries can duplicate records, and mid-flight failures can persist partial state.
Please add an idempotency key (per sync batch) and execute all section writes in a single transactional unit (or single RPC) so retries are safe and failures roll back consistently.
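One way to derive such a per-batch key, sketched here under the assumption that the sync payload is JSON-serializable (the function name is illustrative, not from the PR):

```python
import hashlib
import json


def sync_idempotency_key(brain_id: str, payload: dict) -> str:
    """Derive a deterministic key for one sync batch.

    Sorting keys canonicalizes the JSON, so an identical retried payload
    hashes to the same key, and a UNIQUE (brain_id, idempotency_key)
    constraint can deduplicate it server-side.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{brain_id}:{canonical}".encode()).hexdigest()


# Retrying the same payload yields the same key; a changed payload does not.
k1 = sync_idempotency_key("brain-1", {"events": [{"type": "run"}]})
k2 = sync_idempotency_key("brain-1", {"events": [{"type": "run"}]})
k3 = sync_idempotency_key("brain-1", {"events": [{"type": "stop"}]})
```

The client could also send an explicit batch id instead of hashing; either way the server stores the key with each row (or in a batch table) and rejects or ignores duplicates inside the transaction.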
🧰 Tools
🪛 Ruff (0.15.9)
[error] 64-64: Ambiguous variable name: l
(E741)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/app/routes/sync.py` around lines 35 - 97, The sync handler is
performing independent writes (body.corrections, body.lessons, body.events,
body.meta_rules) using db.insert/db.upsert which is non-idempotent and not
atomic; wrap all writes in a single database transaction (use your
db.transaction or equivalent) and add a per-sync idempotency key to each row
(e.g., idempotency_key derived from the request) so retries can be deduplicated;
change inserts/upserts to include the idempotency_key column for
corrections/events/meta_rules and use upsert/ON CONFLICT logic keyed on
(idempotency_key, brain_id, <natural_key>) or check existence within the
transaction before inserting, then commit or rollback the whole unit and set
corrections_count/lessons_count/events_count/meta_rules_count after the
transaction completes.
- Cache get_settings() with @lru_cache(maxsize=1)
- Disable openapi_url and redoc_url in production
- Add asynccontextmanager lifespan to clean up httpx client
- Wrap log_requests middleware in try/except so errors get logged
- Project out api_key column from brains list endpoint
- Update last_sync_at after successful sync
- Reorder .env.example vars alphabetically
- Add update() method to SupabaseClient and mock
Actionable comments posted: 1
♻️ Duplicate comments (5)
cloud/app/middleware.py (1)
30-33: ⚠️ Potential issue | 🟠 Major: Include exception details in error-path request logs.
Lines 30-33 log only a generic ERROR line; traceback/context is dropped. Switch to `_log.exception(...)` (or `exc_info=True`) before re-raising.
Proposed fix

```diff
 except Exception:
     elapsed = (time.monotonic() - start) * 1000
-    _log.error("%s %s -> ERROR (%.0fms)", request.method, request.url.path, elapsed)
+    _log.exception("%s %s -> ERROR (%.0fms)", request.method, request.url.path, elapsed)
     raise
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/app/middleware.py` around lines 30-33: The except block currently calls _log.error(...) and re-raises, which loses the traceback; change that logging call to _log.exception("%s %s -> ERROR (%.0fms)", request.method, request.url.path, elapsed) (or _log.error(..., exc_info=True)) so the stacktrace and exception context are recorded before the existing raise; keep the computed elapsed and the re-raise behavior intact.
cloud/app/routes/brains.py (2)
28-35: ⚠️ Potential issue | 🟠 Major: `POST /brains/connect` still does not create or upsert a brain record.
The handler only echoes the current brain id, and `Depends(get_current_brain)` still blocks first-time JWT connect flows.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/app/routes/brains.py` around lines 28-35: The connect_brain handler currently just echoes the existing brain id and uses Depends(get_current_brain), which blocks first-time JWT connect flows and never creates/upserts a brain record; change connect_brain to accept the ConnectRequest and extract/validate the JWT (or use a non-blocking/optional dependency rather than get_current_brain), derive the brain id from the token, then perform a create-or-update (upsert) into your brains store using the brain id and body.brain_name (and any metadata), and finally return ConnectResponse(brain_id=brain_id); remove or replace the hard dependency on get_current_brain so the endpoint allows initial connections and ensures a brain record is created/updated on every call.
39-46: ⚠️ Potential issue | 🟠 Major: Accessible-brains contract does not match query behavior.
Line 40 says "accessible," but the query only returns brains where `user_id == current_user_id`; shared-workspace brains are excluded.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/app/routes/brains.py` around lines 39-46: The docstring for list_brains claims it returns "accessible" brains but the db.select call only filters by user_id (brain["user_id"]) which excludes shared/workspace brains; update list_brains to actually return brains accessible to the current user by either expanding the filters passed to db.select to include OR conditions for workspace/shared membership (e.g., include workspace_id in a list of workspaces the user belongs to or join against the memberships table), or call a helper that computes accessible brain IDs and query with filters={"id": {"$in": accessible_ids}}; locate the logic in list_brains (and the use of get_current_brain and db.select) and implement the appropriate filter/join so shared-workspace brains are included.
cloud/tests/conftest.py (1)
30-31: ⚠️ Potential issue | 🟠 Major: Mock `select()` should apply filters.
Lines 30-31 ignore `filters`, so auth/scoping tests don't exercise filter-dependent behavior.
Proposed fix

```diff
 async def select(self, table: str, columns: str = "*", filters: dict | None = None) -> list[dict]:
-    return self._responses[table].get("select", [])
+    rows = self._responses[table].get("select", [])
+    if not filters:
+        return rows
+    return [row for row in rows if all(row.get(k) == v for k, v in filters.items())]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/tests/conftest.py` around lines 30-31: The mock select method currently ignores the filters parameter; update async def select(self, table: str, columns: str = "*", filters: dict | None = None) to return only rows from self._responses[table].get("select", []) that match the provided filters (if filters is None return all rows). Implement simple key->value exact-match filtering across each row dict (skip rows missing a filter key or with non-equal values), and ensure safe handling when table or "select" key is absent by returning an empty list. Use the existing select method and self._responses identifiers when making the change.
cloud/app/routes/sync.py (1)
36-105: ⚠️ Potential issue | 🔴 Critical: Make sync persistence atomic and idempotent.
The handler still does independent writes without a transactional unit or request idempotency key. Retries can duplicate data, and failures can persist partial batches.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/app/routes/sync.py` around lines 36 - 105, The sync handler currently performs independent writes (db.insert on "corrections","events","meta_rules", db.upsert on "lessons", and db.update on "brains") which can produce partial/duplicate state on retries; wrap all persistence in a single transaction (use the DB's transaction API, e.g. db.transaction / begin/commit/rollback) so either all inserts/upserts and the last_sync_at update succeed or all are rolled back, and add request-level idempotency by accepting a stable sync identifier (e.g. body.sync_id or request_id) and persist it with each row or into a "sync_requests" table (or add a unique constraint+upsert key) to detect/reject duplicates so repeated requests do not re-insert the same corrections/events/meta_rules; ensure corrections/events/meta_rules insertion uses the idempotency key or unique constraints and lessons continue to use upsert by description.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cloud/app/routes/sync.py`:
- Around line 39-47: The payload dicts built for corrections/events drop the
caller timestamp so timeline fidelity is lost; include the
SyncRequest.created_at value when constructing each row (i.e., add a
"created_at": c.created_at entry alongside brain_id, session, category,
severity.value, description, draft_preview, final_preview) and do the same in
the other similar payload block so both correction and event rows carry the
original timestamp (preserve the value/format expected downstream).
---
Duplicate comments:
In `@cloud/app/middleware.py`:
- Around line 30-33: The except block in middleware.py currently calls
_log.error(...) and re-raises, which loses the traceback; change that logging
call to include exception details by using _log.exception("%s %s -> ERROR
(%.0fms)", request.method, request.url.path, elapsed) (or _log.error(...,
exc_info=True)) so the stacktrace and exception context are recorded before the
existing raise; keep the computed elapsed and the re-raise behavior intact.
In `@cloud/app/routes/brains.py`:
- Around line 28-35: The connect_brain handler currently just echoes the
existing brain id and uses Depends(get_current_brain), which blocks first-time
JWT connect flows and never creates/upserts a brain record; change connect_brain
to accept the ConnectRequest and extract/validate the JWT (or use a
non-blocking/optional dependency rather than get_current_brain), derive the
brain id from the token, then perform a create-or-update (upsert) into your
brains store using the brain id and body.brain_name (and any metadata), and
finally return ConnectResponse(brain_id=brain_id); remove or replace the hard
dependency on get_current_brain so the endpoint allows initial connections and
ensures a brain record is created/updated on every call.
- Around line 39-46: The docstring for list_brains claims it returns
"accessible" brains but the db.select call only filters by user_id
(brain["user_id"]) which excludes shared/workspace brains; update list_brains to
actually return brains accessible to the current user by either expanding the
filters passed to db.select to include OR conditions for workspace/shared
membership (e.g., include workspace_id in a list of workspaces the user belongs
to or join against the memberships table), or call a helper that computes
accessible brain IDs and query with filters={"id": {"$in": accessible_ids}};
locate the logic in list_brains (and the use of get_current_brain and db.select)
and implement the appropriate filter/join so shared-workspace brains are
included.
In `@cloud/app/routes/sync.py`:
- Around line 36-105: The sync handler currently performs independent writes
(db.insert on "corrections","events","meta_rules", db.upsert on "lessons", and
db.update on "brains") which can produce partial/duplicate state on retries;
wrap all persistence in a single transaction (use the DB's transaction API, e.g.
db.transaction / begin/commit/rollback) so either all inserts/upserts and the
last_sync_at update succeed or all are rolled back, and add request-level
idempotency by accepting a stable sync identifier (e.g. body.sync_id or
request_id) and persist it with each row or into a "sync_requests" table (or add
a unique constraint+upsert key) to detect/reject duplicates so repeated requests
do not re-insert the same corrections/events/meta_rules; ensure
corrections/events/meta_rules insertion uses the idempotency key or unique
constraints and lessons continue to use upsert by description.
In `@cloud/tests/conftest.py`:
- Around line 30-31: The mock select method currently ignores the filters
parameter; update async def select(self, table: str, columns: str = "*",
filters: dict | None = None) to return only rows from
self._responses[table].get("select", []) that match the provided filters (if
filters is None return all rows). Implement simple key->value exact-match
filtering across each row dict (skip rows missing a filter key or with non-equal
values), and ensure safe handling when table or "select" key is absent by
returning an empty list. Use the existing select method and self._responses
identifiers when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6cd85c0d-36b7-4f51-a8fd-d4c2cc6f3d09
📒 Files selected for processing (8)
cloud/.env.example, cloud/app/config.py, cloud/app/db.py, cloud/app/main.py, cloud/app/middleware.py, cloud/app/routes/brains.py, cloud/app/routes/sync.py, cloud/tests/conftest.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
🪛 Ruff (0.15.9)
cloud/app/middleware.py
[warning] 26-26: Missing return type annotation for private function log_requests
(ANN202)
cloud/app/main.py
[warning] 14-14: Unused function argument: app
(ARG001)
cloud/app/routes/sync.py
[warning] 21-21: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
[error] 65-65: Ambiguous variable name: l
(E741)
[warning] 103-103: Use datetime.UTC alias
Convert to datetime.UTC alias
(UP017)
cloud/tests/conftest.py
[warning] 23-23: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 30-30: Unused method argument: columns
(ARG002)
[warning] 30-30: Unused method argument: filters
(ARG002)
[warning] 33-33: Unused method argument: table
(ARG002)
[warning] 38-38: Unused method argument: table
(ARG002)
[warning] 43-43: Unused method argument: table
(ARG002)
[warning] 43-43: Unused method argument: filters
(ARG002)
cloud/app/routes/brains.py
[warning] 31-31: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
[warning] 39-39: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
[warning] 47-47: Unnecessary assignment to rows before return statement
Remove unnecessary assignment
(RET504)
cloud/app/db.py
[warning] 18-18: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 73-73: Dynamically typed expressions (typing.Any) are disallowed in get_db
(ANN401)
[warning] 75-75: Using the global statement to update _client is discouraged
(PLW0603)
🔇 Additional comments (4)
cloud/.env.example (1)
2-9: Env template looks correct and complete. Supabase/app keys are present and ordered consistently; this addresses prior dotenv ordering noise.
cloud/app/config.py (1)
26-29: Settings caching fix is correctly applied.
`get_settings()` now returns a cached singleton instance, matching its contract.
cloud/app/main.py (1)
28-35: Production docs/schema gating is correctly implemented.
`docs_url`, `openapi_url`, and `redoc_url` are all disabled in production, and lifespan wiring is in place.
cloud/app/db.py (1)
26-68: Supabase client request flow looks solid. Timeouts and `raise_for_status()` are consistently applied across CRUD methods.
```python
@router.post("/brains/connect", response_model=ConnectResponse)
async def connect_brain(
    body: ConnectRequest,
    brain: dict = Depends(get_current_brain),
) -> ConnectResponse:
    """Register/reconnect a brain. Called by CloudClient.connect()."""
    _log.info("Brain connected: %s (name=%s)", brain["id"], body.brain_name)
    return ConnectResponse(brain_id=brain["id"])
```
connect_brain can never register a new brain
This endpoint is documented as "Register/reconnect a brain", but its Depends(get_current_brain) dependency already requires a valid brain to exist in the database before the handler runs. For the API-key path, verify_api_key looks up brains by api_key and raises 401 if not found; for the JWT path, get_current_brain returns 404 if no brain exists for the user.
Because the handle_new_user() trigger only creates a workspace and a membership (not a brain), a brand-new user calling POST /api/v1/brains/connect will always receive a 404 before the endpoint body executes. There is no code path that creates the initial brain row.
The fix is to upsert the brain inside the handler instead of relying on the dependency to have already resolved one:
```python
@router.post("/brains/connect", response_model=ConnectResponse)
async def connect_brain(
    body: ConnectRequest,
    credentials: HTTPAuthorizationCredentials = Security(_bearer),
) -> ConnectResponse:
    user_id = verify_jwt(credentials.credentials)
    db = get_db()
    rows = await db.upsert("brains", {"user_id": user_id, "name": body.brain_name})
    brain_id = rows[0]["id"]
    return ConnectResponse(brain_id=brain_id)
```

Prompt To Fix With AI
This is a comment left during a code review.
Path: cloud/app/routes/brains.py
Line: 28-35
Comment:
**`connect_brain` can never register a new brain**
This endpoint is documented as "Register/reconnect a brain", but its `Depends(get_current_brain)` dependency already requires a valid brain to exist in the database before the handler runs. For the API-key path, `verify_api_key` looks up `brains` by `api_key` and raises 401 if not found; for the JWT path, `get_current_brain` returns 404 if no brain exists for the user.
Because the `handle_new_user()` trigger only creates a workspace and a membership (not a brain), a brand-new user calling `POST /api/v1/brains/connect` will always receive a 404 before the endpoint body executes. There is no code path that creates the initial brain row.
The fix is to upsert the brain inside the handler instead of relying on the dependency to have already resolved one:
```python
@router.post("/brains/connect", response_model=ConnectResponse)
async def connect_brain(
body: ConnectRequest,
credentials: HTTPAuthorizationCredentials = Security(_bearer),
) -> ConnectResponse:
user_id = verify_jwt(credentials.credentials)
db = get_db()
rows = await db.upsert("brains", {"user_id": user_id, "name": body.brain_name})
brain_id = rows[0]["id"]
return ConnectResponse(brain_id=brain_id)
```
How can I resolve this? If you propose a fix, please make it concise.

Same fixes as other branches: trailing newlines, import sorting, raise-from, collapsible-if, contextlib.suppress, pyright type ignore. Added ruff>=0.4 to dev deps and ignore rules for intentional patterns. Co-Authored-By: Gradata <noreply@gradata.ai>
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
src/gradata/_doctor.py (1)
236-264: ⚠️ Potential issue | 🟠 Major
Guideline violation: Use logging instead of print() statements.
The function uses multiple `print()` statements (lines 246, 247, 249, 253, 255, 257, 259, 260, 262, 263, 264), which violates the coding guideline for `src/gradata/**/*.py` that requires using logging instead of `print()`. While this is a diagnostic CLI tool designed for human-readable output, the guideline is explicit. Consider one of these alternatives:
- Refactor to use logging with a `StreamHandler` configured for stdout and appropriate formatters for the CLI output.
- Move to a CLI-specific module outside `src/gradata/` if print() is essential for this tool's UX.
- Use `sys.stdout.write()` if direct stdout output is required while maintaining a distinction from debug print statements.

As per coding guidelines: "no print() statements (use logging)".
♻️ Example refactor using logging
```diff
 def print_diagnosis(report: dict) -> None:
-    """Print human-readable diagnosis report."""
+    """Print human-readable diagnosis report via logging."""
+    import logging
+    logger = logging.getLogger(__name__)
+
     status = report["status"]
     icons = {"ok": "+", "skip": "-", "missing": "!", "warn": "!", "fail": "X", "error": "X"}
     labels = {
         "healthy": "HEALTHY",
         "degraded": "DEGRADED",
         "broken": "BROKEN",
     }
-    print(f"\n Gradata Doctor — {labels.get(status, status.upper())}")
-    print(f" {'=' * 46}")
+    logger.info(f"\n Gradata Doctor — {labels.get(status, status.upper())}")
+    logger.info(f" {'=' * 46}")
     for check in report["checks"]:
         icon = icons.get(check["status"], "?")
         name = check["name"].replace("_", " ").title()
         detail = check["detail"]
-        print(f" [{icon}] {name:<25} {detail}")
+        logger.info(f" [{icon}] {name:<25} {detail}")
-    print()
+    logger.info("")
     if status == "healthy":
-        print(" All checks passed. Brain environment is ready.")
+        logger.info(" All checks passed. Brain environment is ready.")
     elif status == "degraded":
-        print(" Some optional dependencies missing. Core functionality may be limited.")
-        print(" Run the suggested pip install commands above to fix.")
+        logger.info(" Some optional dependencies missing. Core functionality may be limited.")
+        logger.info(" Run the suggested pip install commands above to fix.")
     else:
-        print(" Critical issues found. Brain cannot operate correctly.")
-        print(" Fix the [X] items above before continuing.")
+        logger.info(" Critical issues found. Brain cannot operate correctly.")
+        logger.info(" Fix the [X] items above before continuing.")
-    print()
+    logger.info("")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_doctor.py` around lines 236 - 264, The function print_diagnosis uses multiple print() calls which violate the project's "no print()" guideline; replace prints with logging (import logging, get a module logger via logging.getLogger(__name__)) and change each print(...) to logger.info(...) (use logger.error or logger.warning where appropriate for non-healthy statuses), preserving the exact formatted strings and the use of report/status/icons/labels in print_diagnosis; do not configure handlers here—ensure the CLI entry point configures a StreamHandler that writes to stdout with the desired format so output remains human-readable (alternatively, move print_diagnosis to a CLI-only module if direct stdout semantics are required).src/gradata/enhancements/self_improvement.py (1)
748-755: 🧹 Nitpick | 🔵 Trivial
Update the comment to match the implementation.
The comment at line 748 states "ONE_OFF scoped lessons never graduate past INSTINCT", but the code at line 754 blocks promotion for lessons in both `INSTINCT` and `PATTERN` states. This is correct defensive behavior (handles cases where a one_off lesson might already be in PATTERN state due to manual intervention or legacy data), but the comment should reflect this.
📝 Suggested comment update

```diff
-    # ONE_OFF scoped lessons never graduate past INSTINCT
+    # ONE_OFF scoped lessons never promote beyond their current state (INSTINCT or PATTERN)
     block_promotion = False
     if lesson.scope_json:
```
Verify each finding against the current code and only fix it if needed. In `@src/gradata/enhancements/self_improvement.py` around lines 748 - 755, Update the comment above the block_promotion logic to accurately describe current behavior: it should state that ONE_OFF (correction_scope == "one_off") lessons are prevented from being promoted when they are in either LessonState.INSTINCT or LessonState.PATTERN (to defensively handle cases where a one_off lesson may already be in PATTERN due to manual/legacy changes), matching the condition that checks lesson.scope_json, parses _scope, and tests _scope.get("correction_scope") == "one_off" and lesson.state in (LessonState.INSTINCT, LessonState.PATTERN).
src/gradata/_fact_extractor.py (1)
271-271: ⚠️ Potential issue | 🟡 Minor
Add the required future import for type safety.
The type annotation `BrainContext | None` uses the union syntax that requires Python 3.10+ without the future import. To ensure compatibility with Python 3.9+ and maintain consistency with the codebase, add the required import at the top of the file.
As per coding guidelines: "from __future__ import annotations required" for files matching `src/gradata/**/*.py`.
🔧 Proposed fix
Add this import at the beginning of the file (after the docstring, before other imports):

```diff
 """
 Structured fact extraction pipeline.
 =====================================

 Reads brain entity files (prospects/, candidates/, etc.) and extracts structured facts.
 Domain-agnostic: entity directory and fact types are configurable.
 """

+from __future__ import annotations
+
 import json
```
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_fact_extractor.py` at line 271, The function get_stats uses the PEP 604 union syntax (BrainContext | None) which requires postponed evaluation of annotations for Python 3.9 compatibility; add the required future import by inserting from __future__ import annotations at the top of this module (immediately after the module docstring and before any other imports) so type annotations like BrainContext | None are safe across supported Python versions.
src/gradata/_context_compile.py (3)
8-14: ⚠️ Potential issue | 🔴 Critical
Missing required import for type safety.
The file is missing `from __future__ import annotations` at the top, which is required for type safety. As per coding guidelines, this import is mandatory for all files in `src/gradata/**/*.py`.
🔧 Proposed fix

```diff
 """
 Context Compiler.
 ===================

 Extracts entities from a user message, queries the brain, returns formatted context injection.
 """
+from __future__ import annotations
+
 import re
 from typing import TYPE_CHECKING
```
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_context_compile.py` around lines 8 - 14, Add the required future import at the top of the gradata._context_compile module: insert "from __future__ import annotations" as the very first statement in src/gradata/_context_compile.py (before any other imports) to enable postponed evaluation of annotations for type safety and to comply with the project coding guideline.
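Both future-import findings above rest on PEP 563: with `from __future__ import annotations`, annotations are stored as plain strings and never evaluated at runtime, so `X | None` parses even on Python 3.9 and even when the annotated name is undefined. A toy illustration (the `compile_context`/`BrainContext` names below just mirror the reviewed code; nothing is imported from it):

```python
from __future__ import annotations  # PEP 563: annotations become lazy strings


def compile_context(ctx: BrainContext | None = None) -> str:
    # BrainContext is never defined in this module, yet this def runs fine,
    # because the annotation is stored as text and not evaluated.
    return "ok"


# The annotation survives as the literal source string:
assert compile_context.__annotations__["ctx"] == "BrainContext | None"
```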
90-99: ⚠️ Potential issue | 🟠 Major
Pass the `ctx` parameter to `build_packet`.
The `build_packet` function accepts a `ctx: BrainContext | None` parameter (defined in `src/gradata/_context_packet.py` line 342) and uses it in multiple helper function calls (`_load_user_scope`, `_load_prospect_context`, `_load_drafting_context`, `_load_debug_context`, `_load_audit_context`, `_load_wrapup_context`). The call on lines 92-96 does not pass the `ctx` parameter that is available in the `compile_context` function, which breaks the expected data access pattern.
Update needed

```python
packet = build_packet(
    prospect=entities["prospect"],
    task_type=entities["task_type"],
    topic=entities["topic"],
    ctx=ctx,
)
```
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_context_compile.py` around lines 90 - 99, The call to build_packet inside compile_context is missing the ctx argument, which prevents build_packet (and its helpers like _load_user_scope, _load_prospect_context, _load_drafting_context, _load_debug_context, _load_audit_context, _load_wrapup_context) from receiving the BrainContext; update the build_packet invocation in src/gradata/_context_compile.py to pass ctx=ctx along with prospect, task_type, and topic so the packet construction uses the provided context.
76-88: ⚠️ Potential issue | 🟡 Minor
Pass the `ctx` parameter to the `brain_search` call on line 78.
The `compile_context` function receives a `ctx` parameter and correctly passes it to `extract_entities` on line 69, but fails to pass it to `brain_search` on line 78. Since `brain_search` accepts `ctx: "BrainContext | None"` and internally uses it for database access, it should receive the context parameter here as well for consistency and proper context handling.
Suggested fix:

```python
results = brain_search(message[:100], top_k=3, mode="keyword", ctx=ctx)
```
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_context_compile.py` around lines 76 - 88, The brain_search call in compile_context does not pass the ctx parameter, so brain_search (which accepts ctx: "BrainContext | None") won't receive the request context; update the call in src/gradata/_context_compile.py where brain_search is invoked (currently: brain_search(message[:100], top_k=3, mode="keyword")) to include the ctx argument (i.e., pass ctx=ctx) so it becomes brain_search(message[:100], top_k=3, mode="keyword", ctx=ctx), ensuring consistency with extract_entities and correct DB/context usage.
src/gradata/contrib/patterns/human_loop.py (1)
496-512: ⚠️ Potential issue | 🟠 Major
The `context` parameter is unused — risk assessment ignores caller-supplied context.
The `check` method accepts a `context` parameter (line 499) but never uses it. Line 503 calls `gate(action)` without passing context, which means `gate` will internally call `assess_risk(action)` without context. Any caller passing `context` (e.g., to set `risk_override` or specify affected entities) will have it silently ignored.
🔧 Proposed fix to honor the context parameter

```diff
     def check(
         self,
         action: str,
         context: dict | None = None,
         approver: Callable | None = None,
     ) -> ApprovalResult:
         """Full gate check: assess risk, request approval if needed."""
-        request = gate(action)
+        risk = assess_risk(action, context)
+        request = gate(action, risk=risk)
         if request is None:
             return ApprovalResult(
                 approved=True, feedback="auto_approved_low_risk"
             )
         if approver is not None:
             return approver(request)
         return ApprovalResult(
             approved=False, feedback="requires_human_review"
         )
```
Verify each finding against the current code and only fix it if needed. In `@src/gradata/contrib/patterns/human_loop.py` around lines 496 - 512, The check method’s context parameter is ignored — update check to forward the caller-supplied context into the risk request path by calling gate with the context (e.g., gate(action, context) or gate(action, context=context) to match gate's signature) so that assess_risk receives the context and risk_override/affected-entities are honored; also ensure any downstream use of request (and approver(request)) remains compatible with the request object that now may include context-derived fields.
src/gradata/onboard.py (2)
306-306: 🧹 Nitpick | 🔵 Trivial
Replace print() statements with logging.
The file contains multiple `print()` calls for interactive wizard output. As per coding guidelines, the core SDK should use logging instead of print() statements. Consider using a logger with a StreamHandler for CLI output while maintaining programmatic access.
Example refactor pattern

```python
import logging

logger = logging.getLogger(__name__)

# In interactive sections:
logger.info("=== Gradata — Onboarding Wizard ===")
logger.info("Brain '%s' created at %s", name, brain_dir)
```

Configure a StreamHandler in CLI entry points to display these messages to users.
As per coding guidelines: "This is the core SDK. Check for: type safety (from future import annotations required), no print() statements (use logging)".
Also applies to: 324-327, 334-334, 339-340, 347-348, 477-503
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/onboard.py` at line 306, Replace all print() calls in the onboarding interactive code with logging calls: add "import logging" and a module logger (logger = logging.getLogger(__name__)) and change prints like print("\n=== Gradata — Onboarding Wizard ===\n") to logger.info("=== Gradata — Onboarding Wizard ===") and other user-facing prints to logger.info/logger.warning as appropriate; keep programmatic return values unchanged and ensure the CLI entrypoint configures a StreamHandler to emit logger messages to stdout/stderr so interactive output still appears in the terminal (update functions/methods in this module that contain print() calls such as the onboarding wizard routines to use logger.* instead).
237-247: ⚠️ Potential issue | 🟡 Minor
Accept optional `BrainContext` parameter for consistency with DB access pattern.
The `_create_db()` function performs database initialization without accepting a `BrainContext` parameter, which deviates from the established codebase pattern. Other DB access functions throughout the SDK (e.g., `_get_db()`, `get_stats()`, `validate_brain()`) accept an optional `ctx: "BrainContext | None" = None` parameter for dependency injection and multi-brain support.
While this is initialization code that runs before `BrainContext` would normally exist, adding optional context support would improve consistency:

```python
def _create_db(db_path: Path, ctx: "BrainContext | None" = None) -> None:
```

This allows future multi-brain scenarios and aligns with the SDK's dependency injection pattern without breaking the current onboarding flow.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/onboard.py` around lines 237 - 247, Update the _create_db function signature to accept an optional BrainContext parameter for consistency with other DB functions: change def _create_db(db_path: Path) -> None: to def _create_db(db_path: Path, ctx: "BrainContext | None" = None) -> None; keep the implementation unchanged (it doesn't need to use ctx now), add the forward-ref type annotation (string literal) and ensure imports/type hints follow the project's pattern (no runtime dependency), and update any callers to pass their ctx where available (or rely on the default None) so the API matches _get_db, get_stats, validate_brain.
src/gradata/brain.py (1)
669-680: ⚠️ Potential issue | 🔴 Critical
Avoid acquiring `lessons_lock()` twice in `_resolve_pending()`.
`write_lessons_safe()` takes `lessons_lock()` internally. Calling it while the outer `with lessons_lock(lessons_path):` is still active can block on your own lock and time out instead of resolving the approval.
🔓 Proposed fix

```diff
     with lessons_lock(lessons_path):
         lessons = parse_lessons(lessons_path.read_text(encoding="utf-8"))
         matched = False
         for lesson in lessons:
             if (lesson.category == cat
                     and lesson.description[:100] == desc[:100]
                     and lesson.pending_approval):
                 mutator(lesson)
                 matched = True
                 break
         if matched:
-            from gradata._db import write_lessons_safe
-            write_lessons_safe(lessons_path, format_lessons(lessons))
+            lessons_path.write_text(format_lessons(lessons), encoding="utf-8")
```
Verify each finding against the current code and only fix it if needed. In `@src/gradata/brain.py` around lines 669 - 680, In _resolve_pending(), don't call write_lessons_safe() while holding lessons_lock() because write_lessons_safe() acquires the same lock internally; instead, inside the with lessons_lock(lessons_path): find the matching lesson, call mutator(lesson), set matched=True and compute formatted = format_lessons(lessons) (but do not call write_lessons_safe), then exit the with block and only if matched call write_lessons_safe(lessons_path, formatted). This avoids re-acquiring lessons_lock() and potential self-deadlock.
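The deadlock mechanics behind this finding can be reproduced with a plain `threading.Lock`, which, like a file-based lock, is not reentrant: a nested acquire from the same thread blocks on itself. The `nested_write` helper below is a hypothetical stand-in for `write_lessons_safe()` re-taking `lessons_lock()`.

```python
import threading

lock = threading.Lock()  # non-reentrant, like the lessons lock


def nested_write() -> bool:
    # Stands in for write_lessons_safe(), which re-acquires the same lock.
    got = lock.acquire(timeout=0.1)
    if got:
        lock.release()
    return got


with lock:  # the outer `with lessons_lock(...)` block
    acquired = nested_write()
# acquired is False: the nested acquire times out instead of succeeding,
# which is exactly why the write must happen without re-taking the lock.
```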
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gradata/_encryption.py`:
- Line 119: The RuntimeError raised in _encryption.py should be converted to a
custom exception for clarity and static-analysis friendliness: define a
module-level class EncryptionError(RuntimeError) with a short docstring, then
replace the current raise RuntimeError("Encryption produced empty or missing
file — plaintext preserved") with raise EncryptionError("Encryption produced
empty or missing file — plaintext preserved") (reference the module-level
EncryptionError class and the raise in the encryption routine).
In `@src/gradata/benchmarks/swe_bench.py`:
- Around line 268-271: The multiline ImportError raised in the SWE-bench loader
should be shortened to a single-line message to satisfy Ruff TRY003 while
preserving exception chaining; locate the raise ImportError in
src.gradata.benchmarks.swe_bench (the raise ImportError ... from e statement)
and replace the multi-line string with a concise one-line message such as
"datasets package required; pip install datasets" (or raise a small custom
exception) but keep the "from e" suffix so the original exception is chained.
In `@src/gradata/brain.py`:
- Around line 356-364: The call to self.emit in brain.patch_rule can raise and
make the caller think the patch failed even though
write_lessons_safe(format_lessons(lessons)) already persisted changes; to fix,
wrap the self.emit(...) call in a try/except that catches all exceptions, logs
the emit failure (including exception details) via the module/class logger
(e.g., self.logger.error) or a safe logging helper, and does not re-raise so the
successful patch is not reported as failed; keep the same payload and tags but
ensure any emit failure is non-fatal.
- Around line 342-356: The patch_rule implementation currently reads and mutates
lessons.md before acquiring the lessons_lock, risking races; change patch_rule
to acquire the lessons_lock() for the entire read→mutate→write cycle: inside the
lessons_lock context call self._find_lessons_path(), validate the path, read
file text, parse_lessons(...), call apply_patch(...), and if patched call
write_lessons_safe(...) with format_lessons(...), and return the same
error/success dicts while still inside the lock so no external writer can
interleave between parse/apply/write; keep helper calls (parse_lessons,
apply_patch, format_lessons, write_lessons_safe) and early-return semantics
identical.
In `@src/gradata/contrib/patterns/orchestrator.py`:
- Around line 526-528: Define a Protocol (e.g., SpawnQueueProtocol) that
declares a spawn_queue method with the exact signature used (e.g., def
spawn_queue(self, *, tasks: Sequence[Task], worker: WorkerType, max_concurrent:
int) -> Any), update the brain parameter/type from object | None to
Optional[SpawnQueueProtocol] (and add/replace any WorkerType/Task types as
appropriate), replace the unsafe cast/_sq assignment so mypy knows _sq is the
callable (e.g., _sq: Callable[..., Any] = brain.spawn_queue) and remove the
type: ignore[union-attr] on that line; ensure imports from typing (Protocol,
Optional, Callable) are added and adjust surrounding annotations accordingly so
spawn_queue is fully typed and type-checking passes.
In `@src/gradata/enhancements/behavioral_extractor.py`:
- Around line 426-430: The current check treats any 3+ word sentence starting
with a capitalized token as actionable, letting declarative sentences slip
through; update the logic so only explicit prefix-rule archetypes (e.g.,
PREFIX_INSTRUCTION / Archetype.PREFIX_INSTRUCTION) or clear imperative starters
(first_word in _IMPERATIVE_STARTERS) mark an instruction actionable—remove the
capitalization+word-count heuristic (the return using instruction[0].isupper()
and len(instruction.split()) >= 3) and only return True for the known
verbatim-rule case or when first_word matches _IMPERATIVE_STARTERS in the same
function where first_word, _IMPERATIVE_STARTERS and PREFIX_INSTRUCTION are
referenced.
- Around line 33-37: The fallback _FACTUAL_RE in the try/except should be
expanded to match the same common factual tokens the real
gradata.enhancements.edit_classifier._FACTUAL_RE recognizes (ISO dates,
percentages, and dollar-prefixed amounts) so REPLACEMENT_FACTUAL and downstream
archetype logic behave the same when the import fails; replace the current
fallback regex with a combined pattern that keeps the existing numeric/date/URL
matches and adds ISO dates (e.g. YYYY-MM-DD), percent values (e.g. 12% or 12.5%)
and dollar amounts (e.g. $123.45), ensuring the new regex is assigned to
_FACTUAL_RE in the except block referenced in the try/except around the import
of gradata.enhancements.edit_classifier._FACTUAL_RE.
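The combined fallback this prompt describes can be sketched as follows. The token classes come from the prompt text; the helper `looks_factual` is illustrative, and the real `_FACTUAL_RE` in `edit_classifier` may differ:

```python
import re

# Sketch of the expanded fallback pattern. The token classes mirror what the
# review comment says the real edit_classifier._FACTUAL_RE recognizes; the
# exact upstream pattern may differ.
_FACTUAL_RE = re.compile(
    r"""
    \d{4}-\d{2}-\d{2}          # ISO dates, e.g. 2024-01-31
    | \$\d+(?:\.\d+)?          # dollar amounts, e.g. $123.45
    | \d+(?:\.\d+)?%           # percentages, e.g. 12% or 12.5%
    | https?://\S+             # URLs (kept from the existing fallback)
    | \b\d+(?:\.\d+)?\b        # bare numbers (kept from the existing fallback)
    """,
    re.VERBOSE,
)

def looks_factual(text: str) -> bool:
    """Return True when the text contains a factual-looking token."""
    return bool(_FACTUAL_RE.search(text))
```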
In `@src/gradata/enhancements/rule_evolution.py`:
- Around line 281-284: Cache the result of calling _detect_opposite_direction to
avoid calling it twice: compute a local boolean (e.g., is_opposite =
_detect_opposite_direction(new_desc, best_rule.description)) once when best_rule
is not None and reuse it in the two conditionals that return
RuleRelation.UPDATES and RuleRelation.EXTENDS, keeping the comparisons against
best_similarity, update_threshold, and extend_threshold unchanged and
referencing best_rule, best_similarity, update_threshold, extend_threshold,
new_desc, and _detect_opposite_direction.
In `@src/gradata/hooks/_base.py`:
- Around line 51-57: The resolve_brain_dir helper currently falls back to a
hardcoded path (~/.gradata/brain); remove that fallback so it only uses
environment variables and validates them. Update resolve_brain_dir to read
GRADATA_BRAIN_DIR or BRAIN_DIR, return the path string if the env var is set and
exists (else None), and eliminate any use of Path.home() or the default
".gradata/brain" logic so no hardcoded filesystem path remains. Ensure the
function signature and return types (resolve_brain_dir) are preserved.
In `@src/gradata/integrations/crewai_adapter.py`:
- Around line 101-104: The exception handlers in get_rules() (where it returns
self.brain.apply_brain_rules(task, context)) and in search() currently swallow
errors silently; update those except Exception blocks to log the exception
details using the same pattern as save() (e.g., logger.debug or
logger.exception) so failures include context and the exception info;
specifically modify the except blocks surrounding
self.brain.apply_brain_rules(...) and the search() implementation to call
logger.debug/exception with a brief message and the caught exception.
In `@src/gradata/integrations/embeddings.py`:
- Around line 158-160: The handlers registered with bus.on (those with
async_handler=True) call _embed_and_cache which mutates the shared dict
_embedding_cache without synchronization, causing races; fix by adding a
module-level lock (e.g., _embedding_cache_lock = threading.Lock()) and acquire
it around all mutations of _embedding_cache inside _embed_and_cache (use a
context manager to ensure release on exceptions), and ensure any other code
paths that write to _embedding_cache also use the same lock so all concurrent
async_handler invocations are serialized.
In `@src/gradata/integrations/langchain_adapter.py`:
- Line 113: The clear() method contains an unnecessary trailing "pass" after its
docstring which triggers Ruff PIE790; remove the redundant "pass" from the
clear() method (leave the existing docstring in place) so the function body is
just the docstring and no no-op statement remains.
---
Outside diff comments:
In `@src/gradata/_context_compile.py`:
- Around line 8-14: Add the required future import at the top of the
gradata._context_compile module: insert "from __future__ import annotations" as
the very first statement in src/gradata/_context_compile.py (before any other
imports) to enable postponed evaluation of annotations for type safety and to
comply with the project coding guideline.
- Around line 90-99: The call to build_packet inside compile_context is missing
the ctx argument, which prevents build_packet (and its helpers like
_load_user_scope, _load_prospect_context, _load_drafting_context,
_load_debug_context, _load_audit_context, _load_wrapup_context) from receiving
the BrainContext; update the build_packet invocation in
src/gradata/_context_compile.py to pass ctx=ctx along with prospect, task_type,
and topic so the packet construction uses the provided context.
- Around line 76-88: The brain_search call in compile_context does not pass the
ctx parameter, so brain_search (which accepts ctx: "BrainContext | None") won't
receive the request context; update the call in src/gradata/_context_compile.py
where brain_search is invoked (currently: brain_search(message[:100], top_k=3,
mode="keyword")) to include the ctx argument (i.e., pass ctx=ctx) so it becomes
brain_search(message[:100], top_k=3, mode="keyword", ctx=ctx), ensuring
consistency with extract_entities and correct DB/context usage.
In `@src/gradata/_doctor.py`:
- Around line 236-264: The function print_diagnosis uses multiple print() calls
which violate the project's "no print()" guideline; replace prints with logging
(import logging, get a module logger via logging.getLogger(__name__)) and change
each print(...) to logger.info(...) (use logger.error or logger.warning where
appropriate for non-healthy statuses), preserving the exact formatted strings
and the use of report/status/icons/labels in print_diagnosis; do not configure
handlers here—ensure the CLI entry point configures a StreamHandler that writes
to stdout with the desired format so output remains human-readable
(alternatively, move print_diagnosis to a CLI-only module if direct stdout
semantics are required).
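A hedged sketch of the print-to-logging conversion, using an invented report shape (`checks` entries with `status`/`icon`/`label`/`detail`) purely for illustration; the real `_doctor` report type differs:

```python
import logging

logger = logging.getLogger(__name__)

# Map diagnosis statuses to log levels; statuses are illustrative.
_LEVELS = {"healthy": logging.INFO, "warning": logging.WARNING, "error": logging.ERROR}

def print_diagnosis(report: dict) -> None:
    """Emit one log record per check, preserving the formatted strings."""
    for check in report["checks"]:
        level = _LEVELS.get(check["status"], logging.INFO)
        logger.log(level, "%s %s: %s", check["icon"], check["label"], check["detail"])

def configure_cli_logging() -> None:
    """Called by the CLI entry point, not this module: route records to the terminal."""
    handler = logging.StreamHandler()  # stderr by default; pass sys.stdout if preferred
    handler.setFormatter(logging.Formatter("%(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
```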
In `@src/gradata/_fact_extractor.py`:
- Line 271: The function get_stats uses the PEP 604 union syntax (BrainContext |
None) which requires postponed evaluation of annotations for Python 3.9
compatibility; add the required future import by inserting from __future__
import annotations at the top of this module (immediately after the module
docstring and before any other imports) so type annotations like BrainContext |
None are safe across supported Python versions.
In `@src/gradata/brain.py`:
- Around line 669-680: In _resolve_pending(), don't call write_lessons_safe()
while holding lessons_lock() because write_lessons_safe() acquires the same lock
internally; instead, inside the with lessons_lock(lessons_path): find the
matching lesson, call mutator(lesson), set matched=True and compute formatted =
format_lessons(lessons) (but do not call write_lessons_safe), then exit the with
block and only if matched call write_lessons_safe(lessons_path, formatted). This
avoids re-acquiring lessons_lock() and potential self-deadlock.
In `@src/gradata/contrib/patterns/human_loop.py`:
- Around line 496-512: The check method’s context parameter is ignored — update
check to forward the caller-supplied context into the risk request path by
calling gate with the context (e.g., gate(action, context) or gate(action,
context=context) to match gate's signature) so that assess_risk receives the
context and risk_override/affected-entities are honored; also ensure any
downstream use of request (and approver(request)) remains compatible with the
request object that now may include context-derived fields.
In `@src/gradata/enhancements/self_improvement.py`:
- Around line 748-755: Update the comment above the block_promotion logic to
accurately describe current behavior: it should state that ONE_OFF
(correction_scope == "one_off") lessons are prevented from being promoted when
they are in either LessonState.INSTINCT or LessonState.PATTERN (to defensively
handle cases where a one_off lesson may already be in PATTERN due to
manual/legacy changes), matching the condition that checks lesson.scope_json,
parses _scope, and tests _scope.get("correction_scope") == "one_off" and
lesson.state in (LessonState.INSTINCT, LessonState.PATTERN).
In `@src/gradata/onboard.py`:
- Line 306: Replace all print() calls in the onboarding interactive code with
logging calls: add "import logging" and a module logger (logger =
logging.getLogger(__name__)) and change prints like print("\n=== Gradata —
Onboarding Wizard ===\n") to logger.info("=== Gradata — Onboarding Wizard ===")
and other user-facing prints to logger.info/logger.warning as appropriate; keep
programmatic return values unchanged and ensure the CLI entrypoint configures a
StreamHandler to emit logger messages to stdout/stderr so interactive output
still appears in the terminal (update functions/methods in this module that
contain print() calls such as the onboarding wizard routines to use logger.*
instead).
- Around line 237-247: Update the _create_db function signature to accept an
optional BrainContext parameter for consistency with other DB functions: change
def _create_db(db_path: Path) -> None: to def _create_db(db_path: Path, ctx:
"BrainContext | None" = None) -> None; keep the implementation unchanged (it
doesn't need to use ctx now), add the forward-ref type annotation (string
literal) and ensure imports/type hints follow the project's pattern (no runtime
dependency), and update any callers to pass their ctx where available (or rely
on the default None) so the API matches _get_db, get_stats, validate_brain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c65400f4-2d73-4f7c-b3ae-0e648dc0d7d4
📒 Files selected for processing (94)
pyproject.toml, src/gradata/__init__.py, src/gradata/_brain_manifest.py, src/gradata/_context_compile.py, src/gradata/_context_packet.py, src/gradata/_core.py, src/gradata/_db.py, src/gradata/_doctor.py, src/gradata/_embed.py, src/gradata/_encryption.py, src/gradata/_fact_extractor.py, src/gradata/_installer.py, src/gradata/_manifest_helpers.py, src/gradata/_manifest_quality.py, src/gradata/_math.py, src/gradata/_migrations.py, src/gradata/_paths.py, src/gradata/_query.py, src/gradata/_stats.py, src/gradata/_tag_taxonomy.py, src/gradata/_validator.py, src/gradata/audit.py, src/gradata/benchmarks/swe_bench.py, src/gradata/brain.py, src/gradata/brain_inspection.py, src/gradata/cli.py, src/gradata/context_wrapper.py, src/gradata/contrib/enhancements/eval_benchmark.py, src/gradata/contrib/enhancements/install_manifest.py, src/gradata/contrib/enhancements/quality_gates.py, src/gradata/contrib/patterns/context_brackets.py, src/gradata/contrib/patterns/evaluator.py, src/gradata/contrib/patterns/execute_qualify.py, src/gradata/contrib/patterns/guardrails.py, src/gradata/contrib/patterns/human_loop.py, src/gradata/contrib/patterns/loop_detection.py, src/gradata/contrib/patterns/middleware.py, src/gradata/contrib/patterns/orchestrator.py, src/gradata/contrib/patterns/parallel.py, src/gradata/contrib/patterns/pipeline.py, src/gradata/contrib/patterns/q_learning_router.py, src/gradata/contrib/patterns/reconciliation.py, src/gradata/contrib/patterns/reflection.py, src/gradata/contrib/patterns/task_escalation.py, src/gradata/contrib/patterns/tree_of_thoughts.py, src/gradata/correction_detector.py, src/gradata/daemon.py, src/gradata/detection/__init__.py, src/gradata/detection/addition_pattern.py, src/gradata/detection/correction_conflict.py, src/gradata/detection/mode_classifier.py, src/gradata/enhancements/behavioral_extractor.py, src/gradata/enhancements/cluster_manager.py, src/gradata/enhancements/edit_classifier.py, src/gradata/enhancements/git_backfill.py, src/gradata/enhancements/instruction_cache.py, src/gradata/enhancements/lesson_discriminator.py, src/gradata/enhancements/memory_taxonomy.py, src/gradata/enhancements/meta_rules.py, src/gradata/enhancements/meta_rules_storage.py, src/gradata/enhancements/observation_hooks.py, src/gradata/enhancements/pattern_extractor.py, src/gradata/enhancements/pattern_integration.py, src/gradata/enhancements/quality_monitoring.py, src/gradata/enhancements/reporting.py, src/gradata/enhancements/router_warmstart.py, src/gradata/enhancements/rule_evolution.py, src/gradata/enhancements/rule_verifier.py, src/gradata/enhancements/self_improvement.py, src/gradata/enhancements/similarity.py, src/gradata/enhancements/super_meta_rules.py, src/gradata/events_bus.py, src/gradata/graph.py, src/gradata/hooks/_base.py, src/gradata/hooks/auto_correct.py, src/gradata/hooks/claude_code.py, src/gradata/inspection.py, src/gradata/integrations/anthropic_adapter.py, src/gradata/integrations/crewai_adapter.py, src/gradata/integrations/embeddings.py, src/gradata/integrations/langchain_adapter.py, src/gradata/integrations/openai_adapter.py, src/gradata/mcp_server.py, src/gradata/mcp_tools.py, src/gradata/onboard.py, src/gradata/rules/__init__.py, src/gradata/rules/rule_engine.py, src/gradata/rules/rule_graph.py, src/gradata/rules/rule_tracker.py, src/gradata/rules/scope.py, src/gradata/security/__init__.py, src/gradata/security/correction_provenance.py, src/gradata/security/manifest_signing.py, src/gradata/sidecar/watcher.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
src/gradata/**/*.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
src/gradata/enhancements/instruction_cache.py, src/gradata/inspection.py, src/gradata/enhancements/git_backfill.py, src/gradata/contrib/patterns/parallel.py, src/gradata/contrib/enhancements/install_manifest.py, src/gradata/_core.py, src/gradata/contrib/patterns/guardrails.py, src/gradata/enhancements/pattern_integration.py, src/gradata/brain_inspection.py, src/gradata/contrib/patterns/evaluator.py, src/gradata/contrib/patterns/pipeline.py, src/gradata/enhancements/edit_classifier.py, src/gradata/_paths.py, src/gradata/_installer.py, src/gradata/_embed.py, src/gradata/sidecar/watcher.py, src/gradata/enhancements/cluster_manager.py, src/gradata/_validator.py, src/gradata/enhancements/memory_taxonomy.py, src/gradata/enhancements/lesson_discriminator.py, src/gradata/_doctor.py, src/gradata/security/correction_provenance.py, src/gradata/detection/__init__.py, src/gradata/detection/mode_classifier.py, src/gradata/_context_packet.py, src/gradata/detection/addition_pattern.py, src/gradata/contrib/patterns/middleware.py, src/gradata/contrib/patterns/loop_detection.py, src/gradata/events_bus.py, src/gradata/_encryption.py, src/gradata/_query.py, src/gradata/audit.py, src/gradata/contrib/patterns/q_learning_router.py, src/gradata/_stats.py, src/gradata/integrations/anthropic_adapter.py, src/gradata/enhancements/super_meta_rules.py, src/gradata/detection/correction_conflict.py, src/gradata/security/__init__.py, src/gradata/rules/scope.py, src/gradata/correction_detector.py, src/gradata/contrib/patterns/context_brackets.py, src/gradata/_tag_taxonomy.py, src/gradata/contrib/patterns/orchestrator.py, src/gradata/enhancements/meta_rules.py, src/gradata/integrations/langchain_adapter.py, src/gradata/contrib/enhancements/quality_gates.py, src/gradata/daemon.py, src/gradata/_manifest_helpers.py, src/gradata/contrib/patterns/human_loop.py, src/gradata/graph.py, src/gradata/_fact_extractor.py, src/gradata/hooks/auto_correct.py, src/gradata/contrib/patterns/tree_of_thoughts.py, src/gradata/cli.py, src/gradata/enhancements/router_warmstart.py, src/gradata/enhancements/rule_verifier.py, src/gradata/rules/rule_graph.py, src/gradata/integrations/crewai_adapter.py, src/gradata/rules/rule_tracker.py, src/gradata/contrib/patterns/reflection.py, src/gradata/mcp_server.py, src/gradata/contrib/patterns/execute_qualify.py, src/gradata/integrations/embeddings.py, src/gradata/_db.py, src/gradata/integrations/openai_adapter.py, src/gradata/hooks/claude_code.py, src/gradata/enhancements/pattern_extractor.py, src/gradata/enhancements/similarity.py, src/gradata/onboard.py, src/gradata/security/manifest_signing.py, src/gradata/_math.py, src/gradata/enhancements/rule_evolution.py, src/gradata/mcp_tools.py, src/gradata/enhancements/self_improvement.py, src/gradata/contrib/patterns/task_escalation.py, src/gradata/enhancements/observation_hooks.py, src/gradata/enhancements/reporting.py, src/gradata/contrib/enhancements/eval_benchmark.py, src/gradata/__init__.py, src/gradata/rules/rule_engine.py, src/gradata/enhancements/meta_rules_storage.py, src/gradata/benchmarks/swe_bench.py, src/gradata/brain.py, src/gradata/enhancements/behavioral_extractor.py, src/gradata/_context_compile.py, src/gradata/contrib/patterns/reconciliation.py, src/gradata/hooks/_base.py, src/gradata/context_wrapper.py, src/gradata/_brain_manifest.py, src/gradata/enhancements/quality_monitoring.py, src/gradata/_migrations.py, src/gradata/rules/__init__.py, src/gradata/_manifest_quality.py
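The [0,1] bounds named in this SDK guideline reduce to a one-line clamp. A minimal sketch (the helper name `clamp01` is illustrative, not part of the SDK):

```python
def clamp01(value: float) -> float:
    """Clamp a severity or confidence score into [0.0, 1.0], per the guideline."""
    return max(0.0, min(1.0, float(value)))
```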
src/gradata/integrations/**
⚙️ CodeRabbit configuration file
src/gradata/integrations/**: Framework integrations and nervous system modules. Check for: SSRF protection on any URL from env vars,
thread safety on shared state, embedding cache must be bounded, event bus handlers must never raise.
Files:
src/gradata/integrations/anthropic_adapter.py, src/gradata/integrations/langchain_adapter.py, src/gradata/integrations/crewai_adapter.py, src/gradata/integrations/embeddings.py, src/gradata/integrations/openai_adapter.py
src/gradata/hooks/**
⚙️ CodeRabbit configuration file
src/gradata/hooks/**: JavaScript hooks for Claude Code integration. Check for: no shell injection (no execSync with user
input), temp files must use per-user subdirectory, HTTP calls must have timeouts, errors must be silent (never block
the tool chain).
Files:
src/gradata/hooks/auto_correct.py, src/gradata/hooks/claude_code.py, src/gradata/hooks/_base.py
🪛 Ruff (0.15.9)
src/gradata/_encryption.py
[warning] 119-119: Avoid specifying long messages outside the exception class
(TRY003)
src/gradata/enhancements/meta_rules.py
[warning] 497-497: Avoid specifying long messages outside the exception class
(TRY003)
src/gradata/integrations/langchain_adapter.py
[warning] 113-113: Unnecessary pass statement
Remove unnecessary pass
(PIE790)
src/gradata/_db.py
[warning] 96-98: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 108-110: Avoid specifying long messages outside the exception class
(TRY003)
src/gradata/__init__.py
[warning] 145-145: Avoid specifying long messages outside the exception class
(TRY003)
src/gradata/benchmarks/swe_bench.py
[warning] 268-271: Avoid specifying long messages outside the exception class
(TRY003)
src/gradata/brain.py
[warning] 966-966: Avoid specifying long messages outside the exception class
(TRY003)
src/gradata/hooks/_base.py
[warning] 39-39: Do not catch blind exception: Exception
(BLE001)
[warning] 69-69: Missing return type annotation for private function get_brain
(ANN202)
[warning] 80-80: Do not catch blind exception: Exception
(BLE001)
[warning] 96-96: Do not catch blind exception: Exception
(BLE001)
else:
    tmp_path.unlink(missing_ok=True)
    raise RuntimeError("Encryption produced empty or missing file — plaintext preserved")
🧹 Nitpick | 🔵 Trivial
LGTM — formatting fix applied.
The trailing newline addition aligns with codebase style conventions and has no functional impact.
Optional: The static analysis tool flags TRY003 here, suggesting that the exception message could be extracted to a custom exception class for better maintainability. This is a minor style improvement and not required.
♻️ Optional refactor (custom exception class)
Define a custom exception at the module level:
class EncryptionError(RuntimeError):
"""Raised when encryption operations fail."""
    pass

Then update line 119:
- raise RuntimeError("Encryption produced empty or missing file — plaintext preserved")
+ raise EncryptionError("Encryption produced empty or missing file — plaintext preserved")

🧰 Tools
🪛 Ruff (0.15.9)
[warning] 119-119: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_encryption.py` at line 119, The RuntimeError raised in
_encryption.py should be converted to a custom exception for clarity and
static-analysis friendliness: define a module-level class
EncryptionError(RuntimeError) with a short docstring, then replace the current
raise RuntimeError("Encryption produced empty or missing file — plaintext
preserved") with raise EncryptionError("Encryption produced empty or missing
file — plaintext preserved") (reference the module-level EncryptionError class
and the raise in the encryption routine).
  raise ImportError(
      "SWE-bench data loading requires the 'datasets' package.\n"
      "Install with: pip install datasets"
- )
+ ) from e
🧹 Nitpick | 🔵 Trivial
Shorten inline ImportError message to clear Ruff TRY003
Lines 268-271 still trigger TRY003. Keep cause chaining, but move to a concise one-line message (or a custom exception) to satisfy the lint cleanly.
Proposed tweak
- raise ImportError(
- "SWE-bench data loading requires the 'datasets' package.\n"
- "Install with: pip install datasets"
- ) from e
+ raise ImportError(
+ "Missing optional dependency 'datasets'. Install with: pip install datasets."
+ ) from e

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
raise ImportError(
    "Missing optional dependency 'datasets'. Install with: pip install datasets."
) from e
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 268-271: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/benchmarks/swe_bench.py` around lines 268 - 271, The multiline
ImportError raised in the SWE-bench loader should be shortened to a single-line
message to satisfy Ruff TRY003 while preserving exception chaining; locate the
raise ImportError in src.gradata.benchmarks.swe_bench (the raise ImportError ...
from e statement) and replace the multi-line string with a concise one-line
message such as "datasets package required; pip install datasets" (or raise a
small custom exception) but keep the "from e" suffix so the original exception
is chained.
  if brain and hasattr(brain, "spawn_queue"):
-     _sq = getattr(brain, "spawn_queue")
+     _sq = brain.spawn_queue  # type: ignore[union-attr]
      result = _sq(tasks=tasks, worker=worker, max_concurrent=max_concurrent)
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain (scripts run against repository Gradata/gradata)
🏁 Scripts executed:
# First, check if the file exists and read the relevant lines
head -30 src/gradata/contrib/patterns/orchestrator.py
# Get the line count to understand file size
wc -l src/gradata/contrib/patterns/orchestrator.py
# Read the area around lines 526-528
sed -n '500,550p' src/gradata/contrib/patterns/orchestrator.py
# Check imports at the top of the file
sed -n '1,50p' src/gradata/contrib/patterns/orchestrator.py | grep -E "^from|^import|^__future__"
# Get the function signature and definition
sed -n '480,530p' src/gradata/contrib/patterns/orchestrator.py
# Check for any existing type hints or BrainContext references in the file
rg -n "BrainContext|Protocol|SupportsSpawnQueue" src/gradata/contrib/patterns/orchestrator.py
# Check what's imported at the top, looking for typing imports
sed -n '1,100p' src/gradata/contrib/patterns/orchestrator.py | grep -E "^from typing|^import typing|from collections|Protocol"
# Check if there are print() statements in the file (should use logging per guidelines)
rg -n "print\(" src/gradata/contrib/patterns/orchestrator.py
# Check if logging is used appropriately
rg -n "logging\.|import logging" src/gradata/contrib/patterns/orchestrator.py
# Check the context around those print statements to see if they're in docstrings
sed -n '35,50p' src/gradata/contrib/patterns/orchestrator.py
# Check the entire imports section thoroughly
head -80 src/gradata/contrib/patterns/orchestrator.py
# Check if there are any hardcoded paths
rg -n "\"\/|'\/|pathlib|Path\(" src/gradata/contrib/patterns/orchestrator.py | head -20
# Check the full typing imports to understand what's available
sed -n '1,15p' src/gradata/contrib/patterns/orchestrator.py
# Check if BrainContext is defined in the codebase
rg -n "class BrainContext|BrainContext" --type py | head -20
# Look at the rest of the imports to understand the typing situation
sed -n '1,120p' src/gradata/contrib/patterns/orchestrator.py
# Check if there are any type annotation issues or if Protocol is already imported
rg -n "from typing|Protocol" src/gradata/contrib/patterns/orchestrator.py
# Check the actual docstring for the function to understand its purpose
sed -n '488,510p' src/gradata/contrib/patterns/orchestrator.py
# Verify the exact signature of spawn_queue calls in the codebase
rg -n "spawn_queue" src/gradata/contrib/patterns/orchestrator.py -B 2 -A 2
# Check if there are other usages of brain.spawn_queue or similar patterns
rg -n "spawn_queue" --type py | grep -v test | head -10
# Verify if Callable and Any are imported, or need to be added
grep -n "Callable\|Any" src/gradata/contrib/patterns/orchestrator.py
# Check if execute_orchestrated or related functions access DB and should use BrainContext
sed -n '488,542p' src/gradata/contrib/patterns/orchestrator.py
# Check what Brain.spawn_queue returns to verify the return type annotation
sed -n '1110,1145p' src/gradata/brain.py
# Verify the complete typing imports needed for the suggested refactor
grep -n "^from typing\|^import typing" src/gradata/brain.py | head -5
Use a Protocol to eliminate the type: ignore and improve type safety.
Lines 489–491 have overly generic type annotations (worker: object, brain: object | None), causing the type: ignore[union-attr] on line 527. Replace them with proper type hints using a Protocol that describes the spawn_queue interface.
Suggested refactor
+from typing import Any, Callable, Protocol
+
+class SupportsSpawnQueue(Protocol):
+ def spawn_queue(
+ self,
+ tasks: list[str],
+ worker: Callable[[str], Any],
+ *,
+ max_concurrent: int = 3,
+ ) -> dict: ...
+
def execute_orchestrated(
tasks: list[str],
- worker: object,
- brain: object | None = None,
+ worker: Callable[[str], Any],
+ brain: SupportsSpawnQueue | None = None,
*,
max_concurrent: int = 3,
) -> dict:
@@
- if brain and hasattr(brain, "spawn_queue"):
- _sq = brain.spawn_queue # type: ignore[union-attr]
+ if brain is not None:
+ _sq = brain.spawn_queue
      result = _sq(tasks=tasks, worker=worker, max_concurrent=max_concurrent)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/contrib/patterns/orchestrator.py` around lines 526 - 528, Define
a Protocol (e.g., SpawnQueueProtocol) that declares a spawn_queue method with
the exact signature used (e.g., def spawn_queue(self, *, tasks: Sequence[Task],
worker: WorkerType, max_concurrent: int) -> Any), update the brain
parameter/type from object | None to Optional[SpawnQueueProtocol] (and
add/replace any WorkerType/Task types as appropriate), replace the unsafe
cast/_sq assignment so mypy knows _sq is the callable (e.g., _sq: Callable[...,
Any] = brain.spawn_queue) and remove the type: ignore[union-attr] on that line;
ensure imports from typing (Protocol, Optional, Callable) are added and adjust
surrounding annotations accordingly so spawn_queue is fully typed and
type-checking passes.
if best_rule is not None and best_similarity > update_threshold and _detect_opposite_direction(new_desc, best_rule.description):
    return (RuleRelation.UPDATES, best_rule)
if best_rule is not None and best_similarity > extend_threshold and not _detect_opposite_direction(new_desc, best_rule.description):
    return (RuleRelation.EXTENDS, best_rule)
🧹 Nitpick | 🔵 Trivial
LGTM! Clean refactoring that improves readability.
The consolidation of nested if blocks into single conditions preserves the original logic while reducing indentation. The behavior remains correct: UPDATES is returned when similarity exceeds the update threshold and opposite direction is detected; EXTENDS is returned when similarity exceeds the extend threshold without opposite direction.
♻️ Optional: Cache direction check to avoid redundant calls
When best_similarity > update_threshold but direction is not opposite, _detect_opposite_direction will be called again for the extend check. Consider caching the result:
if rule.category == new_lesson.category:
rule_keywords = _extract_keywords(rule.description)
if new_keywords & rule_keywords:
category_cluster.append(rule)
- if best_rule is not None and best_similarity > update_threshold and _detect_opposite_direction(new_desc, best_rule.description):
+
+ is_opposite = _detect_opposite_direction(new_desc, best_rule.description) if best_rule is not None else False
+
+ if best_rule is not None and best_similarity > update_threshold and is_opposite:
return (RuleRelation.UPDATES, best_rule)
- if best_rule is not None and best_similarity > extend_threshold and not _detect_opposite_direction(new_desc, best_rule.description):
+ if best_rule is not None and best_similarity > extend_threshold and not is_opposite:
      return (RuleRelation.EXTENDS, best_rule)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/rule_evolution.py` around lines 281 - 284, Cache the
result of calling _detect_opposite_direction to avoid calling it twice: compute
a local boolean (e.g., is_opposite = _detect_opposite_direction(new_desc,
best_rule.description)) once when best_rule is not None and reuse it in the two
conditionals that return RuleRelation.UPDATES and RuleRelation.EXTENDS, keeping
the comparisons against best_similarity, update_threshold, and extend_threshold
unchanged and referencing best_rule, best_similarity, update_threshold,
extend_threshold, new_desc, and _detect_opposite_direction.
def resolve_brain_dir() -> str | None:
    """Resolve brain directory from env vars or default location."""
    brain_dir = os.environ.get("GRADATA_BRAIN_DIR") or os.environ.get("BRAIN_DIR")
    if brain_dir:
        return brain_dir if Path(brain_dir).exists() else None
    default = Path.home() / ".gradata" / "brain"
    return str(default) if default.exists() else None
🛠️ Refactor suggestion | 🟠 Major
Remove the built-in ~/.gradata/brain fallback.
This helper hardcodes a filesystem location into hook resolution, so nonstandard installs depend on one local path convention instead of the configured SDK path source.
🧭 Proposed fix
def resolve_brain_dir() -> str | None:
"""Resolve brain directory from env vars or default location."""
brain_dir = os.environ.get("GRADATA_BRAIN_DIR") or os.environ.get("BRAIN_DIR")
- if brain_dir:
- return brain_dir if Path(brain_dir).exists() else None
- default = Path.home() / ".gradata" / "brain"
- return str(default) if default.exists() else None
+ if not brain_dir:
+ return None
+ return brain_dir if Path(brain_dir).exists() else None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/hooks/_base.py` around lines 51 - 57, The resolve_brain_dir
helper currently falls back to a hardcoded path (~/.gradata/brain); remove that
fallback so it only uses environment variables and validates them. Update
resolve_brain_dir to read GRADATA_BRAIN_DIR or BRAIN_DIR, return the path string
if the env var is set and exists (else None), and eliminate any use of
Path.home() or the default ".gradata/brain" logic so no hardcoded filesystem
path remains. Ensure the function signature and return types (resolve_brain_dir)
are preserved.
try:
    return self.brain.apply_brain_rules(task, context)
except Exception:
    return ""
🧹 Nitpick | 🔵 Trivial
Consider adding debug logging to the exception handler for consistency.
The save() method logs exceptions with logger.debug(), but get_rules() (and search()) silently swallow errors. Adding consistent logging would improve debuggability when these methods fail.
📋 Proposed enhancement for consistency
try:
return self.brain.apply_brain_rules(task, context)
- except Exception:
+ except Exception as e:
+ logger.debug("CrewAI get_rules failed: %s", e)
+ return ""

Consider the same pattern for search() at lines 88-89.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
 try:
     return self.brain.apply_brain_rules(task, context)
-except Exception:
-    return ""
+except Exception as e:
+    logger.debug("CrewAI get_rules failed: %s", e)
+    return ""
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 103-103: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/integrations/crewai_adapter.py` around lines 101 - 104, The
exception handlers in get_rules() (where it returns
self.brain.apply_brain_rules(task, context)) and in search() currently swallow
errors silently; update those except Exception blocks to log the exception
details using the same pattern as save() (e.g., logger.debug or
logger.exception) so failures include context and the exception info;
specifically modify the except blocks surrounding
self.brain.apply_brain_rules(...) and the search() implementation to call
logger.debug/exception with a brief message and the caught exception.
bus.on("correction.created", lambda p: _embed_and_cache(p, "description", "text"), async_handler=True)
bus.on("lesson.graduated", lambda p: _embed_and_cache(p, "description"), async_handler=True)
bus.on("meta_rule.created", lambda p: _embed_and_cache(p, "description", "rule"), async_handler=True)
Protect embedding cache mutations in async handlers.
At Line 158–Line 160, handlers run with async_handler=True and call _embed_and_cache, which mutates _embedding_cache without synchronization. This is a race on shared state.
Proposed thread-safety patch
+import threading
...
_embedding_cache: OrderedDict[str, list[float]] = OrderedDict()
+_embedding_cache_lock = threading.RLock()
...
def get_client():
global _default_client
- if _default_client is None:
- _default_client = EmbeddingClient(
- api_url=os.environ.get("GRADATA_API_URL"),
- api_token=os.environ.get("GRADATA_API_TOKEN"),
- )
+ with _embedding_cache_lock:
+ if _default_client is None:
+ _default_client = EmbeddingClient(
+ api_url=os.environ.get("GRADATA_API_URL"),
+ api_token=os.environ.get("GRADATA_API_TOKEN"),
+ )
return _default_client
...
def _embed_and_cache(payload, *keys):
try:
...
- if desc and desc not in _embedding_cache:
- vec = get_client().embed(desc)
- if vec:
- _embedding_cache[desc] = vec
- while len(_embedding_cache) > _CACHE_MAX_SIZE:
- _embedding_cache.popitem(last=False)
+ if desc:
+ with _embedding_cache_lock:
+ if desc in _embedding_cache:
+ return
+ vec = get_client().embed(desc)
+ if vec:
+ with _embedding_cache_lock:
+ _embedding_cache[desc] = vec
+ while len(_embedding_cache) > _CACHE_MAX_SIZE:
+ _embedding_cache.popitem(last=False)
except Exception:
logger.warning("Failed to embed payload", exc_info=True)

As per coding guidelines, "src/gradata/integrations/**: ... check for ... thread safety on shared state ...".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/integrations/embeddings.py` around lines 158 - 160, The handlers
registered with bus.on (those with async_handler=True) call _embed_and_cache
which mutates the shared dict _embedding_cache without synchronization, causing
races; fix by adding a module-level lock (e.g., _embedding_cache_lock =
threading.Lock()) and acquire it around all mutations of _embedding_cache inside
_embed_and_cache (use a context manager to ensure release on exceptions), and
ensure any other code paths that write to _embedding_cache also use the same
lock so all concurrent async_handler invocations are serialized.
def clear(self) -> None:
    """No-op for persistent brain memory."""
    pass
🧹 Nitpick | 🔵 Trivial
Remove redundant pass in clear().
The method already has a docstring body, so pass is unnecessary and triggers Ruff PIE790.
♻️ Proposed fix
def clear(self) -> None:
"""No-op for persistent brain memory."""
- pass
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 113-113: Unnecessary pass statement
Remove unnecessary pass
(PIE790)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/integrations/langchain_adapter.py` at line 113, The clear()
method contains an unnecessary trailing "pass" after its docstring which
triggers Ruff PIE790; remove the redundant "pass" from the clear() method (leave
the existing docstring in place) so the function body is just the docstring and
no no-op statement remains.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/sdk-ci.yml:
- Around line 3-15: The CI currently only triggers SDK paths in the sdk-ci
workflow and the main ci workflow runs pytest against the SDK tests but not the
cloud backend; update CI so cloud backend tests run by either adding a step that
executes pytest cloud/tests/ in the existing ci workflow's test job (the job
that currently runs pytest tests/ and type-checks src/) or create a new workflow
triggered on changes under the cloud path that runs pytest cloud/tests/ (and
migrations checks if present), ensuring the cloud test invocation is added to
the test job or new workflow job name so backend changes are validated.
- Around line 36-40: Replace the floating action tag and ephemeral uv version
with pinned values: change the GitHub Action usage from astral-sh/setup-uv@v4 to
a specific commit SHA (e.g.,
astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57) and set the uv
input version from "latest" to a tested fixed version (e.g., "0.4.25") so the
step that uses the setup-uv action and the with: version setting are both
deterministic.
In `@pyproject.toml`:
- Around line 105-118: The pyproject.toml currently lists a global Bandit
"skips" array that mutes 105 real violations; remove the broad skips for
high/medium risk rules and convert them to targeted suppressions: keep only
"B110" in the global skips entry (since defensive try/except/pass is pervasive),
remove "B603","B607","B608","B310","B105","B615" from the skips list, run Bandit
to locate the specific occurrences of those removed rules, and add inline "#
nosec: <rule> — rationale" comments at each justified site (for example next to
subprocess calls, SQL schema code, URL-open or constant password literals) with
a brief explanation referencing the code context; ensure any remaining
true-security issues are fixed rather than suppressed and update
pyproject.toml's skips array to reflect only the accepted global exemptions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a68ad716-b2b1-4efb-8423-8c8e09ae723e
📒 Files selected for processing (2)
.github/workflows/sdk-ci.yml
pyproject.toml
📜 Review details
🔇 Additional comments (1)
pyproject.toml (1)
40-40: Good addition: Ruff is now explicitly installed in dev environments. This aligns local tooling with the configured lint pipeline.
on:
  push:
    paths:
      - "src/gradata/**"
      - "tests/**"
      - "pyproject.toml"
      - ".github/workflows/sdk-ci.yml"
  pull_request:
    paths:
      - "src/gradata/**"
      - "tests/**"
      - "pyproject.toml"
      - ".github/workflows/sdk-ci.yml"
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Workflows present ==="
fd -t f .github/workflows
echo "=== Workflow path filters mentioning cloud/backend/deploy ==="
rg -n -C2 'paths:|cloud/\*\*|cloud/|railway|Dockerfile|migrations' .github/workflows || true

Repository: Gradata/gradata
Length of output: 475
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Workflows present ==="
fd . '.github/workflows' -t f
echo ""
echo "=== Workflow contents and path filters ==="
for workflow in $(fd . '.github/workflows' -t f); do
echo "--- File: $workflow ---"
cat -n "$workflow" | head -40
done
echo ""
echo "=== Search for cloud/deploy/backend references across workflows ==="
rg 'paths:|cloud/|migrations|Dockerfile|railway|deploy' '.github/workflows' -n -A2 || echo "No matches found"

Repository: Gradata/gradata
Length of output: 5930
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Repository root structure ==="
ls -la
echo ""
echo "=== Check for cloud directory ==="
fd -t d cloud . 2>/dev/null || echo "No 'cloud' directory found"
echo ""
echo "=== Check for migration-related files ==="
fd -t f migration . 2>/dev/null | head -20 || echo "No migration files found"
echo ""
echo "=== Check for Dockerfile ==="
fd -t f Dockerfile . 2>/dev/null || echo "No Dockerfile found"
echo ""
echo "=== List main directories at root ==="
fd -d 1 -t d . | grep -v '^\.$' | sort

Repository: Gradata/gradata
Length of output: 1355
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Cloud directory structure ==="
find cloud/ -type f | head -20
echo ""
echo "=== Cloud directory contents (tree view) ==="
tree cloud/ -L 3 2>/dev/null || find cloud/ -type f | sed 's|[^/]*/| |g'
echo ""
echo "=== Check if cloud code has tests ==="
fd test . cloud/ || echo "No test files in cloud/"
echo ""
echo "=== Sample cloud file to understand what it is ==="
head -20 cloud/*/\*.py | head -50 || head -30 $(find cloud/ -type f -name "*.py" | head -1) 2>/dev/null || echo "No Python files found"

Repository: Gradata/gradata
Length of output: 3920
Cloud backend code lacks CI validation coverage.
The cloud/** directory (with app code, migrations, and its own test suite in cloud/tests/) is not tested by any workflow:
- sdk-ci.yml (lines 5–15) explicitly limits triggers to SDK paths only, excluding cloud/**
- ci.yml has no path filters and runs on all changes, but its test job only executes pytest tests/ (the SDK tests) and type-checks src/; it does not run cloud/tests/
Backend changes to migrations, API routes, or auth logic can merge to main without validation. Add a workflow step to run pytest cloud/tests/ in ci.yml, or create a separate workflow that triggers on cloud/** changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/sdk-ci.yml around lines 3 - 15, The CI currently only
triggers SDK paths in the sdk-ci workflow and the main ci workflow runs pytest
against the SDK tests but not the cloud backend; update CI so cloud backend
tests run by either adding a step that executes pytest cloud/tests/ in the
existing ci workflow's test job (the job that currently runs pytest tests/ and
type-checks src/) or create a new workflow triggered on changes under the cloud
path that runs pytest cloud/tests/ (and migrations checks if present), ensuring
the cloud test invocation is added to the test job or new workflow job name so
backend changes are validated.
- name: Install uv
  uses: astral-sh/setup-uv@v4
  with:
    version: "latest"
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🌐 Web query:
What is the recommended pinning strategy for astral-sh/setup-uv in GitHub Actions, and should version: "latest" be avoided for reproducible CI?
💡 Result:
Recommended pinning for astral-sh/setup-uv
- Pin the action itself to an immutable reference, ideally a full commit SHA (best), or at least an immutable semver tag like v8.0.0 (not v8). The project explicitly recommends the SHA, and it has stopped publishing major/minor floating tags (so @v8 / @v8.0 won't be available). (github.com)
- This also matches GitHub's guidance for third-party actions: pin to a full-length commit SHA for immutability/security. (docs.github.com)
Example:
- uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0

Should version: "latest" be avoided for reproducible CI?
Yes—avoid version: "latest" (and relying on the default “latest”) if you want reproducible CI, because the action will install whatever uv version is latest at runtime when no required version is found. Prefer pinning with: version: "X.Y.Z" or setting a required-version in uv.toml / pyproject.toml so installs don’t drift over time. (github.com)
Citations:
- 1: https://github.com/astral-sh/setup-uv/releases
- 2: https://docs.github.com/en/enterprise-server%403.16/actions/how-tos/security-for-github-actions/security-guides/security-hardening-for-github-actions?utm_source=openai
- 3: https://github.com/astral-sh/setup-uv
- 4: https://github.com/astral-sh/setup-uv
Pin uv to a fixed version instead of latest.
Using version: "latest" introduces nondeterministic CI breakages. Pin a tested version (e.g., version: "0.4.25") for reproducible builds. Also pin the action itself to a commit SHA rather than the floating v4 tag (e.g., astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/sdk-ci.yml around lines 36 - 40, Replace the floating
action tag and ephemeral uv version with pinned values: change the GitHub Action
usage from astral-sh/setup-uv@v4 to a specific commit SHA (e.g.,
astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57) and set the uv
input version from "latest" to a tested fixed version (e.g., "0.4.25") so the
step that uses the setup-uv action and the with: version setting are both
deterministic.
skips = [
    "B101",  # allow assert in non-test code
    "B110",  # try/except/pass — SDK uses defensive coding with logged fallbacks
    "B112",  # try/except/continue — same pattern
    "B311",  # pseudo-random — used for non-security shuffling (rule ordering)
    "B324",  # weak hash — SHA256 used for content hashing, not security
    "B404",  # subprocess import — SDK shells out to qmd/git intentionally
    "B603",  # subprocess without shell — intentional CLI tool invocations
    "B607",  # partial path — CLI tools resolved via PATH
    "B608",  # SQL string construction — SDK builds SQLite queries with parameterized inputs
    "B310",  # url open audit — SDK opens URLs from validated config, not user input
    "B105",  # hardcoded password — false positive on string constants like 'pass', 'False'
    "B615",  # huggingface download — benchmark code, not production
]
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python -m pip install --quiet "bandit>=1.7,<2" "tomli>=2,<3"
echo "=== Current configured skips ==="
python - <<'PY'
import tomli, pathlib
cfg = tomli.loads(pathlib.Path("pyproject.toml").read_text())
print(cfg["tool"]["bandit"].get("skips", []))
PY
echo "=== Bandit with current config ==="
bandit -r src/gradata/ -c pyproject.toml -f json -o /tmp/bandit_config.json || true
python - <<'PY'
import json
d=json.load(open("/tmp/bandit_config.json"))
print("Issues (with config):", len(d.get("results", [])))
PY
echo "=== Bandit baseline without custom config ==="
bandit -r src/gradata/ -f json -o /tmp/bandit_baseline.json || true
python - <<'PY'
import json
d=json.load(open("/tmp/bandit_baseline.json"))
print("Issues (baseline):", len(d.get("results", [])))
sev={}
for r in d.get("results", []):
sev[r["issue_severity"]] = sev.get(r["issue_severity"],0)+1
print("Baseline severity counts:", sev)
PY

Repository: Gradata/gradata
Length of output: 699
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Fix PATH for bandit and run comprehensive analysis
export PATH="/home/jailuser/.local/bin:$PATH"
echo "=== Verifying bandit is accessible ==="
which bandit || echo "Bandit not found"
# Re-run bandit with proper PATH
echo "=== Bandit with current config (pyproject.toml) ==="
bandit -r src/gradata/ -c pyproject.toml -f json -o /tmp/bandit_config.json 2>&1 || EXIT_CODE=$?
if [ -f /tmp/bandit_config.json ]; then
python3 - <<'PY'
import json
d = json.load(open("/tmp/bandit_config.json"))
print(f"Issues found (with config): {len(d.get('results', []))}")
for result in d.get('results', [])[:10]:
print(f" - {result['test_id']}: {result['issue_text']} (severity: {result['issue_severity']})")
PY
else
echo "Config file not generated"
fi
echo "=== Bandit baseline (no custom config) ==="
bandit -r src/gradata/ -f json -o /tmp/bandit_baseline.json 2>&1 || EXIT_CODE=$?
if [ -f /tmp/bandit_baseline.json ]; then
python3 - <<'PY'
import json
d = json.load(open("/tmp/bandit_baseline.json"))
print(f"Issues found (baseline): {len(d.get('results', []))}")
sev = {}
for r in d.get('results', []):
sev[r['issue_severity']] = sev.get(r['issue_severity'], 0) + 1
print(f"Severity breakdown: {sev}")
# Count by test_id
test_counts = {}
for r in d.get('results', []):
tid = r['test_id']
test_counts[tid] = test_counts.get(tid, 0) + 1
print("\nTop issues by test_id:")
for tid, count in sorted(test_counts.items(), key=lambda x: -x[1])[:15]:
print(f" {tid}: {count}")
PY
else
echo "Baseline file not generated"
fi

Repository: Gradata/gradata
Length of output: 1154
🏁 Script executed:
#!/bin/bash
# Search for actual subprocess calls and patterns mentioned in the skips
echo "=== Searching for subprocess patterns (B603, B607, B404) ==="
rg "subprocess|import os|shell\s*=" src/gradata/ -A 2 -B 1 | head -50
echo -e "\n=== Searching for SQL construction patterns (B608) ==="
rg "SELECT|INSERT|UPDATE|DELETE|\.execute\(" src/gradata/ -A 2 -B 1 | head -50
echo -e "\n=== Searching for URL opening (B310) ==="
rg "urllib|urlopen|requests\." src/gradata/ -A 2 -B 1 | head -50

Repository: Gradata/gradata
Length of output: 9692
🏁 Script executed:
#!/bin/bash
# Check the actual structure and assess if nosec would be practical
echo "=== Number of files in src/gradata ==="
find src/gradata -type f -name "*.py" | wc -l
echo "=== Lines of code in src/gradata ==="
find src/gradata -type f -name "*.py" -exec wc -l {} + | tail -1
echo "=== Check if there are already nosec comments in use ==="
rg "# nosec" src/gradata/ | wc -l
echo "=== Current test exclusion ==="
rg "exclude_dirs.*tests" pyproject.toml -A 2

Repository: Gradata/gradata
Length of output: 448
Global Bandit skips suppress 105 real violations, including 3 HIGH and 11 MEDIUM severity issues—consider targeted suppression for high-risk checks.
Lines 105–118 globally disable security checks for the entire codebase. With current configuration, 0 issues are flagged; without it, the baseline shows 105 violations (3 HIGH, 11 MEDIUM, 91 LOW). Most concerning:
- B110 (61 occurrences of try/except/pass) is genuinely pervasive—global skip is pragmatic here
- B603/B607 (subprocess without shell, partial path) appear 11 times combined in controlled contexts (git operations with list-format commands)
- B608 (SQL string construction) occurs 6 times, mostly for static schema operations with parameterized inputs
- B310 (URL opening) occurs 4 times with SSRF validation logic present
- B105 (hardcoded password) flags 6 false positives on string constants
The review comment's core concern is valid: global skips mask violations entirely and weaken enforcement. However, the practical path forward is more nuanced:
- Keep B110 global (61 occurrences of defensive try/except patterns are too numerous to annotate)
- Convert HIGH/MEDIUM violations to targeted # nosec comments (14 issues) with inline rationale
- Audit false positives (B105, B615) separately
Currently, zero # nosec comments exist in the codebase. Implementing targeted suppression is feasible for the high-risk checks but requires code changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` around lines 105 - 118, The pyproject.toml currently lists a
global Bandit "skips" array that mutes 105 real violations; remove the broad
skips for high/medium risk rules and convert them to targeted suppressions: keep
only "B110" in the global skips entry (since defensive try/except/pass is
pervasive), remove "B603","B607","B608","B310","B105","B615" from the skips
list, run Bandit to locate the specific occurrences of those removed rules, and
add inline "# nosec: <rule> — rationale" comments at each justified site (for
example next to subprocess calls, SQL schema code, URL-open or constant password
literals) with a brief explanation referencing the code context; ensure any
remaining true-security issues are fixed rather than suppressed and update
pyproject.toml's skips array to reflect only the accepted global exemptions.
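As a sketch of what one such targeted suppression looks like (the subprocess call and its rationale here are hypothetical illustrations, not code from the SDK), a rule-scoped `# nosec` comment mutes a single check at a single call site instead of disabling B603 globally:

```python
import subprocess
import sys

# Instead of a global Bandit skip for B603, annotate the specific call with a
# rationale. The rule ID after "nosec" limits the suppression to that check
# only; other findings on this line would still be reported.
result = subprocess.run(  # nosec B603: fixed argv list, shell=False, trusted interpreter path
    [sys.executable, "-c", "print('ok')"],
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()
```

Run against this file, Bandit would skip only B603 on the annotated line while continuing to scan everything else, which is the behavior the targeted-suppression approach relies on.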
- SQL: add UNIQUE(workspace_id, user_id) on brains table
- SQL: drop duplicate idx_brains_api_key (UNIQUE already creates index)
- SQL: pin search_path on SECURITY DEFINER trigger function
- SQL: create default brain in handle_new_user trigger
- brain.py: wrap patch_rule read-mutate-write under lessons_lock
- brain.py: protect emit from making successful patch look failed
- behavioral_extractor.py: align fallback _FACTUAL_RE with edit_classifier
- behavioral_extractor.py: remove capitalization bypass in _is_actionable
- conftest.py: mock select() now respects filters
- sync.py: preserve created_at from SDK payload on corrections/events
Co-Authored-By: Gradata <noreply@gradata.ai>
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/gradata/brain.py (1)
703-713: ⚠️ Potential issue | 🟡 Minor
Ensure DB connection closes on error paths in review_pending().
At Line 707, connection close happens only on success. If query/fetch throws before Line 712, the connection is left open.
🔧 Proposed fix
try:
    import sqlite3
    from gradata._db import get_connection
-    conn = get_connection(self.db_path)
-    conn.row_factory = sqlite3.Row
-    rows = conn.execute(
-        "SELECT * FROM pending_approvals WHERE resolution IS NULL "
-        "ORDER BY created_at DESC").fetchall()
-    conn.close()
+    with contextlib.closing(get_connection(self.db_path)) as conn:
+        conn.row_factory = sqlite3.Row
+        rows = conn.execute(
+            "SELECT * FROM pending_approvals WHERE resolution IS NULL "
+            "ORDER BY created_at DESC").fetchall()
    return [dict(r) for r in rows]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/brain.py` around lines 703 - 713, The review_pending() function opens a DB connection via get_connection(self.db_path) and currently calls conn.close() only after a successful execute/fetchall; wrap the query logic in a try/finally (or use a context manager) to ensure conn.close() is always called even if conn.execute(...) or fetchall() raises. Specifically, ensure the conn object created in review_pending() is closed in the finally block (or by using a with-style context) around the execute/fetchall and row_factory usage so no connections are leaked.
♻️ Duplicate comments (2)
src/gradata/enhancements/behavioral_extractor.py (1)
422-430: ⚠️ Potential issue | 🟡 Minor
Fix ruff SIM103: return the condition directly.
The logic correctly addresses the past review finding (removes capitalization bypass), but the pipeline fails due to the verbose return pattern.
🔧 Proposed fix to satisfy ruff SIM103
def _is_actionable(instruction: str) -> bool:
    if not instruction or len(instruction) < 5:
        return False
    first_word = instruction.split()[0].lower().removesuffix("'t")
-    if first_word in _IMPERATIVE_STARTERS:
-        return True
-    # Accept only known PREFIX_INSTRUCTION patterns (explicit user rules like
-    # "User corrected: ...") — don't treat arbitrary capitalized sentences as actionable
-    return False
+    return first_word in _IMPERATIVE_STARTERS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/enhancements/behavioral_extractor.py` around lines 422 - 430, In _is_actionable, simplify the final check by returning the boolean expression directly instead of the verbose if/return pattern: compute first_word as currently done (using instruction.split()[0].lower().removesuffix("'t")) and then return first_word in _IMPERATIVE_STARTERS; keep the early empty/short-string guard intact and preserve the comment about PREFIX_INSTRUCTION patterns.

cloud/app/routes/sync.py (1)
59-66: 🧹 Nitpick | 🔵 Trivial
Avoid single-letter variable l; it is easily confused with the digit 1.
Using l as an iterator variable is flagged by linters (E741) because it's visually ambiguous with the digit 1.
♻️ Rename to `lesson`
rows = [
    {
        "brain_id": brain_id,
-        "category": l.category,
-        "description": l.description,
-        "state": l.state.value,
-        "confidence": float(l.confidence),
-        "fire_count": l.fire_count,
-        "recurrence_days": l.recurrence_days,
+        "category": lesson.category,
+        "description": lesson.description,
+        "state": lesson.state.value,
+        "confidence": float(lesson.confidence),
+        "fire_count": lesson.fire_count,
+        "recurrence_days": lesson.recurrence_days,
    }
-    for l in body.lessons
+    for lesson in body.lessons
]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/app/routes/sync.py` around lines 59 - 66, The comprehension is using the ambiguous single-letter iterator `l`; rename it to `lesson` everywhere in that expression (e.g., change "for l in body.lessons" to "for lesson in body.lessons" and update all uses like l.category, l.description, l.state, l.confidence, l.fire_count, l.recurrence_days to lesson.category, lesson.description, lesson.state, lesson.confidence, lesson.fire_count, lesson.recurrence_days) so linters won't flag E741 and readability improves.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cloud/app/routes/sync.py`:
- Around line 88-100: The insert currently drops SDK-provided lesson refs by
always writing source_lesson_ids: []; update the rows construction to populate
source_lesson_ids from the incoming payload (use m.source_lesson_ids or the
appropriate field on body.meta_rules items, falling back to [] when missing) so
db.insert("meta_rules", rows) persists SDK-provided references; keep existing
brain_id/title/description handling and update meta_rules_count accordingly, and
if resolution is intentionally deferred add a TODO comment referencing
server-side resolution to make the behavior explicit.
- Around line 54-69: The upsert into the lessons table via db.upsert(rows) is
not idempotent because Postgres needs a unique constraint or an explicit
conflict target to merge duplicates; ensure conflicts are resolved by either
adding a UNIQUE constraint on (brain_id, description) in the DB schema/migration
for the lessons table, or change the upsert call to include an explicit conflict
target/ON CONFLICT clause (so db.upsert("lessons", rows,
conflict_target=["brain_id","description"]) or equivalent) so that db.upsert and
the INSERT/merge will update existing rows rather than inserting duplicates;
reference the lessons table and the db.upsert usage in the sync handler that
constructs rows with brain_id and description.
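The conflict-target requirement above can be sketched with SQLite, which shares Postgres's `ON CONFLICT (...) DO UPDATE` syntax; the table shape below is a simplified stand-in for the real lessons schema, not the migration's actual DDL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE lessons (
        id INTEGER PRIMARY KEY,
        brain_id TEXT NOT NULL,
        description TEXT NOT NULL,
        confidence REAL NOT NULL,
        UNIQUE (brain_id, description)  -- conflict target the upsert merges on
    )"""
)


def upsert_lesson(brain_id: str, description: str, confidence: float) -> None:
    # With the UNIQUE constraint in place, replaying the same sync payload
    # updates the existing row instead of inserting a duplicate.
    conn.execute(
        """INSERT INTO lessons (brain_id, description, confidence)
           VALUES (?, ?, ?)
           ON CONFLICT (brain_id, description)
           DO UPDATE SET confidence = excluded.confidence""",
        (brain_id, description, confidence),
    )


upsert_lesson("b1", "prefer uv over pip", 0.4)
upsert_lesson("b1", "prefer uv over pip", 0.9)  # replayed sync payload
rows = conn.execute("SELECT description, confidence FROM lessons").fetchall()
```

Without the `UNIQUE (brain_id, description)` constraint, the database has no conflict to detect, so the second call would simply insert a second row — which is exactly the non-idempotent behavior the comment flags.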
In `@cloud/migrations/001_initial_schema.sql`:
- Around line 199-206: The policy rule_patches_via_lesson on rule_patches only
allows access when b.user_id = auth.uid(), excluding workspace members; update
the USING clause to mirror the other table policies by permitting access if the
auth user is the brain owner OR is a workspace member of that brain (e.g., add
the same JOIN/EXISTS logic used in corrections/lessons/meta_rules/events to
check workspace_members for b.id and auth.uid()), so rule_patches_via_lesson
grants access to both brain owners and workspace members for lessons-linked
patches.
- Around line 218-234: The trigger body currently INSERTs into workspaces and
then re-queries that table twice with SELECT ... ORDER BY created_at DESC LIMIT
1; change this to capture the inserted workspace id directly (use INSERT ...
RETURNING id INTO a local variable or DECLARE a workspace_id and use RETURNING
INTO) immediately after the INSERT INTO workspaces (referencing the same INSERT
statement in the BEGIN block), then use that workspace_id variable for the
subsequent INSERT INTO workspace_members and INSERT INTO brains calls instead of
the two SELECT subqueries to avoid race conditions and improve efficiency.
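The capture-the-id fix above can be modeled outside PL/pgSQL. This sketch uses SQLite's `lastrowid` where the migration's trigger would use `INSERT ... RETURNING id INTO workspace_id`; the table shapes are simplified stand-ins for the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """CREATE TABLE workspaces (id INTEGER PRIMARY KEY, name TEXT);
       CREATE TABLE workspace_members (workspace_id INTEGER, user_id TEXT);
       CREATE TABLE brains (workspace_id INTEGER, user_id TEXT, name TEXT);"""
)


def handle_new_user(user_id: str) -> int:
    # Capture the new workspace id from the INSERT itself (lastrowid here;
    # RETURNING id INTO a local variable in PL/pgSQL) instead of re-selecting
    # with ORDER BY created_at DESC LIMIT 1, which races under concurrent
    # signups and pays for two extra queries.
    cur = conn.execute(
        "INSERT INTO workspaces (name) VALUES (?)", (f"{user_id}'s workspace",)
    )
    workspace_id = cur.lastrowid
    conn.execute(
        "INSERT INTO workspace_members (workspace_id, user_id) VALUES (?, ?)",
        (workspace_id, user_id),
    )
    conn.execute(
        "INSERT INTO brains (workspace_id, user_id, name) VALUES (?, ?, 'default')",
        (workspace_id, user_id),
    )
    return workspace_id


ws_a = handle_new_user("alice")
ws_b = handle_new_user("bob")
```

With the id captured at insert time, two signups landing in the same instant can never attach a member or default brain to each other's workspace.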
In `@cloud/tests/conftest.py`:
- Around line 49-50: The mock update(self, table: str, data: dict, filters: dict
| None = None) currently ignores filters and always returns [data]; change it to
respect filters or at least record them so tests can assert correct filter
usage: implement filtering logic similar to the mock select() (apply filters to
the stored rows for the given table and return only matching rows after applying
the updates) or append each call to a list like self._updates.append({"table":
table, "data": data, "filters": filters}) so tests can assert filter values;
update the update method and any test fixtures that rely on it (refer to update
and select in conftest.py).
In `@src/gradata/brain.py`:
- Around line 1149-1150: The "Re-export Pipeline type" block currently does
nothing; replace the no-op inside the contextlib.suppress(ImportError): block
with an actual re-export of the Pipeline type (e.g., import Pipeline from the
module that defines it and bind it in this module's namespace), and ensure the
symbol is included in this module's export list (__all__) if present; make the
import happen inside the existing contextlib.suppress(ImportError) so import
failures remain tolerated (refer to the existing contextlib.suppress usage and
the symbol name Pipeline to find where to add the import/re-export).
- Around line 346-357: After locating lessons_path via
self._find_lessons_path(), avoid mutating external/fallback lessons files by
adding a safety check before entering lessons_lock and calling
write_lessons_safe: verify lessons_path is inside the repository-controlled
location (or not equal to the known external fallback/read-only path) and if it
is external, return {"patched": False, "error": "readonly: external lessons
file"} instead of writing; update the code around _find_lessons_path,
lessons_lock, apply_patch, and write_lessons_safe to perform this guard so
apply_patch and format_lessons are only used when writes are permitted.
---
Outside diff comments:
In `@src/gradata/brain.py`:
- Around line 703-713: The review_pending() function opens a DB connection via
get_connection(self.db_path) and currently calls conn.close() only after a
successful execute/fetchall; wrap the query logic in a try/finally (or use a
context manager) to ensure conn.close() is always called even if
conn.execute(...) or fetchall() raises. Specifically, ensure the conn object
created in review_pending() is closed in the finally block (or by using a
with-style context) around the execute/fetchall and row_factory usage so no
connections are leaked.
---
Duplicate comments:
In `@cloud/app/routes/sync.py`:
- Around line 59-66: The comprehension is using the ambiguous single-letter
iterator `l`; rename it to `lesson` everywhere in that expression (e.g., change
"for l in body.lessons" to "for lesson in body.lessons" and update all uses like
l.category, l.description, l.state, l.confidence, l.fire_count,
l.recurrence_days to lesson.category, lesson.description, lesson.state,
lesson.confidence, lesson.fire_count, lesson.recurrence_days) so linters won't
flag E741 and readability improves.
In `@src/gradata/enhancements/behavioral_extractor.py`:
- Around line 422-430: In _is_actionable, simplify the final check by returning
the boolean expression directly instead of the verbose if/return pattern:
compute first_word as currently done (using
instruction.split()[0].lower().removesuffix("'t")) and then return first_word in
_IMPERATIVE_STARTERS; keep the early empty/short-string guard intact and
preserve the comment about PREFIX_INSTRUCTION patterns.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fcd68b89-b423-4da0-8269-bdf96f6f7e4f
📒 Files selected for processing (5)
cloud/app/routes/sync.py
cloud/migrations/001_initial_schema.sql
cloud/tests/conftest.py
src/gradata/brain.py
src/gradata/enhancements/behavioral_extractor.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (`from __future__ import annotations` required), no print() statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
src/gradata/enhancements/behavioral_extractor.py
src/gradata/brain.py
🪛 GitHub Actions: SDK CI
src/gradata/enhancements/behavioral_extractor.py
[error] 426-426: ruff check failed: SIM103 Return the condition directly. Inline condition.
🪛 SQLFluff (4.1.0)
cloud/migrations/001_initial_schema.sql
[error] 28-242: Line 28, Position 1: Found unparsable section: 'CREATE TABLE brains (\n id UUID PRIMAR...'
(PRS)
🔇 Additional comments (12)
src/gradata/enhancements/behavioral_extractor.py (4)
33-37: LGTM — fallback regex now covers common factual patterns. The expanded `_FACTUAL_RE` fallback now properly matches ISO dates, percentages, dollar amounts, URLs, and multi-digit numbers, aligning with the canonical `edit_classifier._FACTUAL_RE` behavior.

204-205: LGTM — `strict=True` documents the length invariant. Both iterables derive from the same source list, so lengths are guaranteed equal. The explicit `strict=True` serves as a correctness assertion.

208-222: LGTM — word-boundary matching prevents false positives. Using `re.search(r'\b...\b', ...)` on precomputed lowercase strings ensures case-insensitive matching while avoiding substring false positives (e.g., "must" in "mustard").

405-407: LGTM — extended imperative vocabulary improves extraction coverage. The additional verbs (`prioritize`, `emphasize`, `focus`, `simplify`, etc.) are common action words that expand the quality gate's acceptance of valid instructions.

src/gradata/brain.py (3)
350-368: Good hardening of patch flow and event emission. Keeping parse→apply→write under `lessons_lock` (Line 350) and making emit failures non-fatal (Lines 359-368) correctly avoids race-related clobbers and false failure reporting.

653-659: `sqlite3.Row` setup in `_resolve_pending` looks correct. The scoped `sqlite3` import and `conn.row_factory = sqlite3.Row` at Line 659 keep row access consistent and explicit.

970-970: Exception chaining in `export()` is correct. `raise RuntimeError(...) from e` at Line 970 preserves root cause while presenting a clearer API-level error.

cloud/app/routes/sync.py (2)
36-100: Partial failure leaves orphaned data — consider documenting or mitigating. As noted in a previous review, the sync endpoint performs multiple independent writes. If, for example, `db.insert("events", ...)` fails after corrections and lessons succeed, the client receives an error but partial data remains committed. PostgREST doesn't support multi-statement transactions via REST. Options to consider:
- Document the behavior so SDK clients know to retry the full sync on failure.
- Add idempotency keys to correction/event rows so retries can be deduplicated.
- Use a Postgres function (RPC) to wrap all inserts in a single transaction.

38-52: `created_at` preservation now implemented — previous concern addressed. The correction and event rows now conditionally include `created_at` from the payload (Lines 47 and 81), preserving SDK-provided timestamps. Also applies to: 72-86
cloud/migrations/001_initial_schema.sql (1)
28-37: Previous review findings have been addressed. The `UNIQUE (workspace_id, user_id)` constraint is now in place (Line 36), and the `SET search_path = public` has been added to the trigger function (Line 216). The duplicate `idx_brains_api_key` index has also been removed.

cloud/tests/conftest.py (2)
30-37: Previous review finding addressed — filter logic now implemented. The `select()` method correctly filters rows by matching all key-value pairs when `filters` is provided.

53-58: No action required — the monkeypatch approach is correct. The fixture properly targets `app.db._client`, which is a module-level singleton initialized to `None`. When tests run, the monkeypatch intercepts this variable before `get_db()` is called. The code structure (singleton pattern with lazy initialization) makes the monkeypatch effective, not problematic. The inline comment "Replaced by mock in tests" confirms this approach is intentional.

> Likely an incorrect or invalid review comment.
```python
# Upsert lessons (idempotent by description)
if body.lessons:
    rows = [
        {
            "brain_id": brain_id,
            "category": l.category,
            "description": l.description,
            "state": l.state.value,
            "confidence": float(l.confidence),
            "fire_count": l.fire_count,
            "recurrence_days": l.recurrence_days,
        }
        for l in body.lessons
    ]
    await db.upsert("lessons", rows)
    lessons_count = len(rows)
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Verify lessons table constraints in the migration
rg -A 15 "CREATE TABLE lessons" cloud/migrations/001_initial_schema.sql
```

Repository: Gradata/gradata
Length of output: 746

🏁 Script executed:

```shell
# Check for any indexes or constraints defined on the lessons table
rg "lessons" cloud/migrations/001_initial_schema.sql -A 2 -B 2
```

Repository: Gradata/gradata
Length of output: 1541

🏁 Script executed:

```shell
# Look for the db.upsert implementation to understand its behavior
fd -type f -name "*.py" | xargs rg -l "def upsert" | head -5
```

Repository: Gradata/gradata
Length of output: 425

🏁 Script executed:

```shell
# Read the db.upsert implementation
rg -A 20 "def upsert" cloud/app/db.py
```

Repository: Gradata/gradata
Length of output: 954
Lessons upsert lacks a unique constraint—db.upsert will insert duplicates.
The comment says "idempotent by description," but the lessons table schema has no UNIQUE constraint on (brain_id, description) or similar. PostgREST's upsert (via Prefer: resolution=merge-duplicates) requires a unique constraint to resolve conflicts; without one, the auto-generated UUID primary key is the only conflict point, so each sync inserts new rows instead of updating existing lessons with matching brain_id and description.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/app/routes/sync.py` around lines 54 - 69, The upsert into the lessons
table via db.upsert(rows) is not idempotent because Postgres needs a unique
constraint or an explicit conflict target to merge duplicates; ensure conflicts
are resolved by either adding a UNIQUE constraint on (brain_id, description) in
the DB schema/migration for the lessons table, or change the upsert call to
include an explicit conflict target/ON CONFLICT clause (so db.upsert("lessons",
rows, conflict_target=["brain_id","description"]) or equivalent) so that
db.upsert and the INSERT/merge will update existing rows rather than inserting
duplicates; reference the lessons table and the db.upsert usage in the sync
handler that constructs rows with brain_id and description.
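The idempotency requirement can be demonstrated locally; a sketch using SQLite's `ON CONFLICT` clause with a simplified stand-in schema (not the real Postgres migration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Without UNIQUE (brain_id, description), ON CONFLICT would have no target
# and every "upsert" would insert a fresh row. Schema is a simplified sketch.
conn.execute(
    """CREATE TABLE lessons (
        id INTEGER PRIMARY KEY,
        brain_id TEXT NOT NULL,
        description TEXT NOT NULL,
        confidence REAL,
        UNIQUE (brain_id, description)
    )"""
)

def upsert_lesson(brain_id: str, description: str, confidence: float) -> None:
    # The conflict target names the unique constraint, so repeats merge.
    conn.execute(
        """INSERT INTO lessons (brain_id, description, confidence)
           VALUES (?, ?, ?)
           ON CONFLICT (brain_id, description)
           DO UPDATE SET confidence = excluded.confidence""",
        (brain_id, description, confidence),
    )

upsert_lesson("b1", "prefer logging", 0.5)
upsert_lesson("b1", "prefer logging", 0.9)  # merges instead of duplicating
count, conf = conn.execute("SELECT COUNT(*), MAX(confidence) FROM lessons").fetchone()
```

The same principle applies to PostgREST: the unique constraint plus an explicit conflict target is what turns repeated syncs into updates rather than duplicates.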
```python
# Insert meta-rules
if body.meta_rules:
    rows = [
        {
            "brain_id": brain_id,
            "title": m.title,
            "description": m.description,
            "source_lesson_ids": [],  # Resolved server-side later
        }
        for m in body.meta_rules
    ]
    await db.insert("meta_rules", rows)
    meta_rules_count = len(rows)
```
source_lesson_ids hardcoded to empty—SDK payload field ignored.
The MetaRulePayload likely includes source_lesson_ids, but the insert always sets it to []. If the SDK sends lesson references, they're discarded. If resolution is truly deferred ("Resolved server-side later"), consider adding a TODO issue or documenting the expected follow-up.
Would you like me to open an issue to track server-side resolution of source_lesson_ids?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/app/routes/sync.py` around lines 88 - 100, The insert currently drops
SDK-provided lesson refs by always writing source_lesson_ids: []; update the
rows construction to populate source_lesson_ids from the incoming payload (use
m.source_lesson_ids or the appropriate field on body.meta_rules items, falling
back to [] when missing) so db.insert("meta_rules", rows) persists SDK-provided
references; keep existing brain_id/title/description handling and update
meta_rules_count accordingly, and if resolution is intentionally deferred add a
TODO comment referencing server-side resolution to make the behavior explicit.
```sql
CREATE POLICY rule_patches_via_lesson ON rule_patches
  FOR ALL USING (
    lesson_id IN (
      SELECT l.id FROM lessons l
      JOIN brains b ON b.id = l.brain_id
      WHERE b.user_id = auth.uid()
    )
  );
```
rule_patches policy is inconsistent with sibling tables—workspace members excluded.
The policies for corrections, lessons, meta_rules, and events all grant access to both brain owners and workspace members via a JOIN. However, rule_patches_via_lesson only checks brain ownership (b.user_id = auth.uid()), excluding team members who can view the parent lesson but not its patches.
🔧 Proposed fix to align with other policies

```diff
 CREATE POLICY rule_patches_via_lesson ON rule_patches
   FOR ALL USING (
     lesson_id IN (
       SELECT l.id FROM lessons l
       JOIN brains b ON b.id = l.brain_id
       WHERE b.user_id = auth.uid()
     )
+    OR lesson_id IN (
+      SELECT l.id FROM lessons l
+      JOIN brains b ON b.id = l.brain_id
+      JOIN workspace_members wm ON wm.workspace_id = b.workspace_id
+      WHERE wm.user_id = auth.uid()
+    )
   );
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sql
CREATE POLICY rule_patches_via_lesson ON rule_patches
  FOR ALL USING (
    lesson_id IN (
      SELECT l.id FROM lessons l
      JOIN brains b ON b.id = l.brain_id
      WHERE b.user_id = auth.uid()
    )
    OR lesson_id IN (
      SELECT l.id FROM lessons l
      JOIN brains b ON b.id = l.brain_id
      JOIN workspace_members wm ON wm.workspace_id = b.workspace_id
      WHERE wm.user_id = auth.uid()
    )
  );
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/migrations/001_initial_schema.sql` around lines 199 - 206, The policy
rule_patches_via_lesson on rule_patches only allows access when b.user_id =
auth.uid(), excluding workspace members; update the USING clause to mirror the
other table policies by permitting access if the auth user is the brain owner OR
is a workspace member of that brain (e.g., add the same JOIN/EXISTS logic used
in corrections/lessons/meta_rules/events to check workspace_members for b.id and
auth.uid()), so rule_patches_via_lesson grants access to both brain owners and
workspace members for lessons-linked patches.
```sql
BEGIN
  INSERT INTO workspaces (name, owner_id)
  VALUES (COALESCE(NEW.raw_user_meta_data->>'full_name', 'My Workspace'), NEW.id);

  INSERT INTO workspace_members (workspace_id, user_id, role)
  VALUES (
    (SELECT id FROM workspaces WHERE owner_id = NEW.id ORDER BY created_at DESC LIMIT 1),
    NEW.id,
    'owner'
  );

  INSERT INTO brains (workspace_id, user_id, name)
  VALUES (
    (SELECT id FROM workspaces WHERE owner_id = NEW.id ORDER BY created_at DESC LIMIT 1),
    NEW.id,
    'default'
  );
```
🧹 Nitpick | 🔵 Trivial
Trigger re-queries workspaces twice—use RETURNING or a variable instead.
The function inserts a workspace, then immediately runs two separate SELECT ... ORDER BY created_at DESC LIMIT 1 subqueries to retrieve the same workspace ID. This is fragile if concurrent inserts occur (unlikely but possible) and inefficient.
♻️ Proposed refactor using a local variable

```diff
 CREATE OR REPLACE FUNCTION handle_new_user()
 RETURNS trigger
 LANGUAGE plpgsql
 SECURITY DEFINER
 SET search_path = public
 AS $$
+DECLARE
+  new_workspace_id UUID;
 BEGIN
-  INSERT INTO workspaces (name, owner_id)
-  VALUES (COALESCE(NEW.raw_user_meta_data->>'full_name', 'My Workspace'), NEW.id);
+  INSERT INTO workspaces (name, owner_id)
+  VALUES (COALESCE(NEW.raw_user_meta_data->>'full_name', 'My Workspace'), NEW.id)
+  RETURNING id INTO new_workspace_id;
-  INSERT INTO workspace_members (workspace_id, user_id, role)
-  VALUES (
-    (SELECT id FROM workspaces WHERE owner_id = NEW.id ORDER BY created_at DESC LIMIT 1),
-    NEW.id,
-    'owner'
-  );
+  INSERT INTO workspace_members (workspace_id, user_id, role)
+  VALUES (new_workspace_id, NEW.id, 'owner');
-  INSERT INTO brains (workspace_id, user_id, name)
-  VALUES (
-    (SELECT id FROM workspaces WHERE owner_id = NEW.id ORDER BY created_at DESC LIMIT 1),
-    NEW.id,
-    'default'
-  );
+  INSERT INTO brains (workspace_id, user_id, name)
+  VALUES (new_workspace_id, NEW.id, 'default');
   RETURN NEW;
 END;
 $$;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/migrations/001_initial_schema.sql` around lines 218 - 234, The trigger
body currently INSERTs into workspaces and then re-queries that table twice with
SELECT ... ORDER BY created_at DESC LIMIT 1; change this to capture the inserted
workspace id directly (use INSERT ... RETURNING id INTO a local variable or
DECLARE a workspace_id and use RETURNING INTO) immediately after the INSERT INTO
workspaces (referencing the same INSERT statement in the BEGIN block), then use
that workspace_id variable for the subsequent INSERT INTO workspace_members and
INSERT INTO brains calls instead of the two SELECT subqueries to avoid race
conditions and improve efficiency.
```python
async def update(self, table: str, data: dict, filters: dict | None = None) -> list[dict]:
    return [data]
```
🧹 Nitpick | 🔵 Trivial
Mock update() ignores filters, may mask bugs in filter-dependent logic.
The mock returns [data] regardless of the filters parameter. While simple, this could mask issues where the wrong filter is passed (e.g., filtering by a non-existent ID). Consider returning filtered rows similar to select() or at least storing the filters for assertion.
♻️ Suggested improvement for better test fidelity

```diff
+    def __init__(self):
+        self._responses: dict[str, dict[str, list[Any]]] = defaultdict(lambda: defaultdict(list))
+        self._inserts: list[dict] = []
+        self._updates: list[tuple[str, dict, dict | None]] = []  # (table, data, filters)
+
     async def update(self, table: str, data: dict, filters: dict | None = None) -> list[dict]:
+        self._updates.append((table, data, filters))
         return [data]
```

Tests can then assert `mock._updates` contains expected filter values.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/tests/conftest.py` around lines 49 - 50, The mock update(self, table:
str, data: dict, filters: dict | None = None) currently ignores filters and
always returns [data]; change it to respect filters or at least record them so
tests can assert correct filter usage: implement filtering logic similar to the
mock select() (apply filters to the stored rows for the given table and return
only matching rows after applying the updates) or append each call to a list
like self._updates.append({"table": table, "data": data, "filters": filters}) so
tests can assert filter values; update the update method and any test fixtures
that rely on it (refer to update and select in conftest.py).
```python
with contextlib.suppress(ImportError):
    pass
```
Pipeline re-export is currently a no-op.
The block labeled “Re-export Pipeline type” only executes pass (Line 1150), so the advertised re-export is missing and may break import expectations.
🔧 Proposed fix

```diff
 # Re-export Pipeline type
 with contextlib.suppress(ImportError):
-    pass
+    from gradata.contrib.patterns.pipeline import Pipeline
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/brain.py` around lines 1149 - 1150, The "Re-export Pipeline type"
block currently does nothing; replace the no-op inside the
contextlib.suppress(ImportError): block with an actual re-export of the Pipeline
type (e.g., import Pipeline from the module that defines it and bind it in this
module's namespace), and ensure the symbol is included in this module's export
list (__all__) if present; make the import happen inside the existing
contextlib.suppress(ImportError) so import failures remain tolerated (refer to
the existing contextlib.suppress usage and the symbol name Pipeline to find
where to add the import/re-export).
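The guarded re-export pattern itself can be demonstrated with stand-in modules (`json` plays the role of the Pipeline module; `nonexistent_module` is deliberately fake to show the tolerated-failure branch):

```python
import contextlib

# Succeeds: the name is bound, as the Pipeline re-export should be.
with contextlib.suppress(ImportError):
    from json import loads as _loads  # stands in for: from ...pipeline import Pipeline

# Fails quietly: ImportError is suppressed and the name simply stays unbound.
with contextlib.suppress(ImportError):
    from nonexistent_module import Thing  # hypothetical, intentionally missing

HAS_LOADS = "_loads" in globals()
HAS_THING = "Thing" in globals()
```

The key point of the review: a bare `pass` inside the `suppress` block binds nothing, so callers never see the symbol even when the import would have succeeded.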
```python
rows = await db.select(
    "brains",
    columns="id,user_id,brain_name,domain,manifest,last_sync_at,created_at",
    filters={"user_id": brain["user_id"]},
)
```
**Non-existent columns queried — will 400 in production**
The `columns` string references `brain_name`, `domain`, and `manifest`, but none of those columns exist in the `brains` table. The schema defines the column as `name`, with no `domain` or `manifest` columns at all. PostgREST will return a `400 Bad Request` for every real call to this endpoint, even though the mocked test passes (the mock ignores the `columns` argument).
```python
rows = await db.select(
    "brains",
    columns="id,user_id,name,last_sync_at,created_at",
    filters={"user_id": brain["user_id"]},
)
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: cloud/app/routes/brains.py
Line: 42-46
Comment:
**Non-existent columns queried — will 400 in production**
The `columns` string references `brain_name`, `domain`, and `manifest`, but none of those columns exist in the `brains` table. The schema defines the column as `name`, with no `domain` or `manifest` columns at all. PostgREST will return a `400 Bad Request` for every real call to this endpoint, even though the mocked test passes (the mock ignores the `columns` argument).
```suggestion
rows = await db.select(
"brains",
columns="id,user_id,name,last_sync_at,created_at",
filters={"user_id": brain["user_id"]},
)
```
How can I resolve this? If you propose a fix, please make it concise.

```python
async def upsert(self, table: str, data: dict | list[dict]) -> list[dict]:
    """UPSERT rows (INSERT ... ON CONFLICT DO UPDATE)."""
    headers = {**self._headers, "Prefer": "return=representation,resolution=merge-duplicates"}
    resp = await self._http.post(f"/{table}", json=data, headers=headers)
    resp.raise_for_status()
    return resp.json()
```
**Upsert has no `on_conflict` target — lessons will always be duplicated**
`resolution=merge-duplicates` tells PostgREST to do an upsert, but without an `on_conflict` query parameter it defaults to the primary key (`id`). Since lesson rows have no pre-set `id`, there is never a conflict and every row is inserted fresh. Every `sync()` call appends duplicate lesson records — breaking the stated "idempotent by description" behaviour.
Two things are needed:
1. A unique constraint on `(brain_id, description)` in the migration.
2. Pass `on_conflict` as a query parameter in `db.upsert()`.
```python
async def upsert(
    self, table: str, data: dict | list[dict], on_conflict: str | None = None,
) -> list[dict]:
    headers = {**self._headers, "Prefer": "return=representation,resolution=merge-duplicates"}
    params: dict[str, str] = {}
    if on_conflict:
        params["on_conflict"] = on_conflict
    resp = await self._http.post(f"/{table}", json=data, headers=headers, params=params)
    resp.raise_for_status()
    return resp.json()
```

Then in `sync.py`:

```python
await db.upsert("lessons", rows, on_conflict="brain_id,description")
```

Prompt To Fix With AI
This is a comment left during a code review.
Path: cloud/app/db.py
Line: 50-55
Comment:
**Upsert has no `on_conflict` target — lessons will always be duplicated**
`resolution=merge-duplicates` tells PostgREST to do an upsert, but without an `on_conflict` query parameter it defaults to the primary key (`id`). Since lesson rows have no pre-set `id`, there is never a conflict and every row is inserted fresh. Every `sync()` call appends duplicate lesson records — breaking the stated "idempotent by description" behaviour.
Two things are needed:
1. A unique constraint on `(brain_id, description)` in the migration.
2. Pass `on_conflict` as a query parameter in `db.upsert()`.
```python
async def upsert(
self, table: str, data: dict | list[dict], on_conflict: str | None = None,
) -> list[dict]:
headers = {**self._headers, "Prefer": "return=representation,resolution=merge-duplicates"}
params: dict[str, str] = {}
if on_conflict:
params["on_conflict"] = on_conflict
resp = await self._http.post(f"/{table}", json=data, headers=headers, params=params)
resp.raise_for_status()
return resp.json()
```
Then in `sync.py`:
```python
await db.upsert("lessons", rows, on_conflict="brain_id,description")
```
How can I resolve this? If you propose a fix, please make it concise.| try: | ||
| claims = jose_jwt.decode(signed_jwt, key, algorithms=["HS256"]) | ||
| user_id = claims.get("sub") |
**JWT `aud` claim not validated**
`jose_jwt.decode` is called without an `audience` argument. Supabase JWTs carry `aud: "authenticated"`, and python-jose skips audience validation when the parameter is omitted. This means any JWT signed with the same HMAC key passes regardless of which service issued it.
```python
claims = jose_jwt.decode(
    signed_jwt, key, algorithms=["HS256"], audience="authenticated"
)
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: cloud/app/auth.py
Line: 33-35
Comment:
**JWT `aud` claim not validated**
`jose_jwt.decode` is called without an `audience` argument. Supabase JWTs carry `aud: "authenticated"`, and python-jose skips audience validation when the parameter is omitted. This means any JWT signed with the same HMAC key passes regardless of which service issued it.
```suggestion
claims = jose_jwt.decode(
signed_jwt, key, algorithms=["HS256"], audience="authenticated"
)
```
How can I resolve this? If you propose a fix, please make it concise.
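Why the omitted audience check is dangerous can be shown without python-jose; a stdlib-only sketch of HS256 verification with an explicit `aud` comparison (all helper names here are illustrative, not the service's code):

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def encode_hs256(claims: dict, key: str) -> str:
    """Build a signed HS256 JWT from a claims dict."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(key.encode(), f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"

def decode_hs256(token: str, key: str, audience: str) -> dict:
    """Verify signature, then enforce the aud claim explicitly."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(key.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    # This is the check that is skipped when audience validation is omitted:
    # a token signed with the same key but for a different service passes.
    if claims.get("aud") != audience:
        raise ValueError("invalid audience")
    return claims
```

With the `aud` comparison in place, a validly signed token minted for another service is rejected instead of silently authenticating.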
Summary
gd_xxx) + Supabase JWT (dashboard frontend)
Architecture
Test plan
Generated with Gradata
Greptile Summary
This PR introduces the full cloud backend for Gradata: a Supabase schema (8 tables, RLS policies, auto-create workspace trigger), a FastAPI server with dual auth (SDK API keys + Supabase JWT), and Railway deployment config. The architecture is clean and the overall design is solid, but there are three concrete bugs that need to be addressed before the endpoints work correctly against a real Supabase instance.
Key findings:
- `GET /api/v1/brains` will always 400 in production: `list_brains` queries columns `brain_name`, `domain`, and `manifest` which do not exist in the schema (the column is `name`). The mock-based tests pass because `MockSupabaseClient.select` ignores the `columns` argument.
- `db.upsert()` sends `Prefer: resolution=merge-duplicates` but omits the `on_conflict` query parameter. PostgREST defaults to the primary key (`id`), and since new lesson rows have no `id`, every sync inserts fresh duplicates. The migration also has no unique constraint on `(brain_id, description)`.
- `jose_jwt.decode` is called without `audience="authenticated"`, so any JWT signed with the same HMAC key passes regardless of which service issued it.

Confidence Score: 3/5
Not safe to merge — the brains list endpoint will 400 in production and every sync creates duplicate lesson rows.
Two P0/P1 bugs make two of the four API endpoints either broken or silently incorrect against a real database. The column-name mismatch in `list_brains` is straightforward to fix, as is the upsert issue (though it also requires a migration change). The JWT audience omission is a security gap. All three are concrete, verified problems.
Affected: `cloud/app/routes/brains.py` (wrong column names), `cloud/app/db.py` (missing `on_conflict`), `cloud/app/auth.py` (missing audience check), and `cloud/migrations/001_initial_schema.sql` (missing unique constraint on lessons).

Security Review
- `cloud/app/auth.py:33`: `jose_jwt.decode` omits `audience="authenticated"`, accepting any token signed with the Supabase HMAC key regardless of its intended purpose. Fix: pass `audience="authenticated"` to `decode()`.

Important Files Changed
- `list_brains` queries columns `brain_name`, `domain`, `manifest` which don't exist in the schema — PostgREST will 400 on every real call.
- `upsert` lacks an `on_conflict` query param, so lessons are always inserted as new rows rather than merged — defeats idempotency.
- Missing `audience` validation; otherwise well-structured dual-auth (API key + JWT) dependency.
- Sync updates `last_sync_at`; upsert idempotency depends on fixing `db.upsert`.
- `handle_new_user` trigger creates workspace + brain on signup; missing unique constraint on `(brain_id, description)` needed for lesson upsert.
- Settings cached with `lru_cache`, env prefix correctly scoped to `GRADATA_`.
- `MockSupabaseClient` with `monkeypatch` injection; test env vars correctly set to non-real placeholders.

Sequence Diagram
```mermaid
sequenceDiagram
    participant SDK as SDK (CloudClient)
    participant Dashboard as Dashboard (Next.js)
    participant FastAPI as FastAPI (Railway)
    participant Supabase as Supabase (PostgREST)
    SDK->>FastAPI: POST /api/v1/sync with SDK API key
    FastAPI->>Supabase: SELECT brains WHERE api_key matches
    Supabase-->>FastAPI: brain record
    FastAPI->>Supabase: INSERT corrections / events / meta_rules
    FastAPI->>Supabase: UPSERT lessons
    FastAPI->>Supabase: PATCH brains SET last_sync_at
    FastAPI-->>SDK: status ok with counts
    Dashboard->>FastAPI: GET /api/v1/brains with Supabase JWT
    FastAPI->>FastAPI: verify_jwt extracts user_id
    FastAPI->>Supabase: SELECT brains WHERE user_id matches
    Supabase-->>FastAPI: brain rows
    FastAPI-->>Dashboard: list of brains
    SDK->>FastAPI: POST /api/v1/brains/connect with SDK API key
    FastAPI->>Supabase: SELECT brains WHERE api_key matches
    Supabase-->>FastAPI: brain record
    FastAPI-->>SDK: brain_id and connected status
```

Prompt To Fix All With AI
Reviews (3): Last reviewed commit: "merge: resolve conflicts with main after..." | Re-trigger Greptile