fix: dataset QA follow-ups (nil tags, counts, provider match)#909
Conversation
Default nil example tags to an empty array so manual/import adds no longer 500, include active example and version counts on dataset get, and reject generation jobs when the provider account does not match the selected model alias.
Greptile SummaryThis PR fixes three dataset QA regressions: nil tags on example upsert causing HTTP 500, missing
Confidence Score: 3/5The nil-tags fix and count-aggregation changes are safe to merge; the provider mismatch validation is incomplete because the HTTP handler was not updated. The provider mismatch validation added in StartDatasetGeneration will return HTTP 500 on mismatch instead of the intended 400, because handleDatasetGenerationError only string-matches a narrow hardcoded list of validation messages and the two new ones are absent. The manager-layer test passes, but the actual API surface will behave incorrectly for callers hitting a mismatch. backend/internal/api/datasets_generations.go — the handleDatasetGenerationError switch needs cases for the two new provider mismatch error messages (or, better, sentinel errors).
|
| Filename | Overview |
|---|---|
| backend/internal/api/datasets_generations.go | Adds provider-key and provider-account-ID validation before creating a generation job, but the HTTP error handler is not updated to map the new errors to 400 — they will return 500 in production. |
| backend/internal/api/datasets_generations_test.go | Adds a test for provider mismatch and fixes the happy-path test to include matching provider keys; tests the manager layer only, not the HTTP handler status code. |
| backend/internal/repository/datasets.go | Adds nil→empty-slice normalization for tags to fix the HTTP 500 on tagless upserts, and refactors list/get-by-id mappers to share a common detail-row mapping function. |
| backend/db/queries/datasets.sql | Rewrites GetDatasetByID to LEFT JOIN dataset_examples and dataset_versions and aggregate counts, matching the list-query pattern. |
| backend/internal/repository/sqlc/datasets.sql.go | SQLC-generated file updated to return a new GetDatasetByIDRow struct with ActiveExampleCount and VersionCount fields. |
| backend/internal/repository/sqlc/querier.go | Updates the Querier interface to reflect the new GetDatasetByID return type. |
Sequence Diagram
sequenceDiagram
participant CLI
participant Handler as startDatasetGenerationHandler
participant Manager as DatasetManager.StartDatasetGeneration
participant DB
CLI->>Handler: "POST /datasets/{id}/generate"
Handler->>Manager: StartDatasetGeneration(input)
Manager->>DB: GetProviderAccountByID
DB-->>Manager: providerAccount
Manager->>DB: GetModelAliasByID
DB-->>Manager: modelAlias
Manager->>Manager: check providerAccount.WorkspaceID
Manager->>Manager: check EqualFold(providerAccount.ProviderKey, modelAlias.CatalogProviderKey)
alt provider key mismatch
Manager-->>Handler: errors.New("provider_account provider does not match...")
Handler->>Handler: handleDatasetGenerationError (no case matches)
Handler-->>CLI: HTTP 500 (intended: 400)
else provider account ID mismatch
Manager-->>Handler: errors.New("provider_account_id does not match...")
Handler->>Handler: handleDatasetGenerationError (no case matches)
Handler-->>CLI: HTTP 500 (intended: 400)
else all checks pass
Manager->>DB: CreateDatasetGenerationJob
DB-->>Manager: job
Manager-->>Handler: job
Handler-->>CLI: HTTP 202 Accepted
end
Comments Outside Diff (1)
-
backend/internal/api/datasets_generations.go, line 257-258 (link)Provider mismatch errors fall through to HTTP 500
The two new
errors.New(...)returns added in this PR —"provider_account provider does not match model alias provider"and"provider_account_id does not match model alias provider account"— are not enumerated inhandleDatasetGenerationError. That switch's string-comparison case covers only"target_count must be between 1 and 100"and"dataset must have at least one active seed example", so both new messages hit thedefaultbranch, which callshandleDatasetError, which logs and returns HTTP 500. The PR description says these should return 400, and the test validates the manager layer but not the HTTP status code.Prompt To Fix With AI
This is a comment left during a code review. Path: backend/internal/api/datasets_generations.go Line: 257-258 Comment: **Provider mismatch errors fall through to HTTP 500** The two new `errors.New(...)` returns added in this PR — `"provider_account provider does not match model alias provider"` and `"provider_account_id does not match model alias provider account"` — are not enumerated in `handleDatasetGenerationError`. That switch's string-comparison case covers only `"target_count must be between 1 and 100"` and `"dataset must have at least one active seed example"`, so both new messages hit the `default` branch, which calls `handleDatasetError`, which logs and returns HTTP 500. The PR description says these should return 400, and the test validates the manager layer but not the HTTP status code. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
backend/internal/api/datasets_generations.go:257-258
**Provider mismatch errors fall through to HTTP 500**
The two new `errors.New(...)` returns added in this PR — `"provider_account provider does not match model alias provider"` and `"provider_account_id does not match model alias provider account"` — are not enumerated in `handleDatasetGenerationError`. That switch's string-comparison case covers only `"target_count must be between 1 and 100"` and `"dataset must have at least one active seed example"`, so both new messages hit the `default` branch, which calls `handleDatasetError`, which logs and returns HTTP 500. The PR description says these should return 400, and the test validates the manager layer but not the HTTP status code.
Reviews (1): Last reviewed commit: "fix: harden dataset example upserts and ..." | Re-trigger Greptile
Summary
[]on upsert sodataset example addwithout--tagno longer returns HTTP 500.active_example_countandversion_countonGET /datasets/{id}(same aggregation as list).Test plan
go test -short -race -count=1 ./internal/api ./internal/repositorydataset example addwithout tags,dataset viewcounts, mismatcheddataset generate→ 400