[OPIK-6626] [BE] feat: support multiple environments per prompt version#6873
Conversation
thiagohora
left a comment
There was a problem hiding this comment.
A few follow-ups before moving forward
| 'type', pv.type, | ||
| 'version_type', pv.version_type, | ||
| 'environment', pv.environment, | ||
| 'environments', (SELECT JSON_ARRAYAGG(pve.environment) FROM prompt_version_envs pve WHERE pve.workspace_id = pv.workspace_id AND pve.version_id = pv.id AND pve.ended_at IS NULL), |
There was a problem hiding this comment.
Can you check the query plan? I think this will run once per row
There was a problem hiding this comment.
Updated the query
There was a problem hiding this comment.
Commit d1f5296 addressed this comment by introducing CTEs that materialize prompt versions and their active environments before the final SELECT, so the environment aggregation no longer runs per row but is computed once and joined back to the version records.
There was a problem hiding this comment.
After updating the query, I checked the query plan. All CTEs are run only once, and everything seems okay.
| -> Table scan on p (cost=46 rows=435)
-> Select #2 (subquery in projection; run only once)
-> Aggregate: count(pv.id) (cost=116 rows=1)
-> Filter: (pv.version_type = 'prompt_version') (cost=79.9 rows=366)
-> Table scan on pv (cost=79.9 rows=731)
-> Select #4 (subquery in projection; run only once)
-> Limit: 1 row(s)
-> Sort: id DESC, limit input to 1 row(s) per chunk
-> Stream results (cost=46.5 rows=366)
-> Left hash join (<hash>(ve.version_id)=<hash>(pv.id)), extra conditions: (ve.version_id = pv.id) (cost=46.5 rows=366)
-> Filter: (pv.version_type = 'prompt_version') (cost=79.9 rows=366)
-> Table scan on pv (cost=79.9 rows=731)
-> Hash
-> Table scan on ve (cost=3.1..3.1 rows=1)
-> Materialize CTE ver_envs if needed (cost=0.585..0.585 rows=1)
-> Group aggregate: json_arrayagg(pve.environment) (cost=0.485 rows=1)
-> Nested loop inner join (cost=0.385 rows=1)
-> Sort: version_id (cost=0.26 rows=1)
-> Filter: (pve.ended_at is null) (cost=0.26 rows=1)
-> Index lookup on pve using uq_pve_active_env (workspace_id='default') (cost=0.26 rows=1)
-> Single-row covering index lookup on pv using PRIMARY (id=pve.version_id) (cost=1.25 rows=1)
-> Select #9 (subquery in projection; run only once)
-> Left hash join (<hash>(ve.version_id)=<hash>(pv.id)), extra conditions: (ve.version_id = pv.id) (cost=46.5 rows=366)
-> Filter: (pv.version_type = 'prompt_version') (cost=79.9 rows=366)
-> Table scan on pv (cost=79.9 rows=731)
-> Hash
-> Table scan on ve (cost=3.1..3.1 rows=1)
-> Materialize CTE ver_envs if needed (query plan printed elsewhere) (cost=0.585..0.585 rows=1)
|
thiagohora
left a comment
There was a problem hiding this comment.
Two review notes (non-blocking):
-
Tests — the new environment tests assert only
.environments()rather than the wholePromptVersion. Reuse the existinggetPromptVersionAndAssert/usingRecursiveComparisonhelpers so other fields are covered too. (inline) -
Single-owner invariant — dropping the unique index moves this guarantee from the DB to the app lock. I verified the write paths and a DB-level unique constraint via a generated
active_envcolumn is fully compatible and would restore defense-in-depth. (inline)
andrescrz
left a comment
There was a problem hiding this comment.
There's a change in the contract for environments. I agree that it's for good reasons and a good one on the long term. But it breaks backwards compatibility. Can we do that? How do you ensure that we don't break existing clients? Are our clients only internal SDKs or FE? Or do we potentially have external clients?
@andrescrz This feature is not yet in production, so the current environment is not being used by Frontend or SDK. Breaking contract is safe here. |
andrescrz
left a comment
There was a problem hiding this comment.
No concern about the backwards compatibility after your explanation.
I gave a more specific review.
| --liquibase formatted sql | ||
| --changeset boryst:000077_add_prompt_version_envs_table | ||
|
|
||
| CREATE TABLE prompt_version_envs ( |
There was a problem hiding this comment.
Given the low number of envs (100). I'm not sure if a normalised schema here is beneficial over simply storing an array of values on the version. Likely, the number isn't going to increase.
I believe this needs more investigation for a proper decision.
There was a problem hiding this comment.
The reason for a separate table is as following:
-
Ownership tracking across versions. The ended_at column is what makes the "move environment" operation possible without losing history, was discussed with product team that history might be needed later. With an array on the version row, moving env=production from v2 to v3 means patching two rows (v2.environments -= production, v3.environments += production). With the current schema you soft-delete the old ownership row and insert a new one atomically — the history is preserved.
-
DB-level uniqueness. The generated column + unique index enforces "only one active version owns this environment per prompt" at the database level. With a JSON array you can't do that — you'd be back to app-level enforcement only.
-
Indexed ownership lookups. "Which version currently owns environment X for prompt Y?" is a single indexed point lookup against prompt_version_envs WHERE environment = :env AND ended_at IS NULL. With a JSON array on the version you'd need JSON_CONTAINS which is not indexable in MySQL.
andrescrz
left a comment
There was a problem hiding this comment.
The PR still has some blocking gaps. Please take a look at the specific comments. I leave the other reviewer @thiagohora to double check the queries as result of the plans review on PromptDAO.
|
|
||
| @SqlUpdate(""" | ||
| UPDATE prompt_version_envs | ||
| SET ended_at = CURRENT_TIMESTAMP(6) |
There was a problem hiding this comment.
I believe you should track the typical update metadata: last_updated_at and last_updated_by. Any reason for not doing this in the new prompt_version_envs table?
There was a problem hiding this comment.
No this is a desired behavior. With our typical update metadata, we will not be able to track the history. It will be just updated in place.
| created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6), | ||
| created_by VARCHAR(255) NOT NULL DEFAULT '', |
There was a problem hiding this comment.
Mentioned in another comment. There are update semantics. We likely require the typical metadata for that: last_updated at and by.
There was a problem hiding this comment.
We never update an environment in a regular way
There was a problem hiding this comment.
Not a blocker on my end, but you won't capture who closed the environment.
There was a problem hiding this comment.
Actually, we will be able to capture it. Whoever created the next environment is the one who closed this. So we will be able to retrieve this info.
andrescrz
left a comment
There was a problem hiding this comment.
On my end PR looks good and can go through. I recommend @thiagohora double checks the latest PromptDAO queries, as he requested the execution plan.
| created_at TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6), | ||
| created_by VARCHAR(255) NOT NULL DEFAULT '', |
There was a problem hiding this comment.
Not a blocker on my end, but you won't capture who closed the environment.
Details
Extends the prompt version environment model from a single
environmentfield to aSet<String> environments, allowing one version to be pinned to multiple environments simultaneously.Schema changes:
prompt_version_envstable (migration000077) tracks environment-to-version mappings with soft-delete viaended_at, replacing theenvironmentcolumn onprompt_versionsAPI changes:
PromptVersion.environment→environments: Set<String>(nullable, max 100 entries, each validated againstEnvironment.NAME_PATTERN)PromptVersionEnvironmentUpdate.environment→environments: Set<String>(non-null; empty set clears all; previously nullable string mapped to clear behavior)environmentssetsService / DAO changes:
PromptService.setVersionEnvironmentacceptsSet<String>and performs: close all current env assignments for the version, close any existing ownerships for the incoming envs on the same prompt, then batch-insert the new assignmentscreateVersionWithEnvironmentchecks for conflicts viafindTakenEnvironments(set-based) instead of single-version lookup; raises409listing all conflicting env namessavePromptVersionnow also callssaveEnvironmentsin the same transactionPromptVersionobjects use a correlated subquery (JSON_ARRAYAGG) to populateenvironmentsfrom the new tableEnvironmentService.findExistingNamesadded to validate the full set of requested envs before assignmentChange checklist
Issues
Testing
Integration tests in
PromptResourceTestupdated to cover:PUT /v1/private/prompts/versions/{id}/environmentenvironmentsis non-emptyDocumentation
No public documentation changes — internal API behavior change. Callers previously passing a single
environmentstring must migrate toenvironments: [<name>](orenvironments: []to clear).