Skip to content

feat(assets): collapse nested asset path into a single slash-joined tag#13994

Open
mattmillerai wants to merge 12 commits into
masterfrom
matt/asset-tags-cloud-shape
Open

feat(assets): collapse nested asset path into a single slash-joined tag#13994
mattmillerai wants to merge 12 commits into
masterfrom
matt/asset-tags-cloud-shape

Conversation

@mattmillerai
Copy link
Copy Markdown
Contributor

@mattmillerai mattmillerai commented May 19, 2026

Currently in draft while smoke testing the round-trip.

TL;DR

Local tag emission now matches cloud's slash-joined shape (["models", "diffusers/kolors/text_encoder"] instead of one tag per directory), and nested category paths also store the standalone bucket so the FE include_tags=models,checkpoints set filter keeps matching → ["models", "checkpoints/flux", "checkpoints"]. Retrieval is now insertion-ordered via per-tag added_at stagger.

What changed

  • Shape: parent subpath collapses into a single slash-joined, lowercased tag at tag[1]. Same uniform behavior for input/output.
  • Bucket-prefix expansion: when tag[1] is <known-bucket>/<rest>, the standalone bucket is also stored. Free-form user labels with slashes whose first segment is not a registered bucket (my-org/team-a) pass through unchanged.
  • Order: retrieval is now ORDER BY added_at, tag_name (replacing the alphabetical sort) at every per-asset response path — list_references_page, fetch_reference_asset_and_tags, and get_reference_tags all agree.
  • Within-batch order: every write path (batch_insert_seed_assets, set_reference_tags, add_tags_to_reference) staggers added_at by microseconds per tag and inserts in caller order, so the expanded bucket token lands at path-tier and user-added tags reliably land after both path tags and the expansion.
  • Inverse function + upload validation: resolve_destination_from_tags and the POST /api/assets category validator both accept the slash-joined shape, the legacy one-tag-per-directory shape, and any hybrid mix.

Net result: tag[0] is always the root, tag[1] is the slash-joined category subpath (matching cloud's positional contract), and the standalone bucket token is present in the set whenever the path is nested. Positional reads (MODEL_NODE_MAPPINGS[tag[1]]) and set-membership reads (include_tags=models,checkpoints) both work without slash-aware logic on the FE.

Behavior

Input path Before (master) After (this PR)
input/foo.png ["input"] ["input"]
output/00001.png ["output"] ["output"]
models/checkpoints/flux.safetensors ["models", "checkpoints"] ["models", "checkpoints"]
models/checkpoints/flux/foo.safetensors ["models", "checkpoints", "flux"] ["models", "checkpoints/flux", "checkpoints"]
models/checkpoints/some/nested/path/deep.safetensors ["models", "checkpoints", "some", "nested", "path"] ["models", "checkpoints/some/nested/path", "checkpoints"]
models/diffusers/Kolors/text_encoder/model.safetensors ["models", "diffusers", "kolors", "text_encoder"] ["models", "diffusers/kolors/text_encoder", "diffusers"]
models/loras/my/custom/path/v0001.safetensors ["models", "loras", "my", "custom", "path"] ["models", "loras/my/custom/path", "loras"]
input/portraits/2026/foo.png ["input", "portraits", "2026"] ["input", "portraits/2026"]

Path-derived tags come from get_name_and_tags_from_asset_path; the standalone-bucket token is added at the write layer via expand_bucket_prefixes, so HTTP uploads, path-scanning ingest, and user-tag mutations all converge on the same canonical shape.

Scope notes

  • Models keep working with include_tags=models,<bucket> set filters. The standalone bucket token in the stored set means the FE combo-widget query for any registered category (checkpoints, loras, diffusers, LLM, …) matches nested files the same way it matches flat ones — addresses the regression Simon flagged in review.
  • Input/output subpaths are NOT expanded. portraits isn't a registered bucket, so input/portraits/2026/foo.png stays ["input", "portraits/2026"]. Anyone filtering input/output with include_tags=input,portraits still needs to switch to include_tags=input/portraits.
  • Case was dropped (path-derived emission). An earlier revision tried to preserve case on the subpath, which broke the ingest path: ensure_tags_exist lowercases the tags row, but bulk insert wrote the FK with original case → FK violation on any uppercase nested path. Lowercase on both sides is the consistent fix. Cloud preserves case (e.g. LLM/Qwen-VL/...); aligning that is tracked as a follow-up.
  • Pre-existing rows retain alphabetical order. Rows ingested before this PR share a single added_at per batch, so retrieval falls back to the tag-name tiebreaker. No backfill migration shipped.

Smoke test

Through the real HTTP layer (POST /api/assetsPOST /api/assets/{id}/tagsGET /api/assets/{id}):

step tags returned
upload with tags=["models", "checkpoints", "unit-tests", "smoke"] ["models", "checkpoints", "unit-tests", "smoke"]
upload with tags=["models", "checkpoints/flux"] (nested, cloud shape) ["models", "checkpoints/flux", "checkpoints"]
GET /api/assets?include_tags=models&include_tags=checkpoints against the nested upload returns the nested asset (the FE combo-widget contract case)
POST /tags with ["aaa-user-tag"] path tags first, user tag at the end
POST /tags with ["zzz-z", "favorite", "aaa-experiment"] path tags first, then user batch (microsecond stagger preserves intra-batch order)

resolve_destination_from_tags resolves all shapes to the same destination:

tags base subdirs
["models", "diffusers/kolors/text_encoder"] (new) models/diffusers ["kolors", "text_encoder"]
["models", "diffusers", "kolors", "text_encoder"] (legacy) models/diffusers ["kolors", "text_encoder"]
["models", "no_such_cat/foo"] ValueError: unknown model category

Test plan

  • TestGetNameAndTagsFromAssetPath — flat input/output/models, diffusers nested, deep user-subpath.
  • TestResolveDestinationFromTags — both tag shapes (models + input + output), hybrid, unknown category, traversal rejection.
  • TestExpandBucketPrefixes (new) — flat unchanged, nested inserts bucket, deeply-nested expands only first segment, unknown prefix passes through, idempotent, doesn't duplicate existing bucket, preserves caller order.
  • TestBucketPrefixExpansion (new, in queries/test_tags.py) — set_reference_tags, add_tags_to_reference; idempotent replay, dedupe across multiple calls, unknown prefix pass-through.
  • TestBucketPrefixExpansionOnIngest (new, in services/test_bulk_ingest.py) — batch_insert_seed_assets for both flat and nested paths; FK pre-registration for expanded tokens.
  • TestTagRetrievalOrderset_reference_tags, add_tags_to_reference, add/remove cycles; asserts insertion order survives the DB round-trip with tag names that would fail loudly under alphabetical regression.
  • test_user_tag_http_smoke.py — full HTTP round-trip, including new test_nested_checkpoint_satisfies_fe_set_filter exercising the reviewer-flagged FE filter case end-to-end.
  • Updated integration tests in test_assets_missing_sync.py, test_crud.py, test_prune_orphaned_assets.py to use the slash-joined include_tags shape.
  • Full tests-unit/assets_test/ suite: 355 passed, 10 pre-existing skipped.

The /api/assets response previously emitted one tag per parent directory
between the root category and the filename. For nested categories like
diffusers, this produced ["models", "diffusers", "Kolors", "text_encoder"]
where consumers that look up a category via tags[1] would only see the
top-level bucket name and miss the model-specific sub-path that uniquely
identifies the component.

This collapses the parent subpath into a single slash-joined tag so the
result is ["models", "diffusers/Kolors/text_encoder"]. Consumers can now
read tags[1] as a stable category identifier regardless of how deep the
file lives in the bucket. Case is preserved on the subpath so providers
keyed on the original-case path (e.g. "diffusers/Kolors/text_encoder")
resolve correctly.

Same shape applies uniformly:

- input/foo.png                              -> ["input"]
- output/00001.png                           -> ["output"]
- models/checkpoints/flux.safetensors        -> ["models", "checkpoints"]
- models/diffusers/Kolors/text_encoder/m.sft -> ["models", "diffusers/Kolors/text_encoder"]
- models/loras/my/custom/path/v1.safetensors -> ["models", "loras/my/custom/path"]

Integration tests that filtered by individual subdirectory tags
(`include_tags=unit-tests,scope`) updated to use the new slash-joined
shape (`include_tags=unit-tests/scope`). Unit tests cover flat input,
flat output, flat models, diffusers-style nested, and deep user-subpath
cases.
…se refactor

normalize_tags lowercased every tag, which would have stripped case from
the slash-joined subpath (e.g. "diffusers/Kolors/text_encoder" ->
"diffusers/kolors/text_encoder") and broken consumer lookups keyed on
the original-case path. The refactored implementation inlines a strip +
dedup so the import is no longer needed.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR changes tag handling so nested parent directories collapse into a single slash-joined parent-subpath tag and get_name_and_tags_from_asset_path returns the filename plus [root, optional lowercased parent_subpath]. resolve_destination_from_tags now accepts legacy one-tag-per-directory, slash-joined, or hybrid inputs by expanding slash-joined entries. Tag insertion now staggers AssetReferenceTag.added_at by microseconds so queries ordering by added_at (with name as tiebreaker) preserves insertion order. Multiple unit and HTTP tests were updated or added to validate extraction, resolution, API include_tags usage, and deterministic ordering.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: collapsing nested asset paths into slash-joined tags.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description comprehensively explains the changes, providing detailed tables, behavior examples, scope notes, and smoke test validation directly related to the codebase modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

The /api/assets response previously sorted tags alphabetically via
.order_by(Tag.name.asc()). That breaks the structurally meaningful
"root category first, then subpath" invariant the path-collapsing
change relies on: alphabetical sort puts a custom user tag (or even
the bare "models" root) at unpredictable positions, so positional
access like tags[1] is not reliable on local.

Cloud already preserves insertion order — its Ent WithTags() eager-
load has no explicit ORDER BY, so Postgres returns rows in physical
insertion order. Local's composite primary key on
(asset_reference_id, tag_name) means SQLite walks the index in
tag_name order even without an explicit ORDER BY, so just dropping
the clause isn't enough.

Switching to ORDER BY added_at ASC, tag_name ASC keeps the path
tags inserted via set_reference_tags in their original order
(microsecond-resolution timestamps disambiguate same-batch inserts;
tag_name is a deterministic tiebreaker for the rare collision case).
Custom tags added later via add_tags_to_reference land after the
path tags in their own added_at bucket.

Applies to both response-shaping queries:
- list_references_page (GET /api/assets, tag_map join)
- fetch_reference_asset_and_tags (GET /api/assets/{id})

Catalog/histogram queries in app/assets/database/queries/tags.py
keep their alphabetical sort — those endpoints are listing all tags,
not per-asset tags, and alphabetical is the right shape there.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
app/assets/database/queries/asset_reference.py (1)

330-335: Consider indexing for the new ordering columns.

The updated ordering by (added_at, tag_name) may benefit from a composite index on AssetReferenceTag(asset_reference_id, added_at, tag_name) to avoid in-memory sorting when fetching tags for multiple references. This applies to both list_references_page and fetch_reference_asset_and_tags queries.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/assets/database/queries/asset_reference.py` around lines 330 - 335, Add a
composite DB index to support the new ordering to avoid in-memory sorts: create
an index on AssetReferenceTag covering (asset_reference_id, added_at, tag_name).
Update migration/schema so queries that call
.order_by(AssetReferenceTag.added_at.asc(), AssetReferenceTag.tag_name.asc()) —
notably list_references_page and fetch_reference_asset_and_tags — can use the
index; ensure the index is named clearly, tested, and added to your migration
files so the ORM and DB will use it for those queries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@app/assets/database/queries/asset_reference.py`:
- Around line 330-335: Add a composite DB index to support the new ordering to
avoid in-memory sorts: create an index on AssetReferenceTag covering
(asset_reference_id, added_at, tag_name). Update migration/schema so queries
that call .order_by(AssetReferenceTag.added_at.asc(),
AssetReferenceTag.tag_name.asc()) — notably list_references_page and
fetch_reference_asset_and_tags — can use the index; ensure the index is named
clearly, tested, and added to your migration files so the ORM and DB will use it
for those queries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4d937c06-86f0-4de0-9391-0eea8abd6113

📥 Commits

Reviewing files that changed from the base of the PR and between 7ab346f and 36f9a6f.

📒 Files selected for processing (1)
  • app/assets/database/queries/asset_reference.py

@mattmillerai mattmillerai marked this pull request as draft May 20, 2026 03:25
…er added_at

Three bugs surfaced by an end-to-end smoke test of the read+write
round-trip; all in this PR's scope.

1. FK violation on uppercase paths
   get_name_and_tags_from_asset_path was preserving case on the
   subpath (e.g. "diffusers/Kolors/text_encoder"). ensure_tags_exist
   lowercases via normalize_tags before inserting into the tags
   table, so the asset_reference_tags.tag_name FK to tags.name
   failed for any path containing uppercase letters — including
   the diffusers case the PR was designed to support.

   Fix: lowercase the slash-joined subpath in
   get_name_and_tags_from_asset_path to match the canonicalization
   ensure_tags_exist applies. Providers keyed on original-case
   subpaths need to normalize their lookup key to lowercase.

2. resolve_destination_from_tags rejected the new tag shape
   The inverse function only accepted the legacy one-tag-per-dir
   shape (["models", "diffusers", "Kolors", "text_encoder"]).
   An upload using the slash-joined shape returned by /api/assets
   raised "unknown model category" or "invalid path component".

   Fix: pre-split every entry after tags[0] on "/" so both shapes
   resolve identically. For models, the first expanded segment is
   the category and the rest are subdirs; for input/output the
   full expansion becomes the subdirs.

3. Within-batch tag order was lost
   bulk_ingest wrote every tag in a single batch with the same
   added_at = current_time. The retrieval ORDER BY added_at, tag_name
   then fell back to the tag_name tiebreaker, sorting the path-derived
   pair alphabetically — putting "checkpoints/..." ahead of "models"
   since "c" < "m". The tags[0] = root contract was lost on bulk-
   ingested rows.

   Fix: stagger added_at by microseconds per tag index within a
   reference so the retrieval order matches the input list order.
   Path-derived tags now consistently land in position-0 = root,
   position-1 = subpath.

Tests
- TestGetNameAndTagsFromAssetPath updated: subpath is now lowercase.
- New TestResolveDestinationFromTags covers both tag shapes, the
  unknown-category case for slash-joined input, traversal rejection,
  and input/output paths.
- Full suite: 333 passed, 10 pre-existing skipped.
Cursor-reviews follow-up on PR #13994:

1. set_reference_tags / add_tags_to_reference now apply the same
   microsecond stagger as batch_insert_seed_assets. Per-tag get_utc_now()
   calls can collide at microsecond resolution on fast machines, dropping
   retrieval to the tag_name alphabetical tiebreaker. Using a single
   base_ts + timedelta(microseconds=i) preserves insertion order for any
   batch.

2. Docstring on get_name_and_tags_from_asset_path corrected: only the
   subpath is lowercased in code; the root category is lowercase by
   construction in get_asset_category_and_relative_path.

3. resolve_destination_from_tags docstring now states explicitly that
   hybrid shapes (mix of legacy multi-tag + new slash-joined within a
   single call) are accepted and resolve to the same destination.

4. New TestTagRetrievalOrder class in test_asset_info.py exercises the
   public write paths (set_reference_tags, add_tags_to_reference,
   remove_tags_from_reference) and asserts the public read paths
   (list_references_page, fetch_reference_asset_and_tags) return tags
   in insertion order rather than alphabetical. Tag names are chosen
   to fail loudly under alphabetical regression — "checkpoints" sorts
   before "models", "aaa-user-tag" sorts before every path tag, etc.

Full assets suite: 338 passed, 10 pre-existing skipped.
@mattmillerai mattmillerai marked this pull request as ready for review May 20, 2026 04:07
…esponse helper

Smoke test through the real HTTP upload + tag-add path exposed two
ordering bugs the unit-layer tests missed:

1. add_tags_to_reference did `to_add = sorted(want - current)` — an
   alphabetical pre-sort defeating the microsecond-stagger fix from the
   previous commit. The stagger was encoding alphabetical positions,
   not the caller's insertion order. Fix: build to_add by walking the
   already-normalized caller list and filtering against the current
   set, so the staggered added_at timestamps reflect what the caller
   actually requested.

2. get_reference_tags used .order_by(tag_name.asc()) — alphabetical.
   It's called by the upload response path; meanwhile
   list_references_page and fetch_reference_asset_and_tags were already
   updated to order by added_at. The mismatch meant POST /api/assets
   returned tags in alphabetical order but a subsequent GET returned
   them in insertion order. Fix: order get_reference_tags by added_at
   too, so all three response-path helpers agree.

New tests-unit/assets_test/test_user_tag_http_smoke.py exercises the
full HTTP layer: POST /api/assets to upload, POST /api/assets/{id}/tags
to add a user tag (using tag names like "aaa-user-tag" that would jump
to position 0 under alphabetical), GET /api/assets/{id} to verify
ordering. Catches the bugs above in CI going forward.

Full assets suite: 340 passed, 10 pre-existing skipped.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/assets/database/queries/tags.py (1)

160-173: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve caller order before timestamp staggering

Line 157 builds to_add via sorted(want - current), so the new staggered added_at on Lines 160-173 preserves alphabetical order, not input batch order. If input-order retention is intended, build to_add from norm while filtering membership.

💡 Proposed fix
-    want = set(norm)
-    to_add = sorted(want - current)
+    want = set(norm)
+    to_add = [t for t in norm if t not in current]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/assets/database/queries/tags.py` around lines 160 - 173, The current
construction of to_add (sorted(want - current)) causes the staggered added_at
timestamps in AssetReferenceTag (in set_reference_tags) to reflect alphabetical
order instead of the caller's input order; change the to_add creation to iterate
over norm (the normalized input batch) and filter out tags already in current so
the list preserves caller order, then use that to compute base_ts via
get_utc_now() and the per-item microsecond stagger for added_at when building
AssetReferenceTag entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@app/assets/database/queries/tags.py`:
- Around line 160-173: The current construction of to_add (sorted(want -
current)) causes the staggered added_at timestamps in AssetReferenceTag (in
set_reference_tags) to reflect alphabetical order instead of the caller's input
order; change the to_add creation to iterate over norm (the normalized input
batch) and filter out tags already in current so the list preserves caller
order, then use that to compute base_ts via get_utc_now() and the per-item
microsecond stagger for added_at when building AssetReferenceTag entries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2fb774e6-838d-4a04-a1fa-f6ca7c910d3f

📥 Commits

Reviewing files that changed from the base of the PR and between 36f9a6f and 7ff001d.

📒 Files selected for processing (5)
  • app/assets/database/queries/tags.py
  • app/assets/services/bulk_ingest.py
  • app/assets/services/path_utils.py
  • tests-unit/assets_test/queries/test_asset_info.py
  • tests-unit/assets_test/services/test_path_utils.py

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests-unit/assets_test/test_user_tag_http_smoke.py (1)

66-89: ⚡ Quick win

Consider fetching initial tags to avoid hardcoding the path-tag count.

Line 85 assumes exactly 4 path tags by computing len({"models", "checkpoints", "unit-tests", "smoke"}). This works if the uploaded tags aren't collapsed, but it's fragile—if the tag-derivation logic changes or if explicit tags are handled differently in the future, the test could break silently or produce confusing failures.

♻️ More robust approach

Fetch the initial tags before adding user tags, similar to the first test:

 def test_user_tag_batch_lands_after_path_tags_via_http(
     http: requests.Session, api_base: str, smoke_asset: dict
 ):
     ref_id = smoke_asset["id"]
+    initial_tags = _fetch_asset_tags(http, api_base, ref_id)
 
     # Add three user tags in a single request, in non-alphabetical input
     # order. They should all land after the path tags (microsecond stagger
     # in set_reference_tags / add_tags_to_reference is what makes this
     # work — without it, "aaa" would jump to position 0).
     r = http.post(
         f"{api_base}/api/assets/{ref_id}/tags",
         json={"tags": ["zzz-z", "favorite", "aaa-experiment"]},
         timeout=30,
     )
     assert r.status_code in (200, 201), r.text
 
     tags_after = _fetch_asset_tags(http, api_base, ref_id)
     assert tags_after[0] == "models"
     assert tags_after[1] == "checkpoints"
-    user_tail = tags_after[len({"models", "checkpoints", "unit-tests", "smoke"}):]
+    user_tail = tags_after[len(initial_tags):]
     assert set(user_tail) >= {"zzz-z", "favorite", "aaa-experiment"}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests-unit/assets_test/test_user_tag_http_smoke.py` around lines 66 - 89, The
test test_user_tag_batch_lands_after_path_tags_via_http hardcodes the number of
path tags via len({"models", "checkpoints", "unit-tests", "smoke"}) which is
fragile; instead call _fetch_asset_tags(http, api_base, ref_id) to capture the
initial tags before posting user tags, record that initial length (or the set of
path tags), then POST the new tags and compute user_tail using that initial
length (or by subtracting the initial set) so assertions (including
tags_after.index checks) compare against the real pre-existing path tag
positions rather than a hardcoded count.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests-unit/assets_test/test_user_tag_http_smoke.py`:
- Around line 66-89: The test test_user_tag_batch_lands_after_path_tags_via_http
hardcodes the number of path tags via len({"models", "checkpoints",
"unit-tests", "smoke"}) which is fragile; instead call _fetch_asset_tags(http,
api_base, ref_id) to capture the initial tags before posting user tags, record
that initial length (or the set of path tags), then POST the new tags and
compute user_tail using that initial length (or by subtracting the initial set)
so assertions (including tags_after.index checks) compare against the real
pre-existing path tag positions rather than a hardcoded count.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ef1b149b-048f-433a-b0ee-52fd1cc33e5d

📥 Commits

Reviewing files that changed from the base of the PR and between 7ff001d and 00940fb.

📒 Files selected for processing (2)
  • app/assets/database/queries/tags.py
  • tests-unit/assets_test/test_user_tag_http_smoke.py

Copy link
Copy Markdown

@synap5e synap5e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR breaks the asset tag contract required by the frontend.
The tags should include the root asset type and the model directory key - not the model’s subpath or path fragments.

Breaking example

The frontend asset-backed model combo widgets query model asset widgets query assets by exact tag membership, e.g. include_tags=models,checkpoints for CheckpointLoaderSimple. That means a checkpoint asset must have both the literal models tag and the literal model-directory/category tag checkpoints.

This PR changes scanned model tags so subfolders become part of the category tag:

models/checkpoints/flux/foo.safetensors
→ tags: ["models", "checkpoints/flux"]

That asset will no longer match the FE query

include_tags=models,checkpoints

Observed here in cloud:
Image

Additionally, the code lowercases path-derived category tags. That breaks mixed-case canonical model directory keys used by the frontend mapping, for example:

CogVideo/GGUF
FlashVSR
SEEDVR2
LLM/Qwen-VL/...

These become lowercased tags such as cogvideo/gguf, which will not match the FE’s MODEL_NODE_MAPPINGS keys when creating nodes from assets.

Path-derived tags for nested model layouts (e.g.
models/checkpoints/flux/foo.safetensors) emitted only the slash-joined
shape `["models", "checkpoints/flux"]`, which broke the frontend
combo-widget set-membership filter `include_tags=models,checkpoints` —
the literal `checkpoints` token was no longer present in the asset's
tag set.

Add `expand_bucket_prefixes` at the tag-write layer. When a tag's first
slash segment is a registered model category (or input/output/temp
root), the bucket is inserted as a standalone token immediately after
the slash-joined form. This preserves tag[1] as the slash-joined
positional contract cloud emits while restoring the set-membership
token the frontend filter requires.

The expansion is bounded to known buckets so free-form user labels
with slashes (`my-org/team-a`) pass through unchanged. The helper is
applied uniformly in `set_reference_tags`, `add_tags_to_reference`,
and `batch_insert_seed_assets` so HTTP uploads, user-tag mutations,
and path-scanning ingest all converge on the same canonical shape.

Also align the upload-route category validator with
`resolve_destination_from_tags` by extracting the first slash segment
of tag[1], so HTTP uploads matching cloud's slash-joined emission
shape are no longer rejected as `unknown models category`.
…k collisions

The per-tag microsecond stagger preserves intra-batch order, but two
back-to-back write batches on the same reference (e.g.
set_reference_tags for path tags, then add_tags_to_reference for user
tags) call get_utc_now() independently. On Windows the system clock can
return the same datetime for both calls if no OS tick elapsed between
the commits — both batches end up sharing microseconds and
ORDER BY added_at, tag_name falls back to the alphabetic tiebreaker,
sorting user tags ahead of path tags they were meant to follow.

Add _next_added_at_base(reference_id) that reads max(existing added_at)
and returns max(existing + 1us, get_utc_now()), guaranteeing the new
batch sorts strictly after anything previously written for that
reference. Used by set_reference_tags and add_tags_to_reference;
batch_insert_seed_assets stays on raw get_utc_now() since seed inserts
are always the first writes for a new reference.

The accompanying regression test pins get_utc_now() to a frozen value
so the previously-Windows-only race becomes a platform-independent
failure mode under test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants