
feat(assets): register output files as assets after prompt execution#12812

Draft
luke-mino-altherr wants to merge 26 commits into master from luke-mino-altherr/register-output-assets

@luke-mino-altherr
Contributor

Summary

Register output files as assets immediately after prompt execution, so a filesystem sweep is no longer needed to discover them.

Changes

  • app/assets/services/ingest.py — Add ingest_existing_file(), a public one-call wrapper that handles stat → BLAKE3 hash → MIME detection → path-based tag derivation → _ingest_file_from_path.
  • app/assets/services/__init__.py — Export ingest_existing_file.
  • main.py — Add _register_execution_outputs() that iterates history_result['outputs'], filters to type=="output" files, and calls ingest_existing_file() for each. Hooked after e.execute() while the seeder is paused.

How it works

After each prompt execution, the executor's history_result['outputs'] already contains a dict of node_id → ui_data with every file each output node wrote. We iterate that structure and register each file as an asset — no filesystem sweep needed.

  • Only registers files with type == "output" (skips temp/preview files)
  • Stores prompt_id in user_metadata for provenance
  • Runs while the asset seeder is paused, gated behind asset_seeder.is_disabled()
  • Deduplicates by BLAKE3 hash (same content → same Asset, new AssetReference)
  • Derives tags automatically from file path
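
The iteration described above can be sketched roughly as follows. This is a minimal illustration, not the code added to main.py: the shape of each ui_data entry (lists of dicts carrying "filename", "subfolder", and "type") is an assumption, and collect_output_files is a hypothetical helper name.

```python
import os
from typing import Any, Iterator


def collect_output_files(
    outputs: dict[str, dict[str, Any]], output_dir: str
) -> Iterator[str]:
    """Yield paths of files that output nodes reported with type == "output".

    Assumed shape: node_id -> ui_data, where ui_data maps a key such as
    "images" to a list of dicts with "filename", "subfolder", and "type".
    """
    for _node_id, ui_data in outputs.items():
        for items in ui_data.values():
            if not isinstance(items, list):
                continue
            for item in items:
                # Skip temp/preview files and entries that are not file dicts.
                if not isinstance(item, dict) or item.get("type") != "output":
                    continue
                filename = item.get("filename")
                if not filename:
                    continue
                yield os.path.join(output_dir, item.get("subfolder", ""), filename)
```

Each yielded path would then be handed to ingest_existing_file() together with the prompt_id metadata; the exact signature of that function is not shown in this description, so the call is left out of the sketch.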

luke-mino-altherr and others added 26 commits March 3, 2026 15:51
…ge migrations

- Split monolithic queries.py into modular query modules (asset, asset_reference, common, tags)
- Absorb bulk_ops.py and tags.py into query modules
- Merge migrations 0002-0005 into single migration (0002_merge_to_asset_references)
- Update models.py (merge AssetInfo/AssetCacheState into AssetReference)
- Enable SQLite foreign key enforcement
- Add comprehensive query-layer tests

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c917d-82b5-7448-a04f-9cd59c69d0a2
- Create services/ package: asset_management, bulk_ingest, file_utils, hashing, ingest, metadata_extract, path_utils, schemas, tagging
- Move business logic out of helpers.py into service modules
- Remove manager.py and hashing.py (absorbed into services)
- Add blake3 to requirements.txt
- Add comprehensive service-layer tests

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c9209-37af-757a-b6e4-af59b4267362
…andling

- Refactor routes.py to call service functions directly (no manager layer)
- Extract multipart upload parsing into upload.py
- Update API schemas
- Fix path traversal validation to return 400 instead of 500
- Rename test_tags.py to test_tags_api.py
- Update existing API-level tests

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c9209-37af-757a-b6e4-af59b4267362
- Rewrite scanner.py with two-phase scanning architecture (fast scan + enrich)
- Add AssetSeeder for non-blocking background startup scanning
- Implement pause/resume/stop/restart controls and disable/enable for --disable-assets-autoscan
- Add non-destructive asset pruning with is_missing flag
- Wire seeder into main.py and server.py lifecycle
- Skip hidden files/directories, populate mime_type, optional blake3 hashing
- Add comprehensive seeder tests

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c9209-37af-757a-b6e4-af59b4267362
list_files_recursively now uses followlinks=True so symlinked
directories under input/ and output/ roots are traversed, matching
the existing behavior of folder_paths.recursive_search for models.

Tracks (st_dev, st_ino) pairs of visited directories to detect and
break circular symlink loops safely.

Amp-Thread-ID: https://ampcode.com/threads/T-019c9220-21b8-7678-b428-9215ff1bb011
Co-authored-by: Amp <amp@ampcode.com>
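
For readers unfamiliar with the loop-detection technique mentioned in the commit above, here is a minimal sketch of walking a tree with followlinks=True while tracking (st_dev, st_ino) pairs. The function name is hypothetical and this is not the body of list_files_recursively; it only illustrates the idea.

```python
import os
from typing import Iterator


def walk_files_following_symlinks(root: str) -> Iterator[str]:
    """Yield file paths under root, visiting each real directory only once."""
    visited: set[tuple[int, int]] = set()
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        try:
            st = os.stat(dirpath)  # follows symlinks, so loops resolve to the same inode
        except OSError:
            dirnames[:] = []  # unreadable directory: do not descend
            continue
        key = (st.st_dev, st.st_ino)
        if key in visited:
            dirnames[:] = []  # already walked this directory: break the cycle
            continue
        visited.add(key)
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Clearing dirnames in place is what tells os.walk to stop descending, so a symlink that points back at an ancestor is entered once, recognised, and pruned.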
…odels_dir

get_comfy_models_folders() previously filtered by startswith(models_root),
excluding extra model paths outside the main models directory. Now includes
every category with non-empty paths from folder_names_and_paths.

Amp-Thread-ID: https://ampcode.com/threads/T-019c9224-d83c-7797-8c02-e1e1ae2ee452
Co-authored-by: Amp <amp@ampcode.com>
…ive safety

commonpath raises ValueError on Windows when comparing paths on different
drives (e.g. C:\models vs D:\extra_models). Replace all usages in the
asset scanner with Path.is_relative_to() which handles cross-drive paths,
case-insensitivity, and prefix traps natively without try/except.

Amp-Thread-ID: https://ampcode.com/threads/T-019c9224-d83c-7797-8c02-e1e1ae2ee452
Co-authored-by: Amp <amp@ampcode.com>
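
The difference described in the commit above can be reproduced on any platform by using the Windows flavours of the standard path modules. The sketch below is illustrative only; is_under is a hypothetical helper, not a function from the asset scanner.

```python
import ntpath
from pathlib import PureWindowsPath

# ntpath is the Windows implementation of os.path, importable everywhere.
try:
    ntpath.commonpath([r"C:\models", r"D:\extra_models\lora.safetensors"])
    commonpath_raised = False
except ValueError:
    commonpath_raised = True  # paths on different drives cannot be compared


def is_under(child: str, root: str) -> bool:
    """Return True if child equals root or lies inside it (Windows semantics)."""
    return PureWindowsPath(child).is_relative_to(PureWindowsPath(root))


cross_drive = is_under(r"D:\extra_models\lora.safetensors", r"C:\models")
prefix_trap = is_under(r"C:\models_backup\a.bin", r"C:\models")
case_variant = is_under(r"c:\MODELS\a.bin", r"C:\models")
```

Path.is_relative_to() requires Python 3.9 or newer. It compares whole path components, which is why C:\models_backup is not treated as being inside C:\models.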
- Filter hidden files/directories (dot-prefixed) in collect_models_files()
  using is_visible(), matching the existing behavior for input/output roots
- Exclude the 'custom_nodes' folder name from get_comfy_models_folders();
  custom nodes that register their own paths under other folder names
  will still be scanned as expected

Amp-Thread-ID: https://ampcode.com/threads/T-019c924b-591a-725e-b8b7-0d49ba1a5591
Co-authored-by: Amp <amp@ampcode.com>
…frontend

Replace --disable-assets-autoscan with --enable-assets so the assets
system (API routes, database sync, background scanning) is off by
default and must be explicitly opted into. Expose the flag as an
"assets" entry in SERVER_FEATURE_FLAGS so the frontend can read it
from GET /features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix missing import for compute_filename_for_reference in ingest.py
- Apply code review fixes across routes, queries, scanner, seeder,
  hashing, ingest, path_utils, main, and server
- Update and add tests for sync references and seeder

Amp-Thread-ID: https://ampcode.com/threads/T-019cb61a-ed54-738c-a05f-9b5242e513f3
Co-authored-by: Amp <amp@ampcode.com>
- Reject path separators (/, \, os.sep) in tag components for defense-in-depth
- Add comment explaining double-relpath normalization trick
- Add _require_assets_feature_enabled decorator returning 503 when disabled
- Call asset_seeder.disable() when --enable-assets is not passed
- Add iter_chunks to bulk_update_needs_verify, bulk_update_is_missing,
  and delete_references_by_ids to respect SQLite bind param limits
- Fix CacheStateRow.size_bytes NULL coercion (0 -> None) to avoid
  false needs_verify flags on assets with unknown size
- Add PermissionError catch in delete_asset_tags route (403 vs 500)
- Add hash-is-None guard in delete_orphaned_seed_asset
- Validate from_asset_id in reassign_asset_references
- Initialize _prune_first in __init__, remove getattr workaround
- Cap error accumulation in _add_error to 200
- Remove confirmed dead code: seed_assets, compute_filename_for_asset,
  ALLOWED_ROOTS, AssetNotFoundError, SetTagsResult, update_enrichment_level,
  Asset.to_dict, AssetReference.to_dict, _AssetSeeder.enable

Amp-Thread-ID: https://ampcode.com/threads/T-019cb610-1b55-74b6-8dbb-381d73c387c0
Co-authored-by: Amp <amp@ampcode.com>
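
As a rough illustration of the chunking mentioned in the commit above: SQLite caps the number of bound parameters per statement (999 by default before version 3.32.0), so large IN (...) lists have to be split. The sketch below assumes a hypothetical refs table and a hypothetical chunk size; only the name iter_chunks comes from the commit, and its real signature may differ.

```python
import sqlite3
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical budget kept below SQLite's historical default of 999.
MAX_BIND_PARAMS = 900


def iter_chunks(items: Sequence[T], size: int = MAX_BIND_PARAMS) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of items, each at most size long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def mark_missing(conn: sqlite3.Connection, ids: Sequence[int]) -> None:
    """Set is_missing on every listed row, one bounded statement per chunk."""
    for chunk in iter_chunks(ids):
        placeholders = ",".join("?" * len(chunk))
        conn.execute(
            f"UPDATE refs SET is_missing = 1 WHERE id IN ({placeholders})",
            list(chunk),
        )
```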
- Extract validate_blake3_hash() into helpers.py, used by upload, schemas, routes
- Extract get_reference_with_owner_check() into queries, used by 4 service functions
- Extract build_prefix_like_conditions() into queries/common.py, used by 3 queries
- Replace 3 inlined tag queries with get_reference_tags() calls
- Consolidate AddTagsDict/RemoveTagsDict TypedDicts into AddTagsResult/RemoveTagsResult
  dataclasses, eliminating manual field copying in tagging.py
- Make iter_row_chunks delegate to iter_chunks
- Inline trivial compute_filename_for_reference wrapper (unused session param)
- Remove mark_assets_missing_outside_prefixes pass-through in bulk_ingest.py
- Clean up unused imports (os, time, dependencies_available)
- Disable assets routes on DB init failure in main.py

Amp-Thread-ID: https://ampcode.com/threads/T-019cb649-dd4e-71ff-9a0e-ae517365207b
Co-authored-by: Amp <amp@ampcode.com>
…s DB conflicts

When two ComfyUI processes share the same database file but point to
different input/output/model directories, each process's scan marks
the other's assets as missing, causing unreliable asset visibility.
This adds an exclusive lock so the second process fails fast at startup
with a clear message to use --database-url.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Content-Disposition: drop raw filename= parameter, use only RFC 5987
  filename*=UTF-8'' to prevent header injection via ; and special chars
- delete_asset: default delete_content to False (non-destructive) when
  query parameter is omitted
- create_asset_from_hash: return 400 MISSING_INPUT instead of 404 when
  hash not found and no file uploaded (client input error, not missing resource)
- seeder: clear _progress when returning to IDLE so get_status() does not
  return stale progress after scan completion
- hashing: handle non-seekable streams in _hash_file_obj by checking
  seekable() before attempting tell/seek
- bulk_ingest: filter lost_paths to only include paths tied to actually
  inserted asset IDs, preventing inflated counts from ON CONFLICT drops

Amp-Thread-ID: https://ampcode.com/threads/T-019cb67a-9822-7438-ab05-d09991a9f7f3
Co-authored-by: Amp <amp@ampcode.com>
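
To make the Content-Disposition change in the commit above concrete, here is a minimal sketch of building a header that carries only the RFC 5987 extended parameter. The helper name is hypothetical and the route code may construct the header differently.

```python
from urllib.parse import quote


def content_disposition(filename: str, disposition: str = "attachment") -> str:
    """Build a Content-Disposition value using only filename*=UTF-8''...

    Percent-encoding every reserved character means ';', '"', and control
    characters in the filename cannot terminate or extend the header.
    """
    encoded = quote(filename, safe="")  # UTF-8 percent-encoding, nothing left raw
    return f"{disposition}; filename*=UTF-8''{encoded}"


header = content_disposition('evil;name".png')
```

Here header is "attachment; filename*=UTF-8''evil%3Bname%22.png", so the semicolon and quote arrive as data rather than as header syntax.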
The previous commit acquired the exclusive lock before Alembic migrations,
but Alembic opens its own connection — which was then blocked by our lock.
Move lock acquisition to after migrations complete in a dedicated connection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add deleted_at column to AssetReference model and migration
- soft_delete_reference_by_id sets deleted_at instead of removing rows
- DELETE /api/assets/{id} defaults to soft-delete; delete_content=true
  for hard-delete
- Add deleted_at IS NULL filters to read queries, tag queries, and
  scanner queries so soft-deleted refs are invisible
- restore_references_by_paths skips soft-deleted refs
- upsert_reference clears deleted_at on explicit re-ingest
- Add tests for soft-delete API behavior, scanner persistence, bulk
  insert, enrichment exclusion, and seed asset garbage collection

Amp-Thread-ID: https://ampcode.com/threads/T-019cb6fc-c05c-761f-b855-6d5d1c9defa2
Co-authored-by: Amp <amp@ampcode.com>
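
The soft-delete behaviour in the commit above can be sketched with plain sqlite3. The table layout and function signatures below are assumptions made for illustration; the PR itself uses SQLAlchemy models and its own query helpers.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: only the columns the sketch needs.
conn.execute(
    "CREATE TABLE asset_references ("
    "id INTEGER PRIMARY KEY, file_path TEXT NOT NULL, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO asset_references (id, file_path) VALUES (?, ?)",
    [(1, "output/a.png"), (2, "output/b.png")],
)


def soft_delete_reference_by_id(conn: sqlite3.Connection, ref_id: int) -> bool:
    """Stamp deleted_at instead of removing the row; True if a row changed."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE asset_references SET deleted_at = ? "
        "WHERE id = ? AND deleted_at IS NULL",
        (now, ref_id),
    )
    return cur.rowcount == 1


def list_visible_paths(conn: sqlite3.Connection) -> list[str]:
    """Read query with the deleted_at IS NULL filter applied."""
    rows = conn.execute(
        "SELECT file_path FROM asset_references "
        "WHERE deleted_at IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]
```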
- Use filelock (FileLock) instead of PRAGMA locking_mode=EXCLUSIVE to
  prevent multi-process database access. The OS automatically releases
  the lock when the process exits, even on crashes or Ctrl+C.
- Add friendly error messages for database-is-locked and general
  database init failures when --enable-assets is set.
- Exit the process instead of silently disabling assets when the user
  explicitly passed --enable-assets and the database fails.
- Add filelock to requirements.txt.

Amp-Thread-ID: https://ampcode.com/threads/T-019cbab8-50d4-748c-9669-2506575dda44
Co-authored-by: Amp <amp@ampcode.com>
…g logs

- Add debug timing logs for each fast scan sub-step (sync_root, collect_paths, build_asset_specs) and info-level total timing
- Refactor enrich_asset to accept a session parameter instead of creating one per file
- enrich_assets_batch now opens one session for the entire batch, committing after each asset to keep transactions short
- Simplify enrichment tests by removing create_session mocking

Amp-Thread-ID: https://ampcode.com/threads/T-019cbb0b-8563-7199-b628-33e3c4fe9f41
Co-authored-by: Amp <amp@ampcode.com>
- Add HashCheckpoint dataclass for saving/resuming interrupted hash computations
- compute_blake3_hash now accepts interrupt_check and checkpoint parameters
- Returns (digest, None) on completion or (None, checkpoint) on interruption
- Update ingest.py caller to handle new tuple return type

Amp-Thread-ID: https://ampcode.com/threads/T-019cbb0b-8563-7199-b628-33e3c4fe9f41
Co-authored-by: Amp <amp@ampcode.com>
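
A rough sketch of the interruptible hashing contract described in the commit above, returning (digest, None) on completion or (None, checkpoint) on interruption. To stay dependency-free it substitutes hashlib.blake2b for BLAKE3, and the checkpoint fields, parameter names, and chunk size are assumptions rather than the PR's actual definitions.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class HashCheckpoint:
    bytes_processed: int
    mtime_ns: int
    file_size: int
    hasher: Any  # partially updated hash object


def compute_hash(
    path: str,
    interrupt_check: Optional[Callable[[], bool]] = None,
    checkpoint: Optional[HashCheckpoint] = None,
    chunk_size: int = 1024 * 1024,
) -> tuple[Optional[str], Optional[HashCheckpoint]]:
    """Hash a file in chunks, stopping early if interrupt_check() is true."""
    st = os.stat(path)  # fresh stat, so a changed file invalidates the checkpoint
    if checkpoint is not None and (
        (checkpoint.mtime_ns, checkpoint.file_size) == (st.st_mtime_ns, st.st_size)
    ):
        hasher, offset = checkpoint.hasher, checkpoint.bytes_processed
    else:
        hasher, offset = hashlib.blake2b(), 0  # stand-in for BLAKE3
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            if interrupt_check is not None and interrupt_check():
                return None, HashCheckpoint(offset, st.st_mtime_ns, st.st_size, hasher)
            chunk = f.read(chunk_size)
            if not chunk:
                return hasher.hexdigest(), None
            hasher.update(chunk)
            offset += len(chunk)
```

A caller would keep the returned checkpoint and pass it back in later; if the file's size or modification time changed in between, the sketch restarts from the beginning.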
Move mimetypes.init() and all custom type registrations from server.py
and metadata_extract.py into a single init_mime_types() function called
once at startup in main.py.

Amp-Thread-ID: https://ampcode.com/threads/T-019cbb2a-513a-7458-9962-b4100e4f124d
Co-authored-by: Amp <amp@ampcode.com>
…lized

- Add _init_memory_db() path using Base.metadata.create_all + StaticPool
  since Alembic migrations don't work with in-memory SQLite (each
  connection gets its own separate database)
- Call init_mime_types() at module load in metadata_extract so custom
  types like application/safetensors are always registered

Amp-Thread-ID: https://ampcode.com/threads/T-019cbb5f-13d1-7429-8cfd-815625c4d032
Co-authored-by: Amp <amp@ampcode.com>
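
The in-memory SQLite behaviour that the commit above works around is easy to demonstrate with the standard library alone. The table name is arbitrary and the snippet is only an illustration of the underlying behaviour, not of _init_memory_db() itself.

```python
import sqlite3


def table_names(conn: sqlite3.Connection) -> list[str]:
    """List user tables visible on this connection."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


first = sqlite3.connect(":memory:")
first.execute("CREATE TABLE assets (id INTEGER PRIMARY KEY)")

# A second connection to ":memory:" opens a brand-new, empty database.
second = sqlite3.connect(":memory:")

tables_on_first = table_names(first)
tables_on_second = table_names(second)
```

Because the schema created on one connection is invisible on another, a migration tool that opens its own connection leaves the application's database empty. SQLAlchemy's StaticPool keeps a single connection for all requests, which is why creating the tables through that same engine works.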
1. Seeder pause/resume: only resume after prompt execution if pause()
   returned True, preventing undo of user-initiated pauses.

2. Missing rollback in enrich_assets_batch: add sess.rollback() in
   exception handler to prevent broken session state for subsequent
   batch operations.

3. Hash checkpoint validation: store mtime_ns/file_size in
   HashCheckpoint and re-stat on resume instead of comparing the same
   stat result to itself.

4. Scan progress preserved: save _last_progress before clearing
   _progress in finally blocks so wait=true endpoint returns final
   stats instead of zeros.

5. Download XSS hardening: block dangerous MIME types (matching
   server.py) and add X-Content-Type-Options: nosniff header to
   asset content endpoint.

Amp-Thread-ID: https://ampcode.com/threads/T-019cbb6b-e97b-776d-8c43-2de8acd0d09e
Co-authored-by: Amp <amp@ampcode.com>
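
Item 1 of the commit above relies on pause() reporting whether the call actually changed state. A minimal sketch of that pattern follows; the Seeder class here is a hypothetical stand-in and does not reproduce the real AssetSeeder.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class Seeder:
    """Hypothetical stand-in exposing only pause/resume state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.paused = False

    def pause(self) -> bool:
        """Pause the seeder; return True only if this call paused it."""
        with self._lock:
            if self.paused:
                return False
            self.paused = True
            return True

    def resume(self) -> None:
        with self._lock:
            self.paused = False


def run_with_seeder_paused(seeder: Seeder, work: Callable[[], T]) -> T:
    """Run work() with the seeder paused, restoring only what we changed."""
    paused_by_us = seeder.pause()
    try:
        return work()
    finally:
        if paused_by_us:
            seeder.resume()  # never undo a pause someone else requested
```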
Add ingest_existing_file() to services/ingest.py as a public one-call
wrapper for registering on-disk files (stat, BLAKE3 hash, MIME detection,
path-based tag derivation).

After each prompt execution in the main loop, iterate
history_result['outputs'] and register files with type 'output' as
assets. Runs while the asset seeder is paused, gated behind
asset_seeder.is_disabled(). Stores prompt_id in user_metadata for
provenance tracking.

Amp-Thread-ID: https://ampcode.com/threads/T-019cc013-1444-73c8-81d6-07cae6e5e38d
Co-authored-by: Amp <amp@ampcode.com>
ingest_existing_file() now inserts a stub record (hash=NULL) first for
instant UX visibility, then computes the BLAKE3 hash and runs the full
ingest pipeline. No compute_hash flag exposed — both phases always run.

Amp-Thread-ID: https://ampcode.com/threads/T-019cc013-1444-73c8-81d6-07cae6e5e38d
Co-authored-by: Amp <amp@ampcode.com>
ingest_existing_file() now only inserts a stub record (hash=NULL) for
instant UX visibility. After registering outputs, triggers
asset_seeder.start_enrich() to compute hashes in the background.
This avoids blocking the prompt worker thread on hash computation.

Amp-Thread-ID: https://ampcode.com/threads/T-019cc013-1444-73c8-81d6-07cae6e5e38d
Co-authored-by: Amp <amp@ampcode.com>
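
The two-phase flow in the last commits (insert a stub with hash=NULL, then hash in the background) might look roughly like the sketch below. Everything here is a simplified assumption: an in-memory dict replaces the database, hashlib.blake2b stands in for BLAKE3, and the class and method names other than start_enrich are hypothetical.

```python
import hashlib
import os
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssetRecord:
    path: str
    size_bytes: int
    content_hash: Optional[str] = None  # None marks a stub awaiting enrichment


class Registry:
    """Hypothetical in-memory stand-in for the asset tables."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.records: dict[str, AssetRecord] = {}

    def register_stub(self, path: str) -> AssetRecord:
        """Phase 1: cheap insert so the file is visible immediately."""
        record = AssetRecord(path, os.stat(path).st_size)
        with self._lock:
            self.records[path] = record
        return record

    def enrich_pending(self) -> None:
        """Phase 2: hash every stub; runs off the prompt worker thread."""
        with self._lock:
            pending = [r for r in self.records.values() if r.content_hash is None]
        for record in pending:
            hasher = hashlib.blake2b()  # stand-in for BLAKE3
            with open(record.path, "rb") as f:
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    hasher.update(chunk)
            with self._lock:
                record.content_hash = hasher.hexdigest()

    def start_enrich(self) -> threading.Thread:
        thread = threading.Thread(target=self.enrich_pending, daemon=True)
        thread.start()
        return thread
```

The design point is that register_stub only needs a stat call, so the prompt worker returns quickly, while the expensive read-and-hash step happens on another thread.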