feat(assets): register output files as assets after prompt execution #12812
Draft
luke-mino-altherr wants to merge 26 commits into master
…ge migrations
- Split monolithic queries.py into modular query modules (asset, asset_reference, common, tags)
- Absorb bulk_ops.py and tags.py into query modules
- Merge migrations 0002-0005 into single migration (0002_merge_to_asset_references)
- Update models.py (merge AssetInfo/AssetCacheState into AssetReference)
- Enable SQLite foreign key enforcement
- Add comprehensive query-layer tests
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c917d-82b5-7448-a04f-9cd59c69d0a2
- Create services/ package: asset_management, bulk_ingest, file_utils, hashing, ingest, metadata_extract, path_utils, schemas, tagging
- Move business logic out of helpers.py into service modules
- Remove manager.py and hashing.py (absorbed into services)
- Add blake3 to requirements.txt
- Add comprehensive service-layer tests
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c9209-37af-757a-b6e4-af59b4267362
…andling
- Refactor routes.py to call service functions directly (no manager layer)
- Extract multipart upload parsing into upload.py
- Update API schemas
- Fix path traversal validation to return 400 instead of 500
- Rename test_tags.py to test_tags_api.py
- Update existing API-level tests
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c9209-37af-757a-b6e4-af59b4267362
- Rewrite scanner.py with two-phase scanning architecture (fast scan + enrich)
- Add AssetSeeder for non-blocking background startup scanning
- Implement pause/resume/stop/restart controls and disable/enable for --disable-assets-autoscan
- Add non-destructive asset pruning with is_missing flag
- Wire seeder into main.py and server.py lifecycle
- Skip hidden files/directories, populate mime_type, optional blake3 hashing
- Add comprehensive seeder tests
Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019c9209-37af-757a-b6e4-af59b4267362
list_files_recursively now uses followlinks=True so symlinked directories under input/ and output/ roots are traversed, matching the existing behavior of folder_paths.recursive_search for models. Tracks (st_dev, st_ino) pairs of visited directories to detect and break circular symlink loops safely. Amp-Thread-ID: https://ampcode.com/threads/T-019c9220-21b8-7678-b428-9215ff1bb011 Co-authored-by: Amp <amp@ampcode.com>
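The loop-detection idea in this commit can be sketched as follows. This is a minimal illustration, not the PR's actual list_files_recursively: the function name walk_files_following_symlinks is hypothetical, and it assumes a platform where st_dev and st_ino identify a directory uniquely.

```python
import os
from typing import Iterator, Set, Tuple


def walk_files_following_symlinks(root: str) -> Iterator[str]:
    """Yield file paths under root, following directory symlinks without looping."""
    visited: Set[Tuple[int, int]] = set()
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        try:
            # os.stat follows symlinks, so the key identifies the real directory.
            st = os.stat(dirpath)
        except OSError:
            dirnames[:] = []  # unreadable directory: do not descend
            continue
        key = (st.st_dev, st.st_ino)
        if key in visited:
            # Already seen through another path (for example a symlink cycle):
            # prune the subtree and skip its files.
            dirnames[:] = []
            continue
        visited.add(key)
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Clearing dirnames in place is what stops os.walk from descending further, so a symlink that points back at an ancestor is visited at most once.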
…odels_dir get_comfy_models_folders() previously filtered by startswith(models_root), excluding extra model paths outside the main models directory. Now includes every category with non-empty paths from folder_names_and_paths. Amp-Thread-ID: https://ampcode.com/threads/T-019c9224-d83c-7797-8c02-e1e1ae2ee452 Co-authored-by: Amp <amp@ampcode.com>
…ive safety commonpath raises ValueError on Windows when comparing paths on different drives (e.g. C:\models vs D:\extra_models). Replace all usages in the asset scanner with Path.is_relative_to() which handles cross-drive paths, case-insensitivity, and prefix traps natively without try/except. Amp-Thread-ID: https://ampcode.com/threads/T-019c9224-d83c-7797-8c02-e1e1ae2ee452 Co-authored-by: Amp <amp@ampcode.com>
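A small sketch of the difference described above. It uses ntpath and PureWindowsPath so the Windows behaviour can be reproduced on any OS; the scanner itself reportedly uses Path.is_relative_to(), and the helper names is_under and commonpath_raises are hypothetical. Requires Python 3.9 or later.

```python
import ntpath
from pathlib import PureWindowsPath


def is_under(path: str, root: str) -> bool:
    """True if path equals root or lies beneath it, using Windows path rules."""
    return PureWindowsPath(path).is_relative_to(PureWindowsPath(root))


def commonpath_raises(a: str, b: str) -> bool:
    """Report whether the Windows flavour of commonpath rejects the pair."""
    try:
        ntpath.commonpath([a, b])
    except ValueError:
        return True
    return False
```

With these helpers, commonpath_raises(r"C:\models", r"D:\extra_models") is true, while is_under simply returns False for the cross-drive case and for the prefix trap r"C:\models_extra" versus r"C:\models".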
- Filter hidden files/directories (dot-prefixed) in collect_models_files() using is_visible(), matching the existing behavior for input/output roots
- Exclude the 'custom_nodes' folder name from get_comfy_models_folders(); custom nodes that register their own paths under other folder names will still be scanned as expected
Amp-Thread-ID: https://ampcode.com/threads/T-019c924b-591a-725e-b8b7-0d49ba1a5591
Co-authored-by: Amp <amp@ampcode.com>
…uring active scan Amp-Thread-ID: https://ampcode.com/threads/T-019c92af-47c7-7448-b111-4ebfbf5585e6 Co-authored-by: Amp <amp@ampcode.com>
…frontend Replace --disable-assets-autoscan with --enable-assets so the assets system (API routes, database sync, background scanning) is off by default and must be explicitly opted into. Expose the flag as an "assets" entry in SERVER_FEATURE_FLAGS so the frontend can read it from GET /features. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix missing import for compute_filename_for_reference in ingest.py
- Apply code review fixes across routes, queries, scanner, seeder, hashing, ingest, path_utils, main, and server
- Update and add tests for sync references and seeder
Amp-Thread-ID: https://ampcode.com/threads/T-019cb61a-ed54-738c-a05f-9b5242e513f3
Co-authored-by: Amp <amp@ampcode.com>
- Reject path separators (/, \, os.sep) in tag components for defense-in-depth
- Add comment explaining double-relpath normalization trick
- Add _require_assets_feature_enabled decorator returning 503 when disabled
- Call asset_seeder.disable() when --enable-assets is not passed
- Add iter_chunks to bulk_update_needs_verify, bulk_update_is_missing, and delete_references_by_ids to respect SQLite bind param limits
- Fix CacheStateRow.size_bytes NULL coercion (0 -> None) to avoid false needs_verify flags on assets with unknown size
- Add PermissionError catch in delete_asset_tags route (403 vs 500)
- Add hash-is-None guard in delete_orphaned_seed_asset
- Validate from_asset_id in reassign_asset_references
- Initialize _prune_first in __init__, remove getattr workaround
- Cap error accumulation in _add_error to 200
- Remove confirmed dead code: seed_assets, compute_filename_for_asset, ALLOWED_ROOTS, AssetNotFoundError, SetTagsResult, update_enrichment_level, Asset.to_dict, AssetReference.to_dict, _AssetSeeder.enable
Amp-Thread-ID: https://ampcode.com/threads/T-019cb610-1b55-74b6-8dbb-381d73c387c0
Co-authored-by: Amp <amp@ampcode.com>
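The chunking mentioned for bind parameter limits might look roughly like the sketch below. The signature of iter_chunks here is an assumption, as are the refs table and the delete_references_by_ids_sketch helper; the PR's real queries go through its own query layer rather than raw sqlite3. SQLite builds before 3.32.0 default to at most 999 bound parameters per statement, which is why an IN list has to be split.

```python
import sqlite3
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_chunks(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of items with at most chunk_size elements each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def delete_references_by_ids_sketch(
    conn: sqlite3.Connection, ids: List[int], chunk_size: int = 500
) -> int:
    """Delete rows by id in batches so no statement exceeds the bind limit."""
    deleted = 0
    for chunk in iter_chunks(ids, chunk_size):
        placeholders = ",".join("?" * len(chunk))
        # 'refs' is a hypothetical table name used only for this illustration.
        cur = conn.execute(
            f"DELETE FROM refs WHERE id IN ({placeholders})", list(chunk)
        )
        deleted += cur.rowcount
    return deleted
```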
- Extract validate_blake3_hash() into helpers.py, used by upload, schemas, routes
- Extract get_reference_with_owner_check() into queries, used by 4 service functions
- Extract build_prefix_like_conditions() into queries/common.py, used by 3 queries
- Replace 3 inlined tag queries with get_reference_tags() calls
- Consolidate AddTagsDict/RemoveTagsDict TypedDicts into AddTagsResult/RemoveTagsResult dataclasses, eliminating manual field copying in tagging.py
- Make iter_row_chunks delegate to iter_chunks
- Inline trivial compute_filename_for_reference wrapper (unused session param)
- Remove mark_assets_missing_outside_prefixes pass-through in bulk_ingest.py
- Clean up unused imports (os, time, dependencies_available)
- Disable assets routes on DB init failure in main.py
Amp-Thread-ID: https://ampcode.com/threads/T-019cb649-dd4e-71ff-9a0e-ae517365207b
Co-authored-by: Amp <amp@ampcode.com>
…s DB conflicts When two ComfyUI processes share the same database file but point to different input/output/model directories, each process's scan marks the other's assets as missing, causing unreliable asset visibility. This adds an exclusive lock so the second process fails fast at startup with a clear message to use --database-url. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Content-Disposition: drop raw filename= parameter, use only RFC 5987 filename*=UTF-8'' to prevent header injection via ; and special chars
- delete_asset: default delete_content to False (non-destructive) when query parameter is omitted
- create_asset_from_hash: return 400 MISSING_INPUT instead of 404 when hash not found and no file uploaded (client input error, not missing resource)
- seeder: clear _progress when returning to IDLE so get_status() does not return stale progress after scan completion
- hashing: handle non-seekable streams in _hash_file_obj by checking seekable() before attempting tell/seek
- bulk_ingest: filter lost_paths to only include paths tied to actually inserted asset IDs, preventing inflated counts from ON CONFLICT drops
Amp-Thread-ID: https://ampcode.com/threads/T-019cb67a-9822-7438-ab05-d09991a9f7f3
Co-authored-by: Amp <amp@ampcode.com>
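For the Content-Disposition item, one way to build a header that carries only the RFC 5987 form is sketched here. The helper name content_disposition is hypothetical and the PR's route code may differ; the point is that percent-encoding every character outside the unreserved set leaves no raw ;, quote, or newline in the header value.

```python
from urllib.parse import quote


def content_disposition(filename: str, disposition: str = "attachment") -> str:
    """Build a Content-Disposition value using only filename*=UTF-8''."""
    # safe="" percent-encodes everything except letters, digits and "_.-~",
    # all of which are valid attr-char values under RFC 5987.
    encoded = quote(filename, safe="")
    return f"{disposition}; filename*=UTF-8''{encoded}"
```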
The previous commit acquired the exclusive lock before Alembic migrations, but Alembic opens its own connection — which was then blocked by our lock. Move lock acquisition to after migrations complete in a dedicated connection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add deleted_at column to AssetReference model and migration
- soft_delete_reference_by_id sets deleted_at instead of removing rows
- DELETE /api/assets/{id} defaults to soft-delete; delete_content=true
for hard-delete
- Add deleted_at IS NULL filters to read queries, tag queries, and
scanner queries so soft-deleted refs are invisible
- restore_references_by_paths skips soft-deleted refs
- upsert_reference clears deleted_at on explicit re-ingest
- Add tests for soft-delete API behavior, scanner persistence, bulk
insert, enrichment exclusion, and seed asset garbage collection
Amp-Thread-ID: https://ampcode.com/threads/T-019cb6fc-c05c-761f-b855-6d5d1c9defa2
Co-authored-by: Amp <amp@ampcode.com>
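The soft-delete behavior in this commit can be illustrated with plain sqlite3. The deleted_at column and the soft_delete_reference_by_id name come from the commit message, but the signature, the asset_references table name, and list_visible_references are assumptions made for this sketch.

```python
import sqlite3
import time
from typing import List


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with a nullable deleted_at column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE asset_references ("
        "id TEXT PRIMARY KEY, file_path TEXT, deleted_at REAL)"
    )
    return conn


def soft_delete_reference_by_id(conn: sqlite3.Connection, ref_id: str) -> bool:
    """Mark a row as deleted instead of removing it; False if nothing changed."""
    cur = conn.execute(
        "UPDATE asset_references SET deleted_at = ? "
        "WHERE id = ? AND deleted_at IS NULL",
        (time.time(), ref_id),
    )
    return cur.rowcount == 1


def list_visible_references(conn: sqlite3.Connection) -> List[str]:
    """Read query with the deleted_at IS NULL filter applied."""
    rows = conn.execute(
        "SELECT id FROM asset_references WHERE deleted_at IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]
```

The row stays in the table after a soft delete, which is what lets a later explicit re-ingest clear deleted_at instead of inserting a duplicate.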
- Use filelock (FileLock) instead of PRAGMA locking_mode=EXCLUSIVE to prevent multi-process database access. The OS automatically releases the lock when the process exits, even on crashes or Ctrl+C.
- Add friendly error messages for database-is-locked and general database init failures when --enable-assets is set.
- Exit the process instead of silently disabling assets when the user explicitly passed --enable-assets and the database fails.
- Add filelock to requirements.txt.
Amp-Thread-ID: https://ampcode.com/threads/T-019cbab8-50d4-748c-9669-2506575dda44
Co-authored-by: Amp <amp@ampcode.com>
…g logs
- Add debug timing logs for each fast scan sub-step (sync_root, collect_paths, build_asset_specs) and info-level total timing
- Refactor enrich_asset to accept a session parameter instead of creating one per file
- enrich_assets_batch now opens one session for the entire batch, committing after each asset to keep transactions short
- Simplify enrichment tests by removing create_session mocking
Amp-Thread-ID: https://ampcode.com/threads/T-019cbb0b-8563-7199-b628-33e3c4fe9f41
Co-authored-by: Amp <amp@ampcode.com>
- Add HashCheckpoint dataclass for saving/resuming interrupted hash computations
- compute_blake3_hash now accepts interrupt_check and checkpoint parameters
- Returns (digest, None) on completion or (None, checkpoint) on interruption
- Update ingest.py caller to handle new tuple return type
Amp-Thread-ID: https://ampcode.com/threads/T-019cbb0b-8563-7199-b628-33e3c4fe9f41
Co-authored-by: Amp <amp@ampcode.com>
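A rough sketch of the interrupt-and-resume contract described above. It substitutes hashlib.blake2b for BLAKE3 so it runs with the standard library only; compute_hash_resumable and the bytes_processed and hasher fields are hypothetical, while interrupt_check, checkpoint, mtime_ns and file_size are named in the commit messages. The real HashCheckpoint layout is not shown in this PR page.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class HashCheckpoint:
    bytes_processed: int  # offset to resume reading from
    hasher: Any           # partially fed hash object
    mtime_ns: int         # file stat captured when the checkpoint was taken
    file_size: int


def compute_hash_resumable(
    path: str,
    interrupt_check: Optional[Callable[[], bool]] = None,
    checkpoint: Optional[HashCheckpoint] = None,
    chunk_size: int = 1 << 20,
) -> Tuple[Optional[str], Optional[HashCheckpoint]]:
    """Return (digest, None) when finished or (None, checkpoint) if interrupted."""
    st = os.stat(path)
    # Reuse the checkpoint only if the file looks unchanged since it was taken.
    if checkpoint is not None and (checkpoint.mtime_ns, checkpoint.file_size) == (
        st.st_mtime_ns,
        st.st_size,
    ):
        hasher, offset = checkpoint.hasher, checkpoint.bytes_processed
    else:
        hasher, offset = hashlib.blake2b(), 0  # stand-in for BLAKE3
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            if interrupt_check is not None and interrupt_check():
                return None, HashCheckpoint(offset, hasher, st.st_mtime_ns, st.st_size)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
            offset += len(chunk)
    return hasher.hexdigest(), None
```

A caller would keep the returned checkpoint and pass it back on the next attempt; if the file changed in between, the sketch silently restarts from the beginning.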
Move mimetypes.init() and all custom type registrations from server.py and metadata_extract.py into a single init_mime_types() function called once at startup in main.py. Amp-Thread-ID: https://ampcode.com/threads/T-019cbb2a-513a-7458-9962-b4100e4f124d Co-authored-by: Amp <amp@ampcode.com>
…lized
- Add _init_memory_db() path using Base.metadata.create_all + StaticPool since Alembic migrations don't work with in-memory SQLite (each connection gets its own separate database)
- Call init_mime_types() at module load in metadata_extract so custom types like application/safetensors are always registered
Amp-Thread-ID: https://ampcode.com/threads/T-019cbb5f-13d1-7429-8cfd-815625c4d032
Co-authored-by: Amp <amp@ampcode.com>
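The reason given for the in-memory path can be seen with the standard sqlite3 module alone: every new connection to :memory: opens a fresh, empty database, so a schema created by one connection is invisible to the next. The table name below is illustrative only.

```python
import sqlite3
from typing import List


def table_names(conn: sqlite3.Connection) -> List[str]:
    """List the user tables visible on this connection."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(row[0] for row in rows)


# A migration tool that opens its own connection would create the schema here...
first = sqlite3.connect(":memory:")
first.execute("CREATE TABLE assets (id TEXT PRIMARY KEY)")
first.commit()

# ...but the application's connection would see a different, empty database.
second = sqlite3.connect(":memory:")
```

Pinning the engine to a single shared connection, which is what SQLAlchemy's StaticPool does, avoids the split.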
1. Seeder pause/resume: only resume after prompt execution if pause() returned True, preventing undo of user-initiated pauses.
2. Missing rollback in enrich_assets_batch: add sess.rollback() in exception handler to prevent broken session state for subsequent batch operations.
3. Hash checkpoint validation: store mtime_ns/file_size in HashCheckpoint and re-stat on resume instead of comparing the same stat result to itself.
4. Scan progress preserved: save _last_progress before clearing _progress in finally blocks so wait=true endpoint returns final stats instead of zeros.
5. Download XSS hardening: block dangerous MIME types (matching server.py) and add X-Content-Type-Options: nosniff header to asset content endpoint.
Amp-Thread-ID: https://ampcode.com/threads/T-019cbb6b-e97b-776d-8c43-2de8acd0d09e
Co-authored-by: Amp <amp@ampcode.com>
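Item 1 boils down to remembering whether this caller was the one that paused the seeder. A toy sketch follows, with ToySeeder and run_with_seeder_paused as hypothetical stand-ins for the real seeder and main-loop code; it assumes pause() returns True only when it actually changed the state.

```python
import threading
from typing import Callable, TypeVar

R = TypeVar("R")


class ToySeeder:
    """Minimal state holder; the real seeder has more states and a worker thread."""

    def __init__(self, state: str = "RUNNING") -> None:
        self._lock = threading.Lock()
        self.state = state

    def pause(self) -> bool:
        """Return True only if this call moved the seeder from RUNNING to PAUSED."""
        with self._lock:
            if self.state != "RUNNING":
                return False
            self.state = "PAUSED"
            return True

    def resume(self) -> bool:
        """Return True only if this call moved the seeder from PAUSED to RUNNING."""
        with self._lock:
            if self.state != "PAUSED":
                return False
            self.state = "RUNNING"
            return True


def run_with_seeder_paused(seeder: ToySeeder, work: Callable[[], R]) -> R:
    """Pause around work, resuming afterwards only if we did the pausing."""
    paused_by_us = seeder.pause()
    try:
        return work()
    finally:
        if paused_by_us:
            seeder.resume()
```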
Add ingest_existing_file() to services/ingest.py as a public one-call wrapper for registering on-disk files (stat, BLAKE3 hash, MIME detection, path-based tag derivation). After each prompt execution in the main loop, iterate history_result['outputs'] and register files with type 'output' as assets. Runs while the asset seeder is paused, gated behind asset_seeder.is_disabled(). Stores prompt_id in user_metadata for provenance tracking. Amp-Thread-ID: https://ampcode.com/threads/T-019cc013-1444-73c8-81d6-07cae6e5e38d Co-authored-by: Amp <amp@ampcode.com>
ingest_existing_file() now inserts a stub record (hash=NULL) first for instant UX visibility, then computes the BLAKE3 hash and runs the full ingest pipeline. No compute_hash flag exposed — both phases always run. Amp-Thread-ID: https://ampcode.com/threads/T-019cc013-1444-73c8-81d6-07cae6e5e38d Co-authored-by: Amp <amp@ampcode.com>
ingest_existing_file() now only inserts a stub record (hash=NULL) for instant UX visibility. After registering outputs, triggers asset_seeder.start_enrich() to compute hashes in the background. This avoids blocking the prompt worker thread on hash computation. Amp-Thread-ID: https://ampcode.com/threads/T-019cc013-1444-73c8-81d6-07cae6e5e38d Co-authored-by: Amp <amp@ampcode.com>
Summary
Register output files as assets immediately after prompt execution, replacing the need for a filesystem sweep.
Changes
- app/assets/services/ingest.py: Add ingest_existing_file(), a public one-call wrapper that handles stat → BLAKE3 hash → MIME detection → path-based tag derivation → _ingest_file_from_path.
- app/assets/services/__init__.py: Export ingest_existing_file.
- main.py: Add _register_execution_outputs() that iterates history_result['outputs'], filters to type == "output" files, and calls ingest_existing_file() for each. Hooked after e.execute() while the seeder is paused.
How it works
After each prompt execution, the executor's history_result['outputs'] already contains a dict of node_id → ui_data with every file each output node wrote. We iterate that structure and register each file as an asset, so no filesystem sweep is needed.
- Only registers files with type == "output" (skips temp/preview files)
- Stores prompt_id in user_metadata for provenance
- Gated behind asset_seeder.is_disabled()
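The iteration described above could be sketched as follows. This is an illustration under stated assumptions, not the PR's _register_execution_outputs(): it assumes each ui_data value is a dict of lists whose entries carry filename, subfolder and type keys, and it takes the ingest function as a parameter because the real signature of ingest_existing_file() is not shown here. The names collect_output_files and register_execution_outputs are hypothetical.

```python
import os
from typing import Any, Callable, Dict, List


def collect_output_files(history_result: Dict[str, Any], output_dir: str) -> List[str]:
    """Return paths of entries marked type == "output" in history_result['outputs']."""
    paths: List[str] = []
    for _node_id, ui_data in history_result.get("outputs", {}).items():
        if not isinstance(ui_data, dict):
            continue
        for items in ui_data.values():
            if not isinstance(items, list):
                continue
            for item in items:
                if not isinstance(item, dict) or item.get("type") != "output":
                    continue  # skips temp/preview entries and non-file data
                filename = item.get("filename")
                if not filename:
                    continue
                subfolder = item.get("subfolder") or ""
                paths.append(os.path.join(output_dir, subfolder, filename))
    return paths


def register_execution_outputs(
    history_result: Dict[str, Any],
    output_dir: str,
    prompt_id: str,
    ingest: Callable[..., Any],
) -> int:
    """Call ingest for each existing output file, tagging it with the prompt_id."""
    registered = 0
    for path in collect_output_files(history_result, output_dir):
        if not os.path.isfile(path):
            continue
        # 'ingest' stands in for ingest_existing_file(); keyword name is assumed.
        ingest(path, user_metadata={"prompt_id": prompt_id})
        registered += 1
    return registered
```

In the PR this step runs in the main loop after e.execute(), and per the later commits the ingest call only inserts a stub record (hash=NULL), leaving hashing to asset_seeder.start_enrich() in the background.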