fix: krea FP8 + torch.compile incompatibility on H100 (#669)#670

Open
livepeer-tessa wants to merge 1 commit into main from fix/krea-float8-compile-conflict

Conversation


@livepeer-tessa livepeer-tessa commented Mar 11, 2026

Problem

Issue #669 — Krea fails to load on remote inference after using longlive.

NotImplementedError: Float8Tensor dispatch: attempting to run unimplemented
operator/function: func=<OpOverload(op='aten.as_strided', overload='default')>

Root cause

On H100/Hopper, Krea loads with compile=True. If FP8 quantization is also active, the combination of Float8DynamicActivationFloat8WeightConfig and torch.compile(fullgraph=False) crashes during warmup.

Why it happens: With fullgraph=False, graph breaks split the compiled function into sub-graphs. At the boundary between sub-graphs, AOT autograd's gen_alias_from_base calls aten.as_strided on the Float8Tensor output of a compiled linear layer to create an output alias — but as_strided is not implemented for Float8Tensor in torchao.

Why specifically after longlive: torch._dynamo.reset() is never called between pipeline switches. After longlive runs with FP8 (no compile), its Float8Tensor dispatch state persists in the global dynamo compilation cache. When Krea then loads with FP8 + compile, the residual cache state triggers the AOT autograd aliasing code path that would otherwise be avoided on a fresh worker.
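The failure mode can be illustrated without torchao: any __torch_dispatch__ tensor subclass that implements only a whitelist of aten ops raises the same kind of error when something outside that whitelist (here, aten.as_strided) is dispatched to it. A minimal sketch — PartialDispatchTensor is an illustrative stand-in, not torchao's actual Float8Tensor:

```python
import torch

class PartialDispatchTensor(torch.Tensor):
    """Illustrative stand-in for a quantized tensor subclass that
    implements only some aten ops (as torchao's Float8Tensor does)."""

    @staticmethod
    def __new__(cls, data):
        # Wrapper subclass: shape/dtype metadata only; payload kept in _data.
        return torch.Tensor._make_wrapper_subclass(cls, data.shape, dtype=data.dtype)

    def __init__(self, data):
        self._data = data

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.ops.aten.detach.default:
            return cls(args[0]._data)
        # Everything else -- including aten.as_strided -- is unimplemented,
        # mirroring the NotImplementedError shown above.
        raise NotImplementedError(
            f"PartialDispatchTensor dispatch: attempting to run unimplemented "
            f"operator/function: func={func}"
        )

t = PartialDispatchTensor(torch.randn(2, 2))
try:
    t.as_strided((2, 2), (2, 1))  # roughly what gen_alias_from_base attempts
except NotImplementedError as e:
    print(e)
```

This is why the crash only surfaces at the AOT autograd boundary: the whitelisted ops cover the happy path, and the aliasing code path is the first place an unsupported op reaches the subclass.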

Traceback path

wan2_1/blocks/denoise.py → wan2_1/components/generator.py
  → krea_realtime_video/modules/causal_model.py (torch.compiled block)
    → torch/nn/modules/linear.py: F.linear(input, self.weight, self.bias)
      → torchao Float8Tensor __torch_function__ dispatch (aten.linear)
        → aot_autograd runtime_wrappers: gen_alias_from_base
          → aliased_base_tensor.as_strided(...)  ← CRASH

Fix

1. krea_realtime_video/pipeline.py — When FP8 quantization is active, skip block.compile(). The two optimizations are currently mutually exclusive under fullgraph=False; FP8 alone still provides meaningful memory/compute savings on H100.

2. pipeline_manager.py — Call torch._dynamo.reset() on every pipeline unload to flush stale compiled graphs and Float8 dispatch state, preventing cross-pipeline cache pollution.
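The diff itself isn't reproduced here; a minimal sketch of what the two changes might look like (function names are assumptions, not the actual code in the PR):

```python
import logging
import torch

logger = logging.getLogger(__name__)

def maybe_compile_block(block: torch.nn.Module, fp8_active: bool) -> torch.nn.Module:
    """Fix 1 (sketch): skip torch.compile when FP8 quantization is active,
    since the combination crashes under fullgraph=False."""
    if fp8_active:
        logger.warning(
            "FP8 quantization active; skipping block.compile() "
            "(aten.as_strided is unimplemented for Float8Tensor)"
        )
        return block
    return torch.compile(block, fullgraph=False)

def on_pipeline_unload() -> None:
    """Fix 2 (sketch): flush dynamo's global compile cache on unload so
    stale FP8-compiled graphs cannot pollute the next pipeline load."""
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    torch._dynamo.reset()
```

torch._dynamo.reset() clears the in-process compile caches globally, so calling it on every unload is the conservative choice — it costs a recompile on the next load but guarantees no cross-pipeline state survives.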

Testing

Reproduce with:

  1. Load longlive with FP8 quantization on H100
  2. Unload longlive
  3. Load krea-realtime-video — should now succeed instead of crashing during warmup

Closes #669

Summary by CodeRabbit

  • Bug Fixes
    • Improved FP8 quantization handling by skipping block-wide compilation when active to prevent incompatibilities.
    • Fixed pipeline unloading to properly clear compilation cache, preventing stale compiled graphs from affecting subsequently loaded pipelines.

Float8DynamicActivationFloat8WeightConfig is not compatible with
torch.compile(fullgraph=False). During warmup on H100 (where compile=True),
AOT autograd's gen_alias_from_base calls aten.as_strided on Float8Tensor
outputs, which is not implemented in torchao:

  NotImplementedError: Float8Tensor dispatch: attempting to run unimplemented
  operator/function: func=<OpOverload(op='aten.as_strided', overload='default')>

The crash manifests specifically after longlive (also FP8) because
torch._dynamo's compile cache is never reset between pipeline switches,
allowing longlive's Float8 dispatch state to persist and influence Krea's
subsequent compile attempt.

Two fixes:

1. krea_realtime_video/pipeline.py: when FP8 quantization is active, skip
   block.compile() — the two optimizations are currently mutually exclusive
   with fullgraph=False. FP8 alone still provides meaningful memory/compute
   savings on H100 without compile.

2. pipeline_manager.py: call torch._dynamo.reset() on every pipeline unload
   to clear stale compiled graphs and Float8 dispatch state, preventing
   cross-pipeline cache pollution.

Fixes #669

Signed-off-by: livepeer-robot <robot@livepeer.org>

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough

The changes implement safeguards against FP8 quantization incompatibilities by preventing block-wide compilation of attention modules when FP8 is active, and clearing the PyTorch Dynamo compilation cache upon pipeline unload to prevent stale compiled graphs from persisting across loads.

Changes

  • FP8 Quantization Guard — src/scope/core/pipelines/krea_realtime_video/pipeline.py
    Adds conditional logic to skip block compilation when FP8 quantization is active, logging a warning to prevent incompatibility with aten.as_strided on Float8Tensor. Compilation now only proceeds in the non-FP8 code path.
  • Dynamo Compilation Cache Cleanup — src/scope/server/pipeline_manager.py
    Resets the PyTorch Dynamo compilation cache in _unload_pipeline_by_id_unsafe after CUDA cleanup to prevent stale FP8-compiled graphs from leaking into subsequently loaded pipelines.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Float8 tensions and dynamo dreams—
A cleanup ensures smoother streams,
Guard rails in place, caches now clear,
Pipelines load fresh, no ghosts to fear!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Linked Issues check ⚠️ Warning — The PR partially addresses issue #669: it implements fixes to prevent FP8 dispatch state from persisting (dynamo cache reset) and disables compile when FP8 is active. However, it does not include regression tests, CI guards, or improved logging/error messages as specified in the issue requirements. Resolution: add regression tests for the longlive→krea swap case, implement CI checks to prevent reintroduction, and improve logging with actionable remediation steps as outlined in issue #669.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed — The title accurately describes the main change: fixing an FP8 and torch.compile incompatibility on H100, which is the core issue addressed in the PR.
  • Out of Scope Changes check ✅ Passed — All code changes are directly related to fixing the FP8 and torch.compile incompatibility issue. The modifications to krea_realtime_video/pipeline.py and pipeline_manager.py align precisely with the stated objectives.


@github-actions

🚀 fal.ai Preview Deployment

App ID daydream/scope-pr-670--preview
WebSocket wss://fal.run/daydream/scope-pr-670--preview/ws
Commit 31ae08d

Testing

Connect to this preview deployment by running this on your branch:

uv run build && SCOPE_CLOUD_APP_ID="daydream/scope-pr-670--preview/ws" uv run daydream-scope

🧪 E2E tests will run automatically against this deployment.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/scope/core/pipelines/krea_realtime_video/pipeline.py (1)

141-160: Add a regression for the LongLive FP8 → Krea reload path.

This guard only works in combination with the unload-side reset in src/scope/server/pipeline_manager.py, and the failure shows up only after the specific unload/reload sequence. Without an automated regression here, either half of the fix can drift and the crash will only reappear during H100 warmup.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7aa4b080-c83b-41e5-9f74-596fcaa22c3b

📥 Commits

Reviewing files that changed from the base of the PR and between 5f6ee61 and 31ae08d.

📒 Files selected for processing (2)
  • src/scope/core/pipelines/krea_realtime_video/pipeline.py
  • src/scope/server/pipeline_manager.py

Comment on lines +698 to +702
  try:
      torch._dynamo.reset()
      logger.info("torch._dynamo cache reset")
  except Exception as e:
      logger.warning(f"torch._dynamo reset failed: {e}")

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In PyTorch 2.9.1, if torch._dynamo.reset() is skipped or raises, do previously cached torch.compile graphs remain active for later compiles in the same worker process?

💡 Result:

Yes. In PyTorch 2.9.1, TorchDynamo stores torch.compile results in in-memory caches keyed off Python code objects (compiled bytecode + guards). Those cache entries persist for the lifetime of the worker process unless you explicitly clear them. [2], [3]

torch._dynamo.reset() is the API that clears those compile caches and restores Dynamo’s state as if you started a fresh process (it calls reset_code_caches() and clears multiple global Dynamo structures). If you skip it—or it raises before completing—then whatever caches were already created remain and can still be reused by later torch.compile calls in the same process (when the same underlying __code__ objects are encountered and guards match). [1], [2]

Sources:
[1] PyTorch v2.9.1 torch._dynamo.reset() implementation (torch/_dynamo/__init__.py)
[2] PyTorch 2.9 docs: Dynamo cache entries are stored/retrieved from a function’s __code__ object (_debug_get_cache_entry_list)
[3] PyTorch dev-discuss: each traced code object has its own cache entries (compiled bytecode + guards)


🏁 Script executed:

# Check the implementation of _unload_pipeline_by_id_unsafe around lines 698-702
sed -n '680,710p' src/scope/server/pipeline_manager.py

Repository: daydreamlive/scope

Length of output: 1304


🏁 Script executed:

# Check if there's any state tracking for the reset failure or if compile is disabled conditionally
rg "compile\s*=" src/scope/server/pipeline_manager.py | head -20

Repository: daydreamlive/scope

Length of output: 90


🏁 Script executed:

# Also check what happens next when a pipeline is loaded - does it check any worker state?
rg "_load_pipeline" src/scope/server/pipeline_manager.py -A 5 | head -40

Repository: daydreamlive/scope

Length of output: 1241


🏁 Script executed:

# Check if there's a compile flag or state that gets disabled on reset failure
rg "torch\.compile|compile_kwargs|torch_compile" src/scope/server/pipeline_manager.py | head -20

Repository: daydreamlive/scope

Length of output: 142


🏁 Script executed:

# Get more context around the compile decision
sed -n '1,50p' src/scope/server/pipeline_manager.py

Repository: daydreamlive/scope

Length of output: 1280


🏁 Script executed:

# Search for where compile is determined during pipeline load
rg "compile" src/scope/server/pipeline_manager.py -B 3 -A 3 | head -60

Repository: daydreamlive/scope

Length of output: 942


🏁 Script executed:

# Check if there's any state field tracking reset failure or worker health
rg "self\._.*reset|self\._.*failed|self\._.*error" src/scope/server/pipeline_manager.py | head -20

Repository: daydreamlive/scope

Length of output: 170


🏁 Script executed:

# Check if _load_pipeline_implementation has any fallback or state checking
sed -n '300,400p' src/scope/server/pipeline_manager.py | grep -A 20 "_load_pipeline_implementation"

Repository: daydreamlive/scope

Length of output: 44


🏁 Script executed:

# Look for the full _load_pipeline_implementation method
rg "_load_pipeline_implementation" src/scope/server/pipeline_manager.py -A 30 | head -50

Repository: daydreamlive/scope

Length of output: 2042


🏁 Script executed:

# Check if there's any exception handling around the compile call or pipeline init
rg "KreaRealtimeVideoPipeline\|compile=" src/scope/server/pipeline_manager.py -B 5 -A 5

Repository: daydreamlive/scope

Length of output: 44


🏁 Script executed:

# Search for where KreaRealtimeVideoPipeline is instantiated
rg "KreaRealtimeVideoPipeline" src/scope/server/pipeline_manager.py -B 2 -A 10

Repository: daydreamlive/scope

Length of output: 1127


🏁 Script executed:

# Look for the full context around compile decision
sed -n '250,350p' src/scope/server/pipeline_manager.py

Repository: daydreamlive/scope

Length of output: 4177


Don't treat a Dynamo reset failure as a successful unload.

This fix depends on clearing the stale FP8/compile state here. If torch._dynamo.reset() raises, the code still reports pipeline_unloaded unconditionally, and the next krea-realtime-video load will attempt to compile with the same GPU-based decision (compile=any(...for x in ("h100", "hopper"))) while stale compile caches remain active in the worker process. This can re-enter the warmup crash path from the original FP8-to-Krea conflict.

Either fail the transition (prevent pipeline_unloaded publication on reset failure) or latch a worker-level flag to force compile=False on all subsequent loads until the worker restarts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scope/server/pipeline_manager.py` around lines 698 - 702, The try/except
around torch._dynamo.reset() currently swallows errors and the code still
publishes pipeline_unloaded; change the flow so that if torch._dynamo.reset()
raises you do NOT publish pipeline_unloaded and instead either raise/return
early from the unload path or set a persistent worker-level flag (e.g.,
worker_force_no_compile = True) that subsequent load logic will read to force
compile=False; update the code paths that publish pipeline_unloaded and the load
logic that checks compile (the logic using compile=any(... for x in
("h100","hopper"))) to respect this flag so stale Dynamo/FP8 state cannot cause
re-compilation attempts.

@github-actions

✅ E2E Tests passed

Status passed
fal App daydream/scope-pr-670--preview
Run View logs

Test Artifacts

Check the workflow run for screenshots.

livepeer-tessa pushed a commit that referenced this pull request Mar 11, 2026
…ale-cache recompile

If torch._dynamo.reset() raises during pipeline unload, stale Dynamo/FP8
compile caches remain active in the worker process. Previously the code
swallowed the exception and published pipeline_unloaded unconditionally,
leaving the next krea-realtime-video load free to torch.compile against
those stale caches — re-entering the warmup crash from the FP8→Krea
conflict.

Fix: set self._dynamo_reset_failed = True on reset failure. The Krea load
path now checks this flag and forces compile=False for the lifetime of the
worker, with a clear log warning to restart the process to re-enable
compilation.

Addresses CodeRabbit review comment on PR #670.

Signed-off-by: livepeer-robot <robot@livepeer.org>

@livepeer-tessa livepeer-tessa left a comment


Good catch @coderabbitai. Filed #671 to address this.

The fix goes with the worker-level flag approach: self._dynamo_reset_failed is set to True if torch._dynamo.reset() raises. Subsequent krea-realtime-video loads read it and force compile=False for the rest of the process lifetime, with a warning to restart. The unload itself still completes and pipeline_unloaded is still published (memory is freed — we just can't guarantee the Dynamo cache is clean).

Failing the unload entirely felt like the worse option since it'd leave the pipeline in a limbo state and block future load attempts.
