Open
Bug Report: AssertionError in Extend Mode - Shape Mismatch
Description
The pipeline crashes with an AssertionError when using the extend mode in the Upload tab with Text2Music Parameters. The error occurs due to a shape mismatch between target_latents and x0 tensors.
Error Message
AssertionError: target_latents.shape=torch.Size([1, 8, 16, 1292]) x0.shape=torch.Size([1, 8, 16, 1528])
Full Stack Trace
2026-01-23 17:37:35.863 | INFO | acestep.pipeline_ace_step:text2music_diffusion_process:847 - cfg_type: apg, guidance_scale: 15, omega_scale: 10
Traceback (most recent call last):
File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\gradio\queueing.py", line 766, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\gradio\route_utils.py", line 355, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\gradio\blocks.py", line 2152, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\gradio\blocks.py", line 1629, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
fn, *processed_input, limiter=self.limiter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\anyio\to_thread.py", line 63, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
func, args, abandon_on_cancel=abandon_on_cancel, limiter=limiter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 2502, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 986, in run
result = context.run(func, *args)
File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\gradio\utils.py", line 1036, in wrapper
response = f(*args, **kwargs)
File "C:\Users\Jack\source\ACE-Step\acestep\ui\components.py", line 777, in extend_process_func
return text2music_process_func(
format.value,
...
)
File "C:\Users\Jack\source\ACE-Step\acestep\pipeline_ace_step.py", line 1627, in __call__
target_latents = self.text2music_diffusion_process(
duration=audio_duration,
...
ref_latents=ref_latents,
)
File "C:\Users\Jack\source\ACE-Step\acestep\cpu_offload.py", line 40, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\torch\utils\_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Jack\source\ACE-Step\acestep\pipeline_ace_step.py", line 1050, in text2music_diffusion_process
target_latents.shape[-1] == x0.shape[-1]
AssertionError: target_latents.shape=torch.Size([1, 8, 16, 1292]) x0.shape=torch.Size([1, 8, 16, 1528])
Location
- File: acestep/pipeline_ace_step.py
- Line: 1050 (in text2music_diffusion_process method)
- Function: extend_process_func → text2music_process_func → text2music_diffusion_process
Steps to Reproduce
- Launch ACE-Step GUI
- Generate audio using Text2Music tab
- Go to the Upload tab under Text2Music
- Upload the generated audio
- Try to extend the audio (left or right)
- Pipeline crashes with AssertionError
Root Cause
After padding/trimming operations in extend mode, the concatenated target_latents tensor doesn't match the expected x0 shape due to:
- Rounding errors in frame_length calculations
- Trimming when exceeding max_infer_fame_length (240 seconds)
- Concatenation of tensors from different sources
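As a rough sanity check on the rounding hypothesis, the gap between the two shapes in the error message can be converted to seconds. This is only a sketch under an assumption not stated in this report: that one latent frame covers 512 × 8 audio samples at 44.1 kHz. `FRAMES_PER_SECOND` and `frames_to_seconds` are illustrative names, not identifiers from ACE-Step.

```python
# Assumed latent frame rate: 44.1 kHz audio, 512-sample hop, 8x temporal
# compression. This constant is an assumption, not taken from the report.
FRAMES_PER_SECOND = 44100 / 512 / 8


def frames_to_seconds(num_frames: int) -> float:
    """Convert a latent frame count to seconds under the assumed frame rate."""
    return num_frames / FRAMES_PER_SECOND


# Last-axis lengths copied from the AssertionError message.
target_frames = 1292
x0_frames = 1528

gap_frames = x0_frames - target_frames
gap_seconds = frames_to_seconds(gap_frames)
print(f"gap: {gap_frames} frames, roughly {gap_seconds:.1f} s")
```

If that frame rate holds, a gap of this size is much larger than a one-frame rounding error, so the trimming or concatenation paths may contribute more to the mismatch than rounding does.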
Expected Behavior
The pipeline should handle shape mismatches gracefully by padding or trimming to ensure tensor compatibility.
Actual Behavior
Pipeline crashes with AssertionError, preventing audio generation in extend mode.
Environment
- OS: Windows 11
- Python: 3.11+
- ACE-Step Version: Latest (main branch)
- Mode: Extend (Upload tab)
Severity
🔴 CRITICAL - Extend mode is completely broken without a fix.
Proposed Solution
Add automatic shape alignment before the assertion:
# Fix shape mismatch between target_latents and x0
if target_latents.shape[-1] != x0.shape[-1]:
    if target_latents.shape[-1] < x0.shape[-1]:
        # Pad with zeros if target_latents is shorter
        padding = x0.shape[-1] - target_latents.shape[-1]
        target_latents = torch.nn.functional.pad(
            target_latents, (0, padding), "constant", 0
        )
    else:
        # Trim if target_latents is longer
        target_latents = target_latents[..., :x0.shape[-1]]
Related PR
A fix for this issue has been submitted in PR #373
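The alignment from the Proposed Solution can be tried in isolation with the shapes from the error message. This is a minimal sketch, not the code in the PR: `align_last_dim` is a hypothetical helper name, and random tensors stand in for the real latents.

```python
import torch
import torch.nn.functional as F


def align_last_dim(target_latents: torch.Tensor, x0: torch.Tensor) -> torch.Tensor:
    """Zero-pad or trim target_latents so its last axis matches x0's."""
    diff = x0.shape[-1] - target_latents.shape[-1]
    if diff > 0:
        # target_latents is shorter: pad zeros on the right of the last axis.
        return F.pad(target_latents, (0, diff), mode="constant", value=0.0)
    if diff < 0:
        # target_latents is longer: keep only the leading frames.
        return target_latents[..., : x0.shape[-1]]
    return target_latents


# Shapes taken from the AssertionError in this report.
target_latents = torch.randn(1, 8, 16, 1292)
x0 = torch.randn(1, 8, 16, 1528)

# Padding direction: the shorter tensor is extended to match x0.
aligned = align_last_dim(target_latents, x0)
print(aligned.shape)

# Trimming direction: the longer tensor is cut to match the shorter one.
trimmed = align_last_dim(x0, target_latents)
print(trimmed.shape)
```

Padding this way makes the assertion pass, but the added frames are all zeros rather than encoded audio, so whether that is acceptable for extend mode would still need checking.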