Bug: AssertionError in extend mode - Shape mismatch between target_latents and x0 #374

@iackov

Bug Report: AssertionError in Extend Mode - Shape Mismatch

Description

The pipeline crashes with an AssertionError when extend mode is used from the Upload tab with Text2Music parameters. The error is caused by a shape mismatch between the target_latents and x0 tensors along the last dimension (1292 vs 1528 frames).

Error Message

AssertionError: target_latents.shape=torch.Size([1, 8, 16, 1292]) x0.shape=torch.Size([1, 8, 16, 1528])

Full Stack Trace

2026-01-23 17:37:35.863 | INFO     | acestep.pipeline_ace_step:text2music_diffusion_process:847 - cfg_type: apg, guidance_scale: 15, omega_scale: 10

Traceback (most recent call last):
  File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\gradio\queueing.py", line 766, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\gradio\route_utils.py", line 355, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\gradio\blocks.py", line 2152, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\gradio\blocks.py", line 1629, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 fn, *processed_input, limiter=self.limiter
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 )
  File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\anyio\to_thread.py", line 63, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           func, args, abandon_on_cancel=abandon_on_cancel, limiter=limiter
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           )
  File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 2502, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 986, in run
    result = context.run(func, *args)
  File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\gradio\utils.py", line 1036, in wrapper
    response = f(*args, **kwargs)
  File "C:\Users\Jack\source\ACE-Step\acestep\ui\components.py", line 777, in extend_process_func
    return text2music_process_func(
           format.value,
           ...
           )
  File "C:\Users\Jack\source\ACE-Step\acestep\pipeline_ace_step.py", line 1627, in __call__
    target_latents = self.text2music_diffusion_process(
                     duration=audio_duration,
                     ...
                     ref_latents=ref_latents,
                     )
  File "C:\Users\Jack\source\ACE-Step\acestep\cpu_offload.py", line 40, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\Jack\source\ACE-Step\venv\Lib\site-packages\torch\utils\_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Jack\source\ACE-Step\acestep\pipeline_ace_step.py", line 1050, in text2music_diffusion_process
    target_latents.shape[-1] == x0.shape[-1]
AssertionError: target_latents.shape=torch.Size([1, 8, 16, 1292]) x0.shape=torch.Size([1, 8, 16, 1528])

Location

  • File: acestep/pipeline_ace_step.py
  • Line: 1050 (in text2music_diffusion_process method)
  • Call path: extend_process_func → text2music_process_func → text2music_diffusion_process

Steps to Reproduce

  1. Launch the ACE-Step GUI
  2. Generate audio using the Text2Music tab
  3. Open the Upload tab under Text2Music
  4. Upload the generated audio
  5. Try to extend the audio (to the left or right)
  6. The pipeline crashes with an AssertionError

Root Cause

After padding/trimming operations in extend mode, the concatenated target_latents tensor doesn't match the expected x0 shape due to:

  • Rounding errors in frame_length calculations
  • Trimming when the length exceeds max_infer_fame_length (240 seconds)
  • Concatenation of tensors from different sources
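
To put the frame counts from the error in time units, they can be converted to seconds. This is a minimal sketch, assuming the latent frame rate is 44100 / 512 / 8 frames per second (44.1 kHz audio, a hop of 512 samples, and 8x temporal compression) and that frame counts are truncated with int(); those constants and the helper names seconds_to_frames and frames_to_seconds are assumptions for illustration, not taken from the ACE-Step source.

```python
SAMPLE_RATE = 44100    # assumed audio sample rate (Hz)
HOP_LENGTH = 512       # assumed samples per spectrogram frame
LATENT_DOWNSAMPLE = 8  # assumed temporal compression of the latent space


def seconds_to_frames(seconds: float) -> int:
    # int() truncates, so any fractional frame is dropped.
    return int(seconds * SAMPLE_RATE / HOP_LENGTH / LATENT_DOWNSAMPLE)


def frames_to_seconds(frames: int) -> float:
    return frames * HOP_LENGTH * LATENT_DOWNSAMPLE / SAMPLE_RATE


# Frame counts from the AssertionError in this report.
target_frames, x0_frames = 1292, 1528
print(f"target_latents: {frames_to_seconds(target_frames):.1f} s")  # 120.0 s
print(f"x0:             {frames_to_seconds(x0_frames):.1f} s")      # 141.9 s
gap = x0_frames - target_frames
print(f"gap: {gap} frames = {frames_to_seconds(gap):.1f} s")        # 236 frames = 21.9 s

# The 240 s cap expressed in frames, under the same assumed rate.
print(seconds_to_frames(240))  # 2583

# Truncating each segment and then summing can lose a frame
# compared with truncating the total once.
print(seconds_to_frames(1.0) + seconds_to_frames(1.0))  # 20
print(seconds_to_frames(2.0))                           # 21
```

Truncation alone loses less than one frame per int() call, so a 236-frame (about 22 s) gap suggests that the trimming and concatenation paths contribute more than rounding does.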

Expected Behavior

The pipeline should handle shape mismatches gracefully by padding or trimming to ensure tensor compatibility.

Actual Behavior

Pipeline crashes with AssertionError, preventing audio generation in extend mode.

Environment

  • OS: Windows 11
  • Python: 3.11+
  • ACE-Step Version: Latest (main branch)
  • Mode: Extend (Upload tab)

Severity

🔴 CRITICAL - Extend mode is completely broken without a fix.

Proposed Solution

Add automatic shape alignment before the assertion:

# Fix shape mismatch between target_latents and x0
if target_latents.shape[-1] != x0.shape[-1]:
    if target_latents.shape[-1] < x0.shape[-1]:
        # Pad with zeros if target_latents is shorter
        padding = x0.shape[-1] - target_latents.shape[-1]
        target_latents = torch.nn.functional.pad(
            target_latents, (0, padding), "constant", 0
        )
    else:
        # Trim if target_latents is longer
        target_latents = target_latents[..., :x0.shape[-1]]
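
Because the fragment above only runs inside the pipeline, it can help to test the same logic in isolation. This is a minimal sketch, assuming PyTorch. The names align_last_dim and max_mismatch_frames are hypothetical and are not part of ACE-Step. The helper pads or trims at the end, exactly as the proposed fix does, but it also warns when the gap exceeds a small threshold. Without that warning, a large mismatch such as the 236 frames in this report would be padded away silently.

```python
import warnings

import torch
import torch.nn.functional as F


def align_last_dim(
    target_latents: torch.Tensor,
    x0: torch.Tensor,
    max_mismatch_frames: int = 2,  # hypothetical tolerance, not an ACE-Step setting
) -> torch.Tensor:
    """Zero-pad or trim target_latents along the last axis to match x0.

    Same behavior as the proposed fix (pad or trim at the end), plus a
    RuntimeWarning when the gap exceeds max_mismatch_frames.
    """
    target_len = target_latents.shape[-1]
    expected_len = x0.shape[-1]
    diff = expected_len - target_len
    if diff == 0:
        return target_latents
    if abs(diff) > max_mismatch_frames:
        # A large gap likely points to a length bookkeeping bug, not rounding.
        warnings.warn(
            f"target_latents has {target_len} frames but x0 has {expected_len}; "
            f"{'padding' if diff > 0 else 'trimming'} {abs(diff)} frames",
            RuntimeWarning,
            stacklevel=2,
        )
    if diff > 0:
        # For the last dimension, F.pad takes (pad_left, pad_right).
        return F.pad(target_latents, (0, diff), mode="constant", value=0.0)
    # Too long: keep the first expected_len frames and drop the tail.
    return target_latents[..., :expected_len]


# Reproduce the shapes from the AssertionError in this report.
x0 = torch.zeros(1, 8, 16, 1528)
target_latents = torch.randn(1, 8, 16, 1292)
aligned = align_last_dim(target_latents, x0)  # emits a RuntimeWarning (236 frames)
print(aligned.shape)  # torch.Size([1, 8, 16, 1528])
```

Padding at the end assumes the missing frames belong on the right. For a left extension, the correct side may differ. Like the proposed fix, this sketch does not try to decide that.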

Related PR

A fix for this issue has been submitted in PR #373.
