
support longcat-image block offload with 2 mgr #977

Merged
gushiqiao merged 2 commits into ModelTC:main from Lubenwei-nb123:feat/longcat_img_blk_offload on Apr 2, 2026

Conversation

@Lubenwei-nb123
Contributor

No description provided.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces CPU offloading capabilities for the LongCat Image model, specifically implementing block-level offloading. A new LongCatImageOffloadTransformerInfer class is added to manage asynchronous weight prefetching and double-buffering for transformer blocks. The LongCatImageTransformerInfer class is refactored to use a dispatching infer_func, and the main LongCatImageTransformerModel is updated to conditionally use the offload-enabled infer class and manage model/block movement between CPU and GPU. Weight classes (LongCatImageDoubleBlockWeights, LongCatImageSingleBlockWeights) are modified to support creating dedicated CUDA buffers for offloading. A new configuration file and a shell script are included to enable and demonstrate this feature. Feedback points out an unused self.block_idx attribute and a potential device mismatch error if an unsupported offload_granularity is configured.
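The double-buffering pattern the summary describes can be sketched without any framework code. The following is a minimal, illustrative simulation (all names hypothetical, not taken from the PR): two reusable slots stand in for the two CUDA weight buffers, and while block i is computed from one slot, the weights of block i+1 are staged into the other. In the real implementation the `prefetch` step would be an asynchronous host-to-device copy on a side stream.

```python
class DoubleBufferOffloadManager:
    """Illustrative ping-pong buffer manager for block-level offload."""

    def __init__(self):
        self.buffers = [None, None]  # stand-ins for two CUDA weight buffers
        self.prefetched = []         # records prefetch order, for inspection

    def prefetch(self, block_weights, slot):
        # Real code: async H2D copy of this block's weights on a side stream.
        self.buffers[slot] = block_weights
        self.prefetched.append(block_weights)

    def run(self, blocks, compute):
        outputs = []
        self.prefetch(blocks[0], 0)  # initialize the first buffer
        for i in range(len(blocks)):
            cur, nxt = i % 2, (i + 1) % 2
            if i + 1 < len(blocks):
                # Stage the next block while the current one computes.
                self.prefetch(blocks[i + 1], nxt)
            outputs.append(compute(self.buffers[cur]))
        return outputs


mgr = DoubleBufferOffloadManager()
result = mgr.run(["w0", "w1", "w2", "w3"], compute=str.upper)
# result == ["W0", "W1", "W2", "W3"], each block computed from the
# buffer its weights were prefetched into
```

This mirrors only the scheduling shape: with CUDA streams, the compute on slot `cur` and the copy into slot `nxt` genuinely overlap, which is what hides the CPU-to-GPU transfer latency.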

current_stream = torch_device_module.current_stream()
self.offload_manager_double.compute_stream.wait_stream(current_stream)
for block_idx in range(len(blocks.double_blocks)):
self.block_idx = block_idx
Contributor


medium

The attribute self.block_idx is set here but does not appear to be used anywhere within this class or its parent LongCatImageTransformerInfer. If it's not required for external hooks or profiling, it should be removed to avoid confusion.

Comment on lines +39 to +42
if self.cpu_offload and self.offload_granularity == "block":
self.transformer_infer_class = LongCatImageOffloadTransformerInfer
else:
self.transformer_infer_class = LongCatImageTransformerInfer
Contributor


medium

The logic here only handles offload_granularity == "block". If cpu_offload is enabled but offload_granularity is set to something else (e.g., "phase"), it falls back to the base LongCatImageTransformerInfer. However, the infer method (lines 92-98) only handles "model" and "block" granularities. If a different granularity is provided, the weights will remain on CPU while computation is attempted on GPU, leading to a device mismatch error. Consider adding a check or defaulting to a supported mode.
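One way to act on this suggestion is to fail fast at class-selection time rather than at compute time. The sketch below is hypothetical (the helper name and the string return values are illustrative, not from the diff); it assumes, per the review, that only "model" and "block" granularities are actually handled downstream.

```python
SUPPORTED_OFFLOAD_GRANULARITIES = {"model", "block"}


def select_transformer_infer_class(cpu_offload, offload_granularity,
                                   offload_cls, base_cls):
    """Pick the infer class, rejecting unsupported offload granularities."""
    if not cpu_offload:
        return base_cls
    if offload_granularity == "block":
        return offload_cls  # per-block prefetch with double buffering
    if offload_granularity == "model":
        return base_cls     # whole model is moved to GPU before inference
    raise ValueError(
        f"Unsupported offload_granularity {offload_granularity!r}; "
        f"expected one of {sorted(SUPPORTED_OFFLOAD_GRANULARITIES)}"
    )
```

With a guard like this, a config such as `offload_granularity: "phase"` raises a clear `ValueError` at model construction instead of a confusing CPU/GPU device-mismatch error mid-inference.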

for block_idx in range(len(blocks.double_blocks)):
self.block_idx = block_idx

if self.offload_manager_double.need_init_first_buffer:
Contributor


Looking at this, it seems that if the buffer is re-initialized whenever block id = 0 at each step, then the wait_stream above would not be needed and the result would still be correct. My guess is that at the end of the previous step, before the next step begins, some swap_blocks copy has not yet completed, so the wait is required. I don't think it affects speed much, though, so we can merge first.

@gushiqiao
Contributor

pip install ruff pre-commit
pre-commit run --all-files

Please fix the CI.

@Lubenwei-nb123 Lubenwei-nb123 force-pushed the feat/longcat_img_blk_offload branch from e3bcba7 to 5c52346 on April 2, 2026 09:51
@Lubenwei-nb123
Contributor Author

pip install ruff pre-commit; pre-commit run --all-files; please fix the CI.

done

@gushiqiao gushiqiao merged commit f4c5184 into ModelTC:main Apr 2, 2026
@Lubenwei-nb123 Lubenwei-nb123 deleted the feat/longcat_img_blk_offload branch April 2, 2026 09:57