
Optimize prefill DP balance; support multimodal DP balance. #1271

Merged

hiworldwzj merged 10 commits into main from wzj_fix on Apr 16, 2026

Conversation

@hiworldwzj
Collaborator

No description provided.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request refactors the Data Parallel (DP) prefill balance logic by centralizing state management within the InferStateInfo class, simplifying transformer layer implementations for DeepSeek models. The changes introduce automated switching between balanced and unbalanced states for input IDs and position embeddings. However, a critical initialization bug was found in infer_struct.py where state-swapping methods access attributes before they are defined, potentially causing AttributeErrors. The review feedback recommends using safer attribute checks and adopting standard Python naming conventions for internal methods to resolve these issues.
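To make the balancing idea concrete, here is a rough, hypothetical sketch (plain Python, no torch.distributed; all names are invented for illustration and are not from the PR): DP prefill balancing can be viewed as flattening every rank's tokens into one stream, re-splitting that stream into equal-sized chunks so each rank does the same amount of prefill work, and keeping an inverse mapping that restores the original per-rank layout afterwards.

```python
def balance(per_rank_tokens):
    """Concatenate all ranks' tokens, then split into equal chunks per rank."""
    flat = [t for rank in per_rank_tokens for t in rank]
    world_size = len(per_rank_tokens)
    # Real implementations pad first; this toy assumes tokens divide evenly.
    assert len(flat) % world_size == 0, "pad so tokens divide evenly"
    chunk = len(flat) // world_size
    return [flat[i * chunk:(i + 1) * chunk] for i in range(world_size)]


def unbalance(balanced, original_lengths):
    """Invert balance(): re-split the flat stream by the original lengths."""
    flat = [t for rank in balanced for t in rank]
    out, pos = [], 0
    for n in original_lengths:
        out.append(flat[pos:pos + n])
        pos += n
    return out


# Rank 0 holds 6 tokens, rank 1 holds 2: uneven prefill work.
ranks = [[1, 2, 3, 4, 5, 6], [7, 8]]
balanced = balance(ranks)  # each rank now holds 4 tokens
restored = unbalance(balanced, [len(r) for r in ranks])
assert restored == ranks
```

In the actual PR the redistribution is done with collective all-to-all communication on tensors, but the invariant is the same: a balanced view for compute, and an unbalanced view that round-trips back to the original layout.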

Comment on lines +253 to +274
    def __change_to_unbalance(self):
        self.input_ids = self._unbalance_input_ids
        if hasattr(self, "position_ids"):
            self.position_ids = self._unbalance_position_ids
        if hasattr(self, "position_cos"):
            self.position_cos = self._unbalance_position_cos
        if hasattr(self, "position_sin"):
            self.position_sin = self._unbalance_position_sin
        return

    def __change_to_balance(self):
        self.input_ids = self._balance_input_ids
        if hasattr(self, "position_ids"):
            self.position_ids = self._balance_position_ids
        if hasattr(self, "position_cos"):
            self.position_cos = self._balance_position_cos
        if hasattr(self, "position_sin"):
            self.position_sin = self._balance_position_sin
        return

    def _all_to_all_balance_get(self, data: torch.Tensor):
        self.__change_to_balance()
Contributor


critical

The current implementation of state swapping will cause a critical crash during the initialization phase.

In prepare_prefill_dp_balance, the code calls _all_to_all_balance_get (e.g., at line 236) before self._balance_input_ids is defined (at line 249). Since _all_to_all_balance_get immediately calls self.__change_to_balance(), which attempts to access self._balance_input_ids, an AttributeError will be raised.

Additionally, the hasattr checks should target the backing attributes (e.g., _balance_position_ids) rather than the public ones. Since InferStateInfo initializes public attributes like position_ids to None in its constructor, hasattr will return True, but the backing attributes will be missing if the feature wasn't used or is still being initialized, leading to further crashes.
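The pitfall described above is easy to reproduce. In this minimal illustration (a hypothetical class standing in for InferStateInfo, not the real one), the constructor sets the public attribute to None, so hasattr() on the public name is always True, while the "_balance_*" backing attribute only exists once the balancing path has actually run:

```python
class State:
    def __init__(self):
        self.position_ids = None  # public attr always present, even if unset


s = State()
assert hasattr(s, "position_ids")               # True, despite being None
assert not hasattr(s, "_balance_position_ids")  # backing attr is missing

# Guarding on the public name does not protect the swap: it crashes.
try:
    if hasattr(s, "position_ids"):
        s.position_ids = s._balance_position_ids
except AttributeError:
    pass  # the crash the review warns about

# Guarding on the backing name safely skips the swap until it exists.
if hasattr(s, "_balance_position_ids"):
    s.position_ids = s._balance_position_ids
```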

Renaming the methods to use a single underscore is also recommended to follow standard Python conventions for internal methods.

    def _change_to_unbalance(self):
        if hasattr(self, "_unbalance_input_ids"):
            self.input_ids = self._unbalance_input_ids
        if hasattr(self, "_unbalance_position_ids"):
            self.position_ids = self._unbalance_position_ids
        if hasattr(self, "_unbalance_position_cos"):
            self.position_cos = self._unbalance_position_cos
        if hasattr(self, "_unbalance_position_sin"):
            self.position_sin = self._unbalance_position_sin
        return

    def _change_to_balance(self):
        if hasattr(self, "_balance_input_ids"):
            self.input_ids = self._balance_input_ids
        if hasattr(self, "_balance_position_ids"):
            self.position_ids = self._balance_position_ids
        if hasattr(self, "_balance_position_cos"):
            self.position_cos = self._balance_position_cos
        if hasattr(self, "_balance_position_sin"):
            self.position_sin = self._balance_position_sin
        return

    def _all_to_all_balance_get(self, data: torch.Tensor):
        self._change_to_balance()

        return dest_data.view(-1, *old_shape[1:])

    def _all_to_all_unbalance_get(self, data: torch.Tensor):
        self.__change_to_unbalance()
Contributor


medium

Update the call to use the renamed internal method to maintain consistency with the suggested fix for the initialization crash.

Suggested change
-        self.__change_to_unbalance()
+        self._change_to_unbalance()

@hiworldwzj merged commit 529f9ca into main on Apr 16, 2026
1 check passed
@hiworldwzj deleted the wzj_fix branch on April 16, 2026 at 07:00
