feat: grab num_experts info from model info if possible #4060
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Walkthrough
The changes add support for handling the
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
benchmarks/profiler/utils/model_info.py (1)
132-146: Consider logging when num_experts cannot be determined for MoE models.
The extraction logic is well-structured with appropriate attribute prioritization. However, when a model is identified as MoE but num_experts cannot be determined (remains None), this could lead to issues during profiling.
Consider adding a warning log when config.is_moe is True but num_experts is None after the loop, as this helps operators understand potential profiling constraints.
Example addition after line 146:

     for attr in expert_attrs:
         if hasattr(config, attr):
             value = getattr(config, attr)
             if value is not None:
                 num_experts = value
                 break
    +
    +if num_experts is None:
    +    logger.warning(
    +        f"MoE model detected but num_experts could not be determined from config attributes: {expert_attrs}"
    +    )

Note: You'll need to add import logging at the top and initialize a logger if not already present.
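The suggestion above can be sketched as a standalone function. This is a minimal illustration, not the code in model_info.py: the function name extract_num_experts, the contents of EXPERT_ATTRS, and the SimpleNamespace stand-in for the model config are all assumptions, since the review does not show the actual expert_attrs list.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

# Assumed attribute names, in priority order. The real expert_attrs list
# is not shown in the review, so treat these as placeholders.
EXPERT_ATTRS = ("n_routed_experts", "num_local_experts", "num_experts")


def extract_num_experts(config, expert_attrs=EXPERT_ATTRS):
    """Return the first non-None expert count found on config, else None."""
    num_experts = None
    for attr in expert_attrs:
        value = getattr(config, attr, None)
        if value is not None:
            num_experts = value
            break

    # Warn only when the model is flagged as MoE but no count was found.
    if num_experts is None and getattr(config, "is_moe", False):
        logger.warning(
            "MoE model detected but num_experts could not be determined "
            "from config attributes: %s",
            list(expert_attrs),
        )
    return num_experts


# Hypothetical configs standing in for a loaded model config.
moe_config = SimpleNamespace(is_moe=True, num_local_experts=8)
unknown_config = SimpleNamespace(is_moe=True)
dense_config = SimpleNamespace(is_moe=False)

found = extract_num_experts(moe_config)
missing = extract_num_experts(unknown_config)  # logs a warning
dense = extract_num_experts(dense_config)      # no warning, not MoE
```

Passing format arguments to logger.warning instead of using an f-string defers string formatting until the record is actually emitted, which is the usual logging idiom.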
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- benchmarks/profiler/profile_sla.py (1 hunks)
- benchmarks/profiler/utils/model_info.py (1 hunks)
- benchmarks/profiler/utils/search_space_autogen.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.2)
benchmarks/profiler/profile_sla.py
126-126: Abstract raise to an inner function
(TRY301)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (4)
benchmarks/profiler/utils/search_space_autogen.py (2)
63-70: LGTM! Clean conditional logging implementation.
The conditional string construction properly checks for num_experts presence and includes it in the log message only when available. The approach is clear and maintainable.
96-96: LGTM! Proper attribute assignment with None handling.
Using model_info.get("num_experts") correctly handles cases where num_experts isn't available, allowing downstream code to check for None and proceed accordingly.
benchmarks/profiler/utils/model_info.py (1)
148-153: LGTM! Proper dictionary extension.
Adding num_experts to the returned dictionary is straightforward and maintains consistency with the existing return structure. The None value is appropriately handled by downstream consumers.
benchmarks/profiler/profile_sla.py (1)
127-131: LGTM! Helpful logging of filtering transformation.
The informational log clearly communicates when and why GPU counts were filtered, showing both the original and filtered sets along with the constraint. This aids debugging and understanding of the profiling flow.
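The None handling and the filtering log described in these comments might fit together roughly as follows. This is a hedged sketch, not the code in profile_sla.py: filter_gpu_counts is a hypothetical helper, and it assumes each candidate GPU count is used directly as the expert-parallel size, so a count is kept only when it divides num_experts evenly.

```python
import logging

logger = logging.getLogger(__name__)


def filter_gpu_counts(gpu_counts, num_experts):
    """Keep only GPU counts that divide num_experts evenly.

    Hypothetical helper: when num_experts is None (not available from the
    model info), the candidates are returned unchanged.
    """
    candidates = list(gpu_counts)
    if num_experts is None:
        return candidates

    filtered = [n for n in candidates if num_experts % n == 0]
    if not filtered:
        raise ValueError(
            f"No GPU count in {candidates} divides num_experts={num_experts}"
        )
    if filtered != candidates:
        # Show the original set, the filtered set, and the constraint.
        logger.info(
            "Filtered GPU counts from %s to %s (must divide num_experts=%d)",
            candidates,
            filtered,
            num_experts,
        )
    return filtered


# model_info is a plain dict here; .get() yields None when the key is absent.
model_info = {"is_moe": True, "num_experts": 256}
kept = filter_gpu_counts([2, 4, 6, 8, 14, 16], model_info.get("num_experts"))
unchanged = filter_gpu_counts([2, 4, 6], {}.get("num_experts"))
```

Returning the candidates unchanged when num_experts is None mirrors the fallback-to-defaults behavior the PR description mentions.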
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Overview:
An error is thrown whenever num_physical_experts % ep_size != 0. Instead of ending up with a combination such as num_physical_experts = 256 and an ep_size range of [6, 14, 22, 30], grab num_experts from the model info when possible (falling back to the defaults otherwise).
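To see why the example combination fails, the remainders can be checked directly. The snippet below is only an arithmetic illustration of the divisibility condition using the numbers from the overview. The upper bound of 32 for the candidate search is an arbitrary choice for the example.

```python
num_physical_experts = 256
ep_sizes = [6, 14, 22, 30]

# Remainder of 256 divided by each candidate ep_size; any non-zero
# remainder violates num_physical_experts % ep_size == 0.
remainders = {ep: num_physical_experts % ep for ep in ep_sizes}

# ep_size values up to 32 (arbitrary bound) that do satisfy the condition.
valid_ep_sizes = [
    ep for ep in range(1, 33) if num_physical_experts % ep == 0
]

print(remainders)      # {6: 4, 14: 4, 22: 14, 30: 16}
print(valid_ep_sizes)  # [1, 2, 4, 8, 16, 32]
```

Every value in the example range leaves a non-zero remainder, so none of them is usable with 256 experts.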
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Bug Fixes
Chores