Update benchmark_inference.py to support TP with thunderfx #2625
Conversation
```python
if isinstance(offsets, DTensor):
    assert offsets.placements == (Replicate(),)
    offsets = offsets.to_local()
```
Without this we will see:

```
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/distributed/tensor/_sharding_prop.py", line 539, in propagate_op_sharding_non_cached
[rank0]:     raise NotImplementedError(
[rank0]: NotImplementedError: Operator aten.unbind.int does not have a sharding strategy registered.
```

due to `for offset in offsets`, which calls `unbind` on the tensor.
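To illustrate the fix in isolation, here is a minimal sketch (assuming a 2-rank `torchrun` launch with one GPU per rank; the offset values are made up):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import DTensor, Replicate

# Assumes `torchrun --nproc-per-node 2` initialized the process group.
mesh = init_device_mesh("cuda", (2,))
offsets = DTensor.from_local(
    torch.tensor([3, 5, 9], device="cuda"), mesh, [Replicate()]
)

# Iterating a DTensor dispatches aten.unbind.int, which has no sharding
# strategy. With a Replicate placement every rank already holds the full
# tensor, so dropping to the local tensor loses nothing.
if isinstance(offsets, DTensor):
    assert offsets.placements == (Replicate(),)
    offsets = offsets.to_local()

for offset in offsets:  # plain torch.Tensor now; unbind works
    print(offset.item())
```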
```python
group_sizes = _group_sizes_from_offsets(offsets)
group_outs = []
for group_a, group_b in zip(a.split(group_sizes), b.unbind()):
```
Same error here: `NotImplementedError: Operator aten.unbind.int does not have a sharding strategy registered.`
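`_group_sizes_from_offsets` itself is not shown in the diff; a plausible sketch, assuming `offsets` holds cumulative end indices, would be:

```python
import torch

def _group_sizes_from_offsets(offsets: torch.Tensor) -> list[int]:
    # Hypothetical reconstruction: offsets [3, 5, 9] -> sizes [3, 2, 4],
    # i.e. the successive differences between cumulative end indices.
    sizes = []
    prev = 0
    for off in offsets.tolist():
        sizes.append(off - prev)
        prev = off
    return sizes
```

`a.split(group_sizes)` then splits `a` along dim 0 into chunks of those lengths.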
```python
return self.get_next_token(input_ids, past_key_values)
```
```python
@torch.inference_mode()
# TODO: Running `torchrun --nproc-per-node 2 thunder/benchmarks/benchmark_inference.py --input-length 32 --output-length 32 --mode eager --num-iterations 10`
```
Just for my understanding: tensor parallel doesn't work with inference_mode, but single device does?
Yes, this only failed with TP and worked fine for single device.
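One conceivable workaround, shown purely as a sketch (the `Generator` class, `use_tp` flag, and stub `get_next_token` are hypothetical, not the PR's actual change): fall back to `torch.no_grad()` under TP, since `no_grad` disables autograd without creating inference tensors.

```python
import torch

class Generator:
    def __init__(self, model, use_tp: bool = False):
        self.model = model
        self.use_tp = use_tp

    def get_next_token(self, input_ids, past_key_values):
        # Stub standing in for the benchmark's real decoding step.
        return self.model(input_ids)

    def generate(self, input_ids, past_key_values=None):
        # torch.no_grad() avoids creating inference tensors, sidestepping
        # the failure seen under TP; single-device runs keep the cheaper
        # torch.inference_mode().
        ctx = torch.no_grad() if self.use_tp else torch.inference_mode()
        with ctx:
            return self.get_next_token(input_ids, past_key_values)
```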
OK. And torch.compile and thunderfx also fail, right?
Works for thunderfx + TP.
torch.compile + TP fails with the following error irrespective of `torch.inference_mode`:

```
[rank1]:   File "/tmp/torchinductor_root/6d/c6dlxelqgtfg5yx3w6xwsyntdcsyogve3v5sxqwhd7t7b67b4rn6.py", line 1356, in call
[rank1]:     assert_size_stride(buf10, (1, 32, 64), (2048, 64, 1), 'torch.ops.aten.polar.default')
[rank1]: AssertionError: expected size 32==32, stride 1==64 at dim=1; expected size 64==64, stride 32==1 at dim=2
[rank1]: Error in op: torch.ops.aten.polar.default
```
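For context, `torch.ops.aten.polar` is the op behind `torch.polar`, which builds a complex tensor from magnitude and phase and commonly shows up when precomputing rotary-embedding (RoPE) frequencies. A quick illustration, with shapes chosen to mirror the failing `(1, 32, 64)` buffer rather than taken from the benchmark code:

```python
import torch

# torch.polar(abs, angle) = abs * (cos(angle) + 1j * sin(angle))
abs_ = torch.ones(32, 64)
angle = torch.outer(
    torch.arange(32, dtype=torch.float32),
    torch.arange(64, dtype=torch.float32),
)
freqs_cis = torch.polar(abs_, angle)
print(freqs_cis.shape, freqs_cis.dtype)  # torch.Size([32, 64]) torch.complex64
```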
Understood. That sounds more convoluted than I thought 😅
Have you created an issue in the pytorch repo about the inference_mode problem?

I haven't yet. Will file one today, thanks!

I wasn't able to repro the error for torch.compile + TP on the latest container.
As for the `RuntimeError: Cannot set version_counter for inference tensor` when running with eager, I have a feeling that we are probably missing something; I will take a look.
Thank you for revisiting this!
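As background on the eager-mode error: tensors created under `torch.inference_mode()` carry no version counter, so any later code path that tries to set one fails. A minimal single-device sketch (`copy.deepcopy` is one known trigger; whether the benchmark hits it through the same path is an assumption):

```python
import copy
import torch

with torch.inference_mode():
    t = torch.ones(3)

try:
    copy.deepcopy(t)  # tries to restore the version counter on the copy
except RuntimeError as e:
    print(e)  # Cannot set version_counter for inference tensor
```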
t-vi left a comment:
Thank you @kshitij12345 @crcrpar
Depends on #2611 (requires the changes from #2611 for the thunderfx path to work).
Eager (2 layer model on RTX6000 Ada)

CMD:

```
torchrun --nproc-per-node 2 thunder/benchmarks/benchmark_inference.py --input-length 32 --output-length 32 --mode eager --num-iterations 10
```

thunderfx (2 layer model on RTX6000 Ada)

CMD:

```
torchrun --nproc-per-node 2 thunder/benchmarks/benchmark_inference.py --input-length 32 --output-length 32 --mode thunder --num-iterations 10
```

thunderfx and nv_enable_linear=True (2 layer model on RTX6000 Ada)

CMD:

```
torchrun --local-ranks-filter 0 --nproc-per-node 2 thunder/benchmarks/benchmark_inference.py --input-length 32 --output-length 32 --mode thunder --num-iterations 10 --enable-nv-linear
```
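Note: `--local-ranks-filter 0` is a standard torchrun flag that restricts log output to local rank 0; presumably it is used here to keep the nvFuser run's output readable.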