
Conversation

@protonu
Collaborator

@protonu protonu commented May 16, 2025

This adds a benchmark for cross entropy loss where we can vary the vocab size.
There are no linear operators before the cross entropy.
The targets/labels are padded/sliced to mimic other models we have seen. The input is cast to FP32 and squeezed, also to mimic other Hugging Face models we have seen.
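A minimal sketch of the computation being benchmarked (illustrative only; this `mini_model` and its shapes are assumptions, not the PR's actual `SyntheticMiniModel` code):

```python
import torch
import torch.nn.functional as F

def mini_model(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Cast to FP32 and squeeze, mimicking the Hugging Face graphs mentioned above.
    logits = logits.float().squeeze()
    # Pad then slice the labels (a causal-LM-style shift); -100 entries are
    # ignored by cross_entropy via ignore_index.
    labels = F.pad(labels, (0, 1), value=-100)[..., 1:]
    return F.cross_entropy(logits, labels, ignore_index=-100)

logits = torch.randn(8, 32064, dtype=torch.bfloat16)  # [batch, vocab_size]
labels = torch.randint(0, 32064, (8,))
loss = mini_model(logits, labels)
```

No linear layer precedes the loss, so the benchmark isolates the cast/squeeze/pad/slice plus the cross entropy itself.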

@github-actions

github-actions bot commented May 16, 2025

Review updated until commit 0f0c72e

Description

  • Added a mini benchmark for cross entropy loss

  • Introduced SyntheticMiniModel class for benchmarking

  • Expanded vocab size variations for benchmarking

  • Included cast and squeeze operations in the model


Changes walkthrough 📝

Relevant files

Enhancement

benchmarks/python/core.py: Improve type hints and IO bytes computation
  • Added type hint for tensor_props parameter
  • Modified compute_total_iobytes to handle tensor inputs
  • +4/-3

benchmarks/python/cross_entropy_loss.py: Add synthetic model for cross entropy loss benchmark
  • Introduced SyntheticMiniModel class
  • Added mini_model, inputs, grads, and generate_vocab_sizes methods
  • +66/-0

Tests

benchmarks/python/test_cross_entropy_loss.py: Add tests for synthetic cross entropy loss benchmark
  • Imported SyntheticMiniModel
  • Added test_cross_entropy_mini_benchmark_fwd and test_cross_entropy_mini_benchmark_bwd tests
  • +46/-1

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review

    Possible Issue

    The conditions in lines 158 and 161 seem incorrect. The or condition should likely be and to ensure that inputs and outputs are not torch.Tensor instances.

    if not isinstance(inputs, Iterable) or isinstance(inputs, torch.Tensor):
        inputs = [inputs]
    if not isinstance(outputs, Iterable) or isinstance(outputs, torch.Tensor):
        outputs = [outputs]
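For context on why the `or` here is not a typo: `torch.Tensor` defines `__iter__`, so it counts as `Iterable`, and the `Iterable` check alone would never wrap a lone tensor. A torch-free sketch with a stand-in class (`FakeTensor` is illustrative, not in the PR):

```python
from collections.abc import Iterable

class FakeTensor:
    # Stand-in for torch.Tensor: it defines __iter__, so isinstance(x, Iterable)
    # is True even though iterating (as with a 0-dim tensor) would raise.
    def __iter__(self):
        raise TypeError("iteration over a 0-d tensor")

def normalize(x):
    # Mirrors the check under review: wrap anything that is not iterable,
    # OR is a (fake) tensor, into a list.
    if not isinstance(x, Iterable) or isinstance(x, FakeTensor):
        return [x]
    return x

t = FakeTensor()
print(isinstance(t, Iterable))  # True, so the Iterable check alone skips wrapping
print(normalize(t) == [t])      # True: the explicit tensor check forces the wrap
```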
    Performance Concern

    The generate_vocab_sizes method generates a large number of vocab sizes, which could lead to a significant increase in benchmarking time. Consider optimizing the number of vocab sizes or adding a mechanism to limit the number of benchmarks.

    @staticmethod
    def generate_vocab_sizes():
        sizes_from_models = [
            49152,  # Starcoder
            129280,  # DeepSeek-R1
            128256,  # Llama3
            202048,  # Llama4
            256000,  # Gemma2
            131072,  # Mistral
            152064,  # Qwen2
            32064,  # Phi3.5
            100352,  # Phi4
            50264,  # GPT-2
        ]
    
        powers_of_2 = [2**i * 1024 for i in range(4, 9)]
    
        combined_set = sorted(set(sizes_from_models) | set(powers_of_2))
    
    # for each vocab size in the set, add variations at +/- 1..5 steps of 64,
    # which gives the full set of vocab sizes to benchmark
        variations = set()
        step = 64
        for num in combined_set:
            for i in range(1, 6):
                variations.add(num + (i * step))
                variations.add(num - (i * step))
    
            variations.add(num)
    
        return sorted(variations)
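On the scale of this concern, a standalone count (reproducing the generator's logic above) shows how many cases each executor gets:

```python
# Reproduce the vocab-size generation to count the benchmark cases it yields.
sizes_from_models = [49152, 129280, 128256, 202048, 256000,
                     131072, 152064, 32064, 100352, 50264]
powers_of_2 = [2**i * 1024 for i in range(4, 9)]  # 16K .. 256K

combined_set = sorted(set(sizes_from_models) | set(powers_of_2))

variations = set()
step = 64
for num in combined_set:
    for i in range(1, 6):
        variations.add(num + i * step)
        variations.add(num - i * step)
    variations.add(num)

# 14 unique bases (131072 appears in both lists) x 11 variations, no collisions:
print(len(combined_set), len(variations))  # 14 154
```

So each executor is parametrized over 154 vocab sizes, which multiplies quickly across the four executors and fwd/bwd tests.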
    Hardcoded Batch Size

    The batch size is hardcoded to 4096 in both test_cross_entropy_mini_benchmark_fwd and test_cross_entropy_mini_benchmark_bwd. Consider making this a parameter or configurable value to allow for more flexible testing.

        batch_size = 4096
    
        def fwd_call(inp):
            return SyntheticMiniModel.mini_model(*inp)
    
        inputs = SyntheticMiniModel.inputs(int(batch_size), int(vocab_size))
    
        fwd_fn = with_executor(executor, fwd_call)
        run_benchmark(benchmark, fwd_fn, inputs)
    
    
    @pytest.mark.parametrize(
        "executor", ["eager", "torchcompile", "thunder", "thunder-torchcompile"]
    )
    @pytest.mark.parametrize("vocab_size", SyntheticMiniModel.generate_vocab_sizes())
    def test_cross_entropy_mini_benchmark_bwd(benchmark, executor: str, vocab_size: int):
        if executor == "torchcompile":
            clear_dynamo_cache()
    
        # picking a value that doesn't OOM for large vocab sizes
        batch_size = 4096
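One lightweight way to address the hardcoded value (a sketch only; `NVFUSER_CE_BATCH_SIZE` is an invented knob, not an existing one) is to keep 4096 as the default but allow an override:

```python
import os

def get_batch_size(default: int = 4096) -> int:
    # NVFUSER_CE_BATCH_SIZE is hypothetical; the default matches the value
    # picked in the tests to avoid OOM at large vocab sizes.
    return int(os.environ.get("NVFUSER_CE_BATCH_SIZE", default))
```

The tests would then call `get_batch_size()` in place of the hardcoded constant; a pytest fixture or command-line option would work equally well.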

    @protonu protonu force-pushed the pbasu_cross_entropy_new_benchmark branch from 9db2924 to 8364a62 Compare May 16, 2025 23:47
    @protonu protonu force-pushed the pbasu_cross_entropy_new_benchmark branch from 8364a62 to beca294 Compare May 16, 2025 23:52
    @protonu protonu changed the title adding a mini benchmark for cross entropy loss Adding a mini benchmark for cross entropy loss May 20, 2025
    @protonu protonu marked this pull request as ready for review May 20, 2025 20:45
    if isinstance(out, torch.Tensor):
        iobytes += out.element_size() * out.numel()

    if isinstance(outputs, torch.Tensor) and outputs.dim() == 0:
    Collaborator

    In line 161 we are wrapping a single tensor output as outputs = [outputs].
    So the check here would always be false?

    Collaborator Author

    I am not sure that's quite right.

        a = torch.tensor(5)
        print(isinstance(a, Iterable))

    This should be true.
    So outputs could be a single torch tensor and not get wrapped. Please let me know if I am mistaken.

    Collaborator

    Oh you are right and I didn't know that!

    @Priya2698 so it looks like line 158 - 161 isn't doing the right thing?

    @protonu protonu requested a review from jjsjann123 May 21, 2025 14:53
    Collaborator

    @jjsjann123 jjsjann123 left a comment

    LGTM

    Base automatically changed from pbasu_preseg_move_take_along_axis to main June 3, 2025 17:17
    @protonu
    Collaborator Author

    protonu commented Jun 3, 2025

    !test

    @protonu protonu merged commit 9f28e23 into main Jun 3, 2025
    51 of 53 checks passed
    @protonu protonu deleted the pbasu_cross_entropy_new_benchmark branch June 3, 2025 19:55
    Collaborator

    Would this dynamic library seriously need to be in this directory?

    Collaborator Author

    Thanks for catching that! That was a mistake.

    nsarka pushed a commit to nsarka/Fuser that referenced this pull request Jul 28, 2025
