
added mse_loss #218

Merged: 41 commits merged into Lightning-AI:main on Apr 25, 2024
Conversation

@k223kim (Contributor) commented Apr 18, 2024

Before submitting
  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #174.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

YES!

@k223kim (Contributor, Author) commented Apr 22, 2024

Hi @mruberry!
Just wanted to follow up on the mse_loss issue and ask some specific questions about the implementation. Before anything else: without any DecorateInfo in mse_loss_opinfo, the current implementation produces the following errors:

FAILED thunder/tests/test_grad.py::test_vjp_correctness_mse_loss_torch_cpu_float64 - AssertionError: Scalars are not close!
FAILED thunder/tests/test_grad.py::test_phantom_grad_vs_torch_consistency_mse_loss_torch_cpu_bfloat16 - RuntimeError: "mse_cpu" not implemented for 'BFloat16'
FAILED thunder/tests/test_grad.py::test_phantom_grad_vs_torch_consistency_mse_loss_torch_cpu_float16 - RuntimeError: "mse_backward_cpu_out" not implemented for 'Half'
FAILED thunder/tests/test_grad.py::test_phantom_grad_vs_torch_consistency_mse_loss_torch_cpu_float32 - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_grad.py::test_phantom_grad_vs_torch_consistency_mse_loss_torch_cpu_float64 - AssertionError: Tensor-likes are not close!
FAILED thunder/tests/test_ops.py::test_core_vs_torch_consistency_mse_loss_torch_cpu_bfloat16 - RuntimeError: "mse_cpu" not implemented for 'BFloat16'
  1. I am not sure whether these errors are due to my implementation itself or to PyTorch not supporting something. For the "not implemented" errors, I am tempted to resolve them with the following DecorateInfo:
test_directives=(
    DecorateInfo(
        pytest.mark.skip,
        "test_core_vs_torch_consistency",
        dtypes=(datatypes.bfloat16,),
        devicetypes=(devices.DeviceType.CPU,),
    ),
    DecorateInfo(
        pytest.mark.skip,
        "test_phantom_grad_vs_torch_consistency",
        dtypes=(datatypes.bfloat16, datatypes.float16,),
        devicetypes=(devices.DeviceType.CPU,),
    ),
),

How do I know if PyTorch supports this or not?
2. Regarding the other assertion errors: since they occur even for float64, I am assuming this is because of my implementation. I suspect the implementation of _mse_loss_backward_impl in executors/torchex.py is incorrect in some way, but again, I am not sure :( Any suggestions or comments would be appreciated!

@mruberry mruberry requested a review from nikitaved April 22, 2024 19:40
Outdated review comments (resolved) on: thunder/clang/__init__.py, thunder/core/utils.py, thunder/tests/opinfos.py, thunder/torch/__init__.py
@mruberry (Collaborator) commented:

> Without any DecorateInfo in mse_loss_opinfo, the current implementation has the following errors: […] For the "not implemented" errors, I am tempted to resolve them with the following DecorateInfo: […] How do I know if PyTorch supports this or not?

These skips make a lot of sense. I wouldn't worry about supporting grad by adding mse_loss_backward and a custom grad formula transform in this PR. @IvanYashchuk may suggest otherwise, and he's our grad expert, though.
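
(A quick way to answer the "does PyTorch support this?" question is to call the eager op directly with the dtype in question. A minimal probe sketch, assuming a plain torch install: if eager PyTorch raises the same "not implemented" error, the skip is justified rather than a sign of a bug in this PR.)

import torch
import torch.nn.functional as F

# Probe which CPU dtypes eager PyTorch supports for mse_loss forward and backward.
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    a = torch.randn(4, 4, dtype=dtype, requires_grad=True)
    b = torch.randn(4, 4, dtype=dtype)
    try:
        F.mse_loss(a, b).backward()  # forward + backward in eager mode
        print(dtype, "supported on CPU")
    except RuntimeError as err:
        print(dtype, "not supported on CPU:", err)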

> Regarding the other assertion errors: since they occur even for float64, I am assuming this is because of my implementation. […] I suspect the implementation of _mse_loss_backward_impl in executors/torchex.py is incorrect in some way.

This may be an interesting follow-up issue for a separate PR that targets optimizing this operation's grad formula, but I think the failure may disappear from this PR if it doesn't add the grad formula.
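
(For whoever picks up that follow-up PR: the reference gradient of mse_loss with reduction="mean" is just 2 * (a - b) / a.numel(), so a custom backward can be checked against eager autograd in a few lines. A hedged sketch, not the PR's actual _mse_loss_backward_impl:)

import torch
import torch.nn.functional as F

def mse_loss_backward_reference(grad_output, a, b, reduction="mean"):
    # d/da of (a - b)**2 is 2 * (a - b); "mean" additionally divides by the element count.
    grad = 2.0 * (a - b)
    if reduction == "mean":
        grad = grad / a.numel()
    return grad_output * grad

a = torch.randn(3, 5, dtype=torch.float64, requires_grad=True)
b = torch.randn(3, 5, dtype=torch.float64)
F.mse_loss(a, b).backward()
expected = mse_loss_backward_reference(torch.tensor(1.0, dtype=torch.float64), a.detach(), b)
assert torch.allclose(a.grad, expected)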

@mruberry (Collaborator) commented:

Cool PR, @k223kim!

Let's take this PR out of "draft"; I think it's ready for review!

I added some inline comments, in particular about better understanding what's happening with the broadcasting. I'm curious to hear more about that!

I suggest removing the custom grad formula from this PR and seeing if that simplifies the testing. A separate PR can optimize the gradient computation.

Outdated review comments (resolved) on: thunder/clang/__init__.py, thunder/core/utils.py, thunder/torch/__init__.py, thunder/tests/opinfos.py
@mruberry (Collaborator) left a review comment:


This looks really good, @k223kim. The remaining changes are mostly clean-up and should be simple to make. There is a test that's failing:

FAILED thunder/tests/test_ops.py::test_core_vs_torch_consistency_mse_loss_nvfuser_cuda_float16 - AssertionError: Tensor-likes are not close!

Mismatched elements: 1 / 32 (3.1%)
Greatest absolute difference: 0.00048828125 at index (1, 9) (up to 1e-05 allowed)
Greatest relative difference: 0.0014553070068359375 at index (1, 9) (up to 0.001 allowed)

and that's OK; the deviation is pretty small. This PR just needs to update the OpInfo to account for it by increasing that test's tolerance. An example of how this is done can be found here:

Essentially, a decorator is specified in the OpInfo and it is automatically added to the generated test. In this case it looks like the decorator needs to set

 custom_comparator(partial(assert_close, atol=1e-3, rtol=1e-2)),

If that doesn't work we can increase the test tolerance even more. Let me know if you have any questions about this!

@k223kim (Contributor, Author) commented Apr 24, 2024

Hi @mruberry! Thanks for the detailed comments :) they're helping me a lot! I can add the following to opinfos.py:

DecorateInfo(
    custom_comparator(partial(assert_close, atol=1e-3, rtol=1e-2)),
    executors=("nvfuser",),
    dtypes=(datatypes.float16,),
)

However, my concern is that I am not able to reproduce the error you have shown above. Would this be an issue? (On my end, all tests are passing.)

@mruberry (Collaborator) commented:

> I can add the following DecorateInfo with custom_comparator to opinfos.py. […] However, my concern is that I am not able to reproduce the error you have shown above. Would this be an issue? (On my end, all tests are passing.)

I understand, and that's OK; we want to see the CI passing, and the CI machines and your local hardware may have differences that cause a small precision discrepancy.

@k223kim (Contributor, Author) commented Apr 24, 2024

Hey @mruberry!
One last question: regarding the custom grad formula for mse_loss, what would be the next steps? Should I submit a separate PR for that, or is there a better way to work on it?
(cc @IvanYashchuk)

@mruberry (Collaborator) commented Apr 24, 2024

> Regarding the custom grad formula for mse_loss, what would be the next steps? Should I submit a separate PR for that, or is there a better way to work on it?

Absolutely, a separate PR would be great. The reason to add a custom grad formula would be to improve performance, so it'd be nice if that PR showed the performance of mse_loss forward->backward before and after the change, so we can be sure it actually improves performance.

(Benchmarking with CUDA devices can be a little tricky, but basically you want to run the operations, sync the CUDA device with the CPU, and measure that time.)
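
(A minimal sketch of that pattern, assuming a CUDA device is available and timing the eager F.mse_loss forward->backward; a real benchmark would compare this against the thunder-compiled version before and after the change:)

import time
import torch
import torch.nn.functional as F

a = torch.randn(4096, 4096, device="cuda", requires_grad=True)
b = torch.randn(4096, 4096, device="cuda")

def fwd_bwd():
    F.mse_loss(a, b).backward()

for _ in range(10):          # warmup so one-time CUDA costs are excluded
    fwd_bwd()

torch.cuda.synchronize()     # make sure pending kernels finish before starting the clock
start = time.perf_counter()
for _ in range(100):
    fwd_bwd()
torch.cuda.synchronize()     # wait for the timed kernels before reading the clock
elapsed_ms = (time.perf_counter() - start) / 100 * 1e3
print(f"mse_loss forward+backward: {elapsed_ms:.3f} ms per iteration")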

@mruberry (Collaborator) left a review comment:


Cool! Nice work, @k223kim!

@mruberry enabled auto-merge (squash) on April 24, 2024, 20:39
@mruberry (Collaborator) commented:

@t-vi @Borda This may require some special push to merge. The test failure is flaky and unrelated.

@mruberry merged commit 4d9fa60 into Lightning-AI:main on Apr 25, 2024
37 of 39 checks passed
Successfully merging this pull request may close these issues: mse_loss (#174).

3 participants