Conversation

@pggPL (Collaborator) commented Jan 30, 2025

Description

Nvidia-DLFramework-Inspect will be the common debug/logging API for NVIDIA frameworks. Integrating it with Transformer Engine has three aims:

  • allow disabling/enabling FP8 in particular GEMMs, running current scaling in selected GEMMs, etc.,
  • allow easy logging of statistics for every tensor in every GEMM,
  • make it easier to test new precisions/recipes with TE.

Link to the nvidia-dlframework-inspect. IMPORTANT: to run this PR, one needs to use the branch from that PR.
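
For context, a minimal usage sketch of how such a debug API is driven from a YAML config and initialized before training. The feature names (DisableFP8GEMM, LogTensorStats), the config schema, and all paths below are illustrative assumptions based on the linked nvidia-dlframework-inspect project, not necessarily the exact interface landed in this PR:

    # Hypothetical sketch -- feature names, config schema, and paths are
    # assumptions for illustration, not the exact interface of this PR.
    import nvdlfw_inspect.api as debug_api

    # debug_config.yaml (example contents):
    #
    # my_section:
    #   enabled: True
    #   layers:
    #     layer_name_regex_pattern: .*fc(1|2).*
    #   transformer_engine:
    #     DisableFP8GEMM:        # aim 1: run selected GEMMs in high precision
    #       enabled: True
    #       gemms: [fprop]
    #     LogTensorStats:        # aim 2: log per-tensor statistics
    #       enabled: True
    #       stats: [min, max, mean]
    #       tensors: [activation, weight]

    debug_api.initialize(
        config_file="./debug_config.yaml",
        feature_dirs=["./transformer_engine/debug/features"],
        log_dir="./debug_logs",
    )

    # ... construct and train the TE model as usual; layers matching the
    # config are intercepted by the configured debug features ...

    debug_api.end_debug()  # tear down hooks and flush logs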

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring
  • Testing

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@pggPL force-pushed the nvdlfw_inspect_support branch from 8f6dbd5 to f940ba3 on January 30, 2025 21:31
@ptrendx (Member) commented Feb 7, 2025

Please move this PR to be against main.

@pggPL changed the base branch from release_v2.0 to main on February 7, 2025 23:16
@pggPL marked this pull request as ready for review on February 10, 2025 11:46
@pggPL (Collaborator, Author) commented Feb 12, 2025

/te-ci pytorch

@pggPL force-pushed the nvdlfw_inspect_support branch from 7380ee1 to 7467f1e on February 12, 2025 17:09
* TE 2.0 code drop

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [PyTorch] Fix linter warnings (NVIDIA#1426)

* Fix linter warnings in basic linear op

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix linter warnings in grouped linear module

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Disable Userbuffers support in te.Sequential

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add path to disable cudnn norm for mxfp8 (NVIDIA#1432)

* Add path to disable cudnn norm for mxfp8

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pad MXFP8 scale inverses at the time of creation (NVIDIA#1431)

* Create scale_inv for block scaling already padded

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Remove old file, fix CG test

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fixes

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Change default value of env

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* [PyTorch] Respect existing quantizer usages in functional linear API (NVIDIA#1440)

Respect existing quantizer usages in functional linear API

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Nvidia-DLFramework-Inspect support

* Update FE from 1.10-rc to 1.10 (NVIDIA#1438)

Update FE 1.10-rc to 1.10

Signed-off-by: Charlene Yang <charleney@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* removed unnecessary files

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* removed unnecessary files

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* lint fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* license fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [PyTorch] Debug NeMo distributed optimizer (NVIDIA#1444)

Debug errors with NeMo distributed optimizer

Avoid internal quantized tensor class in params and when setting data attr. Debug view function in MXFP8Tensor.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Rename block scaling recipe (NVIDIA#1442)

Rename MXFP8 recipe

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* [common] Generalized MXFP8 fused kernels w.r.t. input tensor dimensions (NVIDIA#1437)

* Generalized MXFP8 fused kernels w.r.t. input tensor dimensions

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformer_engine/common/common.cu

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed unnecessary test scenarios

Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* Reverted the previous commit as it generated a compilation error (caused by the to-string conversion)

Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformer_engine/common/common.cu

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cast_mxfp8.cu

Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* Fixed the bug with partial dbias writes in trimmed chunks

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Generalized MXFP8 dequantize kernel

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* Add the virtual destructor to the Quantizer class (NVIDIA#1446)

Add the virtual destructor to the Quantizer

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [Core] Debug unaligned MXFP8 dequantize tests (NVIDIA#1450)

* Skip MXFP8 dequantize tests with invalid alignment

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove test case with unaligned rows

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Generalization of the FP8 dgated activations kernel (NVIDIA#1448)

* Relax FP8 gated activations requirements
Expanded MXFP8 and FP8 tests coverage

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix scale_inv check in test

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Update tests/cpp/operator/test_cast_mxfp8.cu

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>

* Changes from review

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Lift the 2D restriction on MXFP8 scales

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix the scale_inv dimension check for MXFP8

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Skip columnwise MXFP8 tests for 1D tensors

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Skip 2x MXFP8 tests with 1D tensors

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Adjusting tolerances for dbias

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Smaller test cases

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

---------

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* one test api fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [PyTorch/C++] Comm+GEMM overlap compatibility with QuantizedTensor (NVIDIA#1427)

* C++ code and TE/PyTorch general_gemm updated to support TP overlap with cppqtensor

Signed-off-by: Alp Dener <adener@nvidia.com>

CommOverlap objects can now return overlap buffers to PyTorch as QuantizedTensors

Signed-off-by: Alp Dener <adener@nvidia.com>

updated comm+GEMM overlap test for pure GEMM, both BF16 and FP8 working with QuantizedTensor

Signed-off-by: Alp Dener <adener@nvidia.com>

te.Linear and te.LayerNormMLP updated for TP overlap w/ QuantizedTensor. All overlaps work in BF16. All overlaps except bulk WGRAD work in FP8.

Signed-off-by: Alp Dener <adener@nvidia.com>

completed TP overlap QuantizedTensor updates for LayerNormLinear, but issues with quantized normalization

Signed-off-by: Alp Dener <adener@nvidia.com>

all overlaps working with bf16, all but bulk WGRAD working with FP8

Signed-off-by: Alp Dener <adener@nvidia.com>

all overlaps work with Float8Tensor, except bulk wgrad in LayerNormMLP (works in other modules)

Signed-off-by: Alp Dener <adener@nvidia.com>

all overlaps working with QuantizedTensor in BF16 and FP8

Signed-off-by: Alp Dener <adener@nvidia.com>

cleaned up pytest formatting

Signed-off-by: Alp Dener <adener@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed atomic GEMM tests for comm+GEMM overlap (deprecated in CUDA) and updated test sizing

Signed-off-by: Alp Dener <adener@nvidia.com>

* all TP overlap tests fixed on H100, a few failures remain in sanity tests

Signed-off-by: Alp Dener <adener@nvidia.com>

* Minor fix, lint, format

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix mxfp8

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Minor changes/cleanup

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Populate column-wise data in FP8 LayerNorm/RMSNorm funcs if provided

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix linter warnings

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix fused attn tests

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Initialize LN output with correct device

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix UB distributed tests

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix for non-fp8 cases

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: Alp Dener <adener@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>

* [PyTorch] Remove MXFP8 scale-inv padding in MXFP8 all-gather (NVIDIA#1455)

* Remove MXFP8 scale-inv padding in MXFP8 all-gather

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Zero out padding in MXFP8 scale-inverses

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [common] Generalized MXFP8 gated kernels w.r.t. input tensor dimensions (NVIDIA#1449)

* Fixed scaling tensor alignment/padding

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changes from review

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed alignment and padding in scaled tensors. Refactoring.

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Skipped scenarios for non-mod(32) tensors

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* More fixes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Some fixes to the CPU reference

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed typo in the kernel. Restricted the last dim to multiples of 32

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Fixed TMA writes overlap

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove the largest test cases for numerical stability

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

---------

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* Fix MXFP8 normalization (NVIDIA#1457)

* Fix MXFP8 normalization

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [PyTorch] Reduce tensor dimensions in MXFP8 tests (NVIDIA#1435)

* Relax dim constraint in MXFP8 tests

Dims are multiples of 32 instead of 128.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Make tensor dims multiples of 32

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Avoid MXFP8 GEMM with MXFP8 output

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Reduce tensor sizes in non-quantized TP test

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Increase GEMM sizes in distributed te.Sequential tests

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Expand sanity tests to include MXFP8

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* polishing

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* polishing

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* polishing

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* polishing

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* refactor

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* refactor

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* lint fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* lint fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint fixed

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint and license fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* nvinspect_api to debug_api

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* end debug

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* one gpu tests passing

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes all tests

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* lint fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* new small test

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

---------

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Alp Dener <adener@nvidia.com>
Co-authored-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
Co-authored-by: Alp Dener <adener@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL force-pushed the nvdlfw_inspect_support branch from 7467f1e to c90f5ac on February 12, 2025 17:09
pggPL and others added 3 commits February 12, 2025 09:21
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator, Author) commented Feb 12, 2025

/te-ci pytorch L1

pggPL and others added 4 commits February 13, 2025 02:52
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@timmoon10 self-requested a review on February 13, 2025 19:37
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
if api_name in ["inspect_tensor", "inspect_tensor_postquantize"]:
assert ret is None
if api_name == "modify_tensor":
assert type(ret) in [
Review comment (Member):
2 questions:

  • what about the torch.Parameter?
  • this introduces a pretty hidden place for listing all Tensor types - could we move that array to some more visible place, like quantized_tensor.py?

Reply (Collaborator Author):

I moved it to quantized_tensor.
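
For illustration, a minimal sketch of that refactor, with hypothetical names and import paths: the list of allowed tensor types lives in one visible place (quantized_tensor.py), and checking with isinstance instead of `type(ret) in [...]` also covers subclasses such as torch.nn.Parameter, which answers the first question:

    # Hypothetical sketch of the refactor -- names and import paths are
    # illustrative assumptions, not the exact code landed in this PR.
    import torch

    from transformer_engine.pytorch.tensor.float8_tensor import Float8Tensor
    from transformer_engine.pytorch.tensor.mxfp8_tensor import MXFP8Tensor

    # In quantized_tensor.py: one visible list of types a debug hook may return.
    ALLOWED_MODIFY_TENSOR_TYPES = (torch.Tensor, Float8Tensor, MXFP8Tensor)

    def check_modify_tensor_result(ret):
        # isinstance() (unlike `type(ret) in [...]`) also accepts subclasses,
        # e.g. torch.nn.Parameter, which subclasses torch.Tensor.
        assert isinstance(ret, ALLOWED_MODIFY_TENSOR_TYPES), (
            f"modify_tensor returned unsupported type: {type(ret)}"
        )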

pggPL and others added 2 commits March 12, 2025 19:45
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator, Author) commented Mar 12, 2025

/te-ci pytorch L1

pggPL and others added 10 commits March 13, 2025 12:43
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator, Author) commented Mar 13, 2025

/te-ci pytorch L1

2 similar comments
@pggPL (Collaborator, Author) commented Mar 14, 2025

/te-ci pytorch L1

@pggPL (Collaborator, Author) commented Mar 14, 2025

/te-ci pytorch L1

@ptrendx (Member) commented Mar 17, 2025

/te-ci pytorch

@pggPL (Collaborator, Author) commented Mar 18, 2025

/te-ci pytorch L1

Let's look inside them!
In the main log file, you can find detailed information about the transformer's layer GEMMs behavior. You can see that ``fc1`` and ``fc2`` fprop GEMMs are run in high precision, as intended.
Review comment (Member):

Suggested change:
- In the main log file, you can find detailed information about the transformer's layer GEMMs behavior. You can see that ``fc1`` and ``fc2`` fprop GEMMs are run in high precision, as intended.
+ In the main log file, you can find detailed information about the transformer layer's GEMMs behavior. You can see that ``fc1`` and ``fc2`` fprop GEMMs are run in high precision, as intended.

@ksivaman (Member) left a comment:

Some of the environment variable names used are very generic (FEATURE_DIRS, CONFIG_FILE, DEBUG) and need changing. Could you add the NVTE_ prefix to them and make them more descriptive?
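
For illustration, a minimal sketch of the requested rename, with hypothetical final names (the actual names were settled during review, not here):

    import os

    # Hypothetical renames illustrating the NVTE_ prefix request; the names
    # below are assumptions, not necessarily the ones that eventually shipped.
    feature_dirs = os.getenv("NVTE_DEBUG_FEATURE_DIRS", "")  # was: FEATURE_DIRS
    config_file = os.getenv("NVTE_DEBUG_CONFIG_FILE", "")    # was: CONFIG_FILE
    debug_enabled = os.getenv("NVTE_DEBUG", "0") == "1"      # was: DEBUG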

pggPL and others added 4 commits March 25, 2025 13:15
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator, Author) commented Mar 25, 2025

This PR was split into 4 smaller PRs.

@pggPL closed this Mar 25, 2025