Conversation

@pggPL (Collaborator) commented Jan 30, 2025

Description

Nvidia-DLFramework-Inspect will be the common debug/logging API for NVIDIA frameworks. Integrating it with Transformer Engine has three aims:

  • allow disabling/enabling FP8 in particular GEMMs, running current scaling in selected GEMMs, etc.,
  • allow easy logging of statistics for every tensor in every GEMM,
  • make it easier to test new precisions/recipes with TE.

Link to the nvidia-dlframework-inspect. IMPORTANT: to run this PR, one needs to use the branch from that PR.
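
For context, a minimal usage sketch of how such a debug API is driven from a YAML config and initialized before training. The feature names (DisableFP8GEMM, LogTensorStats), the config schema, and all paths below are illustrative assumptions based on the linked nvidia-dlframework-inspect project, not necessarily the exact interface landed in this PR:

    # Hypothetical sketch -- feature names, config schema, and paths are
    # assumptions for illustration, not the exact interface of this PR.
    import nvdlfw_inspect.api as debug_api

    # debug_config.yaml (example contents):
    #
    # my_section:
    #   enabled: True
    #   layers:
    #     layer_name_regex_pattern: .*fc(1|2).*
    #   transformer_engine:
    #     DisableFP8GEMM:        # aim 1: run selected GEMMs in high precision
    #       enabled: True
    #       gemms: [fprop]
    #     LogTensorStats:        # aim 2: log per-tensor statistics
    #       enabled: True
    #       stats: [min, max, mean]
    #       tensors: [activation, weight]

    debug_api.initialize(
        config_file="./debug_config.yaml",
        feature_dirs=["./transformer_engine/debug/features"],
        log_dir="./debug_logs",
    )

    # ... construct and train the TE model as usual; layers matching the
    # config are intercepted by the configured debug features ...

    debug_api.end_debug()  # tear down hooks and flush logs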

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring
  • Testing

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@pggPL force-pushed the nvdlfw_inspect_support branch from 8f6dbd5 to f940ba3 on January 30, 2025 21:31
@ptrendx (Member) commented Feb 7, 2025

Please move this PR to be against main.

@pggPL changed the base branch from release_v2.0 to main on February 7, 2025 23:16
@pggPL marked this pull request as ready for review on February 10, 2025 11:46
@pggPL (Collaborator, Author) commented Feb 12, 2025

/te-ci pytorch

@pggPL force-pushed the nvdlfw_inspect_support branch from 7380ee1 to 7467f1e on February 12, 2025 17:09
* TE 2.0 code drop

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [PyTorch] Fix linter warnings (NVIDIA#1426)

* Fix linter warnings in basic linear op

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix linter warnings in grouped linear module

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Disable Userbuffers support in te.Sequential

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add path to disable cudnn norm for mxfp8 (NVIDIA#1432)

* Add path to disable cudnn norm for mxfp8

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pad MXFP8 scale inverses at the time of creation (NVIDIA#1431)

* Create scale_inv for block scaling already padded

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Remove old file, fix CG test

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fixes

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Change default value of env

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* [PyTorch] Respect existing quantizer usages in functional linear API (NVIDIA#1440)

Respect existing quantizer usages in functional linear API

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Nvidia-DLFramework-Inspect support

* Update FE from 1.10-rc to 1.10 (NVIDIA#1438)

Update FE 1.10-rc to 1.10

Signed-off-by: Charlene Yang <charleney@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* removed unnecessary files

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* removed unnecessary files

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* lint fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* license fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [PyTorch] Debug NeMo distributed optimizer (NVIDIA#1444)

Debug errors with NeMo distributed optimizer

Avoid internal quantized tensor class in params and when setting data attr. Debug view function in MXFP8Tensor.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Rename block scaling recipe (NVIDIA#1442)

Rename MXFP8 recipe

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* [common] Generalized MXFP8 fused kernels w.r.t. input tensor dimensions (NVIDIA#1437)

* Generalized MXFP8 fused kernels w.r.t. input tensor dimensions

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformer_engine/common/common.cu

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed unnecessary test scenarios

Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* Reverted the previous commit as it generated a compilation error (caused by the to-string conversion)

Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformer_engine/common/common.cu

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cast_mxfp8.cu

Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>

* Fixed the bug with partial dbias writes in trimmed chunks

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Generalized MXFP8 dequantize kernel

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* Add the virtual destructor to the Quantizer class (NVIDIA#1446)

Add the virtual destructor to the Quantizer

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [Core] Debug unaligned MXFP8 dequantize tests (NVIDIA#1450)

* Skip MXFP8 dequantize tests with invalid alignment

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove test case with unaligned rows

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Generalization of the FP8 dgated activations kernel (NVIDIA#1448)

* Relax FP8 gated activations requirements
Expanded MXFP8 and FP8 tests coverage

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix scale_inv check in test

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Update tests/cpp/operator/test_cast_mxfp8.cu

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>

* Changes from review

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Lift the 2D restriction on MXFP8 scales

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix the scale_inv dimension check for MXFP8

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Skip columnwise MXFP8 tests for 1D tensors

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Skip 2x MXFP8 tests with 1D tensors

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Adjusting tolerances for dbias

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Smaller test cases

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

---------

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* one test api fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [PyTorch/C++] Comm+GEMM overlap compatibility with QuantizedTensor (NVIDIA#1427)

* C++ code and TE/PyTorch general_gemm updated to support TP overlap with cppqtensor

Signed-off-by: Alp Dener <adener@nvidia.com>

CommOverlap objects can now return overlap buffers to PyTorch as QuantizedTensors

Signed-off-by: Alp Dener <adener@nvidia.com>

updated comm+GEMM overlap test for pure GEMM, both BF16 and FP8 working with QuantizedTensor

Signed-off-by: Alp Dener <adener@nvidia.com>

te.Linear and te.LayerNormMLP updated for TP overlap w/ QuantizedTensor. All overlaps work in BF16. All overlaps except bulk WGRAD work in FP8.

Signed-off-by: Alp Dener <adener@nvidia.com>

completed TP overlap QuantizedTensor updates for LayerNormLinear, but issues with quantized normalization

Signed-off-by: Alp Dener <adener@nvidia.com>

all overlaps working with bf16, all but bulk WGRAD working with FP8

Signed-off-by: Alp Dener <adener@nvidia.com>

all overlaps work with Float8Tensor, except bulk wgrad in LayerNormMLP (works in other modules)

Signed-off-by: Alp Dener <adener@nvidia.com>

all overlaps working with QuantizedTensor in BF16 and FP8

Signed-off-by: Alp Dener <adener@nvidia.com>

cleaned up pytest formatting

Signed-off-by: Alp Dener <adener@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed atomic GEMM tests for comm+GEMM overlap (deprecated in CUDA) and updated test sizing

Signed-off-by: Alp Dener <adener@nvidia.com>

* all TP overlap tests fixed on H100, a few failures remain in sanity tests

Signed-off-by: Alp Dener <adener@nvidia.com>

* Minor fix, lint, format

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix mxfp8

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Minor changes/cleanup

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Populate column-wise data in FP8 LayerNorm/RMSNorm funcs if provided

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix linter warnings

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix fused attn tests

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Initialize LN output with correct device

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix UB distributed tests

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix for non-fp8 cases

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: Alp Dener <adener@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>

* [PyTorch] Remove MXFP8 scale-inv padding in MXFP8 all-gather (NVIDIA#1455)

* Remove MXFP8 scale-inv padding in MXFP8 all-gather

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Zero out padding in MXFP8 scale-inverses

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [common] Generalized MXFP8 gated kernels w.r.t. input tensor dimensions (NVIDIA#1449)

* Fixed scaling tensor alignment/padding

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changes from review

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed alignment and padding in scaled tensors. Refactoring.

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Skipped scenarios for non-mod(32) tensors

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* More fixes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Some fixes to the CPU reference

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed typo in the kernel. Restricted the last dim to multiples of 32

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Fixed TMA writes overlap

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove the largest test cases for numerical stability

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

---------

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* Fix MXFP8 normalization (NVIDIA#1457)

* Fix MXFP8 normalization

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [PyTorch] Reduce tensor dimensions in MXFP8 tests (NVIDIA#1435)

* Relax dim constraint in MXFP8 tests

Dims are multiples of 32 instead of 128.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Make tensor dims multiples of 32

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Avoid MXFP8 GEMM with MXFP8 output

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Reduce tensor sizes in non-quantized TP test

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Increase GEMM sizes in distributed te.Sequential tests

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Expand sanity tests to include MXFP8

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* polishing

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* polishing

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* polishing

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* polishing

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* refactor

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* refactor

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* lint fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* lint fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint fixed

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint and license fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* nvinspect_api to debug_api

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* end debug

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* one gpu tests passing

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes all tests

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* lint fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* new small test

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

---------

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
Signed-off-by: Alp Dener <adener@nvidia.com>
Co-authored-by: Przemek Tredak <ptredak@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
Co-authored-by: Alp Dener <adener@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL force-pushed the nvdlfw_inspect_support branch from 7467f1e to c90f5ac on February 12, 2025 17:09
pggPL and others added 3 commits February 12, 2025 09:21
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator, Author) commented Feb 12, 2025

/te-ci pytorch L1

pggPL and others added 4 commits February 13, 2025 02:52
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@timmoon10 self-requested a review on February 13, 2025 19:37
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
if api_name in ["inspect_tensor", "inspect_tensor_postquantize"]:
assert ret is None
if api_name == "modify_tensor":
assert type(ret) in [
Review comment (Member):
2 questions:

  • what about the torch.Parameter?
  • this introduces a pretty hidden place for listing all Tensor types - could we move that array to some more visible place, like quantized_tensor.py?

Reply (Collaborator Author):

I moved it to quantized_tensor.
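
For illustration, a minimal sketch of that refactor, with hypothetical names and import paths: the list of allowed tensor types lives in one visible place (quantized_tensor.py), and checking with isinstance instead of `type(ret) in [...]` also covers subclasses such as torch.nn.Parameter, which answers the first question:

    # Hypothetical sketch of the refactor -- names and import paths are
    # illustrative assumptions, not the exact code landed in this PR.
    import torch

    from transformer_engine.pytorch.tensor.float8_tensor import Float8Tensor
    from transformer_engine.pytorch.tensor.mxfp8_tensor import MXFP8Tensor

    # In quantized_tensor.py: one visible list of types a debug hook may return.
    ALLOWED_MODIFY_TENSOR_TYPES = (torch.Tensor, Float8Tensor, MXFP8Tensor)

    def check_modify_tensor_result(ret):
        # isinstance() (unlike `type(ret) in [...]`) also accepts subclasses,
        # e.g. torch.nn.Parameter, which subclasses torch.Tensor.
        assert isinstance(ret, ALLOWED_MODIFY_TENSOR_TYPES), (
            f"modify_tensor returned unsupported type: {type(ret)}"
        )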

pggPL and others added 2 commits March 12, 2025 19:45
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator, Author) commented Mar 12, 2025

/te-ci pytorch L1

pggPL and others added 10 commits March 13, 2025 12:43
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator, Author) commented Mar 13, 2025

/te-ci pytorch L1

2 similar comments
@pggPL (Collaborator, Author) commented Mar 14, 2025

/te-ci pytorch L1

@pggPL (Collaborator, Author) commented Mar 14, 2025

/te-ci pytorch L1

@ptrendx (Member) commented Mar 17, 2025

/te-ci pytorch

@pggPL (Collaborator, Author) commented Mar 18, 2025

/te-ci pytorch L1

Let's look inside them!
In the main log file, you can find detailed information about the transformer's layer GEMMs behavior. You can see that ``fc1`` and ``fc2`` fprop GEMMs are run in high precision, as intended.
Review comment (Member):

Suggested change:
- In the main log file, you can find detailed information about the transformer's layer GEMMs behavior. You can see that ``fc1`` and ``fc2`` fprop GEMMs are run in high precision, as intended.
+ In the main log file, you can find detailed information about the transformer layer's GEMMs behavior. You can see that ``fc1`` and ``fc2`` fprop GEMMs are run in high precision, as intended.

@ksivaman (Member) left a comment:

Some of the environment variable names used are very generic (FEATURE_DIRS, CONFIG_FILE, DEBUG) and need changing. Could you add the NVTE_ prefix to them and make them more descriptive?
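
For illustration, a minimal sketch of the requested rename, with hypothetical final names (the actual names were settled during review, not here):

    import os

    # Hypothetical renames illustrating the NVTE_ prefix request; the names
    # below are assumptions, not necessarily the ones that eventually shipped.
    feature_dirs = os.getenv("NVTE_DEBUG_FEATURE_DIRS", "")  # was: FEATURE_DIRS
    config_file = os.getenv("NVTE_DEBUG_CONFIG_FILE", "")    # was: CONFIG_FILE
    debug_enabled = os.getenv("NVTE_DEBUG", "0") == "1"      # was: DEBUG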

pggPL and others added 4 commits March 25, 2025 13:15
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL (Collaborator, Author) commented Mar 25, 2025

This PR was split into 4 smaller PRs.

@pggPL closed this Mar 25, 2025