[Pytorch] Nvidia-DLFramework-Inspect support #1441
Conversation
Force-pushed from 8f6dbd5 to f940ba3.
Please move this PR to be against main.
/te-ci pytorch
Force-pushed from 7380ee1 to 7467f1e.
Commits in this push:

* TE 2.0 code drop (Przemek Tredak)
* [PyTorch] Fix linter warnings (NVIDIA#1426): fix linter warnings in the basic linear op and grouped linear module; disable Userbuffers support in te.Sequential (Tim Moon)
* Add path to disable cudnn norm for mxfp8 (NVIDIA#1432) (Kirthi Shankar Sivamani)
* Pad MXFP8 scale inverses at the time of creation (NVIDIA#1431): create scale_inv for block scaling already padded; remove an old file; fix the CG test; change the default value of the env variable (Kirthi Shankar Sivamani)
* [PyTorch] Respect existing quantizer usages in functional linear API (NVIDIA#1440) (Tim Moon)
* Nvidia-DLFramework-Inspect support (Pawel Gadzinski)
* Update FE from 1.10-rc to 1.10 (NVIDIA#1438) (Charlene Yang)
* [PyTorch] Debug NeMo distributed optimizer (NVIDIA#1444): avoid the internal quantized tensor class in params and when setting the data attr; debug the view function in MXFP8Tensor (Tim Moon)
* Rename block scaling recipe (NVIDIA#1442): rename the MXFP8 recipe (Kirthi Shankar Sivamani)
* [common] Generalized MXFP8 fused kernels w.r.t. input tensor dimensions (NVIDIA#1437): generalize the fused and dequantize kernels; fix a bug with partial dbias writes in trimmed chunks (Oleg Goncharov, Tim Moon)
* Add the virtual destructor to the Quantizer class (NVIDIA#1446) (Przemek Tredak)
* [Core] Debug unaligned MXFP8 dequantize tests (NVIDIA#1450): skip MXFP8 dequantize tests with invalid alignment; remove the test case with unaligned rows (Tim Moon)
* Generalization of the FP8 dgated activations kernel (NVIDIA#1448): relax FP8 gated activation requirements; expand MXFP8 and FP8 test coverage; lift the 2D restriction on MXFP8 scales; adjust dbias tolerances (Przemek Tredak, Tim Moon)
* [PyTorch/C++] Comm+GEMM overlap compatibility with QuantizedTensor (NVIDIA#1427): all TP overlaps working with QuantizedTensor in BF16 and FP8; removed atomic GEMM tests for comm+GEMM overlap (deprecated in CUDA); assorted fixes for fused attention, LN output device, and UB distributed tests (Alp Dener, Kirthi Shankar Sivamani, Tim Moon)
* [PyTorch] Remove MXFP8 scale-inv padding in MXFP8 all-gather (NVIDIA#1455): zero out padding in MXFP8 scale-inverses (Tim Moon)
* [common] Generalized MXFP8 gated kernels w.r.t. input tensor dimensions (NVIDIA#1449): fix scaling tensor alignment/padding; restrict the last dim to multiples of 32; fix overlapping TMA writes (Oleg Goncharov, Przemek Tredak)
* Fix MXFP8 normalization (NVIDIA#1457) (Przemek Tredak)
* [PyTorch] Reduce tensor dimensions in MXFP8 tests (NVIDIA#1435): relax the dim constraint from multiples of 128 to multiples of 32; avoid MXFP8 GEMM with MXFP8 output; adjust GEMM sizes in distributed te.Sequential tests (Tim Moon)
* Expand sanity tests to include MXFP8 (Kirthi Shankar Sivamani)
* Numerous Nvidia-DLFramework-Inspect commits: rename nvinspect_api to debug_api, polishing, refactoring, lint and license fixes, test fixes, and pre-commit.ci auto fixes (Pawel Gadzinski)
Force-pushed from 7467f1e to c90f5ac.
/te-ci pytorch L1
    if api_name in ["inspect_tensor", "inspect_tensor_postquantize"]:
        assert ret is None
    if api_name == "modify_tensor":
        assert type(ret) in [
Two questions:
- What about torch.Parameter?
- This introduces a pretty hidden place for listing all Tensor types. Could we move that array to a more visible place, like quantized_tensor.py?
I moved it to quantized_tensor.
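For illustration, a minimal sketch of what such a centralized list could look like; every name below is hypothetical and does not reflect the actual contents of quantized_tensor.py:

    # Hypothetical sketch of a central registry in quantized_tensor.py;
    # the real module layout in the PR may differ.
    import torch

    class QuantizedTensor(torch.Tensor):
        """Simplified stand-in for TE's quantized tensor base class."""

    # One visible tuple that debug hooks import, instead of hard-coding
    # the allowed types next to each assert.
    ALLOWED_MODIFY_TENSOR_RETURN_TYPES = (torch.Tensor, QuantizedTensor)

    def check_modify_tensor_return(ret):
        # Using isinstance (rather than "type(ret) in [...]") also accepts
        # subclasses, which addresses the torch.Parameter question:
        # torch.nn.Parameter subclasses torch.Tensor, so it passes this
        # check without being listed explicitly.
        assert isinstance(ret, ALLOWED_MODIFY_TENSOR_RETURN_TYPES)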
/te-ci pytorch L1
/te-ci pytorch L1
/te-ci pytorch
/te-ci pytorch L1
docs/debug/1_getting_started.rst (outdated)
Let's look inside them!
In the main log file, you can find detailed information about the transformer's layer GEMMs behavior. You can see that ``fc1`` and ``fc2`` fprop GEMMs are run in high precision, as intended.
Suggested change:
- In the main log file, you can find detailed information about the transformer's layer GEMMs behavior. You can see that ``fc1`` and ``fc2`` fprop GEMMs are run in high precision, as intended.
+ In the main log file, you can find detailed information about the transformer layer's GEMMs behavior. You can see that ``fc1`` and ``fc2`` fprop GEMMs are run in high precision, as intended.
ksivaman left a comment:
Some of the environment variable names used are very generic (FEATURE_DIRS, CONFIG_FILE, DEBUG) and need changing. Could you add the NVTE_ prefix to them and make them more descriptive?
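For illustration only, a sketch of how the renamed variables could be read; the NVTE_-prefixed names below are hypothetical suggestions, not the names the PR ultimately adopted:

    import os

    # Hypothetical NVTE_-prefixed replacements for the generic FEATURE_DIRS,
    # CONFIG_FILE, and DEBUG variables flagged in this review.
    feature_dirs = [d for d in os.getenv("NVTE_DEBUG_FEATURE_DIRS", "").split(":") if d]
    config_file = os.getenv("NVTE_DEBUG_CONFIG_FILE", "")
    debug_enabled = os.getenv("NVTE_DEBUG", "0") == "1"

    if debug_enabled:
        print(f"Debug features from {feature_dirs}, config {config_file!r}")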
This PR was split into 4 smaller PRs.
Description
Nvidia-DLFramework-Inspect will be the common debug/logging API for NVIDIA frameworks. Integrating it into Transformer Engine has three aims:
Link to nvidia-dlframework-inspect. IMPORTANT: to run this PR, one needs to use the branch from that PR.
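As a rough usage sketch of the intended flow: the import path, function names, and config layout below are assumptions pieced together from this PR's commit history (e.g. the nvinspect_api-to-debug_api rename and the "end debug" commit), not a confirmed API:

    # All names here are assumptions -- treat this as pseudocode for the flow.
    import nvdlfw_inspect.api as debug_api  # hypothetical import path
    import torch
    import transformer_engine.pytorch as te

    # Point the debug API at a config file that selects, per layer and per
    # GEMM, which debug features (logging, precision overrides, ...) apply.
    debug_api.initialize(
        config_file="./debug_config.yaml",  # hypothetical config path
        feature_dirs=["./my_features"],     # hypothetical extra feature dirs
    )

    layer = te.Linear(1024, 1024)
    out = layer(torch.randn(32, 1024, device="cuda"))
    out.sum().backward()

    # Tear down, flushing any buffered statistics to the log files.
    debug_api.end_debug()  # name inferred from the commit history above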
Type of change
Checklist: