Add support for permute operation to Direct Python Bindings #4701

Merged: rdspring1 merged 2 commits into main from direct_tp4 on Jul 22, 2025
Conversation

@rdspring1 (Collaborator) commented Jun 28, 2025

github-actions bot commented Jun 28, 2025

Review updated until commit b6004e2

Description

  • Added support for permute operation in Direct Python Bindings.

  • Implemented permute handling in PythonTranslator.

  • Updated opinfo to support direct bindings for permute.

  • Adjusted error handling in test_direct_ops.


Changes walkthrough 📝

Relevant files

Enhancement

ops.cpp: Add permute operation
python/python_direct/ops.cpp (+26/-0)

  • Added permute function to ops module.
  • Included error checking for permute dimensions.

python_translate.cpp: Implement permute handling in PythonTranslator
python/python_direct/python_translate.cpp (+23/-2)

  • Implemented handlePermute method.
  • Updated handle method to call handlePermute for permutations.
  • Added error check for input and output dtype consistency.

opinfos.py: Update opinfo for permute
tests/python/opinfo/opinfos.py (+1/-0)

  • Added supports_direct_bindings for permute operation.

test_direct_ops.py: Adjust error handling in tests
tests/python/opinfo/test_direct_ops.py (+2/-3)

  • Temporarily skipped regex check in error handling.
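The ops.cpp entry above mentions error checking for permute dimensions without showing it. The following is a minimal Python sketch of what such a check might cover, not the code from this PR: `validate_permute_dims` and `permuted_shape` are hypothetical names, and rejecting negative dims is an assumption made here for simplicity. It treats `dims` the way the translator's `new2old` vector reads, so output axis `i` takes input axis `dims[i]`.

```python
from typing import List, Sequence


def validate_permute_dims(ndim: int, dims: Sequence[int]) -> List[int]:
    """Check that dims is a reordering of 0..ndim-1 (hypothetical helper)."""
    if len(dims) != ndim:
        raise ValueError(
            f"permute expects {ndim} dims for a {ndim}-D tensor, got {len(dims)}"
        )
    # A valid permutation contains every axis exactly once.
    if sorted(dims) != list(range(ndim)):
        raise ValueError(f"dims {list(dims)} is not a permutation of 0..{ndim - 1}")
    return list(dims)


def permuted_shape(shape: Sequence[int], dims: Sequence[int]) -> List[int]:
    """Output axis i takes its extent from input axis dims[i] (new2old order)."""
    checked = validate_permute_dims(len(shape), dims)
    return [shape[d] for d in checked]


print(permuted_shape([2, 3, 4], [2, 0, 1]))
```

A repeated or out-of-range axis such as `[0, 0, 1]` raises `ValueError` in this sketch; the actual binding may report the problem differently.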

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review

    Permutation Handling

    The handling of permutations in handlePermute should be validated to ensure it correctly identifies and translates permutations without introducing errors.

    void handle(const LoadStoreOp* lsop) final {
      // TODO short-circuit: lsop is a permutation.
      if (lsop->out()->isA<TensorView>() &&
          lsop->out()->as<TensorView>()->hasRoot()) {
        return handlePermute(lsop);
      }
    
      NVF_ERROR(
          lsop->in()->dtype() == lsop->out()->dtype(),
          "Expected the dtype for input and output to be the same");
      visited_vals_.insert(lsop->out());
      static const std::vector<std::string> argument_names = {"dtype"};
      printer_.generateKwargsOperation(
          "fd.ops.cast",
          std::make_tuple(lsop->in()),
          argument_names,
          std::make_tuple(lsop->out()->dtype()),
          {lsop->out()});
    }
    
    void handlePermute(const LoadStoreOp* lsop) {
      TensorView* out_tv = lsop->out()->as<TensorView>();
    
      std::optional<std::vector<int64_t>> new2old = ir_utils::computePermutation(
          out_tv->getRootDomain(), out_tv->getLogicalDomain());
      NVF_ERROR(new2old.has_value(), "Expected permutation");
    
      visited_vals_.insert(lsop->out());
      static const std::vector<std::string> argument_names = {"dims"};
      printer_.generateKwargsOperation(
          "fd.ops.permute",
          std::make_tuple(lsop->in()),
          argument_names,
          std::make_tuple(new2old.value()),
          {lsop->out()});
    }
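To make the `new2old` vector in `handlePermute` concrete, here is a small Python sketch of the idea behind recovering a permutation from two orderings of the same axes. It is an illustration under assumptions, not the implementation of `ir_utils::computePermutation`: the function name `compute_permutation` and the string axis ids are hypothetical stand-ins for the root and logical domains.

```python
from typing import Hashable, List, Optional, Sequence


def compute_permutation(
    old_order: Sequence[Hashable], new_order: Sequence[Hashable]
) -> Optional[List[int]]:
    """Return new2old with new_order[i] == old_order[new2old[i]].

    Returns None when new_order is not a reordering of old_order, which
    mirrors the has_value() check in the excerpt above.
    """
    if len(old_order) != len(new_order):
        return None
    position = {axis: i for i, axis in enumerate(old_order)}
    if len(position) != len(old_order):
        return None  # duplicate ids in old_order make the mapping ambiguous
    new2old: List[int] = []
    for axis in new_order:
        if axis not in position:
            return None  # axis does not exist in the old ordering
        new2old.append(position[axis])
    if len(set(new2old)) != len(new2old):
        return None  # some axis was used twice
    return new2old


# Hypothetical axis ids standing in for the root and logical domains.
root = ["i0", "i1", "i2"]
logical = ["i2", "i0", "i1"]
print(compute_permutation(root, logical))
```

Under these assumptions the result is the value that would be passed as the `dims` keyword of `fd.ops.permute`.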
    Error Handling

    The skipped regex check in test_errors should be reviewed to ensure that the error messages from direct bindings are appropriately handled and do not mask potential issues.

    with pytest.raises(exception_type):
        errors_test_fn(op, sample)
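The concern about the skipped regex check can be illustrated with a stdlib-only sketch. `check_raises` and `bad_permute` are hypothetical names written for this example; the helper imitates checking only the exception type versus also searching the message with a regex, which is what `pytest.raises(..., match=...)` does.

```python
import re
from typing import Callable, Optional, Type


def check_raises(
    fn: Callable[[], object],
    exception_type: Type[BaseException],
    match: Optional[str] = None,
) -> bool:
    """Return True if fn raises exception_type (and its message matches)."""
    try:
        fn()
    except exception_type as err:
        # Without a pattern, any error of the right type is accepted.
        return match is None or re.search(match, str(err)) is not None
    # No exception was raised; other exception types propagate to the caller.
    return False


def bad_permute() -> None:
    # Stand-in for an op that fails for the wrong reason.
    raise RuntimeError("unrelated internal failure")


type_only = check_raises(bad_permute, RuntimeError)
with_regex = check_raises(bad_permute, RuntimeError, match="permut")
print(type_only, with_regex)
```

The type-only check accepts the unrelated failure while the regex check rejects it, which is the kind of masking the reviewer note warns about.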

    @rdspring1 changed the base branch from main to direct_tp2 on June 28, 2025 23:54
    @rdspring1 changed the title from "Direct tp4" to "Add Linear Layer Support to Direct Python Bindings" on Jun 28, 2025
    @rdspring1 added the Multi-GPU, Direct Bindings, Thunder-Inference-Demo, and Cutlass labels on Jun 28, 2025
    @rdspring1 removed the Cutlass label on Jun 28, 2025
    @rdspring1 changed the base branch from direct_tp2 to direct_tp3 on June 30, 2025 22:36
    @rdspring1 force-pushed the direct_tp3 branch 3 times, most recently from eb81af7 to f4a5eee, on July 5, 2025 00:35
    @rdspring1 changed the title from "Add Linear Layer Support to Direct Python Bindings" to "Add support for permute operation to Direct Python Bindings" on Jul 5, 2025
    rdspring1 added a commit that referenced this pull request Jul 8, 2025
    This PR adds MultiGpu Support to Direct Python Bindings.
    
    PR Stack:
    
    - #4689 **<<< This PR.**
    - #4697
    - #4698
    - #4704 
    - #4701 
    
    cc: @kshitij12345
    @rdspring1 marked this pull request as ready for review on July 21, 2025 16:18
    @rdspring1 force-pushed the direct_tp4 branch 2 times, most recently from 14f2a01 to a3ba693, on July 21, 2025 16:22
    @rdspring1 (Collaborator, Author) commented:

    !test

    rdspring1 added a commit that referenced this pull request Jul 22, 2025
    This PR adds support for cast operations to Direct Python Bindings.
    
    PR Stack:
    - #4689
    - #4697  **<<< This PR.**
    - #4698
    - #4704
    - #4701
    - #4809
    rdspring1 added a commit that referenced this pull request Jul 22, 2025
    This PR adds support for matmul and linear ops to Direct Python
    Bindings.
    
    PR Stack:
    - #4689
    - #4697
    - #4698 **<<< This PR.**
    - #4704
    - #4701
    - #4809
    rdspring1 added a commit that referenced this pull request Jul 22, 2025
    …ings (#4704)
    
    This PR adds size, shape, define_vector, and reshape ops to direct
    bindings.
    
    PR Stack:
    - #4689
    - #4697
    - #4698
    - #4704 **<<< This PR.**
    - #4701 
    - #4809
    Base automatically changed from direct_tp3 to main July 22, 2025 19:54
    @rdspring1 (Collaborator, Author) commented:

    !build

    @rdspring1 merged commit a4f7e67 into main on Jul 22, 2025
    17 checks passed
    @rdspring1 deleted the direct_tp4 branch on July 22, 2025 22:32
    rdspring1 added a commit that referenced this pull request Jul 23, 2025
    …ings (#4809)
    
    This PR changes `test_dtensor.py` and `test_deepseek_v3.py` to use
    direct bindings.
    
    Modified `tests/python/multidevice/conftest.py` to have
    `multidevice_test` fixture for legacy tests and
    `multidevice_direct_test` for tests using direct_bindings.
    
    Included quality of life improvements:
    * Fixes #4560 by supporting basic printing of multi-device scheduled
    fusions. The schedule operations are not created in the definition.
    
    PR Stack:
    - #4697
    - #4698
    - #4704
    - #4701
    - #4809 **<<< This PR.**
    nsarka pushed a commit to nsarka/Fuser that referenced this pull request Jul 28, 2025
    This PR adds MultiGpu Support to Direct Python Bindings.
    
    PR Stack:
    
    - NVIDIA#4689 **<<< This PR.**
    - NVIDIA#4697
    - NVIDIA#4698
    - NVIDIA#4704 
    - NVIDIA#4701 
    
    cc: @kshitij12345
    nsarka pushed a commit to nsarka/Fuser that referenced this pull request Jul 28, 2025
    This PR adds support for cast operations to Direct Python Bindings.
    
    PR Stack:
    - NVIDIA#4689
    - NVIDIA#4697  **<<< This PR.**
    - NVIDIA#4698
    - NVIDIA#4704
    - NVIDIA#4701
    - NVIDIA#4809
    nsarka pushed a commit to nsarka/Fuser that referenced this pull request Jul 28, 2025
    …IA#4698)
    
    This PR adds support for matmul and linear ops to Direct Python
    Bindings.
    
    PR Stack:
    - NVIDIA#4689
    - NVIDIA#4697
    - NVIDIA#4698 **<<< This PR.**
    - NVIDIA#4704
    - NVIDIA#4701
    - NVIDIA#4809
    nsarka pushed a commit to nsarka/Fuser that referenced this pull request Jul 28, 2025
    …ings (NVIDIA#4704)
    
    This PR adds size, shape, define_vector, and reshape ops to direct
    bindings.
    
    PR Stack:
    - NVIDIA#4689
    - NVIDIA#4697
    - NVIDIA#4698
    - NVIDIA#4704 **<<< This PR.**
    - NVIDIA#4701 
    - NVIDIA#4809
    nsarka pushed a commit to nsarka/Fuser that referenced this pull request Jul 28, 2025
    )
    
    This PR adds support for replacing linear layers with TensorParallel
    NvFuser layer in deepseek model using Direct Python Bindings.
    
    PR Stack:
    - NVIDIA#4689
    - NVIDIA#4697
    - NVIDIA#4698
    - NVIDIA#4704
    - NVIDIA#4701 **<<< This PR.**
    - NVIDIA#4809
    nsarka pushed a commit to nsarka/Fuser that referenced this pull request Jul 28, 2025
    …ings (NVIDIA#4809)
    
    This PR changes `test_dtensor.py` and `test_deepseek_v3.py` to use
    direct bindings.
    
    Modified `tests/python/multidevice/conftest.py` to have
    `multidevice_test` fixture for legacy tests and
    `multidevice_direct_test` for tests using direct_bindings.
    
    Included quality of life improvements:
    * Fixes NVIDIA#4560 by supporting basic printing of multi-device scheduled
    fusions. The schedule operations are not created in the definition.
    
    PR Stack:
    - NVIDIA#4697
    - NVIDIA#4698
    - NVIDIA#4704
    - NVIDIA#4701
    - NVIDIA#4809 **<<< This PR.**
    Labels

    • Direct Bindings: Python extension with direct mapping to NvFuser CPP objects.
    • Python API: Issues related to the Python API

    2 participants