
Generalized cudax::copy_bytes for mdspan #7676

Merged
fbusato merged 36 commits into NVIDIA:main from fbusato:SoL-d2h-h2d
Mar 17, 2026

@fbusato fbusato commented Feb 13, 2026

closes #7554

Description

cudax::copy_bytes(mdspan) is currently limited to contiguous mdspans that share the same layout.

This PR extends copy_bytes for mdspan-based host/device transfers so that it works correctly and efficiently across contiguous and strided layouts, including cases with different ranks and mixed layouts or stride orders.
Common use cases are: padded tensors, column-major views, subviews, and other layout_stride mappings.
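To make these layouts concrete, here is a minimal sketch of how a strided mapping turns a logical index into an element offset. `strided_offset` is a hypothetical helper written for illustration only; it is not part of cudax and uses plain arrays instead of mdspan.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a cudax API): element offset of a logical index
// under a strided mapping, i.e. offset = sum over k of index[k] * strides[k].
template <std::size_t Rank>
std::size_t strided_offset(const std::array<std::size_t, Rank>& index,
                           const std::array<std::size_t, Rank>& extents,
                           const std::array<std::size_t, Rank>& strides)
{
  std::size_t offset = 0;
  for (std::size_t k = 0; k < Rank; ++k)
  {
    assert(index[k] < extents[k]); // the index must lie inside the tensor
    offset += index[k] * strides[k];
  }
  return offset;
}
```

For a 3x4 tensor padded to a row pitch of 6 elements (strides `{6, 1}`), index `(1, 0)` maps to offset 6, so consecutive rows are not adjacent in memory. A column-major view of the same extents would use strides `{1, 3}`.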

Main features:

  • Optimal (lowest) number of transfers.
  • Compatibility with any strided layout, including edge cases such as transposition.
  • Uses cuMemcpyBatchAsync on CTK 13+.
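As a rough illustration of what "lowest number of transfers" means for a single layout, the sketch below computes the longest contiguous run and the resulting transfer count. It assumes strides are given in elements and is not the code from this PR; `contiguous_run` and `num_transfers` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: length, in elements, of the longest contiguous run.
// Dimensions are visited from smallest to largest stride and merged while
// each stride equals the size of the run accumulated so far.
std::size_t contiguous_run(const std::vector<std::size_t>& extents,
                           const std::vector<std::size_t>& strides)
{
  assert(extents.size() == strides.size());
  std::vector<std::size_t> order(extents.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return strides[a] < strides[b];
  });
  std::size_t run = 1;
  for (std::size_t d : order)
  {
    if (strides[d] != run)
    {
      break; // gap (padding) or overlap: the run cannot grow further
    }
    run *= extents[d];
  }
  return run;
}

// Hypothetical helper: number of contiguous transfers needed for one layout.
std::size_t num_transfers(const std::vector<std::size_t>& extents,
                          const std::vector<std::size_t>& strides)
{
  std::size_t total = 1;
  for (std::size_t e : extents)
  {
    total *= e;
  }
  if (total == 0)
  {
    return 0; // empty tensor: nothing to copy
  }
  return total / contiguous_run(extents, strides);
}
```

Under these assumptions, a fully contiguous 3x4 tensor needs one transfer, while the same tensor padded to a row pitch of 6 needs three transfers of four elements each.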

The main challenge is using two different stride orderings for different purposes:

  • Stride-sorted order: finding common contiguous regions / tile size.
  • Original order: required for correct logical index mapping when source and destination stride permutations differ.
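The two orderings can be sketched as follows. The stride-sorted order is a permutation of the dimension indices, and comparing the source and destination permutations tells whether the sorted order is safe for addressing. The function names are hypothetical and are not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: dimension indices sorted from smallest to largest stride.
// The original order is simply 0, 1, ..., rank - 1.
std::vector<std::size_t> stride_sorted_order(const std::vector<std::size_t>& strides)
{
  std::vector<std::size_t> order(strides.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return strides[a] < strides[b];
  });
  return order;
}

// Hypothetical helper: true when source and destination sort their
// dimensions into the same permutation.
bool same_stride_order(const std::vector<std::size_t>& src_strides,
                       const std::vector<std::size_t>& dst_strides)
{
  assert(src_strides.size() == dst_strides.size());
  return stride_sorted_order(src_strides) == stride_sorted_order(dst_strides);
}
```

For example, a row-major 3x4 layout (strides `{4, 1}`) and its padded variant (strides `{6, 1}`) share the order `{1, 0}`, whereas a column-major layout (strides `{1, 3}`) sorts to `{0, 1}` and therefore differs.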

Details:

  • If source/destination have the same stride order, iterating in stride-sorted order is safe and efficient.
  • If stride orders differ, using sorted order for address generation can mismatch source/destination linearization. In that case, iterator addressing switches to original tensor order, while still using the computed tile size from layout analysis. This preserves correctness and still exploits contiguity where available.
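A small host-only sketch of the second case: when the source is row-major and the destination is column-major, addresses on both sides are generated from the same logical index `(i, j)` in the original dimension order. `Layout2D` and `copy_original_order` are hypothetical names, and the tile here is a single element because a transposed pair has no larger common contiguous region.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D layout description: extents and strides in elements.
struct Layout2D
{
  std::size_t extent0;
  std::size_t extent1;
  std::size_t stride0;
  std::size_t stride1;
};

// Copies element by element, walking the logical index space in the original
// dimension order so that source and destination always refer to the same
// logical element, whatever their stride permutations are.
void copy_original_order(const std::vector<int>& src, const Layout2D& s,
                         std::vector<int>& dst, const Layout2D& d)
{
  assert(s.extent0 == d.extent0 && s.extent1 == d.extent1);
  for (std::size_t i = 0; i < s.extent0; ++i)
  {
    for (std::size_t j = 0; j < s.extent1; ++j)
    {
      dst[i * d.stride0 + j * d.stride1] = src[i * s.stride0 + j * s.stride1];
    }
  }
}
```

With a 2x3 row-major source holding `{0, 1, 2, 3, 4, 5}` and a column-major destination (strides `{1, 2}`), the destination buffer ends up as `{0, 3, 1, 4, 2, 5}`. Walking each side in its own stride-sorted order would instead copy the buffer linearly and silently change which logical element each value belongs to.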

@fbusato fbusato self-assigned this Feb 13, 2026
@fbusato fbusato requested review from a team as code owners February 13, 2026 23:23
@fbusato fbusato added the cudax Feature intended for the cudax experimental library label Feb 13, 2026
@fbusato fbusato requested a review from caugonnet February 13, 2026 23:23
@fbusato fbusato added this to CCCL Feb 13, 2026
@fbusato fbusato requested a review from ericniebler February 13, 2026 23:23
@github-project-automation github-project-automation bot moved this to Todo in CCCL Feb 13, 2026
@fbusato fbusato changed the title Generalized cudax::copy_bytes for mspan Generalized cudax::copy_bytes for mdspan Feb 13, 2026
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Feb 13, 2026
fbusato and others added 8 commits March 11, 2026 11:53
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>
@fbusato fbusato enabled auto-merge (squash) March 13, 2026 23:47
@github-actions

🥳 CI Workflow Results

🟩 Finished in 2h 17m: Pass: 100%/137 | Total: 3d 01h | Max: 1h 53m | Hits: 90%/279164

See results here.

@fbusato fbusato merged commit d94dc52 into NVIDIA:main Mar 17, 2026
153 of 154 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Mar 17, 2026

Labels

cudax Feature intended for the cudax experimental library

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[FEA]: Relax restrictions on cross-system copy support for cudax::copy_bytes(mdspan)

4 participants