thrust: add 4-iterator mismatch overloads (bounded last2) #8504

Open
edenfunf wants to merge 2 commits into NVIDIA:main from edenfunf:feat/thrust-mismatch-last2

Conversation

@edenfunf
Contributor

Summary

Adds the 4-iterator form of thrust::mismatch, matching the std::mismatch overloads introduced in C++14 that accept an explicit end iterator for the second range. This prevents out-of-bounds reads when the second range is shorter than the first.

Closes #3601

New overloads

// Without execution policy
thrust::mismatch(first1, last1, first2, last2);
thrust::mismatch(first1, last1, first2, last2, pred);

// With execution policy
thrust::mismatch(exec, first1, last1, first2, last2);
thrust::mismatch(exec, first1, last1, first2, last2, pred);
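The intended semantics can be illustrated with the standard-library counterpart. The sketch below uses std::mismatch rather than the new thrust overloads, on the assumption (taken from the summary above) that they behave the same way; `common_prefix_length` is a hypothetical helper name used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: counts how many leading elements of a and b agree.
// The four-iterator std::mismatch (C++14) stops at the first difference or
// at the end of the shorter range, so it never reads past b.end().
// The three-iterator form would read past b.end() when b is shorter than a.
inline std::size_t common_prefix_length(const std::vector<int>& a,
                                        const std::vector<int>& b)
{
  auto result = std::mismatch(a.begin(), a.end(), b.begin(), b.end());

  // Both returned iterators have advanced by the same number of steps.
  assert(std::distance(a.begin(), result.first) ==
         std::distance(b.begin(), result.second));

  return static_cast<std::size_t>(std::distance(a.begin(), result.first));
}
```

With a = {1, 2, 3, 4, 5} and b = {1, 2, 3}, the search stops after three elements because the second range is exhausted, and the returned second iterator equals b.end().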

Implementation

| Layer | Strategy |
| --- | --- |
| CUDA backend (cuda/detail/mismatch.h) | min(n1,n2) bounds a zip_iterator fed to find_if_not; single-pass GPU kernel, same pattern as existing 3-iterator path |
| Generic fallback (generic/mismatch.inl) | Compute distances, advance a bounded last1 to min(n1,n2), delegate to existing 3-iterator path; forward-iterator safe |
| Public API (mismatch.h / mismatch.inl) | Doxygen docs and dispatch wired following existing overload conventions |
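The generic-fallback strategy described above can be sketched in standard C++ as follows. This is an illustration under stated assumptions, not the PR's code: `bounded_mismatch_sketch` and `prefix_length_via_sketch` are hypothetical names, and the three-iterator std::mismatch stands in for the existing three-iterator thrust path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical sketch: bounded mismatch built on the three-iterator form.
// Requires forward iterators, since each range is traversed more than once.
template <class It1, class It2, class Pred = std::equal_to<>>
std::pair<It1, It2> bounded_mismatch_sketch(It1 first1, It1 last1,
                                            It2 first2, It2 last2,
                                            Pred pred = Pred{})
{
  // Step 1: compute both lengths.
  const std::ptrdiff_t n1 = std::distance(first1, last1);
  const std::ptrdiff_t n2 = std::distance(first2, last2);
  assert(n1 >= 0 && n2 >= 0); // valid ranges have non-negative lengths

  // Step 2: clamp the first range to the shorter of the two lengths.
  const std::ptrdiff_t n = std::min(n1, n2);
  const It1 bounded_last1 = std::next(first1, n);

  // Step 3: delegate to the three-iterator form, which is now safe because
  // it reads at most n elements from the second range.
  return std::mismatch(first1, bounded_last1, first2, pred);
}

// Hypothetical usage wrapper: offset of the first difference.
inline std::ptrdiff_t prefix_length_via_sketch(const std::vector<int>& a,
                                               const std::vector<int>& b)
{
  auto r = bounded_mismatch_sketch(a.begin(), a.end(), b.begin(), b.end());
  return std::distance(a.begin(), r.first);
}
```

Clamping only last1 is enough here: once the first range is no longer than the second, the three-iterator form cannot run past last2. The cost is one extra traversal per range for non-random-access iterators.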

Tests (all pass on RTX 5070 / sm_89)

TestMismatchBoundedSimple (DECLARE_VECTOR_UNITTEST: host + device + custom_numeric):

  • Equal-length ranges with mismatch
  • Equal-length ranges, no mismatch
  • Range1 shorter: stops at range1 exhaustion
  • Range2 shorter: stops at range2 exhaustion
  • Mismatch before either range ends (range2 shorter)
  • With explicit binary predicate
  • Empty first range

TestMismatchBoundedWithExec (DECLARE_UNITTEST: device execution policy):

  • Bounded mismatch with thrust::device
  • Bounded mismatch with thrust::device + predicate
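
For reference, the expected stopping positions in the scenarios listed above can be written down against std::mismatch. This is a host-only sketch with made-up input values, not the PR's test code; `mismatch_offsets` and `check_bounded_mismatch_scenarios` are hypothetical names, and it assumes the thrust overloads return the same positions as the standard algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

using offsets = std::pair<std::ptrdiff_t, std::ptrdiff_t>;

// Hypothetical helper: returns how far each range was advanced before the
// search stopped (first difference or end of the shorter range).
template <class Pred = std::equal_to<>>
offsets mismatch_offsets(const std::vector<int>& a, const std::vector<int>& b,
                         Pred pred = Pred{})
{
  auto r = std::mismatch(a.begin(), a.end(), b.begin(), b.end(), pred);
  return {std::distance(a.begin(), r.first),
          std::distance(b.begin(), r.second)};
}

// Illustrative inputs, one per scenario in the list above.
inline void check_bounded_mismatch_scenarios()
{
  // Equal-length ranges with mismatch
  assert((mismatch_offsets({1, 2, 3, 4}, {1, 2, 9, 4}) == offsets{2, 2}));
  // Equal-length ranges, no mismatch
  assert((mismatch_offsets({1, 2, 3}, {1, 2, 3}) == offsets{3, 3}));
  // Range1 shorter: stops at range1 exhaustion
  assert((mismatch_offsets({1, 2}, {1, 2, 3, 4}) == offsets{2, 2}));
  // Range2 shorter: stops at range2 exhaustion
  assert((mismatch_offsets({1, 2, 3, 4}, {1, 2}) == offsets{2, 2}));
  // Mismatch before either range ends (range2 shorter)
  assert((mismatch_offsets({1, 9, 3, 4}, {1, 2}) == offsets{1, 1}));
  // Explicit binary predicate: stops at the first pair where a <= b fails
  assert((mismatch_offsets({1, 2, 5}, {1, 3, 4}, std::less_equal<int>{}) ==
          offsets{2, 2}));
  // Empty first range
  assert((mismatch_offsets({}, {1, 2}) == offsets{0, 0}));
}
```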

Add `thrust::mismatch(first1, last1, first2, last2)` and
`thrust::mismatch(first1, last1, first2, last2, pred)` overloads
(with and without execution policy) to match the C++14 standard
4-iterator form that prevents reading past a shorter second range.

CUDA-backend: uses `min(n1, n2)` to bound a zip_iterator fed to
`find_if_not`, keeping the same single-pass GPU kernel as the
existing 3-iterator path.

Generic fallback: computes distances, advances a bounded last1 to
`min(n1,n2)`, then delegates to the existing 3-iterator
`find_if_not` path — forward-iterator safe.

Public API declarations and Doxygen docs added to `mismatch.h`;
dispatch wired in `mismatch.inl`.

Tests: `TestMismatchBoundedSimple` (DECLARE_VECTOR_UNITTEST) covers
equal-length, shorter range1, shorter range2, early mismatch, with
predicate, and empty range cases. `TestMismatchBoundedWithExec`
exercises the device execution-policy path and predicate overload.

Closes NVIDIA#3601
@edenfunf edenfunf requested a review from a team as a code owner April 17, 2026 16:47
@edenfunf edenfunf requested a review from davebayer April 17, 2026 16:47
@github-project-automation github-project-automation bot moved this to Todo in CCCL Apr 17, 2026
@copy-pr-bot
Contributor

copy-pr-bot bot commented Apr 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 17, 2026
@edenfunf
Contributor Author

@davebayer Hi, could you please take a look at this PR when you have time? Thanks!

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

[FEA]: add thrust::mismatch overloads with last2 parameter

1 participant