thrust: add 4-iterator mismatch overloads (bounded last2) #8504
Open
edenfunf wants to merge 2 commits into NVIDIA:main
Add `thrust::mismatch(first1, last1, first2, last2)` and `thrust::mismatch(first1, last1, first2, last2, pred)` overloads (with and without execution policy) to match the C++14 standard 4-iterator form, which prevents reading past a shorter second range.

CUDA backend: uses `min(n1, n2)` to bound a `zip_iterator` fed to `find_if_not`, keeping the same single-pass GPU kernel as the existing 3-iterator path.

Generic fallback: computes both distances, advances a bounded `last1` to `min(n1, n2)`, then delegates to the existing 3-iterator `find_if_not` path. This is safe for forward iterators.

Public API declarations and Doxygen docs are added to `mismatch.h`; dispatch is wired in `mismatch.inl`.

Tests: `TestMismatchBoundedSimple` (`DECLARE_VECTOR_UNITTEST`) covers equal-length, shorter range1, shorter range2, early mismatch, predicate, and empty-range cases. `TestMismatchBoundedWithExec` exercises the device execution-policy path and the predicate overload.

Closes NVIDIA#3601
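The generic-fallback idea described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `bounded_mismatch` and `bounded_mismatch_example` are hypothetical names, and `std::mismatch` stands in for Thrust's existing 3-iterator path.

```cpp
#include <algorithm>
#include <cassert>
#include <forward_list>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper, not the PR's code: a bounded mismatch built on the
// 3-iterator std::mismatch, following the generic-fallback description.
template <typename It1, typename It2, typename Pred>
std::pair<It1, It2> bounded_mismatch(It1 first1, It1 last1, It2 first2, It2 last2, Pred pred)
{
  using diff_t = typename std::iterator_traits<It1>::difference_type;

  // Measure both ranges; std::distance only needs forward iterators.
  const diff_t n1 = std::distance(first1, last1);
  const diff_t n2 = static_cast<diff_t>(std::distance(first2, last2));

  // Shrink the first range so the 3-iterator form cannot overrun the second.
  It1 bounded_last1 = std::next(first1, std::min(n1, n2));

  return std::mismatch(first1, bounded_last1, first2, pred);
}

// Overload without a predicate: compares elements with operator==.
template <typename It1, typename It2>
std::pair<It1, It2> bounded_mismatch(It1 first1, It1 last1, It2 first2, It2 last2)
{
  return bounded_mismatch(first1, last1, first2, last2, std::equal_to<>{});
}

// Self-check on a forward-only first range that is longer than the second.
inline void bounded_mismatch_example()
{
  const std::forward_list<int> a{1, 2, 3, 4, 5};
  const std::vector<int> b{1, 2, 3};

  const auto result = bounded_mismatch(a.begin(), a.end(), b.begin(), b.end());

  assert(*result.first == 4);        // first element of the longer range past the common prefix
  assert(result.second == b.end());  // the shorter range was exhausted, not overrun
}
```

The cost of this approach is two extra linear traversals on non-random-access iterators, in exchange for reusing the 3-iterator path unchanged.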
@davebayer Hi, could you please take a look at this PR when you have time? Thanks!
Summary
Adds the C++14 4-iterator form of `thrust::mismatch`, which accepts an explicit end iterator for the second range and so prevents out-of-bounds reads when the two ranges have different lengths.

Closes #3601
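For reference, the standard-library counterpart that this form is modeled on behaves as follows when the second range is shorter. The sketch uses only `std::mismatch` from C++14; `mismatch_offsets` and `mismatch_offsets_example` are hypothetical names introduced for illustration, and the proposed Thrust overloads are assumed, not confirmed, to report the same positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper for illustration: reports the mismatch position as a
// pair of offsets, using the C++14 4-iterator std::mismatch.
inline std::pair<std::ptrdiff_t, std::ptrdiff_t>
mismatch_offsets(const std::vector<int>& a, const std::vector<int>& b)
{
  const auto result = std::mismatch(a.begin(), a.end(), b.begin(), b.end());
  return {result.first - a.begin(), result.second - b.begin()};
}

inline void mismatch_offsets_example()
{
  const std::vector<int> longer{1, 2, 3, 4, 5};
  const std::vector<int> shorter{1, 2, 3};

  // The scan stops at the end of the shorter range instead of reading shorter[3].
  const auto offsets = mismatch_offsets(longer, shorter);
  assert(offsets.first == 3);
  assert(offsets.second == 3);
}
```

With the 3-iterator form, the same call would have to assume the second range holds at least five elements.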
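New overloads
- `thrust::mismatch(first1, last1, first2, last2)`
- `thrust::mismatch(first1, last1, first2, last2, pred)`

Each is provided with and without an execution policy.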
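Implementation
- CUDA backend (`cuda/detail/mismatch.h`): `min(n1,n2)` bounds a `zip_iterator` fed to `find_if_not`; single-pass GPU kernel, same pattern as the existing 3-iterator path.
- Generic fallback (`generic/mismatch.inl`): `advance` a bounded `last1` to `min(n1,n2)`, then delegate to the existing 3-iterator path; forward-iterator safe.
- Public API (`mismatch.h` / `mismatch.inl`): declarations, Doxygen docs, and dispatch.

Tests (all pass on RTX 5070 / sm_89)
- `TestMismatchBoundedSimple` (`DECLARE_VECTOR_UNITTEST`: host + device + custom_numeric): equal-length, shorter range1, shorter range2, early mismatch, predicate, and empty-range cases.
- `TestMismatchBoundedWithExec` (`DECLARE_UNITTEST`: device execution policy): `thrust::device`, and `thrust::device` + predicate.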
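The CUDA-backend approach listed under Implementation (bound the scan to `min(n1,n2)`, then run one `find_if_not`) can be approximated on the host as a rough analogue. This is a sketch under stated assumptions, not the backend's code: standard C++14 has no `zip_iterator`, so an index sequence stands in for it, and `first_mismatch_index` and its example function are hypothetical names. The example walks through the same kinds of cases the tests are described as covering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side analogue (hypothetical): a single find_if_not over the first
// min(n1, n2) positions. The index vector plays the role of the zip_iterator.
template <typename T, typename Pred>
std::size_t first_mismatch_index(const std::vector<T>& a, const std::vector<T>& b, Pred pred)
{
  const std::size_t n = std::min(a.size(), b.size());

  std::vector<std::size_t> indices(n);
  std::iota(indices.begin(), indices.end(), std::size_t{0});

  // find_if_not returns the first position whose paired elements fail pred.
  const auto it = std::find_if_not(indices.begin(), indices.end(),
                                   [&](std::size_t i) { return pred(a[i], b[i]); });

  // If every pair matched, the result is n, the end of the shorter range.
  return static_cast<std::size_t>(it - indices.begin());
}

inline void first_mismatch_index_example()
{
  const auto equal = [](int x, int y) { return x == y; };
  const auto same_parity = [](int x, int y) { return (x % 2) == (y % 2); };

  const std::vector<int> a{1, 2, 3, 4};
  const std::vector<int> same{1, 2, 3, 4};
  const std::vector<int> shorter{1, 2};
  const std::vector<int> early{9, 2, 3, 4};
  const std::vector<int> parity{3, 4, 5, 7};
  const std::vector<int> empty;

  assert(first_mismatch_index(a, same, equal) == 4);          // equal length, no mismatch
  assert(first_mismatch_index(a, shorter, equal) == 2);       // shorter second range
  assert(first_mismatch_index(shorter, a, equal) == 2);       // shorter first range
  assert(first_mismatch_index(a, early, equal) == 0);         // early mismatch
  assert(first_mismatch_index(a, parity, same_parity) == 3);  // custom predicate
  assert(first_mismatch_index(a, empty, equal) == 0);         // empty range
}
```

Unlike a real zip iterator, the index vector costs extra memory; it is used here only to keep the sketch within the standard library.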