Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a check for overflow of n_results variable in CrsGraphWrapperImpl::queryImpl function #1012

Closed
wants to merge 2 commits into from

Conversation

Shihab-Shahriar
Copy link
Contributor

@Shihab-Shahriar Shihab-Shahriar commented Jan 2, 2024

Due to a failure in initialization, I had a bug where I generated too many neighbor pairs. I think this can happen anytime there are too many primitives in a small place (leading to N^2 pairs).

This check might help a future user to quickly track down this sort of bugs.

Please feel free to edit the error message.

(I wonder if this is something kokkos should try to catch i.e. when a user tries to allocate a view with a negative dimension size.)

@Shihab-Shahriar
Copy link
Contributor Author

I understand this check won't always catch the overflow, it's possible for n_results to be positive again. I'm not sure catching that (by iterating through offset) is worth a separate kernel launch.

Comment on lines 213 to 218
if(n_results < 0){
// overflow
throw std::runtime_error("The search resulted in too many neighbor pairs. "
"Please ensure the primitive positions are not "
"in a degenerate state.");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, just

Suggested change
if(n_results < 0){
// overflow
throw std::runtime_error("The search resulted in too many neighbor pairs. "
"Please ensure the primitive positions are not "
"in a degenerate state.");
}
// Catch possible overflow
ARBORX_ASSERT(n_results >= 0);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more helpful than the current state. Though as a user of this library (or for that matter any library), I would always prefer to avoid stepping through the library code to find out what an internal variable does.

@aprokop
Copy link
Contributor

aprokop commented Jan 2, 2024

@Shihab-Shahriar This is definitely an interesting proposition. In general, there are multiple places where overflow could occur, both in ArboX and Kokkos, this is just one of them. An issue with this particular approach to fixing it is that it won't work if we make counts to be unsigned, or if the overflow occurs within exclusive scan.

What was the original failing behavior? Did the code crash? If it did, what was the error message?

Have you tried running the original buggy code with any of the sanitizers, such as -fsanitize=undefined?

@aprokop
Copy link
Contributor

aprokop commented Jan 2, 2024

(I wonder if this is something kokkos should try to catch i.e. when a user tries to allocate a view with a negative dimension size.)

@masterleinad Am I right in that Kokkos just treats all of the passed dimensions as size_t and thus it's not even visible to it?

@masterleinad
Copy link
Collaborator

@masterleinad Am I right in that Kokkos just treats all of the passed dimensions as size_t and thus it's not even visible to it?

Yes, the View constructor takes size_t arguments for the extents.

@Shihab-Shahriar
Copy link
Contributor Author

Shihab-Shahriar commented Jan 3, 2024

@aprokop the error I initially had from kokkos was "not enough memory to allocate for "indices" array" in the query function. What made this tricky is that it tried to allocate 1.5GB of space, I have 4GB VRAM, so it was natural to think I just need to move to a bigger GPU. And that would have masked this error.

On the issue of handling this in kokkos, I think an overload of realloc() function with int type, solely to do the dimension check and then pass it off to regular realloc might work.

@Shihab-Shahriar
Copy link
Contributor Author

Opened an issue on kokkos repo.

@aprokop
Copy link
Contributor

aprokop commented Apr 5, 2024

Not doing it on ArborX side, will rely on Kokkos.

@aprokop aprokop closed this Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants