Fix failing GPU tests. #19848
Conversation
* Add PREDIFF symbolic link to multiGpu dir
* Update multiGPU .good file to recognize multiple printouts of arrays
* Update fnWithForall to use new locale model
* Update memory allocation in diags

---
Signed-off-by: Andy Stone <stonea@users.noreply.github.com>
@@ -1,5 +1,2 @@
warning: The prototype GPU support implies --no-checks. This may impact debuggability. To suppress this warning, compile with --no-checks explicitly
warning: Unknown CUDA version. cuda.h: CUDA_VERSION=11060. Assuming the latest supported version 10.1
ptxas warning : Unresolved extern variable 'chpl_nodeID' in whole program compilation, ignoring extern qualifier
warning: Unknown CUDA version. cuda.h: CUDA_VERSION=11060. Assuming the latest supported version 10.1
Ugh, I apologize. Thanks for fixing my mess.
Array: 1 1 1 1 1 1 1 1 1 1
Array: 2 2 2 2 2 2 2 2 2 2
Array: 3 3 3 3 3 3 3 3 3 3
Array: 32 32 32 32 32 32 32 32 32 32
This test had a prediff before. I removed it and updated the output, but I did that on my workstation, which has only 1 GPU. You get printouts from each GPU. This should be fine for now, but we'll probably need to revise our tests once we have testing on different systems that don't have 8 GPUs per node.
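For context, a minimal sketch of why the output multiplies (this is an assumed shape for illustration, not the actual test source): each GPU sublocale runs its own copy of the loop and prints, so a node with N GPUs yields N copies of each "Array: ..." line.

```chapel
// Sketch: one task per GPU sublocale; each prints its own array,
// so the .good file needs one "Array: ..." line per GPU on the node.
coforall gpu in here.gpus do on gpu {
  var A: [1..10] int = 1;   // GPU-eligible initialization on this sublocale
  writeln("Array: ", A);    // printed once per GPU
}
```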
@@ -6,7 +6,7 @@ Start
0 (cpu): diags.chpl:10: allocate 48B of _EndCount(AtomicT(int(64)),int(64)) at 0xPREDIFFED
0 (cpu): diags.chpl:10: free at 0xPREDIFFED
0 (cpu): diags.chpl:12: allocate 168B of [domain(1,int(64),false)] locale at 0xPREDIFFED
- 0 (cpu): diags.chpl:12: allocate 16B of array elements at 0xPREDIFFED
+ 0 (cpu): diags.chpl:12: allocate 128B of array elements at 0xPREDIFFED
Could this be from your locale model changes? This line is where we do here.getChild vs here.gpus. Apparently we were already doing some allocation there, but now we are making a bigger allocation? I wonder whether, if we ran this manually with --devel, this line would change to point to a more precise location in the modules where this allocation is coming from.
> Could this be from your locale model changes? This line is where we do here.getChild vs here.gpus. Apparently we were already doing some allocation there already, but now we are making a bigger allocation?
Yeah, I think so.
Previous PR was: "Fix failing GPU tests. chapel-lang#19848" --- Signed-off-by: Andy Stone <stonea@users.noreply.github.com>
Forgot to add PREDIFF symlink in previous PR. Previous PR was: "Fix failing GPU tests. #19848" [Reviewed by nobody; trivial change to fix test failure]
Previous PR was: "Fix failing GPU tests. chapel-lang#19848" --- Signed-off-by: Andy Stone <stonea@users.noreply.github.com> Signed-off-by: Brandon Neth <nethbrandone@gmail.com>
There are a few changes:
Add PREDIFF symbolic link to multiGpu dir
The outermost PREDIFF script is written generically enough to handle this. Anyway, I've updated the good files to not explicitly do this and have updated the multi/ directory to follow the precedent of having a symbolically linked PREDIFF to the more generic script.
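The symlink itself is just a relative link from the test directory to the shared script. A sketch of the shape (the paths below are illustrative assumptions, not the repo's actual layout):

```shell
# Stand-in layout: a generic PREDIFF one level up, linked into the
# multiGPU test directory so the test harness picks it up.
mkdir -p gpu/native/multiGPU
printf '#!/bin/sh\n' > gpu/native/PREDIFF      # placeholder for the generic script
ln -s ../PREDIFF gpu/native/multiGPU/PREDIFF   # relative link keeps the repo relocatable
ls -l gpu/native/multiGPU/PREDIFF
```

Using a relative target (../PREDIFF rather than an absolute path) keeps the link valid wherever the repository is checked out.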
Update multiGPU .good file to recognize multiple printouts of arrays
I'm not sure why this has changed (maybe due to changes in the locale model?).
But it seems like the new output is correct (we're supposed to reprint the arrays for each GPU, right?). @e-kayrakli Could you please double check this?
Update fnWithForall to use new locale model
Instead of doing getChild(1), do gpus[0].
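The change can be sketched as follows (the surrounding kernel code is assumed; only the locale expression differs):

```chapel
// Old prototype locale model: GPU sublocales reached positionally.
on here.getChild(1) {
  // ... GPU code ...
}

// New locale model: GPU sublocales exposed through the gpus array.
on here.gpus[0] {
  // ... GPU code ...
}
```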
Update memory allocation in diags
See this diff, from:
< 0 (cpu): diags.chpl:12: allocate 16B of array elements at 0xPREDIFFED
to:
> 0 (cpu): diags.chpl:12: allocate 128B of array elements at 0xPREDIFFED
I'm guessing this reflects an internal change in the array module and we should just update the .good file, but I want a second opinion. Can you comment @e-kayrakli?