Enable samples input tag in batched drivers #4224
3333887 to f183eb1
Please put up an issue on the topic of new input tags.
There needs to be a clear discussion on the end point. I have ideas that I believe can work, as do you + no doubt others. This will take some time.
Before tackling the optimizer we need to tackle at least VMC and also make sure that the new input tags are sufficiently verbose and self-explanatory. For example, we will need a total_ prefix on input items that are "global", and we need to make sure that they don't conflict with historically used terms. Importantly, we need a reasonably complete plan for all the main QMC driver input revisions before going ahead with user-facing changes. I think a very simple solution is possible, but discussion is needed ahead of implementation.
e18c48d to f183eb1
PDoakORNL
left a comment
This looks good to me outside of the unsigned steps_per_block.
SFNBranch& branch_engine;
IndexType recalculate_properties_period;
IndexType step = -1;
const size_t steps_per_block;
IndexType is signed and steps_per_block is unsigned, and we are likely to compare step with steps_per_block, so this is potentially an issue, as is doing arithmetic with mixed signed and unsigned types. I think steps_per_block should be signed.
"steps_per_block" should be unsigned by nature.
for (int step = 0; step < steps_per_block_; ++step)
I have not heard of unsigned loop upper bound being an issue.
The issue is the signed/unsigned comparison, which is not a great habit. We probably have several of them in the code which makes me wonder if compiler warnings have been disabled for this case -- usually there will be a warning.
Since we guarantee that the counter step is non-negative, there should be no concern. If there is a real case where it matters, I would prefer changing to size_t or int depending on the actual need.
Something being positive definite is not a good reason to use an unsigned type. If you are going to do any arithmetic with it save some grief and use a signed type.
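A minimal sketch of the comparison being debated, assuming IndexType is int (the thread only says it is signed). The function names are hypothetical and are not part of the PR. When a signed value is compared with a size_t, the signed operand is converted to size_t first, so the default step = -1 becomes the largest size_t value and the comparison is false:

```cpp
#include <cstddef>

// Assumption: stand-in for the signed IndexType mentioned in the review.
using IndexType = int;

// What a mixed comparison `step < steps_per_block` effectively does:
// the signed operand is converted to size_t before comparing.
// The conversion is written out here so the sketch compiles without
// a sign-compare warning.
inline bool mixedLess(IndexType step, std::size_t steps_per_block)
{
  return static_cast<std::size_t>(step) < steps_per_block;
}

// One possible guard: handle negative values before converting.
inline bool guardedLess(IndexType step, std::size_t steps_per_block)
{
  return step < 0 || static_cast<std::size_t>(step) < steps_per_block;
}
```

With step = -1 and steps_per_block = 10, mixedLess returns false while guardedLess returns true. A loop counter that starts at 0 and only increments, as in the loop quoted above, never reaches the negative case.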
I would like all the input tags that refer to the "overall total amount of work" to have the normalization included in their name. This is to reduce the risk of confusion and mistakes over what the user might expect. What we choose for "samples" will carry through to other things, such as some eventual "total_steps" (to include the sum over blocks as well as steps and walkers). While not exactly the same thing, for walkers we have "total_walkers", i.e. the total number of walkers pushed in VMC/DMC, equivalent to the total number of walkers summed over all threads, crowds, MPI processes, etc. For samples we could use "total_samples" or perhaps "total_samples_per_cycle". I do not think we should use "samples" since it does not include the normalization. What do you prefer?
Since |
OK, so lets run with |
Let me review the current change and bring it up to date. Will also update docs in this PR. |
@prckent I updated the code. After reviewing it, the current implementation interprets the "samples" input as a requested number: VMC may produce that many or more. DMC behaves like VMC when using a fixed population, and gives only a rough estimate when using a dynamic population. I don't quite get what you mean by "normalization"; feel free to edit the documentation.
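One way to read "VMC may produce equal or more" is as a ceiling division of the requested samples over walkers and blocks. The sketch below is an assumption, not the PR's implementation: stepsPerBlockForSamples and samplesProduced are hypothetical names, and the helper takes three arguments, whereas the determineStepsPerBlock call in the test quoted later in this thread takes four whose meanings are not spelled out here.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper, not the QMCPACK signature: the smallest number of
// steps per block such that walkers * blocks * steps >= requested_samples.
inline std::size_t stepsPerBlockForSamples(std::size_t global_walkers,
                                           std::size_t requested_samples,
                                           std::size_t blocks)
{
  if (global_walkers == 0 || blocks == 0)
    throw std::invalid_argument("walkers and blocks must be positive");
  const std::size_t samples_per_step = global_walkers * blocks;
  // Ceiling division without floating point.
  return (requested_samples + samples_per_step - 1) / samples_per_step;
}

// Samples actually generated for a given steps-per-block choice.
inline std::size_t samplesProduced(std::size_t global_walkers,
                                   std::size_t blocks,
                                   std::size_t steps_per_block)
{
  return global_walkers * blocks * steps_per_block;
}
```

Under these assumptions, 6 walkers, 36 requested samples and 3 blocks give 2 steps per block and exactly 36 samples. With 7 walkers the result is still 2 steps per block, which yields 42 samples, more than the 36 requested.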
prckent
left a comment
Interesting to support samples in DMC. Do you know a use case or were you aiming for consistency?
const RealType reserve_walkers,
int num_crowds);

static size_t determineStepsPerBlock(IndexType global_walkers,
Comment with definition needed.
Can you also make everything const so that it is clear e.g. none of the arguments are changed?
It is a static member function and arguments are passed by value. So everything is stateless and const by nature.
void testMeasureImbalance() { measureImbalance("Test"); }
void testDetermineStepsPerBlock()
{
CHECK(QMCDriverNew::determineStepsPerBlock(6, 36, 2, 3) == 2);
Thanks for adding these. I think they are sufficient to avoid needing to add integration tests.
Only a bare-minimum implementation, aiming for consistency.
Test this please |
@PDoakORNL I think the signed comparison issue you pointed to is adequately protected. OK to merge? |
PDoakORNL
left a comment
This sort of thing is not limited to this spot in the code, but I don't like it much. Feel free to override my request for changes; it's a bikeshed topic and a typical sort of scientific-code defect.
But if your mind is not set on using unsigned types because the number is positive definite, please read:
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-mix
Maybe you can write better code than our future LLM masters.
int nblocks = qmcdriver_input.get_max_blocks();
int nsteps = qmcdriver_input.get_max_steps();
return nblocks * nsteps * local_walkers;
return num_blocks * samples_per_block * local_walkers;
Since this is an unsigned multiplication, it will silently wrap if it overflows; that is not UB and is not something a sanitizer will catch either.
The only location compute_samples_per_rank is called, it is assigned into a signed integer. This, I believe, is UB for very large unsigned values, but it is likely to be a straight bit-by-bit assignment on most machines, so the signed integer will be negative. When that is passed to the sample stack a different thing will happen, I think UINT_MAX + 1 - something.
This would be the case with 1000 blocks, 30000 samples per block, and 100 local walkers.
The unsigned int won't "fit" in the signed integer it gets assigned to. For much larger numbers the multiplication will wrap and I'll just end up with a strangely low number of samples. Not sure what that will do to the ensuing loops.
Anyway, this is burning too much time to consider, and there is no control of the edge case caused by large inputs, as far as I can tell, in this code.
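The numbers in this comment can be checked with a small sketch. It uses fixed 32-bit types to match the "unsigned int" wording above; the code under review uses size_t, which is typically 64 bits wide, so the same limits are reached only for much larger inputs. checkedMul, checkedSamplesPerRank and fitsInInt32 are hypothetical helpers, not functions from the PR. As a side note, converting an out-of-range unsigned value to a signed type is implementation-defined before C++20 and wraps modulo 2^N from C++20 on.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: unsigned multiply that reports wrap-around
// instead of silently returning the wrapped value.
inline std::optional<std::uint32_t> checkedMul(std::uint32_t a, std::uint32_t b)
{
  if (a != 0 && b > std::numeric_limits<std::uint32_t>::max() / a)
    return std::nullopt; // product would not fit in 32 bits
  return a * b;
}

// Hypothetical guarded version of blocks * samples_per_block * walkers.
inline std::optional<std::uint32_t> checkedSamplesPerRank(std::uint32_t num_blocks,
                                                          std::uint32_t samples_per_block,
                                                          std::uint32_t local_walkers)
{
  const auto partial = checkedMul(num_blocks, samples_per_block);
  if (!partial)
    return std::nullopt;
  return checkedMul(*partial, local_walkers);
}

// True if the unsigned value can be stored in a signed 32-bit integer.
inline bool fitsInInt32(std::uint32_t v)
{
  return v <= static_cast<std::uint32_t>(std::numeric_limits<std::int32_t>::max());
}
```

With 1000 blocks, 30000 samples per block and 100 local walkers the product is 3,000,000,000, which fits in 32 unsigned bits but exceeds the signed 32-bit maximum of 2,147,483,647. Raising the block count to 100000 makes the product exceed 32 bits, and the guarded version returns an empty optional rather than a wrapped value.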
The only location compute_samples_per_rank is called it is assigned into a signed integer.
I noticed. I fixed it. compute_samples_per_rank itself is fine, everything should be unsigned as we are handling all kinds of sizes.
I actually agree with what the C++ Core Guidelines say: "Do not mix". My issue here is actually the data type of steps; it should be unsigned as well. The reason we kept it signed is that we use -1 to indicate that no input was given. I actually feel the need to change the default value to 0. In a broader scope, all uses of -1 in input handling should be revisited.
Test this please |
Peter confirmed OK to merge via Slack
Proposed changes
Add samples as a way of control independently from run details (MPI ranks, threads)
What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
epyc-server
Checklist