Skip to content

Enable samples input tag in batched drivers#4224

Merged
prckent merged 11 commits into
QMCPACK:developfrom
ye-luo:batched-driver-samples
Apr 4, 2024
Merged

Enable samples input tag in batched drivers#4224
prckent merged 11 commits into
QMCPACK:developfrom
ye-luo:batched-driver-samples

Conversation

@ye-luo
Copy link
Copy Markdown
Contributor

@ye-luo ye-luo commented Sep 9, 2022

Proposed changes

Add samples as a way of control independently from run details (MPI ranks, threads)

What type(s) of changes does this code introduce?

  • New feature

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

epyc-server

Checklist

  • Yes. This PR is up to date with current the current state of 'develop'
  • Yes. Code added or changed in the PR has been clang-formatted
  • Yes. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • Yes. Documentation has been added (if appropriate)

@ye-luo ye-luo force-pushed the batched-driver-samples branch from 3333887 to f183eb1 Compare September 9, 2022 05:31
@prckent prckent self-requested a review September 9, 2022 06:58
Copy link
Copy Markdown
Contributor

@prckent prckent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put up an issue on the topic of new input tags.

There needs to be a clear discussion on the end point. I have ideas that I believe can work, as do you + no doubt others. This will take some time.

Before tackling the optimizer we need to tackle at least VMC and also make sure that the new input tags are sufficiently verbose and self-explanatory. e.g. we will need prefixes total_ input items that are "global" and also to make sure that they don't conflict with historically used terms. Importantly we need a reasonably complete plan for all the main QMC driver input revisions before going ahead with user facing changes. I think a very simple solution is possible but discussion is needed ahead of implementation.

@prckent prckent marked this pull request as draft September 9, 2022 07:07
@ye-luo ye-luo force-pushed the batched-driver-samples branch 2 times, most recently from e18c48d to f183eb1 Compare January 3, 2023 20:35
Copy link
Copy Markdown
Contributor

@PDoakORNL PDoakORNL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me outside of the unsigned steps_per_block.

SFNBranch& branch_engine;
IndexType recalculate_properties_period;
IndexType step = -1;
const size_t steps_per_block;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since IndexType is signed and steps_per_block is unsigned and we're likely to compare step and steps_per_block and this potentially an issue. As is doing arithmatic operations with mixed signed and unsiged types. I think steps_per_block should be signed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"steps_per_block" should be unsigned by nature.

for (int step = 0; step < steps_per_block_; ++step)

I have not heard of unsigned loop upper bound being an issue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is the signed/unsigned comparison, which is not a great habit. We probably have several of them in the code which makes me wonder if compiler warnings have been disabled for this case -- usually there will be a warning.

Copy link
Copy Markdown
Contributor Author

@ye-luo ye-luo Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we guarantee the counter step non-negative, there should be no concern. If a real case that matters, I would prefer changing to size_t or int depending on the actual need.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something being positive definite is not a good reason to use an unsigned type. If you are going to do any arithmetic with it save some grief and use a signed type.

@prckent
Copy link
Copy Markdown
Contributor

prckent commented Mar 30, 2024

I would like for all the input tags that refer to the "overall total amount of work" to have the normalization included in their name. This is to reduce the risk for confusion and mistakes over what the user might expect. What we choose for "samples" will carry through to other things such as some eventual "total_steps" (to include sum over blocks as well as steps and walkers). While not exactly the same thing, for walkers we have "total_walkers", i.e. the total number of walkers pushed in VMC/DMC, equivalent to the total number of walkers summed over all threads, crowds, mpi processes etc. For samples we could use "total_samples" or perhaps "total_samples_per_cycle". I do not think we should use "samples" since it does not include the normalization. What do you prefer?

@ye-luo
Copy link
Copy Markdown
Contributor Author

ye-luo commented Apr 1, 2024

I would like for all the input tags that refer to the "overall total amount of work" to have the normalization included in their name. This is to reduce the risk for confusion and mistakes over what the user might expect. What we choose for "samples" will carry through to other things such as some eventual "total_steps" (to include sum over blocks as well as steps and walkers). While not exactly the same thing, for walkers we have "total_walkers", i.e. the total number of walkers pushed in VMC/DMC, equivalent to the total number of walkers summed over all threads, crowds, mpi processes etc. For samples we could use "total_samples" or perhaps "total_samples_per_cycle". I do not think we should use "samples" since it does not include the normalization. What do you prefer?

Since samples_per_thread has been removed in the batched drivers. samples is the only input option in use and there is no intention to add more sample related control. It does exactly mean total_samples, for maximizing backward compatibility and considering its briefness. I prefer keeping samples.

@prckent
Copy link
Copy Markdown
Contributor

prckent commented Apr 1, 2024

It does exactly mean total_samples, for maximizing backward compatibility and considering its briefness. I prefer keeping samples.

OK, so lets run with samples. Can always add more options if confusion persists. The documentation needs to be updated to make the normalization clear. I am happy to write that change. However unfortunately I have been stalling this PR for long enough that this PR has conflicts. Would you prefer I update the docs here?

@ye-luo
Copy link
Copy Markdown
Contributor Author

ye-luo commented Apr 1, 2024

It does exactly mean total_samples, for maximizing backward compatibility and considering its briefness. I prefer keeping samples.

OK, so lets run with samples. Can always add more options if confusion persists. The documentation needs to be updated to make the normalization clear. I am happy to write that change. However unfortunately I have been stalling this PR for long enough that this PR has conflicts. Would you prefer I update the docs here?

Let me review the current change and bring it up to date. Will also update docs in this PR.

@ye-luo
Copy link
Copy Markdown
Contributor Author

ye-luo commented Apr 2, 2024

@prckent I updated the code. After reviewing the code, the current implementation interprets "samples" input as a requested number. VMC may produce equal or more. DMC is similar to VMC when using fixed population or a rough estimate when using dynamic population.

I don't quite get what you mean "normalization", feel free to edit the documentation.

@ye-luo ye-luo marked this pull request as ready for review April 2, 2024 19:16
Copy link
Copy Markdown
Contributor

@prckent prckent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting to support samples in DMC. Do you know a use case or were you aiming for consistency?

const RealType reserve_walkers,
int num_crowds);

static size_t determineStepsPerBlock(IndexType global_walkers,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment with definition needed.
Can you also make everything const so that it is clear e.g. none of the arguments are changed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a static member function and arguments are passed by value. So everything is stateless and const by nature.

void testMeasureImbalance() { measureImbalance("Test"); }
void testDetermineStepsPerBlock()
{
CHECK(QMCDriverNew::determineStepsPerBlock(6, 36, 2, 3) == 2);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these. I think sufficient to avoid needing to add integration tests.

@ye-luo
Copy link
Copy Markdown
Contributor Author

ye-luo commented Apr 4, 2024

Interesting to support samples in DMC. Do you know a use case or were you aiming for consistency?

Only a bare minimal implementation aiming for consistency.

@ye-luo
Copy link
Copy Markdown
Contributor Author

ye-luo commented Apr 4, 2024

Test this please

prckent
prckent previously approved these changes Apr 4, 2024
@prckent
Copy link
Copy Markdown
Contributor

prckent commented Apr 4, 2024

@PDoakORNL I think the signed comparison issue you pointed to is adequately protected. OK to merge?

Copy link
Copy Markdown
Contributor

@PDoakORNL PDoakORNL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort of thing is not limited to here in the code but I don't like it much. Feel free to override my request for changes its bikeshed topic and its a typical sort of scientific code defect.

But if you mind is not set on using unsiged types because the number is positive definite please read:
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-mix

Maybe you can write better code than our future LLM masters.

int nblocks = qmcdriver_input.get_max_blocks();
int nsteps = qmcdriver_input.get_max_steps();
return nblocks * nsteps * local_walkers;
return num_blocks * samples_per_block * local_walkers;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is an unsigned multiplication it will silently wrap if it overflows, that is not UB and is not something a sanitizer will catch either.
The only location compute_samples_per_rank is called it is assigned into a signed integer. This I believe is UB the very large unsigned values but likely to be a straight across bit by bit assignment on most machines so the signed integer will be negative. When passed to sample stack then a different thing will happen, I think UNIT_MAX + 1 - something.

This would be the case if I have 1000 block, 30000 samples per black, and have 100 local walkers.
The unsigned int won't "fit" in the signed integer it gets assigned to. For much larger numbers the mult will wrap and I'll just end up with a strangely low number of samples. Not sure what that will do the ensuing loops.

Anyway this is burning too much time to consider and there is not control of the edge case caused by large inputs as far as I can tell in this code.

Copy link
Copy Markdown
Contributor Author

@ye-luo ye-luo Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only location compute_samples_per_rank is called it is assigned into a signed integer.

I noticed. I fixed it. compute_samples_per_rank itself is fine, everything should be unsigned as we are handling all kinds of sizes.

@ye-luo
Copy link
Copy Markdown
Contributor Author

ye-luo commented Apr 4, 2024

This sort of thing is not limited to here in the code but I don't like it much. Feel free to override my request for changes its bikeshed topic and its a typical sort of scientific code defect.

But if you mind is not set on using unsiged types because the number is positive definite please read: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-mix

Maybe you can write better code than our future LLM masters.

I actually agree with what C++ core guidelines say "Do not mix". My issue here is actually the data type of steps, it should be unsigned as well. The reason we keep it signed was the fact that we are using -1 to indicate no input given. I actually feel the need to change the default value to 0. In a broader scope, all the use -1 in input handling should be revisited.

Copy link
Copy Markdown
Contributor

@prckent prckent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto addresses the typing issue raised by Peter.

We should be careful in future about computations such as nsteps*nblocks*total_walkers, which could come close to (u)int32 ranges in extreme cases.

@prckent
Copy link
Copy Markdown
Contributor

prckent commented Apr 4, 2024

Test this please

@prckent prckent dismissed PDoakORNL’s stale review April 4, 2024 20:22

Peter confirmed OK to merge via Slack

@prckent prckent merged commit 5921824 into QMCPACK:develop Apr 4, 2024
@ye-luo ye-luo deleted the batched-driver-samples branch April 4, 2024 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants