Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global driver_version switch for mapping input to drivers #3897

Merged
merged 28 commits into from
Mar 18, 2022

Conversation

PDoakORNL
Copy link
Contributor

@PDoakORNL PDoakORNL commented Mar 4, 2022

Note: After discussion and review, the switch was decided to be named driver_version (not epoch)

This removes the

<qmc method=vmc_batch|dmc_batch|linear_batch>

and replaces them with

    <project id="qmc_short" series="0">
     <parameter name="driver_version">batched</parameter>
   </project>
...
<qmc method=vmc|dmc|linear>

Proposed changes

ProjectData can now be queried get_driver_epoch() and explicitly different input parsing can easily be done for batched drivers vs. legacy.

The unit tests pass.
Most of the short and deterministic batched tests pass. I'm not sure any of the system level batched optimizer test pass. They seem extremely slow.

In the future this will greatly simplify preventing legacy/new input mismatches. i.e. #3875

What type(s) of changes does this code introduce?

  • New feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Testing changes (e.g. new unit/integration/performance tests)
  • Documentation changes

Does this introduce a breaking change?

  • Yes
    For legacy no, for batched yes.
    Your existing batched driver input will stop working.
    You must make the change shown at the top of the PR to have them continue working.

What systems has this change been tested on?

Leconte

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.

  • Yes/No. This PR is up to date with current the current state of 'develop'
  • Yes/No. Code added or changed in the PR has been clang-formatted
  • Yes/No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • Yes/No. Documentation has been added (if appropriate)

@ye-luo
Copy link
Contributor

ye-luo commented Mar 4, 2022

I still would like to run xxx_batch. I thought the intended behavior was when the global switch is on. "XXX" is redirected to "XXX_batch".

@PDoakORNL
Copy link
Contributor Author

I thought the desire was to drop the vmc_batch name. @jtkrogel convinced me that not having the second driver name was better and it definitely is considering the legacy handling of input. ProjectData is high level and parsed very early, we need to parse qmchamiltonian differently based on whether the driver is batched or not and this greatly simplifies that.

@prckent
Copy link
Contributor

prckent commented Mar 4, 2022

xxx_batch will be history. This is the intended change. Much friendlier for users and we want to block off the old drivers in any case.

I think we should try to improve on "driver_epoch". There is a lot more than the driver being switched here, plus this tag will be visible for "eternity" given the transition plan. Ask on Slack for ideas.

@jtkrogel
Copy link
Contributor

jtkrogel commented Mar 4, 2022

Any chance we change <parameter name="driver_epoch">batched</parameter> to <parameter name="driver">batched</parameter>? It is about as simple as is needed.

Nexus currently uses driver='batched' and driver='legacy' to switch between these. Whatever is decided, I will update Nexus to match.

@prckent
Copy link
Contributor

prckent commented Mar 4, 2022

Nexus currently uses driver='batched' and driver='legacy' to switch between these. Whatever is decided, I will update Nexus to match.

Any arguments to not use these? Consistency is a strong argument for.

@@ -103,13 +103,14 @@ constexpr std::array<const char*, 2> valid_dmc_input_sections{
constexpr int valid_dmc_input_dmc_index = 0;
constexpr int valid_dmc_input_dmc_batch_index = 1;

/** As far as I can tell these are no longer valid */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to update the WFOptDriverInput class, and I'll change these tests then as well.

@ye-luo
Copy link
Contributor

ye-luo commented Mar 4, 2022

Technically, determine the behavior fully relying on a global state is convenient but bad both for the code parsing and human reading.

As a user, I request the existing input with "vmc_batch" to continue functioning. It is bad to upset users making breaking changes. The cost of keep "vmc_batch" tag to work is nothing and there is no ambiguity.

Here is what consider a sane way.
When "vmc_legacy" or "vmc_batch" is inputted as a method, it should hit exactly the requested driver.
Only when the input is "vmc", driver factory maps it to "vmc_legacy" or "vmc_batch" based on the global state.

For users new to this complexity, they may choose to just rely on the global flag. This is also a desired method to help migration.

Once the legacy vmc driver is deleted, then we can remove "vmc_legacy/batch", there is no ambiguity.

@prckent
Copy link
Contributor

prckent commented Mar 7, 2022

Will post with a suggested transition plan when I have more time this week. I think we could support _batch but with a warning that this will be deprecated in future, similar to how we need a warning when the global flag is not specified. Once Nexus is updated no one should be invoking xxx_batch in any new inputs.

@PDoakORNL
Copy link
Contributor Author

Not having a high level switch between the legacy architecture and the new architecture makes it quite torturous to reject legacy estimators in the qmchamiltonian section as requested in #3875. I support introducing separate code paths for input parsing as the legacy input handling is a mess that is tightly entangled with the legacy simulation objects.

We can support the current _batch as they are now but I don't think we should make the sphaghetti to go back and check previous parsed and constructured objects like the hamiltonian pool.

@ye-luo
Copy link
Contributor

ye-luo commented Mar 7, 2022

Not having a high level switch between the legacy architecture and the new architecture makes it quite torturous to reject legacy estimators in the qmchamiltonian section as requested in #3875.

We do need a high level switch. We agreed on that. I was asking to keep "_batch" functioning. It is only a key mapping issue.

if (curName != "qmc")
qmc_mode = curName;

const int nchars = qmc_mode.size();

// Begin to separate batch input reading from the legacy input parsing
if (project_data_.get_driver_epoch() == "batched")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a bool argument (force_batch) of readSection for and expand unit tests of readSection.
Also check getEngineName() after the driver being created in unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you cannot create a QMCDriverFactory without ProjectData and the global driver epoch is set there, I don't think its a good idea to also allow it to be forced directly. There is one source of truth for this parameter and it is from the ProjectData the driver factory is constructed with.

The _batch are supported for the convenience of users and tests currently using them but are not valid input when the driver epoch is set. Based on the driver epoch input, the qmcsystem and hamiltonian node will be parsed differently so we don't want to encourage further use of the _batch or introduce _legacy as a tag for drivers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that ProjectData is required by QMCDriverFactory. Then there is no need to add a bool argument. I have not read your latest commit but I guess you have both epoch cases covered in the unit test now. What did you mean by "the qmcsystem and hamiltonian node will be parsed differently" did you mean "hamiltonian of qmcsystem" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are also going to allow an <estimators> node for global estimator definitions. This will only be valid if in the the batched epoch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. global "estimators" will outside "hamiltonian" but inside "qmcsystem"

@PDoakORNL
Copy link
Contributor Author

Having driver refer to a classification for a group of drivers when its a parameter of a project node, but still having singular drivers types: vmc, linear, dmc in the code, documentation, name attibute of qmc node, or when the node name of a qmc section seems like recipe for confusion.

@ye-luo
Copy link
Contributor

ye-luo commented Mar 10, 2022

@PDoakORNL could you run clang-format on all the changed source file ? Some diff looks strange.
I will get you back the review early this afternoon.

Copy link
Contributor

@ye-luo ye-luo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The driver factory change looks good.
Having concerns on the changes in ProjectData.

src/QMCDrivers/tests/test_QMCDriverFactory.cpp Outdated Show resolved Hide resolved
src/Utilities/ProjectData.h Outdated Show resolved Hide resolved
src/Utilities/ProjectData.h Outdated Show resolved Hide resolved
@ye-luo ye-luo changed the title Global batch switch Global batched driver switch Mar 15, 2022
Copy link
Contributor

@ye-luo ye-luo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments around ProjectData class change.

src/Utilities/tests/test_project_data.cpp Outdated Show resolved Hide resolved
src/Utilities/ProjectData.h Outdated Show resolved Hide resolved
tests/molecules/He_ae/det_He_opt_batch.xml Outdated Show resolved Hide resolved
src/Utilities/ProjectData.cpp Outdated Show resolved Hide resolved
@ye-luo
Copy link
Contributor

ye-luo commented Mar 16, 2022

Test this please

ye-luo
ye-luo previously approved these changes Mar 16, 2022
@prckent prckent self-requested a review March 16, 2022 18:33
Copy link
Contributor

@prckent prckent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did we end up with? I could not determine from the discussion here.

@PDoakORNL
Copy link
Contributor Author

Test this please

1 similar comment
@ye-luo
Copy link
Contributor

ye-luo commented Mar 16, 2022

Test this please

@ye-luo
Copy link
Contributor

ye-luo commented Mar 17, 2022

Test this please

ye-luo
ye-luo previously approved these changes Mar 17, 2022
* In the application context project data can indicate the input be read in the context of
* the batched driver architecture.
* param[in] cur qmc section node
* param[in] force_batch forces input to be evaluated as if project driver type = batched
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be force_batch parameter.

@@ -48,12 +48,23 @@ class QMCDriverFactory
QMCRunType new_run_type = QMCRunType::DUMMY;
};

/** Application uses this constructor
* param[in] project_data this is stored as a reference and this state controls later behavior.
* For both the driver factory i.e. driver epoch. And the drivers it creates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover reference to 'driver epoch'

@@ -113,6 +132,9 @@ class ProjectData : public OhmmsElementBase

///max cpu seconds
int max_cpu_secs_;

// The driver epoch of the project
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left over mention of epoch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this Mark. We should confirm grep -i epoch returns nothing.

@PDoakORNL PDoakORNL disabled auto-merge March 17, 2022 21:29
@PDoakORNL
Copy link
Contributor Author

my refactor didn't reach into the comments. Did a straight text based recursive grep and I don't see any epoch's related to this only the time api.

@ye-luo
Copy link
Contributor

ye-luo commented Mar 17, 2022

Test this please

Copy link
Contributor

@ye-luo ye-luo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @prckent good for you now?

@prckent prckent self-requested a review March 18, 2022 02:14
Copy link
Contributor

@prckent prckent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing a test for the case where driver_version is not specified but vmc_batch is called explicitly (or one of the other _batched drivers). This should work currently but in future when the driver_version tag is made a requirement will break (and be updated to expect_fail status). A quick clone of one of the deterministic tests would handle this.

Copy link
Contributor

@prckent prckent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The no switch + *_batch call is already tested. (This is what current users of the batched code will have in their inputs).

@prckent prckent merged commit 750d260 into QMCPACK:develop Mar 18, 2022
@ye-luo ye-luo changed the title Global batched driver switch Global driver_version switch for mapping input to drivers Mar 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants