Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix occasional optimizer failure via improved filter to accept more eigenvalues #4917

Merged
merged 4 commits into from
Apr 29, 2024

Conversation

markdewing
Copy link
Contributor

Sometimes no eigenvalue is found from the mapping (filter and sort), which leads the code to select the first eigenvalue returned from the solver, which often sends the optimizer parameters in a very undesirable direction.

The results of this situation look something like this qmca output:

                            LocalEnergy               Variance           ratio 
opt1_j2_7_6  series 0  -74.870455 +/- 0.024794   5.933186 +/- 0.535541   0.0792 
opt1_j2_7_6  series 1  -75.001842 +/- 0.037365   7.256994 +/- 2.932719   0.0968 
opt1_j2_7_6  series 2  -74.929538 +/- 0.016134   4.463142 +/- 0.154065   0.0596 
opt1_j2_7_6  series 3  -38087874.983748 +/- 13522761.916354   52176250689832632320.000000 +/- 20073852128999710720.000000   1369891355506.0911 
opt1_j2_7_6  series 4  -35506733.280610 +/- 11037598.694781   51117489304034312192.000000 +/- 17345565406866282496.000000   1439656216753.3770 
opt1_j2_7_6  series 5  -9073700.394155 +/- 3700257.273053   11376395351848120320.000000 +/- 5387829541237716992.000000   1253776833889.7781 

The workaround is to detect the situation where no acceptable eigenvalue is found, and run a second filter which accepts eigenvalues above the current VMC energy ( E_0 = H(0,0) ). It cuts off at E_0 + 100, to mirror the lower cutoff.

The solution could be simpler, but the fix was chosen to keep the existing eigenvalue selection unchanged as much as possible.

Some notes on the details of eigenvalue selection:

Why 2.0? I don't know. It was introduced in 4b5b0a3 (2011). Prior to that change, the code searched for the eigenvalue closest to E_0 that falls between E_0 and E_0 - 1e10. After that change, the range was narrowed to E_0 and E_0 - 100, and the code searched for the eigenvalue closest to E_0 - 2.0. My guess is that a slightly lower eigenvalue might lead to faster convergence if there are a number of eigenvalues close to E_0 (??) but that really ought to be tested.

Why E_0 - 100 for the cutoff? I also don't know. My guess is it's to cut off obviously spurious eigenvalues which could arise from noise. Since this is a fixed value, it could cause problems for systems with very large total energies.

What type(s) of changes does this code introduce?

Delete the items that do not apply

  • Bugfix

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

desktop

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.

  • Yes. This PR is up to date with current the current state of 'develop'
  • Yes. Code added or changed in the PR has been clang-formatted
  • No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • No. Documentation has been added (if appropriate)

Sometimes no eigenvalue is found from the mapping (filter and sort), which
leads the code to select the first eigenvalue returned from the solver,
which often sends the optimizer parameters in a very undesirable direction.

The workaround is to detect the situation where no acceptable eigenvalue
is found, and run a second filter which accepts eigenvalues above the
current VMC energy ( H(0,0) ).
The solution could be simpler, but the fix was chosen to keep the
existing eigenvalue selection unchanged as much as possible.
@prckent
Copy link
Contributor

prckent commented Feb 9, 2024

Test this please

@prckent
Copy link
Contributor

prckent commented Feb 9, 2024

Very interesting! Thanks Mark. Can you describe the situation that triggered this?

@markdewing
Copy link
Contributor Author

The context is testing for #4901. The cusp correction depends on a discrete parameter (locations where the second derivative of the correction matches the second derivative the basis function). I had set up some scripts to scan across that parameter, and this problem with the optimizer would show up in a few percent of the cases.

I had seen it before with some orbital rotation optimization, but hadn't tracked down the issue (It doesn't require orbital rotation to be involved - the problem could show up just optimizing J2 and J1, though it was more frequent when also optimizing J3)

@ye-luo
Copy link
Contributor

ye-luo commented Feb 12, 2024

It doesn't seem the case that E_0 = H(0,0) but E_0 = (H * Inv)[0,0] where inv is the inverse of overlap matrix.

@markdewing
Copy link
Contributor Author

For the overlap matrix, the first row and column are the identity matrix, so the inverse also ends up as the identity matrix in the first row and column, and (H * Inv)[0, 0] ends up being the same as H[0,0].

Copy link
Contributor

@prckent prckent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mark.

No issues with this PR since this doesn't change the relevant logic, but I do wonder about handling of excited states in future when more of the wavefunction is optimized.

@prckent prckent changed the title Fix occasional optimizer failure. Fix occasional optimizer failure via improved filter to accept more eigenvalues Apr 29, 2024
@prckent
Copy link
Contributor

prckent commented Apr 29, 2024

Test this please

@prckent prckent enabled auto-merge April 29, 2024 17:57
@prckent
Copy link
Contributor

prckent commented Apr 29, 2024

Test this please

@prckent prckent merged commit efdf45b into QMCPACK:develop Apr 29, 2024
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants