Nexus: use data hashing in heuristic equilibration detection algorithm #4557

Merged
mdewing merged 5 commits into QMCPACK:develop from jtkrogel:nx_deterministic_equil
Apr 26, 2023
@jtkrogel
Contributor

@jtkrogel jtkrogel commented Apr 18, 2023

Proposed changes

The heuristic equilibration detection algorithm used by qmca when -e is unspecified uses a random number to avoid statistical bias in the selection of the equilibration length. As a result, the estimated mean varies each time qmca is run. While this is statistically correct, users expect deterministic behavior. This PR seeds the RNG from a hash of the data being analyzed, which keeps the RNG's role in avoiding bias while presenting deterministic behavior to the user.
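For illustration only, the idea of seeding the RNG from the data can be sketched as follows. This is not the code in this PR: the function names `rng_from_data` and `pick_equilibration_length`, the use of `hashlib.sha256`, and the uniform draw over a window are all assumptions made for the sketch.

```python
import hashlib

import numpy as np


def rng_from_data(x):
    """Return a NumPy Generator seeded by a digest of the array contents.

    Hypothetical helper: the actual Nexus implementation may hash differently.
    """
    data = np.ascontiguousarray(x, dtype=np.float64)
    digest = hashlib.sha256(data.tobytes()).digest()
    # Use the first 8 bytes of the digest as a non-negative integer seed.
    seed = int.from_bytes(digest[:8], "little")
    return np.random.default_rng(seed)


def pick_equilibration_length(x, lo, hi):
    """Hypothetical stand-in for the heuristic: draw a cutoff from [lo, hi).

    The draw is random with respect to the data, but repeatable for
    identical input because the seed depends only on the data.
    """
    rng = rng_from_data(x)
    return int(rng.integers(lo, hi))
```

Under these assumptions, calling `pick_equilibration_length` twice on the same array returns the same cutoff, while different data sets still receive effectively independent draws.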

Addresses #4556. The reported means are now consistent when multiple files are used (or when the same file is used repeatedly):

>qmca -q e */*/*.scalar.dat
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.290314 +/- 0.029507 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.336379 +/- 0.093084 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.176166 +/- 0.030095 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.198560 +/- 0.027040 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.274630 +/- 0.026488 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.168970 +/- 0.031373 

>qmca -q e d*/*/*.scalar.dat
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.290314 +/- 0.029507 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.336379 +/- 0.093084 
debug/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.176166 +/- 0.030095 

>qmca -q e o*/*/*.scalar.dat
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_4990_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.198560 +/- 0.027040 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5000_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -197.274630 +/- 0.026488 
orig/vmc_perf_pbe_u_None_3x3x1_1x1x1_5010_ccecparep_legacy_500_500_1.5_50_True_hf/vmc  series 0  LocalEnergy           =  -198.168970 +/- 0.031373 

What type(s) of changes does this code introduce?

  • New feature

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

Workstation

Checklist

  • Yes. This PR is up to date with the current state of 'develop'

@mdewing
Contributor

mdewing commented Apr 18, 2023

There are still some ways this might not reproduce the value.

  • The hash function (https://docs.python.org/3/reference/datamodel.html#object.__hash__ ) includes a random salt for string and byte values. The data here is probably only numeric (numpy array?), so this most likely won't be an issue. Since Python definitions don't have a declared type, the type of x is not immediately obvious.
  • The hash value is not guaranteed to be the same between Python versions (but it usually is).
  • Changing the data (removing the last line, for instance) could result in a larger than expected change in the output.

These are probably sufficiently unlikely scenarios that having a consistent output via hashing is the more useful solution, but I did want to get them listed and considered.
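As a rough illustration of the points above (not the PR's implementation), a digest from `hashlib` computed over the raw array bytes does not depend on Python's per-process hash salt, which applies to the built-in `hash` of `str` and `bytes` objects. The helper name `stable_seed` is hypothetical. The sketch also shows the third point: removing the last sample produces an unrelated seed.

```python
import hashlib

import numpy as np


def stable_seed(x):
    """Hypothetical helper: derive an integer seed from the array bytes.

    hashlib digests are unaffected by the PYTHONHASHSEED salt that
    applies to the built-in hash() of str and bytes objects.
    """
    data = np.ascontiguousarray(x, dtype=np.float64)
    digest = hashlib.sha256(data.tobytes()).digest()
    return int.from_bytes(digest[:8], "little")


x = np.linspace(0.0, 1.0, 100)

# Same data gives the same seed, in this process and across runs.
seed_full = stable_seed(x)

# Dropping the last sample changes the digest, so the seed (and any
# random draw derived from it) changes completely.
seed_trunc = stable_seed(x[:-1])
```

A digest over bytes still depends on the dtype and byte order of the data, so the sketch converts to contiguous float64 first.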

@jtkrogel
Contributor Author

The data are numeric (numpy float64). If a deterministic solution other than hashing is desired, please state it and I will implement that.

@prckent
Contributor

prckent commented Apr 18, 2023

This would work for me. Have you thought about printing out the equilibration length? Giving an indication that equilibration length was taken into account was a good suggestion. (Definitely could be done in another PR)

@mdewing
Contributor

mdewing commented Apr 20, 2023

Test this please

@prckent
Contributor

prckent commented Apr 25, 2023

Test this please

@prckent
Contributor

prckent commented Apr 25, 2023

Test this please

@prckent
Contributor

prckent commented Apr 26, 2023

Test this please

@prckent
Contributor

prckent commented Apr 26, 2023

Test this please

@mdewing mdewing merged commit 421d1b5 into QMCPACK:develop Apr 26, 2023
@prckent prckent mentioned this pull request Aug 18, 2023
@jtkrogel jtkrogel deleted the nx_deterministic_equil branch November 19, 2024 22:39