
Available simulation results? #20

Open
dombrno opened this issue Mar 21, 2018 · 22 comments


dombrno commented Mar 21, 2018

Good evening,
I am using ALPSCORE/CT-HYB to calculate dynamic susceptibilities. This involves measuring G2, inverting the Bethe-Salpeter equation (BSE), and performing the analytic continuation. The analytic continuation uses Maxent and thus needs some error information as input data. In order to feed a good estimate of the error to Maxent, I am considering a jackknife resampling procedure on the output of the simulation. Hence, my questions:
a) is there anything better/easier than this available somewhere within the ALPS project?
b) if not, is the detailed simulation data available in the h5 output file?

Thanks a lot for your help.


shinaoka commented Mar 21, 2018

Hi, I did not implement error-bar estimation simply because I did not see a need for it at that point.
No error information is available in the current h5 output file.

How would you propagate the error of the average sign in a jackknife resampling?


dombrno commented Mar 21, 2018

The idea would be to mimic a series of shorter QMC runs by splitting the full set of measurements from a single run into bunches. We would then invert the BSE and obtain the dynamic susceptibility at imaginary bosonic frequencies from each of these bunches. For this, we would need G2(l, l', omega_n) at each measurement point of the QMC simulation, if that is dumped anywhere.
Said differently, we are trying to avoid running a number of independent simulations by analyzing the data of the long run we already have. But these data are probably not dumped; I would imagine they would be several GB in size.
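For what it's worth, the leave-one-bunch-out scheme described above can be written in a few lines. This is only an illustration under the assumptions of the thread (equal-sized bunches, average sign of 1); `estimator` is a placeholder for whatever maps an averaged G2 to the final observable (e.g. the BSE inversion):

```python
import numpy as np

def jackknife(samples, estimator):
    """Leave-one-bunch-out jackknife over pre-binned data.

    samples: array of shape (n_bunches, ...) holding per-bunch averages
    estimator: function mapping an averaged measurement to the final
               observable (placeholder for e.g. the BSE inversion)
    """
    n = len(samples)
    total = samples.sum(axis=0)
    # estimate the observable on each leave-one-bunch-out subset
    thetas = np.array([estimator((total - samples[i]) / (n - 1))
                       for i in range(n)])
    mean = thetas.mean(axis=0)
    # standard jackknife error formula
    err = np.sqrt((n - 1) / n * ((thetas - mean) ** 2).sum(axis=0))
    return mean, err
```

A quick sanity check: for the identity estimator, the jackknife error reduces exactly to the usual standard error of the mean.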

@shinaoka shinaoka self-assigned this Mar 21, 2018

dombrno commented Mar 21, 2018

In the old days, in the very first version of ALPS, there was a function "evaluate" which had to be called once the QMC run was finished and which did all the averaging. With that architecture, it would have been possible to carry out the procedure mentioned above. But I understand that the new architecture probably does not give access to this fine-grained detail, since the averaging is done at the end of the QMC run before the results are dumped.

@shinaoka

It may be possible with the accumulator or alea libraries in ALPSCore.
Markus Wallerberger may be the right person to answer this technical question.
Alex, Emanuel, could you assign him to this thread?

@shinaoka

Alex, Emanuel, any idea?


egull commented Mar 27, 2018

Yes. We have the FullBinning observable in the old ALPS. It keeps a number of bins from the simulation (typically 128, I think) and fills them with the measured values; they are then written into the HDF5 file. A BS equation calculation can then take the data from the bins and post-process them.
A word of warning though: this requires 128 times the storage for the HDF5 file (and in memory), which is substantial for vertex functions.
Hiroshi, to enable this you would have to find the accumulator and change it from mean or no-binning to full-binning. @dombrno would then have to rerun the calculation.
The new ALPS has the same capabilities; the ALEA rewrite just has a somewhat optimized binning procedure. A paper is in preparation.
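Assuming equally weighted bins of the kind such a FullBinning observable stores, the basic error estimate one can form from them is the standard deviation of the bin means divided by sqrt(n_bins). A minimal sketch (not ALPS code):

```python
import numpy as np

def binning_error(bins):
    """Naive error estimate from full-binning data.

    bins: array of shape (n_bins, ...) of per-bin averages,
          assumed equally weighted.
    Returns the standard deviation of the bin means / sqrt(n_bins).
    """
    n = bins.shape[0]
    return bins.std(axis=0, ddof=1) / np.sqrt(n)
```

This ignores residual autocorrelation between bins, which is why a large bin count (or a jackknife on top of the bins) is usually preferred for derived quantities.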


dombrno commented Mar 27, 2018

I understand, and that makes sense: I remember that while adapting the old ALPS (segment) code for my needs (extending it to two orbitals and a non-diagonal hybridization function), I replaced the proposed accumulators with ones that did no binning, and enjoyed a much more compact output file (I did not check memory consumption, and I was only measuring the single-particle GF).


egull commented Mar 27, 2018

So... how can we help? Should we get you a version that can do binning so you can try?


dombrno commented Mar 27, 2018

I think that before anything is done, I should assess the memory and storage currently needed, and consider how much more my hardware can handle.

@shinaoka

Thank you, Emanuel.

@dombrno
I would like to ask whether you really need this for your BS equation work.
What if you just run the CT-HYB solver several times with different random seeds?
That would also give you an estimate of the error bars.


dombrno commented Mar 27, 2018

Yes, that is an option.

Maybe we can keep this feature as a "nice to have" option, but I would give it the lowest priority.


dombrno commented Mar 27, 2018

For what it's worth, with the current code the h5 file is 100 MB and the calculation consumes 5 GB of RAM. The available RAM on my hardware is 128 GB, so I could probably use up to 20 bins if this option were ever implemented.

If I end up with a really strong need for it, I will ask for guidance on which type of accumulator is best to use in your opinion and implement it on my side. For the time being, thank you for your answer and your help, which perfectly addresses my initial question.

@dombrno dombrno closed this as completed Mar 27, 2018

dombrno commented Apr 27, 2018

@shinaoka I finally went for the option you suggested: running the CT-HYB solver several times with different random seeds. I do this using the job-array feature of the PBS scheduler: a number of jobs are launched with exactly the same inputs. Each job uses one full node with 24 CPUs. The only difference between jobs is the value of SEED in the input.ini file.

Does this look reasonable to you? In particular, since the seed in my setup increases by one from node to node, I would like to make sure that the same seed is used by all CPUs controlled by a given job, so that there can be no seed overlap between CPUs of different jobs. In other words, I would like confirmation that all the MPI processes of a single run share the same seed. This is probably a question for @galexv ? Thanks a lot.

@shinaoka

Hmm, the value of the seed increases one by one across nodes?
That could cause a problem.
For the MPI process of rank n, the pseudo-random number generator is initialized with SEED + n,
where SEED is the seed given in the input file. (This is the specification of the ALPSCore libraries.)
So the value of SEED should increase by at least 24 from one node to the next.
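To make the rule concrete: only the SEED + n initialization above comes from the thread; the helper below is one hypothetical way to space the base seeds by the ranks-per-node count so that the per-rank seeds of different nodes cannot collide:

```python
# Per the thread: MPI rank n is seeded with SEED + n (ALPSCore convention).
# Hypothetical scheme: give node j the base seed seed0 + j * ranks_per_node,
# so the per-rank seed ranges of different nodes are disjoint.

def node_base_seed(seed0, node_index, ranks_per_node=24):
    return seed0 + node_index * ranks_per_node

def rank_seeds(base_seed, ranks_per_node=24):
    # the set of seeds actually used by the MPI ranks on one node
    return {base_seed + n for n in range(ranks_per_node)}

node0 = rank_seeds(node_base_seed(42, 0))
node1 = rank_seeds(node_base_seed(42, 1))
assert node0.isdisjoint(node1)  # no seed overlap between the two nodes
```

Incrementing SEED by only 1 per node would instead make 23 of the 24 seeds on adjacent nodes coincide.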

@shinaoka shinaoka reopened this Apr 27, 2018
@shinaoka

To prevent this from happening, I may be able to apply some non-linear transformation f to SEED, so that the random-number generators are initialized with f(SEED) + n.
What do you think?


dombrno commented Apr 28, 2018

I suspected this could be the case; thanks for clarifying. I will simply increase the seed by 24 for each node, and then I should be fine. Thank you!


dombrno commented Apr 28, 2018

Just one detail to make sure everything is working as expected: I am controlling the seed via the key "SEED" at the top level of the .ini file, based on what I saw implemented in alps/mc/mcbase.cpp. Is this the recommended way to control this parameter?

@shinaoka

You're right!


dombrno commented May 1, 2018

I have now obtained the data corresponding to 64 runs (one node each), using different seed values. I would like to do some resampling of the quantities G1_LEGENDRE and G2_LEGENDRE. For this purpose I need the number of measurements performed on each node for these quantities (the average sign is 1.0). It looks like

  • /simulation/results/G1_Re/count
  • /simulation/results/G2_Re/count

might be the suitable fields - can you please confirm if this is correct?
Thank you!
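For reference, those two count fields could be read with h5py roughly like this (a sketch; the dataset paths are the ones quoted above, the file name is hypothetical, and the layout should be double-checked against the actual output file):

```python
import h5py  # assumed available; dataset paths taken from the thread

def read_counts(filename):
    """Read the accumulator counts for G1/G2 from a CT-HYB output file.

    The dataset paths follow the thread; adjust them if your
    file layout differs.
    """
    with h5py.File(filename, "r") as f:
        g1_count = f["/simulation/results/G1_Re/count"][()]
        g2_count = f["/simulation/results/G2_Re/count"][()]
    return g1_count, g2_count
```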


egull commented May 1, 2018 via email


shinaoka commented May 1, 2018

BTW, why do you need to know the number of measurements?


dombrno commented May 1, 2018

Well, I have 64 samples and need to calculate the values of G1 and G2 over subsets of these samples, so I was thinking that the number of measurements is a reasonable weight for each sample's contribution to the partial resummation, given that the average sign is 1.0.
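The count-weighted combination described here is just a weighted average over runs; a minimal sketch (valid when the average sign is 1, so every measurement carries equal weight):

```python
import numpy as np

def weighted_combine(values, counts):
    """Combine per-run estimates of an observable, weighting each run
    by its number of measurements.

    values: array of shape (n_runs, ...) of per-run averages
    counts: array of shape (n_runs,) of measurement counts per run
    """
    values = np.asarray(values, dtype=float)
    return np.average(values, axis=0, weights=counts)
```

With a non-unit average sign one would instead have to combine the sign-weighted numerator and the sign average separately before dividing.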
