Storing simulation frames for MultiStateSampler #638

xiki-tempula · 2022-12-01T16:57:24Z

I'm interested in using MultiStateSampler to run a simulation with MultiStateReporter, but I am struggling to find where the frames are stored. I create the objects with

    reporter = multistate.MultiStateReporter(
        storage_path,
        checkpoint_interval=1000,
    )
    simulation = multistate.MultiStateSampler(
        mcmc_moves=move, number_of_iterations=100, online_analysis_interval=1000
    )
    simulation.create(
        thermodynamic_states=compound_thermodynamic_states,
        sampler_states=states.SamplerState(
            crd.positions, box_vectors=crd.getBoxVectors()
        ),
        storage=reporter,
        initial_thermodynamic_states=[i],
    )

So I expected to find 100 frames of positions in storage_path. However, I cannot find a field called positions in the file.
I dug a bit deeper and found that there are positions in reporter._storage[1]['positions'].

<class 'netCDF4._netCDF4.Variable'>
float32 positions(iteration, replica, atom, spatial)
    units: nm
    long_name: positions[iteration][replica][atom][spatial] is position of coordinate 'spatial' of atom 'atom' from replica 'replica' for iteration 'iteration'.
unlimited dimensions: iteration
current shape = (1, 1, 6462, 3)

However, even though I set number_of_iterations to 100, the first dimension is still 1.

I also noticed that there are two datasets under reporter._storage, but only reporter._storage[0] is stored in storage_path.

How do I get the positions, box dimensions, and velocities sampled in the simulation? Thank you.

Archive.zip


ijpulidos · 2022-12-06T00:09:23Z

Currently, the way to access the positions is through the checkpoint file. If you want positions stored at a certain frequency, you need to set checkpoint_interval (in number of iterations) on your MultiStateReporter. That would be on line 81 of your script.

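This means that we only get something like floor(iterations/checkpoint_interval) + 1 "frames" for which positions are available. With number_of_iterations=100 and checkpoint_interval=1000, that is a single frame, which matches the shape (1, 1, 6462, 3) you are seeing.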
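As a rough illustration of that estimate, here is a minimal sketch in plain Python. The helper names are hypothetical, and it assumes that iteration 0 is always written and that later checkpoints fall on exact multiples of checkpoint_interval, which is my reading of the formula above rather than something verified against the reporter code.

```python
def expected_frame_count(number_of_iterations, checkpoint_interval):
    """Approximate number of frames in the checkpoint file."""
    # floor(iterations / checkpoint_interval) + 1, as estimated above
    return number_of_iterations // checkpoint_interval + 1


def checkpoint_iterations(number_of_iterations, checkpoint_interval):
    """Iterations expected to have positions in the checkpoint file."""
    # Assumption: iteration 0 plus every multiple of the interval
    return list(range(0, number_of_iterations + 1, checkpoint_interval))


# Settings from the original post: 100 iterations, checkpoint every 1000
print(expected_frame_count(100, 1000))  # 1
# A smaller interval gives more frames
print(checkpoint_iterations(100, 25))   # [0, 25, 50, 75, 100]
```

Under these assumptions, the position of an iteration in the returned list is the frame_index to use when indexing the positions variable.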

Once a reasonable interval is set, you can access all the positions for a given frame_index and replica_index with something like the following:

    ncobject = reporter._storage[1]
    positions = ncobject.variables['positions'][frame_index, replica_index, :, :]
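If you would rather not reach into the private ._storage attribute, a possible alternative is sketched below. It assumes the reporter exposes a read_sampler_states(iteration=...) method that returns one SamplerState per replica, or None for iterations that were not checkpointed; please check this against your installed openmmtools version. The helper name collect_checkpoint_frames is hypothetical.

```python
def collect_checkpoint_frames(reporter, number_of_iterations, checkpoint_interval):
    """Return {iteration: [positions of each replica]} for stored iterations."""
    frames = {}
    for iteration in range(0, number_of_iterations + 1, checkpoint_interval):
        # Assumption: returns None when the iteration has no checkpoint
        sampler_states = reporter.read_sampler_states(iteration=iteration)
        if sampler_states is None:
            continue
        frames[iteration] = [state.positions for state in sampler_states]
    return frames
```

The same loop could collect state.box_vectors or state.velocities, assuming those attributes are populated when the sampler states are read back.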

> I also noted that there are two dataset under reporter._storage but only reporter._storage[0] is being stored in storage_path

Yes, both the main file and the checkpoint file can be accessed through the private ._storage attribute. The first one, ._storage[0], is the main file, which is the lambda_0.nc file in your example (this one only stores the energies and state information for the last iteration). The second, ._storage[1], is the first subfile, which points to the checkpoint file that stores positions and velocities, among other things. I hope this helps make it clearer.

I understand this can be confusing, and it is something we want to improve in the future. Thanks for the feedback.

ijpulidos · 2022-12-06T00:27:28Z

We should probably document how to extract positions and velocities from the netCDF files as well. I don't think this is documented anywhere.

ijpulidos · 2022-12-06T16:46:45Z

@xiki-tempula I dug a bit further into this, and thanks to what @jchodera pointed out, there are two ways to get the positions:

1. Pass analysis_particle_indices to MultiStateReporter when initializing it, if you only want a subset of particles (e.g. solute only) written every iteration. This writes the positions of the specified particles to the non-checkpoint storage file.
2. Use the checkpoint_interval option, as I mentioned earlier. If you want to store positions for every iteration, specify checkpoint_interval=1 when initializing the MultiStateReporter. These get written to the checkpoint file.
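To make the two options concrete, here is a small sketch that only builds the keyword arguments, so it runs without openmmtools installed. The helper reporter_options and the choice of the first ten atoms as "solute" are hypothetical; analysis_particle_indices and checkpoint_interval are the MultiStateReporter options named above.

```python
def reporter_options(solute_atom_indices=None, store_every_iteration=False):
    """Build keyword arguments for MultiStateReporter (illustrative only)."""
    options = {}
    if solute_atom_indices is not None:
        # Option 1: subset of particles written every iteration (main file)
        options["analysis_particle_indices"] = tuple(
            int(i) for i in solute_atom_indices
        )
    if store_every_iteration:
        # Option 2: full positions for every iteration (checkpoint file)
        options["checkpoint_interval"] = 1
    return options


# Hypothetical selection: pretend the first ten atoms are the solute
options = reporter_options(solute_atom_indices=range(10))
print(options)
# Intended use, not executed here:
# reporter = multistate.MultiStateReporter(storage_path, **options)
```

Writing full positions with checkpoint_interval=1 will likely produce a much larger checkpoint file than storing only a solute subset, so the first option may be preferable for long runs.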

xiki-tempula mentioned this issue Dec 5, 2022

About ReplicaExchangeSampler #631

Closed

ijpulidos mentioned this issue Dec 6, 2022

Improve analysis_particle_indices documentation for MultiStateReporter. Storing positions. #641

Closed

ijpulidos mentioned this issue Apr 4, 2023

Improve docstring for analysis_particle_indices in MultiStateReporter #676

Merged

5 tasks

mikemhenry closed this as completed in #676 Apr 5, 2023
