Storing simulation frames for MultiStateSampler #638

Closed
xiki-tempula opened this issue Dec 1, 2022 · 3 comments · Fixed by #676

Comments

@xiki-tempula

xiki-tempula commented Dec 1, 2022

I'm interested in using MultiStateSampler with MultiStateReporter to run a simulation, but I'm struggling to find where the frames are stored. I create the objects with

    reporter = multistate.MultiStateReporter(
        storage_path,
        checkpoint_interval=1000,
    )
    simulation = multistate.MultiStateSampler(
        mcmc_moves=move, number_of_iterations=100, online_analysis_interval=1000
    )
    simulation.create(
        thermodynamic_states=compound_thermodynamic_states,
        sampler_states=states.SamplerState(
            crd.positions, box_vectors=crd.getBoxVectors()
        ),
        storage=reporter,
        initial_thermodynamic_states=[i],
    )

So I expected to find 100 frames of positions in storage_path. However, I cannot find a field called positions in the file.
I dug a bit deeper and found that there are positions in reporter._storage[1]['positions'].

    <class 'netCDF4._netCDF4.Variable'>
    float32 positions(iteration, replica, atom, spatial)
        units: nm
        long_name: positions[iteration][replica][atom][spatial] is position of coordinate 'spatial' of atom 'atom' from replica 'replica' for iteration 'iteration'.
    unlimited dimensions: iteration
    current shape = (1, 1, 6462, 3)

However, even though I set number_of_iterations to 100, the first dimension is still 1.

I also noticed that there are two datasets under reporter._storage, but only reporter._storage[0] is being stored in storage_path.

How do I get the positions, box dimensions, and velocities sampled in the simulation? Thank you.

Archive.zip

@ijpulidos
Contributor

Currently, the way to access the positions is through the checkpoint file. If you want positions stored at a certain frequency, you need to set it with the checkpoint_interval argument of your MultiStateReporter, given in number of iterations. That would be line 81 of your script.

This means that we will only get something like floor(iterations/checkpoint_interval) + 1 "frames" for which positions are available. With the settings in your snippet (100 iterations, checkpoint_interval=1000) that is a single frame, which is why the first dimension of positions is 1.
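As a quick sanity check on the estimate above, here is a minimal sketch of the arithmetic. The helper name `n_checkpoint_frames` is hypothetical (it is not part of openmmtools), and the formula is the approximate one quoted in this comment, not something read from the library source.

```python
def n_checkpoint_frames(n_iterations: int, checkpoint_interval: int) -> int:
    """Approximate number of position frames written to the checkpoint file.

    Uses the estimate floor(iterations / checkpoint_interval) + 1 from the
    comment above; the "+ 1" accounts for the initial frame.
    """
    if checkpoint_interval < 1:
        raise ValueError("checkpoint_interval must be a positive integer")
    return n_iterations // checkpoint_interval + 1


print(n_checkpoint_frames(100, 1000))  # settings from the issue -> 1
print(n_checkpoint_frames(100, 10))    # -> 11
print(n_checkpoint_frames(100, 1))     # -> 101
```

The first call reproduces the shape reported in the issue: a single frame along the iteration dimension.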

Once a reasonable interval is set, you can access all the positions for a given frame_index and replica_index with something like the following:

    # reporter._storage[1] is the checkpoint dataset (private attribute)
    ncobject = reporter._storage[1]
    # dimensions are (iteration, replica, atom, spatial)
    positions = ncobject.variables['positions'][frame_index, replica_index, :, :]

> I also noted that there are two datasets under reporter._storage but only reporter._storage[0] is being stored in storage_path

Yes, both the main file and the checkpoint file can be accessed through the private ._storage attribute. ._storage[0] is the main file, which is lambda_0.nc in your example (this one only stores the energies and state information for the last iteration), whereas ._storage[1] is the first subfile, the checkpoint file, which stores positions and velocities, among other things. I hope this makes it clearer.
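If you would rather not rely on the private ._storage attribute, the checkpoint file can presumably also be opened directly with the netCDF4 package. The sketch below makes two assumptions: that the checkpoint file sits next to the main file and is named `<stem>_checkpoint.nc` (please check your output directory, since this naming is not stated in this thread), and that `positions` is a root-level variable with the dimensions shown in the issue. Both helper names are hypothetical.

```python
from pathlib import Path


def default_checkpoint_path(storage_path):
    """Guess the checkpoint file that accompanies a main storage file.

    ASSUMPTION: the checkpoint file is '<stem>_checkpoint<suffix>' in the
    same directory, e.g. lambda_0.nc -> lambda_0_checkpoint.nc.
    """
    path = Path(storage_path)
    return path.with_name(f"{path.stem}_checkpoint{path.suffix}")


def read_checkpoint_positions(checkpoint_file, frame_index, replica_index=0):
    """Return the (atom, spatial) positions in nm for one stored frame."""
    import netCDF4  # third-party; imported lazily so the helper above works without it

    with netCDF4.Dataset(str(checkpoint_file), "r") as nc:
        # Dimensions are (iteration, replica, atom, spatial), as in the issue.
        # Slicing reads the data into memory before the file is closed.
        return nc.variables["positions"][frame_index, replica_index, :, :]


print(default_checkpoint_path("lambda_0.nc"))
```

Note that frame_index here counts stored checkpoint frames, not sampler iterations.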

I understand this can be confusing, and it is something we want to improve in the future. Thanks for the feedback.

@ijpulidos
Contributor

We should probably also document how to extract positions and velocities from the netCDF files. I don't think this is documented anywhere.

@ijpulidos
Contributor

@xiki-tempula I dug a bit further into this, with thanks to @jchodera for the pointers. If you want to have the positions, you can get them in the following two ways:

  • If you only need a subset of particles (e.g. solute only), pass their indices as analysis_particle_indices when initializing MultiStateReporter. The positions of the specified particles will then be written every iteration to the non-checkpoint storage file.
  • The other option is the one I mentioned earlier, through checkpoint_interval. If you want positions stored for every iteration, specify checkpoint_interval=1 when initializing the MultiStateReporter. These will be written to the checkpoint file.
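The two options above can be sketched as keyword arguments for the reporter. The helper `reporter_options` is hypothetical and only builds a dictionary; analysis_particle_indices and checkpoint_interval are the argument names mentioned in this thread. The final construction call is shown as a comment because it needs openmmtools installed and the storage_path from the original script.

```python
def reporter_options(solute_indices=None, every_iteration=False):
    """Build optional keyword arguments for MultiStateReporter.

    solute_indices:  particle indices whose positions should be written
                     every iteration to the main (non-checkpoint) file.
    every_iteration: if True, checkpoint every iteration so that full
                     positions land in the checkpoint file each time.
    """
    options = {}
    if solute_indices is not None:
        options["analysis_particle_indices"] = tuple(solute_indices)
    if every_iteration:
        options["checkpoint_interval"] = 1
    return options


# Option 1: subset of particles, written every iteration
print(reporter_options(solute_indices=range(3)))
# Option 2: full checkpoint at every iteration
print(reporter_options(every_iteration=True))

# Usage (requires openmmtools; storage_path as in the original script):
# reporter = multistate.MultiStateReporter(storage_path, **reporter_options(every_iteration=True))
```

Since the checkpoint file holds positions and velocities, checkpointing every iteration is likely to produce a much larger file than writing a solute-only subset, so the first option is probably the lighter choice when only part of the system is of interest.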

2 participants