Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remaining issues in the GeometryEngine stuff #49

Closed
pgrinaway opened this issue Nov 30, 2015 · 12 comments
Closed

Remaining issues in the GeometryEngine stuff #49

pgrinaway opened this issue Nov 30, 2015 · 12 comments
Assignees
Labels

Comments

@pgrinaway
Copy link
Member

Currently, there are a couple issues with the GeometryEngine, in particular relating to the implementation that uses all angle and torsion potentials when choosing a torsion.

  • Sometimes, the proposal potential results in some nans somehow making their way to the rejection sampler. When a uniform random number is drawn with nan as the upper limit, numpy throws an exception. Looking into why this is still happening despite replacing nans with inf.
  • The speed is a bit of an issue. I've discovered through profiling that this is a result of the conversion between internal and cartesian coordinates. This can probably be accelerated a lot via vectorization or something of the sort.
@pgrinaway pgrinaway self-assigned this Nov 30, 2015
@jchodera
Copy link
Member

jchodera commented Dec 1, 2015

Why use a rejection sampler here? If we have computed the probability vector p, why not just use numpy.random.choice(numpy.arange(phis, p=p)?

@jchodera
Copy link
Member

jchodera commented Dec 1, 2015

p must be normalized for numpy.random.choice, but that is easy to do with p / p.sum() if that hasn't been done already.

@jchodera
Copy link
Member

jchodera commented Dec 1, 2015

For speeding things up, we have a few possibilities:

Compile a list of all torsions and angles and vectorize computations with numpy

This is a bit tricky, but should be straightforward given that all the other positions are fixed.

Code the energy terms in Cython

This might be the most straightforward way to recode the existing code.

Construct a minimum OpenMM system containing only the relevant atoms, bonds, and torsions

The trick here is to keep track of the atom mapping and compute the Cartesian positions for rotation around the single torsion, but it could also allow us to do more sophisticated things in the future.

@jchodera
Copy link
Member

jchodera commented Dec 1, 2015

You might want to return phis in this function to make it easier to do numpy.random.choice (and to make sure you don't get out of sync between the probability vector and torsions).

@jchodera
Copy link
Member

jchodera commented Dec 1, 2015

We can also refine your log_p computations a bit, and vectorize them.

For example, the _bond_logp currently looks like this:

        k_eq = bond.type.k*units.kilocalories_per_mole/(units.angstrom**2)
        r0 = bond.type.req*units.nanometers
        sigma = beta*2.0/np.sqrt(2.0*k_eq/k_eq.unit)
        logp = stats.distributions.norm.logpdf(r/r.unit, r0/r0.unit, sigma)
        return logp

We can use the fact that the reduced potential is u(x) = \beta (K/2) (r - r0)^2 to compute this directly in a vectorized manner:

k_eq = bond.type.k*units.kilocalories_per_mole/(units.angstrom**2)
r0 = bond.type.req*units.nanometers
return -beta*(k_eq/2.0)*(r-r0)**2

Does ParmEd store k and req as simtk.unit units natively, or do you need to attach units? It is weird that they would be listed in kcal/mol/A^2 and nm, since that seems like an inconsistent choice; OpenMM certainly uses a different unit system, and AMBER I believe uses kcal/mol and A consistently.

If you need log p(r), you would also need to compute sigma as you note:

k_eq = bond.type.k*units.kilocalories_per_mole/(units.angstrom**2)
r0 = bond.type.req*units.nanometers
sigma = unit.sqrt(1.0 / (beta * k_eq))
return -beta*(k_eq/2.0)*(r-r0)**2 - np.log(np.sqrt(2*pi)*sigma / base_distance_unit)

though note that you need this with reference to an arbitrary base_distance_unit, so it may not be useful to include this (since it is an irrelevant additive constant).

@jchodera
Copy link
Member

jchodera commented Dec 1, 2015

Also, there is a bit of confusion between whether you want the log normalized probability density log p(r), log of the unnormalized probability density log q(r), or the reduced potential (negative log of unnormalized density) u(r) = - log q(r).

Here, you seem to be accumulating reduced potentials as ub_angles, but are actually accumulating log p(theta), which has the opposite sign and an irrelevant constant added.

@jchodera
Copy link
Member

jchodera commented Dec 1, 2015

I would carefully check all your _log_p functions for:

  • unit errors and consistency of units
  • sign issues regarding whether you want u(x) = - log q(x) or log q(x)
  • eliminate use of states.distributions.norm.logpdf when the additive constants are irrelevant
  • factors of 2 seem to be in the wrong place; remember that harmonic bonds and angles are defined by the reduced potentials
u(r) = beta * (K/2) (r - r0)**2
u(theta) = beta * (K/2) (theta - theta0)**2
  • the beta seems to be on the wrong side of fractions in computing [sigma], so check that
sigma = unit.sqrt(1 / (beta * k))

@jchodera
Copy link
Member

jchodera commented Dec 1, 2015

You are returning q(phi) instead of log q(phi) in _torsion_logp here:

        q = np.exp(-beta*(V/2.0)*(1+np.cos(n*internal_coordinates[2]-gamma)))
        return q

@pgrinaway
Copy link
Member Author

Why use a rejection sampler here?

Good point. I had originally done that because I was imagining sampling from a continuous distribution, but it works just as well and is simpler to just treat it as a categorical variable.

Does ParmEd store k and req as simtk.unit units natively, or do you need to attach units?

I needed to attach units, and I found those units in the documentation.

Thanks for the rest of the pointers--you're right, those are errors.

I think this code is fairly disorganized and difficult to test, as you've pointed out. I think from your suggestions, the idea of using OpenMM to calculate these potentials is my favorite in the end (it enables a lot of extra stuff, and outsources calculation of the potential).

@jchodera
Copy link
Member

jchodera commented Dec 1, 2015

Let's not do this quite yet, but thinking ahead to what we would do if we wanted to do this in OpenMM:

I bet the easiest way to do this is to use ParmEd where we keep the Topology object (and hence atom numbers) the same, but only include a subset of the force term definitions:

  • use ParmEd to make a deep copy of the Structure object
  • eliminate all the force terms
  • add back only the forcefield terms involving bonds, angles, and torsions for the atom to be placed and atoms that already have their positions defined
  • use createSystem() to create an OpenMM system
  • crank the new atom around its torsion orbit to evaluate energies with the Reference platform

@jchodera
Copy link
Member

jchodera commented Dec 1, 2015

I needed to attach units, and I found those units in the documentation.

It's a bit scattered in terms of what units are in play, but looking at the omm_* family of functions will help us figure out the correct units to attach, e.g. omm_bond_force():

length_conv = u.angstroms.conversion_factor_to(u.nanometers)
        _ambfrc = u.kilocalorie_per_mole/u.angstrom**2
        _ommfrc = u.kilojoule_per_mole/u.nanometer**2
        frc_conv = _ambfrc.conversion_factor_to(_ommfrc)

More generally, I believe AMBER uses akma_unit_system: http://parmed.github.io/ParmEd/html/dimensional_analysis.html#available-unitsystem-options

  • lengths are in angstroms
  • energies are in kcal/mol

I wonder if there is a way to retrieve parameters with units attached automatically...

@pgrinaway
Copy link
Member Author

The refactors that are pending are covered in other issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants