Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NCMC Code Refactoring #44

Merged
merged 54 commits into from
Apr 21, 2017
Merged

NCMC Code Refactoring #44

merged 54 commits into from
Apr 21, 2017

Conversation

nathanmlim
Copy link
Collaborator

@nathanmlim nathanmlim commented Mar 28, 2017

This [WIP] is meant to refactor the NCMC core code. I've placed the new refactor code into a separate directory blues_refactor so that I could test them side-by-side to make sure I haven't introduced any new bugs during refactoring. Briefly, I've basically gone in and tried to reduce large code blocks in ncmc.py and moved them into their own functions. I've added 3 classes to handle some basic routines:

  • SimulationFactory handles generating a set of 3 OpenMM Simulations objects for the MD, NCMC, Alchemical parts of the BLUES engine.
  • LigandModeller intended to handle calculating any properties relating to the ligand/object we're moving around. As of now, we're just calculating particle masses but I think down the line we may want to calculate other properties on like protein residues or water densities etc.
  • ProposeMove intended to be a class where we basically define our move sets.
    Much of the simplifications from introducing these classes can be seen in the new example.py script

TODO:

  • Add Docstrings
  • Add unit tests for new classes
  • Finish up simplifying/moving the ncmc.Simulation.run() code blocks.

@sgill2
Copy link
Collaborator

sgill2 commented Mar 28, 2017

So these are just my initial thoughts/questions on the code.

  1. I'm not too clear on the sims object that you use as the SImulation object argument. It looks like it's supposed to be an object that holds the openmm simulation, alch_sim and ncmc contexts. If that's the case what benefit is there doing it this way as specifying them via separate arguments?
  2. Could you elaborate on what the model parameter is supposed to be?

Overall though, it seems more nicely structured than the previous code, and is heading in a good direction.

README.md Outdated
@@ -12,6 +12,7 @@ This also provides a prototype and validation of the SMIRFF SMIRKS-based force f
* `run_scripts/` - example scripts to run blues
* `systems/` - some example systems to run blues on.

<<<<<<< HEAD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a merge conflict here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Will fix that.

blues/example.py Outdated

* Benchmarking related code adapted from:
https://github.com/pandegroup/openmm/blob/master/examples/benchmark.py
(Authors: Peter Eastmen)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eastman

blues/example.py Outdated
def runNCMC(platform_name):
=======
def runNCMC(options):
>>>>>>> 63cd22390b2e6595bd1defa81aa758e62ae31fc2:examples/example.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge conflicts?

README.md Outdated
Install the required packages into a new environment

```
conda create -c omnia -c omnia/label/dev -n blues python=3.5 openmm==7.0.1 openmmtools alchemy mdtraj
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, are we in omnia now?

And, openmm 7.0.1 not 7.1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Oh, I see, this is creating a blues environment. Sorry, my bad. But the question on OpenMM still stands.)

Copy link
Collaborator Author

@nathanmlim nathanmlim Mar 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I forgot to commit the conflicts in the README.md. It seems this is using a much older version. I'll fix this.

@davidlmobley
Copy link
Member

You have some extra .pyc files here that you should be able to (a) remove, and (b) get ignored by adding/modifying the .gitignore file.

'verbose' : True }

#Defines ncmc move eqns for lambda peturbation of sterics/electrostatics
opt['functions'] = { 'lambda_sterics' : 'step(0.199999-lambda) + step(lambda-0.2)*step(0.8-lambda)*abs(lambda-0.5)*1/0.3 + step(lambda-0.800001)',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? (OK, I can work through it, but it's a bit opaque). What does it control?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Obviously, it controls lambda_sterics... But what exactly is going on here? )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still in the process of forming all my opinions, so I'll keep writing them down as I go.
I think passing an opt dictionary for run() that handles runoptions makes things unclear to the user what actually can be done with it. I'd much rather have too many parameters, which users at least know exist, than having a mysterious, but important parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgill2 - having a dictionary of options can be helpful as it can allow for extension/modification of the options without modifying a list of required arguments. I'm less clear as to whether this specific implementation is the best way to do it or not, but that's the rationale I believe (and to a large extent I buy into it).

@nathanmlim - further comments?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I've done here is take this line and chucked it into the opt dictionary. In retrospect, I should move this into the SimulationFactory as it only appears to be needed during the nc_integrator creation.

sims.createSimulationSet(opt)

# Calculate particle masses of object to be moved
from blues.modeller import LigandModeller
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if "LigandModeller" is really the best name for this thing if you're expecting perhaps generalizing to calculation of other properties later. What if we wanted to apply BLUES to a protein sidechain (rotamer sampling) or to swapping a ligand with water? (Or to jumping a water, rather than a ligand, between sites?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really attached to naming and couldn't think of a good name on the fly. So if you have suggestions I'm open. I don't even think Modeller was a good choice either since OpenMM uses this to manipulate objects rather than get properties on them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgill2 - suggestions? What is the general purpose of this thing? ObjectProperties? MoveProperties? MoveObject? MovingObject? The last one seems to appeal to me most, as it might or might not be a ligand, and I also don't like the modeller terminology because of (a) UCSF Modeller, and (b) Modeller within OpenMM, as you note, @nathanmlim .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ObjectProperties seems like the best bet here.

model.calculateCOM()

# Propse some move
rot_move = ncmc.ProposeMove(sims.nc, model)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should ProposeMove be MoveProposal or similar? Right now it reads like a function name (action implied!) to me, so I was confused when I read it -- I thought "wait, you can't be proposing a move yet, as you haven't said what kind of move you want."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can change this to MoveProposal

# Propse some move
rot_move = ncmc.ProposeMove(sims.nc, model)
rot_step = (opt['nstepsNC'] / 2) - 1
nc_move = { 'type' : 'rotation',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgill2 - should we be more specific than "rotation", do you think? Since we might imagine other types of rotations (e.g. rotation matrix rather than random quaternion) maybe this should be "random_rotation"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More specific is probably better, especially since we don't know what other kinds of moves we're adding; more specificity can help avoid name overlaps, so "random_rotation", would be better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.


if log_ncmc > randnum:
self.accept += 1
print('NCMC MOVE ACCETPED: {} > {}'.format(log_ncmc, randnum) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ACCEPTED"

randnum = math.log(np.random.random())

### Compute Alchemical Correction Term
if not np.isnan(log_ncmc):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgill2 - what happens if it IS a NAN? Not totally obvious to me it doesn't do something strange (does it accept since log_ncmc > randnum below?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to remember something weird happening with NANs in the past, but trying some logical operations with NANs doesn't seem to suggest they would be accepted. I'll make a separate issue to see what happens with the NAN checking removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This originally had the if alchemical_correction==True statement with checking if log_ncmc is NaN. So I think that if you were to add the alchemical correction term, it won't let you add it to NaN. Since I've removed the alchemical_correction switch, should we turn NaN into 0 so that we can still add the alchemical correction term?

npos[i] = rotated_particles[index]
r_com_coord = test_class.calculate_com(npos)
for i in range(3):
np.testing.assert_almost_equal(np.asarray(r_com_coord._value)[i], np.asarray(com_coord._value)[i], decimal=4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd character on this line? (Or missing newline?)

@nathanmlim
Copy link
Collaborator Author

nathanmlim commented Mar 29, 2017

So these are just my initial thoughts/questions on the code.

I'm not too clear on the sims object that you use as the SImulation object argument. It looks like it's supposed to be an object that holds the openmm simulation, alch_sim and ncmc contexts. If that's the case what benefit is there doing it this way as specifying them via separate arguments?
Could you elaborate on what the model parameter is supposed to be?
Overall though, it seems more nicely structured than the previous code, and is heading in a good direction.

The sims object is just a dictionary containing the 3 simulations (md, alch, ncmc) that is generated out of the SimulationFactory class. Really you can name it whatever you want since it a SimulationFactory object. We have to lug these around together anyways and pass it into the BLUES engine itself to do the run, so for simplicity I just put them into a dictionary to pass as a single object.

model is also an object which, in this example, is referencing the ligand. The properties of the model object can be accessed like model.atom_indices , model.totalmass, model.masses etc.

@nathanmlim
Copy link
Collaborator Author

Walked through this with @sgill2 and we have reviewed the refactored code and are certain it does the some process. We did identify some potential bugs and made some other changes that may require another look @davidlmobley. Changes are noted below:

  • Changed the name from ModelProperties to Model
  • Changed the lambda functions from this to
functions = { 'lambda_sterics' : 'min(1, (1/0.3)*abs(lambda-0.5))',
 'lambda_electrostatics' : 'step(0.2-lambda) - 1/0.2*lambda*step(0.2-lambda) + 1/0.2*(lambda-0.8)*step(lambda-0.8)' }
  • Added the negative sign back to the alchemical correction factor in this line

Note: Since the smartdart.py has not been refactored into the new framework and inherits from the old ncmc.py it's tests have been disabled.

@sgill2 sgill2 mentioned this pull request Apr 21, 2017
@davidlmobley
Copy link
Member

@nathanmlim - it would be very helpful to have a comment before the functions line relating to lambda which explains a little more about what these lambda_sterics, lambda_electrostatics, etc. are doing and how they are being written. The factors 1/0.3 and 1/0.2 are very non-obvious to me, as are the use of step (which is not defined here); understanding this would presumably require reading the documentation of the relevant code, but this is not linked to. So I'd have to do some archaeology to figure out what this means.

@nathanmlim
Copy link
Collaborator Author

@davidlmobley: I think that would be a better question for @sgill2 since I just copied over what he had suggested.

Copy link
Member

@davidlmobley davidlmobley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to include some of your info on code architecture/organization in a "developer documentation" README or something similar? (Note you can have additional .md files in the main directory which could be linked from the main README.md). I'm just thinking, @nathanmlim , that this could be helpful for people who are getting involved with developing so you don't always have to show them your slides/explain.

@@ -41,9 +41,6 @@ This package takes advantage of non-candidate equilibrium monte carlo moves (NCM
The heart of the package is found in `blues/ncmc_switching.py`. This holds the framework for NCMC. Particularly, it contains the integrator class that calculates the work done during the NCMC move. It also controls the lambda scaling of parameters. Currently the alchemy package is used to generate the lambda parameters for the ligand, which can potentially modify 5 parameters–sterics, electrostatics, bonds, angles, and torsions.
The class SimNCMC in `blues/ncmc.py` serves as a wrapper for running ncmc simulations. SimNCMC.runSim() actually runs the simulation.

## Example Use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, we have no example anymore? What are the plans on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait, I still see blues/example.py here -- why no mention in README now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why the entire block got removed. It might have gotten accidently deleted when I was trying to resolve conflicts. It should have just been changed to reference the blues/example.py location.

@davidlmobley
Copy link
Member

I agree with the sign of the alchemical correction factor, though I've challenged @sgill2 to think about how to check it. But it seems correct to me.

simulations.createSimulationSet()

blues = Simulation(simulations, ligand, ligand_mover, **opt)
blues.run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example sure becomes nice and concise now. Very nice. :)

blues/ncmc.py Outdated
ligand.masses : list of particle masses of the ligand with units.
ligand.totalmass : integer of the total mass of the ligand.

#Dynamic attribtues that must be updated with each iteration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"attributes"

@nathanmlim
Copy link
Collaborator Author

@davidlmobley

Do you want to include some of your info on code architecture/organization in a "developer documentation" README or something similar? (Note you can have additional .md files in the main directory which could be linked from the main README.md). I'm just thinking, @nathanmlim , that this could be helpful for people who are getting involved with developing so you don't always have to show them your slides/explain.

This is a good idea, I'll make the diagram a bit more detailed so it can serve as a quick overview image of the engine.

Copy link
Member

@davidlmobley davidlmobley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to merge once minor typos and clarification requests are addressed. Let's get this in! Nice work, @nathanmlim .

blues/ncmc.py Outdated
specified uses the residueList in self.residueList
class MoveProposal(object):
"""MoveProposal provides perturbation functions for the model during the NCMC
simulation. Current supported methods: random rotation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"random_rotation" probably better.

blues/ncmc.py Outdated
"""
if ncmc:
#Defines ncmc move eqns for lambda peturbation of sterics/electrostatics
functions = { 'lambda_sterics' : 'min(1, (1/0.3)*abs(lambda-0.5))',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted, it still bothers me I can't understand this with the info in the comments.

self.md = self.generateSimFromStruct(self.structure, self.system, **self.opt)
self.alch = self.generateSimFromStruct(self.structure, self.system, **self.opt)
self.nc = self.generateSimFromStruct(self.structure, self.alch_system,
ncmc=True, **self.opt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd character at end of line 332?

#https://github.com/choderalab/perses/blob/master/perses/annihilation/ncmc_switching.py

Authors: Patrick B. Grinaway, Julie M. Behr, and John D. Chodera
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgill2 - add note on outline of paper to see if Julie should be an author.


def getStateInfo(self, context, parameters):
"""Function that gets the State information from the given context and
list of parameters to queuey it with.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query

if log_ncmc > randnum:
self.accept += 1
print('NCMC MOVE ACCEPTED: log_ncmc {} > randnum {}'.format(log_ncmc, randnum) )
#print('accCounter', float(self.accept)/float(stepsdone+1), self.accept)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove old comments? Otherwise use a debug/logging option so they can be used if needed?

self.nc_context.setVelocities(md_state0['velocities'])

def run(self):
"""Function that runs the BLUES engine that iterates of the actions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"that iterates of the actions" -> "to iterate over the actions"...?

@davidlmobley
Copy link
Member

@sgill2 - can you get Nathan info to include on how the lambda vectors work? either an explanation or a documentation link to include in the comments or something similar?

@sgill2
Copy link
Collaborator

sgill2 commented Apr 21, 2017

@davidlmobley Could you clarify what you mean by the lambda vector? Do you mean the equations for the functions that control the lambda sterics and electrostatics ?

@davidlmobley
Copy link
Member

@davidlmobley Could you clarify what you mean by the lambda vector? Do you mean the equations for the functions that control the lambda sterics and electrostatics ?

Yes, @sgill2 - see my comments above to Nathan about that. Basically from the code/comments in the code it's impossible for me to know what the lambda sterics/lambda electrostatics stuff means. I'd like this to be better explained in comments, and/or there to be links to documentation. I don't get the factor 1/3, or what step does, etc. Presumably I could figure it out with some archeology/Googling but since it's an obvious question it makes sense to have SOME info in comments or in the code.

@davidlmobley
Copy link
Member

Good to merge once tests pass/you're done, I think, @nathanmlim .

@nathanmlim
Copy link
Collaborator Author

Added in some comments from Sam for the lambda functions. Also added in devdocs/ folder which has a somewhat incomplete class diagram (haven't gone in and added in the full list of attributes to the other classes) but I'll do that in a separate PR so we can get this merged.

@nathanmlim nathanmlim merged commit 6020ae2 into master Apr 21, 2017
@nathanmlim nathanmlim deleted the refactor branch April 21, 2017 23:32
@nathanmlim
Copy link
Collaborator Author

Merged.

@davidlmobley
Copy link
Member

Yay, congrats!

@davidlmobley davidlmobley mentioned this pull request Apr 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants