Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reproducible version of COVID Moonshot example anyone can run as an example #1145

Merged
merged 14 commits into from
Feb 19, 2023

Conversation

jchodera
Copy link
Member

This PR adds examples from the COVID Moonshot that anyone should be able to run.

This is currently a work in progress, but includes:

  • moonshot-mainseries/ - relative binding free energy calculations primary synthetic series milestones COVID Moonshot
  • moonshot-sulfonamides/ - relative binding free energy calculations of a congeneric sulfonamide series from the COVID Moonshot from a single structure

# Modeller needs topology and positions. Lots of trial and error found that this is what works to get these from
# an openforcefield Molecule object that was created from a RDKit molecule.
# The topology part is described in the openforcefield API but the positions part grabs the first (and only)
# conformer and passes it to Modeller. It works. Don't ask why!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any documentation we can refer to here?

simulation.reporters.append(XTCReporter(arguments['--xtctraj'], reporting_interval, atomSubset=output_indices))

# Add reporter to generate PDB trajectory, if requested
# NOTE: The PDBReporter does not currently support atom subsets

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any way to do this in the future? For ligand<->residue contacts just writing out the ligand + selected residues would be super useful..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be easy--I've opened an issue: openmm/openmm#3925

output_positions = context.getState(getPositions=True, enforcePeriodicBox=False).getPositions(asNumpy=True)
with open(arguments['--final'], 'w') as outfile:
PDBFile.writeFile(output_topology, output_positions[output_indices,:], file=outfile, keepIds=False);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be in scope here to implement writing out some 'representative' snapshots, i.e. do a quick clustering of the trajectory and write cluster centers to PDBs? Something like https://mdtraj.org/1.9.3/examples/centroids.html

But then again, not sure how well that scales to PL systems.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(asking because this is an ASAP request -> would like to see quantitatively how much ligand sections like to reside in specific subpockets)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially, but I'm not certain yet whether we need to keep this particular script here.


"""

cdd_csv_filename = 'CDD CSV Export.csv'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cdd_csv_filename = 'CDD CSV Export.csv'
cdd_csv_filename = 'CDD CSV Export.csv'
TODO: implement path into CLI

# Drop NaNs
print(f'Dropping NaNs...')
df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=True)
print(f'{len(df)} records remain')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add an exception here in case len(df) == 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JenkeScheen : Could you help generalize this and migrate it to https://github.com/choderalab/covid-moonshot-ml?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC @kaminow is doing this for the moonshot manuscript?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jchodera @JenkeScheen I've just opened a new PR for this: asapdiscovery/asapdiscovery#130

Comment on lines +303 to +306
pred = oechem.OEAtomMatchResidue(["GLN:189: :A"])
protonate_opts = opts.GetPrepOptions().GetProtonateOptions();
place_hydrogens_opts = protonate_opts.GetPlaceHydrogensOptions()
place_hydrogens_opts.SetNoFlipPredicate(pred)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is giving me downstream issues with OpenMM complaining about protonation states. For some PDBs (I can provide exmaples if needed) this set of place_hydrogens_opts produces wonky forms of histidine (e.g. all single bonds for some reason?). Using this block instead:

    pred = oechem.OEAtomMatchResidue(["GLN:189:.*:.*:.*"])
    protonate_opts = opts.GetPrepOptions().GetProtonateOptions();
    place_hydrogens_opts = protonate_opts.GetPlaceHydrogensOptions()
    place_hydrogens_opts.SetBypassPredicate(pred)
    place_hydrogens_opts.SetNoFlipPredicate(pred)
    protonate_opts = oespruce.OEProtonateDesignUnitOptions(place_hydrogens_opts) # ?
    opts.GetPrepOptions().SetProtonateOptions(protonate_opts) # ?

Fixes most of the issues. Perhaps @kaminow or @apayne97 can comment on the best practice here?

@jchodera jchodera marked this pull request as ready for review February 19, 2023 21:51
@jchodera
Copy link
Member Author

Merging this example because it's needed for the manuscript.

@jchodera jchodera merged commit 9a7d593 into main Feb 19, 2023
@jchodera jchodera deleted the moonshot-examples branch February 19, 2023 21:52
@jchodera jchodera changed the title Add reproducible version of COVID Moonshot examples anyone can run as an example Add reproducible version of COVID Moonshot example anyone can run as an example Feb 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants