Overhaul atom mapping #809

jchodera · 2021-06-19T07:04:32Z

This PR attempts to address some of the issues described in #805

Refactor SmallMoleculeSetProposalEngine to move all user-modifiable attributes from propose() to be class attributes. These attributes should impact all proposals, so they make sense to have at the class level, rather than as optional arguments to propose().
Expose option to modify cutoff used for geometry-derived mappings
Enable YAML to specify cutoff for geometry-derived mappings
Check for problematic mappings using scheme in Check factorization of partition function #792
Clean up problematic mappings (e.g. de-mapping partially mapped rings)
Add tests for compounds that led to problematic mappings
Fix Sporadic unspecified stereochemistry issue with generate_atom_mapping_from_positions #826

@dominicrufa : I've got a start at addressing #804, but could use some pointers on how to add new options to the YAML file to carry over into geometry-derived mappings!

…tes, and enable coordinate precision for geometry-derived mappings to be specified.

…ine API

codecov · 2021-06-20T04:41:17Z

Codecov Report

Merging #809 (788055d) into master (0352f82) will increase coverage by 0.86%.
The diff coverage is 86.49%.

jchodera · 2021-06-21T06:24:44Z

For the geometry-derived mappings, we should be using the largest connected subgraph of matched nodes that does not contain only a subset of any cycles. Cycles should be either contained in their entirety or have all nodes omitted.

We can use an algorithm that first computes the largest connected subset of all matched nodes and then iterates over all cycles, removing any nodes for which the cycle only contains a partial (rather than complete or empty) set of matched nodes.

Alternatively, we could try to match the whole cycle if they share common subgraphs, but we'd need to be more clever about how exactly we do this graph isomorphism "helped" by partial matches.

…cule-proposal

…ng arguments in YAML

jchodera · 2021-06-22T04:35:17Z

@zhang-ivy : I'm seeing test failures on protein mutation tests---any idea what this might mean?


    def test_protein_counterion_topology_fix_negitive():
        """
        mutate alanine dipeptide into ARG dipeptide and assert that the appropriate number of water indices are identified
        """
        from perses.rjmc.topology_proposal import PolymerProposalEngine
        new_res = 'ARG'
        charge_diff = -1
    
        # Make a vacuum system
        atp, system_generator = generate_atp(phase='vacuum')
    
        # Make a solvated system/topology/positions with modeller
        modeller = app.Modeller(atp.topology, atp.positions)
        modeller.addSolvent(system_generator.forcefield, model='tip3p', padding=9*unit.angstroms, ionicStrength=0.15*unit.molar)
        solvated_topology = modeller.getTopology()
        solvated_positions = modeller.getPositions()
    
        # Canonicalize the solvated positions: turn tuples into np.array
        atp.positions = unit.quantity.Quantity(value=np.array([list(atom_pos) for atom_pos in solvated_positions.value_in_unit_system(unit.md_unit_system)]), unit=unit.nanometers)
        atp.topology = solvated_topology
    
        atp.system = system_generator.create_system(atp.topology)
    
        # Make a topology proposal and generate new positions
        top_proposal, new_pos, _, _ = generate_dipeptide_top_pos_sys(topology = atp.topology,
                                       new_res = new_res,
                                       system = atp.system,
                                       positions = atp.positions,
                                       system_generator = system_generator,
                                       conduct_geometry_prop = True,
                                       conduct_htf_prop = False,
                                       validate_energy_bookkeeping=True,
                                       )
    
        # Get the charge difference
        charge_diff_test = PolymerProposalEngine._get_charge_difference(top_proposal._old_topology.residue_topology.name,
                                                                    top_proposal._new_topology.residue_topology.name)
        assert charge_diff_test == charge_diff
    
        # Get the array of water indices (w.r.t. new topology) to turn into ions
>       water_indices = PolymerProposalEngine.get_water_indices(charge_diff = charge_diff_test,
                                                 new_positions = new_pos,
                                                 new_topology = top_proposal._new_topology,
                                                 radius=0.8)

perses/tests/test_topology_proposal.py:868: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
perses/rjmc/topology_proposal.py:577: in get_water_indices
    traj = md.Trajectory(new_positions[np.newaxis, ...], md.Topology.from_openmm(new_topology))
../../../miniconda/envs/perses-dev/lib/python3.8/site-packages/mdtraj/core/trajectory.py:1169: in __init__
    self.xyz = xyz
../../../miniconda/envs/perses-dev/lib/python3.8/site-packages/mdtraj/core/trajectory.py:882: in xyz
    value = ensure_type(value, np.float32, 3, 'xyz', shape=shape,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

val = Quantity(value=array([[[ 2.00000100e-01,  1.00000000e-01, -1.30000000e-07],
        [ 2.00000100e-01,  2.09000000e-01,...-01,  5.23500000e-01,  9.63099985e-01],
        [ 3.55762500e-02,  6.86000000e-02,  1.78259999e+00]]]), unit=nanometer)
dtype = <class 'numpy.float32'>, ndim = 3, name = 'xyz', length = None, can_be_none = False, shape = (None, 1571, 3), warn_on_cast = False, add_newaxis_on_deficient_ndim = True

    def ensure_type(val, dtype, ndim, name, length=None, can_be_none=False, shape=None,
        warn_on_cast=True, add_newaxis_on_deficient_ndim=False):
        """Typecheck the size, shape and dtype of a numpy array, with optional
        casting.
    
        Parameters
        ----------
        val : {np.ndaraay, None}
            The array to check
        dtype : {nd.dtype, str}
            The dtype you'd like the array to have
        ndim : int
            The number of dimensions you'd like the array to have
        name : str
            name of the array. This is used when throwing exceptions, so that
            we can describe to the user which array is messed up.
        length : int, optional
            How long should the array be?
        can_be_none : bool
            Is ``val == None`` acceptable?
        shape : tuple, optional
            What should be shape of the array be? If the provided tuple has
            Nones in it, those will be semantically interpreted as matching
            any length in that dimension. So, for example, using the shape
            spec ``(None, None, 3)`` will ensure that the last dimension is of
            length three without constraining the first two dimensions
        warn_on_cast : bool, default=True
            Raise a warning when the dtypes don't match and a cast is done.
        add_newaxis_on_deficient_ndim : bool, default=True
            Add a new axis to the beginining of the array if the number of
            dimensions is deficient by one compared to your specification. For
            instance, if you're trying to get out an array of ``ndim == 3``,
            but the user provides an array of ``shape == (10, 10)``, a new axis will
            be created with length 1 in front, so that the return value is of
            shape ``(1, 10, 10)``.
    
        Notes
        -----
        The returned value will always be C-contiguous.
    
        Returns
        -------
        typechecked_val : np.ndarray, None
            If `val=None` and `can_be_none=True`, then this will return None.
            Otherwise, it will return val (or a copy of val). If the dtype wasn't right,
            it'll be casted to the right shape. If the array was not C-contiguous, it'll
            be copied as well.
    
        """
        if can_be_none and val is None:
            return None
    
        if not isinstance(val, np.ndarray):
            if isinstance(val, collections.abc.Iterable):
                # If they give us an iterator, let's try...
                if isinstance(val, collections.abc.Sequence):
                    # sequences are easy. these are like lists and stuff
                    val = np.array(val, dtype=dtype)
                else:
                    # this is a generator...
>                   val = np.array(list(val), dtype=dtype)
E                   ValueError: setting an array element with a sequence.

../../../miniconda/envs/perses-dev/lib/python3.8/site-packages/mdtraj/utils/validation.py:104: ValueError

jchodera · 2021-06-22T04:58:15Z

@dominicrufa : I'd like to add atom mapping validation and refinement steps to occur after the coordinate-derived mapping step. Are there methods already implemented in AtomMapper that I should be using to check the validity of mappings or de-map ring systems that are only partially mapped? Or am I better off implementing new methods for this purpose?

zhang-ivy · 2021-06-23T16:08:31Z

@mikemhenry: I think you re-worked the counterion tests.. did you see the above error that John mentioned before you merged?

I've run the counterion code for my RBD:ACE2 D501Y simulations and it runs fine. It seems like this error only comes up in the tests. Something is going wrong when the mdtraj trajectory is created:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
perses/rjmc/topology_proposal.py:577: in get_water_indices
    traj = md.Trajectory(new_positions[np.newaxis, ...], md.Topology.from_openmm(new_topology))
../../../miniconda/envs/perses-dev/lib/python3.8/site-packages/mdtraj/core/trajectory.py:1169: in __init__
    self.xyz = xyz
../../../miniconda/envs/perses-dev/lib/python3.8/site-packages/mdtraj/core/trajectory.py:882: in xyz
    value = ensure_type(value, np.float32, 3, 'xyz', shape=shape,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

@mikemhenry could you help debug this further? I'm guessing something weird is happening when new_positions[np.newaxis, ...] is fed into md.Trajectory

…cule-proposal

jchodera · 2021-06-29T14:40:18Z

I'm tremendously confused as to why the AtomMapper class exists only to hold a bunch of private underscore-appended static methods, rather than use a simple factory idiom like

# Create an atom mapping factory
atom_mapper = AtomMapper(atom_expr=atom_expr, bond_expr=bond_expr, map_strategy=map_strategy)
# Use the factory
for target_molecule in target_molecules:
   atom_mapping = AtomMapper.create_map(source_molecule, target_molecule)
   ...

I'm going to at least make some notes about how to refactor this class, but may end up doing some refactoring in this PR if it seems straightforward to fix.

mikemhenry · 2021-06-29T22:32:01Z

Looks like the tests are passing now, not sure why since the committed code -- or are you talking about some failing tests happing locally?

jchodera · 2021-06-30T11:35:54Z

@mikemhenry @zhang-ivy : This must be a stochastic test failure then. Can you add this to the list of stochastic test failures to revisit?

…oposal

…reaking/forming with class methods.

…hat generates AtomMapping objects

jchodera · 2021-07-25T20:53:17Z

We're down to two test failures remaining!

@dominicrufa @zhang-ivy : I'll need your help to sort out this issue with the repartitioned topology that likely appeared previously as a stochastic error since we were not exhaustively testing transformations here before:

Failure log for `test_RepartitionedHybridTopologyFactory_energies`


=================================== FAILURES ===================================
_______________ test_RepartitionedHybridTopologyFactory_energies _______________
[gw0] linux -- Python 3.9.6 /usr/share/miniconda/envs/test/bin/python

    def test_RepartitionedHybridTopologyFactory_energies():
        """
        Test whether the difference in the nonalchemical zero and alchemical zero states is the forward valence energy.  Also test for the one states.
        """
    
        from perses.tests.test_topology_proposal import generate_atp
        from openmmforcefields.generators import SystemGenerator
    
        # Test alanine dipeptide in vacuum
        atp, system_generator = generate_atp('vacuum')
        RepartitionedHybridTopologyFactory_energies(atp.topology, '1', atp.system, atp.positions, system_generator)
    
        # Test alanine dipeptide in solvent
        atp, system_generator = generate_atp('solvent')
>       RepartitionedHybridTopologyFactory_energies(atp.topology, '1', atp.system, atp.positions, system_generator)

perses/tests/test_relative.py:784: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
perses/tests/test_relative.py:665: in RepartitionedHybridTopologyFactory_energies
    RepartitionedHybridTopologyFactory_energy_mutant(topology, chain, system, positions, system_generator, mutant)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

topology = <Topology; 3 chains, 516 residues, 1557 atoms, 1043 bonds>
chain = '1'
system = <simtk.openmm.openmm.System; proxy of <Swig Object of type 'OpenMM::System *' at 0x7fddf8b4bea0> >
positions = Quantity(value=array([[ 2.00000100e-01,  1.00000000e-01, -1.30000000e-07],
       [ 2.00000100e-01,  2.09000000e-01,  ...0e-01, -6.70000000e-01,  1.08389999e+00],
       [ 1.29177625e+00,  1.14910000e+00,  1.20299985e-01]]), unit=nanometer)
system_generator = <openmmforcefields.generators.system_generators.SystemGenerator object at 0x7fde0c30b4f0>
mutant = 'GLU'

    def RepartitionedHybridTopologyFactory_energy_mutant(topology, chain, system, positions, system_generator, mutant):
        """
        Test whether the difference in the nonalchemical zero and alchemical zero states is the forward valence energy.  Also test for the one states.
        Note that two RepartitionedHybridTopologyFactorys need to be generated (one for each endstate) because the energies need to be validated separately for each endstate.
        Tests a single mutant.
        """
    
        from perses.rjmc.topology_proposal import PointMutationEngine
        from perses.annihilation.relative import RepartitionedHybridTopologyFactory
        from perses.tests.utils import validate_endstate_energies
    
        ENERGY_THRESHOLD = 1e-4
    
        for res in topology.residues():
            if res.id == '2':
                wt_res = res.name
    
        print(f'Making mutation {wt_res}->{mutant}')
    
        # Create point mutation engine to mutate residue at id 2 to random amino acid
        point_mutation_engine = PointMutationEngine(wildtype_topology=topology,
                                                    system_generator=system_generator,
                                                    chain_id=chain,
                                                    max_point_mutants=1,
                                                    residues_allowed_to_mutate=['2'],  # the residue ids allowed to mutate
                                                    allowed_mutations=[('2', mutant)],
                                                    aggregate=True)  # always allow aggregation
    
        # Create topology proposal
        topology_proposal = point_mutation_engine.propose(current_system=system, current_topology=topology)
    
        # Create geometry engine
        from perses.rjmc.geometry import FFAllAngleGeometryEngine
        geometry_engine = FFAllAngleGeometryEngine(metadata=None,
                                                   use_sterics=False,
                                                   n_bond_divisions=100,
                                                   n_angle_divisions=180,
                                                   n_torsion_divisions=360,
                                                   verbose=True,
                                                   storage=None,
                                                   bond_softening_constant=1.0,
                                                   angle_softening_constant=1.0,
                                                   neglect_angles=False,
                                                   use_14_nonbondeds=True)
    
        # Create geometry proposal
        new_positions, logp_proposal = geometry_engine.propose(topology_proposal, positions, beta,
                                                                       validate_energy_bookkeeping=True)
        logp_reverse = geometry_engine.logp_reverse(topology_proposal, new_positions, positions, beta,
                                                    validate_energy_bookkeeping=True)
    
        if not topology_proposal.unique_new_atoms:
            assert geometry_engine.forward_final_context_reduced_potential == None, f"There are no unique new atoms but the geometry_engine's final context reduced potential is not None (i.e. {self._geometry_engine.forward_final_context_reduced_potential})"
            assert geometry_engine.forward_atoms_with_positions_reduced_potential == None, f"There are no unique new atoms but the geometry_engine's forward atoms-with-positions-reduced-potential in not None (i.e. { self._geometry_engine.forward_atoms_with_positions_reduced_potential})"
            vacuum_added_valence_energy = 0.0
        else:
            added_valence_energy = geometry_engine.forward_final_context_reduced_potential - geometry_engine.forward_atoms_with_positions_reduced_potential
    
        if not topology_proposal.unique_old_atoms:
            assert geometry_engine.reverse_final_context_reduced_potential == None, f"There are no unique old atoms but the geometry_engine's final context reduced potential is not None (i.e. {self._geometry_engine.reverse_final_context_reduced_potential})"
            assert geometry_engine.reverse_atoms_with_positions_reduced_potential == None, f"There are no unique old atoms but the geometry_engine's atoms-with-positions-reduced-potential in not None (i.e. { self._geometry_engine.reverse_atoms_with_positions_reduced_potential})"
            subtracted_valence_energy = 0.0
        else:
            subtracted_valence_energy = geometry_engine.reverse_final_context_reduced_potential - geometry_engine.reverse_atoms_with_positions_reduced_potential
    
        # Generate repartitioned htf at lambda = 0
        zero_htf = RepartitionedHybridTopologyFactory(topology_proposal=topology_proposal,
                              current_positions=positions,
                              new_positions=new_positions,
                              endstate=0)
    
        # Compute error at lambda = 0 endstate
        zero_state_error, _ = validate_endstate_energies(zero_htf._topology_proposal,
                                                         zero_htf,
                                                         added_valence_energy,
                                                         subtracted_valence_energy,
                                                         ENERGY_THRESHOLD=ENERGY_THRESHOLD,
                                                         platform=openmm.Platform.getPlatformByName('Reference'),
                                                         repartitioned_endstate=0)
        # Generate repartitioned htf at lambda = 1
        one_htf = RepartitionedHybridTopologyFactory(topology_proposal=topology_proposal,
                                                      current_positions=positions,
                                                      new_positions=new_positions,
                                                      endstate=1)
    
        # Compute error at lambda = 1 endstate
        _, one_state_error = validate_endstate_energies(one_htf._topology_proposal,
                                                         one_htf,
                                                         added_valence_energy,
                                                         subtracted_valence_energy,
                                                         ENERGY_THRESHOLD=ENERGY_THRESHOLD,
                                                         platform=openmm.Platform.getPlatformByName('Reference'),
                                                         repartitioned_endstate=1)
    
        # Check that endstate errors are below threshold
        assert abs(zero_state_error) < ENERGY_THRESHOLD, f"The zero state alchemical and nonalchemical energy absolute difference {abs(zero_state_error)} is greater than the threshold of {ENERGY_THRESHOLD}."
>       assert abs(one_state_error) < ENERGY_THRESHOLD, f"The one state alchemical and nonalchemical energy absolute difference {abs(one_state_error)} is greater than the threshold of {ENERGY_THRESHOLD}."
E       AssertionError: The one state alchemical and nonalchemical energy absolute difference 0.2041063642262273 is greater than the threshold of 0.0001.
E       assert 0.2041063642262273 < 0.0001
E        +  where 0.2041063642262273 = abs(-0.2041063642262273)

perses/tests/test_relative.py:764: AssertionError
----------------------------- Captured stdout call -----------------------------

Energy breakdown

Making mutation ALA->GLU
Warning: Returning Reference platform instead of requested platform CUDA
conducting subsequent work with the following platform: Reference
conducting subsequent work with the following platform: CPU
conducting subsequent work with the following platform: CPU
conducting subsequent work with the following platform: CPU
conducting subsequent work with the following platform: CPU
Warning: Returning Reference platform instead of requested platform CUDA
conducting subsequent work with the following platform: Reference
conducting subsequent work with the following platform: CPU
added energy components: [('CustomBondForce', 1.580827770516917), ('CustomAngleForce', 4.3543592013601105), ('CustomTorsionForce', 7.050911762338636), ('CustomBondForce', 20.71734407448997)]
Warning: Returning Reference platform instead of requested platform CUDA
conducting subsequent work with the following platform: Reference
conducting subsequent work with the following platform: CPU
conducting subsequent work with the following platform: CPU
conducting subsequent work with the following platform: CPU
conducting subsequent work with the following platform: CPU
Warning: Returning Reference platform instead of requested platform CUDA
conducting subsequent work with the following platform: Reference
conducting subsequent work with the following platform: CPU
added energy components: [('CustomBondForce', 0.0), ('CustomAngleForce', 0.00011854972790545768), ('CustomTorsionForce', 0.0014456143524425994), ('CustomBondForce', 4.034319661495377)]
conducting subsequent work with the following platform: Reference
conducting subsequent work with the following platform: CPU
			HarmonicBondForce: 0.034551580662335844
			HarmonicAngleForce: 0.6072024919297015
			PeriodicTorsionForce: 16.176829149610796
			NonbondedForce: -796.2718391330847
			AndersenThermostat: 0.0
			MonteCarloBarostat: 0.0
conducting subsequent work with the following platform: CPU
			HarmonicBondForce: 1.615379351179253
			HarmonicAngleForce: 4.961561693289812
			PeriodicTorsionForce: 23.22774091194943
			NonbondedForce: -775.5554960693804
			AndersenThermostat: 0.0
			MonteCarloBarostat: 0.0
conducting subsequent work with the following platform: CPU
			HarmonicBondForce: 1.615379351179253
			HarmonicAngleForce: 4.9614431435619055
			PeriodicTorsionForce: 23.22629529759699
			NonbondedForce: 2162531545690.6074
			AndersenThermostat: 0.0
			MonteCarloBarostat: 0.0
conducting subsequent work with the following platform: Reference
conducting subsequent work with the following platform: CPU
			HarmonicBondForce: 0.034551580662335844
			HarmonicAngleForce: 0.6072024919297015
			PeriodicTorsionForce: 16.176829149610796
			NonbondedForce: -796.2718391330847
			AndersenThermostat: 0.0
			MonteCarloBarostat: 0.0
conducting subsequent work with the following platform: CPU
			HarmonicBondForce: 1.615379351179253
			HarmonicAngleForce: 4.961561693289811
			PeriodicTorsionForce: 23.22774091194943
			NonbondedForce: 2162531545748.2427
			AndersenThermostat: 0.0
			MonteCarloBarostat: 0.0
conducting subsequent work with the following platform: CPU
			HarmonicBondForce: 1.615379351179253
			HarmonicAngleForce: 4.9614431435619055
			PeriodicTorsionForce: 23.22629529759699
			NonbondedForce: 2162531545690.6074
			AndersenThermostat: 0.0
			MonteCarloBarostat: 0.0
------------------------------ Captured log call -------------------------------

Here's a failed test that I need to update that had simply hard-coded the "correct" mapping results:

[gw0] linux -- Python 3.9.6 /usr/share/miniconda/envs/test/bin/python

pairs_of_smiles = [('Cc1ccccc1', 'c1ccc(cc1)N'), ('CC(c1ccccc1)', 'O=C(c1ccccc1)'), ('Oc1ccccc1', 'Sc1ccccc1')]
test = True

    def test_mapping_strength_levels(pairs_of_smiles=[('Cc1ccccc1','c1ccc(cc1)N'),('CC(c1ccccc1)','O=C(c1ccccc1)'),('Oc1ccccc1','Sc1ccccc1')],test=True):
    
        correct_results = {0:{'default': (3,2), 'weak':(3,2), 'strong':(4,3)},
                           1:{'default': (7,3), 'weak':(6,2), 'strong':(7,3)},
                           2:{'default': (1,1), 'weak':(1,1), 'strong':(2,2)}}
    
        mapping = ['weak','default','strong']
    
        for example in mapping:
            for index, (lig_a, lig_b) in enumerate(pairs_of_smiles):
                print(f"conducting {example} mapping with ligands {lig_a}, {lig_b}")
                initial_molecule = smiles_to_oemol(lig_a)
                proposed_molecule = smiles_to_oemol(lig_b)
                molecules = [Molecule.from_openeye(mol) for mol in [initial_molecule, proposed_molecule]]
                system_generator = SystemGenerator(forcefields = forcefield_files, barostat=barostat, forcefield_kwargs=forcefield_kwargs,nonperiodic_forcefield_kwargs=nonperiodic_forcefield_kwargs,
                                                     small_molecule_forcefield = 'gaff-1.81', molecules=molecules, cache=None)
                proposal_engine = SmallMoleculeSetProposalEngine([initial_molecule, proposed_molecule], system_generator)
                proposal_engine.map_strength = example
                initial_system, initial_positions, initial_topology = OEMol_to_omm_ff(initial_molecule, system_generator)
                print(f"running now with map strength {example}")
                proposal = proposal_engine.propose(initial_system, initial_topology)
                print(lig_a, lig_b,'length OLD and NEW atoms',len(proposal.unique_old_atoms), len(proposal.unique_new_atoms))
                if test:
                    render_atom_mapping(f'{index}-{example}.png', initial_molecule, proposed_molecule, proposal._new_to_old_atom_map)
>                   assert ( (len(proposal.unique_old_atoms), len(proposal.unique_new_atoms)) == correct_results[index][example]), f"the mapping failed, correct results are {correct_results[index][example]}"
E                   AssertionError: the mapping failed, correct results are (7, 3)
E                   assert (6, 2) == (7, 3)
E                     At index 0 diff: 6 != 7
E                     Full diff:
E                     - (7, 3)
E                     + (6, 2)

perses/tests/test_topology_proposal.py:99: AssertionError
----------------------------- Captured stdout call -----------------------------

jchodera · 2021-07-25T21:30:50Z

OK, the hard-coded map test is fixed, and this PR should be ready for review!
I'll incorporate input from @zhang-ivy @dominicrufa when they can provide it about the failing ALA->GLU mutation energy comparison test, but I'm guessing that test failure occurs because the nonbonded forces are just very large:

conducting subsequent work with the following platform: CPU
			HarmonicBondForce: 1.615379351179253
			HarmonicAngleForce: 4.961561693289811
			PeriodicTorsionForce: 23.22774091194943
			NonbondedForce: 2162531545748.2427
			AndersenThermostat: 0.0
			MonteCarloBarostat: 0.0
conducting subsequent work with the following platform: CPU
			HarmonicBondForce: 1.615379351179253
			HarmonicAngleForce: 4.9614431435619055
			PeriodicTorsionForce: 23.22629529759699
			NonbondedForce: 2162531545690.6074
			AndersenThermostat: 0.0
			MonteCarloBarostat: 0.0

jchodera · 2021-07-25T21:35:27Z

One way we might fix the test_RepartitionedHybridTopologyFactory_energies issue is to either minimize the energy or generate multiple conformers until we get one with a non-crazy energy, and then compare the energies for that conformation. But I suggest we create a different issue for that.

Tagging @mikemhenry to review this on Monday if possible!

…r notebook with atom mapping examples

jchodera · 2021-07-26T02:51:30Z

I've now enabled AtomMapping classes to render themselves inside Jupyter notebooks!

I've used this feature to overhaul the atom mapping example notebook so we can show off the new features at Monday's meeting:
https://github.com/choderalab/perses/blob/refactor-small-molecule-proposal/examples/atom-mapping/Atom-mapping.ipynb

zhang-ivy · 2021-07-26T13:43:25Z

Just noting this here for my future self: The major issue with why the current biopolymer atom mapping is failing is that the current scheme seems to make use of the fact that OEMol atom indices can be arbitrarily set, and this facility is used to keep track of which atoms in part of the residue are mapped. My atom mapping tool instead returns indices of the atoms within the OpenFF Molecule object, which does not support the concept of attached arbitrary atom indices.

@jchodera : Does this PR break biopolymer atom mapping? If so, what is the short term solution here for making sure that we can map protein mutations?

One way we might fix the test_RepartitionedHybridTopologyFactory_energies issue is to either minimize the energy or generate multiple conformers until we get one with a non-crazy energy, and then compare the energies for that conformation. But I suggest we create a different issue for that.

I can look into adding a minimization step to the tests today.

jchodera · 2021-07-26T15:08:00Z

@jchodera : Does this PR break biopolymer atom mapping? If so, what is the short term solution here for making sure that we can map protein mutations?

No. All tests pass. So if the test coverage in master covered your use cases, we should not break them.

If you have unit tests outside of master that are important, it would be critical to share those now!

I can look into adding a minimization step to the tests today.

Let's tackle that in a separate PR!

perses/app/relative_setup.py

perses/app/setup_relative_calculation.py

mikemhenry

LGTM! I really like the rework of the atom mapping API

ijpulidos

This is a great PR. Lots of improvements in the API. There are some comments to be discussed as usual but nothing blocking. This can be merged.

ijpulidos · 2021-07-28T15:42:38Z

perses/tests/test_atom_mapping.py

+    def test_render_image(self):
+        import tempfile
+        atom_mapping = AtomMapping(self.old_mol, self.old_mol, old_to_new_atom_map=self.old_to_new_atom_map)
+        for suffix in ['.pdf', '.png', '.svg']:
+            with tempfile.NamedTemporaryFile(suffix=suffix) as tmpfile:
+                atom_mapping.render_image(tmpfile.name)


Wouldn't it be better to test for these formats/extensions individually? Such that if one of them fails, for whatever reasons it is easier to detect which one actually failed.

Test parameterization is probably the right way to go here, as you suggested! I'll leave that for further cleanup.

ijpulidos · 2021-07-28T15:50:08Z

perses/tests/test_atom_mapping.py

+        old_to_new_atom_map = { 0:0, 1:1, 2:2, 3:3, 5:4, 6:5 }
+        new_to_old_atom_map = dict(map(reversed, self.old_to_new_atom_map.items()))
+        atom_mapping = AtomMapping(old_mol, new_mol, old_to_new_atom_map=old_to_new_atom_map)
+        assert atom_mapping.creates_or_breaks_rings() == True


Hasn't this feature/method been tested in the previous test_ring_breaking_detection already?

Extra test coverage doesn't help! There are a lot of combinations of options being tested here.

ijpulidos · 2021-07-28T15:57:49Z

perses/tests/test_atom_mapping.py

+    """Test AtomMapping object."""
+    def setUp(self):
+        """Create useful common objects for testing."""
+        from openff.toolkit.topology import Molecule


I don't think this import is needed here since there is already a global one happening at line 7.

Good catch!

ijpulidos · 2021-07-28T16:29:54Z

perses/tests/test_atom_mapping.py

+                            e.args += (f'Exception encountered for {dataset_name} use_positions={use_positions} allow_ring_breaking={allow_ring_breaking}: {old_index} {molecules[old_index]}-> {new_index} {molecules[new_index]}', )
+                            raise e
+
+    def test_map_strategy(self):
+        """
+        Test the creation of atom maps between pairs of molecules from the JACS benchmark set.
+        """
+        # Create and configure an AtomMapper
+        from openeye import oechem
+        atom_expr = oechem.OEExprOpts_IntType
+        bond_expr = oechem.OEExprOpts_RingMember
+        atom_mapper = AtomMapper(atom_expr=atom_expr, bond_expr=bond_expr)
+
+        # Test mappings for JACS dataset ligands
+        for dataset_name in ['Jnk1']:
+            molecules = self.molecules[dataset_name]
+
+            # Jnk1 ligands 0 and 2 have meta substituents that face opposite each other in the active site.
+            # When ignoring position information, the mapper should align these groups, and put them both in the core.
+            # When using position information, the mapper should see that the orientations differ and chose
+            # to unmap (i.e. put both these groups in core) such as to get the geometry right at the expense of
+            # mapping fewer atoms
+
+            # Ignore positional information when scoring mappings
+            atom_mapper.use_positions = False
+            atom_mapping = atom_mapper.get_best_mapping(molecules[0], molecules[2])
+            #assert len(atom_mapping.new_to_old_atom_map) == 36, f'Expected meta groups methyl C to map onto ethyl O\n{atom_mapping}' # TODO
+
+            # Use positional information to score mappings
+            atom_mapper.use_positions = True
+            atom_mapping = atom_mapper.get_best_mapping(molecules[0], molecules[2])
+            #assert len(atom_mapping.new_to_old_atom_map) == 35,  f'Expected meta groups methyl C to NOT map onto ethyl O as they are distal in cartesian space\n{atom_mapping}' # TODO
+
+            # Explicitly construct mapping from positional information alone
+            atom_mapping = atom_mapper.generate_atom_mapping_from_positions(molecules[0], molecules[2])
+            #assert len(atom_mapping.new_to_old_atom_map) == 35,  f'Expected meta groups methyl C to NOT map onto ethyl O as they are distal in cartesian space\n{atom_mapping}' # TODO
+
+    def test_simple_heterocycle_mapping(self):
+        """
+        Test the ability to map conjugated heterocycles (that preserves all rings).  Will assert that the number of ring members in both molecules is the same.
+        """
+        # TODO: generalize this to test for ring breakage and closure.
+
+        iupac_pairs = [
+            ('benzene', 'pyridine')
+            ]
+
+        # Create and configure an AtomMapper
+        atom_mapper = AtomMapper(allow_ring_breaking=False)
+
+
+        for old_iupac, new_iupac in iupac_pairs:
+            from openff.toolkit.topology import Molecule
+            old_mol = Molecule.from_iupac(old_iupac)
+            new_mol = Molecule.from_iupac(new_iupac)
+            atom_mapping = atom_mapper.get_best_mapping(old_mol, new_mol)
+
+            assert len(atom_mapping.old_to_new_atom_map) > 0
+
+    def test_mapping_strength_levels(self):
+        """Test the mapping strength defaults work as expected"""
+        # SMILES pairs to test mappings
+        tests = [
+            ('c1ccccc1', 'C1CCCCC1', {'default': 0, 'weak' : 6, 'strong' : 0}), # benzene -> cyclohexane
+            ('CNC1CCCC1', 'CNC1CCCCC1', {'default': 6, 'weak' : 6, 'strong' : 6}), # https://github.com/choderalab/perses/issues/805#issue-913932127
+            ('c1ccccc1CNC2CCC2', 'c1ccccc1CNCC2CCC2', {'default': 13, 'weak' : 13, 'strong' : 11}), # https://github.com/choderalab/perses/issues/805#issue-913932127
+            ('Cc1ccccc1','c1ccc(cc1)N', {'default': 12, 'weak' : 12, 'strong' : 11}),
+            ('CC(c1ccccc1)','O=C(c1ccccc1)', {'default': 13, 'weak' : 14, 'strong' : 11}),
+            ('Oc1ccccc1','Sc1ccccc1', {'default': 12, 'weak' : 12, 'strong' : 11}),
+            ]
+
+        DEBUG_MODE = True # If True, don't fail, but print results of tests for calibration
+
+        for mol1_smiles, mol2_smiles, expected_results in tests:
+            for map_strength, expected_n_mapped_atoms in expected_results.items():
+                # Create OpenFF Molecule objects
+                from openff.toolkit.topology import Molecule
+                mol1 = Molecule.from_smiles(mol1_smiles)
+                mol2 = Molecule.from_smiles(mol2_smiles)
+
+                # Initialize the atom mapper with the requested mapping strength
+                atom_mapper = AtomMapper(map_strength=map_strength, allow_ring_breaking=False)
+
+                # Create the atom mapping
+                atom_mapping = atom_mapper.get_best_mapping(mol1, mol2)
+
+                if DEBUG_MODE:
+                    if atom_mapping is not None:
+                        _logger.info(f'{mol1_smiles} -> {mol2_smiles} using map strength {map_strength} : {atom_mapping.n_mapped_atoms} atoms mapped : {atom_mapping.old_to_new_atom_map}')
+                        atom_mapping.render_image(f'test_mapping_strength_levels:{mol1_smiles}:{mol2_smiles}:{map_strength}.png')
+                    else:
+                        _logger.info(f'{mol1_smiles} -> {mol2_smiles} using map strength {map_strength} : {atom_mapping}')
+
+                else:
+                    # Check that expected number of mapped atoms are provided
+                    n_mapped_atoms = 0
+                    if atom_mapping is not None:
+                        n_mapped_atoms = atom_mapping.n_mapped_atoms
+
+                    assert n_mapped_atoms==expected_n_mapped_atoms, "Number of mapped atoms does not match hand-calibrated expectation"


This is great testing, a lot of cases involved. Maybe these tests could eventually use test parametrization and fixtures to make it a bit cleaner. We should try making a separate issue for that.

Yes! Great idea---please open an issue!

ijpulidos · 2021-07-28T19:58:23Z

perses/rjmc/atom_mapping.py

+        # Fix atom name retrieval from OEMol objects
+        # TODO: When https://github.com/openforcefield/openff-toolkit/issues/1026 is fixed, remove this
+        if hasattr(old_mol, 'GetAtoms'):
+            for index, atom in enumerate(old_mol.GetAtoms()):
+                self.old_mol.atoms[index].name = atom.GetName()
+        if hasattr(new_mol, 'GetAtoms'):
+            for index, atom in enumerate(new_mol.GetAtoms()):
+                self.new_mol.atoms[index].name = atom.GetName()


So the issue seems to be solved upstream but a release of openff-toolkit with the changes has yet to be made. I think the next release should have this already, we should keep an eye for it. Do you think we can then remove this code and make perses depend on that version or newer?

For now, this code doesn't do any harm, and is compatible with the openff-toolkit current and future release.
At some point in the future, we can change the dependency to require the new version of openff-toolkit or later, and then remove this code---hence the #TODO!

ijpulidos · 2021-07-29T17:43:46Z

perses/rjmc/atom_mapping.py

+        old_offmol, old_oemol = copy_molecule(old_mol)
+        new_offmol, new_oemol = copy_molecule(new_mol)
+
+        if (old_offmol.n_atoms==0) or (new_offmol.n_atoms==0):


Shouldn't we just test this in the beginning with old_mol and new_mol directly? I know this shouldn't change the behavior but just for 'order'.

That would take more code, since we don't know whether they are OEMol or Molecule objects at the beginning.

ijpulidos · 2021-07-29T19:07:07Z

perses/rjmc/atom_mapping.py

+        elapsed_time = time.time() - initial_time
+        _logger.debug(f'get_best_mapping took {elapsed_time:.3f} s')
+
+        return atom_mappings[best_map_index]


Can we get multiple mappings with the same max score? What happens when that's the case?

Great point! Added a comment to address this ambiguity.

ijpulidos · 2021-07-29T19:12:57Z

perses/rjmc/atom_mapping.py

+            else:
+                raise ValueError(f'Not sure how to copy {mol}; should be openff.toolkit.topology.Molecule or openeye.oechem.OEMol')
+
+            return offmol, oemol


Not sure how bad we are in memory usage or if we would ever need a better memory footprint. But this might be a candidate for improvement, maybe returning just copies of the same type as the input. That is, if input is an OEMol output is just OEMol as well (similar with offmol).

Great point! The hope is to eliminate OEMol objects entirely when possible, and just use Molecule in the future. This is just a temporary hack.

ijpulidos · 2021-07-29T19:16:16Z

perses/rjmc/atom_mapping.py

+
+        """
+        import numpy as np
+        atom_mappings = self.get_all_maps(old_mol, new_mol)


I believe get_all_maps doesn't exist as a public/user-visible method. Is this supposed to be _get_all_maps? Now that makes me wonder if this method is being tested at all (probably not).

Actually I now realize as well that the _get_all_maps static method has an undefined reference to self if no atom or bond expressions are passed.

I am hoping to refactor the private _get_all_map() method in a future PR---once I fully understand what it does!
This was just a partial refactor, cleaning up the public API, eliminating references to the private API when possible, and refactoring the bits of the code I could understand. We'll need @dominicrufa's help (after he returns from his internship) for the next phase of refactoring/cleanup.

ijpulidos · 2021-07-29T19:32:26Z

perses/rjmc/atom_mapping.py

+
+        """
+        from openff.toolkit.topology import Molecule
+        distance_unit = unit.angstroms # distance unit for comparisons


Are units always expected to be angstroms? I guess this works for the most part but maybe we could try to get the actual units from the molecule conformers themselves. Something like old_mol.conformers[0].unit (assuming they all have the same units as the first occurrence)

The distance_unit here is arbitrary, but defined here to be consistent in how both the measured distances and absolute distance threshold are converted into unitless numbers. So the choice is totally arbitrary.

…oposal

jchodera · 2021-07-30T02:31:31Z

Changes should be complete. Just waiting on tests to pass now.

jchodera added 4 commits June 18, 2021 23:59

Move SmallMoleculeSetProposalEngine.propose() kwargs to class attribu…

289c99f

…tes, and enable coordinate precision for geometry-derived mappings to be specified.

Adapt app tools and tests for changes to SmalleMoleculeSetProposalEng…

a6d346e

…ine API

Fix test failures for API changes

33a19de

Fix another test that did not adapt to updated SystemGenerator API.

4814be2

Update CDK2 example to use geometry-derived mappings

59c860a

jchodera added this to the 0.9.1 Bugfix release milestone Jun 21, 2021

jchodera self-assigned this Jun 21, 2021

jchodera added bug 🐛 priority: high priority high effort: medium labels Jun 21, 2021

jchodera added 3 commits June 21, 2021 20:57

Merge remote-tracking branch 'origin/master' into refactor-small-mole…

6ab7ecc

…cule-proposal

Add README for thrombin FAH example, and include geometry-based mappi…

02fb701

…ng arguments in YAML

Move AtomMapper into separate file; separate out tests.

28140e7

jchodera added 2 commits June 23, 2021 17:10

Merge remote-tracking branch 'origin/master' into refactor-small-mole…

3a8cf7a

…cule-proposal

Clearer comments in SmallMoleculeSetProposalEngine and AtomMapper

261e2ec

mikemhenry mentioned this pull request Jul 1, 2021

Stochastic test failures in RepartitionedHybridTopologyFactory_energies #815

Closed

jchodera added 6 commits July 11, 2021 11:22

Merge remote-tracking branch 'origin' into refactor-small-molecule-pr…

6bcfe17

…oposal

Implement versatile AtomMapping container class.

33e898e

Extend AtomMapping class to check for ring breaking/forming and fix b…

29e218b

…reaking/forming with class methods.

Encapsulate all atom mapping capabilities in new AtomMapper factory t…

ebaaf0a

…hat generates AtomMapping objects

Update AtomMapper use within topology proposals

f6892b5

Lots of bugfixes and updates to atom mapping tests.

00eba3c

Add AtomMapping and AtomMapper to sphinx API docs

e39c3a4

Update test that used hard-coded results for map strength

471ae3d

Increase test coverage; weaken 'weak' mapping criteria; rework Jupyte…

32c8652

…r notebook with atom mapping examples

Update Jupyter notebook

7d073a9

jchodera mentioned this pull request Jul 28, 2021

Sporadic unspecified stereochemistry issue with generate_atom_mapping_from_positions #826

Closed

jchodera requested review from zhang-ivy and ijpulidos July 28, 2021 04:20

ijpulidos reviewed Jul 28, 2021

View reviewed changes

perses/app/relative_setup.py Show resolved Hide resolved

ijpulidos reviewed Jul 28, 2021

View reviewed changes

perses/app/relative_setup.py Show resolved Hide resolved

ijpulidos reviewed Jul 28, 2021

View reviewed changes

perses/app/setup_relative_calculation.py Show resolved Hide resolved

mikemhenry approved these changes Jul 29, 2021

View reviewed changes

ijpulidos approved these changes Jul 29, 2021

View reviewed changes

Add test to catch failure

6176a14

zhang-ivy mentioned this pull request Jul 30, 2021

Fix RepartitionedHybridTopologyFactory tests to reduce stochastic failures #828

Merged

jchodera added 4 commits July 29, 2021 19:10

Fix #826 (adding test in addition)

ed38bdb

Merge remote-tracking branch 'origin' into refactor-small-molecule-pr…

b8c6a09

…oposal

Address @ijpulidos feedback

4e88b02

Check in test ligands from the COVID Moonshot

f1066be

Fix Molecule imports

788055d

jchodera merged commit 02df62f into master Jul 30, 2021

jchodera deleted the refactor-small-molecule-proposal branch July 30, 2021 05:12

This was referenced Jul 30, 2021

Check factorization of partition function #792

Closed

When using geometry-based mapping, de-map problematic transformations #805

Closed

Overhaul atom mapping #809

Overhaul atom mapping #809

Conversation

jchodera commented Jun 19, 2021 • edited Loading

codecov bot commented Jun 20, 2021 • edited Loading

Codecov Report

jchodera commented Jun 21, 2021

jchodera commented Jun 22, 2021

jchodera commented Jun 22, 2021

zhang-ivy commented Jun 23, 2021 • edited Loading

jchodera commented Jun 29, 2021

mikemhenry commented Jun 29, 2021

jchodera commented Jun 30, 2021

jchodera commented Jul 25, 2021 • edited Loading

jchodera commented Jul 25, 2021

jchodera commented Jul 25, 2021

jchodera commented Jul 26, 2021

zhang-ivy commented Jul 26, 2021 • edited Loading

jchodera commented Jul 26, 2021

mikemhenry left a comment

Choose a reason for hiding this comment

ijpulidos left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jchodera commented Jul 30, 2021

jchodera commented Jun 19, 2021 •

edited

Loading

codecov bot commented Jun 20, 2021 •

edited

Loading

zhang-ivy commented Jun 23, 2021 •

edited

Loading

jchodera commented Jul 25, 2021 •

edited

Loading

zhang-ivy commented Jul 26, 2021 •

edited

Loading