Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Nucleic Acid Backbone #189

Merged
merged 10 commits into from Oct 24, 2023
Merged

Conversation

fjclark
Copy link
Contributor

@fjclark fjclark commented Oct 23, 2023

Is this pull request to fix a bug, or to introduce new functionality?

This PR allows BSS to recognise nucleic acid backbones e.g.:

protocol = BSS.Protocol.Equilibration(restraint="backbone")

I've tried to implement this in the same way as for protein backbones, copying residue names over from MDAnalysis. A couple of slight complications are:

  • I've had to use regex syntax to allow searching for atom names containing '
  • I've created the methods containsAminoAcids and containsNucleicAcids in System to make this section of code cleaner.

If this introduces new functionality...

Changes proposed in this pull request:

  • Add nucl_res - a list of nucleic acid residues, taken from MDAnalysis.
  • Add some logic when creating the Amber mask to ensure that only atoms which exist in the system are included in the mask (otherwise Amber produces the error: Error: an illegal memory access was encountered launching kernel kClearForces)
    • Add the methods containsAminoAcids and containsNucleicAcids to System to make the above logic cleaner.
  • I confirm that I have merged the latest version of devel into this branch before issuing this pull request (e.g. by running git pull origin devel): y
  • I confirm that I have added a test for any new functionality in this pull request: n - I couldn't see tests related to the functionality which was present before.
  • I confirm that I have added documentation (e.g. a new tutorial page or detailed guide) for any new functionality in this pull request: n
  • I confirm that I have permission to release this code under the GPL3 license: y

Suggested reviewers:

@lohedges, @chryswoods

@fjclark fjclark temporarily deployed to biosimspace-build October 23, 2023 15:59 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 23, 2023 15:59 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 23, 2023 15:59 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 23, 2023 15:59 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 23, 2023 16:50 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 23, 2023 16:50 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 23, 2023 16:50 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 23, 2023 16:50 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 23, 2023 16:50 — with GitHub Actions Inactive
@lohedges
Copy link
Contributor

Many thanks, this looks great. Will take a closer look tomorrow. The CI failure was just because the required Sire packages were still building so it didn't find the dependency for that platform. I've re-triggerred.

Copy link
Contributor

@lohedges lohedges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. Other than my review comments, do you have any simple tests for the new logic? If so, it would be good if they could re-use some of the existing test files. If you need to add anything, then I can upload it to the website as needed.

If it's not too much effort, could you mirror the new functionality in the Exscientia Sandpit too? I can always do this post-merge if needed.

Comment on lines 56 to 272
"ADE",
"URA",
"CYT",
"GUA",
"THY",
"DA",
"DC",
"DG",
"DT",
"RA",
"RU",
"RG",
"RC",
"A",
"T",
"U",
"C",
"G",
"DA5",
"DC5",
"DG5",
"DT5",
"DA3",
"DC3",
"DG3",
"DT3",
"RA5",
"RU5",
"RG5",
"RC5",
"RA3",
"RU3",
"RG3",
"RC3",
}

# A list of ion elements.
ions = [
"F",
"Cl",
"Br",
"I",
"Li",
"Na",
"K",
"Rb",
"Cs",
"Mg",
"Tl",
"Cu",
"Ag",
"Be",
"Cu",
"Ni",
"Pt",
"Zn",
"Co",
"Pd",
"Ag",
"Cr",
"Fe",
"Mg",
"V",
"Mn",
"Hg",
"Cd",
"Yb",
"Ca",
"Sn",
"Pb",
"Eu",
"Sr",
"Sm",
"Ba",
"Ra",
"Al",
"Fe",
"Cr",
"In",
"Tl",
"Y",
"La",
"Ce",
"Pr",
"Nd",
"Sm",
"Eu",
"Gd",
"Tb",
"Dy",
"Er",
"Tm",
"Lu",
"Hf",
"Zr",
"Ce",
"U",
"Pu",
"Th",
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could all of these be moved outside of the class? I'd rather they were hidden from the user since they are only used internally. It would probably be best to place them in a utility module, e.g. _utils.py, then import them privately in _system.py to use where required, e.g.:

from ._utils import prot_res as _prot_res
...

For simplicity, they could even be private in the _utils.py module itself, e.g. just call it _prot_res, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course - have updated and will push shortly.

Comment on lines 2158 to 2206
def containsAminoAcids(self, property_map={}):
"""
Check whether the system contains nucleic acids.

Parameters
----------
property_map : dict
A dictionary that maps system "properties" to their user defined
values. This allows the user to refer to properties with their
own naming scheme, e.g. { "charge" : "my-charge" }

Returns
-------

contains_aa : bool
True if the system contains amino acids, False otherwise.
"""
string = "(resname " + ",".join(self.prot_res) + ")"
try:
search = self.search(string, property_map)
except:
search = []

return bool(search)

def containsNucleicAcids(self, property_map={}):
"""
Check whether the system contains nucleic acids.

Parameters
----------
property_map : dict
A dictionary that maps system "properties" to their user defined
values. This allows the user to refer to properties with their
own naming scheme, e.g. { "charge" : "my-charge" }

Returns
-------

contains_na : bool
True if the system contains amino acids, False otherwise.
"""
string = "(resname " + ",".join(self.nucl_res) + ")"
try:
search = self.search(string, property_map)
except:
search = []

return bool(search)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to match these to the other similar search functions, e.g. have functions called nNucleicAcids and getNucleicAcids, etc. The first would give you the number of matches, the second returns the atoms matching the search, if any. (Could probably do the same for ions, too.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - I've added these. I've skipped getIons for now because this seems a bit more complicated. The ions list only contains atom names, so this would have to be getMonoatomicIons, and I'd have to make sure that the atoms aren't part of molecules. Happy to try and implement something if you would like though·

@@ -2127,6 +2218,7 @@ def _isParameterised(self, property_map={}):
property_map : dict
A dictionary that maps system "properties" to their user defined
values. This allows the user to refer to properties with their
own naming scheme, e.g. { "charge" : "my-charge" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching.

@fjclark
Copy link
Contributor Author

fjclark commented Oct 24, 2023

Thanks for this. Other than my review comments, do you have any simple tests for the new logic? If so, it would be good if they could re-use some of the existing test files. If you need to add anything, then I can upload it to the website as needed.

Thanks very much for the quick review. I've added some tests. Are any of the website systems nucleic acids? If not, could you please add compressed versions of this RNA system: rna.tar.gz?

@lohedges
Copy link
Contributor

I'm not sure. I'll add that to the website now and let you know when I'm done.

@lohedges
Copy link
Contributor

All done. These can now be loaded with:

s = BSS.IO.readMolecules(BSS.IO.expand(BSS.tutorialUrl(), ["rna_6e1s.rst7", "rna_6e1s.prm7"]))

In the test suite the url is a variable within the conftest.py file, so you can just use that.

@fjclark
Copy link
Contributor Author

fjclark commented Oct 24, 2023

Perfect, thanks very much. I'll update the tests and copy over to the sandpit now.

@fjclark fjclark temporarily deployed to biosimspace-build October 24, 2023 12:38 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 24, 2023 12:38 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 24, 2023 12:38 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 24, 2023 12:38 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build October 24, 2023 12:38 — with GitHub Actions Inactive
@lohedges lohedges added the enhancement New feature or request label Oct 24, 2023
lohedges
lohedges previously approved these changes Oct 24, 2023
Copy link
Contributor

@lohedges lohedges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks, this looks great 👍

@lohedges
Copy link
Contributor

Sorry, just to check... I know you have tests in place, but does this work as documented:

    def getNucleotides(self, property_map={}):
        """
        Return a list containing all of the nucleotide residues in the system.
        Parameters
        ----------
        property_map : dict
            A dictionary that maps system "properties" to their user defined
            values. This allows the user to refer to properties with their
            own naming scheme, e.g. { "charge" : "my-charge" }
        Returns
        -------
        residues : [:class:`Residue <BioSimSpace._SireWrappers.Residue>`]
            The list of all nucleotide residues in the system.
        """
        search_string = "(resname " + ",".join(_nucl_res) + ")"
        try:
            residues = list(self.search(search_string, property_map))
        except:
            residues = []
        return residues

This should return a list of residues. However, by default, searches will return atoms matching the search, i.e. the atoms within the residue. Do you want to return the atoms, or should this be converted to residues. For example:

import BioSimSpace as BSS

s = BSS.IO.readMolecules(BSS.IO.expand(BSS.tutorialUrl(), ["ala.top", "ala.crd"]))

# Search for the ALA residue. This returns the atoms in residue ALA.
s.search("resname ALA")
<BioSimSpace.SearchResult: nResults=10>

# Search for the residues matching the search. Now there is a single match.
s.search("resname ALA").residues()
<BioSimSpace.SearchResult: nResults=1

Do this by using .residues() on the result of the search.
@fjclark
Copy link
Contributor Author

fjclark commented Oct 24, 2023

Sorry, just to check... I know you have tests in place, but does this work as documented:

    def getNucleotides(self, property_map={}):
        """
        Return a list containing all of the nucleotide residues in the system.
        Parameters
        ----------
        property_map : dict
            A dictionary that maps system "properties" to their user defined
            values. This allows the user to refer to properties with their
            own naming scheme, e.g. { "charge" : "my-charge" }
        Returns
        -------
        residues : [:class:`Residue <BioSimSpace._SireWrappers.Residue>`]
            The list of all nucleotide residues in the system.
        """
        search_string = "(resname " + ",".join(_nucl_res) + ")"
        try:
            residues = list(self.search(search_string, property_map))
        except:
            residues = []
        return residues

This should return a list of residues. However, by default, searches will return atoms matching the search, i.e. the atoms within the residue. Do you want to return the atoms, or should this be converted to residues. For example:

import BioSimSpace as BSS

s = BSS.IO.readMolecules(BSS.IO.expand(BSS.tutorialUrl(), ["ala.top", "ala.crd"]))

# Search for the ALA residue. This returns the atoms in residue ALA.
s.search("resname ALA")
<BioSimSpace.SearchResult: nResults=10>

# Search for the residues matching the search. Now there is a single match.
s.search("resname ALA").residues()
<BioSimSpace.SearchResult: nResults=1

Thanks for pointing this out. I've added .residues() and a check of isintance() in the tests. These functions were returning lists of residues before, but I see now that they could return lists of atoms if only one residue was found. I didn't realise that the return type depended on the number of results, e.g.:

import BioSimSpace as BSS

s = BSS.IO.readMolecules(BSS.IO.expand(BSS.tutorialUrl(), ["ala.top", "ala.crd"]))

# Search for the ALA residue. This returns the atoms in residue ALA.
s.search("resname ALA")
<BioSimSpace.SearchResult: nResults=10>

# Search for the ALA residue and the cap ACE. This returns 2 residues, rather than atoms
s.search("resname ALA,ACE")
<BioSimSpace.SearchResult: nResults=2>

@lohedges
Copy link
Contributor

No problem, I agree that it's confusing. This changed in 2023.3.0 when the Sire search API was updated. As such, it's now best to use the .atoms(), .residues(), or .molecules() functions to cast the result to whatever you want. It does work with the new API, e.g.:

s = BSS.IO.readMolecules(BSS.IO.expand(BSS.tutorialUrl(), ["ala.top", "ala.crd"]))

sire_sys = sr.system.System(s._sire_object)

sire_sys["resname ALA"]
Residue( ALA:2   num_atoms=10 )

However, using the old API allows the search to respect the property map. This should all be sorted out once we fully migrate across to the new API in future.

@fjclark
Copy link
Contributor Author

fjclark commented Oct 24, 2023

OK, thanks for the explanation.

@lohedges
Copy link
Contributor

Previously I had some logic that would automatically convert the results to the minimal object, i.e. atoms if there was only one atom, residues if there was only one residue, etc, but this broke with the API update.

@lohedges lohedges merged commit cb92227 into OpenBioSim:devel Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants