Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Residue groups #178

Closed
frodofine opened this issue Jun 28, 2018 · 5 comments
Closed

Feature request: Residue groups #178

frodofine opened this issue Jun 28, 2018 · 5 comments
Assignees

Comments

@frodofine
Copy link
Contributor

I believe there is a need for residue groups in the Frame class to store data such as chain IDs and other related information in Biologic systems (I'm sure other applications could benefit here, but I do not know them). Such an implementation could look like:

class ResidueGroup {
    sorted_set<size_t> residue_ids_
    property_map prop_map_
public:
    set / get - property
    check if a residue is in the group
    add / residues from the group
}

Thoughts?

@frodofine frodofine self-assigned this Jun 28, 2018
@Luthaf
Copy link
Member

Luthaf commented Jul 1, 2018

I am a bit worried about proliferation of ad-hoc structures here, with both Residue and ResidueGroup being loosely integrated with the Frame (both only store index of atoms/residues in a separated frame).

What is made available by ResidueGroup that could not be done with a chain_id string property on atoms/residues? If one need the ability to get all atoms with a given chain_id, maybe extending selections to work with properties would be another solution.

If there is a real need for ResidueGroup as a separated class, maybe we can re-organize the code to make the whole ResiudeGroup -> Residue -> Atom clearer.


On this matter, see https://khinsen.wordpress.com/2013/04/10/lessons-from-sixteen-years-of-molecular-simulation-in-python/, at the end he says he would prefer having only Atom and Group instead of the whole Atoms, Groups, Molecules, Complexes, AminoAcidResidue, PeptideChain, and Protein. He does not go on to say why, but I would guess less types of groups make the code more general and easier to understand.

@frodofine
Copy link
Contributor Author

I really like the selection based idea, although I would have no idea how to implement it...

@Luthaf
Copy link
Member

Luthaf commented Jul 2, 2018

Just to be sure, you mean that a Selection based solution would fulfil your needs here?


On the implementation side, the selection engine is basically a simple interpreter for a small language.

The main matching happens in one of the Selector subclass

/// Abstract base class for selectors in the selection AST
class Selector {
public:
/// Pretty-printing of this selector. The output should use a shift
/// of `delta` spaces in case of multilines output.
virtual std::string print(unsigned delta = 0) const = 0;
/// Check if the `match` is valid in the given `frame`.
virtual bool is_match(const Frame& frame, const Match& match) const = 0;
/// Optimize the AST corresponding to this Selector. Currently, this only
/// perform constant propgations in mathematical expressions.
virtual void optimize() {}

The main function here is is_match, that check if a given Match is valid for the current selection in a specific frame.

Creating the Selector from the selection string is the job of the parser (src/selections/parser.cpp), which is a recursive descent parser. To add support for properties in selections, one would need to add a new case in Parser::selector().


Adding support for generic property would be hard, so we could start with only string properties.

On the Selector side, this would mean adding a new StringSelector implementation that check if the property exist, if it is a string and if it matches the required value.

On the parser side, the code would depend on the desired syntax. We could add support for chain_id specifically by adding a new chain_id selector (chain_id == A, chain_id(#2) != A) which we could just add in the list of string selectors:

static std::map<std::string, string_prop_creator_t> STRING_PROPERTIES = {
{"name", [](std::string value, bool equals, Variable var){ return Ast(new Name(value, equals, var));}},
{"type", [](std::string value, bool equals, Variable var){ return Ast(new Type(value, equals, var));}},
{"resname", [](std::string value, bool equals, Variable var){ return Ast(new Resname(value, equals, var));}},
};

A more generic solution for arbitrary property names and types could be nice, but is much harder to achieve. Syntax wise, I would go for something like property("long name", #1) == 45.0 or property("name", #3) == "foo" or property("bool"). An alternative syntax like [long name](#1) == 45.0 or [name](#3) == "foo" or [bool] could work too.

@frodofine
Copy link
Contributor Author

Yes, a selection mechanism would satisfy my needs (in my own project I build the residue group myself anyway, I was really just suggesting to move that upstream).

I could add the selector part my self as it looks easy enough, but I need to finish up some other projects before I can look at this more in detail.

@frodofine
Copy link
Contributor Author

This issue is closed, with the above PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants