group module #29
Mara Kim has started a group module.
It seems the Group module has been deprecated, so I'm marking this issue closed.
I'm reopening this issue because our recent discussion found that several groups are still using this module.
I am heavily using the group module for 1) protein families, 2) orthologs, and 3) metabolic pathways. During the call a couple of months ago, others mentioned that they are treating families as dbxrefs and denoting membership with feature_dbxref. But we organize our families into different nodes of a phylogenetic tree, so we want to have groups of families at the chlorophyte node, at the grass node, and so on. For us, the group module is the way to go.
My question about the existing module is whether there is a use case for the grpmember table in addition to feature_grpmember and organism_grpmember. Could this be simplified by using feature_grp and organism_grp to point directly from the grp to the feature or organism?
I have made one extension to the group module by adding analysisfeaturegrp. It is similar to analysisfeature in that it captures the results of an analysis of an individual feature against a group. I'm using it for HMM scores of proteins against a protein family.
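The simplification asked about here can be sketched roughly as follows. This is a minimal illustration under assumptions, not the group module's actual DDL: grp rows link directly to features through a hypothetical feature_grp table (no intermediate grpmember), and analysisfeaturegrp stores one score per (analysis, feature, group). Column names are simplified guesses, SQLite stands in for PostgreSQL so the sketch runs on its own, and the rows are toy values.

```python
import sqlite3
from typing import List

# Hypothetical, simplified tables; real Chado tables carry more columns
# (type_id, timestamps, uniqueness rules, etc.).
SCHEMA = """
CREATE TABLE feature  (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, program TEXT NOT NULL);
CREATE TABLE grp      (grp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Direct membership link: one row per (feature, group), no grpmember indirection.
CREATE TABLE feature_grp (
    feature_grp_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    grp_id     INTEGER NOT NULL REFERENCES grp(grp_id),
    UNIQUE (feature_id, grp_id)
);

-- Result of one analysis of one feature against one group (e.g. an HMM score).
CREATE TABLE analysisfeaturegrp (
    analysisfeaturegrp_id INTEGER PRIMARY KEY,
    analysis_id INTEGER NOT NULL REFERENCES analysis(analysis_id),
    feature_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    grp_id      INTEGER NOT NULL REFERENCES grp(grp_id),
    score REAL,
    UNIQUE (analysis_id, feature_id, grp_id)
);
"""


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO feature VALUES (?, ?)",
                     [(1, "protA"), (2, "protB"), (3, "protNew")])
    conn.execute("INSERT INTO analysis VALUES (1, 'hmmsearch')")
    conn.execute("INSERT INTO grp VALUES (1, 'family_001')")
    # protA and protB were used to build the family.
    conn.executemany("INSERT INTO feature_grp (feature_id, grp_id) VALUES (?, 1)",
                     [(1,), (2,)])
    # protNew was only scored against the family afterwards.
    conn.execute("INSERT INTO analysisfeaturegrp (analysis_id, feature_id, grp_id, score) "
                 "VALUES (1, 3, 1, 512.3)")
    return conn


def members(conn: sqlite3.Connection, grp_name: str) -> List[str]:
    """Return the uniquenames of features linked to a group via feature_grp."""
    rows = conn.execute(
        "SELECT f.uniquename FROM feature f "
        "JOIN feature_grp fg ON fg.feature_id = f.feature_id "
        "JOIN grp g ON g.grp_id = fg.grp_id "
        "WHERE g.name = ? ORDER BY f.uniquename", (grp_name,)).fetchall()
    return [r[0] for r in rows]


conn = build_demo()
print(members(conn, "family_001"))  # ['protA', 'protB']
```

Keeping scored-but-unassigned proteins in analysisfeaturegrp, separate from feature_grp membership, lets a new proteome be scored against existing families without changing how those families were built.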
What are the chances that @astralarya is still working with Chado/Tripal? I'm curious how her use of the group module is going. It was quite some time ago that she created this page: http://www.gmod.org/wiki/Chado_Group_Module
@scottcain, interesting that you bring this up. We've had some recent discussions about this module for Tripal, specifically to show protein families. So there is interest in using the module, or at least a subset of it. We were thinking of proposing a much simpler version of this module so that we could get some tables into an official release of Chado for this.
We’ve been using the group module for protein families for a while now.
A grp is a family. It has a dbxref that points to a common clade (we have families grouped at different taxonomic nodes), and we attach the multiple sequence alignments (MSAs) to the grpmembers through grpmemberprop, while the HMM is a grpprop.
The one extension I have added is a table, analysisfeaturegrp: it holds the HMM score of a protein (from a proteome that was not included in the family build) against the family.
Joe
(Email reply to @spficklin's comment of Apr 13, 2021, 5:39 PM.)
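A minimal sketch of the layout described in this reply, assuming a dbxref_id column on grp that names the clade node a family was built at, and a simplified grpprop keyed by a plain-text type (real Chado property tables use a cvterm type_id). Accessions such as clade:grass and the family names are hypothetical, and SQLite stands in for PostgreSQL.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbxref  (dbxref_id INTEGER PRIMARY KEY, accession TEXT UNIQUE NOT NULL);
-- Each family (grp) points at the clade node it was built for.
CREATE TABLE grp     (grp_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      dbxref_id INTEGER REFERENCES dbxref(dbxref_id));
-- Simplified property table: the HMM profile is stored as text.
CREATE TABLE grpprop (grp_id INTEGER NOT NULL REFERENCES grp(grp_id),
                      type TEXT NOT NULL, value TEXT);
""")
conn.executemany("INSERT INTO dbxref VALUES (?, ?)",
                 [(1, "clade:chlorophyte"), (2, "clade:grass")])
conn.executemany("INSERT INTO grp VALUES (?, ?, ?)",
                 [(1, "fam_chl_001", 1), (2, "fam_grass_001", 2), (3, "fam_grass_002", 2)])
# Placeholder standing in for a full HMM profile.
conn.execute("INSERT INTO grpprop VALUES (2, 'hmm', '<HMM profile text>')")


def families_at(conn: sqlite3.Connection, clade: str) -> List[str]:
    """List the families whose grp.dbxref_id points at the given clade."""
    rows = conn.execute(
        "SELECT g.name FROM grp g JOIN dbxref d ON d.dbxref_id = g.dbxref_id "
        "WHERE d.accession = ? ORDER BY g.name", (clade,))
    return [r[0] for r in rows]


def family_hmm(conn: sqlite3.Connection, family: str) -> Optional[str]:
    """Return the family's HMM text from grpprop, or None if none is stored."""
    row = conn.execute(
        "SELECT p.value FROM grpprop p JOIN grp g ON g.grp_id = p.grp_id "
        "WHERE g.name = ? AND p.type = 'hmm'", (family,)).fetchone()
    return row[0] if row else None


print(families_at(conn, "clade:grass"))  # ['fam_grass_001', 'fam_grass_002']
```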
Hello everyone! I haven't been working with a Chado database for many years at this point, but I am happy to see my work still being used (at least to some extent) today.
From my own usage of the group module, I found that for simple cases the *_dbxref tables were sufficient to associate entities with groups. From there, it was also possible to use cvterm_dbxref to encode hierarchies and other complex relationships. I think any official implementation of the group module should use these semantics as much as possible.
@spficklin I originally designed the group module with protein families in mind, but then went with the simpler approach detailed above. I'm curious to learn more about which aspects you are finding useful.
@JoeCarlson presents a very interesting use case from the perspective of multiple sequence alignments. This does sound like something that would necessitate the more abstract groups I was thinking about when I wrote up the original proposal. I would definitely like to learn more about that usage.
In reply to Mara Kim's comment of Apr 14, 2021, 8:17 AM:
If you just want to describe a group of proteins, then attaching a dbxref to them will certainly do the trick, and you can have dbxref_relationship records to describe how the different families are part of one clade. But we wanted to store a bit more. The multiple alignment is stored per protein: each protein's alignment string goes in the grpmemberprop table. (We had been storing the full MSA as a grpprop, but those were getting too big to handle as Java strings.) We generate the MSAs on the fly by extracting all the individual alignment strings. HMM matrices are stored in text format as grpprops.
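Rebuilding an MSA on the fly from per-member rows could look roughly like this. It assumes each grpmemberprop value is one member's gapped alignment row, that all rows share the same length, and that '-' or '.' marks a gap; the function name and the column-trimming option are illustrative additions, not part of the module.

```python
from typing import Dict, Iterable, Optional

GAP_CHARS = "-."  # assumed gap symbols in the stored rows


def assemble_msa(aligned: Dict[str, str],
                 members: Optional[Iterable[str]] = None,
                 drop_empty_columns: bool = True) -> str:
    """Return a FASTA-formatted MSA built from per-member alignment rows.

    `aligned` maps a member name to its gapped row (one stored value per member).
    `members` optionally selects a subset of members to render.
    """
    names = sorted(members if members is not None else aligned)
    rows = [aligned[name] for name in names]
    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError(f"alignment rows differ in length: {sorted(lengths)}")
    if drop_empty_columns and rows:
        # Keep only columns where at least one selected member has a residue.
        keep = [i for i in range(len(rows[0]))
                if any(row[i] not in GAP_CHARS for row in rows)]
        rows = ["".join(row[i] for i in keep) for row in rows]
    return "".join(f">{name}\n{row}\n" for name, row in zip(names, rows))


alignment_rows = {"protB": "MK-LV", "protA": "MKQ-V", "protC": "M--LV"}
print(assemble_msa(alignment_rows, members=["protB", "protC"]), end="")
# >protB
# MKLV
# >protC
# M-LV
```

Storing rows separately makes it cheap to render a subset of members; dropping columns that are gaps in every selected row keeps such a subset compact.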
We compute protein families from a set of well-annotated proteomes, and as we add new proteomes to our set, we compute HMM scores against that set of families to get the best family assignment. So we want to have all the precomputed scores in the database. That necessitated the analysisfeaturegrp table.
Joe
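Picking the best family assignment from precomputed scores reduces to a per-protein maximum. The sketch below assumes rows of (protein, family, score) read from analysisfeaturegrp, that a higher score is a better match (as with HMMER bit scores), and a hypothetical min_score cutoff; ties keep the first family seen.

```python
from typing import Dict, Iterable, Tuple


def best_family(scores: Iterable[Tuple[str, str, float]],
                min_score: float = 0.0) -> Dict[str, Tuple[str, float]]:
    """Map each protein to its highest-scoring family at or above min_score."""
    best: Dict[str, Tuple[str, float]] = {}
    for protein, family, score in scores:
        if score < min_score:
            continue  # too weak to count as an assignment
        current = best.get(protein)
        # Strict '>' keeps the first family seen when scores tie.
        if current is None or score > current[1]:
            best[protein] = (family, score)
    return best


scored = [("p1", "famA", 120.0), ("p1", "famB", 340.5),
          ("p2", "famA", 15.0), ("p3", "famC", 80.0)]
print(best_family(scored, min_score=20.0))
# {'p1': ('famB', 340.5), 'p3': ('famC', 80.0)}
```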
@astralarya we do miss you! One question for you: it was my understanding that the Group module had performance challenges or was overly complicated, and that's why it didn't move forward? I can't remember. I'm willing to help push it along a bit if we can reach a final consensus on it. Because I thought there were issues with the current module, I was thinking of the following as a simplified version of the group module: Tables:
These tables follow the normal Chado schema for similar tables and would be very similar to the […]. I'm not sure I understand the use of the […].
@spficklin For example, you can use the Pfam dbxref to associate all instances of a protein using feature_dbxref, and then use dbxref_relationship to encode hierarchies. Perhaps there is something in your use case that necessitates more functionality, akin to the group module proposal. I agree that a simpler implementation is the right direction here.
@JoeCarlson I think it may make more sense to develop a first-class representation for alignments in general in Chado, as opposed to the general grouping module I originally proposed. The central problem I see is that the feature table requires an organism_id, and feature_relationship is unsuitable for elegantly specifying an MSA. Here, a feature_group of some sort makes a lot of sense.
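The dbxref-only approach can be sketched with a recursive query: proteins are tied to a family accession through feature_dbxref, and dbxref_relationship rows say which families belong to a larger grouping. Table layouts here are simplified assumptions (in particular the subject_id/object_id direction), accessions such as FAM:A are hypothetical, and SQLite stands in for PostgreSQL, which supports the same WITH RECURSIVE syntax.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbxref  (dbxref_id INTEGER PRIMARY KEY, accession TEXT UNIQUE NOT NULL);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE feature_dbxref (feature_id INTEGER REFERENCES feature(feature_id),
                             dbxref_id  INTEGER REFERENCES dbxref(dbxref_id));
-- Assumed direction: subject_id is part of object_id.
CREATE TABLE dbxref_relationship (subject_id INTEGER REFERENCES dbxref(dbxref_id),
                                  object_id  INTEGER REFERENCES dbxref(dbxref_id));
""")
conn.executemany("INSERT INTO dbxref VALUES (?, ?)",
                 [(1, "CLAN:X"), (2, "FAM:A"), (3, "FAM:B"), (4, "FAM:C")])
conn.executemany("INSERT INTO feature VALUES (?, ?)",
                 [(10, "prot1"), (11, "prot2"), (12, "prot3")])
conn.executemany("INSERT INTO feature_dbxref VALUES (?, ?)",
                 [(10, 2), (11, 3), (12, 4)])
# FAM:A and FAM:B are part of CLAN:X; FAM:C is not.
conn.executemany("INSERT INTO dbxref_relationship VALUES (?, ?)", [(2, 1), (3, 1)])


def features_under(conn: sqlite3.Connection, accession: str) -> List[str]:
    """Return features attached to `accession` or to anything nested below it."""
    query = """
    WITH RECURSIVE under(dbxref_id) AS (
        SELECT dbxref_id FROM dbxref WHERE accession = ?
        UNION
        SELECT r.subject_id FROM dbxref_relationship r
        JOIN under u ON r.object_id = u.dbxref_id
    )
    SELECT DISTINCT f.uniquename
    FROM under u
    JOIN feature_dbxref fd ON fd.dbxref_id = u.dbxref_id
    JOIN feature f ON f.feature_id = fd.feature_id
    ORDER BY f.uniquename
    """
    return [row[0] for row in conn.execute(query, (accession,))]


print(features_under(conn, "CLAN:X"))  # ['prot1', 'prot2']
print(features_under(conn, "FAM:C"))   # ['prot3']
```

Using UNION rather than UNION ALL in the recursive step discards rows already seen, so an accidental cycle in dbxref_relationship cannot make the query loop forever.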
From the Google Doc:
The group schema will define groups of things.
Here is the group module discussion.