group module #29

bradfordcondon · 2018-01-10T17:48:35Z

From the Google doc:
The group schema will define groups of things.
Here is the group module discussion.

As per the discussion with Scott Cain, Stephen Ficklin, and Lacey Sanderson, a proposal was made to require the following before new modules (including the group module) are added:

- Use cases must be properly defined, with enough detail that anyone can understand how the data will be stored in each table.
- A full schema DDL must be available.
- Documentation for the module must be ready to be published on the GMOD wiki, including, at a minimum, HTML tables describing the module.

bradfordcondon · 2018-01-10T17:51:02Z

Mara Kim has started a group module.

spficklin · 2018-03-15T05:58:50Z

It seems the group module has been deprecated. I'm marking this closed.

bradfordcondon · 2018-05-04T17:12:16Z

I'm reopening this issue because our recent discussion found that several groups are still using this module.

JoeCarlson · 2018-08-06T16:01:05Z

I am heavily using the group module for (1) protein families, (2) orthologs, and (3) metabolic pathways. During the call a couple of months ago, others mentioned that they are treating families as dbxrefs and denoting membership with a feature_dbxref. But we organize our families into different nodes of a phylogenetic tree, so we want to have groups of families at the chlorophyte node, at the grass node, and so on. So for us, the group module is the way to go.

My question about the existing module is whether there is a use case for the grpmember table in addition to the feature_grpmember and organism_grpmember tables. Could this be simplified by using feature_grp and organism_grp tables that point directly from the grp to the feature or organism?

I have made one extension to the group module by adding analysisfeaturegrp. Like analysisfeature, it captures the result of an analysis, but here the result relates an individual feature to a group. I'm using it for HMM scores of proteins against a protein family.
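A minimal sketch of how a table like this might be used to pick each protein's best-scoring family. The analysisfeaturegrp layout below (analysis_id, feature_id, grp_id, rawscore) is a hypothetical one modeled loosely on Chado's analysisfeature; the actual column list of this extension is not given in this thread. Python's built-in sqlite3 stands in for PostgreSQL purely so the example runs standalone, and grp and feature are cut down to two columns.

```python
import sqlite3

# Hypothetical, cut-down tables for illustration only; the real Chado
# grp and feature tables have more columns and live in PostgreSQL.
SCHEMA = """
CREATE TABLE grp (
    grp_id     INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE
);
CREATE TABLE analysisfeaturegrp (
    analysisfeaturegrp_id INTEGER PRIMARY KEY,
    analysis_id INTEGER NOT NULL,
    feature_id  INTEGER NOT NULL REFERENCES feature (feature_id),
    grp_id      INTEGER NOT NULL REFERENCES grp (grp_id),
    rawscore    REAL,  -- e.g. an HMM score of the protein against the family
    UNIQUE (analysis_id, feature_id, grp_id)
);
"""


def best_family(conn, analysis_id):
    """Map each protein to its highest-scoring family for one analysis."""
    rows = conn.execute(
        """
        SELECT f.uniquename, g.uniquename, a.rawscore
        FROM analysisfeaturegrp a
        JOIN feature f ON f.feature_id = a.feature_id
        JOIN grp g ON g.grp_id = a.grp_id
        WHERE a.analysis_id = ?
        ORDER BY f.uniquename, a.rawscore DESC
        """,
        (analysis_id,),
    ).fetchall()
    best = {}
    for protein, family, _score in rows:
        # Rows are sorted by score descending within each protein,
        # so the first family seen for a protein is its best hit.
        best.setdefault(protein, family)
    return best


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO grp VALUES (?, ?)", [(1, "fam_A"), (2, "fam_B")])
conn.executemany("INSERT INTO feature VALUES (?, ?)", [(10, "prot1"), (11, "prot2")])
conn.executemany(
    "INSERT INTO analysisfeaturegrp (analysis_id, feature_id, grp_id, rawscore) "
    "VALUES (?, ?, ?, ?)",
    [(1, 10, 1, 250.0), (1, 10, 2, 40.5), (1, 11, 1, 12.0), (1, 11, 2, 310.2)],
)
print(best_family(conn, 1))  # {'prot1': 'fam_A', 'prot2': 'fam_B'}
```

Keeping every precomputed score (rather than only the winning assignment) lets the best-family choice be recomputed later, for example with a different score threshold.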

scottcain · 2021-04-13T23:17:29Z

What are the chances @astralarya is still working with Chado/Tripal? I'm curious how her use of the group module is going for them. It was quite some time ago that she created this page: http://www.gmod.org/wiki/Chado_Group_Module

spficklin · 2021-04-14T00:39:42Z

@scottcain, interesting that you bring this up. We've had some recent discussions about this module for Tripal specifically to show protein families. So, there is interest to use the module, or at least a subset of it. We were thinking of proposing a much simpler version of this module so that we could get some tables into an official release of Chado for this.

JoeCarlson · 2021-04-14T02:26:44Z

We've been using the group module for protein families for a while now. A grp is a family. It has a dbxref that points to a common clade (we have families grouped by different taxonomic nodes), and we attach the multiple sequence alignments (MSAs) to the grpmembers through grpmemberprop; the HMM is a grpprop. The one extension I have added is a table, AnalysisFeatureGrp: it holds the HMM score of a protein (from a proteome that was not included in the family build) against the family.


astralarya · 2021-04-14T15:17:16Z

Hello everyone! I haven't been working with a Chado database for many years at this point, but I am happy to see my work still being utilized (at least to some extent) even today.

From my own usage of the group module, I found that for simple cases the *_dbxref tables were sufficient to associate entities to groups. From there, it was also possible to use cvterm_dbxref to encode hierarchies and other complex relationships. I think any official implementation of the group module should probably use these semantics as much as possible.

@spficklin I originally designed the group module with the thinking to use it for protein families but then went with the simpler approach detailed above. I'm curious to learn more about which aspects you are finding useful.

@JoeCarlson presents a very interesting use case from the perspective of multiple sequence alignments. This does sound like something that would necessitate the more abstract groups I was thinking about when I wrote up the original proposal. I would definitely like to learn more about that usage.

JoeCarlson · 2021-04-14T18:30:57Z


If you just want to describe a group of proteins, then attaching a dbxref to them will certainly do the trick, and you can use dbxref_relationship records to describe how the different families are part of one clade. But we wanted to store a bit more. Each protein's row of the multiple alignment is stored as a single alignment string in the grpmemberprop table. (We had been storing the full MSA as a grpprop, but the MSAs were getting too big to handle as Java strings.) We generate the MSAs on the fly by extracting all the individual alignment strings. HMM matrices are stored in text format as grpprops. We compute protein families from a set of well-annotated proteomes, and as we add new proteomes to our set, we compute HMM scores against that set of families to get the best family assignment. So we want to have all the precomputed scores in the db, which necessitated the analysisfeaturegrp table.
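The on-the-fly MSA reconstruction described above can be sketched roughly as follows. It assumes each grpmember's grpmemberprop value holds that protein's gapped alignment row and that the rows for one grp have already been fetched as (member name, aligned string) pairs; the function name and the FASTA output format are illustrative choices, not part of the pipeline described in this thread.

```python
from typing import Iterable, Tuple


def assemble_msa(rows: Iterable[Tuple[str, str]]) -> str:
    """Rebuild a FASTA-formatted MSA from per-member aligned rows.

    `rows` is assumed to be (member name, gapped alignment string) pairs,
    e.g. the grpmemberprop values for one grp (hypothetical layout).
    """
    rows = list(rows)
    if not rows:
        return ""
    # Every row of a valid MSA spans the same number of alignment columns,
    # so differing lengths indicate a truncated or mismatched row.
    lengths = {len(aligned) for _, aligned in rows}
    if len(lengths) != 1:
        raise ValueError(f"aligned rows differ in length: {sorted(lengths)}")
    return "".join(f">{name}\n{aligned}\n" for name, aligned in rows)


msa = assemble_msa([("prot1", "MK-LV"), ("prot2", "MKQLV")])
print(msa, end="")
```

Storing one alignment row per member keeps each stored value small, which is the motivation given above for moving away from a single whole-MSA grpprop; the length check is a cheap guard against a member whose row was written from a different alignment run.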


spficklin · 2021-04-14T18:43:19Z

@astralarya we do miss you! One question for you: it was my understanding that the group module had performance challenges or was overly complicated, and that's why it didn't move forward. Is that right? I can't remember. I'm willing to help push it along a bit if we can get a final consensus on it.

Because I thought there were issues with the current module, I was thinking of the following simplified version of the group module:

Tables:

- group
- group_feature, group_stock, group_organism, etc.
- group_pub
- groupprop
- group_dbxref
- group_relationship

These tables follow the usual Chado conventions for similar tables and would be very similar to the stock module. This gets rid of the grpmember table and the <base table>_grpmember linker tables.
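A rough sketch of what DDL for this simplified layout might look like, written with Python's sqlite3 so it runs standalone. Column names follow the usual Chado patterns (name/uniquename/type_id, subject_id/object_id, type_id/value/rank) but are assumptions, as are the cut-down cvterm and feature tables; group_stock, group_organism, group_pub and group_dbxref would follow the same linker pattern as group_feature. One practical point: GROUP is a reserved word in SQL, so a table literally named group has to be quoted everywhere it is referenced.

```python
import sqlite3

# Hypothetical DDL for the simplified group module proposed above.
# cvterm and feature are cut down to the columns this example needs.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);

-- "group" is an SQL reserved word, so the identifier must be quoted.
CREATE TABLE "group" (
    group_id   INTEGER PRIMARY KEY,
    name       TEXT,
    uniquename TEXT NOT NULL UNIQUE,
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);

-- Direct linker table: no intermediate grpmember row is needed.
CREATE TABLE group_feature (
    group_feature_id INTEGER PRIMARY KEY,
    group_id   INTEGER NOT NULL REFERENCES "group" (group_id),
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
    UNIQUE (group_id, feature_id)
);

CREATE TABLE groupprop (
    groupprop_id INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL REFERENCES "group" (group_id),
    type_id  INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value    TEXT,
    rank     INTEGER NOT NULL DEFAULT 0,
    UNIQUE (group_id, type_id, rank)
);

CREATE TABLE group_relationship (
    group_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES "group" (group_id),
    object_id  INTEGER NOT NULL REFERENCES "group" (group_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    rank       INTEGER NOT NULL DEFAULT 0,
    UNIQUE (subject_id, object_id, type_id, rank)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(SCHEMA)

# Example data: one family that is part_of one clade-level group.
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "protein_family"), (2, "clade"), (3, "part_of")])
conn.executemany('INSERT INTO "group" VALUES (?, ?, ?, ?)',
                 [(1, "Family 1", "fam1", 1), (2, "Grasses", "clade_grass", 2)])
conn.executemany("INSERT INTO feature VALUES (?, ?)", [(10, "prot1"), (11, "prot2")])
conn.executemany("INSERT INTO group_feature (group_id, feature_id) VALUES (?, ?)",
                 [(1, 10), (1, 11)])
conn.execute("INSERT INTO group_relationship (subject_id, object_id, type_id) "
             "VALUES (1, 2, 3)")

# Members of the family, reached directly through the linker table.
members = [row[0] for row in conn.execute(
    """SELECT f.uniquename FROM group_feature gf
       JOIN feature f ON f.feature_id = gf.feature_id
       WHERE gf.group_id = ? ORDER BY f.uniquename""", (1,))]

# The clade-level group that the family belongs to.
parent = conn.execute(
    """SELECT o.uniquename FROM group_relationship r
       JOIN "group" o ON o.group_id = r.object_id
       WHERE r.subject_id = ?""", (1,)).fetchone()[0]

print(members, parent)
```

Membership becomes a single join through group_feature, and family-to-clade nesting is expressed with group_relationship rather than through dbxrefs. Whether to keep the reserved-word name or use something like grp is a separate naming decision.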

I'm not sure I understand the use of the dbxref table to form groups. My understanding is that the dbxref table is meant to link to external references, but maybe I'm misunderstanding the suggestion, @JoeCarlson and @astralarya.

astralarya · 2021-04-15T21:10:46Z

@spficklin For example, you can use a Pfam dbxref to associate all instances of a protein through feature_dbxref. Then you could use dbxref_relationship to encode hierarchies. Perhaps there is something in your use case that necessitates more functionality akin to the group module proposal. I agree that a simpler implementation is the right direction here.
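A minimal sketch of the dbxref-based alternative, using Python's sqlite3 so it runs standalone. It assumes cut-down dbxref, feature and feature_dbxref tables plus a dbxref_relationship table with subject_id/object_id columns; that table's layout here is an assumption, and the accession strings are made-up placeholders. A recursive query walks from a clade-level dbxref down through the family dbxrefs nested under it and collects every linked feature.

```python
import sqlite3

# Cut-down, hypothetical tables; real Chado dbxref rows also carry db_id
# and version, and the dbxref_relationship layout is assumed here.
SCHEMA = """
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, accession TEXT NOT NULL UNIQUE);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE feature_dbxref (
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
    dbxref_id  INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    PRIMARY KEY (feature_id, dbxref_id)
);
-- Read each row as: subject (e.g. a family) is nested under object (e.g. a clade).
CREATE TABLE dbxref_relationship (
    subject_id INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    object_id  INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    PRIMARY KEY (subject_id, object_id)
);
"""


def features_under(conn, accession):
    """Features linked to `accession` or to any dbxref nested beneath it."""
    query = """
        WITH RECURSIVE sub(dbxref_id) AS (
            SELECT dbxref_id FROM dbxref WHERE accession = ?
            UNION  -- UNION (not UNION ALL) also stops the walk on cycles
            SELECT r.subject_id
            FROM dbxref_relationship r JOIN sub ON r.object_id = sub.dbxref_id
        )
        SELECT DISTINCT f.uniquename
        FROM sub
        JOIN feature_dbxref fd ON fd.dbxref_id = sub.dbxref_id
        JOIN feature f ON f.feature_id = fd.feature_id
        ORDER BY f.uniquename
    """
    return [row[0] for row in conn.execute(query, (accession,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dbxref VALUES (?, ?)",
                 [(1, "fam:0001"), (2, "fam:0002"), (3, "clade:grass")])
conn.executemany("INSERT INTO dbxref_relationship VALUES (?, ?)", [(1, 3), (2, 3)])
conn.executemany("INSERT INTO feature VALUES (?, ?)",
                 [(10, "prot1"), (11, "prot2"), (12, "prot3")])
conn.executemany("INSERT INTO feature_dbxref VALUES (?, ?)", [(10, 1), (11, 2), (12, 1)])

print(features_under(conn, "clade:grass"))  # ['prot1', 'prot2', 'prot3']
print(features_under(conn, "fam:0001"))     # ['prot1', 'prot3']
```

This covers simple membership and hierarchy without any new tables. Per-member data such as alignment rows or HMM scores has no natural home here, which is the gap Joe's use case runs into.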

@JoeCarlson I think it might make more sense to develop a first-class representation for alignments in general in Chado, as opposed to the general grouping module that I originally proposed. The central problem I see is that the feature table requires an organism_id, and feature_relationship is unsuitable for elegantly specifying an MSA. Here, a feature_group of some sort makes a lot of sense.

scottcain added the Chado 1.4 Suggestion label Jan 10, 2018

bradfordcondon mentioned this issue Jan 10, 2018 in astralarya/Chado-group-module#1, "Integrating this module into Chado v1.4" (open)

spficklin closed this as completed Mar 15, 2018

bradfordcondon reopened this May 4, 2018
