group module #29

Open
bradfordcondon opened this issue Jan 10, 2018 · 11 comments
Comments

@bradfordcondon
Contributor

bradfordcondon commented Jan 10, 2018

From the Google Doc:
The group schema will define groups of things.
Here is the group module discussion.

As per the discussion with Scott Cain, Stephen Ficklin, and Lacey Sanderson, a proposal was made to require the following before any new module (including the group module) is added:

  • Use cases must be defined in enough detail that anyone can understand how the data will be stored in each table.
  • A full schema DDL must be available.
  • Documentation for the module must be ready to publish on the GMOD wiki, including, at a minimum, HTML tables describing the module.

@bradfordcondon
Contributor Author

Mara Kim has started a group module

@spficklin
Contributor

spficklin commented Mar 15, 2018

It seems the Group module has been deprecated. I'm marking closed.

@bradfordcondon
Contributor Author

bradfordcondon commented May 4, 2018

I'm reopening this issue because our recent discussion found that several groups are still using this module.

@bradfordcondon bradfordcondon reopened this May 4, 2018
@JoeCarlson

I am heavily using the group module for 1) protein families, 2) orthologs, and 3) metabolic pathways. During the call a couple of months ago, others mentioned that they are treating families as dbxrefs and denoting membership with a feature_dbxref. But we organize our families into different nodes of a phylogenetic tree, so we want to have groups of families at the chlorophyte node, at the grass node, and so on. So for us the group module is the way to go.

My question about the existing module is whether there is a use case for the grpmember table in addition to the feature_grpmember and organism_grpmember. Could this be simplified to using feature_grp and organism_grp to directly point from the grp to the feature or organism?
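To make the question concrete, here is a minimal sketch of the two layouts using Python's built-in sqlite3 module. The column names, the feature_grp table, and the sample rows are assumptions for illustration only; they are not taken from the module's DDL.

```python
import sqlite3

# Assumed, heavily simplified tables; not the module's actual DDL.
SCHEMA = """
CREATE TABLE grp (grp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);

-- Layout A: membership goes through an intermediate grpmember row.
CREATE TABLE grpmember (
    grpmember_id INTEGER PRIMARY KEY,
    grp_id INTEGER NOT NULL REFERENCES grp (grp_id)
);
CREATE TABLE feature_grpmember (
    feature_grpmember_id INTEGER PRIMARY KEY,
    grpmember_id INTEGER NOT NULL REFERENCES grpmember (grpmember_id),
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id)
);

-- Layout B (hypothetical): the linker points straight from grp to feature.
CREATE TABLE feature_grp (
    feature_grp_id INTEGER PRIMARY KEY,
    grp_id INTEGER NOT NULL REFERENCES grp (grp_id),
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id)
);
"""


def build_demo():
    """Create an in-memory database holding the same membership in both layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO grp VALUES (1, 'family_1')")
    conn.executemany("INSERT INTO feature VALUES (?, ?)",
                     [(1, "protein_A"), (2, "protein_B"), (3, "protein_C")])
    conn.executemany("INSERT INTO grpmember VALUES (?, ?)", [(100, 1), (101, 1)])
    conn.executemany(
        "INSERT INTO feature_grpmember (grpmember_id, feature_id) VALUES (?, ?)",
        [(100, 1), (101, 2)])
    conn.executemany(
        "INSERT INTO feature_grp (grp_id, feature_id) VALUES (?, ?)",
        [(1, 1), (1, 2)])
    conn.commit()
    return conn


def members_via_grpmember(conn, grp_name):
    """Layout A: three joins to get from a group to its features."""
    rows = conn.execute(
        """SELECT f.uniquename
           FROM grp g
           JOIN grpmember gm ON gm.grp_id = g.grp_id
           JOIN feature_grpmember fgm ON fgm.grpmember_id = gm.grpmember_id
           JOIN feature f ON f.feature_id = fgm.feature_id
           WHERE g.name = ?
           ORDER BY f.uniquename""", (grp_name,))
    return [r[0] for r in rows]


def members_direct(conn, grp_name):
    """Layout B: two joins to get from a group to its features."""
    rows = conn.execute(
        """SELECT f.uniquename
           FROM grp g
           JOIN feature_grp fg ON fg.grp_id = g.grp_id
           JOIN feature f ON f.feature_id = fg.feature_id
           WHERE g.name = ?
           ORDER BY f.uniquename""", (grp_name,))
    return [r[0] for r in rows]
```

In this toy setup both functions return the same members for a group; the direct layout just needs one fewer join and one fewer table per base type. Whether grpmember carries information that would be lost this way is exactly the open question.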

I have made one extension to the group module by adding analysisfeaturegrp. This is similar to analysisfeature in capturing the results of an analysis of an individual feature to a group. I'm using it for hmmscores of proteins to a protein family.

@scottcain
Member

What are the chances @astralarya is still working with Chado/Tripal? I'm curious how her use of the group module is going for them. It was quite some time ago that she created this page: http://www.gmod.org/wiki/Chado_Group_Module

@spficklin
Contributor

spficklin commented Apr 14, 2021

@scottcain interesting you bring this up. We've had some recent discussions about this module for Tripal specifically to show protein families. So, there is interest to use the module, or at least a subset of it. We were thinking of proposing a much simpler version of this module so that we could get some tables into an official release of Chado for this.

@JoeCarlson

JoeCarlson commented Apr 14, 2021 via email

@astralarya

Hello everyone! I haven't been working with a Chado database for many years at this point, but I am happy to see my work still being utilized (at least to some extent) even today.

From my own usage of the group module, I found that for simple cases the *_dbxref tables were sufficient to associate entities to groups. From there, it was also possible to use cvterm_dbxref to encode hierarchies and other complex relationships. I think any official implementation of the group module should probably use these semantics as much as possible.

@spficklin I originally designed the group module intending to use it for protein families, but then went with the simpler approach detailed above. I'm curious to learn more about which aspects you are finding useful.

@JoeCarlson presents a very interesting use case from the perspective of multiple sequence alignments. This does sound like something that would necessitate the more abstract groups I was thinking about when I wrote up the original proposal. I would definitely like to learn more about that usage.

@JoeCarlson

JoeCarlson commented Apr 14, 2021 via email

@spficklin
Contributor

@astralarya we do miss you! One question for you: it was my understanding that the Group module had performance challenges or was overly complicated, and that's why it didn't move forward? I can't remember. I'm willing to help push it along a bit if we can get a final consensus on it.

Because I thought there were issues with the current module, I was thinking the following as a simplified version of the group module:

Tables:

  • group
  • group_feature, group_stock, group_organism, etc.
  • group_pub
  • groupprop
  • group_dbxref
  • group_relationship

These tables follow the normal Chado schema for similar tables and would be very similar to the stock module. This gets rid of the grpmember table and the <base table>_grpmember linker tables.
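As a rough sketch of what a subset of these tables might look like, the following uses Python's built-in sqlite3 module. Only the table names come from the list above; every column name is an assumption modeled on common Chado conventions, feature is cut down to two columns, and the sample data is made up. Note that "group" has to be quoted because GROUP is a reserved word in SQL.

```python
import sqlite3

# Hypothetical DDL: table names follow the proposed list; all column names
# are assumptions, not an agreed schema.
DDL = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL
);
CREATE TABLE "group" (
    group_id   INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    name       TEXT,
    type_id    INTEGER  -- would reference cvterm in a full Chado schema
);
CREATE TABLE group_feature (
    group_feature_id INTEGER PRIMARY KEY,
    group_id   INTEGER NOT NULL REFERENCES "group" (group_id),
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
    UNIQUE (group_id, feature_id)
);
CREATE TABLE group_relationship (
    group_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES "group" (group_id),
    object_id  INTEGER NOT NULL REFERENCES "group" (group_id),
    type_id    INTEGER,
    UNIQUE (subject_id, object_id, type_id)
);
"""


def build_demo():
    """Create an in-memory database with one family nested under one tree node."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO feature VALUES (?, ?)",
                     [(1, "protein_A"), (2, "protein_B"), (3, "protein_C")])
    conn.executemany(
        'INSERT INTO "group" (group_id, uniquename, name) VALUES (?, ?, ?)',
        [(10, "family_1", "Example family"), (20, "node_grass", "Grass node")])
    conn.executemany(
        "INSERT INTO group_feature (group_id, feature_id) VALUES (?, ?)",
        [(10, 1), (10, 2)])
    # family_1 (subject) sits under the grass-node group (object).
    conn.execute(
        "INSERT INTO group_relationship (subject_id, object_id) VALUES (10, 20)")
    conn.commit()
    return conn


def group_members(conn, group_uniquename):
    """Return the uniquenames of features linked directly to a group."""
    rows = conn.execute(
        """SELECT f.uniquename
           FROM "group" g
           JOIN group_feature gf ON gf.group_id = g.group_id
           JOIN feature f ON f.feature_id = gf.feature_id
           WHERE g.uniquename = ?
           ORDER BY f.uniquename""", (group_uniquename,))
    return [r[0] for r in rows]


def groups_under(conn, parent_uniquename):
    """Return the uniquenames of groups that are subjects of the given group."""
    rows = conn.execute(
        """SELECT child.uniquename
           FROM group_relationship gr
           JOIN "group" child ON child.group_id = gr.subject_id
           JOIN "group" parent ON parent.group_id = gr.object_id
           WHERE parent.uniquename = ?
           ORDER BY child.uniquename""", (parent_uniquename,))
    return [r[0] for r in rows]
```

Here group_relationship is what would let families be collected under a node of a phylogenetic tree, while group_feature replaces the grpmember indirection with a single linker row.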

I'm not sure I understand the use of the dbxref table to form groups. It's my understanding that the dbxref table is meant to link to external references. But maybe I'm misunderstanding the suggestion, @JoeCarlson and @astralarya.

@astralarya

astralarya commented Apr 15, 2021

@spficklin You can use, for example, the Pfam dbxref to associate all instances of a protein using feature_dbxref. Then you could use dbxref_relationship to encode hierarchies. Perhaps there is something in your use case that necessitates more functionality akin to the group module proposal. I agree that a simpler implementation is the right direction here.
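A minimal sketch of the feature_dbxref approach, using Python's built-in sqlite3 module. The tables are heavily simplified: in Chado, dbxref points at a db row through db_id, and here an assumed db_name column stands in for that. The accessions and feature names are placeholders, not real Pfam entries.

```python
import sqlite3

# Simplified stand-ins for Chado's dbxref, feature, and feature_dbxref tables.
# db_name is an assumption replacing the db_id foreign key.
SCHEMA = """
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    db_name   TEXT NOT NULL,
    accession TEXT NOT NULL,
    UNIQUE (db_name, accession)
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE
);
CREATE TABLE feature_dbxref (
    feature_dbxref_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
    dbxref_id  INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    UNIQUE (feature_id, dbxref_id)
);
"""


def build_demo():
    """Create an in-memory database with two placeholder families."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dbxref VALUES (?, ?, ?)",
                     [(1, "Pfam", "PF_EXAMPLE_1"), (2, "Pfam", "PF_EXAMPLE_2")])
    conn.executemany("INSERT INTO feature VALUES (?, ?)",
                     [(1, "protein_A"), (2, "protein_B"), (3, "protein_C")])
    conn.executemany(
        "INSERT INTO feature_dbxref (feature_id, dbxref_id) VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 2)])
    conn.commit()
    return conn


def family_members(conn, db_name, accession):
    """Return uniquenames of all features that share the given dbxref."""
    rows = conn.execute(
        """SELECT f.uniquename
           FROM dbxref x
           JOIN feature_dbxref fx ON fx.dbxref_id = x.dbxref_id
           JOIN feature f ON f.feature_id = fx.feature_id
           WHERE x.db_name = ? AND x.accession = ?
           ORDER BY f.uniquename""", (db_name, accession))
    return [r[0] for r in rows]
```

In this layout the dbxref row itself acts as the group, so no dedicated group table is needed for flat membership; hierarchies would need an additional relationship table on top.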

@JoeCarlson I think that maybe it would make more sense to develop a more first class representation for alignments in general for Chado, as opposed to the original general grouping module that I proposed. The central problem that I see is that the feature table requires an organism_id and feature_relationship is unsuitable for elegantly specifying an MSA. Here, a feature_group of some sort makes a lot of sense.
