Moving the i5k gene page to tripal 3 #5
Comments
I was going to use my mini devseed, but I realize that isn't a great representation of your data since it's mRNA-focused instead of gene-focused. I will therefore load the bee genome from https://i5k.nal.usda.gov/content/data-downloads ... I should probably still minify it since space is a real concern. |
OK, so let's compare this to a DEFAULT gene page. Note: I don't have any sequences appearing. This is a mistake on my end: I loaded scaffold, protein, and CDS FASTA sequences and associated the sequences with the CDS. The fields on gene want to display the mRNA-associated sequences by default :( Let me upload the mRNA sequences instead and they should appear here. Edit: even after doing so, the mRNA sequences don't appear. Hmm, I wonder if the field is bugged or not expecting how I loaded the data. Let me look into it... |
Discussion of gene page improvements: tripal/tripal#100. The transcript field is not designed to display sequence, just what's included in the field: https://github.com/tripal/tripal/blob/7.x-3.x/tripal_chado/includes/TripalFields/so__transcript/so__transcript.inc Instead, the data__sequence field should display sequences: https://github.com/tripal/tripal/blob/7.x-3.x/tripal_chado/includes/TripalFields/data__sequence/data__sequence.inc . However, per Tripal's general philosophy, the sequence field only looks at the targeted feature, not related features. So as a definite starting point, we want one or two fields that
In both cases we need to think about speed and display, as we could have several child mRNAs, each with several child CDS, proteins, etc. We might choose to limit data to, for example, JUST retrieving the mRNA, CDS, and protein details. |
I'm going to start with a "general all child feature" field, specific for gene, and go from there. The term I'll use is data:0916 from EDAM: "Gene report". https://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_0916 There's also "Nucleic acid report" (https://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_2084) if we want to make it more feature-agnostic, or "Sequence features": https://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_1255 This last one might be too heavily tied to the "feature table format", whatever that is. |
Here's my first step. I'm torn on whether the popup should just be for sequence, or whether it should have ALL extra information, including the featureloc (start, stop, strand) and the annotation definitions (right now the annotation field is just a list of annotation names). I need to load some example data so that it shows up here... I also had a "parent" field, but in this data's case all the parents were the mRNA, so I didn't include it... |
@bradfordcondon would there be a way to tie the sequence retrieval into the Tripal collections module somehow? We currently only have functionality for users to copy/paste from the popup. Ideally though they'd be able to download as fasta. Or both. |
Collections are very, very buggy right now. Unfortunately I can't recommend planning around them from an end-user perspective (they are still very useful for admin purposes). That said, it would be easy and desirable to have an "add feature to collection" button. In this case the question then becomes: a collection of WHAT, right? Which feature type sequences? We can have a separate discussion about that. But yes, reading the second part of your message over again, I'm in agreement: the popup could offer both. |
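To make the "download as FASTA" idea concrete, here is a minimal, language-agnostic sketch in Python (not Tripal code). The function name `to_fasta` and the 60-character line width are assumptions for illustration; a real field would presumably do the equivalent in PHP.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    `records` is an iterable of (name, sequence) tuples; sequence lines
    are wrapped at `width` characters (60 is an assumed default).
    """
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        # Slice the sequence into fixed-width chunks.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

The same text could back both options discussed above: shown in the popup for copy/paste, or served as a file download.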
I've added a very simple popup window for each sequence. Bug I'm aware of: that CDS has ::::: for annotations. It actually has no annotations, but it has that many unique start/stop locations. I forgot that table could have multiple entries, so I am redoing the query. I loaded the Apis set @mpoelchau provided me with. Here are the two genes: Conclusions I draw: We want to infer the sequence for each child feature if it doesn't have one directly associated (right?). It isn't clear at all that the parent table is the top one and the children table is the bottom one. Let's maybe reformat the first one to just be two columns, key and value? And add the name, uniquename, etc., to make it more clear? Additional: We need to link out to feature pages if they exist, by checking whether the entity exists. |
Rather than try to cram all of the child feature info into a single field, the approach taken will be to a) create a master index field which follows Tripal web services best practice and has all the info needed for child features, and b) have other fields check that that field has loaded. Once it has, they pull in the info they need. A draft of the master field is done; I put the code in my catch-all module for now: https://github.com/statonlab/tripal_manage_analyses/tree/gene_field It stores an object looking like this in its value:
As you can see, each node of the array has info describing that feature, and children, which is a list of feature children associated with that feature. At each node, in info, we've crammed in all the stuff we're going to want to pass along to other fields. This includes:
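A rough sketch of what "infer the sequence for each child feature" could look like, written in Python rather than the module's PHP. It assumes the child's featureloc coordinates are Chado-style interbase values (fmin inclusive, fmax exclusive, zero-based) relative to the sequence passed in, and that strand is 1 or -1. The function names and dictionary keys are hypothetical.

```python
# Complement table for reverse-strand features (IUPAC ambiguity codes
# other than N are not handled in this sketch).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def infer_residues(source_residues, fmin, fmax, strand=1):
    """Slice a feature's sequence out of the sequence it is located on."""
    sub = source_residues[fmin:fmax]
    if strand < 0:
        # Reverse-complement features on the minus strand.
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


def child_sequence(child, source_residues):
    """Prefer the child's own residues; otherwise infer them from its location."""
    if child.get("residues"):
        return child["residues"]
    return infer_residues(
        source_residues, child["fmin"], child["fmax"], child.get("strand", 1)
    )
```

A feature with several featureloc rows (like the CDS above) would need one slice per location, joined in order; that case is left out here.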
Now that the base index field is taken care of, I'll work on the childprop field. |
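For readers following along, the nested `info` / `children` layout described above can be sketched like this. It is a Python illustration only, not the field's PHP code; the row keys (`feature_id`, `parent_id`, `name`, `type`) and the function name `build_index` are hypothetical.

```python
def build_index(rows, root_id):
    """Turn flat feature rows into a nested {info, children} tree.

    Each row is a dict with at least 'feature_id' and 'parent_id'
    (None for the root). Both key names are assumptions for this sketch.
    """
    nodes = {row["feature_id"]: {"info": dict(row), "children": []} for row in rows}
    for row in rows:
        parent = nodes.get(row["parent_id"])
        if parent is not None:
            # Attach each feature under its parent node.
            parent["children"].append(nodes[row["feature_id"]])
    return nodes[root_id]


rows = [
    {"feature_id": 1, "parent_id": None, "name": "gene1", "type": "gene"},
    {"feature_id": 2, "parent_id": 1, "name": "gene1.1", "type": "mRNA"},
    {"feature_id": 3, "parent_id": 2, "name": "gene1.1.cds1", "type": "CDS"},
]
tree = build_index(rows, root_id=1)
```

Building the tree once and letting other fields read from it is what keeps the dependent fields from each re-querying the child features.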
Here's the child properties field in action: we're on a gene page (FRAEX38873_v2_000001410). Each mRNA (FRAEX38873_v2_000001410.1 & 2) has a collapsible fieldset with a table inside listing any featureprops associated with the mRNA, or any child of the mRNA, or any children of that child, e.g. FRAEX38873_v2_000001410.2.cds3. I'll follow this general design pattern for the other alternate fields: so the new annotations field, for example, will similarly be broken down into collapsible fieldsets by mRNA. We do this to keep the information manageable: we expect different isoforms of the same transcript to have many of the same annotations. |
I opened a PR into core: tripal/tripal#837. We have three fields: the base field that stores all info. It also maps out the feature:
There are some outstanding issues I'm working on. I think as far as i5k is concerned, what's really missing is a nice way of displaying all the sequence information. |
A note on collapsible fieldsets: I just merged a PR that makes them compatible with i5k's theme. |
OK, this isn't amazing, but it does work on all themes, which was the trick. We now have a sequence column. Clicking on the word sequence expands it, clicking again hides it. We use the |
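The per-mRNA grouping described above can be sketched as a recursive walk over an `info` / `children` tree. This is a Python illustration under assumed key names (`name`, `type`, and a `props` list of (property, value) pairs); the real field is PHP and its data layout may differ.

```python
def collect_props(node):
    """Gather (feature name, property, value) for a node and all its descendants."""
    found = [
        (node["info"]["name"], prop, value)
        for prop, value in node["info"].get("props", [])
    ]
    for child in node["children"]:
        found.extend(collect_props(child))
    return found


def props_by_mrna(gene_node):
    """Map each child mRNA's name to the property rows for its table."""
    return {
        child["info"]["name"]: collect_props(child)
        for child in gene_node["children"]
        if child["info"].get("type") == "mRNA"
    }
```

Each dictionary entry corresponds to one collapsible fieldset, and its list of tuples to the rows of the table inside it.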
Great discussion here but in the interim we're using the vanilla tripal gene pages. We'll resume the discussion on gene pages at a later date. |
https://i5k.nal.usda.gov/CLEC010822
vs
http://167.99.232.220/gene/185
My plan:
I'll get the demo site up so that I can link to compare screenshots, and then fill out the rest of the below...
First impressions:
General
The Overview, Sequences, and Transcripts "tabs" are set up as Tripal panes in Tripal 3. It will take some custom styling to get them looking like you want (although the horizontal layout is a default option, I think).
Overview tab
Pre-existing Chado fields:
Depends on how it's stored in the db:
Sequences
Transcripts
So CLEC010822-RA doesn't have its own page? I actually like this a lot. Something I don't like about Tripal 3 genes is that they only display select info about child features, and link out to child genes, proteins, etc. I like it much better all displayed on the gene page.
Hmm, start/stop of each subfeature: I don't know that this exists. Subfeatures get listed as relationships, and sequences get listed in their relevant fields. I can imagine this becoming a custom field.