Moving the i5k gene page to tripal 3 #5

bradfordcondon · 2018-10-30T18:24:58Z

https://i5k.nal.usda.gov/CLEC010822
vs
http://167.99.232.220/gene/185

My plan:

Demonstrate the "default" feature layout in tripal 3.
Identify what content would belong as new fields, custom modules, etc.

I'll get the demo site up so that I can link to compare screenshots and then fill out the rest of the below...

First impressions:

General

The Overview, Sequences, and Transcripts "tabs" are set up as Tripal panes in Tripal 3. It will take some custom styling to get them looking like you want (although the horizontal layout is a default option, I think).

Overview tab

Pre-existing Chado fields:

organism
Name
synonyms
location
transcripts
analysis links

Depends on how it's stored in the db:

gene ID (featureprop? uniquename?)
annotator comments (featureprops?)

Sequences

Transcripts

So CLEC010822-RA doesn't have its own page? I actually like this a lot. Something I don't like about Tripal 3 genes is that they only display select info about child features, and link out to child transcripts, proteins, etc. I like it much better all displayed on the gene page.

annotated terms:

Is this feature_cvterm? If so, it's a pre-existing field.

feature details:

Hmm, start/stop of each subfeature: I don't know that this exists. Subfeatures get listed as relationships, and sequences get listed in their relevant fields. I can imagine this becoming a custom field.

bradfordcondon · 2018-10-30T18:45:16Z

I was going to use my mini devseed, but I realize that isn't a great representation of your data since it's mRNA focused instead of gene focused.

I will therefore load the bee genome from https://i5k.nal.usda.gov/content/data-downloads ... I should probably still minify it since space is a real concern.

bradfordcondon · 2018-10-31T13:51:51Z

OK, so let's compare this to a DEFAULT gene page.

Note: I don't have any sequences appearing. This is a mistake on my end: I loaded scaffold, protein, and CDS FASTA sequences and associated the sequences with the CDS. By default, the fields on gene want to display the mRNA-associated sequences :(

Let me upload the mRNA sequences instead and they should appear here. Edit: even after doing so, the mRNA sequences don't appear. Hmm, I wonder if the field is bugged or isn't expecting how I loaded the data. Let me look into it.

http://167.99.232.220/gene/185

bradfordcondon · 2018-10-31T15:02:15Z

discussion of gene page improvements: tripal/tripal#100

The transcript field is not designed to display sequence, just what's included in the field: https://github.com/tripal/tripal/blob/7.x-3.x/tripal_chado/includes/TripalFields/so__transcript/so__transcript.inc

Instead, the data__sequence field should display sequences: https://github.com/tripal/tripal/blob/7.x-3.x/tripal_chado/includes/TripalFields/data__sequence/data__sequence.inc . However, per Tripal's general philosophy, the sequence field only looks at the targeted feature, not related features.
For example, the data__protein_sequence field displays protein sequences, but only for features that are DIRECTLY CONNECTED to said feature.

So as a definite starting point, we want one or two fields that

Aggregate all child features and display coordinates and annotations.
Aggregate all child feature sequences to make them easy to download.

In both cases we need to think about speed and display, as we could have several child mRNAs, each with several child CDS, proteins, etc. We might choose to limit data to, for example, JUST retrieving the mRNA, CDS, and protein details.
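To make the "limit data" idea concrete, here is a minimal sketch in Python (not Tripal code) of walking child relationships from a gene and keeping only a whitelist of feature types. The in-memory inputs, the function name `collect_children`, and the whitelist itself are assumptions for illustration; a real field would read these rows from Chado's feature and feature_relationship tables.

```python
from collections import deque

# Hypothetical whitelist: the thread suggests mRNA, CDS, and protein
# ("polypeptide" is the Sequence Ontology name for a protein feature).
ALLOWED_TYPES = {"mRNA", "CDS", "polypeptide"}


def collect_children(root_id, relationships, feature_types, allowed=ALLOWED_TYPES):
    """Return descendant feature_ids of root_id whose type is in `allowed`.

    relationships: iterable of (child_id, parent_id) pairs.
    feature_types: dict mapping feature_id -> type name.
    """
    # Index children by parent so each lookup is a dict access.
    children_of = {}
    for child_id, parent_id in relationships:
        children_of.setdefault(parent_id, []).append(child_id)

    found = []
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child_id in children_of.get(current, []):
            if child_id in seen:
                continue
            seen.add(child_id)
            # Keep walking through every child, but only report allowed types.
            queue.append(child_id)
            if feature_types.get(child_id) in allowed:
                found.append(child_id)
    return found


# Made-up example: gene 1 has two mRNAs; exon 4 is walked but not reported.
relationships = [(2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (7, 3)]
feature_types = {1: "gene", 2: "mRNA", 3: "mRNA", 4: "exon",
                 5: "CDS", 6: "polypeptide", 7: "CDS"}
print(collect_children(1, relationships, feature_types))
```

Building the parent-to-children index once keeps the walk linear in the number of relationships, which matters if a gene has many isoforms.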

bradfordcondon · 2018-11-01T20:06:41Z

I'm going to start with a "general all child feature" field, specific for gene, and go from there. The term I'll use is data:0916 from EDAM: "Gene report".

https://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_0916

There's also nucleic acid report (https://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_2084) if we want to make it more feature agnostic.

or Sequence features: https://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_1255

this last one might be too heavily tied to the "feature table format" whatever that is.

bradfordcondon · 2018-11-05T20:12:30Z

Here's my first step.
Next step is to provide a clickable popup element for the sequences

I'm torn on whether the popup should just be for sequence, or if it should have ALL extra information, including the featureloc (start, stop, strand) and the annotation definitions (right now the annotation field is just a list of annotation names). I need to load some example data so that it shows up here.

I also had a "parent" field but in this data's case all the parents were the mRNA so I didn't include it...

mpoelchau · 2018-11-05T20:21:28Z

@bradfordcondon would there be a way to tie the sequence retrieval into the Tripal collections module somehow?

We currently only have functionality for users to copy/paste from the popup. Ideally though they'd be able to download as fasta. Or both.

bradfordcondon · 2018-11-05T20:24:13Z

Collections are very very buggy right now. Unfortunately I can't recommend planning around them from an end user perspective (they are still very useful for admin purposes).

That said it would be easy and desirable to have an "add feature to collection" button. In this case the question then becomes collection of WHAT, right? Which feature type sequences?

We can have a separate discussion about that. But yes, reading the second part of your message over again, I'm in agreement: the popup could offer both.

bradfordcondon · 2018-11-06T16:36:34Z

I've added a very simple popup window for each sequence.

Bug I'm aware of: that CDS has ::::: for annotations. It actually has no annotations, but that many unique start/stop locations. I forgot that table could have multiple entries, so I am redoing the query.

I loaded the apis set @mpoelchau provided me with. here are the two genes:

Conclusions I draw:

We want to infer the sequence for each child feature if it doesn't have one directly associated (right?)
If yes, then that's simple for some features, and we need to be cautious for others (CDS comes to mind... any others? My biology is rusty...)
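As a rough illustration of why CDS needs more care, here is a minimal Python sketch (not Tripal code) of inferring a child's sequence from its parent's residues. It relies on Chado's featureloc convention of interbase coordinates (fmin/fmax are zero-based and half-open) with strand stored as 1 or -1. The function names and the toy sequence are hypothetical.

```python
# Translation table for complementing DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def infer_sequence(parent_residues, fmin, fmax, strand=1):
    """Simple case: one contiguous location on the parent (e.g. an exon)."""
    sub = parent_residues[fmin:fmax]
    return reverse_complement(sub) if strand == -1 else sub


def infer_spliced_sequence(parent_residues, segments, strand=1):
    """Cautious case: a CDS located as several segments on the parent.

    segments: iterable of (fmin, fmax) pairs, all on the same strand.
    Segments are joined in ascending coordinate order, then the whole
    string is reverse-complemented for minus-strand features.
    """
    joined = "".join(parent_residues[fmin:fmax] for fmin, fmax in sorted(segments))
    return reverse_complement(joined) if strand == -1 else joined


scaffold = "AAACCCGGGTTT"  # toy parent sequence
print(infer_sequence(scaffold, 3, 6))
print(infer_spliced_sequence(scaffold, [(6, 9), (0, 3)], strand=-1))
```

Taking a single fmin-to-fmax slice for a multi-segment CDS would silently include the introns, which is the main reason to treat it separately.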

The fact that the parent table is the top one, and the children table is the bottom one, isn't clear at all. Let's maybe reformat the first one to just be two columns, key and value? And add the name, uniquename, etc., to make it clearer?

Additional: We need to link out to feature pages if they exist by checking if the entity exists.

bradfordcondon · 2018-12-18T19:59:16Z

Rather than try to cram all of the child feature info into a single field, the approach taken will be to

a) create a master index field which follows tripal web services best practice and has all the info needed for child features,

b) have other fields check that that field has loaded. Once it has, pull in the info they need.

The master field has a draft done. I put the code in my catch-all module for now: https://github.com/statonlab/tripal_manage_analyses/tree/gene_field

It stores an object looking like this in its value:

[feature_id of first child] => [ info => ['array_of_info'], children => ['array_of_children keyed by feature_id']]

As you can see, each node of the array has info describing that feature, and children, which is a list of child features associated with that feature. At each node, in info, we've crammed in all the stuff we're going to want to pass along to other fields. This includes:

the sequence, for display in the sequence field
feature_cvterms, for display in the annotation field
props, for display in a new, to-be-written childprop field (since each property is its own field in Tripal 3, there isn't one field that summarizes them that we can stick our code into).
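For readers who find the one-line shape hard to picture, here is a minimal Python sketch of the same nesting: each child feature_id maps to an info entry plus a children entry with the same shape. This is not the PHP in the linked branch; the flat input rows, the key names inside info, and `build_index` are assumptions for illustration.

```python
def build_index(parent_id, features, relationships):
    """Build {feature_id: {"info": {...}, "children": {...}}} under parent_id.

    features: dict mapping feature_id -> dict of details to pass along
              (hypothetical keys: name, type, sequence, cvterms, props).
    relationships: iterable of (child_id, parent_id) pairs, assumed acyclic.
    """
    children_of = {}
    for child_id, rel_parent_id in relationships:
        children_of.setdefault(rel_parent_id, []).append(child_id)

    def node(feature_id):
        # Each node carries its own details plus its children, keyed by id.
        return {
            "info": dict(features[feature_id]),
            "children": {c: node(c) for c in children_of.get(feature_id, [])},
        }

    return {c: node(c) for c in children_of.get(parent_id, [])}


# Made-up data: gene 1 -> mRNA 2 -> CDS 3 and polypeptide 4.
features = {
    2: {"name": "gene1.1", "type": "mRNA", "sequence": "ATGAAATAA",
        "cvterms": ["example term"], "props": {"note": "example"}},
    3: {"name": "gene1.1.cds1", "type": "CDS", "sequence": "",
        "cvterms": [], "props": {}},
    4: {"name": "gene1.1.pep", "type": "polypeptide", "sequence": "MK",
        "cvterms": [], "props": {}},
}
relationships = [(2, 1), (3, 2), (4, 2)]
index = build_index(1, features, relationships)
print(sorted(index[2]["children"]))
```

Because every level has the same two keys, a dependent field can use one recursive function to read whatever it needs from info.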

Now that the base index field is taken care of, I'll work on the childprop field.

bradfordcondon · 2018-12-19T16:41:10Z

Here's the child properties field in action:

We're on a gene page (FRAEX38873_v2_000001410). Each mRNA (FRAEX38873_v2_000001410.1 and .2) has a collapsible fieldset with a table inside listing any featureprops associated with the mRNA, any child of the mRNA, or any children of that child, e.g. FRAEX38873_v2_000001410.2.cds3.
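The grouping described here can be sketched in Python as a recursive walk over the nested info/children structure, producing one table of rows per mRNA. This is only an illustration of the idea, not the field's PHP. The property names and values are invented, and `collect_props` and `props_by_mrna` are hypothetical helpers.

```python
def collect_props(node):
    """Return (feature name, property, value) rows for a node and all descendants."""
    info = node["info"]
    rows = [(info["name"], key, value) for key, value in info.get("props", {}).items()]
    for child in node["children"].values():
        rows.extend(collect_props(child))
    return rows


def props_by_mrna(index):
    """Map each top-level child (one fieldset per mRNA) to its table rows."""
    return {node["info"]["name"]: collect_props(node) for node in index.values()}


# Feature names follow the thread; ids and property values are made up.
index = {
    11: {"info": {"name": "FRAEX38873_v2_000001410.1", "props": {"note": "a"}},
         "children": {}},
    12: {"info": {"name": "FRAEX38873_v2_000001410.2", "props": {}},
         "children": {
             13: {"info": {"name": "FRAEX38873_v2_000001410.2.cds3",
                           "props": {"note": "b"}},
                  "children": {}},
         }},
}
for mrna, rows in props_by_mrna(index).items():
    print(mrna, rows)
```

An mRNA with no properties of its own still gets a table if any of its descendants carry properties, which matches the behaviour described for the .2 isoform and its CDS.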

I'll follow this general design pattern for the other alternate fields: so the new annotations field, for example, will similarly be broken down into collapsible fieldsets by mRNA. We do this to keep the information manageable-- we expect different isoforms of the same transcript to have many of the same annotations.

bradfordcondon · 2019-01-28T13:38:44Z

I opened a PR into core tripal/tripal#837

We have three fields:

the base field that stores all info. It also maps out the feature:

a feature property field (featureprop)

a feature annotation field (feature_cvterm)

There are some outstanding issues I'm working on.

I think as far as i5k is concerned, what's really missing is a nice way of displaying all the sequence information.

bradfordcondon · 2019-03-21T13:38:34Z

A note on collapsible fieldsets: I just merged a PR that makes them compatible with i5k's theme.

bradfordcondon · 2019-03-25T17:28:21Z

OK, this isn't amazing, but it does work on all themes, which was the trick. We now have a sequence column. Clicking on the word "sequence" expands it; clicking again hides it.

We use the chado_get_feature_sequences API, which in theory (but not in practice for the i5k data I've tested) should find sequences derived from mapped sequences.

mpoelchau · 2020-09-30T13:41:47Z

Great discussion here but in the interim we're using the vanilla tripal gene pages. We'll resume the discussion on gene pages at a later date.

bradfordcondon self-assigned this Oct 30, 2018

bradfordcondon added the discussion label Oct 30, 2018

bradfordcondon mentioned this issue Nov 6, 2018

GFF annotations are not where tripal expects #8

Closed

bradfordcondon added the tripal 3 migration label Mar 11, 2019

mpoelchau mentioned this issue Mar 20, 2019

gene pages don't retrieve child feature sequence #38

Closed

mpoelchau closed this as completed Sep 30, 2020
