annotate genes split across scaffolds #51

Closed
13 of 17 tasks
nathandunn opened this issue Oct 22, 2014 · 6 comments

@nathandunn
Contributor

  • allow annotation of features when second contiguous projection
  • annotation when second contiguous projection can be projected off-site (or not at all)
  • test annotation when cross annotation (not folded)
  • test annotation when folded (and with padding)
  • getFeatures should project properly
    • projecting in ref seq is failing
    • fix offset no-exon projection when annotating locally (and viewing without projections) if second contiguous one
  • projections for the latter half lose strand
  • projecting with "padding" fails in HTMLFeatures (loses latter half)
  • handle proper gene name
  • handle proper transcript name
  • verify doing an annotation correctly
  • fix naming schema for cross-scaffold scheming
  • verify doing an annotation across scaffolds
  • feature event should publish back properly:
    • make sure that the "Bookmark" is being sent back over the wire
    • also subscribe to any sequence ids, as well.

Several of our groups are struggling with annotating genes that are split across scaffolds. I imagine you guys have discussed the feasibility of this before… This isn’t really a feature request, but I wanted to put it on your radar as something that annotators of very fragmented assemblies are dealing with. We looked a bit into the gff3 and chado support for gene features split across reference sequences - it appears that gff3 supports having one feature with a single ID (without having a parent feature grouping them), but the chado bulk loader doesn’t. There also doesn’t appear to be a CV term in SO that describes a gene split between sequences due to assembly fragmentation – which could potentially be used as a parent feature for these situations. So, I can see where there would be issues both in figuring out the best way to visualize split genes, as well as how this should be represented in gff3 or chado.
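As a rough illustration of the GFF3 point above, here is a minimal Python sketch that scans GFF3 text for feature IDs reused on more than one reference sequence. The function name `ids_spanning_sequences` and the example records are hypothetical and only illustrate the idea. Whether downstream tools (for example the Chado bulk loader) accept such records is exactly the open question raised here.

```python
from collections import defaultdict


def ids_spanning_sequences(gff3_text):
    """Return {feature ID: [seqid, ...]} for IDs that occur on more than one seqid."""
    seqids_by_id = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GFF3 feature line
        # Column 9 holds semicolon-separated key=value attributes.
        attributes = dict(
            pair.split("=", 1) for pair in columns[8].split(";") if "=" in pair
        )
        feature_id = attributes.get("ID")
        if feature_id is None:
            continue
        if columns[0] not in seqids_by_id[feature_id]:
            seqids_by_id[feature_id].append(columns[0])
    return {fid: seqs for fid, seqs in seqids_by_id.items() if len(seqs) > 1}


# Hypothetical records: one gene ID reused on two scaffolds, one ordinary gene.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "\t".join(["scaffoldA", "example", "gene", "9000", "9950", ".", "+", ".", "ID=gene0001"]),
    "\t".join(["scaffoldB", "example", "gene", "1", "720", ".", "+", ".", "ID=gene0001"]),
    "\t".join(["scaffoldB", "example", "gene", "2000", "2500", ".", "+", ".", "ID=gene0002"]),
])

print(ids_spanning_sequences(EXAMPLE_GFF3))
```

A check like this only finds candidate split genes; it says nothing about how they should be grouped under a parent feature or which SO term would apply.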

@nathandunn nathandunn added this to the 3.0 Release - Early 2015 milestone Oct 22, 2014
@selewis

selewis commented Oct 22, 2014

Absolutely, it was one of the things written about in the proposal. It's all part of the same problem: folding bits of the genome together when needed (the same technique is to be used for intron elision, duplicated regions, etc.). It all fit under Gregg's general title of genometry. He had it in IGB, but we never got it into Apollo. I think we have some mockups of the visualization in the proposal. More next week when I'm back.

-S


@nathandunn nathandunn modified the milestones: 2.1, 2.0 Apr 6, 2015
@nathandunn
Contributor Author

The architecture is set up to support this, but almost all of the code I converted from 1.0 assumes that there will always be a single feature location per feature.

@nathandunn nathandunn modified the milestones: 2.1, 2.2 Aug 7, 2015
@nathandunn nathandunn self-assigned this Nov 18, 2015
@nathandunn
Contributor Author

The issue is that instead of a sequence we are now passing in a bookmark, along with location information expressed relative to that bookmark.

location + bookmark -> reverse-projected location + sequence

There is a lot of code relating to this in FeatureService; mostly it just needs to be generalized for location.

{"track":"{\"padding\":0, \"projection\":\"None\", \"referenceTrack\":\"Official Gene Set v3.2\", \"sequenceList\":[{\"name\":\"Group1.1\"},{\"name\":\"GroupUn87\"}], \"label\":\"Group1.1::GroupUn87\"}:-1..-1","features":[{"location":{"fmin":976735,"fmax":995721,"strand":1},"type":{"cv":{"name":"sequence"},"name":"mRNA"},"name":"GB42183-RA","children":[{"location":{"fmin":995216,"fmax":995721,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"}},{"location":{"fmin":976735,"fmax":976888,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"}},{"location":{"fmin":992139,"fmax":992559,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"}},{"location":{"fmin":992748,"fmax":993041,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"}},{"location":{"fmin":993307,"fmax":995721,"strand":1},"type":{"cv":{"name":"sequence"},"name":"exon"}},{"location":{"fmin":976735,"fmax":995216,"strand":1},"type":{"cv":{"name":"sequence"},"name":"CDS"}}]}],"operation":"add_transcript"}
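The mapping "location + bookmark -> reverse-projected location + sequence" could look roughly like the Python sketch below. This is not the FeatureService code. It assumes the bookmark lays its sequences end to end, separated by `padding`, and that the track string is the bookmark JSON followed by a `:start..end` suffix. The names `parse_track` and `reverse_project` and the scaffold lengths in `EXAMPLE_LENGTHS` are hypothetical, since the real lengths are not given in this thread.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    sequence: str
    fmin: int
    fmax: int
    strand: int


def parse_track(track):
    """Split '<bookmark JSON>:<start>..<end>' and return the bookmark as a dict."""
    bookmark_json, _, _range = track.rpartition(":")
    return json.loads(bookmark_json)


def reverse_project(fmin, fmax, strand, bookmark, lengths):
    """Map a location in bookmark coordinates to one Location per overlapped sequence."""
    padding = bookmark.get("padding", 0)
    locations = []
    offset = 0  # start of the current sequence in bookmark coordinates
    for entry in bookmark["sequenceList"]:
        name = entry["name"]
        start, end = offset, offset + lengths[name]
        overlap_min, overlap_max = max(fmin, start), min(fmax, end)
        if overlap_min < overlap_max:
            locations.append(
                Location(name, overlap_min - offset, overlap_max - offset, strand)
            )
        offset = end + padding
    return locations


EXAMPLE_TRACK = (
    '{"padding":0, "projection":"None", "referenceTrack":"Official Gene Set v3.2", '
    '"sequenceList":[{"name":"Group1.1"},{"name":"GroupUn87"}], '
    '"label":"Group1.1::GroupUn87"}:-1..-1'
)
# Hypothetical scaffold lengths, chosen only to make the example concrete.
EXAMPLE_LENGTHS = {"Group1.1": 990000, "GroupUn87": 20000}

bookmark = parse_track(EXAMPLE_TRACK)
for location in reverse_project(976735, 995721, 1, bookmark, EXAMPLE_LENGTHS):
    print(location)
```

Under these assumed lengths, the mRNA span comes back as two locations, one on each scaffold, while an exon that lies entirely within one scaffold yields a single location.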

@nathandunn
Contributor Author

Basically, a Feature should have multiple feature locations, and of course those feature locations should be able to point to different reference sequences.

The data structures are set up correctly for this, BUT we need to support it within the code, because everywhere the assumption is that there will only ever be a single location.
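To make the single-location assumption concrete, here is a small Python sketch of a feature that holds a list of locations. It is a toy model, not the actual domain classes: the class and method names are hypothetical, and the `rank` field is an assumption used only to order the pieces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FeatureLocation:
    sequence: str
    fmin: int
    fmax: int
    strand: int
    rank: int = 0  # assumed ordering of the pieces along the feature


@dataclass
class Feature:
    name: str
    type: str
    locations: List[FeatureLocation] = field(default_factory=list)

    def single_location(self):
        """Legacy-style accessor: only valid when there is exactly one location."""
        if len(self.locations) != 1:
            raise ValueError(
                f"{self.name} has {len(self.locations)} locations, expected 1"
            )
        return self.locations[0]

    def sequences(self):
        """Reference sequences touched by this feature, in rank order, no repeats."""
        names = []
        for location in sorted(self.locations, key=lambda loc: loc.rank):
            if location.sequence not in names:
                names.append(location.sequence)
        return names

    def length(self):
        """Total length summed over all locations."""
        return sum(loc.fmax - loc.fmin for loc in self.locations)


# Hypothetical split transcript: one piece on each of two scaffolds.
split_mrna = Feature(
    name="GB42183-RA",
    type="mRNA",
    locations=[
        FeatureLocation("Group1.1", 976735, 990000, 1, rank=0),
        FeatureLocation("GroupUn87", 0, 5721, 1, rank=1),
    ],
)
print(split_mrna.sequences(), split_mrna.length())
```

Code written against something like `single_location()` fails on a split feature, which is the kind of call site that would need to be generalized to iterate over all locations.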

@nathandunn
Contributor Author

In branch: multiscaffold_feature

nathandunn added a commit that referenced this issue Nov 29, 2015
nathandunn added a commit that referenced this issue Nov 30, 2015
nathandunn added a commit that referenced this issue Nov 30, 2015
nathandunn added a commit that referenced this issue Dec 1, 2015
nathandunn added a commit that referenced this issue Dec 1, 2015
nathandunn added a commit that referenced this issue Dec 1, 2015
nathandunn added a commit that referenced this issue Dec 2, 2015
nathandunn added a commit that referenced this issue Dec 2, 2015
@nathandunn
Contributor Author

This works except for #1205 (setting the longest ORF). Finishing up there.

@nathandunn nathandunn modified the milestones: 2.1.0, 2.1.0-alpha Mar 3, 2017