Derives_from strictness #4

Closed
cmdcolin opened this issue Feb 21, 2019 · 4 comments

Comments

@cmdcolin

It is debatable whether this should be allowed, but the TAIR GFF has lines such as this:

Chr1  TAIR10  transposable_element_gene 433031  433819  . - . ID=AT1G02228;Note=transposable_element_gene;Name=AT1G02228;Derives_from=AT1TE01405

The feature AT1TE01405 is not in the file at all (though TAIR has a page for it: https://www.arabidopsis.org/servlets/TairObject?type=transposon&id=10102).

I guess the question is whether Derives_from strictness can be turned off (and, if so, whether the parser would skip resolving Derives_from entirely or just not throw an error).
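
To make the two behaviours concrete, here is a minimal Python sketch. It is an illustration only, not this parser's code: `resolve_derives_from`, its `strict` flag, and the dict shape used for features are all hypothetical.

```python
def resolve_derives_from(features, strict=True):
    """Link each feature to the features named in its Derives_from attribute.

    `features` is a list of dicts with an "ID" key and an optional
    "Derives_from" key holding a list of target IDs (hypothetical shape).
    Returns a dict mapping feature ID -> list of resolved source features.
    """
    by_id = {f["ID"]: f for f in features if "ID" in f}
    resolved = {}
    for f in features:
        sources = []
        for target in f.get("Derives_from", []):
            if target in by_id:
                sources.append(by_id[target])
            elif strict:
                # Strict mode: a reference to an undefined ID is an error.
                raise ValueError(
                    f"{f.get('ID')}: Derives_from target {target!r} not found"
                )
            # Lenient mode: leave the reference unresolved and carry on.
        resolved[f.get("ID")] = sources
    return resolved
```

With `strict=False` the valid references are still resolved and only the dangling ones are dropped. The other option, skipping resolution altogether, would amount to never calling a function like this.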

@cmdcolin

cmdcolin commented Feb 21, 2019

It also has protein lines that fail parsing:

Chr1  TAIR10  gene  3631  5899  . + . ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010
Chr1  TAIR10  mRNA  3631  5899  . + . ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1
Chr1  TAIR10  protein 3760  5630  . + . ID=AT1G01010.1-Protein;Name=AT1G01010.1;Derives_from=AT1G01010.1

@cmdcolin

Oh yeah, here is the link: ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_gff3/TAIR10_GFF3_genes.gff

@nathanhaigh

I think the TAIR GFF3 file needs fixing. There are issues with it straight off the bat:

  • There is no GFF3 header (the ##gff-version 3 line)
  • They use Index as an attribute name; attribute names beginning with an uppercase letter are reserved for future use by the spec
  • The Derives_from values reference non-existent IDs, which is the main cause of my problems with JBrowse

I've posted an issue with GenomeTools about adding a check for the Derives_from attribute relationship: genometools/genometools#909

My workaround is to strip out the Derives_from attributes that reference TEs, in addition to my other fixes:

curl ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_gff3/TAIR10_GFF3_genes.gff \
| sed '1i ##gff-version 3' \
| sed -r 's/Index/index/g; s/\;?Derives_from=AT[0-9]TE[0-9]+//' \
| gt gff3 -retainids \
| ~/git_repos/gff3sort/gff3sort.pl /dev/stdin \
| bgzip \
> TAIR10_GFF3_genes.gff3.gz

What JBrowse should do with Derives_from values that are not defined in the GFF3 file is perhaps up to you. I guess a useful warning would help guide users to the issue so that it can be fixed by the upstream GFF3 generators.
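
As a rough idea of what such a warning could be built on, the sketch below scans GFF3 lines and reports every Derives_from value that names an ID not defined anywhere in the input. It is a standalone Python illustration, not part of JBrowse or GenomeTools; the function names are made up, and it assumes tab-separated columns and ignores percent-encoding in attribute values.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into {tag: [values]}."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attrs[tag] = value.split(",")
    return attrs


def find_dangling_derives_from(lines):
    """Return (line_number, feature_id, missing_target) tuples for every
    Derives_from value whose target ID is not defined in the input."""
    seen_ids = set()
    references = []
    for number, line in enumerate(lines, start=1):
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a feature line (e.g. embedded FASTA)
        attrs = parse_attributes(fields[8])
        feature_id = attrs.get("ID", [None])[0]
        if feature_id is not None:
            seen_ids.add(feature_id)
        for target in attrs.get("Derives_from", []):
            references.append((number, feature_id, target))
    # Check only after reading everything, so forward references are allowed.
    return [ref for ref in references if ref[2] not in seen_ids]
```

Passing an open file handle for the TAIR file to `find_dangling_derives_from` and printing each tuple would give a per-line list of the offending features, which could then be reported upstream.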

@cmdcolin

cmdcolin commented Dec 6, 2022

You can now set disableDerivesFromReferences in the parser options. It is enabled by default in JBrowse 2 (JBrowse 2 sets it explicitly). We could possibly reconsider making it the default in this library in the future.

@cmdcolin cmdcolin closed this as completed Dec 6, 2022