Ignore trailing semicolons in GFF3 attributes column #2718

Closed
heavywatal opened this issue Feb 6, 2022 · 3 comments · Fixed by #2726
Labels
bug Something isn't working

Comments

@heavywatal
Contributor

jbrowse text-index (and sometimes the FeatureTrack display?) fails to parse a GFF3 file that has trailing semicolons in the attributes (9th) column.

Sorry, I haven't tested this thoroughly with minimal data, but the problem goes away after running sed -e 's/;$//' on the GFF3 file.

Expected behavior

The GFF3 parser should (be able to) ignore trailing semicolons, even though they may be the input file's fault. The current GFF3 spec is not clear about whether they are allowed.
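For illustration, here is a minimal sketch of what "ignoring trailing semicolons" could mean for column 9. parseGff3Attributes is a hypothetical helper, not the JBrowse or @gmod/gff parser. It splits the column on ";" and skips empty segments, so "ID=gene1;Name=abc;" and "ID=gene1;Name=abc" parse identically.

```typescript
// Hypothetical helper (not the JBrowse or @gmod/gff API): parse a GFF3
// column-9 string into tag -> values, tolerating a trailing ";".
function parseGff3Attributes(column9: string): Map<string, string[]> {
  const attrs = new Map<string, string[]>()
  // "." means "no attributes" in GFF3.
  if (column9 === '.') {
    return attrs
  }
  for (const pair of column9.split(';')) {
    // A trailing (or doubled) ";" yields an empty segment: skip it
    // instead of treating it as a malformed attribute.
    if (pair.trim() === '') {
      continue
    }
    const eq = pair.indexOf('=')
    if (eq === -1) {
      continue // no "=": silently ignored in this sketch
    }
    const tag = decodeURIComponent(pair.slice(0, eq))
    // Multiple values are comma-separated and percent-encoded in GFF3.
    const values = pair
      .slice(eq + 1)
      .split(',')
      .map((v) => decodeURIComponent(v))
    attrs.set(tag, values)
  }
  return attrs
}

const withTrailing = parseGff3Attributes('ID=gene1;Name=abc;')
const withoutTrailing = parseGff3Attributes('ID=gene1;Name=abc')
console.log(withTrailing.size, withoutTrailing.size) // 2 2
```

Skipping empty segments is the lenient choice: it accepts files like the ones rtracklayer writes without rejecting anything a strict reader would accept.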

Screenshots

Version:

@jbrowse/cli/1.6.4 darwin-x64 node-v16.13.21
Google Chrome Version 97.0.4692.99
macOS 12.2 Monterey

Additional context

I have opened The-Sequence-Ontology/Specifications#28 to get this clarified in the spec, and lawremi/rtracklayer#58 asking rtracklayer not to write trailing semicolons.

heavywatal added the bug label on Feb 6, 2022.
@cmdcolin
Collaborator

cmdcolin commented Feb 6, 2022

What behavior do you see when it 'fails to parse'? In basic testing on our volvox sample data, I'm not sure I see any issue.

@heavywatal
Contributor Author

The output looks like this:

Indexing assembly Oryza_sativa.IRGSP-1.0.dna_sm.genome...
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ renamed.gff3 0% | ETA: 0s
/node_modules/@jb1
                        .split(';')
                         ^

TypeError: Cannot read properties of undefined (reading 'split')
    at indexGff3_1 (/node_modules/@jbrowse/cli/lib/types/gff3Adapter.js:61:26)
    at indexGff3_1.next (<anonymous>)
    at resume (/node_modules/tslib/tslib.js:225:48)
    at fulfill (/node_modules/tslib/tslib.js:227:35)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

The cause may not be the semicolons but the filename. This file, renamed.gff3.bgz, can be indexed with jbrowse text-index successfully after simply renaming it to renamed.gff3.gz, without removing the semicolons. Conversely, a file without trailing semicolons causes the same error if it is named *.gff3.bgz.

@cmdcolin
Collaborator

cmdcolin commented Feb 9, 2022

@heavywatal Ah, I see. Indeed, the code explicitly checks for the .gz file extension and doesn't handle the .bgz extension. We can fix that!
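BGZF (the blocked gzip format written by bgzip) is a series of concatenated gzip members. That is why renaming the file to .gz was enough in the report above. A fix could therefore accept both extensions, or sniff the gzip magic bytes so the filename does not matter at all. The sketch below assumes that approach. The names isGzipByExtension and hasGzipMagic are hypothetical, and this is not necessarily what #2726 implemented.

```typescript
// Hypothetical helpers, not JBrowse's actual code.
// BGZF files (often named ".bgz") are valid gzip streams, so they should
// be decompressed exactly like ".gz" files.
function isGzipByExtension(filename: string): boolean {
  // Case-insensitive match on ".gz" or ".bgz" at the end of the name.
  return /\.b?gz$/i.test(filename)
}

// Alternative that ignores the name entirely: gzip (and therefore BGZF)
// data always starts with the two magic bytes 0x1f 0x8b.
function hasGzipMagic(header: Uint8Array): boolean {
  return header.length >= 2 && header[0] === 0x1f && header[1] === 0x8b
}

console.log(isGzipByExtension('renamed.gff3.bgz')) // true
console.log(isGzipByExtension('renamed.gff3')) // false
```

Sniffing the first two bytes is more robust than any extension list, at the cost of reading the file header before choosing a decoder.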
