GFF3 Record Encode Error in Attributes like ";product=xxx (a; b) xxx" #10

Open
cihga39871 opened this issue Dec 1, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@cihga39871

The code:

using GFF3

r = GFF3.Record("Ga0225945_11\timg_core_v400\tCDS\t350909\t352399\t.\t-\t0\tID=2800905551;locus_tag=Ga0225945_11352;product=respiratory nitrite reductase (cytochrome; ammonia-forming) precursor")

Expected Behavior

Expect product attribute to be ["respiratory nitrite reductase (cytochrome; ammonia-forming) precursor"]

Current Behavior

ERROR: ArgumentError: failed to index Any ~>""
Stacktrace:
 [1] macro expansion
   @ C:\Users\x\.julia\packages\GFF3\RXGVR\src\reader.jl:310 [inlined]
 [2] index!(stream::TranscodingStreams.NoopStream{IOBuffer}, record::GFF3.Record)
   @ GFF3 C:\Users\x\.julia\packages\Automa\1KOLQ\src\Stream.jl:126
 [3] index!
   @ C:\Users\x\.julia\packages\GFF3\RXGVR\src\reader.jl:118 [inlined]
 [4] convert
   @ C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:54 [inlined]
 [5] Record
   @ C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:44 [inlined]
 [6] convert(#unused#::Type{GFF3.Record}, str::String)
   @ GFF3 C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:69
 [7] GFF3.Record(str::String)
   @ GFF3 C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:65
 [8] top-level scope
   @ none:1

Possible Solution / Implementation

Treat a ; that appears inside (), [] or {} as part of the value rather than as an attribute separator.

Your Environment

  • Package Version used: [af1dc308] GFF3 v0.2.1
  • Julia Version used: 1.6.1
  • Operating System and version (desktop or mobile): Windows 11
  • Link to your project: NA
(@v1.6) pkg> status
      Status `C:\Users\x\.julia\environments\v1.6\Project.toml`
  [c7e460c6] ArgParse v1.1.4
  [c52e3926] Atom v0.12.34
  [336ed68f] CSV v0.9.1
  [a93c6f00] DataFrames v1.2.2
  [1313f7d8] DataFramesMeta v0.9.1
  [31c24e10] Distributions v0.25.16
  [c2308a5c] FASTX v1.2.0
  [af1dc308] GFF3 v0.2.1
  [eeff360b] JobSchedulers v0.1.2
  [e5e0dc1b] Juno v0.8.4
  [ef544631] Pipelines v0.4.0
  [91a5bcdd] Plots v1.21.3
  [f3b207a7] StatsPlots v0.14.27
  [fdbf4ff8] XLSX v0.7.8
  [ddb6d928] YAML v0.4.7

Thank you.

@CiaranOMara
Member

CiaranOMara commented Dec 2, 2021

This is an interesting issue. I had to check the specification for column 9. The semicolon is a reserved character that delimits the attributes. However, the specification stipulates that URL-escaped (percent-encoded) semicolons may be used as part of an attribute tag or value. The URL encoding for a semicolon is %3B.

A possible solution is to URL-encode your tags or values with HTTP.jl.

using GFF3
using HTTP

str = "Ga0225945_11\timg_core_v400\tCDS\t350909\t352399\t.\t-\t0\tID=2800905551;locus_tag=Ga0225945_11352;product="*HTTP.escapeuri("respiratory nitrite reductase (cytochrome; ammonia-forming) precursor")

r = GFF3.Record(str)

When reading data back, HTTP.unescapeuri(str) is available if needed.
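If pulling in HTTP.jl only for this is undesirable, the same idea can be sketched with Base Julia alone. The helper name `escape_gff3_value` and the `RESERVED` list are assumptions made for illustration and are not part of GFF3.jl; the list is meant to follow the characters the GFF3 specification reserves in column 9, plus `%`, tab, newline and carriage return. Unlike `HTTP.escapeuri`, this sketch leaves spaces and parentheses untouched.

```julia
# Characters to percent-encode inside a column 9 tag or value.
# Assumption: ';', '=', '&' and ',' are the reserved attribute characters,
# and '%', tab, newline and carriage return must also be encoded.
const RESERVED = (';', '=', '&', ',', '%', '\t', '\n', '\r')

# Hypothetical helper, not part of GFF3.jl.
function escape_gff3_value(value::AbstractString)
    io = IOBuffer()
    for c in value
        if c in RESERVED
            # Write '%' followed by the two-digit uppercase hex code.
            print(io, '%', uppercase(string(UInt8(c), base = 16, pad = 2)))
        else
            print(io, c)
        end
    end
    return String(take!(io))
end

product = "respiratory nitrite reductase (cytochrome; ammonia-forming) precursor"
escaped = escape_gff3_value(product)
println(escaped)
# prints: respiratory nitrite reductase (cytochrome%3B ammonia-forming) precursor
```

The escaped string can then be appended after `product=` in the same way as the `HTTP.escapeuri` result above.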

@CiaranOMara CiaranOMara added the enhancement New feature or request label Dec 2, 2021
@cihga39871
Author

Thank you, Ciaran. I see now that the record does not conform to the specification, so it is probably OK to throw an error. Because the data is downloaded from JGI, I cannot escape it unless I parse it manually.
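For downloaded files like this, one manual option is to pre-process each line before handing it to GFF3.jl. The following is only a rough sketch under the assumption that a stray `;` always sits inside `()`, `[]` or `{}`; `escape_bracketed_semicolons` is a hypothetical helper, not a GFF3.jl function, and whether GFF3.jl accepts the rewritten line has not been verified here.

```julia
# Hypothetical pre-processing helper, not part of GFF3.jl.
# Replaces ';' with "%3B" when it appears inside (), [] or {} in column 9.
function escape_bracketed_semicolons(line::AbstractString)
    fields = split(line, '\t')
    # Leave comments, directives and other non-feature lines untouched.
    length(fields) == 9 || return String(line)

    io = IOBuffer()
    depth = 0  # current bracket nesting depth
    for c in fields[9]
        if c in ('(', '[', '{')
            depth += 1
        elseif c in (')', ']', '}')
            depth = max(depth - 1, 0)  # tolerate unbalanced closers
        end
        if c == ';' && depth > 0
            print(io, "%3B")
        else
            print(io, c)
        end
    end
    return join(fields[1:8], '\t') * "\t" * String(take!(io))
end

line = "Ga0225945_11\timg_core_v400\tCDS\t350909\t352399\t.\t-\t0\tID=2800905551;locus_tag=Ga0225945_11352;product=respiratory nitrite reductase (cytochrome; ammonia-forming) precursor"
fixed = escape_bracketed_semicolons(line)
println(last(split(fixed, '\t')))
```

Semicolons outside brackets are kept as separators, so well-formed records pass through unchanged.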

It would be nice if GFF3.jl could hint to the user when the provided data is not valid.
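Until such a hint exists, a user-side check along these lines could flag suspicious records before parsing. This is a heuristic sketch, not GFF3.jl behaviour: `malformed_attributes` is a hypothetical name, and the check only assumes that every `;`-separated fragment in column 9 should contain an `=`.

```julia
# Hypothetical check, not part of GFF3.jl.
# Returns the column 9 fragments that are not of the form tag=value.
function malformed_attributes(line::AbstractString)
    fields = split(line, '\t')
    length(fields) == 9 || return String[]
    pieces = split(fields[9], ';'; keepempty = false)
    return [String(p) for p in pieces if !occursin('=', p)]
end

line = "Ga0225945_11\timg_core_v400\tCDS\t350909\t352399\t.\t-\t0\tID=2800905551;locus_tag=Ga0225945_11352;product=respiratory nitrite reductase (cytochrome; ammonia-forming) precursor"
bad = malformed_attributes(line)
# Emit a hint instead of failing deep inside the parser.
isempty(bad) || @warn "Column 9 has fragments without '='; a ';' may be unescaped" bad
```

For the record in this issue, the fragment after the unescaped semicolon (" ammonia-forming) precursor") is the one reported.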
