Encoding specified after the encoded data #9

Open
Juul opened this issue Mar 11, 2017 · 14 comments

Comments

@Juul
Member

Juul commented Mar 11, 2017

In the currently available SBOL examples, the encoding tag within the sequence tag appears after the end of the elements tag. This is problematic for streaming parsers, since they have to buffer the entire contents of each elements tag before they can decode it.

If the elements tag contains a lot of data, e.g. if a user of SBOL-compliant software decides to save a whole unannotated genome in SBOL format, then such a parser would have to load the entire genome into memory.
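
To illustrate the buffering problem, here is a minimal sketch of a SAX-based reader. The sample document, the `BufferingHandler` class and the `peak_buffered` counter are made up for this illustration, and the encoding URI is only meant as a representative value. Because the encoding arrives after the elements text, the handler cannot act on the sequence data until the whole of it is held in memory.

```python
import xml.sax

# Illustrative document: the encoding follows the elements text.
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sbol="http://sbols.org/v2#">
  <sbol:Sequence rdf:about="http://example.org/seq1">
    <sbol:elements>atgcatgc</sbol:elements>
    <sbol:encoding rdf:resource="http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html"/>
  </sbol:Sequence>
</rdf:RDF>"""


class BufferingHandler(xml.sax.ContentHandler):
    """Hypothetical handler that must hold each sequence until its encoding is known."""

    def __init__(self):
        super().__init__()
        self.in_elements = False
        self.buffer = []        # grows with the size of the sequence
        self.buffered = 0       # characters currently held
        self.peak_buffered = 0  # largest amount held at any one time
        self.encoding = None
        self.results = []

    def startElement(self, name, attrs):
        if name == "sbol:Sequence":
            self.buffer, self.buffered, self.encoding = [], 0, None
        elif name == "sbol:elements":
            self.in_elements = True
        elif name == "sbol:encoding":
            self.encoding = attrs.get("rdf:resource")

    def characters(self, content):
        # The parser may deliver the text in several chunks.
        if self.in_elements:
            self.buffer.append(content)
            self.buffered += len(content)
            self.peak_buffered = max(self.peak_buffered, self.buffered)

    def endElement(self, name):
        if name == "sbol:elements":
            self.in_elements = False
        elif name == "sbol:Sequence":
            # Only now is it safe to decide what to do with the data.
            self.results.append((self.encoding, "".join(self.buffer)))


handler = BufferingHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.results, handler.peak_buffered)
```

For an eight-character sequence this is harmless, but `peak_buffered` scales with the length of the elements text, which is the concern for genome-sized sequences.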

Possibly something to improve for future SBOL versions?

@Juul added the enhancement label (New feature or request) on Mar 11, 2017
@cjmyers
Contributor

cjmyers commented Mar 11, 2017 via email

@Juul
Member Author

Juul commented Mar 12, 2017

Yes, I am proposing that it should not be a tag at all, but rather an attribute of either the sequence or the elements tag. The fact that the encoding tag can currently appear after the elements tag is causing issues with my streaming processor.

The reason I need to know the encoding is that I don't even know whether the data is DNA, amino acids or SMILES before I get to the encoding tag. I could look at the data itself, but AA or SMILES data can consist only of characters that are legal in either format.

My streaming processor is building a BLAST database from a large number of user-uploaded files, and it needs to discard the SMILES data (and sometimes DNA or amino acid sequence data, depending on parameters), or the BLAST database command will exit with an error. I cannot even easily pre-categorize the SBOL files on user upload, since a single SBOL file could contain sequences with different encodings, so I'm left with no option but to buffer an unknown and potentially very large amount of sequence data.
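
As a rough sketch of what the proposal would enable: the `encoding` attribute on `sbol:elements` below is HYPOTHETICAL (it is the change being proposed here, not part of any SBOL version), and `FilteringHandler`, the sample document and the FASTA-style output are likewise invented for illustration. With the encoding known before the data, the handler can forward or drop each chunk as it arrives, without buffering.

```python
import io
import xml.sax

DNA = "http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html"
SMILES = "http://www.opensmiles.org/opensmiles.html"

# HYPOTHETICAL layout: encoding given as an attribute, ahead of the data.
SAMPLE = f"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sbol="http://sbols.org/v2#">
  <sbol:Sequence rdf:about="http://example.org/dna1">
    <sbol:elements encoding="{DNA}">atgcatgc</sbol:elements>
  </sbol:Sequence>
  <sbol:Sequence rdf:about="http://example.org/mol1">
    <sbol:elements encoding="{SMILES}">CCO</sbol:elements>
  </sbol:Sequence>
</rdf:RDF>""".encode()


class FilteringHandler(xml.sax.ContentHandler):
    """Hypothetical handler that streams only sequences with a wanted encoding."""

    def __init__(self, wanted, out):
        super().__init__()
        self.wanted = wanted    # set of encoding URIs to keep
        self.out = out          # any object with a write() method
        self.current_id = None
        self.emitting = False

    def startElement(self, name, attrs):
        if name == "sbol:Sequence":
            self.current_id = attrs.get("rdf:about")
        elif name == "sbol:elements":
            # The decision is made before any sequence data is read.
            self.emitting = attrs.get("encoding") in self.wanted
            if self.emitting:
                self.out.write(f">{self.current_id}\n")

    def characters(self, content):
        if self.emitting:
            self.out.write(content)  # pass each chunk straight through

    def endElement(self, name):
        if name == "sbol:elements":
            if self.emitting:
                self.out.write("\n")
            self.emitting = False


out = io.StringIO()
xml.sax.parseString(SAMPLE, FilteringHandler({DNA}, out))
fasta = out.getvalue()
print(fasta)
```

Here only the DNA record reaches the output and the SMILES record is skipped chunk by chunk, so memory use no longer depends on sequence length.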

Regardless, it's always good practice to put metadata before the actual data, rather than leaving that decision to the implementors.

@graik

graik commented Mar 12, 2017 via email

@Juul
Member Author

Juul commented Mar 12, 2017

Hm, yes @graik that would definitely solve the problem. I don't know enough about SBOL to say if that might prevent some legitimate use-cases that mix DNA, protein and RNA.

@cjmyers
Contributor

cjmyers commented Mar 12, 2017 via email

@graik

graik commented Mar 12, 2017 via email

@cjmyers
Contributor

cjmyers commented Mar 12, 2017 via email

@graik

graik commented Mar 12, 2017 via email

@cjmyers
Contributor

cjmyers commented Mar 12, 2017 via email

@graik

graik commented Mar 12, 2017 via email

@palchicz

@cjmyers, has the change been incorporated into the library already? And @Juul, does this address your concern?

@jakebeal
Contributor

I believe this is now moot for SBOL 3, which uses RDF as a serialization format (such that we don't have control of ordering) and which also allows genome-scale sequences to be stored as ExternalReference objects instead.

@cjmyers
Contributor

cjmyers commented Aug 24, 2020

Not sure about this one. I think he wants the encoding to be a data type attribute. Might be worth further thought.

@cjmyers
Contributor

cjmyers commented Oct 8, 2020

Should be dealt with by creating some genome editing use cases to ensure we do not need to store and exchange very large sequences.

@cjmyers removed the enhancement label (New feature or request) on Oct 9, 2020
@LukasBuecherl transferred this issue from SynBioDex/SBOL-specification on Oct 6, 2022

5 participants