Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make constructing records less horribly inefficient #80

Open
jakobnissen opened this issue Aug 9, 2022 · 0 comments
Open

Make constructing records less horribly inefficient #80

jakobnissen opened this issue Aug 9, 2022 · 0 comments

Comments

@jakobnissen
Copy link
Member

jakobnissen commented Aug 9, 2022

In v2, the way records are constructed (directly, not parsed from a file) is not very nice, which hints at an underlying problem. Luckily this is internal and can be solved in a non-breaking feature release.

The idea is that we only want one source of what is a valid FASTX file, so when a user constructs a record from e.g. a string, we use the Automa parser also. Two issues with this:

First, the Automa parser only works on a TranscodingStream due to the way Automa generates the code. This means that to parse a bytevector like a string, we need to needlessly wrap it in a NoopStream(IOBuffer(data)), which creates way more overhead than needed.
To construct records from raw parts this is even more roundabout, since we first print the parts to an IOBuffer, then convert to a bytearray, then back to an IOBuffer.

The second issue is what happens if the user does parse(FASTARecord, ">A\nG\n>A\nG"). Clearly this should error as there are two records. But the Automa machine simply returns that it found a record. What the code does now is to try to load another record, then throw an error if that succeeds. This is simply bad design. [EDIT: I've changed that to instead use NoopStream's internals. It's better design, but still not great]

A better solution would be to somehow make parsing of a record from a byte buffer the single fundamental operation, then have the IO-based code operate on top of that. This is pretty tricky and requires a rework of Automa (but it would also make the Automa code way easier to reason about, so it should probably happen)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant