Make constructing records less horribly inefficient #80

jakobnissen · 2022-08-09T14:02:05Z

In v2, the way records are constructed (directly, not parsed from a file) is not very nice, which hints at an underlying problem. Luckily this is internal and can be solved in a non-breaking feature release.

The idea is that we only want one source of what is a valid FASTX file, so when a user constructs a record from e.g. a string, we use the Automa parser also. Two issues with this:

First, the Automa parser only works on a TranscodingStream due to the way Automa generates the code. This means that to parse a bytevector like a string, we need to needlessly wrap it in a NoopStream(IOBuffer(data)), which creates way more overhead than needed.
To construct records from raw parts this is even more roundabout, since we first print the parts to an IOBuffer, then convert to a bytearray, then back to an IOBuffer.

The second issue is what happens if the user does parse(FASTARecord, ">A\nG\n>A\nG"). Clearly this should error as there are two records. But the Automa machine simply returns that it found a record. What the code does now is to try to load another record, then throw an error if that succeeds. This is simply bad design. [EDIT: I've changed that to instead use NoopStream's internals. It's better design, but still not great]

A better solution would be to somehow make parsing of a record from a byte buffer the single fundamental operation, then have the IO-based code operate on top of that. This is pretty tricky and requires a rework of Automa (but it would also make the Automa code way easier to reason about, so it should probably happen)

The text was updated successfully, but these errors were encountered:

jakobnissen mentioned this issue Aug 13, 2022

Migrate to new Automa release #82

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make constructing records less horribly inefficient #80

Make constructing records less horribly inefficient #80

jakobnissen commented Aug 9, 2022 •

edited

Loading

Make constructing records less horribly inefficient #80

Make constructing records less horribly inefficient #80

Comments

jakobnissen commented Aug 9, 2022 • edited Loading

jakobnissen commented Aug 9, 2022 •

edited

Loading