Skip to content

GreedyRange's exception swallowing behaviour is problematic (for some usecases) #1025

@phire

Description

@phire

I know GreedyRange is designed to abort when it encounters an exception.
But for some usecases, this behavour is problematic.

For example, I'm parsing a file format that has multiple concatinated "substreams", each filled with an unknown number of variable length records. This seems like a perfect usecase for FixedSize and GreedyRange

Simplified example:

header = Struct(
  "SectionASize" / Int32ul,
  "SectionBSize" / Int32ul,
  "SectionCSize" / Int32ul,
)

file = Struct(
  "Header" / header,
  "SectionA" / FixedSize(this.Header.SectionASize, GreedyRange(ARecord)), # Note: Records are variable length
  "SectionB" / FixedSize(this.Header.SectionBSize, GreedyRange(BRecord)),
  "SectionC" / FixedSize(this.Header.SectionCSize, GreedyRange(CRecord)),
)

And this works perfectly fine. But only if the file is valid and the record types have been implemented correctly.
When the file is invalid, or I'm still the the process of reverse engineering the file format types and one of the records throws and exception, GreedyRange will just silently truncate all remaining items. That's a huge pain, I might not even notice there was an error until way later, and it can be hard to debug.

I tried to find an alternative solution. The best I found was a hacking way of using RepeatUntil

"SectionA" / FixedSize(this.Header.SectionASize, RepeatUntil(
    lambda x, lst, ctx: x._io.tell() == ctx.Header.SectionASize, 
    ARecord
 ),

The fact that I had to read the source code and access a non-public variable is highly indicative of a deficiency in construct's API. Maybe there is a better way? But I couldn't find it.

Users might also encounter this when using GreedyRange with OffsettedEnd or otherwise just expecting GreedyRange until the end of a file to actually consume the whole file.

Other usecases?

The other time I've run into this is when parsing a variable length list with termination conditions. The documentation suggests that StopIf can be used to signal a termination condition. But because GreedyRange always terminates on any exceptions it silently swallows any exceptions before your StopIf condition.

For this usecase, it's possible to use an alternative implementation with RepeatUntil and expressing the termination condition in the lambda. But that moves where the termination condition is expressed (and sometimes you want it to be with the subcon), and kind of means StopIf is useless for GreedyRange.

Possible solutions

I've brainstormed a few potential solutions to this problem, but I'm not sure the best direction. Maybe someone has a better idea?

New Repeater

Add a new repeater that is just GreedyRange without exception swallowing.

Perhaps named something like RepeatUntilEnd or FillRange.

Add an alternative mode to GreedyRange

An optional parameter could be added to GreedyRange that controls the swallowing behavour, allowing the user to do custom exception filters.

  • When not supplied, GreedyRange keeps the current behaviour
  • When something Falsifiable (ie None, [], False) is supplied, all exceptions (except StopFieldError) are propagated up
  • When a list of exceptions is supplied, all other are propagated
  • When a lambda is supplied, it's called with the exception allowing for custom filtering code (or other custom behaviour on exceptions)

Change the GreedyRange API

Maybe GreedyRange shouldn't be doing this swallowing behaviour by default?
It's somewhat counterintuitive, and maybe users should be explictly opting into this stopping on exceptions (a StopIfException wrapper?)

But this would be a breaking API change, which I guess that would be a hard sell.

A ConsumeAll wrapper

Perhaps the user should be explictly stating that they expect a subcon to fill all remaining space in the stream. For example:

"SectionA" / FixedSize(this.Header.SectionASize, ConsumeAll(
    # Expect entire FixedSized substream to be filled with ARecords
    GreedyRange(ARecord)
)),

This has wider applications beyond just GreedyRange as ConsumeAll would always throw an error when it's subcon doesn't completely fill the remaining space in the stream.

The major advantage here is you can do things like:

"SectionA" / FixedSize(this.Header.SectionASize, ConsumeAll(
    # Same as above, but allow the substream to be padded with nulls to a multiple of 16 bytes
    Aligned(16, GreedyRange(ARecord))
)),

It would be pretty hard to express this with any of the previous solutions.

However, it doesn't entirely fix the issue of GreedyRange swallowing exceptions. While we get an error from ConsumeAll when the stream had unused bytes, we won't know why GreedyRange aborted early.

So GreedyRange would need to modify it's behaviour when nested inside a ConsumeAll context to stop swallowing exceptions (or store them until later for ConsumeAll to retrieve them). Though the design gets complex, what if there are multiple Greedys inside a ConsumeAll? Or a complex structure with nesting and other variable length fields?

I'm not sure I like the idea of GreedyRange's behaviour changing based on it's wrapper, maybe this solution can be combined with one of the above solutions?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions