Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefixed and others reset offsets returned by Tell #1014

Closed
ntrrgc opened this issue Feb 4, 2023 · 0 comments · Fixed by #1015
Closed

Prefixed and others reset offsets returned by Tell #1014

ntrrgc opened this issue Feb 4, 2023 · 0 comments · Fixed by #1015
Assignees

Comments

@ntrrgc
Copy link

ntrrgc commented Feb 4, 2023

Imagine we have this simple example where we want to record the position of the "payload" object within our file to be able to reference it:

In [58]: Struct(
    ...:     "version" / Int16ub,
    ...:     "box" / Struct(
    ...:         "size" / Int16ub,
    ...:         "payload_position" / Tell,
    ...:         "payload" / Bytes(this.size),
    ...:     ),
    ...: ).parse(bytes([0, 1, 0, 5, 1, 2, 3, 4, 5]))
Out[58]: Container(version=1, box=Container(size=5, payload_position=4, payload=b'\x01\x02\x03\x04\x05'))

In that example, payload_position is 4, as expected. Our payload starts at byte index 4 in the parsed stream.

Now, let's write an alternative version using Prefixed(), which a user could expect to be functionally equivalent:

In [59]: Struct(
    ...:     "version" / Int16ub,
    ...:     "box" / Prefixed(Int16ub, Struct(
    ...:         "payload_position" / Tell,
    ...:         "payload" / GreedyBytes,
    ...:     )),
    ...: ).parse(bytes([0, 1, 0, 5, 1, 2, 3, 4, 5]))
Out[59]: Container(version=1, box=Container(payload_position=0, payload=b'\x01\x02\x03\x04\x05'))

Oops, payload_position=0. We lost track of the position of the payload. That's not a very useful Tell in this case.

The reason is Tell relies stream.tell(), and several subconstructs create io.BytesIO() from data read from the original substream, which makes stream.tell() return 0. Furthermore, as far as I've been able to find, the position of the parent stream at the time of creating the BytesIO is not recorded, so this information cannot be recovered from the Container.

$ git grep --heading -p '_parsereport(io.BytesIO'
construct/core.py
class OffsettedEnd(Subconstruct):
        return self.subcon._parsereport(io.BytesIO(data), context, path)
class Prefixed(Subconstruct):
        return self.subcon._parsereport(io.BytesIO(data), context, path)
class FixedSized(Subconstruct):
        return self.subcon._parsereport(io.BytesIO(data), context, path)
class NullTerminated(Subconstruct):
        return self.subcon._parsereport(io.BytesIO(data), context, path)
class NullStripped(Subconstruct):
        return self.subcon._parsereport(io.BytesIO(data), context, path)
class Transformed(Subconstruct):
        return self.subcon._parsereport(io.BytesIO(data), context, path)
class ProcessXor(Subconstruct):
        return self.subcon._parsereport(io.BytesIO(data), context, path)
class ProcessRotateLeft(Subconstruct):
        return self.subcon._parsereport(io.BytesIO(data), context, path)
ntrrgc added a commit to ntrrgc/construct that referenced this issue Feb 4, 2023
Fixes construct#1014

This patch wraps the BytesIO substreams made by the following
subconstructs during parsing:

* OffsettedEnd
* Prefixed
* FixedSize
* NullTerminated
* NullStripped
* ProcessXor

This allows the Tell construct to return offsets that are relative to
the beginning of the stream, rather than relative to the last occurence
of these subconstructs.

This has not been implemented for the following subconstructs:

* Transformed: there is no guarantee that there won't be any byte
  swapping in the transformation, therefore we can't map offsets without
  knowledge of the specific transformation.

* ProcessRotateLeft: tell() offsets would be decreasing, which could
  cause all sorts of problems.

In both those cases, it makes more sense for the user to use a Tell
immediately before the Transformed or ProcessRotateLeft and do their own
mapping.
ntrrgc added a commit to ntrrgc/construct that referenced this issue Feb 4, 2023
Fixes construct#1014

This patch wraps the BytesIO substreams made by the following
subconstructs during parsing:

* OffsettedEnd
* Prefixed
* FixedSize
* NullTerminated
* NullStripped
* ProcessXor

This allows the Tell construct to return offsets that are relative to
the beginning of the stream, rather than relative to the last occurence
of these subconstructs.

This has not been implemented for the following subconstructs:

* Transformed: there is no guarantee that there won't be any byte
  swapping in the transformation, therefore we can't map offsets without
  knowledge of the specific transformation.

* ProcessRotateLeft: tell() offsets would be decreasing, which could
  cause all sorts of problems.

In both those cases, it makes more sense for the user to use a Tell
immediately before the Transformed or ProcessRotateLeft and do their own
mapping.
@arekbulski arekbulski self-assigned this Oct 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants