Tolerate truncated / partially-corrupt FSSHTTPB streams #27

Open
orangeruan128 wants to merge 1 commit into DissectMalware:main from orangeruan128:fix/tolerate-truncated-streams

@orangeruan128

Summary

Add local error recovery to four read sites in FileNode.py so a single truncated or partially-corrupt FSSHTTPB structure no longer aborts parsing of the entire document. No behavior change for well-formed input.

Failure modes covered

Each of the four cases below was observed on real-world .one files where parsing currently aborts even though the rest of the document is well-formed:

1. ValueError: cannot fit 'int' into an offset-sized integer

FileNodeList.__init__ calls file.seek(file_chunk_reference.stp). When stp is the result of an unsigned read but the underlying file object treats it as a signed off_t, the seek raises before the body is even read. Fix: catch OverflowError / ValueError / OSError and treat the chunk as an empty list.
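
A minimal sketch of the guarded seek, not the code from this PR: `safe_seek` and `read_chunk` are hypothetical helper names, and `cb` stands in for whatever length field accompanies `stp`. It only illustrates catching the three exception types and falling back to an empty chunk.

```python
import io


def safe_seek(file, offset):
    """Seek to offset; return False instead of raising if the offset is unusable."""
    try:
        file.seek(offset)
    except (OverflowError, ValueError, OSError):
        return False
    return True


def read_chunk(file, stp, cb):
    """Return cb bytes starting at stp, or b'' when stp cannot be reached."""
    if not safe_seek(file, stp):
        return b""  # treat the chunk as empty so the caller can carry on
    return file.read(cb)
```

With an in-memory stream, `read_chunk(io.BytesIO(b"abcdef"), 0xFFFFFFFFFFFFFFFF, 3)` returns `b""` rather than raising, while an in-range `stp` reads as before.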

2. struct.error: unpack requires a buffer of N bytes — header

ObjectSpaceObjectStreamOfIDs.__init__ reads ObjectSpaceObjectStreamHeader eagerly. On a truncated stream this raises before self.header is ever assigned, so callers checking self.header.OsidStreamNotPresent or self.header.ExtendedStreamsPresent would also crash. Fix: catch struct.error and synthesize an empty header (Count=0, OsidStreamNotPresent=True, ExtendedStreamsPresent=False).
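
The fallback can be sketched as below. `StreamHeader` and `read_stream_header` are hypothetical stand-ins for the real classes in FileNode.py, and the bit layout (24-bit Count in the low bits, ExtendedStreamsPresent at bit 30, OsidStreamNotPresent at bit 31) is an assumption made for illustration; only the synthesized values come from the description above.

```python
import io
import struct
from dataclasses import dataclass


@dataclass
class StreamHeader:
    Count: int
    ExtendedStreamsPresent: bool
    OsidStreamNotPresent: bool


def read_stream_header(file):
    """Read a 4-byte header, or synthesize an empty one if the stream is truncated."""
    try:
        (raw,) = struct.unpack("<I", file.read(4))
    except struct.error:
        # Fewer than 4 bytes left: behave as if the stream held no IDs.
        return StreamHeader(Count=0, ExtendedStreamsPresent=False,
                            OsidStreamNotPresent=True)
    return StreamHeader(
        Count=raw & 0xFFFFFF,                         # assumed: low 24 bits
        ExtendedStreamsPresent=bool((raw >> 30) & 1),  # assumed: bit 30
        OsidStreamNotPresent=bool((raw >> 31) & 1),    # assumed: bit 31
    )
```

Because the function always returns a header object, callers that inspect `OsidStreamNotPresent` or `ExtendedStreamsPresent` have something to read even on truncated input.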

3. struct.error: unpack requires a buffer of 4 bytes — body / 2 bytes — PropertySet

The ObjectSpaceObjectStreamOfIDs body loop and the cProperties read in PropertySet.__init__ can both run past EOF on truncated streams. Fix: catch struct.error and either break out of the body loop (keeping the IDs read so far) or treat the property set as empty (cProperties=0).
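
Both recoveries follow the same pattern, sketched here with hypothetical helpers (`read_ids`, `read_cproperties`) rather than the actual class code. The 4-byte and 2-byte little-endian widths are taken from the error messages above; everything else is an assumption.

```python
import io
import struct


def read_ids(file, count):
    """Read up to count 4-byte entries, stopping early if the stream ends."""
    ids = []
    for _ in range(count):
        try:
            (value,) = struct.unpack("<I", file.read(4))
        except struct.error:
            break  # truncated body: keep what was read so far
        ids.append(value)
    return ids


def read_cproperties(file):
    """Read the 2-byte property count, or 0 if the stream is truncated."""
    try:
        (c_properties,) = struct.unpack("<H", file.read(2))
    except struct.error:
        return 0  # treat the property set as empty
    return c_properties
```

A body that declares three entries but holds only two and a half yields the two complete entries instead of an exception.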

4. KeyError in CompactID.__str__ / __repr__

CompactID.__str__ looks up document._global_identification_table[current_revision][guidIndex]. Two real-world cases miss the lookup:

  • guidIndex == 0xFFFFFF (16777215) — the documented "invalid" sentinel ([MS-ONESTORE] CompactID).
  • Cross-revision references whose target table is not yet populated.

Both currently raise KeyError from inside __str__/__repr__, which propagates out of the JSON serialization and aborts the whole document. Fix: factor the lookup into _resolve_guid() and return '<unresolved guidIndex=0xNNNNNN>' on miss, so the rest of the JSON output is still produced.
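
A rough sketch of the lookup with a placeholder on miss. The free function `resolve_guid` and its arguments are hypothetical (the PR describes a `_resolve_guid()` method), and the six-digit uppercase hex format is an assumed reading of `0xNNNNNN`.

```python
def resolve_guid(table, revision, guid_index):
    """Return the GUID for guid_index, or a placeholder string if it is missing."""
    try:
        return table[revision][guid_index]
    except KeyError:
        # Covers both a missing revision table and a missing index,
        # including the 0xFFFFFF "invalid" sentinel.
        return f"<unresolved guidIndex=0x{guid_index:06X}>"


# Hypothetical table contents, for illustration only.
table = {"rev1": {1: "{00000000-0000-0000-0000-000000000001}"}}
print(resolve_guid(table, "rev1", 1))
print(resolve_guid(table, "rev1", 0xFFFFFF))
print(resolve_guid(table, "rev2", 1))
```

Returning a string keeps `__str__` and `__repr__` total, so JSON serialization can continue past an unresolved reference.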

Scope / non-goals

Verification

  • ast.parse on the modified file passes.
  • On a corpus of real-world .one files that previously aborted with each of the four error messages above, parsing now completes and produces the expected JSON output (pages, properties, embedded files). Files that already parsed cleanly continue to produce byte-identical JSON for their headers / properties / files structure.
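
The syntax check in the first bullet can be reproduced along these lines; `parses_cleanly` is a hypothetical helper, and the check only confirms that the file is valid Python, not that it behaves correctly.

```python
import ast


def parses_cleanly(source, filename="<string>"):
    """Return True if source is syntactically valid Python."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError:
        return False
    return True
```

For the modified file this would be called as `parses_cleanly(open(path).read(), path)`, where `path` points at the local copy of FileNode.py.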

Several read sites in FileNode.py read fixed-size structs (CompactID,
ObjectSpaceObjectStreamHeader, PropertySet header, FileNodeList stp seek)
without guarding against truncated input. On real-world OneNote files this
is hit by:

  * ValueError: cannot fit 'int' into an offset-sized integer
      — FileNodeList.__init__ calls file.seek(stp) where stp is the result
        of an unsigned read but the underlying file object treats it as a
        signed off_t.
  * struct.error: unpack requires a buffer of N bytes
      — ObjectSpaceObjectStreamOfIDs.body or its header, or PropertySet's
        cProperties, read past EOF.
  * KeyError on document._global_identification_table[...][guidIndex]
      — CompactID.__str__/__repr__ resolves a guidIndex that is the
        documented 0xFFFFFF "invalid" sentinel, or a cross-revision
        reference whose table is not populated yet.

Each of these previously aborted parsing of the entire document, even
when only one inner structure was malformed. This change keeps the happy
path identical and only adds local recovery:

  * FileNodeList: out-of-range stp → empty list, do not seek.
  * ObjectSpaceObjectStreamOfIDs: truncated header → synthetic empty
    header (Count=0, OsidStreamNotPresent=True); truncated body → break.
  * PropertySet: truncated cProperties → empty set (cProperties=0).
  * CompactID: missing guid table entry → '<unresolved guidIndex=0x...>'
    placeholder string instead of KeyError.

No spec-defined behavior changes for well-formed input.