Tolerate truncated / partially-corrupt FSSHTTPB streams #27
Open
orangeruan128 wants to merge 1 commit into DissectMalware:main from
Several read sites in FileNode.py read fixed-size structs (CompactID,
ObjectSpaceObjectStreamHeader, PropertySet header, FileNodeList stp seek)
without guarding against truncated input. On real-world OneNote files this
is hit by:
* ValueError: cannot fit 'int' into an offset-sized integer
— FileNodeList.__init__ calls file.seek(stp) where stp is the result
of an unsigned read but the underlying file object treats it as a
signed off_t.
* struct.error: unpack requires a buffer of N bytes
— ObjectSpaceObjectStreamOfIDs.body or its header, or PropertySet's
cProperties, read past EOF.
* KeyError on document._global_identification_table[...][guidIndex]
— CompactID.__str__/__repr__ resolves a guidIndex that is the
documented 0xFFFFFF "invalid" sentinel, or a cross-revision
reference whose table is not populated yet.
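
For reviewers who want to see the three exception classes in isolation, here is a minimal sketch using only the standard library. It does not touch pyOneNote; the in-memory stream and the `table` dictionary are hypothetical stand-ins for the real file object and `_global_identification_table`.

```python
import io
import struct

# 1. Seeking to an offset that does not fit the stream's offset type.
#    Which exception is raised depends on the file object, so all three
#    classes named in this change are caught here.
stream = io.BytesIO(b"\x00" * 16)
try:
    stream.seek(0xFFFFFFFFFFFFFFFF)  # an all-ones 64-bit stp
    seek_error = None
except (OverflowError, ValueError, OSError) as exc:
    seek_error = type(exc).__name__

# 2. Unpacking a fixed-size field from fewer bytes than it needs.
try:
    struct.unpack("<I", io.BytesIO(b"\x01\x02").read(4))
    unpack_error = None
except struct.error as exc:
    unpack_error = str(exc)

# 3. Looking up the 0xFFFFFF sentinel in a table that has no such entry.
#    Hypothetical layout: {revision: {guidIndex: guid}}.
table = {"rev1": {1: "{00000000-0000-0000-0000-000000000001}"}}
try:
    table["rev1"][0xFFFFFF]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc.args[0]
```
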
Each of these previously aborted parsing of the entire document, even
when only one inner structure was malformed. This change keeps the happy
path identical and only adds local recovery:
* FileNodeList: out-of-range stp → empty list, do not seek.
* ObjectSpaceObjectStreamOfIDs: truncated header → synthetic empty
header (Count=0, OsidStreamNotPresent=True); truncated body → break.
* PropertySet: truncated cProperties → empty set (cProperties=0).
* CompactID: missing guid table entry → '<unresolved guidIndex=0x...>'
placeholder string instead of KeyError.
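
The first three recovery rules above could look roughly like the sketch below. This is not the code in the diff: the helper names (`seek_or_skip`, `read_header`, `read_ids`) are hypothetical, each ID is simplified to one 4-byte word, and the header bit layout (24-bit Count, then ExtendedStreamsPresent and OsidStreamNotPresent in the top two bits) is an assumption based on a reading of [MS-ONESTORE].

```python
import io
import struct
from types import SimpleNamespace


def seek_or_skip(file, stp):
    """Return True if the seek succeeded, False if stp is out of range."""
    try:
        file.seek(stp)
    except (OverflowError, ValueError, OSError):
        return False  # caller treats the chunk as an empty list
    return True


def read_header(file):
    """Read a 4-byte stream header; fall back to an empty one on truncation."""
    try:
        (value,) = struct.unpack("<I", file.read(4))
    except struct.error:
        # Synthetic header: nothing to read, no OSID stream, no extended streams.
        return SimpleNamespace(
            Count=0, OsidStreamNotPresent=True, ExtendedStreamsPresent=False
        )
    return SimpleNamespace(
        Count=value & 0xFFFFFF,                         # assumed: low 24 bits
        ExtendedStreamsPresent=bool((value >> 30) & 1),  # assumed: bit 30
        OsidStreamNotPresent=bool((value >> 31) & 1),    # assumed: bit 31
    )


def read_ids(file, count):
    """Read up to `count` 4-byte IDs, stopping at the first short read."""
    ids = []
    for _ in range(count):
        try:
            ids.append(struct.unpack("<I", file.read(4))[0])
        except struct.error:
            break  # truncated body: keep what was read so far
    return ids
```

On well-formed input none of the `except` branches run, so the values returned are the same as an unguarded read would produce.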
No spec-defined behavior changes for well-formed input.
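
The CompactID fallback has the same property: a hit returns the resolved value unchanged, and only a miss yields the placeholder. A minimal sketch, assuming the table is a nested dictionary keyed by revision and then by guidIndex; the function name `resolve_guid` and the six-digit uppercase hex formatting are illustrative choices, not taken from the diff.

```python
def resolve_guid(table, revision, guid_index):
    """Return the GUID for guid_index, or a placeholder string on a miss."""
    try:
        return table[revision][guid_index]
    except KeyError:
        # Covers both a missing revision table and a missing guidIndex.
        return "<unresolved guidIndex=0x{:06X}>".format(guid_index)


# Hypothetical table with a single populated revision.
example_table = {"rev1": {1: "{00000000-0000-0000-0000-000000000001}"}}
```
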
Summary
Add local error recovery to four read sites in `FileNode.py` so a single truncated or partially-corrupt FSSHTTPB structure no longer aborts parsing of the entire document. No behavior change for well-formed input.

Failure modes covered
Each of the four cases below was observed on real-world `.one` files where parsing currently aborts even though the rest of the document is well-formed:

1. `ValueError: cannot fit 'int' into an offset-sized integer`
   `FileNodeList.__init__` calls `file.seek(file_chunk_reference.stp)`. When `stp` is the result of an unsigned read but the underlying file object treats it as a signed `off_t`, the seek raises before the body is even read. Fix: catch `OverflowError` / `ValueError` / `OSError` and treat the chunk as an empty list.
2. `struct.error: unpack requires a buffer of N bytes` (header)
   `ObjectSpaceObjectStreamOfIDs.__init__` reads `ObjectSpaceObjectStreamHeader` eagerly. On a truncated stream this raises before `self.header` is ever assigned, so callers checking `self.header.OsidStreamNotPresent` or `self.header.ExtendedStreamsPresent` would also crash. Fix: catch `struct.error` and synthesize an empty header (`Count=0`, `OsidStreamNotPresent=True`, `ExtendedStreamsPresent=False`).
3. `struct.error: unpack requires a buffer of 4 bytes` (body) / `2 bytes` (PropertySet)
   The `ObjectSpaceObjectStreamOfIDs` body loop and `PropertySet.__init__`'s `cProperties` read can both run past EOF on truncated streams. Fix: catch `struct.error` and break out of the body loop, or treat the property set as empty.
4. `KeyError` in `CompactID.__str__` / `__repr__`
   `CompactID.__str__` does `document._global_identification_table[current_revision][guidIndex]`. Two real-world cases miss the lookup:
   * `guidIndex == 0xFFFFFF` (16777215), the documented "invalid" sentinel ([MS-ONESTORE] CompactID).
   * A cross-revision reference whose table is not populated yet.
   Both currently raise `KeyError` from inside `__str__` / `__repr__`, which propagates out of the JSON serialization and aborts the whole document. Fix: factor the lookup into `_resolve_guid()` and return `'<unresolved guidIndex=0xNNNNNN>'` on a miss, so the rest of the JSON output is still produced.

Scope / non-goals
* `pyOneNote/FileNode.py`.

Verification
* `ast.parse` on the modified file passes.
* On `.one` files that previously aborted with each of the four error messages above, parsing now completes and produces the expected JSON output (pages, properties, embedded files). Files that already parsed cleanly continue to produce byte-identical JSON for their headers / properties / files structure.
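
Both checks can be scripted. The sketch below is one possible shape, not the procedure actually used for this PR: the helper names `parses_cleanly` and `same_json` are hypothetical, and the comparison serializes both results with the same `json.dumps` options so that equal structures give equal strings.

```python
import ast
import json


def parses_cleanly(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def same_json(before, after):
    """Compare two parse results by their serialized JSON text."""
    options = {"sort_keys": True, "indent": 2}
    return json.dumps(before, **options) == json.dumps(after, **options)
```

In practice `source` would be the text of the modified `FileNode.py`, and `before` / `after` would be the parser's output for the same `.one` file on the base branch and on this branch.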