Make property/file content stringification type-safe #26
Open
orangeruan128 wants to merge 1 commit into DissectMalware:main
PrtFourBytesOfLengthFollowedByData.Data, FileDataStoreObject.FileData,
and the OneDocument.get_json() file contents are documented to be byte
buffers, but on real-world OneNote files they can already be Python
strs by the time stringification happens (typically because earlier
code paths decoded a UTF-16 buffer in-place, or a parser fallback
returned a hex string). When that happens, calling .hex() raises:
AttributeError: 'str' object has no attribute 'hex'
This change keeps .hex() as the path for bytes/bytearray (no behavior
change), and adds explicit fallbacks for str (pass through, or encode
to bytes for the JSON file-contents case) and other types (repr).
No new functionality, just defensive type coercion so a single corrupt
property does not abort parsing of the whole document.
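The coercion described above can be sketched as two small helpers. This is a minimal illustration, not the code from the PR: the names `hex_or_text` and `hex_only` are hypothetical, and the UTF-8 default in `hex_only` is an assumption, since the description does not say which encoding is used when a str is converted back to bytes.

```python
from typing import Any


def hex_or_text(value: Any) -> str:
    """Stringify a payload for __str__-style output (hypothetical helper)."""
    if isinstance(value, (bytes, bytearray)):
        return value.hex()  # unchanged path for real byte buffers
    if isinstance(value, str):
        return value  # already decoded, or already a hex string: pass through
    return repr(value)  # readable placeholder instead of an AttributeError


def hex_only(value: Any, encoding: str = "utf-8") -> str:
    """Stringify file contents so bytes and str both come out as hex.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    if isinstance(value, str):
        value = value.encode(encoding)  # turn text back into bytes first
    if isinstance(value, (bytes, bytearray)):
        return value.hex()
    return repr(value)  # any other type still gets a placeholder
```

With helpers of this shape, a call site would replace a direct `payload.hex()` with `hex_or_text(payload)`, or with `hex_only(payload)` where the output is expected to stay hex.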
This was referenced Apr 20, 2026
Summary
Make .hex() calls on property/file payloads type-safe so a single corrupted property cannot abort parsing of an entire .one file.

Four call sites currently assume the payload is bytes/bytearray and call .hex() directly:
- pyOneNote/FileNode.py: FileDataStoreObject.__str__ (line ~559)
- pyOneNote/FileNode.py: PropertySet.get_properties() UTF-16 decode fallback (line ~671)
- pyOneNote/FileNode.py: PrtFourBytesOfLengthFollowedByData.__str__ (line ~762)
- pyOneNote/OneDocument.py: get_json() file-contents serialization (line ~73)

Symptom
On real-world .one files (large notebooks with many revisions, embedded files, or property variants), the payload at one of these sites can already be a str by the time stringification happens, typically because an upstream code path decoded a UTF-16 buffer in place, or a parser fallback returned a hex string. When that happens the call raises:

AttributeError: 'str' object has no attribute 'hex'

…which propagates out of process_onenote_file and stops parsing entirely, even though only a single property is malformed.

Fix
Keep .hex() as the path for bytes/bytearray (no behavior change for the common case). Add explicit branches for the two non-bytes cases that have been observed in the wild:
- str → pass through (or encode to bytes first in the OneDocument.get_json() case so the JSON output stays hex-only)
- any other type → repr(...) so the parser produces a readable placeholder instead of crashing

Scope / non-goals
ArrayOfPropertyValues) or Ensure FileNode.data is initialized in __init__ and is not used if it… #20 (FileNode.data initialization). Those are separate, complementary fixes; this PR is intentionally narrow so it can be reviewed and merged on its own.

Verification
- ast.parse on both modified files passes.
- main, parsing of a corpus of OneNote files that previously failed on 'str' object has no attribute 'hex' completes successfully and produces the expected JSON / extracted file output.
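The syntax check mentioned under verification can be reproduced with a few lines. This is a minimal sketch rather than the author's actual procedure: `parses_cleanly`, `check_files`, and `MODIFIED_FILES` are hypothetical names, and the paths assume the script runs from the root of a pyOneNote checkout.

```python
import ast
from pathlib import Path

# Paths as named in the PR description; adjust to the local checkout.
MODIFIED_FILES = ["pyOneNote/FileNode.py", "pyOneNote/OneDocument.py"]


def parses_cleanly(source: str, filename: str = "<string>") -> bool:
    """Return True if the source parses to an AST, False on a syntax error."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError:
        return False
    return True


def check_files(paths=MODIFIED_FILES) -> dict:
    """Map each path that exists to its syntax-check result (hypothetical helper)."""
    results = {}
    for name in paths:
        path = Path(name)
        if path.is_file():  # silently skip paths missing from this checkout
            results[name] = parses_cleanly(path.read_text(encoding="utf-8"), name)
    return results
```

A check like this only confirms that the files are syntactically valid Python; it says nothing about runtime behavior, which is what the corpus run covers.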