Make property/file content stringification type-safe #26

Open
orangeruan128 wants to merge 1 commit into DissectMalware:main from orangeruan128:fix/property-value-type-safety

Summary

Make .hex() calls on property/file payloads type-safe so a single corrupted property cannot abort parsing of an entire .one file.

Four call sites currently assume the payload is bytes/bytearray and call .hex() directly:

  • pyOneNote/FileNode.py: FileDataStoreObject.__str__ (line ~559)
  • pyOneNote/FileNode.py: PropertySet.get_properties() UTF-16 decode fallback (line ~671)
  • pyOneNote/FileNode.py: PrtFourBytesOfLengthFollowedByData.__str__ (line ~762)
  • pyOneNote/OneDocument.py: get_json() file-contents serialization (line ~73)

Symptom

On real-world .one files (large notebooks with many revisions, embedded files, or property variants), the payload at one of these sites can already be a str by the time stringification happens — typically because an upstream code path decoded a UTF-16 buffer in place, or a parser fallback returned a hex string. When that happens the call raises:

AttributeError: 'str' object has no attribute 'hex'

…which propagates out of process_onenote_file and stops parsing entirely, even though only a single property is malformed.
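
A minimal reproduction of the failure mode, independent of pyOneNote: the byte literal below is an illustrative stand-in for a property payload, not data taken from a real .one file.

```python
# A UTF-16 buffer that an upstream step has already decoded to str.
# The byte literal is illustrative, not taken from a real .one file.
payload = b"h\x00i\x00".decode("utf-16-le")

try:
    payload.hex()  # only bytes/bytearray have .hex()
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)
```

Because nothing between the call site and process_onenote_file catches the exception, one such payload is enough to end the run.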

Fix

Keep .hex() as the path for bytes/bytearray (no behavior change for the common case). Add explicit branches for the two non-bytes cases that have been observed in the wild:

  • str → pass through (or encode to bytes first in the OneDocument.get_json() case so the JSON output stays hex-only)
  • anything else → repr(...) so the parser produces a readable placeholder instead of crashing
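
The branching described above can be sketched as two small helpers. The names to_hex_text and to_json_hex are hypothetical and do not appear in the diff, and the UTF-8 encoding used for the JSON case is an assumption, since the description does not say which encoding the patch uses.

```python
def to_hex_text(payload):
    """Hypothetical helper for the __str__ call sites."""
    if isinstance(payload, (bytes, bytearray)):
        return payload.hex()   # unchanged path for the common case
    if isinstance(payload, str):
        return payload         # already text: pass through
    return repr(payload)       # readable placeholder instead of a crash


def to_json_hex(payload, encoding="utf-8"):
    """Hypothetical helper for the get_json() file-contents case.

    The encoding default is an assumption, not taken from the patch.
    """
    if isinstance(payload, (bytes, bytearray)):
        return payload.hex()
    if isinstance(payload, str):
        # Encode first so the JSON field stays hex-only.
        return payload.encode(encoding, errors="replace").hex()
    return repr(payload)
```

With these, to_hex_text(b"\x01\xff") gives "01ff", to_hex_text("abc") gives "abc" unchanged, and to_json_hex("A") gives "41".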

Scope / non-goals

Verification

  • ast.parse on both modified files passes.
  • After applying this change on top of main, parsing of a corpus of OneNote files that previously failed on 'str' object has no attribute 'hex' completes successfully and produces the expected JSON / extracted file output.
  • Files that already parsed cleanly continue to produce byte-identical JSON for the headers/properties/files structure.
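
For reference, the ast.parse check in the first bullet confirms only that a file is syntactically valid Python; it does not execute the file. A minimal sketch of such a check, with parses_cleanly as a hypothetical helper name:

```python
import ast


def parses_cleanly(source):
    """Return True if `source` is syntactically valid Python.

    Hypothetical helper; ast.parse builds a syntax tree without running the code.
    """
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False
```

Behavioural coverage therefore comes from the corpus run and the byte-identical JSON comparison in the other two bullets.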

PrtFourBytesOfLengthFollowedByData.Data, FileDataStoreObject.FileData,
and the OneDocument.get_json() file contents are documented to be byte
buffers, but on real-world OneNote files they can already be Python
strs by the time stringification happens (typically because earlier
code paths decoded a UTF-16 buffer in-place, or a parser fallback
returned a hex string). When that happens, calling .hex() raises:

    AttributeError: 'str' object has no attribute 'hex'

This change keeps .hex() as the path for bytes/bytearray (no behavior
change), and adds explicit fallbacks for str (pass through, or encode
to bytes for the JSON file-contents case) and other types (repr).

No new functionality, just defensive type coercion so a single corrupt
property does not abort parsing of the whole document.