Make property/file content stringification type-safe by orangeruan128 · Pull Request #26 · DissectMalware/pyOneNote

orangeruan128 · 2026-04-20T02:28:47Z

Summary

Make .hex() calls on property/file payloads type-safe so a single corrupted property cannot abort parsing of an entire .one file.

Four call sites currently assume the payload is bytes/bytearray and call .hex() directly:

pyOneNote/FileNode.py — FileDataStoreObject.__str__ (line ~559)
pyOneNote/FileNode.py — PropertySet.get_properties() UTF-16 decode fallback (line ~671)
pyOneNote/FileNode.py — PrtFourBytesOfLengthFollowedByData.__str__ (line ~762)
pyOneNote/OneDocument.py — get_json() file-contents serialization (line ~73)

Symptom

On real-world .one files (large notebooks with many revisions, embedded files, or property variants), the payload at one of these sites can already be a str by the time stringification happens — typically because an upstream code path decoded a UTF-16 buffer in place, or a parser fallback returned a hex string. When that happens the call raises:

AttributeError: 'str' object has no attribute 'hex'

…which propagates out of process_onenote_file and stops parsing entirely, even though only a single property is malformed.
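The failure mode can be reproduced in isolation. The snippet below is a minimal sketch, not code from pyOneNote: the variable `payload` is a hypothetical stand-in for a property value that an upstream fallback has already converted to a hex string.

```python
# Hypothetical stand-in for a payload that an upstream fallback
# already turned into a hex string (so it is a str, not bytes).
payload = b"\x01\x02".hex()

try:
    # bytes and bytearray have .hex(); str does not.
    payload.hex()
except AttributeError as exc:
    message = str(exc)

print(message)
```

Without a guard at the call site, this exception is what escapes the stringification code and ends the parse.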

Fix

Keep .hex() as the path for bytes/bytearray (no behavior change for the common case). Add explicit branches for the two non-bytes cases that have been observed in the wild:

str → pass through (or encode to bytes first in the OneDocument.get_json() case so the JSON output stays hex-only)
anything else → repr(...) so the parser produces a readable placeholder instead of crashing
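The three branches described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper names `safe_hex` and `safe_hex_for_json` are hypothetical, and the UTF-8 encoding used for the JSON case is an assumption.

```python
def safe_hex(value):
    """Hypothetical helper: stringify a payload without assuming it is bytes."""
    if isinstance(value, (bytes, bytearray)):
        # Common case: unchanged behavior.
        return value.hex()
    if isinstance(value, str):
        # Already decoded or hex-encoded upstream: pass through.
        return value
    # Anything else: readable placeholder instead of an exception.
    return repr(value)


def safe_hex_for_json(value, encoding="utf-8"):
    """Hypothetical variant for JSON output, which should stay hex-only.

    The encoding is an assumption made for this sketch.
    """
    if isinstance(value, str):
        value = value.encode(encoding, errors="replace")
    return safe_hex(value)


print(safe_hex(b"\x01\xff"))
print(safe_hex("already text"))
print(safe_hex_for_json("A"))
```

Keeping the bytes/bytearray check first means well-formed input takes exactly the same path as before.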

Scope / non-goals

No new functionality.
No changes to parser semantics for well-formed input.
Does not touch the issues already covered by #16 (ArrayOfPropertyValues) or #20 (FileNode.data initialization). Those are separate, complementary fixes; this PR is intentionally narrow so it can be reviewed and merged on its own.

Verification

ast.parse on both modified files passes.
After applying this change on top of main, parsing of a corpus of OneNote files that previously failed on 'str' object has no attribute 'hex' completes successfully and produces the expected JSON / extracted file output.
Files that already parsed cleanly continue to produce byte-identical JSON for the headers/properties/files structure.
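The syntax check in the first item can be scripted along these lines. This is a minimal sketch; the function names `parses_cleanly` and `check_files` are hypothetical, and the paths passed in are assumed to be the two modified modules.

```python
import ast
from pathlib import Path


def parses_cleanly(source, filename="<string>"):
    """Return True if the source text is syntactically valid Python."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError:
        return False
    return True


def check_files(paths):
    """Map each path to whether its contents parse without a SyntaxError."""
    return {
        str(p): parses_cleanly(Path(p).read_text(encoding="utf-8"), str(p))
        for p in paths
    }
```

For example, `check_files(["pyOneNote/FileNode.py", "pyOneNote/OneDocument.py"])` run from the repository root would report both files. Note that ast.parse only confirms the files are syntactically valid; the corpus run is what exercises the new branches.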

PrtFourBytesOfLengthFollowedByData.Data, FileDataStoreObject.FileData, and the OneDocument.get_json() file contents are documented to be byte buffers, but on real-world OneNote files they can already be Python strs by the time stringification happens (typically because earlier code paths decoded a UTF-16 buffer in-place, or a parser fallback returned a hex string). When that happens, calling .hex() raises:

AttributeError: 'str' object has no attribute 'hex'

This change keeps .hex() as the path for bytes/bytearray (no behavior change), and adds explicit fallbacks for str (pass through, or encode to bytes for the JSON file-contents case) and other types (repr). No new functionality, just defensive type coercion so a single corrupt property does not abort parsing of the whole document.

This was referenced Apr 20, 2026

Ensure FileNode.data is initialized in __init__ and is not used if it… #20

Open

Implements candidate code for pocessing ArrayOfPropertyValues #16

Open

Tolerate truncated / partially-corrupt FSSHTTPB streams #27

Open

orangeruan128 wants to merge 1 commit into DissectMalware:main from orangeruan128:fix/property-value-type-safety
