Skip to content

[Bug]+security: cap XML/archive reads to bound memory use #1341

@reddraconi

Description

@reddraconi

Checklist

  • I am using an up-to-date version.
  • I have read the documentation.
  • I have searched existing issues.

TagStudio Version

9.5.6

Operating System & Version

Fedora 43 x86_64

Description

There's a few spots where we don't have guardrails that could lead to TagStudio crashing or chewing up all of a system's memory if something malicious or malformed is encountered. Specifically in XML- and archive-based formats.

I understand this is a local app and it's the user's fault if they knowingly feed it bad data.

Since TagStudio uses Python's native xml.etree.ElementTree to parse XML, that leaves it open to Billion Laughs and XXE attacks. Several callers read an untrusted length straight into f.read() or archive.read() before parsing the file, so even a better parser like defusedxml wouldn't save us.

This needs a two-part fix:

  1. Swap xml.etree.ElementTree for defusedxml.ElementTree anywhere we touch untrusted XML.
  2. Size-gate those reads with a sane cap.

This is currently a problem in:

  1. ePub/CB* ComicInfo.xml and _epub_cover
  2. Legit .mpd file headers are a max of 32-bits, but have an unbounded read
  3. Legit .pdn file headers are a max of 24-bits, but have an unbounded read
  4. DupeGuru XML results are only bounded by system memory.

PR coming with more details and tests.

Expected Behavior

Parse XML and archives without being vulnerable to Billion Laughs / XXE / OOM issues.

Steps to Reproduce

A maliciously crafted .mdp:

  import struct
  # Magic + bin_header claiming a 1 GiB XML header that doesn't exist.
  with open("/tmp/bomb.mdp", "wb") as f:
      f.write(b"mdipack\x00")
      f.write(struct.pack("<LLL", 0, 1024 * 1024 * 1024, 0))

A maliciously crafted .pdn:

  # Magic "PDN3" + 24-bit little-endian header_size. The format caps the
  # field at ~16 MiB (0xFFFFFF), so the worst case is bounded by the spec,
  # but that's still 16-ish MiB allocated before defusedxml runs, and
  # nothing stops a user from receiving a file that hits that ceiling.
  declared = 0xFFFFFF  # ~16 MiB - 1, the format max
  with open("/tmp/bomb.pdn", "wb") as f:
      f.write(b"PDN3")
      f.write(declared.to_bytes(3, "little"))

Drop one or both into TagStudio and trigger thumbnail generation. TagStudio will happily chomp down all of the memory for those files.

Logs

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type: BugSomething isn't working as intended

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions