Checklist
TagStudio Version
9.5.6
Operating System & Version
Fedora 43 x86_64
Description
There are a few spots where we don't have guardrails, which could lead to TagStudio crashing or chewing up all of a system's memory if it encounters something malicious or malformed, specifically in XML- and archive-based formats.
I understand this is a local app and it's the user's fault if they knowingly feed it bad data.
TagStudio uses Python's native xml.etree.ElementTree to parse XML, which leaves it open to Billion Laughs and XXE attacks. On top of that, several callers pass an untrusted length straight into f.read() or archive.read() before parsing the file, so even a hardened parser like defusedxml wouldn't save us on its own.
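To make the entity-expansion concern concrete, here is a deliberately harmless sketch using only the standard library. The document and entity names are made up for illustration. Each entity level multiplies the payload by four; a real Billion Laughs document just adds more levels. Recent Expat releases apply their own amplification limits, so how far a real bomb gets depends on the libexpat your interpreter links against.

```python
import xml.etree.ElementTree as ET

# Tiny, harmless nested-entity document (illustrative, not a real exploit).
DOC = """<?xml version="1.0"?>
<!DOCTYPE r [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;&b;">
]>
<r>&c;</r>"""

# The stock parser expands internal entities while building the tree.
root = ET.fromstring(DOC)
expanded = root.text

# Two levels of 4x nesting turn the 2-character seed into 32 characters.
print(len(expanded))
```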
This needs a two-part fix:
- Swap xml.etree.ElementTree for defusedxml.ElementTree anywhere we touch untrusted XML.
- Size-gate those reads with a sane cap.
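A minimal sketch of what the size gate could look like for formats that declare a header length. `read_declared` and `MAX_HEADER_BYTES` are hypothetical names, and the cap value is a placeholder rather than anything TagStudio defines. The returned bytes would then be handed to the hardened parser.

```python
import io
import os

# Hypothetical cap; the right value is a judgment call per format.
MAX_HEADER_BYTES = 16 * 1024 * 1024


def read_declared(f, declared: int, cap: int = MAX_HEADER_BYTES) -> bytes:
    """Read `declared` bytes from a seekable binary file, rejecting bogus lengths."""
    if declared < 0 or declared > cap:
        raise ValueError(f"declared length {declared} is outside 0..{cap}")

    # A declared length can never legitimately exceed what is left in the file.
    pos = f.tell()
    remaining = f.seek(0, os.SEEK_END) - pos
    f.seek(pos)
    if declared > remaining:
        raise ValueError(f"declared {declared} bytes, only {remaining} remain")

    data = f.read(declared)
    if len(data) != declared:
        raise ValueError("short read")
    return data


# Usage: a 6-byte stream that honestly declares a 3-byte header at offset 2.
stream = io.BytesIO(b"abcdef")
stream.seek(2)
header = read_declared(stream, 3)
```

Checking against the remaining file size catches the "claims 1 GiB, contains a few bytes" case even when the claimed length is under the cap.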
This is currently a problem in:
- ePub/CB* ComicInfo.xml and _epub_cover
- Legit .mdp files declare their header length in a 32-bit field, but the read is unbounded
- Legit .pdn files declare their header length in a 24-bit field, but the read is unbounded
- DupeGuru XML results are only bounded by system memory.
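For the archive-based cases such as ComicInfo.xml, a similar gate can be sketched with the standard zipfile module. `read_member_capped` and `MAX_XML_BYTES` are hypothetical names and the cap is a placeholder. This only covers zip containers; other archive backends would need their own equivalent.

```python
import io
import zipfile

# Hypothetical cap for small metadata files inside an archive.
MAX_XML_BYTES = 1024 * 1024


def read_member_capped(archive: zipfile.ZipFile, name: str, cap: int = MAX_XML_BYTES) -> bytes:
    """Read one archive member, refusing anything larger than `cap` bytes."""
    info = archive.getinfo(name)
    if info.file_size > cap:
        raise ValueError(f"{name} declares {info.file_size} bytes, cap is {cap}")

    # The declared size comes from the archive itself, so bound the real read too.
    with archive.open(info) as member:
        data = member.read(cap + 1)
    if len(data) > cap:
        raise ValueError(f"{name} decompressed past the {cap}-byte cap")
    return data


# Usage: build a small in-memory archive and read its metadata file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ComicInfo.xml", b"<ComicInfo/>")
    zf.writestr("big.xml", b"a" * 5000)

with zipfile.ZipFile(buf) as zf:
    comic_info = read_member_capped(zf, "ComicInfo.xml")
```

Reading `cap + 1` bytes from the opened member, rather than calling `archive.read()`, keeps the allocation bounded even if the declared size understates the real one.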
PR coming with more details and tests.
Expected Behavior
Parse XML and archives without being vulnerable to Billion Laughs / XXE / OOM issues.
Steps to Reproduce
A maliciously crafted .mdp:
import struct

# Magic + bin_header claiming a 1 GiB XML header that doesn't exist.
with open("/tmp/bomb.mdp", "wb") as f:
    f.write(b"mdipack\x00")
    f.write(struct.pack("<LLL", 0, 1024 * 1024 * 1024, 0))
A maliciously crafted .pdn:
# Magic "PDN3" + 24-bit little-endian header_size. The format caps the
# field at ~16 MiB (0xFFFFFF), so the worst case is bounded by the spec,
# but that's still 16-ish MiB allocated before defusedxml runs, and
# nothing stops a user from receiving a file that hits that ceiling.
declared = 0xFFFFFF  # ~16 MiB - 1, the format max
with open("/tmp/bomb.pdn", "wb") as f:
    f.write(b"PDN3")
    f.write(declared.to_bytes(3, "little"))
Drop one or both into TagStudio and trigger thumbnail generation. TagStudio will happily try to allocate whatever size each file claims.
Logs
No response