[Bug]+security: cap XML/archive reads to bound memory use

### Checklist

- [x] I am using an up-to-date version.
- [x] I have read the [documentation](https://github.com/TagStudioDev/TagStudio/blob/main/docs/index.md).
- [x] I have searched existing [issues](https://github.com/TagStudioDev/TagStudio/issues).

### TagStudio Version

9.5.6

### Operating System & Version

Fedora 43 x86_64

### Description

There's a few spots where we don't have guardrails that could lead to TagStudio crashing or chewing up all of a system's memory if something malicious or malformed is encountered. Specifically in XML- and archive-based formats.

I understand this is a local app and it's the user's fault if they knowingly feed it bad data.

Since TagStudio uses Python's native `xml.etree.ElementTree` to parse XML, that leaves it open to Billion Laughs and XXE attacks. Several callers read an untrusted length straight into `f.read()` or `archive.read()` before parsing the file, so even a better parser like `defusedxml` wouldn't save us.

This needs a two-part fix:
1. Swap `xml.etree.ElementTree` for `defusedxml.ElementTree` anywhere we touch untrusted XML. 
2. Size-gate those reads with a sane cap.

This is currently a problem in:
1. ePub/CB* ComicInfo.xml and _epub_cover
2. Legit `.mpd` file headers are a max of 32-bits, but have an unbounded read
3. Legit `.pdn` file headers are a max of 24-bits, but have an unbounded read
4. DupeGuru XML results are only bounded by system memory.

PR coming with more details and tests.

### Expected Behavior

Parse XML and archives without being vulnerable to Billion Laughs / XXE / OOM issues.

### Steps to Reproduce

A maliciously crafted `.mdp`:

```python
  import struct
  # Magic + bin_header claiming a 1 GiB XML header that doesn't exist.
  with open("/tmp/bomb.mdp", "wb") as f:
      f.write(b"mdipack\x00")
      f.write(struct.pack("<LLL", 0, 1024 * 1024 * 1024, 0))
```
A maliciously crafted `.pdn`:

```python  
  # Magic "PDN3" + 24-bit little-endian header_size. The format caps the
  # field at ~16 MiB (0xFFFFFF), so the worst case is bounded by the spec,
  # but that's still 16-ish MiB allocated before defusedxml runs, and
  # nothing stops a user from receiving a file that hits that ceiling.
  declared = 0xFFFFFF  # ~16 MiB - 1, the format max
  with open("/tmp/bomb.pdn", "wb") as f:
      f.write(b"PDN3")
      f.write(declared.to_bytes(3, "little"))
```

Drop one or both into TagStudio and trigger thumbnail generation. TagStudio will happily chomp down all of the memory for those files.

### Logs

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug]+security: cap XML/archive reads to bound memory use #1341

Checklist

TagStudio Version

Operating System & Version

Description

Expected Behavior

Steps to Reproduce

Logs

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]+security: cap XML/archive reads to bound memory use #1341

Description

Checklist

TagStudio Version

Operating System & Version

Description

Expected Behavior

Steps to Reproduce

Logs

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions