23 changes: 14 additions & 9 deletions verify_nzb.py
@@ -115,19 +115,24 @@ def _parse_yenc_attrs(line: bytes) -> dict[str, str]:
     return attrs


+_YENC_TRANS = bytes((i - 42) % 256 for i in range(256))
+
+
 def _decode_yenc_lines(lines: Iterable[bytes]) -> bytes:
+    """Decodes yEnc lines efficiently using C-level string operations."""
     decoded = bytearray()
     for line in lines:
-        index = 0
-        while index < len(line):
-            byte = line[index]
-            if byte == 61:
-                index += 1
-                if index >= len(line):
+        if b"=" not in line:
+            decoded.extend(line.translate(_YENC_TRANS))
+        else:
+            parts = line.split(b"=")
+            decoded.extend(parts[0].translate(_YENC_TRANS))
+            for part in parts[1:]:
+                if not part:
                     raise ValueError("dangling yEnc escape")
-                byte = (line[index] - 64) % 256
-            decoded.append((byte - 42) % 256)
-            index += 1
+                decoded.append((part[0] - 106) % 256)
+                if len(part) > 1:
+                    decoded.extend(part[1:].translate(_YENC_TRANS))
Comment on lines +125 to +135

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Regression: split(b"=") mishandles escaped = characters.

Byte value 211 (0xD3) encodes to 0xFD (211 + 42). If an encoder escapes it, the escaped byte is (0xFD + 64) % 256 = 0x3D, so the encoded stream contains ==. The split-based approach incorrectly treats this as a dangling escape:

>>> b"a==b".split(b"=")
[b'a', b'', b'b']  # parts[1] is empty → raises ValueError

The original per-byte loop would correctly decode this by reading the second = as the escaped byte value (0x3D), decoding it to 211, then continuing.
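The escape arithmetic described above can be checked with a small standalone sketch. `decode_per_byte` is a hypothetical name for a single-line version of the removed loop, not a function in `verify_nzb.py`; it only mirrors the logic shown in the deleted lines.

```python
def decode_per_byte(line: bytes) -> bytes:
    """Reference decoder mirroring the removed per-byte loop (hypothetical helper)."""
    decoded = bytearray()
    index = 0
    while index < len(line):
        byte = line[index]
        if byte == 0x3D:  # '=' introduces an escape
            index += 1
            if index >= len(line):
                raise ValueError("dangling yEnc escape")
            # The byte after '=' is taken verbatim, even if it is itself '='.
            byte = (line[index] - 64) % 256
        decoded.append((byte - 42) % 256)  # undo the yEnc offset
        index += 1
    return bytes(decoded)


# The second '=' is consumed as the escaped byte: (0x3D - 64 - 42) % 256 == 211.
result = decode_per_byte(b"a==b")
print(list(result))  # [55, 211, 56]
```

Because the loop always consumes the byte that follows an escape marker, `==` decodes to a single byte instead of raising.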

Use find to locate escapes while preserving C-level performance:

πŸ› Proposed fix using find-based parsing
 def _decode_yenc_lines(lines: Iterable[bytes]) -> bytes:
     """Decodes yEnc lines efficiently using C-level string operations."""
     decoded = bytearray()
     for line in lines:
-        if b"=" not in line:
-            decoded.extend(line.translate(_YENC_TRANS))
-        else:
-            parts = line.split(b"=")
-            decoded.extend(parts[0].translate(_YENC_TRANS))
-            for part in parts[1:]:
-                if not part:
-                    raise ValueError("dangling yEnc escape")
-                decoded.append((part[0] - 106) % 256)
-                if len(part) > 1:
-                    decoded.extend(part[1:].translate(_YENC_TRANS))
+        pos = 0
+        while True:
+            escape_pos = line.find(b"=", pos)
+            if escape_pos == -1:
+                decoded.extend(line[pos:].translate(_YENC_TRANS))
+                break
+            decoded.extend(line[pos:escape_pos].translate(_YENC_TRANS))
+            if escape_pos + 1 >= len(line):
+                raise ValueError("dangling yEnc escape")
+            decoded.append((line[escape_pos + 1] - 106) % 256)
+            pos = escape_pos + 2
     return bytes(decoded)
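For validation outside the repository, the proposed fix can be restated as a self-contained sketch. The public names `YENC_TRANS` and `decode_yenc_lines` stand in for the module's private ones, and `escape_all` is a hypothetical stress helper that escapes every byte; it is not a conforming yEnc encoder, since real encoders escape only a few critical values.

```python
from typing import Iterable

# Translation table that subtracts the yEnc offset of 42 from every byte value.
YENC_TRANS = bytes((i - 42) % 256 for i in range(256))


def decode_yenc_lines(lines: Iterable[bytes]) -> bytes:
    """find()-driven decoder following the proposed fix."""
    decoded = bytearray()
    for line in lines:
        pos = 0
        while True:
            escape_pos = line.find(b"=", pos)
            if escape_pos == -1:
                # No more escapes: translate the rest of the line in one call.
                decoded += line[pos:].translate(YENC_TRANS)
                break
            decoded += line[pos:escape_pos].translate(YENC_TRANS)
            if escape_pos + 1 >= len(line):
                raise ValueError("dangling yEnc escape")
            # Undo both offsets (64 for the escape, 42 for yEnc) at once.
            decoded.append((line[escape_pos + 1] - 106) % 256)
            pos = escape_pos + 2  # skip '=' and the escaped byte
    return bytes(decoded)


def escape_all(data: bytes) -> bytes:
    """Hypothetical stress helper: emit every byte in escaped form."""
    out = bytearray()
    for b in data:
        out += bytes((0x3D, (b + 42 + 64) % 256))
    return bytes(out)


# Every byte value survives a forced-escape round trip, including 211 ('==').
every_byte = bytes(range(256))
assert decode_yenc_lines([escape_all(every_byte)]) == every_byte
```

Advancing `pos` by two after each escape is what lets the escaped byte be `=` without being rescanned as a new escape marker.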
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@verify_nzb.py` around lines 125 - 135, The split(b"=")-based yEnc decode
erroneously treats doubled '=' (escaped 0xD3) as a dangling escape; update the
parsing in verify_nzb.py (the block using decoded, _YENC_TRANS and handling
parts = line.split(b"=")) to a find()-driven loop that scans for b"=" positions,
copies non-escape spans through .translate(_YENC_TRANS), and when an '=' is
found consumes the following byte (ensuring it's present) to compute ((next_byte
- 106) % 256) and append it, then continue after that byte; this preserves
C-level performance and correctly decodes sequences like b"==".

     return bytes(decoded)

