fix: handle PAX and GNU metadata entry types in TAR extraction #70

Merged: bug-ops merged 2 commits into main from fix/pax-extraction on Mar 7, 2026

bug-ops commented Mar 7, 2026

Summary

The catch-all _ arm in to_entry_type() incorrectly mapped PAX headers and GNU metadata to SecurityViolation, breaking extraction of PAX archives.

  • Skip XHeader, XGlobalHeader, GNULongName, GNULongLink as format-internal metadata
  • Map Continuous and GNUSparse to File (real data entries)
  • Apply same filtering in list.rs for extract()/list_archive() consistency
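
The mapping described in the bullets above can be sketched as follows. This is a Python illustration built on the standard library's `tarfile` type-flag constants, not the Rust code in exarch-core. The names `EntryKind`, `UnsupportedEntryType`, and `to_entry_kind` are hypothetical, and the handling of directories, symlinks, and hard links is an assumption, since the PR text does not describe those arms.

```python
import tarfile
from enum import Enum
from typing import Optional


class EntryKind(Enum):
    """Hypothetical stand-in for the extractor's entry types."""
    FILE = "file"
    DIRECTORY = "directory"
    SYMLINK = "symlink"
    HARDLINK = "hardlink"


class UnsupportedEntryType(Exception):
    """Raised for type flags this sketch refuses to handle."""


# Format-internal metadata: these entries describe other entries
# and are never extracted themselves.
METADATA_TYPES = {
    tarfile.XHDTYPE,           # b"x": PAX extended header
    tarfile.XGLTYPE,           # b"g": PAX global extended header
    tarfile.GNUTYPE_LONGNAME,  # b"L": GNU long name
    tarfile.GNUTYPE_LONGLINK,  # b"K": GNU long link
}

# Entries that carry real file data.
FILE_TYPES = {
    tarfile.REGTYPE,         # b"0": regular file
    tarfile.AREGTYPE,        # b"\0": regular file (old format)
    tarfile.CONTTYPE,        # b"7": contiguous file
    tarfile.GNUTYPE_SPARSE,  # b"S": GNU sparse file
}


def to_entry_kind(typeflag: bytes) -> Optional[EntryKind]:
    """Return None for metadata to skip, an EntryKind for data, else raise."""
    if typeflag in METADATA_TYPES:
        return None
    if typeflag in FILE_TYPES:
        return EntryKind.FILE
    # Assumed mappings, not taken from the PR text.
    if typeflag == tarfile.DIRTYPE:
        return EntryKind.DIRECTORY
    if typeflag == tarfile.SYMTYPE:
        return EntryKind.SYMLINK
    if typeflag == tarfile.LNKTYPE:
        return EntryKind.HARDLINK
    # Anything else is rejected instead of being silently accepted.
    raise UnsupportedEntryType(f"unsupported tar type flag: {typeflag!r}")
```

A caller would skip any entry for which `to_entry_kind` returns `None` and keep rejecting type flags it does not recognise, which matches the split between skipped metadata and rejected unknown bytes in the test plan.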

Test plan

  • Tests for all new code paths (PAX skip, GNU metadata skip, sparse, continuous, unknown byte rejection)
  • 635 tests pass, fmt/clippy clean

Closes #69

The catch-all arm in to_entry_type() incorrectly mapped PAX headers
(XHeader, XGlobalHeader) and GNU metadata (GNULongName, GNULongLink)
to SecurityViolation, breaking extraction of PAX archives.

Skip these format-internal metadata entries instead. Map Continuous
and GNUSparse to File as they are real data entries. Apply the same
filtering in list.rs for consistency.

Closes #69
github-actions bot added the core label (Changes to exarch-core) on Mar 7, 2026
bug-ops mentioned this pull request on Mar 7, 2026 (3 tasks)
The bitwise OR in validate_path() is intentional for constant-time
processing. Suppress clippy::needless_bitwise_bool (new in Rust 1.94).
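
The difference between the two operators can be illustrated with a small Python analogy. The PR does not show the body of validate_path(), so the check functions below are hypothetical. Python offers no constant-time guarantees, so the sketch only shows that `|` evaluates both operands while `or` stops at the first true one.

```python
calls = []  # records which checks actually ran


def is_absolute(path: str) -> bool:
    """Hypothetical check: the path starts at the filesystem root."""
    calls.append("absolute")
    return path.startswith("/")


def has_parent_component(path: str) -> bool:
    """Hypothetical check: the path contains a '..' component."""
    calls.append("parent")
    return ".." in path.split("/")


def is_unsafe_short_circuit(path: str) -> bool:
    # `or` skips the second check when the first is already True,
    # so the amount of work depends on the input.
    return is_absolute(path) or has_parent_component(path)


def is_unsafe_eager(path: str) -> bool:
    # `|` on bools always evaluates both operands,
    # so every input goes through the same checks.
    return is_absolute(path) | has_parent_component(path)


if __name__ == "__main__":
    is_unsafe_short_circuit("/etc/passwd")
    print("short-circuit ran:", calls)
    calls.clear()
    is_unsafe_eager("/etc/passwd")
    print("eager ran:", calls)
```

Both functions return the same boolean for every input. Only the set of checks that run differs, which is why a lint that flags the bitwise form as needless has to be suppressed when the eager evaluation is deliberate.
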
@github-actions github-actions bot added the node Node.js bindings label Mar 7, 2026

bug-ops commented Mar 7, 2026

Verified with the reproducer from the issue description (built from #70):

  import io, tarfile, tempfile, exarch

  buf = io.BytesIO()
  with tarfile.open(fileobj=buf, mode="w:gz") as tar:
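      pax = tarfile.TarInfo(name="././@PaxHeader")
      pax.type = b"g"  # typeflag "g": PAX global extended header
      # PAX record format is "<length> <key>=<value>\n"; <length> counts the whole record (14 bytes)
      pax_data = b"14 comment=hi\n"
      pax.size = len(pax_data)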
      tar.addfile(pax, io.BytesIO(pax_data))

      info = tarfile.TarInfo(name="hello.txt")
      content = b"hi"
      info.size = len(content)
      tar.addfile(info, io.BytesIO(content))

  with tempfile.TemporaryDirectory() as tmp:
      archive_path = f"{tmp}/test.tar.gz"
      with open(archive_path, "wb") as f:
          f.write(buf.getvalue())
      report = exarch.extract_archive(archive_path, tmp)
      print(report)

Output:

  ExtractionReport(files=1, dirs=0, symlinks=0, bytes=2, duration=0ms, skipped=0, warnings=0)
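
To see why an extractor meets the global header as a separate entry, the raw type flags can be read straight from the 512-byte header blocks. The sketch below uses only the Python standard library. The helpers `build_archive` and `scan_typeflags` are hypothetical names, and the archive is written uncompressed so the blocks can be walked directly. The field offsets used (name at bytes 0-99, size at 124-135, type flag at 156) come from the ustar header layout.

```python
import io
import tarfile

BLOCK = 512  # tar archives are a sequence of 512-byte blocks


def build_archive() -> bytes:
    """Build an uncompressed tar with one PAX global header and one file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        pax = tarfile.TarInfo(name="././@PaxHeader")
        pax.type = tarfile.XGLTYPE  # b"g"
        pax_data = b"14 comment=hi\n"  # length prefix counts the whole record
        pax.size = len(pax_data)
        tar.addfile(pax, io.BytesIO(pax_data))

        info = tarfile.TarInfo(name="hello.txt")
        content = b"hi"
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def scan_typeflags(raw: bytes) -> list[tuple[str, bytes, int]]:
    """Walk the header blocks and return (name, typeflag, size) per entry."""
    entries = []
    offset = 0
    while offset + BLOCK <= len(raw):
        header = raw[offset:offset + BLOCK]
        if header == b"\0" * BLOCK:  # an all-zero block marks end of archive
            break
        name = header[0:100].rstrip(b"\0").decode("utf-8", "replace")
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)  # octal field
        typeflag = header[156:157]
        entries.append((name, typeflag, size))
        # Skip the header block plus the data, rounded up to whole blocks.
        offset += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK
    return entries


if __name__ == "__main__":
    for name, typeflag, size in scan_typeflags(build_archive()):
        print(name, typeflag, size)
```

The scan reports two entries, the first with type flag `g` and the second a regular file, so a reader that iterates over raw entries has to decide explicitly what to do with the first one.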

bug-ops merged commit de46861 into main on Mar 7, 2026
20 checks passed
bug-ops deleted the fix/pax-extraction branch on March 7, 2026 13:45

Linked issue (closed by this pull request): [Bug]: PAX archive extraction fails