v1.3.4 Improve Archive Command and update UnityFileSystemAPI #62
SkowronskiAndrew merged 11 commits into main
Conversation
Add P/Invoke declarations and public wrappers for 7 missing native API functions: GetDllVersion, GetUnityVersion, GetSerializedFileVersion, GetTypeTreeCount, GetTypeTreeInfo, GetTypeTreeByIndex, and RemoveTypeTreeSource. Also adds TypeTreeCategory enum and TypeTreeInfo struct. Fixes TypeTreeInfo name ambiguity in SerializedFileCommands.
Move all WebBundle-specific code (detection, parsing, extraction, listing) out of Archive.cs into a dedicated WebBundleHelper class. Archive now delegates to WebBundleHelper for web bundle operations.
Add -f/--format option to "archive list" supporting Text (default) and Json output, matching the existing serialized-file commands. Includes tests for both AssetBundle and WebBundle archive types in both formats, plus an extract test with file size verification.
Parse and print the header for Unity Archive files. This is similar to the header command already implemented for serialized files.
Useful to extract just a single file or group of files out of an archive
Summarize the key metrics about the archive
Add an example compressed player file (used to test info and useful for manual testing)
Add sanity check that the spans make sense in the Blocks and Directory data
Add some more comments with some details of the format
Pull request overview
This PR expands the archive command to provide richer inspection of Unity archive files (AssetBundles / UnityFS), adds JSON output options, and updates the UnityFileSystem API surface with new native interop calls.
Changes:
- Add `archive header`, `archive blocks`, and `archive info` subcommands with `-f/--format Text|Json`, and enhance `archive list` output (incl. data offsets).
- Add `--filter` support for `archive extract` and refactor web bundle handling into `WebBundleHelper`.
- Introduce a managed UnityFS header/metadata parser (`ArchiveDetector`) including LZ4 metadata decompression, plus new tests and documentation.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| UnityFileSystem/UnityFileSystem.cs | Adds native API wrappers (dll version, unity version, remove type tree source). |
| UnityFileSystem/SerializedFile.cs | Adds serialized file version and type-tree enumeration APIs. |
| UnityFileSystem/DllWrapper.cs | Adds new P/Invoke declarations and TypeTreeInfo/TypeTreeCategory definitions. |
| UnityBinaryFormat/ArchiveDetector.cs | Adds UnityFS header + metadata parsing and LZ4 metadata decompression. |
| UnityBinaryFormat/BinaryFileHelper.cs | Adds ReadUInt16 helper used by archive parsing. |
| UnityBinaryFormat/UnityBinaryFormat.csproj | Adds LZ4 decompression package dependency. |
| UnityDataTool/Program.cs | Wires new archive subcommands and output format option into CLI. |
| UnityDataTool/Archive.cs | Implements new archive subcommands and JSON/text formatting; adds filter support for extract. |
| UnityDataTool/WebBundleHelper.cs | New helper for listing/extracting UnityWebData .data(.gz/.br) bundles. |
| UnityDataTool/UnityDataTool.csproj | Updates assembly/file/informational version values. |
| UnityDataTool.Tests/* | Adds/updates tests and expected values for new archive outputs and JSON formats. |
| Documentation/command-archive.md | Documents new archive subcommands and options. |
| TestCommon/Data/PlayerDataCompressed/README.md | Adds description of new compressed player data sample. |
UnityDataTool/WebBundleHelper.cs
Outdated
```csharp
        Path = filePath,
    });
    // Advance byte offset, so we keep track of the position (to know when we're done reading the header).
    currentByteOffset += 3 * sizeof(uint) + filePath.Length;
```
ParseWebBundleHeader advances currentByteOffset using filePath.Length (characters) instead of the byte count read from the file (filePathLength). This will mis-detect the end of the header for non-ASCII paths and can break parsing. Use filePathLength (and keep it as a byte count) when advancing currentByteOffset.
Suggested change:
```diff
-    currentByteOffset += 3 * sizeof(uint) + filePath.Length;
+    currentByteOffset += 3 * sizeof(uint) + filePathLength;
```
Fixed — now uses filePathLength (byte count) instead of filePath.Length (character count).
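The distinction matters for non-ASCII paths. A minimal sketch (illustrative path; assumes the header stores UTF-8 byte lengths, as discussed above) of how the two counts diverge:

```csharp
using System;
using System.Text;

// Hypothetical example path containing a non-ASCII character.
string filePath = "Resources/données.bin";

// Character count — what filePath.Length returns.
int charCount = filePath.Length;

// Byte count — what the header field (filePathLength) would hold for UTF-8.
int byteCount = Encoding.UTF8.GetByteCount(filePath);

// "é" is one char but two UTF-8 bytes, so charCount < byteCount here.
// Advancing currentByteOffset by the character count would under-count
// and mis-detect the end of the header.
Console.WriteLine($"chars={charCount}, bytes={byteCount}");
```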
```csharp
        // Always read the bytes to advance the stream position.
        var data = ReadBytes(reader, (int)description.Size);

        if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
            continue;

        Console.WriteLine($"... Extracting {description.Path}");
        var path = Path.Combine(outputFolder.ToString(), description.Path);
        Directory.CreateDirectory(Path.GetDirectoryName(path));
        File.WriteAllBytes(path, data);
        extracted++;
    }

    Console.WriteLine($"Extracted {extracted} out of {total} files.");
}
```
Extract() casts description.Size (uint) to int when reading bytes. This will overflow/throw for entries >2GB and can lead to incorrect reads. Consider validating the size fits in int, or stream-copy using long-sized reads instead of buffering the whole file into memory.
Suggested change:
```diff
-        // Always read the bytes to advance the stream position.
-        var data = ReadBytes(reader, (int)description.Size);
-        if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
-            continue;
-        Console.WriteLine($"... Extracting {description.Path}");
-        var path = Path.Combine(outputFolder.ToString(), description.Path);
-        Directory.CreateDirectory(Path.GetDirectoryName(path));
-        File.WriteAllBytes(path, data);
-        extracted++;
-    }
-    Console.WriteLine($"Extracted {extracted} out of {total} files.");
-}
+        long size = description.Size;
+        if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
+        {
+            CopyBytes(reader.BaseStream, Stream.Null, size);
+            continue;
+        }
+        Console.WriteLine($"... Extracting {description.Path}");
+        var path = Path.Combine(outputFolder.ToString(), description.Path);
+        Directory.CreateDirectory(Path.GetDirectoryName(path));
+        using var output = File.Create(path);
+        CopyBytes(reader.BaseStream, output, size);
+        extracted++;
+    }
+    Console.WriteLine($"Extracted {extracted} out of {total} files.");
+}
+
+private static void CopyBytes(Stream input, Stream output, long bytesToCopy)
+{
+    byte[] buffer = new byte[81920];
+    long remaining = bytesToCopy;
+    while (remaining > 0)
+    {
+        int bytesRead = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
+        if (bytesRead <= 0)
+            throw new EndOfStreamException($"Unexpected end of stream while reading {bytesToCopy} bytes.");
+        output.Write(buffer, 0, bytesRead);
+        remaining -= bytesRead;
+    }
+}
```
Valid concern for very large files. This is pre-existing behavior (the code was moved from Archive.cs, not written new). Web bundle entries are practically always well under 2GB, so this is low-risk. Filed as a known limitation — not addressing in this PR.
UnityDataTool/WebBundleHelper.cs
Outdated
```csharp
static void ExtractFile(FileDescription description, BinaryReader reader, DirectoryInfo outputFolder)
{
    // This function assumes `reader` is at the start of the binary data representing the file contents.
    Console.WriteLine($"... Extracting {description.Path}");
    var path = Path.Combine(outputFolder.ToString(), description.Path);
    Directory.CreateDirectory(Path.GetDirectoryName(path));
    File.WriteAllBytes(path, ReadBytes(reader, (int)description.Size));
}
```
ExtractFile() is declared but never used. Removing it (or wiring it into Extract()) would avoid dead code and keep the helper focused.
Suggested change:
```diff
-static void ExtractFile(FileDescription description, BinaryReader reader, DirectoryInfo outputFolder)
-{
-    // This function assumes `reader` is at the start of the binary data representing the file contents.
-    Console.WriteLine($"... Extracting {description.Path}");
-    var path = Path.Combine(outputFolder.ToString(), description.Path);
-    Directory.CreateDirectory(Path.GetDirectoryName(path));
-    File.WriteAllBytes(path, ReadBytes(reader, (int)description.Size));
-}
```
Fixed — removed the now-unused ExtractFile method. The extraction logic was inlined into Extract() when adding filter support.
```csharp
// Read the metadata bytes (which may be compressed)
var compressedData = new byte[header.CompressedMetadataSize];
int bytesRead = stream.Read(compressedData, 0, compressedData.Length);
if (bytesRead != compressedData.Length)
    throw new InvalidDataException("Could not read the full metadata section from the file.");
```
This allocates an array with a uint length (header.CompressedMetadataSize). Array lengths must be int, so this will not compile as-is. Cast with checked((int)...) after validating the value is <= int.MaxValue (and similarly anywhere else a uint size is used for allocations/reads).
This compiles and works correctly. C# array creation expressions accept uint dimensions (not just int) — the compiler handles the conversion. The metadata section will never get anywhere close to max int, so overflow is not a practical concern.
```csharp
{
    // LZ4 and LZ4HC use the same decompression algorithm.
    uncompressedData = new byte[header.UncompressedMetadataSize];
    int decoded = LZ4Codec.Decode(compressedData, 0, compressedData.Length,
        uncompressedData, 0, uncompressedData.Length);
    if (decoded != header.UncompressedMetadataSize)
        throw new InvalidDataException($"LZ4 decompression produced {decoded} bytes, expected {header.UncompressedMetadataSize}.");
```
This allocates an array with a uint length (header.UncompressedMetadataSize), which will not compile. Add a checked cast to int with a range check (<= int.MaxValue) before allocating the buffer.
Same as above — uint is valid for C# array dimensions. No compilation issue.
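For reference, a minimal sketch of the language rule at issue: C# array creation expressions accept `int`, `uint`, `long`, or `ulong` dimensions, and a `uint` dimension only fails at runtime (with an `OverflowException`) if the value exceeds `int.MaxValue`.

```csharp
using System;

uint size = 1024;

// Legal C#: the dimension expression may be uint; no explicit cast needed.
// Only a value above int.MaxValue would fail, and then at allocation time,
// not at compile time.
var buffer = new byte[size];

Console.WriteLine(buffer.Length); // 1024
```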
UnityDataTool/UnityDataTool.csproj
Outdated
```xml
@@ -5,9 +5,9 @@
    <TargetFramework>net9.0</TargetFramework>
    <LangVersion>latest</LangVersion>
    <Version>1.3.2</Version>
```
The project version metadata is inconsistent: `<Version>` is still 1.3.2 while Assembly/File/InformationalVersion are 1.3.4. This can lead to incorrect NuGet/package/version reporting. Update `<Version>` to 1.3.4 (or align all version fields to the intended release).
Suggested change:
```diff
-    <Version>1.3.2</Version>
+    <Version>1.3.4</Version>
```
Fixed — updated <Version> to 1.3.4 to match the other version fields.
```markdown
@@ -0,0 +1,5 @@
This is an example of the format used for Player Data when compression is enabled.

It is a Unity Archive and can be examined with the "archive" command. It as created with Unity 2021.3.20f1.
```
Typo: "It as created" should be "It was created".
Suggested change:
```diff
-It is a Unity Archive and can be examined with the "archive" command. It as created with Unity 2021.3.20f1.
+It is a Unity Archive and can be examined with the "archive" command. It was created with Unity 2021.3.20f1.
```
Fixed — typo corrected.
```csharp
foreach (var description in fileDescriptions)
{
    // Always read the bytes to advance the stream position.
    var data = ReadBytes(reader, (int)description.Size);

    if (filter != null && !description.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
        continue;

    Console.WriteLine($"... Extracting {description.Path}");
    var path = Path.Combine(outputFolder.ToString(), description.Path);
    Directory.CreateDirectory(Path.GetDirectoryName(path));
    File.WriteAllBytes(path, data);
```
FileDescription.ByteOffset is parsed/stored but never used, and Extract() always reads the next bytes from the stream rather than seeking to ByteOffset. If the web bundle header offsets are not strictly sequential (e.g., padding/alignment), extraction will produce incorrect file contents. Consider seeking to description.ByteOffset (or validating the current position matches it) before reading each entry.
The web bundle format stores file data sequentially immediately after the header, so sequential reading is correct. The ByteOffset field from the header could be used for validation (asserting it matches the current stream position), but the sequential read approach matches the original pre-refactor code and works correctly for all test files. Adding seek-based extraction would be a separate improvement.
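If that validation were added as a follow-up, it could be a one-line position check before each sequential read. A sketch under stated assumptions: `dataSectionStart` is a hypothetical local holding the stream position where the file data begins, and `ByteOffset` is assumed relative to that point.

```csharp
// Hypothetical sanity check: assert that sequential reading has landed
// exactly where the header's ByteOffset says this entry should start.
long expected = dataSectionStart + description.ByteOffset;
if (reader.BaseStream.Position != expected)
    throw new InvalidDataException(
        $"Entry '{description.Path}' expected at offset {expected}, " +
        $"but stream is at {reader.BaseStream.Position}.");
```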
```csharp
static void ExtractAssetBundle(FileInfo filename, DirectoryInfo outputFolder, string filter)
{
    Console.WriteLine($"Extracting asset bundle: (unknown)");
    Console.WriteLine($"Extracting files from archive: (unknown)");
    using var archive = UnityFileSystem.MountArchive(filename.FullName, "/");

    int total = archive.Nodes.Count;
    int extracted = 0;

    foreach (var node in archive.Nodes)
    {
        if (filter != null && !node.Path.Contains(filter, StringComparison.OrdinalIgnoreCase))
            continue;

        Console.WriteLine($"... Extracting {node.Path}");
```
New behavior: --filter controls which files are extracted, but there are no tests covering filtered extraction (e.g., that only matching paths are written and that the exit code/output counts are correct). Add a test case that runs archive extract ... --filter <text> and asserts only the expected subset of files exist.
Agreed — filter tests would be valuable. The filter feature was manually tested and the overall extract test covers the non-filtered path. Adding dedicated filter tests is a good follow-up.
Claude was being lazy, so I told it to add a test.
Summary
Adds new archive inspection subcommands, self-contained archive metadata parsing, JSON output support, and C# wrappers for recently added UnityFileSystemApi functions.
New `archive` subcommands:
- `archive info` — High-level summary: Unity version, file size, data size, compression ratio, compression algorithm, block count, file count, and serialized file count.
- `archive header` — Displays the raw archive header fields (version, sizes, metadata compression, archive flags). Only the `UnityFS` signature is supported; legacy signatures produce a clear error.
- `archive blocks` — Lists each data block with file offset, data offset, uncompressed/compressed sizes, and compression type.

Improved `archive list`:
- Supports `-f Json`.

Improved `archive extract`:
- `--filter` option for case-insensitive substring matching on file paths inside the archive. Prints an "Extracted N out of M files" summary.

Self-contained archive metadata parsing:
- `ArchiveDetector` (in `UnityBinaryFormat`) now parses the full archive metadata section (BlocksInfo + DirectoryInfo), including LZ4/LZ4HC decompression via the new `K4os.Compression.LZ4` NuGet dependency.

WebBundle refactor:
- WebBundle-specific code moved out of `Archive.cs` into a dedicated `WebBundleHelper` class.

UnityFileSystemApi C# wrappers:
- `GetDllVersion`, `GetUnityVersion`, `GetSerializedFileVersion`, `GetTypeTreeCount`, `GetTypeTreeInfo`, `GetTypeTreeByIndex`, `RemoveTypeTreeSource`.
- Adds the `TypeTreeCategory` enum and `TypeTreeInfo` struct.

Examples:
- Example of the new `info` command with an AssetBundle.
- Example output of `header` from a small compressed player build.
- Example output of `blocks` (JSON form).
- Example `blocks` output in text form.
- `list` now includes the data offset (starting byte for the file in the uncompressed data).