Merged
Binary file added Documentation/TypeTreeForPlayer.png
107 changes: 107 additions & 0 deletions Documentation/unity-content-format.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
# Overview of Unity Content

This section gives an overview of the core Unity file types and how they are used in different types of builds. It also covers the important concept of "TypeTrees". This gives context for understanding what UnityDataTools can and cannot do.

## File Formats

### SerializedFile

SerializedFile is the name of Unity's binary file format for serializing objects. It is made up of a file header,
followed by each Object, serialized one after another. This binary format is also available in the Editor, but typically Editor content uses the Unity YAML format instead.

The SerializedFiles in build output represent the project content, but optimized for the target platform. Unity will combine objects from multiple source assets together into files, exclude certain objects (for example editor-only objects), and potentially split or duplicate objects across multiple output files. This arrangement of objects is called the `build layout`. Because of all this transformation, there is not a one-to-one mapping between the source assets and the SerializedFiles in the build output.

### Unity Archive

A Unity Archive is a container file (similar to a zip file). Unity can `mount` this file, which makes the files inside it visible to Unity's loading system, via the Unity "Virtual File System" (VFS). Unity Archives often apply compression to the content, but it is also possible to create an uncompressed Archive.

## AssetBundles

[AssetBundles](https://docs.unity3d.com/Manual/AssetBundlesIntro.html) use the Unity Archive file format, with conventions for what to expect inside the archive. The [Addressables](https://docs.unity3d.com/Manual/com.unity.addressables.html) package uses AssetBundles, so its build output is also made up of Unity Archive files.

AssetBundles always contain at least one SerializedFile. An AssetBundle containing Scenes will contain multiple SerializedFiles. AssetBundles can also contain auxiliary files, such as .resS files containing Textures and Meshes, and .resource files containing audio or video.

UnityDataTools supports opening Archive files, so it is able to analyze AssetBundles.

## Player Builds

A player build produces content as well as compiled code (assemblies, executables) and various configuration files. UnityDataTool only concerns itself with the content portion of that output.

The content comprises the scenes in the Scene List, the contents of Resources folders, content from the Project Settings (the "GlobalGameManagers") and all Assets referenced from those root inputs. This translates into SerializedFiles in the build output.

The SerializedFiles are named in a predictable way. This is a very quick summary:

* Each scene in the Scene List becomes a "level" file, e.g. "level0", "level1".
* Referenced Assets shared between the Scenes become "sharedAssets" files, e.g. "sharedAssets0.assets", "sharedAssets1.assets".
* The contents of the Resources folders become "resources.assets".
* The Project Settings become "globalgamemanagers" and "globalgamemanagers.assets".
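
As a rough illustration of the naming summary above, the following Python sketch sorts file names from a Player build into categories. The function name `classify_player_file` and the category labels are hypothetical, chosen only for this example, and the patterns cover just the names listed above rather than every file a build can produce.

```python
import re


def classify_player_file(name: str) -> str:
    """Return a rough category for a file name found in Player build output.

    Hypothetical helper: the categories mirror the naming summary in this
    document and are not part of UnityDataTools.
    """
    lower = name.lower()
    if re.fullmatch(r"level\d+", lower):
        return "scene"
    if re.fullmatch(r"sharedassets\d+\.assets", lower):
        return "shared assets"
    if lower == "resources.assets":
        return "resources"
    if lower.startswith("globalgamemanager"):
        return "settings"
    return "other"


if __name__ == "__main__":
    for file_name in ["level0", "sharedAssets1.assets", "resources.assets", "data.unity3d"]:
        print(file_name, "->", classify_player_file(file_name))
```

Matching is case-insensitive here because the exact casing of the file names may differ from the examples above.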

If [compression](https://docs.unity3d.com/6000.2/Documentation/ScriptReference/BuildOptions.CompressWithLz4HC.html) is enabled, the Player build will compress all the SerializedFiles into a single Unity Archive file, called `data.unity3d`.

### Enabling TypeTrees in the Player

UnityDataTools supports Player build output, because that uses the same SerializedFiles and Archives that AssetBundles use. But the tool's output for Player builds is often not very useful, because, by default, Player builds do not include TypeTrees.

>[!IMPORTANT]
>It is possible to generate TypeTrees for the Player data, starting in Unity 2021.2.
>This makes that output compatible with UnityDataTool, but it is not a recommended flag to enable for your production builds.

To do so, the **ForceAlwaysWriteTypeTrees** Diagnostic Switch must be enabled in the Editor Preferences (Diagnostics->Editor section).

![](./TypeTreeForPlayer.png)

For more information about TypeTrees see the following section.

## TypeTrees

The TypeTree is a data structure exposing how objects have been serialized, i.e. the name, type and
size of their properties. It is used by Unity when loading a SerializedFile that was built by a
previous Unity version. When Unity is deserializing an object it needs to check if the current Type
definition exactly matches the Type definition used when the object was serialized. If they do not match,
Unity will attempt to match up the properties as best it can, based on the property names and structure
of the data. This process is called a "Safe Binary Read" and is somewhat slower than the regular fast binary read path.
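
The idea of matching properties by name can be sketched in a few lines of Python. This is a loose conceptual illustration only, not Unity's implementation: the `Field` class, the flat (non-nested) layouts, and the rule of falling back to a default when a name or type does not match are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One property in a simplified, flat stand-in for a TypeTree."""
    name: str
    type_name: str


def safe_read(stored_layout, stored_values, current_layout, defaults):
    """Map values written with stored_layout onto current_layout by field name."""
    stored = {f.name: (f.type_name, v) for f, v in zip(stored_layout, stored_values)}
    result = {}
    for field in current_layout:
        match = stored.get(field.name)
        if match is not None and match[0] == field.type_name:
            result[field.name] = match[1]  # same name and type: keep stored value
        else:
            result[field.name] = defaults[field.name]  # new or changed field
    return result


def read_object(stored_layout, stored_values, current_layout, defaults):
    """Take the fast path when layouts are identical, else fall back to safe_read."""
    if stored_layout == current_layout:
        return {f.name: v for f, v in zip(current_layout, stored_values)}
    return safe_read(stored_layout, stored_values, current_layout, defaults)


if __name__ == "__main__":
    old = [Field("m_Speed", "float"), Field("m_Legacy", "int")]
    new = [Field("m_Speed", "float"), Field("m_Jump", "float")]
    print(read_object(old, [2.5, 7], new, {"m_Speed": 0.0, "m_Jump": 1.0}))
```

In this sketch the removed field `m_Legacy` is dropped and the added field `m_Jump` receives its default, while the unchanged `m_Speed` keeps its stored value.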

TypeTrees are important in the case of AssetBundles, to avoid rebuilding and redistributing all AssetBundles after each minor upgrade of Unity or after making minor changes to your MonoBehaviour and ScriptableObject serialization. However, there can be a noticeable overhead to storing the TypeTrees in each AssetBundle, e.g. the header size of each SerializedFile is bigger.

TypeTrees also make it possible to load an AssetBundle in the Editor, when testing game play.

>[!NOTE]
>There is a flag available when building AssetBundles that will exclude TypeTrees, see [BuildAssetBundleOptions.DisableWriteTypeTree](https://docs.unity3d.com/6000.2/Documentation/ScriptReference/BuildAssetBundleOptions.DisableWriteTypeTree.html). This has implications for future redistribution of your content, so use this flag with caution.

For Player Data the expectation is that you always rebuild all content together with each new build of the player.
So the Assemblies and serialized objects will all have matching type definitions. That is why, by default, the types are not included.

UnityDataTools relies on TypeTrees in order to understand the content of serialized objects. Using this approach it does
not need to hard code any knowledge about what exact types and properties to expect inside each built-in Unity type
(for example Materials and Transforms). And it can interpret serialized C# classes (e.g. MonoBehaviours, ScriptableObjects
and objects serialized through the SerializeReference attribute). That also means that UnityDataTools cannot understand
Player built content, unless the Player was built with TypeTrees enabled.

>[!TIP]
>The `binary2text` tool supports an optional argument `-typeinfo` to enable dumping out the TypeTrees in a SerializedFile header. That is a useful way to learn more about TypeTrees and to see exactly how Unity data is represented in the binary format.

### Platform details for using UnityDataTool with Player Data

The output structure and file formats for a Unity Player build are quite platform specific.

On some platforms the content is packaged into platform-specific container files, for example Android builds use .apk and .obb files. So accessing the actual SerializedFiles may involve mounting or extracting the content of those files, and possibly also opening a data.unity3d file inside them.

UnityDataTools directly supports opening the .data container file format used in Player builds that target Web platforms (e.g. WebGL). Specifically, the "archive list" and "archive extract" command line options work with that format. Once extracted, you can run other UnityDataTool commands on the output.

Android APK files are not difficult to open and expand using freely available utilities. For example on Windows they can be opened using 7-zip. Once the content is extracted you can run UnityDataTool commands on the output.
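
Because an APK is a zip archive, the extraction step can also be scripted. The sketch below uses Python's standard `zipfile` module; the function name `extract_apk` is hypothetical, and where `data.unity3d` sits inside the APK is not assumed, so the code simply searches for that file name.

```python
import zipfile
from pathlib import Path


def extract_apk(apk_path, out_dir):
    """Extract an APK (a zip archive) and return paths of any data.unity3d files found.

    Hypothetical helper for illustration; it is not part of UnityDataTools.
    """
    out = Path(out_dir)
    with zipfile.ZipFile(apk_path) as apk:
        apk.extractall(out)
        names = apk.namelist()
    # Report extracted archives that UnityDataTool commands could then be run on.
    return [out / name for name in names if Path(name).name == "data.unity3d"]
```

The returned paths point at the extracted `data.unity3d` files, which can then be passed to UnityDataTool commands.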

## Mapping back to Source Assets

Because Unity rearranges objects in the build into a build layout there is no 1-1 mapping between the output files and the original source assets. Only Scene files have a pretty direct mapping into the build output.

The UnityDataTool only looks at the output of the build, and has no information available about the source paths. This is expected, because the built output is optimized for speed and size, and there is no need to "leak" a lot of details about the source project in the data that gets shipped with the Player.

However, in cases where you want to understand what contributes to the size of your build, or to confirm whether certain content is actually included, you may want to correlate the output back to the source assets in your project.

Often the source of content can be easily inferred, based on your own knowledge of your project, and the names of objects. For example the name of a Shader should be unique, and typically has a filename that closely matches the Shader name.

You can also use the [BuildReport](https://docs.unity3d.com/Documentation/ScriptReference/Build.Reporting.BuildReport.html) for Player and AssetBundle builds (excluding Addressables). The [Build Report Inspector](https://github.com/Unity-Technologies/BuildReportInspector) is a tool to aid in analyzing that data.

For AssetBundles built by [BuildPipeline.BuildAssetBundles()](https://docs.unity3d.com/ScriptReference/BuildPipeline.BuildAssetBundles.html), there is also source information available in the .manifest files for each bundle.

Addressables builds do not produce a BuildReport or .manifest files, but the Addressables package offers similar build information in its user interface.
131 changes: 57 additions & 74 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,98 +1,81 @@
# UnityDataTools

The UnityDataTool is a set of command line tools showcasing what can be done with the
UnityFileSystemApi native dynamic library. The main purpose of these tools is to analyze the
content of Unity data files. You can directly jump
[here](https://github.com/Unity-Technologies/UnityDataTools/blob/main/UnityDataTool/README.md)
if your goal is to understand how to use the UnityDataTool command-line tool.
The UnityDataTool is a command line tool and showcase of the UnityFileSystemApi native dynamic library.
The main purpose is for analysis of the content of Unity data files, for example AssetBundles and
Player content.

The UnityFileSystemApi library is distributed in the Tools folder of the Unity editor (starting in
version 2022.1.0a14). For simplicity, it is also included in this repository. The library is somewhat
backward compatible, which means that it can read data files generated by any previous version of
Unity. Ideally, you should copy UnityFileSystemApi (.dll/.dylib) from Unity Editor install path
`Data/Tools/` subfolder to `UnityDataTool/UnityFileSystem/` of an Engine version that produced
serialized data you want to analyze.

## What is the purpose of the UnityFileSystemApi native library?

The purpose of the UnityFileSystemApi is to expose the functionalities of the WebExtract and
binary2text tools, but in a more flexible way. To fully understand what it means, let's first
discuss how Unity generates the data files in a build. The data referenced by the scenes in a build
is called the Player Data and is contained in SerializedFiles. A SerializedFile is the file format
used by Unity to store its data. In builds, they contain the serialized assets in the target's
platform-specific format.

When using AssetBundles or Addressables, things are slightly different. Firstly, note that
Addressables are AssetBundles on disk so we will only use the term AssetBundle in the remaining of
this document. AssetBundles are archive files (similar to zip files) that can be mounted at
runtime. They contain SerializedFiles, but contrary to those of the Player Data, they include what
is called a TypeTree<sup>[1](#footnote1)</sup>.

> Note: it is possible to generate TypeTrees for the Player data starting in Unity 2021.2.
> To do so, the *ForceAlwaysWriteTypeTrees* Diagnostic Switch must be enabled in the Editor
> Preferences (Diagnostic/Editor section).

The TypeTree is a data structure exposing how objects have been serialized, i.e. the name, type and
size of their properties. It is used by Unity when loading an AssetBundle that was built by a
previous Unity version (so you don't necessarily have to update all AssetBundles after upgrading a
project to a newer version of Unity).

The content of a SerializedFile including a TypeTree can be converted to a human-readable format
using the binary2text tool that can be found in the Tools folder of Unity. In the case of
AssetBundles, the SerializedFiles must first be extracted using the WebExtract tool that is also in
the Tools folder. For the Player Data, there is no TypeTree because it is included in a build and
therefore not sensitive to Unity version upgrades. Skipping TypeTrees yields reduced file size and
improved loading times.

The text file generated by binary2text can be very useful to
diagnose issues with a build, but they are usually very large and difficult to navigate. Because of
this, a tool called the [AssetBundle Analyzer](https://github.com/faelenor/asset-bundle-analyzer)
was created to make it easier to extract useful information from these files in the form of a
SQLite database. The AssetBundle Analyzer has been quite successful but it has several issues. It
is extremely slow as it runs WebExtract and binary2text on all the AssetBundles of a project and
has to parse very large text files. It can also easily fail because the syntax used by binary2text
is not standard and can even be impossible to parse in some occasions.
The [command line tool](./UnityDataTool/README.md) runs directly on Unity data files, without requiring the Editor to be running. It covers functionality of the Unity tools WebExtract and binary2text, with better performance. And it adds a lot of additional functionality, for example the ability to create a SQLite database for detailed analysis of build content. It is designed to scale for large build outputs and has been used to fine-tune big Unity-based games.

The UnityFileSystemApi library has been created to expose WebExtract and binary2text
functionalities. This enables the creation of tools that can read Unity data files with TypeTrees.
With it, it becomes very easy to create a binary2text-like tool that can output the data in any
format or a new faster and simpler AssetBundle Analyzer.
The command line tool uses the UnityFileSystemApi library to access the content of Unity Archives and Serialized files, which are Unity's primary binary formats. This repository also serves as a reference for how this library could be used as part of incorporating functionality into your own tools.

## Repository content

The repository contains the following items:
* UnityFileSystem: source code of a .NET class library exposing the functionalities or the
UnityFileSystemApi native library.
* UnityFileSystem.Tests: test suite for the UnityFileSystem library.
* UnityFileSystemTestData: the Unity project used to generate the test data.
* TestCommon: a helper library used by the test projects.
* [UnityDataTool](UnityDataTool/README.md): a command-line tool providing several features that can
be used to analyze the content of Unity data files.
* [UnityDataTool](UnityDataTool/README.md): a command-line tool providing access to the Analyzer, TextDumper and other class libraries.
* [Analyzer](Analyzer/README.md): a class library that can be used to extract key information
from Unity data files and output it into a SQLite database (similar to the
[AssetBundle Analyzer](https://github.com/faelenor/asset-bundle-analyzer)).
from Unity data files and output it into a SQLite database.
* [TextDumper](TextDumper/README.md): a class library that can be used to dump SerializedFiles into
a human-readable format (similar to binary2text).
* [ReferenceFinder](ReferenceFinder/README.md): a class library that can be used to find
reference chains from objects to other objects using a database created by the Analyzer.
* UnityFileSystem: source code and binaries of a .NET class library exposing the functionalities of the
UnityFileSystemApi native library.
* UnityFileSystem.Tests: test suite for the UnityFileSystem library.
* UnityFileSystemTestData: the Unity project used to generate the test data.
* TestCommon: a helper library used by the test projects.

## Getting the UnityFileSystemApi library

The UnityFileSystemApi library is distributed in the Tools folder of the Unity editor (starting in
version 2022.1.0a14). For convenience this repository includes a copy of the Unity 2022 Windows, Mac and Linux builds of the
library, in the `UnityFileSystem/` directory. The library is somewhat backward compatible,
which means that it can read data files generated by any previous version of
Unity. Ideally, you should use the UnityFileSystemApi library (.dll/.dylib) from the Engine version that produced
the serialized data you want to analyze: copy it from the `Data/Tools/` subfolder of that Unity Editor's
install path to `UnityDataTool/UnityFileSystem/`.

## How to build

Currently, we do not host builds of UnityDataTools, so you will need to clone or download this repo and build it yourself.

1) The projects in this solution require the [.NET 9.0 SDK](https://dotnet.microsoft.com/en-us/download/dotnet/9.0).
2) Copy `UnityFileSystemApi` library from UnityEditor installation
`{UnityEditor}/Data/Tools/` to `UnityDataTool/UnityFileSystem/` before building.
2) Copy `UnityFileSystemApi` library from your Unity Editor installation, in
`{UnityEditor}/Data/Tools/` to `UnityDataTool/UnityFileSystem/`. This step is typically optional, because a previously built version of the library is included in the repo that can read the output from most Unity Versions.
3) Build using `dotnet build -c Release`

Note: You can use your favorite IDE to build solution.
Tested Visual Studio and Rider on Windows and Rider on Mac.
Note: Alternatively you can build with your favorite IDE. This was tested with Visual Studio and Rider on Windows and Rider on Mac.

See the documentation page for the [command line tool](./UnityDataTool/README.md) for information about how to run the tool after you have built it.

## What is the purpose of the UnityFileSystemApi native library?

The purpose of the UnityFileSystemApi is to expose the functionalities of the WebExtract and
binary2text tools, but in a more flexible way.

To better understand the files and data formats that Unity supports at runtime, see [this topic](./Documentation/unity-content-format.md).

## Origins

This tool is the evolution of the [AssetBundle Analyzer](https://github.com/faelenor/asset-bundle-analyzer)
written by [Francis Pagé](https://www.github.com/faelenor).

That project was the first to introduce the SQLite database analysis of Unity build output to address
the difficulty of diagnosing build issues through the raw binary2text output, which is large and difficult to navigate.

The AssetBundle Analyzer was quite successful, but it has several issues. It
is extremely slow as it runs WebExtract and binary2text on all the AssetBundles of a project and
has to parse very large text files. It can also easily fail because the syntax used by binary2text
is not standard and can even be impossible to parse on some occasions.

To address those problems [@faelenor](https://www.github.com/faelenor) established this UnityDataTools
repository and the UnityFileSystemApi library was created within Unity, to replace the usage of WebExtract and
binary2text functionalities. With the library, it becomes very easy to create a binary2text-like tool
that can output the data in any format, as well as faster and simpler code for generating the SQLite output.

This tool continues to be useful in recent Unity versions, for example Unity 6.

## Disclaimer

This project is provided on an "as-is" basis and is not officially supported by Unity. It is an
experimental tool provided as an example of what can be done using the UnityFileSystemApi. You can
report bugs and submit pull requests, but there is no guarantee that they will be addressed.

---
*Footnotes*: <a name="footnote1">1</a>: AssetBundles include the TypeTree by default but this can
be disabled by using the
[DisableWriteTypeTree](https://docs.unity3d.com/ScriptReference/BuildAssetBundleOptions.DisableWriteTypeTree.html)
option.