Introduction to Mixins - Obfuscation and Mixins
Before we can go any further with our exploration of Mixin architecture, let's take a quick detour to cover an important topic: Obfuscation in the Minecraft codebase and how it is relevant to Mixins.
Obfuscation is the process of converting otherwise human-readable code symbols into obscure ones which make it hard to read (in fact the word obfuscate just means "deliberately make obscure").
Because Minecraft is written in Java, it would be very easy to decompile to readable code if obfuscation techniques were not applied. Mojang applies obfuscation to Minecraft before releasing it, which presents a problem for modders for two reasons:
- the obfuscation applied places everything in the "default package", which makes it impossible to import classes from the codebase
- working with the obfuscated names would be a nightmare because the code is basically unreadable.
This means that in order to be able to compile our code against Minecraft it is necessary to de-obfuscate the Java classes beforehand. A community project known as the Mod Coder Pack (MCP for short) provides facilities for doing exactly this.
Once we've written our code, we then need to re-obfuscate our mod code so that it can work with the original (obfuscated) codebase. The development lifecycle thus looks something like this:
Let's start with the basics:
When working with Minecraft modding, fields and methods can have up to 3 names:
- The obfuscated name - this is the name assigned by Mojang as part of their obfuscation of the codebase. It will generally be only 1-2 letters long, for example an obfuscated method may be called k
- The "Searge name" - this is a unique token assigned to the field or method in order to make decompilation possible. It consists of a prefix, a unique id number and the member's (original) name, for example func_1234_k
- The "MCP name" - this is a more human-readable name, crowdsourced from the community in order to make the codebase more understandable, for example getHealth
During decompilation, each member is renamed from one name type to the next, ending up at the "friendly" MCP name. During reobfuscation, the reverse process is performed. In our example, the method k becomes func_1234_k and finally getHealth.
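To make the two-step renaming concrete, here is a minimal sketch that models it with two lookup tables. It is a toy model only: the class and method names (NameChain, deobfuscate, reobfuscate) are made up for this example, and real mapping data is keyed by owning class and descriptor rather than by bare member name.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the obfuscated -> Searge -> MCP naming chain.
// The entries mirror the example names used in the text.
class NameChain {
    static final Map<String, String> OBF_TO_SRG = new HashMap<>();
    static final Map<String, String> SRG_TO_MCP = new HashMap<>();

    static {
        OBF_TO_SRG.put("k", "func_1234_k");
        SRG_TO_MCP.put("func_1234_k", "getHealth");
    }

    // Decompilation direction: obfuscated name -> Searge name -> MCP name.
    static String deobfuscate(String obfName) {
        String srg = OBF_TO_SRG.get(obfName);
        return SRG_TO_MCP.get(srg);
    }

    // Reobfuscation direction: walk both tables backwards.
    static String reobfuscate(String mcpName) {
        String srg = invert(SRG_TO_MCP).get(mcpName);
        return invert(OBF_TO_SRG).get(srg);
    }

    // Builds the reverse lookup for a one-to-one mapping table.
    static Map<String, String> invert(Map<String, String> forward) {
        Map<String, String> reverse = new HashMap<>();
        for (Map.Entry<String, String> e : forward.entrySet()) {
            reverse.put(e.getValue(), e.getKey());
        }
        return reverse;
    }
}
```

With these tables, NameChain.deobfuscate("k") yields "getHealth" and NameChain.reobfuscate("getHealth") yields "k".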
During each stage all fields and methods are renamed, and thus each set of obfuscations forms a discrete obfuscation environment, with all fields and methods having a name corresponding to that environment:
We will also refer to the notional boundary between these hypothetical environments as an "obfuscation boundary", since it should then be clear that crossing the boundary can be problematic. For example the method getHealth (MCP name) will always expect the takeDamage method to also have its MCP name during any particular execution cycle. If names from different environments are present at one time, then problems are likely to occur.
Transitioning between obfuscation environments has to happen "all at once", and is facilitated by mapping files which contain a mapping of one name to another. These mapping files contain an entry for every single field, method, argument and class in the codebase.
This is not strictly true: since the MCP name is crowd-sourced, there are quite a few members with no defined MCP name. We'll pretend for now that all obfuscations are present all of the time, because this exception to the rule isn't important right now. Let's just imagine that un-mapped MCP names are effectively the same as if the MCP and SRG name were the same.
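One way to picture that simplification is a lookup which falls back to the Searge name whenever no MCP name exists. The helper below is hypothetical and only illustrates the idea. It is not how any particular tool implements it.

```java
import java.util.Map;

class McpLookup {
    // Returns the MCP name for a Searge name, or the Searge name itself
    // when the community has not assigned an MCP name yet.
    static String toMcp(Map<String, String> srgToMcp, String srgName) {
        return srgToMcp.getOrDefault(srgName, srgName);
    }
}
```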
In order to ensure that not just the declaration of a symbol is renamed, but also all references to that symbol, the remapping phases must be applied to the entire codebase at once. Whilst SRG names are unique and can be remapped deterministically, other symbols are not so lucky and thus the remapping tools need to load and understand the entire codebase at once in order to perform effective remapping.
Because the tools work this way, and do have a rudimentary understanding of the code structure and the relationship of - for example - a method with an overridden method in a derived class, the remapper is able to remap references to obfuscated classes even in classes which aren't part of the original codebase. In this way, derived classes (for example ones we add in a mod) and classes with calls to remapped methods (like ones we might add in a mod) will also have these method calls and field accesses remapped!
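The sketch below illustrates why declarations and references have to be remapped together, including references from classes we add ourselves. It is a deliberately tiny model, assuming each "class" is just a list of declared member names and a list of referenced member names. ToyClass, remapAll and isConsistent are names invented for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WholeCodebaseRemap {
    // A toy "class": the member names it declares and the member names it uses.
    static final class ToyClass {
        final List<String> declared;
        final List<String> referenced;

        ToyClass(List<String> declared, List<String> referenced) {
            this.declared = declared;
            this.referenced = referenced;
        }
    }

    // Renames every name that has a mapping entry, leaves the rest alone.
    static List<String> rename(List<String> names, Map<String, String> mapping) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            out.add(mapping.getOrDefault(name, name));
        }
        return out;
    }

    // Remaps declarations AND references of every class in a single pass.
    static List<ToyClass> remapAll(List<ToyClass> codebase, Map<String, String> mapping) {
        List<ToyClass> out = new ArrayList<>();
        for (ToyClass c : codebase) {
            out.add(new ToyClass(rename(c.declared, mapping), rename(c.referenced, mapping)));
        }
        return out;
    }

    // True when every referenced name is declared somewhere in the codebase.
    static boolean isConsistent(List<ToyClass> codebase) {
        Set<String> declared = new HashSet<>();
        for (ToyClass c : codebase) {
            declared.addAll(c.declared);
        }
        for (ToyClass c : codebase) {
            if (!declared.containsAll(c.referenced)) {
                return false;
            }
        }
        return true;
    }
}
```

If a game class and a mod class are remapped together the result stays consistent. If only the game class is remapped, the mod class is left referring to a name which no longer exists.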
You may be wondering why you need to know all this. What does it have to do with mixins? Well, by now the following things should be clear:
- Everything which is going to interact in some way with game classes must pass through the obfuscator before it can be used in production
- Anything which directly refers to a field, method or class in the game will be handled automatically by the remapper, since the remapper "understands" these relationships already.
However, this is not the case with mixins, because we can create fields and methods in mixins which do not directly reference their counterparts - shadows!
As you may recall from part 1 of this series, it is possible to add shadow members to our mixin to indicate that a particular method or field will exist in the target class at runtime. The main problem this causes is that the obfuscator has no built-in understanding of these members, and thus will be unable to automatically obfuscate them.
Mixins tackle this problem by parsing the @Shadow annotations at compile time and adding appropriate obfuscation entries for the shadow members to the obfuscation tables directly. This is handled by an Annotation Processor which plugs into the Java compiler.
As we know, the obfuscator is already capable of understanding references to fields and methods in derived classes, and thus we only need to add obfuscation table entries for the shadow members themselves. References to those members in our mixins are then handled automatically, and the mixin can then safely traverse the obfuscation boundary.
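The following sketch imitates that idea in miniature. It makes several assumptions: it declares its own stand-in @Shadow annotation so that it compiles without the Mixin library, it uses runtime reflection where the real Annotation Processor works at compile time, and the mixin class and the field_1111_b mapping are invented for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class ShadowTableSketch {
    // Stand-in for Mixin's @Shadow, declared here only so the sketch compiles alone.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Shadow {}

    // Hypothetical mixin: "health" stands for a field of the target class,
    // "modCounter" is a new field added by the mod.
    static class MixinEntity {
        @Shadow private float health;
        private int modCounter;
    }

    // For every @Shadow field, copy the target's mapping entry into a table of
    // extra entries, so the obfuscator can also rename the member in the mixin.
    static Map<String, String> shadowEntries(Class<?> mixin, Map<String, String> gameMappings) {
        Map<String, String> extra = new TreeMap<>();
        for (Field field : mixin.getDeclaredFields()) {
            boolean isShadow = field.isAnnotationPresent(Shadow.class);
            if (isShadow && gameMappings.containsKey(field.getName())) {
                extra.put(field.getName(), gameMappings.get(field.getName()));
            }
        }
        return extra;
    }
}
```

Only the annotated field produces an entry. The mod's own field is ignored even if a game member happens to share its name.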
Later in this series you will be introduced to other mixin features which require this special handling to traverse the obfuscation boundary. The key things to remember at this point are:
- Any direct references in your mixins to classes in the game codebase will be handled automatically, for example:
  - References to superclass methods when the mixin is derived from a game class.
  - Any @Override methods in your mixin which override methods in game classes or interfaces.
  - Any external references in your mixin code to game classes or members.
- Any mixin-specific mechanisms, such as Shadows, Overwrites (introduced in the next section) and Injectors (introduced later) will always be decorated with some kind of annotation. This makes them visible to the Mixin Annotation Processor which will handle their obfuscation traversal.
If you're reading this series as an introduction, you should stop here. The following sections provide some more technical detail and are included for completeness only; they will be referred to in later sections. They are not intended as introductory reading. You have been warned.
Symbol references passed into SpecialSource will of course be reobfuscated as we expect, and the symbols in the underlying bytecode will be reobfuscated as a result. This "hard" re-obfuscation applies to the following member types:
- Class references (when obfuscating to "notch names" only, not applicable with Forge)
- Method names
- Field names
However, some member references are specified as strings inside annotations, in particular:
- Injector declarations
- Rerouter declarations
Since SpecialSource cannot remap these "soft" references, a different mechanism is used.
In order to allow "soft" references to be obfuscated, the Annotation Processor bakes a mapping file which is included into the production jar and specified in the configuration file for the mixin set. This Reference Map (or "refmap") contains a mapping of all soft references in the mixin set to their obfuscated counterparts.
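Conceptually, a reference map is a lookup from "the reference as written in the annotation" to "the reference as it exists in the obfuscated game", grouped by mixin class. The sketch below is a simplified model of that lookup and not the actual refmap file format or the Mixin API. RefMapSketch and its methods are invented for this example, and returning the reference unchanged when no entry exists is a choice made for the sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class RefMapSketch {
    // mixin class name -> (reference as written -> obfuscated reference)
    private final Map<String, Map<String, String>> mappings = new HashMap<>();

    // Records one entry, as the annotation processor would at compile time.
    void add(String mixinClass, String reference, String remapped) {
        mappings.computeIfAbsent(mixinClass, key -> new HashMap<>()).put(reference, remapped);
    }

    // Looks up a soft reference at mixin application time. When no entry
    // exists the reference is returned unchanged (assumption of this sketch).
    String remap(String mixinClass, String reference) {
        return mappings
                .getOrDefault(mixinClass, Collections.<String, String>emptyMap())
                .getOrDefault(reference, reference);
    }
}
```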
A single reference map is emitted for each compile stage, and thus each mixin set which is compiled during a particular pass should use the same refmap for that pass. A unique name for the refmap should be chosen to avoid conflicts.
For example, let's assume we define the following mixin sets in our mod:
mixins.myproject.core.json
mixins.myproject.extra.json
We may define a refmap file for both sets and name it mixins.myproject.refmap.json for consistency.
Note that it is absolutely vital to include the refmap file in your production jar, and specify it in your mixin configuration. Failing to do so will result in errors at mixin application time since the obfuscated references will not be resolved by the mixin processor without it.
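As a rough illustration, a mixin configuration such as mixins.myproject.core.json might reference the refmap like this. The package and mixin class names are placeholders invented for the example; only the refmap file name comes from the text above.

```json
{
  "package": "com.example.myproject.mixin.core",
  "refmap": "mixins.myproject.refmap.json",
  "mixins": [
    "MixinExample"
  ]
}
```

The second configuration, mixins.myproject.extra.json, would carry the same "refmap" value, since both sets are compiled in the same pass.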
You can omit the refmap file under the following circumstances:
- You are not using injectors or rerouters in your mixins
Some target environments use partial runtime deobfuscation. That is, they de-obfuscate symbols to intermediate names (SRG names) at runtime, while others do not. This partial translation is done so that mods can have a more stable obfuscation environment to target across multiple versions of the game.
It is obviously important that mixin bytecode being blended with the target class is applied after the runtime deobfuscation is applied, so that the obfuscation mappings in the environment match those in the mixin. Let's revisit the diagram from the previous article which shows the overview of mixins in the transformer chain:
When we consider where the runtime deobfuscation is applied in this picture (in the upstream transformer chain) we can see how the deobfuscation transformer itself represents the obfuscation boundary, and why mixins must be applied downstream of this transformer:
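As a toy model of this ordering constraint (and not the real transformer API), we can treat a class as a set of member names and each transformer as a function on that set. All names in the sketch are invented for illustration. The "mixin" step simply checks that its shadow target exists under the Searge name before adding a member.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

class TransformerOrderSketch {
    // Runtime deobfuscation: renames every member the class declares.
    static UnaryOperator<Set<String>> deobfuscator(Map<String, String> obfToSrg) {
        return members -> members.stream()
                .map(name -> obfToSrg.getOrDefault(name, name))
                .collect(Collectors.toSet());
    }

    // Mixin application: needs the shadowed member to be present by name.
    static UnaryOperator<Set<String>> mixin(String shadowName, String addedMember) {
        return members -> {
            if (!members.contains(shadowName)) {
                throw new IllegalStateException("Shadow target not found: " + shadowName);
            }
            Set<String> out = new HashSet<>(members);
            out.add(addedMember);
            return out;
        };
    }

    // Applies the transformers in order, feeding each result to the next one.
    static Set<String> run(Set<String> members, List<UnaryOperator<Set<String>>> chain) {
        Set<String> current = members;
        for (UnaryOperator<Set<String>> transformer : chain) {
            current = transformer.apply(current);
        }
        return current;
    }
}
```

Running the deobfuscator first and the mixin second succeeds. Swapping the two makes the mixin step fail, because the target still carries its obfuscated names at that point.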
The exception to the deterministic rule of OBF -> SRG -> MCP arises due to synthetic members in target classes. Whilst synthetic members in the obfuscated codebase also have obfuscated names just like their first-class brethren, a problem arises in development because the re-establishment of inner class relationships causes these synthetic members to be stripped and then re-created by the compiler.
For example, let's consider a non-static inner class's reference to its outer class, typically named "this$0" in classes generated by javac. When obfuscated this member receives the catchy name of "a", and as it passes through the decompilation process it is renamed first to "field_999_a" before finally receiving the MCP name of "myOuter". However, as the final stage of setting up the development workspace is to re-integrate the inner class with its outer class in the source, the synthetic field is finally stripped and allowed to be recreated by the compiler, giving it a name in the development workspace of the original "this$0".
This presents a problem, because if we wish to shadow the field we must name it myOuter (since that is the name which appears in the mapping files), but if we do this then the shadow won't work at development time because no field named myOuter actually exists!
It is possible to overcome this problem by specifying an alias for the shadow field. The alias exists as a resolver of last resort for the mixin processor when attempting to locate a shadow field's target. If the mixin processor is unable to locate the desired field in the target class, it first inspects the aliases list before failing with an error.
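The lookup order just described can be sketched as follows. This is an illustration of the "resolver of last resort" idea only, under the assumption that the target class can be modelled as a set of field names. AliasResolverSketch is not part of the Mixin library.

```java
import java.util.List;
import java.util.Set;

class AliasResolverSketch {
    // Returns the name of the field in the target class that the shadow binds to.
    static String resolve(Set<String> targetFields, String shadowName, List<String> aliases) {
        if (targetFields.contains(shadowName)) {
            return shadowName;            // normal case: the declared name exists
        }
        for (String alias : aliases) {    // last resort: try each alias in turn
            if (targetFields.contains(alias)) {
                return alias;
            }
        }
        throw new IllegalStateException("Shadow field not found in target: " + shadowName);
    }
}
```

In a development workspace where the target only contains this$0, a shadow named myOuter with the alias this$0 resolves to this$0. Where the field exists under the shadow's own name, the alias is never consulted.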
To specify aliases for a Shadow or Overwrite annotation, simply specify the aliases value on the annotation:
@Shadow(aliases = {"this$0"})
private MyOuterClassType myOuter;
Note that aliases can only be used on private fields and methods. This is because the alias can only be resolved at mixin application time and thus the rename of the field can only be propagated to the containing class and no further (because derived class mixins or other referring classes may already have been loaded and applied by this time). This is not generally a problem however, since the synthetic fields which are the reason for the alias mechanism in the first place are almost always private or package-private.