What is this?
These are symbol maps.
iometa by itself can find the names of classes, but method names are simply not preserved in release kernels. These symbol maps are essentially huge lookup tables for virtual method names, which can be passed to iometa as the last argument to recover most symbols.
Symbol maps take the following form:
    OSObject # This is a comment
    - ~OSObject()
    - ~OSObject()
    - release(int) const
    - getRetainCount() const
    - retain() const
    - release() const
    - serialize(OSSerialize*) const
    - getMetaClass() const
    - OSMetaClassBase::isEqualTo(OSMetaClassBase const*) const
    - taggedRetain(void const*) const
    - taggedRelease(void const*) const
    - taggedRelease(void const*, int) const
    - init()
    - free()
    OSString
    - initWithString(OSString const*)
    - initWithCString(char const*)
    - initWithCStringNoCopy(char const*)
    - getLength() const
    - getChar(unsigned int) const
    - setChar(char, unsigned int)
    - getCStringNoCopy() const
    - isEqualTo(OSString const*) const
    - isEqualTo(char const*) const
    - isEqualTo(OSData const*) const
    OSSymbol
    - isEqualTo(OSSymbol const*) const
Basic class and method names should be fairly obvious, but a few things should be noted:
- Comments can be started with # and extend to the end of the line. These are entirely ignored by parsing, so iometa -M will also strip them out.
- The class inheritance is not reflected in symbol maps, and is only parsed from kernels. However, inherited methods are not listed in child classes (e.g. see how OSString does not list free(), etc., because they are inherited from OSObject).
- Entries like "- ~OSObject()": destructors of the form ~ClassName() (and in theory constructors of the form ClassName(), but iOS doesn't have them in vtabs) are detected, and will have their name replaced by the class name in child classes.
- Entries like "- OSMetaClassBase::isEqualTo(OSMetaClassBase const*) const": the recorded class name for a method can be overridden by prepending ClassName:: in front of the method name. This is sometimes necessary in cases where XNU's OSMetaClass RTTI system doesn't accurately reflect the actual C++ inheritance structures.
- Empty placeholders. These are not shown above, but if a line contains nothing but a dash, it denotes that there exists a virtual method in that place, but its name and arguments are unknown. An example would be:
    OSString
    - initWithString(OSString const*)
    -
    - initWithCStringNoCopy(char const*)
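To make the format rules above concrete, here is a minimal Python sketch of a reader for symbol maps. It is not how iometa itself parses them; the names parse_symmap and Method are made up for illustration, and the way the ClassName:: prefix is recognised (an identifier followed by :: at the very start of an entry) is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Method:
    name: Optional[str]           # None for an empty placeholder ("-")
    owner: Optional[str] = None   # explicit "ClassName::" override, if any


def parse_symmap(text):
    """Parse symbol map text into {class name: [Method, ...]} in file order."""
    classes = {}
    current = None
    for raw in text.splitlines():
        # Comments start with '#' and run to the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not line.startswith("-"):
            # Anything that is not a method entry starts a new class.
            current = line
            classes.setdefault(current, [])
            continue
        if current is None:
            raise ValueError("method entry before any class name")
        body = line[1:].strip()
        if not body:
            # A lone dash: a virtual method exists here, but is unknown.
            classes[current].append(Method(None))
            continue
        # Assumption: an override is "Identifier::" at the start of the entry.
        m = re.match(r"([A-Za-z_]\w*)::(.+)", body)
        if m:
            classes[current].append(Method(m.group(2), m.group(1)))
        else:
            classes[current].append(Method(body))
    return classes
```

Feeding it the placeholder example above yields one class, OSString, with three entries, the second of which has name set to None.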
Where did these symbols come from?
With the iOS 12.0 beta 1, Apple introduced a new kernelcache format for some devices where kexts were no longer just prelinked like before, but effectively directly compiled in. This new format allows for many optimisations, and had as a consequence the complete removal of all symbols (previously we had some 4000-and-something symbols left). However, on the very first beta, Apple accidentally shipped kernels for A7 iPads and A8 iPhones with all symbols left in, more than 90'000 in total! Out of all of those, about 25'000 are symbols corresponding to virtual methods, and the original symbol maps were generated from that with
Those are the
A8-dense.txt files you'll find in the
12.0b1 folder, but you'll notice that those aren't the only symbol maps in there. I've tried my best to match those symbols against the kernelcaches of all other devices, and for those methods that got no match, to recover their names and argument list from panic strings or debugging information left in the kernels - with not overwhelming, but I think reasonable results. At the time of writing, I've also ported these symbol maps forward in time to the iOS 12 beta 2 (which additionally switched the iPhone 5s and iPod touch 6G to the new kernelcache format) and iOS 12.0 Golden Master (which was the first version to include A12 devices).
Where are we going from here?
I'm obviously gonna continue to port these symbols onto newer versions, because that's the entire point of keeping these maps. Now, since I don't have any of the highly sophisticated binary matching algorithms I wish I did, chances are I'm gonna miss a ton of stuff like:
- Methods getting swapped around or replaced by others, but with the number of methods per class staying the same
- Methods changing the amount and types of arguments
- New methods whose names are mentioned somewhere in the binary where I happen to not look for it
So I would greatly appreciate if you could point out any kind of error you detect in these maps, as well as any symbol name or argument list that you believe I missed or messed up. In that spirit, I'm also going to document how these lists are organised, how I try and update them to new versions/devices, as well as noteworthy things I've come across while doing so.
Ok, first of all, the symbol maps are organised by device class - A7, A8, etc. Originally I wanted to put all symbols for all devices into a single file, but in attempting to do that my own tool greeted me with warnings like:
[WRN] Symmap entry for AppleBCMWLANBusInterface has 60 methods, vtab has 88.
[WRN] Symmap entry for AppleBCMWLANCore has 84 methods, vtab has 136.
[WRN] Symmap entry for AppleBCMWLANBSSBeacon has 61 methods, vtab has 66.
[WRN] Symmap entry for AppleBCMWLANIO80211APSTAInterface has 88 methods, vtab has 83.
[WRN] Symmap entry for AppleBCMWLANProximityInterface has 88 methods, vtab has 83.
You can reproduce that by attempting to use an A7 symbol map on an A8 cache or vice versa. Basically, different device generations have, under the same name, different classes implementing different methods. So in order to work around that, I gave each generation its own map, since within generations there's very little difference, if any. With the maps provided in this repo, you should only ever see two kinds of warnings:
[WRN] Symmap entry for <Class> has X methods, vtab has 0.
[WRN] Symmap entry for <Class> has X methods, but class has no vtab.
Both are symptoms of the same condition, namely the symbol map holding information on a class when the kernel effectively optimised that class out of existence for that device. And I can live with that.
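If you want to check a run against that expectation, the warnings can be triaged mechanically. The following is a rough Python sketch that assumes the warnings were captured into a string (where iometa writes them, and how you capture them, is left open here) and that they have exactly the wording quoted above. The function name triage is made up.

```python
import re

# Matches both warning shapes quoted in this document.
WARN_RE = re.compile(
    r"\[WRN\] Symmap entry for (?P<cls>\S+) has (?P<have>\d+) methods, "
    r"(?:vtab has (?P<vtab>\d+)|but class has no vtab)\."
)


def triage(log_text):
    """Split warnings into 'class optimised out' and 'real mismatch'."""
    optimised_out = []   # vtab has 0 / class has no vtab: expected
    mismatched = []      # (class, methods in map, methods in vtab)
    for line in log_text.splitlines():
        m = WARN_RE.search(line)
        if not m:
            continue
        vtab = m.group("vtab")
        if vtab is None or int(vtab) == 0:
            optimised_out.append(m.group("cls"))
        else:
            mismatched.append((m.group("cls"), int(m.group("have")), int(vtab)))
    return optimised_out, mismatched
```

Anything that ends up in the second list is a hint that the map belongs to a different generation, or that the map has an actual error worth reporting.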
Then the next split is by kernelcache format. This is why there are separate files such as A8-dense.txt and A8-legacy.txt. The reason these need a split is optimisation, namely abstract classes having been optimised out. The problem arises when you have a class hierarchy like so:
- A is a non-abstract base class declaring virtual method x()
- B is an abstract class inheriting from class A and declaring virtual method y()
- C is a non-abstract class inheriting from class B and declaring virtual method z()
Now in the "legacy" kernelcache format, class B usually gets its own vtable and everything, and a symbol map would look as follows:

    A
    - x()
    B
    - y()
    C
    - z()
In the "dense" kernelcache format however, class B will have been mostly optimised out and not get a vtable, which means that no methods for B will be recorded, which in turn will make it look like all of B's methods were in fact introduced by C:

    A
    - x()
    B
    C
    - y()
    - z()
For one, this makes the two symbol maps inherently incompatible, and for two this is also the reason for the "class override" feature, so that
y() can be accurately attributed to
B if we have that knowledge:
    A
    - x()
    B
    C
    - B::y()
    - z()
If you ever end up porting a symbol map for a device class that just switched from legacy to dense kernelcache format, you'll no doubt notice that this is the biggest change you'll have to make: moving methods of abstract classes into their child classes. The second biggest will probably be deleting all the stuff that has been optimised out now. ;P
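The "moving methods into child classes" part can be sketched in a few lines of Python. This is only an illustration under stated assumptions: maps are plain dicts of class name to method strings, and both the parent relation and the set of classes that lost their vtable have to come from somewhere else (symbol maps do not contain inheritance). The names legacy_to_dense, parent and optimised_out are hypothetical.

```python
def _tag(owner, method):
    """Prefix a method with 'owner::' unless it is empty or already tagged."""
    if not method or "::" in method.split("(", 1)[0]:
        return method
    return f"{owner}::{method}"


def legacy_to_dense(legacy, parent, optimised_out):
    """Move methods of vtable-less classes into their surviving children.

    legacy:        {class: [method, ...]} in map order ("" = placeholder)
    parent:        {class: parent class}, taken from a kernel (assumption)
    optimised_out: set of classes without a vtable in the dense format
    """
    dense = {}
    for cls, methods in legacy.items():
        if cls in optimised_out:
            dense[cls] = []   # class stays listed, but records no methods
            continue
        inherited = []
        anc = parent.get(cls)
        # Walk up through consecutive optimised-out ancestors, so that
        # the topmost ancestor's methods come first in the result.
        while anc in optimised_out:
            inherited = [_tag(anc, m) for m in legacy.get(anc, [])] + inherited
            anc = parent.get(anc)
        dense[cls] = inherited + list(methods)
    return dense
```

With the A/B/C hierarchy from above and B marked as optimised out, C ends up with B::y() followed by z(). The result still needs to be checked against the actual kernel, since this does nothing about classes or methods that were removed entirely.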
With that sorted out, here's how I actually go at updating symbol maps:
- I simply run iometa -M kernel old.txt >/tmp/new.txt against a kernel, using the symbol map from the last version (or in the case of a new device, the closest existing device I have a map for). Usually that will throw a bunch of warnings and turn between a few hundred and a few thousand functions into fn_0x..., but the vast majority will go through just fine, and I blindly assume those to still be accurate.
I do this for each device belonging to a generation, collect all newly generated symbol maps, and then merge them back into one with my ugly script (this is necessary in order to keep classes that only e.g. either iPads or iPhones have, but yet get rid of classes that were actually removed).
- I go through all classes with fn_0x... methods and, before even looking at assembly, compare a bunch of vtables between this and the last generation. Of particular interest are "pure virtual" methods (i.e. those showing up red in iometa output) as well as those overridden in child classes.
- When you've finally exhausted pattern matching, it's time to dive into assembly and find out which of those methods in between were added or removed. And if methods were added and we're somewhat lucky, they will also pass their own name and/or signature to some logging function. Now if it's just the name without signature, recovering the argument list can be a challenge, so here are a few tricks:
When arguments are either stored to memory or passed to printf-like functions, that usually gives away their exact size. Otherwise you only get the information whether they're 32- or 64bit.
For 32bit values I usually assume unsigned int unless a comparison instruction suggests signed-ness, or if it's only tested for zero vs non-zero, in which case I assume bool.
For 64bit values my base assumption is void* unless something clearly indicates a size, magic constant, bitmask, or similar, in which case I go for unsigned long long.
For pointer types it should be fairly obvious what types they have, with probably the most complicated case being C++ objects. This is an area where A12 devices with PAC come in really handy. A virtual method call with PAC looks something like this:
    0xfffffff00809f4e8  080040f9  ldr x8, [x0]
    0xfffffff00809f4ec  e83bc1da  autdza x8
    0xfffffff00809f4f0  09e11691  add x9, x8, 0x5b8
    0xfffffff00809f4f4  08dd42f9  ldr x8, [x8, #1464]
    0xfffffff00809f4f8  6944fdf2  movk x9, 0xea23, lsl #48
    0xfffffff00809f4fc  e10302aa  mov x1, x2
    0xfffffff00809f500  09093fd7  blraa x8, x9
And this neat little value 0xea23 is the same thing that iometa -A displays for each method with pac=0xNNNN. In most cases that alone should be unique to a single method, but even when it isn't, that together with the vtable offset (0x5b8 here) should definitely allow you to uniquely identify the method, and with that the minimum type that C++ object is expected to conform to.
For the absolute hardest cases, which are arguments that are either blindly passed through to other functions or simply ignored, the same PAC trick as above can help again, just in reverse this time. By looking up the PAC tag of the current method and searching the kernelcache for all instructions of the form movk x.*, 0xNNNN, lsl #48, you should be able to find every last invocation of that method, and thus can look at how the arguments are loaded.
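The reverse lookup from the last trick can be sketched in Python as a plain text search. This assumes you already have a textual disassembly of the kernelcache with one instruction per line in the "address, raw bytes, instruction" layout of the listing above; other disassemblers will need a different pattern. The function name find_pac_uses is made up.

```python
import re

# One instruction per line: address, 4 raw bytes as hex, then the instruction.
MOVK_RE = re.compile(
    r"^(?P<addr>0x[0-9a-fA-F]+)\s+[0-9a-fA-F]{8}\s+"
    r"movk\s+x\d+,\s*(?P<imm>0x[0-9a-fA-F]+),\s*lsl\s+#48\b"
)


def find_pac_uses(disasm, tag):
    """Return addresses of 'movk xN, <tag>, lsl #48' instructions."""
    hits = []
    for line in disasm.splitlines():
        m = MOVK_RE.match(line.strip())
        if m and int(m.group("imm"), 16) == tag:
            hits.append(int(m.group("addr"), 16))
    return hits
```

Run over the listing above with tag 0xea23, this returns the address of the single movk at 0xfffffff00809f4f8. Not every hit is necessarily a call site of the method you're after, since the same 16-bit value can show up for unrelated reasons, so each one still needs a look at the surrounding instructions and the vtable offset.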
And that's about it. Every now and then you'll come across methods whose names are simply lost (like when the function consists of a single
ret) or whose arguments are passed around way too long before their type becomes obvious. Just put those down as
void* and if someone ever goes on to reverse that method/class/kext, they can hit me up once they've figured it out. ;)
| Generation | Devices | Identifiers | Models | New kernelcache format since |
|---|---|---|---|---|
| A7 | iPad Air | iPad4,1 | J71AP | 12.0 beta 1 |
| A7 | iPad mini 2 | iPad4,4 | J85AP | 12.0 beta 1 |
| A7 | iPad mini 3 | iPad4,7 | J85mAP | 12.0 beta 1 |
| A7 | iPhone 5s | iPhone6,1 | N51AP | 12.0 beta 2 |
| A8 | iPad mini 4 | iPad5,1 | J96AP | N/A |
| A8 | iPad Air 2 | iPad5,3 | J81AP | N/A |
| A8 | iPhone 6+ | iPhone7,1 | N56AP | 12.0 beta 1 |
| A8 | iPod touch 6G | iPod7,1 | N102AP | 12.0 beta 2 |
| A9 | iPad Pro (9.7in) | iPad6,3 | J127AP | N/A |
| A9 | iPad Pro (12.9in) | iPad6,7 | J98aAP | N/A |
| A10 | iPad Pro 2 (12.9in) | iPad7,1 | J120AP | N/A |
| A10 | iPad Pro 2 (10.5in) | iPad7,3 | J207AP | N/A |
| A12 | iPad Pro 3 (11.0in) | iPad8,1 | J317AP | 12.1 |
| A12 | iPad Pro 3 (12.9in) | iPad8,5 | J320AP | 12.1 |
| A12 | iPhone XS | iPhone11,2 | D321AP | 12.0 GM |
| A12 | iPhone XS Max | iPhone11,4 | D331AP | 12.0 GM |