
IOKit resymbolication

What is this?

These are symbol maps. iometa by itself can find the names of classes, but method names are simply not preserved in release kernels. So these symbol maps are essentially huge lookup tables for virtual method names, which can be passed to iometa as the last argument to recover most symbols.
Symbol maps take the following form:

OSObject                    # This is a comment
- ~OSObject()
- ~OSObject()
- release(int) const
- getRetainCount() const
- retain() const
- release() const
- serialize(OSSerialize*) const
- getMetaClass() const
- OSMetaClassBase::isEqualTo(OSMetaClassBase const*) const
- taggedRetain(void const*) const
- taggedRelease(void const*) const
- taggedRelease(void const*, int) const
- init()
- free()
OSString
- initWithString(OSString const*)
- initWithCString(char const*)
- initWithCStringNoCopy(char const*)
- getLength() const
- getChar(unsigned int) const
- setChar(char, unsigned int)
- getCStringNoCopy() const
- isEqualTo(OSString const*) const
- isEqualTo(char const*) const
- isEqualTo(OSData const*) const
OSSymbol
- isEqualTo(OSSymbol const*) const

Basic class and method names should be fairly obvious, but a few things should be noted:

  1. Comments can be started with # and extend to the end of the line. These are entirely ignored by parsing, so iometa -M will also strip them out.

  2. The class inheritance is not reflected in symbol maps, and is only parsed from kernels. However, inherited methods are not listed in child classes (e.g. see how OSString does not list init(), free(), etc., because they are inherited from OSObject).

  3. Destructors:

    - ~OSObject()
    - ~OSObject()
    

    Destructors of the form ~ClassName() (and in theory constructors of the form ClassName(), but iOS doesn't have them in vtabs) are detected, and will have their name replaced by the class name in child classes.

  4. Class inaccuracy:

    - OSMetaClassBase::isEqualTo(OSMetaClassBase const*) const
    

    The recorded class name for a method can be overridden by prepending ClassName:: in front of the method name. This is sometimes necessary in cases where XNU's OSMetaClass RTTI system doesn't accurately reflect the actual C++ inheritance structures.

  5. Empty placeholders. These are not shown above, but if a line contains nothing but a dash, it denotes that a virtual method exists in that place, but its name and arguments are unknown. An example would be:

    OSString
    - initWithString(OSString const*)
    -
    - initWithCStringNoCopy(char const*)
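
To make the rules above concrete, here is a minimal Python sketch of a reader for this format. It is an illustration only and not iometa's actual parser; the function names `parse_symmap` and `rename_destructor` are made up for this example, and it assumes that `::` in front of the opening parenthesis always marks a class override.

```python
def parse_symmap(text):
    """Parse symbol-map text into {class_name: [entry, ...]}.

    An entry is None for an empty placeholder ("-"), otherwise a tuple
    (owner, signature). owner is the enclosing class unless the line
    carries an explicit "ClassName::" override.
    """
    classes = {}
    current = None
    for raw in text.splitlines():
        # Comments start with '#' and run to the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not line.startswith("-"):
            # Anything that is not a method line starts a new class.
            current = line
            classes.setdefault(current, [])
            continue
        if current is None:
            raise ValueError("method line before any class name")
        sig = line[1:].strip()
        if not sig:
            # A lone dash: a method exists here, but nothing is known about it.
            classes[current].append(None)
            continue
        # Only look for "::" before the argument list, since argument
        # types may contain "::" themselves.
        head, paren, tail = sig.partition("(")
        owner = current
        if "::" in head:
            owner, head = head.rsplit("::", 1)
        classes[current].append((owner, head + paren + tail))
    return classes


def rename_destructor(signature, owner, child):
    """Return the name a destructor of `owner` takes on in `child`."""
    if signature == f"~{owner}()":
        return f"~{child}()"
    return signature
```

With this, `rename_destructor("~OSObject()", "OSObject", "OSString")` gives `~OSString()`, which is the replacement described in point 3.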
    

Where did these symbols come from?

With the iOS 12.0 beta 1, Apple introduced a new kernelcache format for some devices where kexts were no longer just prelinked like before, but effectively directly compiled in. This new format allows for many optimisations, and had as a consequence the complete removal of all symbols (previously we had some 4000-and-something symbols left). However, on the very first beta, Apple accidentally shipped kernels for A7 iPads and A8 iPhones with all symbols left in, more than 90'000 in total! Out of all of those, about 25'000 are symbols corresponding to virtual methods, and the original symbol maps were generated from that with iometa -M.
Those are the A7-dense.txt and A8-dense.txt files you'll find in the 12.0b1 folder, but you'll notice that those aren't the only symbol maps in there. I've tried my best to match those symbols against the kernelcaches of all other devices, and for those methods that got no match, to recover their names and argument list from panic strings or debugging information left in the kernels - with not overwhelming, but I think reasonable results. At the time of writing, I've also ported these symbol maps forward in time to the iOS 12 beta 2 (which additionally switched the iPhone 5s and iPod touch 6G to the new kernelcache format) and iOS 12.0 Golden Master (which was the first version to include A12 devices).

Where are we going from here?

I'm obviously gonna continue to port these symbols onto newer versions, because that's the entire point of keeping these maps. Now, since I don't have any of the highly sophisticated binary matching algorithms I wish I did, chances are I'm gonna miss a ton of stuff like:

  • Methods getting swapped around or replaced by others, but with the number of methods per class staying the same
  • Methods changing the amount and types of arguments
  • New methods whose names are mentioned somewhere in the binary where I happen to not look for it

So I would greatly appreciate it if you could point out any kind of error you detect in these maps, as well as any symbol name or argument list that you believe I missed or messed up. In that spirit, I'm also going to document how these lists are organised, how I try and update them to new versions/devices, as well as noteworthy things I've come across while doing so.

Ok, first of all, the symbol maps are organised by device class - A7, A8, etc. Originally I wanted to put all symbols for all devices into a single file, but in attempting to do that my own tool greeted me with warnings like:

[WRN] Symmap entry for AppleBCMWLANBusInterface has 60 methods, vtab has 88.
[WRN] Symmap entry for AppleBCMWLANCore has 84 methods, vtab has 136.
[WRN] Symmap entry for AppleBCMWLANBSSBeacon has 61 methods, vtab has 66.
[WRN] Symmap entry for AppleBCMWLANIO80211APSTAInterface has 88 methods, vtab has 83.
[WRN] Symmap entry for AppleBCMWLANProximityInterface has 88 methods, vtab has 83.

You can reproduce that by attempting to use an A7 symbol map on an A8 cache or vice versa. Basically different device generations have, under the same name, different classes implementing different methods. So in order to work around that, I gave each generation its own map, since within generations there's at best very little difference. With maps provided on this repo, you should only ever see two kinds of warnings:

[WRN] Symmap entry for <Class> has X methods, vtab has 0.
[WRN] Symmap entry for <Class> has X methods, but class has no vtab.

Both are symptoms of the same condition, namely the symbol map holding information on a class when the kernel effectively optimised that class out of existence for that device. And I can live with that.
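
If you want to check a run for anything other than those two harmless warnings, the log lines are regular enough to sort mechanically. The following is a small sketch that assumes the warning wording is exactly as quoted above; `classify_warnings` is a made-up helper name, not part of iometa.

```python
import re

# Matches both "vtab has N." and "but class has no vtab." variants.
WARNING_RE = re.compile(
    r"\[WRN\] Symmap entry for (?P<cls>\S+) has (?P<sym>\d+) methods, "
    r"(?:vtab has (?P<vtab>\d+)|but class has no vtab)\."
)


def classify_warnings(log_text):
    """Split warnings into harmless ones and real method-count mismatches.

    Returns (optimised_out, mismatched), where optimised_out is a list of
    class names and mismatched is a list of (class, symmap_count, vtab_count).
    """
    optimised_out, mismatched = [], []
    for line in log_text.splitlines():
        m = WARNING_RE.search(line)
        if not m:
            continue
        vtab = m.group("vtab")
        if vtab is None or int(vtab) == 0:
            # Class has no vtab or an empty one: optimised out on this device.
            optimised_out.append(m.group("cls"))
        else:
            mismatched.append((m.group("cls"), int(m.group("sym")), int(vtab)))
    return optimised_out, mismatched
```

Anything that ends up in `mismatched` suggests the map belongs to a different generation or kernelcache format than the kernel it was applied to.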

Then the next split is by kernelcache format. This is A8-dense.txt vs A8-legacy.txt. The reason these need a split is optimisation, namely abstract classes having been optimised out. The problem arises when you have a class hierarchy like this:

  • Class A is a non-abstract base class declaring virtual method x().
  • Class B is an abstract class inheriting from class A and declaring virtual method y().
  • Class C is a non-abstract class inheriting from class B and declaring virtual method z().

Now in the "legacy" kernelcache format, class B usually gets its own vtable and everything, and a symbol map would look as follows:

A
- x()
B
- y()
C
- z()

In the "dense" kernelcache format however, class B will have been mostly optimised out and not get a vtable, which means that no methods for B will be recorded, which in turn will make it look like all of B's methods were in fact introduced by C:

A
- x()
B
C
- y()
- z()

For one, this makes the two symbol maps inherently incompatible, and for two this is also the reason for the "class override" feature, so that y() can be accurately attributed to B if we have that knowledge:

A
- x()
B
C
- B::y()
- z()

If you ever end up porting a symbol map for a device class that just switched from legacy to dense kernelcache format, you'll no doubt notice that this is the biggest change you'll have to make: moving methods of abstract classes into their child classes. The second biggest will probably be deleting all the stuff that has been optimised out now. ;P
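
The method-moving part of such a port can be sketched roughly as follows. This is only an illustration under stated assumptions: the symbol map is already loaded as a dict of class name to method list, and both `parent_of` and `optimised_out` are hypothetical inputs you would have to supply yourself (the maps don't record inheritance, and which classes lost their vtable has to come from the kernel). It also assumes every surviving child repeats the methods of its optimised-out ancestors.

```python
def legacy_to_dense(symmap, parent_of, optimised_out):
    """Move methods of optimised-out abstract classes into their children.

    symmap:        {class: [signature, ...]} in legacy form
    parent_of:     {class: parent class}           (hypothetical input)
    optimised_out: set of classes without a vtable (hypothetical input)
    """
    def qualify(sig, owner):
        # Leave signatures alone that already carry a class override.
        if "::" in sig.partition("(")[0]:
            return sig
        return f"{owner}::{sig}"

    dense = {}
    for cls, methods in symmap.items():
        if cls in optimised_out:
            # The class name stays in the map, but it owns no methods now.
            dense[cls] = []
            continue
        # Collect the run of optimised-out ancestors directly above cls.
        chain = []
        parent = parent_of.get(cls)
        while parent in optimised_out:
            chain.append(parent)
            parent = parent_of.get(parent)
        moved = []
        for ancestor in reversed(chain):  # oldest ancestor first
            for sig in symmap.get(ancestor, []):
                moved.append(qualify(sig, ancestor))
        dense[cls] = moved + list(methods)
    return dense
```

Applied to the A/B/C example with `optimised_out={"B"}`, this turns C's list into `B::y()` followed by `z()` and leaves B empty, matching the last map shown above.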

With that sorted out, here's how I actually go about updating symbol maps:

  1. I simply run iometa -M kernel old.txt >/tmp/new.txt against a kernel, using the symbol map from the last version (or in the case of a new device, the closest existing device I have a map for). Usually that will throw a bunch of warnings and turn between a few hundred and a few thousand functions into fn_0x..., but the vast majority will go through just fine, and I blindly assume those to still be accurate.
    I do this for each device belonging to a generation, collect all newly generated symbol maps, and then merge them back into one with my ugly script (this is necessary in order to keep classes that only e.g. either iPads or iPhones have, but yet get rid of classes that were actually removed).
  2. I go through all classes with fn_0x... methods and, before even looking at assembly, compare a bunch of vtables between this and the last generation. Of particular interest are "pure virtual" methods (i.e. those showing up red in iometa output) as well as those overridden in child classes:
    [Images: vtab-1a / vtab-1b, vtab-2a / vtab-2b, vtab-3a / vtab-3b]
    You can tell a damn lot from just those patterns.
  3. When you've finally exhausted pattern matching, it's time to dive into assembly and find out which of those methods in between were added or removed. And if methods were added and we're somewhat lucky, they will also pass their own name and/or signature to some logging function. Now if it's just the name without signature, recovering the argument list can be a challenge, so here are a few tricks:
    • When arguments are either stored to memory or passed to printf-like functions, that usually gives away their exact size. Otherwise you only get the information whether they're 32- or 64bit.

    • For 32bit values I usually assume unsigned int unless a comparison instruction suggests signed-ness, or if it's only tested for zero vs non-zero, in which case I assume bool.

    • For 64bit values my base assumption is void* unless something clearly indicates a size, magic constant, bitmask, or similar, in which case I go for unsigned long long.

    • For pointer types it should be fairly obvious what types they have, with probably the most complicated case being C++ objects. This is an area where A12 devices with PAC come in really handy. A virtual method call with PAC looks something like this:

      0xfffffff00809f4e8      080040f9       ldr x8, [x0]
      0xfffffff00809f4ec      e83bc1da       autdza x8
      0xfffffff00809f4f0      09e11691       add x9, x8, 0x5b8
      0xfffffff00809f4f4      08dd42f9       ldr x8, [x8, #1464]
      0xfffffff00809f4f8      6944fdf2       movk x9, 0xea23, lsl #48
      0xfffffff00809f4fc      e10302aa       mov x1, x2
      0xfffffff00809f500      09093fd7       blraa x8, x9
      

      And this neat little value 0xea23 is the same thing that iometa -A displays for each method with pac=0xNNNN. In most cases that alone should be unique to a single method, but even when it isn't, that together with the vtable offset (0x5b8 here) should definitely allow you to uniquely identify the method, and with that the minimum type that C++ object is expected to conform to.

    • For the absolute hardest cases, which are arguments that are either blindly passed through to other functions or simply ignored, the same PAC trick as above can help again, just in reverse this time. By looking up the PAC tag of the current method and searching the kernelcache for all instructions of the form movk x.*, 0xNNNN, lsl #48, you should be able to find any last invocation of that method, and thus can look at how the arguments are loaded.
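
The search for PAC tags described in the last two points can be done over a plain-text disassembly dump. Below is a minimal sketch that assumes lines shaped like the listing above (address, raw bytes, mnemonic); `find_pac_sites` and `vtable_slot` are made-up names, and the slot calculation assumes 8-byte vtable entries counted from wherever the object's vtable pointer points.

```python
import re

# address, anything (raw bytes), then "movk xN, 0xNNNN, lsl #48"
MOVK_RE = re.compile(
    r"^\s*(?P<addr>0x[0-9a-fA-F]+)\s+.*?\bmovk\s+(?P<reg>x\d+),\s*"
    r"0x(?P<tag>[0-9a-fA-F]+),\s*lsl\s+#48(?!\d)"
)


def find_pac_sites(disasm, tag):
    """Return [(address, register)] for every movk loading `tag` into bits 48..63."""
    sites = []
    for line in disasm.splitlines():
        m = MOVK_RE.search(line)
        if m and int(m.group("tag"), 16) == tag:
            sites.append((int(m.group("addr"), 16), m.group("reg")))
    return sites


def vtable_slot(offset, pointer_size=8):
    """Convert a byte offset into the vtable to a slot number."""
    if offset % pointer_size:
        raise ValueError("offset is not a multiple of the pointer size")
    return offset // pointer_size
```

For the call shown above, searching for `0xea23` finds the `movk` at `0xfffffff00809f4f8`, and `vtable_slot(0x5b8)` puts the callee at slot 183 under the stated assumption.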

And that's about it. Every now and then you'll come across methods whose names are simply lost (like when the function consists of a single ret) or whose arguments are passed around way too long before their type becomes obvious. Just put those down as void* and if someone ever goes on to reverse that method/class/kext, they can hit me up once they've figured it out. ;)
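
For reference, the fallback rules for guessing argument types from the tricks above boil down to something like the following sketch. The function and parameter names are invented for illustration, and returning `int` for the signed case is an assumption, since the text only says that a comparison may suggest signed-ness.

```python
def guess_arg_type(width_bits, signed_compare=False,
                   only_zero_tested=False, looks_numeric=False):
    """Best-guess C type for an argument whose exact type is unknown.

    width_bits:       32 or 64, as seen from register usage
    signed_compare:   a comparison instruction suggests signed-ness
    only_zero_tested: the value is only ever tested for zero vs non-zero
    looks_numeric:    used as a size, magic constant, bitmask or similar
    """
    if width_bits == 32:
        if only_zero_tested:
            return "bool"
        if signed_compare:
            return "int"  # assumption: signed counterpart of the default
        return "unsigned int"
    if width_bits == 64:
        # Default to a pointer unless it clearly behaves like a number.
        return "unsigned long long" if looks_numeric else "void*"
    raise ValueError("only 32- and 64-bit arguments are covered")
```

These are defaults only; anything stored to memory or passed to a printf-like function gives a more exact answer and should take precedence.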

Device/Version Overview

Generation | Devices             | Identifiers | Models  | New kernelcache format since
-----------+---------------------+-------------+---------+-----------------------------
A7         | iPad Air            | iPad4,1     | J71AP   | 12.0 beta 1
           |                     | iPad4,2     | J72AP   |
           |                     | iPad4,3     | J73AP   |
           | iPad mini 2         | iPad4,4     | J85AP   |
           |                     | iPad4,5     | J86AP   |
           |                     | iPad4,6     | J87AP   |
           | iPad mini 3         | iPad4,7     | J85mAP  |
           |                     | iPad4,8     | J86mAP  |
           |                     | iPad4,9     | J87mAP  |
           | iPhone 5s           | iPhone6,1   | N51AP   | 12.0 beta 2
           |                     | iPhone6,2   | N53AP   |
A8         | iPad mini 4         | iPad5,1     | J96AP   | N/A
           |                     | iPad5,2     | J97AP   |
           | iPad Air 2          | iPad5,3     | J81AP   |
           |                     | iPad5,4     | J82AP   |
           | iPhone 6+           | iPhone7,1   | N56AP   | 12.0 beta 1
           | iPhone 6            | iPhone7,2   | N61AP   |
           | iPod touch 6G       | iPod7,1     | N102AP  | 12.0 beta 2
A9         | iPad Pro (9.7in)    | iPad6,3     | J127AP  | N/A
           |                     | iPad6,4     | J128AP  |
           | iPad Pro (12.9in)   | iPad6,7     | J98aAP  |
           |                     | iPad6,8     | J99aAP  |
           | iPad 5              | iPad6,11    | J71sAP  |
           |                     |             | J71tAP  |
           |                     | iPad6,12    | J72sAP  |
           |                     |             | J72tAP  |
           | iPhone 6s           | iPhone8,1   | N71AP   |
           |                     |             | N71mAP  |
           | iPhone 6s+          | iPhone8,2   | N66AP   |
           |                     |             | N66mAP  |
           | iPhone SE           | iPhone8,4   | N69AP   |
           |                     |             | N69uAP  |
A10        | iPad Pro 2 (12.9in) | iPad7,1     | J120AP  | N/A
           |                     | iPad7,2     | J121AP  |
           | iPad Pro 2 (10.5in) | iPad7,3     | J207AP  |
           |                     | iPad7,4     | J208AP  |
           | iPad 6              | iPad7,5     | J71bAP  |
           |                     | iPad7,6     | J72bAP  |
           | iPhone 7            | iPhone9,1   | D10AP   |
           |                     | iPhone9,3   | D101AP  |
           | iPhone 7+           | iPhone9,2   | D11AP   |
           |                     | iPhone9,4   | D111AP  |
A11        | iPhone 8            | iPhone10,1  | D20AP   | N/A
           |                     |             | D20AAP  |
           |                     | iPhone10,4  | D201AP  |
           |                     |             | D201AAP |
           | iPhone 8+           | iPhone10,2  | D21AP   |
           |                     |             | D21AAP  |
           |                     | iPhone10,5  | D211AP  |
           |                     |             | D211AAP |
           | iPhone X            | iPhone10,3  | D22AP   |
           |                     | iPhone10,6  | D221AP  |
A12        | iPad Pro 3 (11.0in) | iPad8,1     | J317AP  | 12.1
           |                     | iPad8,2     | J317xAP |
           |                     | iPad8,3     | J318AP  |
           |                     | iPad8,4     | J318xAP |
           | iPad Pro 3 (12.9in) | iPad8,5     | J320AP  |
           |                     | iPad8,6     | J320xAP |
           |                     | iPad8,7     | J321AP  |
           |                     | iPad8,8     | J321xAP |
           | iPhone XS           | iPhone11,2  | D321AP  | 12.0 GM
           | iPhone XS Max       | iPhone11,4  | D331AP  |
           |                     | iPhone11,6  | D331pAP |
           | iPhone XR           | iPhone11,8  | N841AP  |