Improve support for shared debug info items in DEX files #843

Open
zerny opened this issue Jan 31, 2022 · 7 comments

zerny commented Jan 31, 2022

The DEX format allows several methods / code_items to share the same debug_info_item. D8 uses this to canonicalize debug info and reduce the size of the DEX files, and has done so since its introduction. R8 and other tools make further use of this to share a small set of very large debug_info_items among almost all methods in the program.

The shared use of debug_info_item is within the existing specification of the format and has been tested to work on all VMs back to 4.0.4. There is one requirement to avoid some issues on legacy VMs, namely that the number of parameters in the debug_info_item matches the parameter count of all methods referencing it.

A suggestion to support these cases would be to:

  1. Don't count the size of the debug_info_item towards the size of a method.
  2. Don't represent debug events for addresses past the instruction offset of the last instruction in a method referencing it.
  3. Don't represent debug line entries with no instruction offset change.

If 1. is not an acceptable solution, the fixes for 2. and 3. will at least mitigate the double accounting by attributing to each method only the part of the (shared) item that applies to it.

Item 3. is in line with the debugging behavior on runtimes, where it is not possible to break on lines with no pc change. The D8 compiler explicitly ensures that a nop is inserted whenever two lines have no instructions between them.
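
Suggestions 2. and 3. can be sketched as a filter over the decoded position entries of a shared debug_info_item. This is only an illustration, not code from smali or dexdump: the Position class and pruneForMethod are hypothetical names, the entries are assumed to be already decoded and sorted by pc, and keeping the later of two entries at the same pc is an assumption about which one should survive.

```java
import java.util.ArrayList;
import java.util.List;

class DebugEventPruning {

    /** One decoded position entry: the line in effect from instruction offset pc. */
    static final class Position {
        final int pc;
        final int line;

        Position(int pc, int line) {
            this.pc = pc;
            this.line = line;
        }
    }

    /**
     * Returns the entries relevant to one method. Entries past lastInstructionPc
     * are dropped (suggestion 2). When several entries share a pc, only the last
     * one is kept (suggestion 3, under the assumption stated above).
     */
    static List<Position> pruneForMethod(List<Position> shared, int lastInstructionPc) {
        List<Position> result = new ArrayList<>();
        for (Position p : shared) {
            if (p.pc > lastInstructionPc) {
                break; // Input is sorted by pc, so the remaining entries are out of range too.
            }
            int last = result.size() - 1;
            if (last >= 0 && result.get(last).pc == p.pc) {
                result.set(last, p); // No pc change: replace the earlier entry.
            } else {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Position> shared = new ArrayList<>();
        for (int pc = 0; pc <= 5; pc++) {
            shared.add(new Position(pc, pc));
        }
        for (Position p : pruneForMethod(shared, 3)) {
            System.out.println("pc=" + p.pc + " line=" + p.line);
        }
    }
}
```

With a filter like this, each method would only be charged for, and only display, the portion of the shared event stream that covers its own instructions.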

A final possible feature would be to support canonicalizing the debug information again when writing to DEX with smali. I think that for pipelines needing this, it is likely best to do a subsequent run of D8, as that tool will also support translating mapping files such that the output can again use the highly compressed representation of debug info.

@JesusFreke
Owner

I'm not sure I really understand how this new shared debug info works. Do you have an example dex file that can be shared, or more information about how that works?

If I recall correctly, other items that are shared between multiple entities have their size split up evenly among the entities. So we could split up the size of the shared debug info item between all of the methods that reference it.
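
The even split described above could look roughly like the following. This is a sketch under assumptions, not the existing accounting code: the class and method names are hypothetical, methods are identified by plain strings, and each debug_info_item is identified by its offset in the DEX file.

```java
import java.util.HashMap;
import java.util.Map;

class SharedSizeAttribution {

    /**
     * @param debugInfoOffsetByMethod method name to offset of the debug_info_item it references
     * @param sizeByDebugInfoOffset   debug_info_item offset to its size in bytes
     * @return method name to the number of bytes attributed to that method
     */
    static Map<String, Double> attribute(
            Map<String, Integer> debugInfoOffsetByMethod,
            Map<Integer, Integer> sizeByDebugInfoOffset) {
        // Count how many methods reference each debug_info_item.
        Map<Integer, Integer> refCounts = new HashMap<>();
        for (Integer offset : debugInfoOffsetByMethod.values()) {
            refCounts.merge(offset, 1, Integer::sum);
        }
        // Give every referencing method an equal share of the item's size.
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : debugInfoOffsetByMethod.entrySet()) {
            int size = sizeByDebugInfoOffset.get(e.getValue());
            int refs = refCounts.get(e.getValue());
            result.put(e.getKey(), (double) size / refs);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> offsets = new HashMap<>();
        offsets.put("a", 100);
        offsets.put("b", 100);
        offsets.put("c", 200);
        Map<Integer, Integer> sizes = new HashMap<>();
        sizes.put(100, 30);
        sizes.put(200, 8);
        System.out.println(attribute(offsets, sizes));
    }
}
```

A split like this keeps the per-method sizes summing to the real size of the items, but it still charges a small method the same share of a large shared item as the largest method that references it.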

@zerny
Author

zerny commented Feb 4, 2022

D8 has been sharing the info since it was introduced. It happens mostly for release builds, as there is a higher chance of the info being equal when only the lines on throwing instructions remain and there are no locals. However, it can be shared if equal in any D8/R8 build. Likely candidates are small methods such as default constructors. If they happen to be on the same line in their files, then the debug info will coincide and be shared. This gives a non-trivial size saving on larger apps.

The new feature in R8 is to use an identity encoding which maps each pc to a line of the same value. With that, all methods can share the same debug info item so long as their max instruction pc is within the encoded range in the debug info item event stream (and subject to the caveat about matching parameter count). The pc-based encoding works for the R8 compiler because it also produces a mapping file which can be used to retrace to the original lines / stack traces.
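
The sharing condition can be written down as a small model. This is a sketch, not R8 code: PcBasedDebugInfoSketch and its members are hypothetical names, and the item is reduced to the two properties mentioned above (parameter count and the highest pc it encodes).

```java
class PcBasedDebugInfoSketch {
    final int parameterCount;
    final int maxPc;

    PcBasedDebugInfoSketch(int parameterCount, int maxPc) {
        this.parameterCount = parameterCount;
        this.maxPc = maxPc;
    }

    /** Identity encoding: the line recorded for a pc has the same value as the pc. */
    int lineForPc(int pc) {
        if (pc < 0 || pc > maxPc) {
            throw new IllegalArgumentException("pc outside encoded range: " + pc);
        }
        return pc;
    }

    /**
     * A method can reference this item if the parameter counts match and the
     * method's highest instruction pc lies within the encoded range.
     */
    boolean canBeSharedBy(int methodParameterCount, int methodMaxPc) {
        return methodParameterCount == parameterCount && methodMaxPc <= maxPc;
    }

    public static void main(String[] args) {
        PcBasedDebugInfoSketch item = new PcBasedDebugInfoSketch(0, 0x13);
        System.out.println(item.canBeSharedBy(0, 0x0e)); // same param count, pc in range
        System.out.println(item.canBeSharedBy(1, 0x09)); // param count differs
        System.out.println(item.lineForPc(7));
    }
}
```

The "line" reported at runtime is then just the pc, which is why the mapping file is needed to get back to the original source lines.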

Attached is an example zip using this encoding by R8. The source of the code is:

import java.io.PrintStream;

class SharedPcEncodedDebugInfo {

  public static void m1() {
    System.out.print("m");
    System.out.println("1");
  }

  public static void m2() {
    System.out.print("m");
    System.out.print("2");
    System.out.println();
  }

  public static void m3() {
    PrintStream out = System.out;
    out.println("m3");
  }

  public static void main(String[] args) {
    m1();
    m2();
    m3();
  }
}

Below is the disassembly output from R8. It shows that only two debug info items are created: one for 0 params, shared by m1, m2 and m3, which encodes +1+1 pc-line delta events up to pc 0x13, and one for 1 param, used by the main method.

$ ./tools/disasm.py shared-pc-debug-info.zip 
<snip>
Number of markers: 1
~~R8{"backend":"dex","compilation-mode":"release","has-checksums":false,"min-api":1,"pg-map-id":"c73f034","r8-mode":"full","sha-1":"engineering","version":"main"}
# Bytecode for
# Class: 'com.android.tools.r8.SharedPcEncodedDebugInfo'


#
# Method: '<init>':
# 
#

void com.android.tools.r8.SharedPcEncodedDebugInfo.<init>()
registers: 1, inputs: 1, outputs: 1
------------------------------------------------------------
inst#  offset  instruction         arguments
------------------------------------------------------------
    0:   0x00: InvokeDirect        { v0 } Ljava/lang/Object;-><init>()V
    1:   0x03: ReturnVoid          

#
# Method: 'm1':
# public static
#

void com.android.tools.r8.SharedPcEncodedDebugInfo.m1()
registers: 2, inputs: 0, outputs: 2
------------------------------------------------------------
inst#  offset  instruction         arguments
------------------------------------------------------------
    0:   0x00: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    1:   0x02: ConstString         v1, "m"
    2:   0x04: InvokeVirtual       { v0 v1 } Ljava/io/PrintStream;->print(Ljava/lang/String;)V
    3:   0x07: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    4:   0x09: ConstString         v1, "1"
    5:   0x0b: InvokeVirtual       { v0 v1 } Ljava/io/PrintStream;->println(Ljava/lang/String;)V
    6:   0x0e: ReturnVoid          
PcBasedDebugInfo (params: 0, max-pc: 0x13)

#
# Method: 'm2':
# public static
#

void com.android.tools.r8.SharedPcEncodedDebugInfo.m2()
registers: 2, inputs: 0, outputs: 2
------------------------------------------------------------
inst#  offset  instruction         arguments
------------------------------------------------------------
    0:   0x00: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    1:   0x02: ConstString         v1, "m"
    2:   0x04: InvokeVirtual       { v0 v1 } Ljava/io/PrintStream;->print(Ljava/lang/String;)V
    3:   0x07: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    4:   0x09: ConstString         v1, "2"
    5:   0x0b: InvokeVirtual       { v0 v1 } Ljava/io/PrintStream;->print(Ljava/lang/String;)V
    6:   0x0e: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    7:   0x10: InvokeVirtual       { v0 } Ljava/io/PrintStream;->println()V
    8:   0x13: ReturnVoid          
PcBasedDebugInfo (params: 0, max-pc: 0x13)

#
# Method: 'm3':
# public static
#

void com.android.tools.r8.SharedPcEncodedDebugInfo.m3()
registers: 2, inputs: 0, outputs: 2
------------------------------------------------------------
inst#  offset  instruction         arguments
------------------------------------------------------------
    0:   0x00: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    1:   0x02: ConstString         v1, "m3"
    2:   0x04: InvokeVirtual       { v0 v1 } Ljava/io/PrintStream;->println(Ljava/lang/String;)V
    3:   0x07: ReturnVoid          
PcBasedDebugInfo (params: 0, max-pc: 0x13)

#
# Method: 'main':
# public static
#

void com.android.tools.r8.SharedPcEncodedDebugInfo.main(java.lang.String[])
registers: 1, inputs: 1, outputs: 0
------------------------------------------------------------
inst#  offset  instruction         arguments
------------------------------------------------------------
    0:   0x00: InvokeStatic        {  } Lcom/android/tools/r8/SharedPcEncodedDebugInfo;->m1()V
    1:   0x03: InvokeStatic        {  } Lcom/android/tools/r8/SharedPcEncodedDebugInfo;->m2()V
    2:   0x06: InvokeStatic        {  } Lcom/android/tools/r8/SharedPcEncodedDebugInfo;->m3()V
    3:   0x09: ReturnVoid          
PcBasedDebugInfo (params: 1, max-pc: 0x09)

shared-pc-debug-info.zip
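
The two items in the output above are consistent with grouping methods by parameter count and taking the highest last-instruction pc in each group. The sketch below reproduces that grouping for the four methods that carry debug info in the example. Whether R8 computes the items this way is an assumption, and SharedItemGrouping and MethodInfo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SharedItemGrouping {

    static final class MethodInfo {
        final String name;
        final int parameterCount;
        final int lastPc;

        MethodInfo(String name, int parameterCount, int lastPc) {
            this.name = name;
            this.parameterCount = parameterCount;
            this.lastPc = lastPc;
        }
    }

    /** Maps each parameter count to the max-pc a shared item for that count must cover. */
    static Map<Integer, Integer> requiredMaxPcByParameterCount(List<MethodInfo> methods) {
        Map<Integer, Integer> result = new TreeMap<>();
        for (MethodInfo m : methods) {
            result.merge(m.parameterCount, m.lastPc, Integer::max);
        }
        return result;
    }

    public static void main(String[] args) {
        // Last instruction offsets taken from the disassembly above.
        List<MethodInfo> methods = new ArrayList<>();
        methods.add(new MethodInfo("m1", 0, 0x0e));
        methods.add(new MethodInfo("m2", 0, 0x13));
        methods.add(new MethodInfo("m3", 0, 0x07));
        methods.add(new MethodInfo("main", 1, 0x09));
        for (Map.Entry<Integer, Integer> e : requiredMaxPcByParameterCount(methods).entrySet()) {
            System.out.printf("params: %d, max-pc: 0x%02x%n", e.getKey(), e.getValue());
        }
    }
}
```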

@zerny
Author

zerny commented Feb 4, 2022

Regarding the "fix suggestion 2." in comment 1, the code to prune the event stream for the current method can be found for dexdump here: https://android-review.googlesource.com/c/platform/art/+/1967643

@zerny
Author

zerny commented Feb 4, 2022

Regarding splitting the shared size, do you have a pointer to that being done for a similar component in the code base?

@zerny
Author

zerny commented Feb 28, 2022

Let me know if the proposed changes in the pull request #844 are acceptable or if you have any comments regarding them.

Thanks, Ian

@zerny
Author

zerny commented Mar 28, 2022

Just a friendly ping

@benjaminRomano

Bump
