Assembly diffs

A valuable technique when developing the JIT is generating assembly code output with a baseline compiler as well as with a compiler that has changes -- the "diff compiler" -- and examining the generated code differences between the two. The tools described here automate that process.

Assumptions

This guide assumes that you have built CoreCLR. See the CoreCLR GitHub repo for directions on building.

Assembly diff tools

  • jit-diff - Driver tool that implements a common developer workflow. Allows specifying a configuration file to store common defaults, and implements a directory scheme for "installing" tools for use later.
  • jit-dasm - Produce *.dasm from a compiler for an assembly or set of assemblies via prejitting. Used by jit-diff to do its work.
  • jit-dasm-pmi - Like jit-dasm, but lets you look at jitted code instead of prejitted code.
  • jit-analyze - Compare and analyze *.dasm files from baseline/diff. Produces a report on diffs, total size regression/improvement, and size regression/improvement by file and method.
  • jit-tp-analyze - Compare trace files with per-function instruction counts from baseline/diff. See jit-tp-analyze.md.

Dependencies

  • dotnet - The 2.1 dotnet CLI is used to build the tools. It is also used to determine the current processor architecture "RID". Install it from here.
  • git - The jit-analyze tool uses git diff to check for textual differences since this is consistent across platforms, and fast. It is also used to determine if the current directory is a dotnet/coreclr repo root, to provide for default arguments.

Build the tools

The easy way

A bootstrap.{cmd,sh} script is provided in the jitutils root directory which will validate all tool dependencies, build the repo, publish the resulting binaries to a common "bin" directory, and place them on the path. This can be run to set up the developer in one shot.

The flexible way

To build jitutils without using the bootstrap script:

  • Run dotnet restore in the root (once).
  • Run build.{cmd,sh}.

By default the script just builds the tools and does not publish them in a separate directory. To publish the utilities add the '-p' flag which publishes each utility to the ./bin directory in the root of the repo. Additionally, to download the default set of framework assemblies that can be used for generating asm diffs, add '-f'.

 $ ./build.sh -h

build.sh [-b <BUILD TYPE>] [-f] [-h] [-p] [-t <TARGET>]

    -b <BUILD TYPE> : Build type, can be Debug or Release.
    -h              : Show this message.
    -p              : Publish utilities.

50,000 foot view

By default, assembly code output (aka, "dasm") is generated by running crossgen with a specified JIT to compile a specified set of assemblies, by setting the following JIT environment variables to generate the output:

  • DOTNET_JitDisasm
  • DOTNET_JitUnwindDump
  • DOTNET_JitEHDump
  • DOTNET_JitDisasmDiffable
  • optionally, DOTNET_JitGCDump
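
As a rough illustration of the variables listed above, the following sketch exports them in a POSIX shell before a compile would be run. The variable names come from the list; the values (`*` meaning "all methods", `1` meaning "on") are assumptions and may not match what the tools actually pass. The helper name `set_dasm_env` is hypothetical.

```shell
# Sketch only: names are from the list above, values are assumptions.
set_dasm_env() {
    export DOTNET_JitDisasm='*'
    export DOTNET_JitUnwindDump='*'
    export DOTNET_JitEHDump='*'
    export DOTNET_JitDisasmDiffable=1
    # GC info is optional; request it with: set_dasm_env --gcinfo
    if [ "${1:-}" = "--gcinfo" ]; then
        export DOTNET_JitGCDump='*'
    fi
}

set_dasm_env
# Show what a child process (e.g., crossgen) would inherit.
env | grep '^DOTNET_Jit' | sort
```

In normal use jit-diff and jit-dasm set these for you; setting them by hand is mainly useful when experimenting outside the tools.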

Generating "diffs" involves generating assembly code output for both a baseline and a "diff" JIT, and comparing the results.

Passing the --pmi option to jit-diff will instead use reflection to jit each method in the assembly, setting these options:

  • DOTNET_JitDisasm
  • DOTNET_JitUnwindDump
  • DOTNET_JitEHDump
  • DOTNET_JitDisasmDiffable
  • DOTNET_JitDisasmAssemblies
  • optionally, DOTNET_JitGCDump

What can jit-diff produce asm diffs for?

jit-diff has built-in knowledge of how to generate asm diffs for the following sets of assemblies:

  • System.Private.CoreLib.dll. Use -c or --corelib.
  • A set of about 130 .NET Core frameworks assemblies (including System.Private.CoreLib.dll). Use -f or --frameworks.
  • The entire dotnet/coreclr test tree. Use --tests.
  • Of the test tree, only the benchmarks. Use --benchmarks.
  • An arbitrary assembly via --assembly.

--corelib is the default.

Only one of --corelib or --frameworks can be specified.

Only one of --tests or --benchmarks can be specified.

If --tests or --benchmarks is specified, you may also specify --test_root so the tool knows where to find the test tree you wish to use, or use the computed default for --test_root.

To generate diffs for everything jit-diff knows how to diff, use both --frameworks and --tests.
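
The combination rules above can be summarized in a small shell check. This is a sketch of the stated rules only, not jit-diff's own argument parser; the function name `check_assembly_sets` and its messages are hypothetical.

```shell
# check_assembly_sets: validate a jit-diff style option list against the
# rules above. Prints "ok" or an error message; returns non-zero on error.
check_assembly_sets() {
    corelib=0 frameworks=0 tests=0 benchmarks=0
    for arg in "$@"; do
        case "$arg" in
            -c|--corelib)    corelib=1 ;;
            -f|--frameworks) frameworks=1 ;;
            --tests)         tests=1 ;;
            --benchmarks)    benchmarks=1 ;;
        esac
    done
    if [ "$corelib" -eq 1 ] && [ "$frameworks" -eq 1 ]; then
        echo "error: only one of --corelib or --frameworks may be given"
        return 1
    fi
    if [ "$tests" -eq 1 ] && [ "$benchmarks" -eq 1 ]; then
        echo "error: only one of --tests or --benchmarks may be given"
        return 1
    fi
    echo "ok"
}

# The "everything" combination from the text is accepted.
check_assembly_sets --frameworks --tests
```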

Jitting vs Prejitting

jit-dasm and jit-dasm-pmi provide complementary views of the impact of a jit change on generated code. For a robust assessment of the impact of a jit change you may need to run diffs both ways.

The jit will produce different code when jitting than when prejitting and will produce code for somewhat different sets of methods.

Prejitting

  • Prejitted code interacts with the runtime differently, and has restrictions on cross assembly inlining.
  • Generic methods and methods defined in generic types may not get prejitted.
  • Methods using SIMD (Vector<T>) and methods using hardware intrinsics will not be prejitted.
  • Methods using generic types or methods from other assemblies may not get prejitted.

Jitting

  • Methods that are prejitted generally won't be jitted.
  • jit-dasm-pmi uses heuristics to try to cover generic types and methods; these give a basic but simplistic view of possible instantiations.

Preparing to generate diffs

First, you must build the dotnet/coreclr repo to produce a crossgen and a JIT (e.g., clrjit.dll). You also need to have a baseline crossgen / JIT available. One way to do this is to have a separate clone of the dotnet/coreclr repo that is identical to your working / "diff" clone, except that it has no changes made to it. For example, you might have these two directories:

  • c:\coreclr - main development directory; clone of dotnet/coreclr. This is where you work.
  • c:\coreclr_base - a "baseline" clone of dotnet/coreclr that matches the source of your main development directory except for your experimental changes to the JIT.

Build both of these directories with the same architecture and build flavor (e.g., x64 checked), producing the compilers to compare.

Also, if you want to generate diffs using the assemblies in the test tree, build the tests in the "diff" tree (e.g., in the example above, c:\coreclr).

Ensure the jitutils tools are built, and jit-diff, jit-analyze, and jit-dasm are on the path.
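
One way to confirm the tools are reachable is a quick PATH check. The sketch below uses only the standard `command -v` builtin; the helper name `missing_tools` is hypothetical.

```shell
# missing_tools: print each named tool that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools jit-diff jit-analyze jit-dasm)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "not on PATH:" $missing
else
    echo "all jitutils tools found"
fi
```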

jit-diff command line

jit-diff has four top-level commands, as shown by the help message:

    $ jit-diff --help
    usage: jit-diff <command> [<args>]

        diff         Run asm diff.
        list         List defaults and available tools in config.json.
        install      Install tool in config.json.
        uninstall    Uninstall tool from config.json.

The "jit-diff diff" command has this help message:

    usage: jit-diff diff [-b [arg]] [-d [arg]] [--crossgen <arg>] [-o <arg>] [--noanalyze] [-s]
                    [-t <arg>] [-c] [-f] [--benchmarks] [--tests] [--gcinfo] [-v] [--core_root <arg>]
                    [--test_root <arg>] [--base_root <arg>] [--diff_root <arg>] [--arch <arg>]
                    [--build <arg>] [--altjit <arg>]

        -b, --base [arg]      The base compiler directory or tag. Will use crossgen or clrjit from this
                              directory.
        -d, --diff [arg]      The diff compiler directory or tag. Will use crossgen or clrjit from this
                              directory.
        --crossgen <arg>      The crossgen or crossgen2 compiler exe. When this is specified,
                              will use clrjit from the --base and --diff directories with this crossgen.
        -o, --output <arg>    The output path.
        --noanalyze           Do not analyze resulting base, diff dasm directories. (By default, the
                              directories are analyzed for diffs.)
        -s, --sequential      Run sequentially; don't do parallel compiles.
        -t, --tag <arg>       Name of root in output directory. Allows for many sets of output.
        -c, --corelib         Diff System.Private.CoreLib.dll.
        -f, --frameworks      Diff frameworks.
        --benchmarks          Diff core benchmarks.
        --tests               Diff all tests.
        --gcinfo              Add GC info to the disasm output.
        -v, --verbose         Enable verbose output.
        --core_root <arg>     Path to test CORE_ROOT.
        --test_root <arg>     Path to test tree. Use with --benchmarks or --tests.
        --base_root <arg>     Path to root of base dotnet/coreclr repo.
        --diff_root <arg>     Path to root of diff dotnet/coreclr repo.
        --arch <arg>          Architecture to diff (x86, x64).
        --build <arg>         Build flavor to diff (Checked, Debug).
        --altjit <arg>        If set, the name of the altjit to use (e.g., clrjit_win_arm64_x64.dll).
        --pmi                 Generate diffs via jitting instead of running crossgen
        --assembly <arg>      Look at diffs for methods in the specified assembly

    Examples:

      jit-diff diff --output c:\diffs --corelib --core_root c:\coreclr\bin\tests\windows.x64.Release\Tests\Core_Root --base c:\coreclr_base\bin\Product\windows.x64.Checked --diff c:\coreclr\bin\Product\windows.x64.Checked
          Generate diffs of System.Private.CoreLib.dll by specifying baseline and
          diff compiler directories explicitly.

      jit-diff diff --output c:\diffs --base c:\coreclr_base\bin\Product\windows.x64.Checked --diff
          If run within the c:\coreclr git clone of dotnet/coreclr, does the same
          as the previous example, using defaults.

      jit-diff diff --output c:\diffs --base --base_root c:\coreclr_base --diff
          Does the same as the previous example, using --base_root to find the base
          directory (if run from c:\coreclr tree).

      jit-diff diff --base --diff
          Does the same as the previous example (if run from c:\coreclr tree), but uses
          default c:\coreclr\bin\diffs output directory, and `base_root` must be specified
          in the config.json file in the directory pointed to by the JIT_UTILS_ROOT
          environment variable.

      jit-diff diff --diff
          Only generates asm using the diff JIT -- does not generate asm from a baseline compiler --
          using all computed defaults.

      jit-diff diff --diff --arch x86
          Generate diffs, but for x86, even if there is an x64 compiler available.

      jit-diff diff --diff --build Debug
          Generate diffs, but using a Debug build, even if there is a Checked build available.

The "jit-diff list" command has this help message:

    $ jit-diff list --help
    usage: jit-diff list [-v]

        -v, --verbose    Enable verbose output

The "jit-diff install" command has this help message:

    $ jit-diff install --help
    usage: jit-diff install [-j <arg>] [-n <arg>] [-b <arg>] [-v]

        -j, --job <arg>          Name of the job.
        -n, --number <arg>       Job number.
        -b, --branch <arg>       Name of branch.
        -v, --verbose            Enable verbose output

The "jit-diff uninstall" command has this help message:

    $ jit-diff uninstall --help
    usage: jit-diff uninstall [-t <arg>]

        -t, --tag <arg>    Name of tool tag in config file.

Examples of generating diffs

The tool needs to know:

  • Which base and diff JIT and crossgen or corerun to use.
  • Which assemblies to generate dasm for.
  • Where to put the generated dasm.
  • Whether you want diffs for prejitted code (default) or jitted code (via --pmi).

These can all be specified explicitly. For example:

    c:\coreclr> jit-diff diff --output c:\diffs --corelib --core_root c:\coreclr\bin\tests\windows.x64.release\Tests\Core_Root --base e:\coreclr2\bin\Product\windows.x64.checked --diff c:\coreclr\bin\Product\windows.x64.checked --crossgen c:\coreclr\bin\Product\windows.x64.release

Explanation:

  1. --output c:\diffs -- specify the root directory where diffs will be placed.
  2. --corelib -- generate diffs using System.Private.CoreLib.dll.
  3. --core_root -- specify the CORE_ROOT directory (the "test layout"). Used to specify to crossgen where the platform assemblies are. Also, used as the directory where framework assemblies such as System.Runtime.dll can be found for the purpose of using them to generate dasm.
  4. --base -- specify the directory in which a baseline JIT can be found.
  5. --diff -- specify the directory in which a diff (experimental) JIT can be found.
  6. --crossgen -- specify the crossgen.exe or crossgen2.exe to use. Note that this must match the build flavor of --core_root.

You create the CORE_ROOT directory "layout" by running the runtest script. On Windows, this can be created by running the following in the dotnet/coreclr repo root.

    c:\coreclr> tests\runtest.cmd

or

    c:\coreclr> tests\runtest.cmd GenerateLayoutOnly

On non-Windows, consult the test instructions here. Note that you can pass --testDir=NONE to runtest.sh to get the same effect as passing GenerateLayoutOnly to runtest.cmd on Windows.

The above jit-diff command will generate both baseline and diff asm code into the specified output directory. It will automatically create a uniquely named subdirectory (if the --tag switch isn't specified to override the default name), and within that subdirectory will be "base" and "diff" directories, containing the generated dasm. The default subdirectory name looks like dasmset_12, with a unique number for every run of jit-diff.

jit-analyze will be run to compare the generated output if both baseline and diff asm are generated.

You can also run a recursive textual comparison tool like windiff on Windows to visually compare the diffs, e.g.:

    c:\coreclr> windiff c:\diffs\dasmset_12\base c:\diffs\dasmset_12\diff
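
On systems without windiff, a recursive `diff` gives a quick textual overview of which files changed. This is a minimal sketch using the standard `diff -rq`; the helper name `list_changed` is hypothetical, and the exact wording of each output line depends on the `diff` implementation.

```shell
# list_changed BASE DIFF: print one line per file that differs between the
# two trees, plus one line per file that exists in only one of them.
list_changed() {
    # diff exits non-zero when differences exist; that is expected here.
    diff -rq "$1" "$2" || true
}

# Example (paths follow the dasmset_12 layout described above):
#   list_changed ~/diffs/dasmset_12/base ~/diffs/dasmset_12/diff
```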

Simplified jit-diff usage: defaults

As seen above, specifying all required information can be quite verbose. jit-diff can automatically determine most of the arguments using computed defaults. The above diff can be accomplished using simply:

    c:\coreclr> jit-diff diff --diff --base --base_root e:\coreclr2

You minimally specify:

  • --diff with no argument to request the diff compiler be used to generate asm (to a "diff" directory).
  • --base with no argument to request the baseline compiler be used to generate asm (to a "base" directory).
  • --base_root to specify the dotnet/coreclr repo root that contains the baseline build.

The defaults are:

  • If jit-diff is invoked with the current directory within the dotnet/coreclr repo, then the root of this repo serves to find the diff compiler and CORE_ROOT directory. (Note that we have no reasonable default for determining what the baseline toolset or repo is. This is specified with the --base_root argument or by providing a full path with the --base argument.)
  • The default output directory is <repo_root>\bin\diffs (in this case, c:\coreclr\bin\diffs).
  • The default architecture is x64. If this isn't found, x86 is tried.
  • The default diff and baseline JIT build flavor is checked. If this isn't found, debug is tried. (Both baseline and diff must be the same flavor.)
  • By default, diffs are done using System.Private.CoreLib.dll. (That is, --corelib is the default.)
  • By default, a release build is used for --core_root and --test_root. If not available, it falls back to checked or debug (but gives a warning that release is preferred).
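
The architecture and build flavor fallbacks above can be pictured as a search over candidate directories. The sketch below is an illustration, not jit-diff's implementation: the `bin/Product/<os>.<arch>.<build>` layout is taken from the example paths in this document, while the function name `find_product_dir`, the loop nesting (architecture first, then flavor), and the capitalization of the flavor names are assumptions.

```shell
# find_product_dir ROOT OS: print the first existing
# ROOT/bin/Product/OS.ARCH.BUILD directory, trying x64 before x86 and
# Checked before Debug. Returns 1 if none exists.
find_product_dir() {
    root=$1 os=$2
    for arch in x64 x86; do
        for build in Checked Debug; do
            dir="$root/bin/Product/$os.$arch.$build"
            if [ -d "$dir" ]; then
                echo "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Example: find_product_dir ~/coreclr linux
```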

To instead do diffs over the framework assemblies (not just System.Private.CoreLib.dll), using an x86 debug build, run:

    c:\coreclr> jit-diff diff --base --diff --frameworks --arch x86 --build debug --base_root e:\coreclr2

To simplify this further, create a dotnet/coreclr repo clone that you will always use for baselines, and create a configuration file that sets a base_root default pointing at that clone. See the document configuring defaults for details.

With a --base_root default in the config.json file, you can simply run:

    c:\coreclr> jit-diff diff --base --diff

to generate diffs using all the defaults.

If you only want to generate asm from the diff compiler, omit the --base argument, e.g.:

    c:\coreclr> jit-diff diff --diff

Similarly, only pass --base to generate baselines (although in this case you also must specify --base_root so the tool can find the baseline).

The following command-line arguments are used to adjust the defaults, such as specifying x86 diffs instead of x64 diffs. They are not otherwise required.

  • --base_root
  • --diff_root
  • --arch
  • --build

Simplified jit-diff usage with PMI

The various simplified jit-diff invocations above can also be used to invoke diffs for jitted code by adding --pmi as an additional argument. For example:

Analyze difference in jit codegen for methods in corelib:

    c:\coreclr> jit-diff diff --pmi --diff --base --base_root e:\coreclr2

Or, disassemble the jitted code for all the methods in mytest.exe:

    c:\coreclr> jit-diff diff --pmi --diff --assembly mytest.exe

Note that this latter run should produce disassembly similar to running mytest.exe via corerun (with appropriate DOTNET_ flags set) for the methods that are executed during the run. But jit-diff diff --pmi will attempt to show code generated for all methods, executed or not. It also works on libraries, which are not directly executable on their own. So PMI offers a potentially faster and more comprehensive view of jit codegen.

Using tags

jit-diff takes an optional '--tag' command-line argument. This tag can be used to label different directories of *.dasm in the output directory so multiple runs can be done. This supports a scenario like the following:

  • Build base CoreCLR
  • Produce baseline diffs by invoking the tool with '--base'
  • Make changes to CoreCLR JIT subdirectory to fix a bug.
  • Produce tagged output by invoking jit-diff --diff ... --tag bugfix1
  • Make changes to CoreCLR JIT subdirectory to address review feedback/throughput issue.
  • Produce tagged output by invoking jit-diff --diff ... --tag reviewed1
  • Address more review feedback in CoreCLR JIT.
  • Produce tagged output by invoking jit-diff --diff ... --tag reviewed_final
  • ...
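
Once several tagged sets exist, each can be compared against the baseline with jit-analyze. The sketch below only prints the command lines rather than running them. It assumes each tag becomes a subdirectory of the output directory containing "base" and/or "diff" directories, as described for the default dasmset_N names; the `baseline` tag, the `/tmp/diffs` path, and the helper name `analyze_cmd` are hypothetical.

```shell
# analyze_cmd OUTPUT BASE_TAG DIFF_TAG: print (do not run) a jit-analyze
# command comparing the "base" dasm of one tagged run with the "diff" dasm
# of another.
analyze_cmd() {
    echo "jit-analyze --base $1/$2/base --diff $1/$3/diff --recursive"
}

# Compare each tagged diff from the scenario above with the baseline run.
for tag in bugfix1 reviewed1 reviewed_final; do
    analyze_cmd /tmp/diffs baseline "$tag"
done
```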

Analyzing diffs: jit-analyze

The jitutils suite includes the jit-analyze tool for analyzing diffs produced by the jit-diff/jit-dasm utilities. It is automatically run, by default, when jit-diff diff --base --diff is used.

jit-analyze parses the generated baseline and diff *.dasm files and computes the code size difference between the two, based on the output produced by the JIT. This data is keyed by file and method name. For instance, two files with different names will not be compared, even if one is passed as the base and the other as the diff, because the tool treats them as files missing from one dataset or the other.
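
Because matching is by file name, it can help to check up front which *.dasm files appear in only one of the two trees. The following is a minimal sketch using standard `find`, `sort`, and `comm`; it is not how jit-analyze itself works, and the helper names are hypothetical.

```shell
# dasm_names DIR: print the sorted relative paths of all *.dasm files in DIR.
dasm_names() {
    (cd "$1" && find . -name '*.dasm' -type f | sort)
}

# only_in_one BASE DIFF: print files present in just one tree.
# Lines in the first column exist only in BASE; tab-indented lines exist
# only in DIFF (this is the output format of "comm -3").
only_in_one() {
    a=$(mktemp) b=$(mktemp)
    dasm_names "$1" > "$a"
    dasm_names "$2" > "$b"
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Example: only_in_one ~/diffs/dasmset_12/base ~/diffs/dasmset_12/diff
```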

For the simplest case, just point the tool at a base and diff directory produced by jit-diff and it will summarize code size differences across the whole diff. This is what the jit-diff command lines in the previous section do.

On a significant set of diffs it will produce output like the following:

$ jit-analyze --base ~/Work/dotnet/output/base --diff ~/Work/dotnet/output/diff --recursive
Found files with textual diffs.

Summary:
(Note: Lower is better)

Total bytes of diff: -4124
    diff is an improvement.

Top file regressions by size (bytes):
    193 : Microsoft.CodeAnalysis.dasm
    154 : System.Dynamic.Runtime.dasm
    60 : System.IO.Compression.dasm
    43 : System.Net.Security.dasm
    43 : System.Xml.ReaderWriter.dasm

Top file improvements by size (bytes):
    -1804 : mscorlib.dasm
    -1532 : Microsoft.CodeAnalysis.CSharp.dasm
    -726 : System.Xml.XmlDocument.dasm
    -284 : System.Linq.Expressions.dasm
    -239 : System.Net.Http.dasm

21 total files with size differences.

Top method regessions by size (bytes):
    328 : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.DocumentationCommentXmlTokens:.cctor()
    266 : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.MethodTypeInferrer:Fix(int,byref):bool:this
    194 : mscorlib.dasm - System.DefaultBinder:BindToMethod(int,ref,byref,ref,ref,ref,byref):ref:this
    187 : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseModifiers(ref):this
    163 : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceAssemblySymbol:DecodeWellKnownAttribute(byref,int,bool):this

Top method improvements by size (bytes):
    -160 : System.Xml.XmlDocument.dasm - System.Xml.XmlTextWriter:AutoComplete(int):this
    -124 : System.Xml.XmlDocument.dasm - System.Xml.XmlTextWriter:WriteEndStartTag(bool):this
    -110 : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.MemberSemanticModel:GetEnclosingBinder(ref,int):ref:this
    -95 : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.CSharpDataFlowAnalysis:AnalyzeReadWrite():this
    -85 : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseForStatement():ref:this

3762 total methods with size differences.

If --tsv <file_name> or --json <file_name> is passed, all the diff data extracted and analyzed will be written out to the specified file for further analysis.
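
For quick scripting, the headline number can also be pulled out of the text summary itself. The sketch below relies only on the "Total bytes of diff:" line shown in the sample output above; if the summary format differs in your version of the tool, the pattern will need adjusting. The function names are hypothetical.

```shell
# total_diff_bytes: read jit-analyze summary text on stdin and print the
# number from the "Total bytes of diff:" line.
total_diff_bytes() {
    awk -F': ' '/^Total bytes of diff:/ { print $2 }'
}

# is_regression: read the same text on stdin; succeed when the total is
# greater than zero (lower is better, so a positive total is a regression).
is_regression() {
    total=$(total_diff_bytes)
    [ -n "$total" ] && [ "$total" -gt 0 ]
}

# Example:
#   jit-analyze --base base --diff diff --recursive | total_diff_bytes
```

For anything beyond the total, the --tsv or --json output is likely a better starting point than scraping the summary text.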

jit-analyze command line help:

    $ jit-analyze --help
    usage: jit-analyze [-b <arg>] [-d <arg>] [-r] [-c <arg>] [-w]
                       [--json <arg>] [--tsv <arg>]

        -b, --base <arg>     Base file or directory.
        -d, --diff <arg>     Diff file or directory.
        -r, --recursive      Search directories recursively.
        -c, --count <arg>    Count of files and methods (at most) to output
                             in the summary. (count) improvements and
                             (count) regressions of each will be included.
                             (default 5)
        -w, --warn           Generate warning output for files/methods that
                             only exists in one dataset or the other (only
                             in base or only in diff).
        --json <arg>         Dump analysis data to specified file in JSON
                             format.
        --tsv <arg>          Dump analysis data to specified file in
                             tab-separated format.

Configuring defaults

See the document configuring defaults for details on setting up a set of default configurations.

jit-dasm

This is a general tool to produce assembly output via prejitting for compiled MSIL assemblies.

Sample help command line:

    $ jit-dasm --help
    usage: jit-dasm [--altjit <arg>] [-c <arg>] [-j <arg>] [-o <arg>]
                [-f <arg>] [--gcinfo] [-v] [-r] [-p <arg>...] [--]
                <assembly>...

    --altjit <arg>             If set, the name of the altjit to use
                               (e.g., clrjit_win_arm64_x64.dll).
    -c, --crossgen <arg>       The crossgen or crossgen2 compiler exe.
    -j, --jit <arg>            The full path to the jit library.
    -o, --output <arg>         The output path.
    -f, --file <arg>           Name of file to take list of assemblies
                               from. Both a file and assembly list can
                               be used.
    --gcinfo                   Add GC info to the disasm output.
    -v, --verbose              Enable verbose output.
    -r, --recursive            Scan directories recursively.
    -p, --platform <arg>...    Path to platform assemblies
    <assembly>...              The list of assemblies or directories to
                               scan for assemblies.

jit-dasm-pmi

This is a general tool to produce jitted assembly output for compiled MSIL assemblies.

Sample help command line:

    $ jit-dasm-pmi --help
    usage: jit-dasm-pmi [--altjit <arg>] [-c <arg>] [-j <arg>] [-o <arg>]
                [-f <arg>] [--gcinfo] [-v] [-r] [-p <arg>...] [--]
                <assembly>...

    --altjit <arg>             If set, the name of the altjit to use
                               (e.g., protononjit.dll).
    -c, --corerun <arg>        The corerun driver exe.
    -j, --jit <arg>            The full path to the jit library.
    -o, --output <arg>         The output path.
    -f, --file <arg>           Name of file to take list of assemblies
                               from. Both a file and assembly list can
                               be used.
    --gcinfo                   Add GC info to the disasm output.
    -v, --verbose              Enable verbose output.
    -r, --recursive            Scan directories recursively.
    -p, --platform <arg>...    Path to platform assemblies
    <assembly>...              The list of assemblies or directories to
                               scan for assemblies.