Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using binary logs to significantly increase CodeQL analysis performance for C# #16346

Open
jaredpar opened this issue Apr 26, 2024 · 6 comments
Assignees
Labels
C# question Further information is requested

Comments

@jaredpar
Copy link

The nature of the C# driver means that it needs to see all csc invocations for a build. Today that is achieved by disabling shared compilation during build. That unfortunately creates a significant performance penalty when building C# code. The larger the repo the more significant the penalty. The rule of thumb is that for solutions of at least medium size this will cause somewhere from a 3-4X build slow down. For larger repositories this can be even bigger.

Consider as a concrete example the dotnet/runtime repository. When their CodeQL pipeline runs and shared compilation is disabled it increases their build time by 1 hour and 38 minutes. Or alternatively building the dotnet/roslyn repository. Building that locally results in a ~4X perf penalty when shared compilation is disabled.

I understand from previous conversations that this is done because the extractor needs to see every C# compiler call in the build and potentially process it. An alternative approach to doing this is to have the build produce a binary log. That is a non-invasive change to builds and there is ample tooling for reading C# compiler calls from those once the build completes. The code for reading compiler calls is at the core very straight forward:

using var fileStream = new FileStream(binlogPath, FileMode.Open, FileAccess.Read, FileShare.Read);
List<CompilerCall> compilerCalls = BinaryLogUtil.ReadAllCompilerCalls(fileStream);

I have a sample here of integrating this approach into the existing extractor tool. On my machine I was able to use this sample to run over a full build of roslyn without any obvious issues. And most importantly, no changes to how I built roslyn 😄

Using libraries like this have other advantages because you could also offload the work of creating CSharpCompilation objects. There is code in that library that will very faithfully recreate a CSharpCompilation for every CompilerCall instance. Furthermore, think there are other optimization opportunities for the extractor once you are running the analysis in bulk: caching MetadataReferences, caching SourceText, etc ...

More than happy to chat about this, the different ways binary logs can be used, moving analysis off machine, etc ... My big end goal is to get back to us using shared compilation so we can keep our build times down.

@jaredpar jaredpar added the question Further information is requested label Apr 26, 2024
@jaredpar jaredpar changed the title Using binary logs to significantly increase CodeQL analysis on C# code Using binary logs to significantly increase CodeQL analysis performance for C# Apr 26, 2024
@rainersigwald
Copy link

The MSBuild team is also happy to support here.

@sidshank sidshank added the C# label Apr 30, 2024
@tamasvajk tamasvajk self-assigned this May 1, 2024
@tamasvajk
Copy link
Contributor

Thank you for putting together the sample. I think your suggestion makes sense, I'm exploring it a bit further to understand what else would need to be changed.

One point that is not immediately clear to me is how to get hold of the syntax trees produced by source generators. Is there a Roslyn API that would return these trees?

@jaredpar
Copy link
Author

jaredpar commented May 1, 2024

One point that is not immediately clear to me is how to get hold of the syntax trees produced by source generators. Is there a Roslyn API that would return these trees?

The default experience is that generated files are embedded into the PDB produced by the build. There is no official API for this but it's in a well known part of the PDB. The code for reading that is here.

If the goal is to just read these files for a given compilation on demand then it's a fairly simple change to expose an API directly that lets you access them. It would be something along the lines of

List<SyntaxTree> ReadAllGeneratedFiles(CompilerCall compilerCall);

There is one caveat for that approach though: it only works if the build is using portable PDBs. Native PDBs don't have a facility for that. Using native PDbs with source generators though is decently rare. Seems reasonable that for a CodeQL run you could ask that portable PDBs be generated.

There is code in that library that will very faithfully recreate a CSharpCompilation for every CompilerCall instance.

Curious if you're asking about generated files in this context: essentially how can CSharpCompilation be faithfully generated if there are generators involved?

If so the underlying library takes a couple of different approaches:

  1. It just loads the analyzers / generators into memory and runs them exactly as the build did. In that case the generated files are produced when they're asked for.
  2. It just reads the generated files from the PDB and puts them into the correct place in the Compilation. In this mode no analyzer or generator is run.

@tamasvajk
Copy link
Contributor

Thanks for the explanation.

If the goal is to just read these files for a given compilation on demand then it's a fairly simple change to expose an API directly that lets you access them.

Yes, this is the goal. BTW, what is the reason that Roslyn doesn't expose an API on CSharpCompilation to run these source generators and access the trees?

Curious if you're asking about generated files in this context: ...

We're injecting the following properties into the build command:

  • /p:UseSharedCompilation=false
  • /p:MvcBuildViews=true
  • /p:EmitCompilerGeneratedFiles=true

We're adding MvcBuildViews=true to precompile ASP.NET MVC views. Our current solution intercepts csc calls, and as a result we also inspect these precompilation calls. I'm not sure what information we'd have in the binlog about these. (But this is probably not critical to us.) We inject EmitCompilerGeneratedFiles=true to be able to inspect source generator produced files. We're mostly interested in the files generated from .cshtml views. This is the main reason I was asking about source generators.

@jaredpar
Copy link
Author

jaredpar commented May 2, 2024

Yes, this is the goal.

Okay will add an API for that

jaredpar/complog#125

BTW, what is the reason that Roslyn doesn't expose an API on CSharpCompilation to run these source generators and access the trees?

The underlying APIs for running the generators are available. Can see how compiler logs approaches that problem here. That provides the Compilation after generators execute and from there you can see the generated SyntaxTree values. They are always added to the end of the SyntaxTree list so they can be seen by doing effectively the following:

Compilation compilation = ...;
var driver = CreateGeneratorDriver();
driver.RunGeneratorsAndUpdateCompilation(compilation, out var compilation2, out var diagnostics, cancellationToken);
var generatedSyntaxTrees = compilation2.SyntaxTrees.Skip(compilation.SyntaxTrees.Count());

That does require the host to write a bit of code though to load the analyzers, run the driver, etc ... The reason there isn't a higher level API that just takes care of all of this for you is that it's hard to find a single solution that fits all hosts. In particular analyzer loading is very complex and as a result different hosts tend to make different decisions on how to approach it. Consider that even within the Roslyn compiler analyzers are loaded with three different strategies depending on the scenario.

The reason the NuPkg I used in the sample has easy APIs for creating Compilations, running generators, etc ... is that it's geared for a specific usage: analyzing compilations. That makes it easier to have an opinionated API on how analyzers should be loaded. It also provides the option of just not running them at all and throwing the generated source directly into the compilation.

We're adding MvcBuildViews=true to precompile ASP.NET MVC views. Our current solution intercepts csc calls, and as a result we also inspect these precompilation calls

I'm not as familiar with this scenario. I attempted to recreate it with a .NET Framework ASPNet application. Setting that property I do see it go into a MvcBuildViews target but that runs the aspnet_compiler vs. csc.

image

Are you all intercepting that call? Or is there a subtly I missed where this also generates a separate csc call. There is a csc invocation but I see that with our without the -p:MvcBuildViews

We're mostly interested in the files generated from .cshtml views. This is the main reason I was asking about source generators.

Yep those are just a normal source generator at the end of the day and should behave like other generators in terms of finding the files.

@tamasvajk
Copy link
Contributor

I'm not that familiar with aspnet_compiler.exe either, but I'm under the impression that it's a thin wrapper on top of some codedom based code generator, and it uses csc to compile the generated files. I think aspnet_compiler.exe is calling bin\roslyn\csc.exe in the project directory. These are the arguments that I see for a specific call:

# Arguments to Roslyn: /shared /keepalive:10 /noconfig /fullpaths /warnaserror-
/t:library /utf8output /nostdlib+ /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\mscorlib.dll" /R:"C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Runtime\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Runtime.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Configuration.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.ServiceModel.Activation.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Xml.Linq.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\12594037\00c8d184_3aaece01\Antlr3.Runtime.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Web.Services.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Activities.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\25149f25\124d5d3d_5b7fda01\WebApplication2.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\Microsoft.CSharp.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\46885272\00b3fcae_3a18cf01\WebGrease.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Web.Extensions.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\167affe1\00a1802f_db0bd501\AspNet.ScriptManager.jQuery.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Web.Entity.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\c1742244\00a6d731_3527cf01\System.Web.Optimization.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.ComponentModel.DataAnnotations.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\App_Web_sxfooglg.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\9af866f8\008c3534_9ff8d401\Newtonsoft.Json.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\544178f8\0096b2ab_eaaace01\Microsoft.ScriptManager.MSAjax.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Data.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\App_global.asax.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\251878ba\0077a8b1_eaaace01\Microsoft.ScriptManager.WebForms.dll" /R:"C:\Windows\Microsoft.Net\assembly\GAC_MSIL\Microsoft.VisualStudio.Web.PageInspector.Loader\v4.0_1.0.0.0__b03f5f7f11d50a3a\Microsoft.VisualStudio.Web.PageInspector.Loader.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.WorkflowServices.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\6c53e78c\003be294_1a72d801\Microsoft.Web.Infrastructure.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\51b5a119\00f96fa1_2a45d401\Microsoft.CodeDom.Providers.DotNetCompilerPlatform.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.EnterpriseServices.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Drawing.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Runtime.Serialization.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Xml.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.ServiceModel.Web.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\73ffb031\00d1c97e_3d8dce01\Microsoft.AspNet.FriendlyUrls.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Web.ApplicationServices.dll" /R:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\assembly\dl3\aeb29fab\00a6d731_3527cf01\Microsoft.AspNet.Web.Optimization.WebForms.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.ServiceModel.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.IdentityModel.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Data.DataSetExtensions.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.ServiceModel.Activities.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Core.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Web.DynamicData.dll" /R:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.8\System.Web.dll" /out:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\App_Web_fip03p34.dll" /debug- /optimize+ /win32res:"C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\fip03p34.res" /warnaserror- /w:4 /nowarn:1659;1699;1701;612;618 /langversion:default /nowarn:1659;1699;1701  "C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\App_Web_fip03p34.0.cs" "C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\App_Web_fip03p34.1.cs" "C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\App_Web_fip03p34.2.cs" "C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\App_Web_fip03p34.3.cs" "C:\Users\TAMASVAJK\AppData\Local\Temp\Temporary ASP.NET Files\root\a76ce4fc\25014e45\App_Web_fip03p34.4.cs"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C# question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants