Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iOS/Catalyst build performance regression from net8.0 to net9.0 #110406

Open
jeremy-visionaid opened this issue Dec 2, 2024 · 19 comments
Open

iOS/Catalyst build performance regression from net8.0 to net9.0 #110406

jeremy-visionaid opened this issue Dec 2, 2024 · 19 comments
Assignees
Milestone

Comments

@jeremy-visionaid
Copy link

jeremy-visionaid commented Dec 2, 2024

Description

There's quite a noticeable decrease in performance building iOS and Catalyst apps:

dotnet build -bl:../net8.0-ios.binlog
Restore complete (0.5s)
  MauiTestApp net8.0-ios succeeded (12.6s) → bin/Debug/net8.0-ios/iossimulator-arm64/MauiTestApp.dll

Build succeeded in 13.3s
dotnet build -bl:../net9.0-ios.binlog
Restore complete (0.4s)
  MauiTestApp net9.0-ios succeeded (17.4s) → bin/Debug/net9.0-ios/iossimulator-arm64/MauiTestApp.dll

Build succeeded in 18.1s

The performance regresson is also present for maccatalyst

Steps to Reproduce

The problem is reproducible with a MAUI template project, changing the TFM between net8.0-ios and net9.0-ios

Link to public reproduction project repository

No response

Version with bug

9.0.10 SR1

Is this a regression from previous behavior?

Yes, this used to work in .NET MAUI

Last version that worked well

8.0.100 SR10

Affected platforms

iOS, macOS

Affected platform versions

No response

Did you find any workaround?

Roll back to dotnet 8

Relevant log output

net8.0-ios.binlog.txt
net9.0-ios.binlog.txt

@jeremy-visionaid
Copy link
Author

Also, the GitHub form won't allow uploading files with the extension ".binlog" which is less than ideal

@Ying-6
Copy link

Ying-6 commented Dec 2, 2024

This issue has been verified using latest version VSCode (9.0.10 & 9.0.0 & 8.0.100). Can unstably repro this issue on iOS and MacOS platforms. I verified it 5 times and repro once.

Unstable repro information:

aspnet@Yings-Mac-mini MauiApp1 % dotnet build -bl:../net8.0-maccatalyst.binlog
Restore complete (0.6s)
  MauiApp1 net8.0-maccatalyst succeeded (10.4s) → bin/Debug/net8.0-maccatalyst/maccatalyst-x64/MauiApp1.dll

Build succeeded in 11.5s
aspnet@Yings-Mac-mini MauiApp1 % dotnet build -bl:../net9.0-maccatalyst.binlog
Restore complete (0.6s)
  MauiApp1 net9.0-maccatalyst succeeded (11.6s) → bin/Debug/net9.0-maccatalyst/maccatalyst-x64/MauiApp1.dll

Build succeeded in 12.8s

@jeremy-visionaid
Copy link
Author

@Ying-6 Thanks for checking the repro. On my system (Mac Mini M2 Pro), the repro rate is 100%. My figures were for a clean rebuild (git clean -dfx), but the incremental build performance also seems pretty bad. Nearly 20 seconds just to build what's essentially a Hello World app on a relatively fast machine. It wasn't exactly great with net8.0, but net9.0 really negatively affects the build, run, test cycle.

@Ying-6
Copy link

Ying-6 commented Dec 3, 2024

I verified this issue again on an M1 Mac Mini, changed the framework from net8.0-ios to net9.0-ios, verified it 5 times and the reproduction rate was 100%.
Note: The first verification result was on Intel Mac Mini.

admin@hotia-surface MauiApp1 % dotnet build -bl:../net8.0-ios.binlog     
Restore complete (0.3s)
  MauiApp1 net8.0-ios succeeded (7.1s) → bin/Debug/net8.0-ios/iossimulator-arm64/MauiApp1.dll

Build succeeded in 7.7s
admin@hotia-surface MauiApp1 % dotnet build -bl:../net9.0-ios.binlog     
Restore complete (0.4s)
  MauiApp1 net9.0-ios succeeded (16.4s) → bin/Debug/net9.0-ios/iossimulator-arm64/MauiApp1.dll

Build succeeded in 17.0s

@jeremy-visionaid
Copy link
Author

Oh wow. Other members of my team use M1s, no wonder there have been some complaints! Though I'm also curious how/why your M1 builds faster than my M2 Pro... 7 or 8 seconds would be OK, but 18 is really pushing the bounds of acceptability.

@maonaoda
Copy link

maonaoda commented Dec 3, 2024

M2 ...

SkiaSharpBug % dotnet build -bl:../net9.0-ios.binlog
Restore complete (0.6s)
  SkiaSharpSample net9.0-ios succeeded (24.4s) → bin/Debug/net9.0-ios/iossimulator-arm64/SkiaSharpSample.dll

Build succeeded in 25.3s

SkiaSharpBug % dotnet build -bl:../net8.0-ios.binlog
Restore complete (0.3s)
  SkiaSharpSample net8.0-ios succeeded (16.8s) → bin/Debug/net8.0-ios/iossimulator-arm64/SkiaSharpSample.dll

Build succeeded in 17.3s

@tj-devel709
Copy link
Member

@jonathanpeppers do you have any thoughts on this one?

@jonathanpeppers
Copy link
Member

Maybe @rolfbjarne can comment, the difference appears to be the top 2 expensive tasks. Reviewing the .binlog above:

.NET 8:

Image

.NET 9:

Image

The <CompileNativeCode/> task just looks like it shell's out to xcrun clang, so probably not much control on how long that takes.

<AOTCompile/> calls mono-aot-cross, so this is from the dotnet/runtime repo.

@rolfbjarne
Copy link
Member

I've looked a bit into this, and this is what I've found so far:

  1. With the interpreter enabled, the compilation time is much slower because it takes much longer to compile the AOT-compiled aot-instances code. I'm guessing this is because there's twice as much native code:
$ ls -lah /Users/rolf/test/dotnet/aotperf/TestApp/obj/Debug/net*.0-ios/iossimulator-arm64/nativelibraries/aot-output/arm64/aot-instances.dll.s
.rw-r--r--@  88,225,773 rolf 2024-12-04 18:01:48.832720614 +0100 obj/Debug/net8.0-ios/iossimulator-arm64/nativelibraries/aot-output/arm64/aot-instances.dll.s
.rw-r--r--@ 151,735,792 rolf 2024-12-04 18:02:08.333604974 +0100 obj/Debug/net9.0-ios/iossimulator-arm64/nativelibraries/aot-output/arm64/aot-instances.dll.s
  • .NET 8: 88 mb
  • .NET 9: 151 mb

The AOT compilation is also slower, but that shows up better in the next finding:

  1. With the interpreter disabled (/p:UseInterpreter=false), we AOT much more code, and since the AOT compiler is also much slower, it has a much bigger effect:

Timings for only the AOT compilation:

  • .NET 8: 33s
  • .NET 9: 1m23s

The resulting app bundle is also a bit bigger, but nothing that would explain the almost 3x difference in AOT compilation:

$ du -hs bin/Debug/net*.0-ios/iossimulator-arm64/TestApp-net*.0.app
397M	bin/Debug/net8.0-ios/iossimulator-arm64/TestApp-net8.0.app
420M	bin/Debug/net9.0-ios/iossimulator-arm64/TestApp-net9.0.app

So it seems this is a runtime issue, so moving there.

@rolfbjarne rolfbjarne transferred this issue from dotnet/maui Dec 4, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Dec 4, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Dec 4, 2024
@rolfbjarne
Copy link
Member

This is my test project: TestApp-94f8396.zip

Run prepare.sh to build the project a few times for net8.0+net9.0 and with+without interpreter.

Then run reproduce.sh to show timings.

I get this:

$ ./prepare.sh
[...]
$ ./reproduce.sh
Compiling (with clang) the aot-instances.dll.s file generated by the AOT compiler for an interpreted build (.NET 8)

real	0m3.406s
user	0m2.976s
sys	0m0.268s

Compiling (with clang) the aot-instances.dll.s file generated by the AOT compiler for an interpreted build (.NET 9)

real	0m5.739s
user	0m5.104s
sys	0m0.335s

AOT-compiling all assemblies for a non-interpreted build (.NET 8)
Unable to compile method 'void System.Runtime.Serialization.SerializationGuard:<ThrowIfDeserializationInProgress>g__ThrowIfDeserializationInProgress|0_0 (System.Runtime.Serialization.SerializationInfo,string,int&)' due to: 'Method has zero rva'.
Unable to compile method 'void System.Runtime.Serialization.SerializationGuard:<ThrowIfDeserializationInProgress>g__ThrowIfDeserializationInProgress|0_0 (System.Runtime.Serialization.SerializationInfo,string,int&)' due to: 'Method has zero rva'.

real	0m33.807s
user	0m30.247s
sys	0m1.816s

AOT-compiling all assemblies for a non-interpreted build (.NET 9)

real	1m23.448s
user	1m19.085s
sys	0m2.176s

@rolfbjarne
Copy link
Member

I just tested passing compile-in-child to the aot compiler, and saw no significant difference in timing. Updated reproduce.sh script: https://gist.github.com/rolfbjarne/665a9d4e7058a6a6dbbfaea71a71f09a

AOT-compiling all assemblies for a non-interpreted build (.NET 9) options: compile-in-child,temp-path=tmppath,

real	1m23.375s
user	1m18.554s
sys	0m3.079s

AOT-compiling all assemblies for a non-interpreted build (.NET 9) options:

real	1m22.578s
user	1m18.602s
sys	0m2.126s

@steveisok steveisok added area-Codegen-AOT-mono os-ios Apple iOS and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Dec 5, 2024
Copy link
Contributor

Tagging subscribers to 'os-ios': @vitek-karas, @kotlarmilos, @ivanpovazan, @steveisok, @akoeplinger
See info in area-owners.md if you want to be subscribed.

@steveisok
Copy link
Member

Fwiw, I got similar times on my m2 w/ what @rolfbjarne provided.

@maonaoda
Copy link

maonaoda commented Dec 6, 2024

After adding this, the time was slightly reduced

		<!-- https://github.com/dotnet/sdk/issues/44932 -->
		<CompressionEnabled>false</CompressionEnabled>
SkiaSharpBug % dotnet build -bl:../net9.0-ios.binlog
Restore complete (0.5s)
  SkiaSharpSample net9.0-ios succeeded (20.1s) → bin/Debug/net9.0-ios/iossimulator-arm64/SkiaSharpSample.dll

Build succeeded in 20.8s

SkiaSharpBug % dotnet build -bl:../net8.0-ios.binlog
Restore complete (0.3s)
  SkiaSharpSample net8.0-ios succeeded (15.0s) → bin/Debug/net8.0-ios/iossimulator-arm64/SkiaSharpSample.dll

Build succeeded in 15.5s

@jeremy-visionaid
Copy link
Author

@maonaoda Thanks for the suggestion, but I assume that only affects Blazor/ASP projects, so wouldn't affect the issue reported here? I don't find it makes any measurable difference.

@miko1258
Copy link

miko1258 commented Dec 12, 2024

Struggling with the same issue on quite big project. Can't share binlog, however I will paste suspicious fragments from it:

.net8 MtouchLink=SdkOnly, Target _AOTCompile took 7:27:062

Mono Ahead of Time compiler - compiling assembly /Users/runner/work/1/s/.../obj/Release/net8.0-ios/ios-arm64/linked/aot-instances.dll
Adding 189341 dedup-ed methods.
Compiled: 204048/204067
Output file: 'obj/Release/net8.0-ios/ios-arm64/nativelibraries/aot-output/arm64/aot-instances.dll.s'.
Linking symbol: '_mono_aot_module_aot_instances_info'.
JIT time: 95567 ms, Generation time: 135788 ms, Assembly+Link time: 5 ms.

.net9 MtouchLink=SdkOnly, Target _AOTCompile took 49:06.155

Mono Ahead of Time compiler - compiling assembly /Users/runner/work/1/s/.../obj/Release/net9.0-ios/ios-arm64/linked/aot-instances.dll
AOTID D40720D9-E52C-B309-F30B-2764A328EE50
Adding 199729 dedup-ed methods.
Compiled: 217410/217439
Output file: 'obj/Release/net9.0-ios/ios-arm64/nativelibraries/aot-output/arm64/aot-instances.dll.s'.
Linking symbol: '_mono_aot_module_aot_instances_info'.
JIT time: 304743 ms, Generation time: 23051 ms, Assembly+Link time: 0 ms.

.net9 MtouchLink=Full, Target _AOTCompile took 1:48:586

Mono Ahead of Time compiler - compiling assembly /Users/runner/work/1/s/.../obj/Release/net9.0-ios/ios-arm64/linked/aot-instances.dll
AOTID B743CE1F-39E6-5F34-4B4F-D41CA695FBBC
Adding 36107 dedup-ed methods.
Compiled: 31572/31594
Output file: 'obj/Release/net9.0-ios/ios-arm64/nativelibraries/aot-output/arm64/aot-instances.dll.s'.
Linking symbol: '_mono_aot_module_aot_instances_info'.
JIT time: 13739 ms, Generation time: 4360 ms, Assembly+Link time: 0 ms.

Build time bloated a lot. Unfortunately the app is not ready to be fully linked, SdkOnly is the preferable option for us.
Interpreter is turned off.

@kotlarmilos kotlarmilos self-assigned this Dec 17, 2024
@kotlarmilos kotlarmilos removed the untriaged New issue has not been triaged by the area owner label Dec 17, 2024
@kotlarmilos kotlarmilos added this to the 10.0.0 milestone Dec 17, 2024
@kotlarmilos
Copy link
Member

According to the generated symbols, these generics bloated in .NET 9 within the repro app due to the specialization of types: diff-aot-instances-symbols.txt. The most impactful ones are System.Runtime.Intrinsics.ISimdVector<T,T> and System.Collections.Generic.SegmentedArrayBuilder<T>.

Disabling dedup optimization (_IsDedupEnabled) might help, as AOT compilation could be parallelized across multiple assemblies, but this would result in size increase.

With the interpreter enabled, the compilation time is much slower because it takes much longer to compile the AOT-compiled aot-instances code. I'm guessing this is because there's twice as much native code:

I suggest enabling dedup optimization in fullAOT mode only: xamarin/xamarin-macios#20687

.net8 MtouchLink=SdkOnly, Target _AOTCompile took 7:27:062
JIT time: 95567 ms, Generation time: 135788 ms, Assembly+Link time: 5 ms.

.net9 MtouchLink=SdkOnly, Target _AOTCompile took 49:06.155
JIT time: 304743 ms, Generation time: 23051 ms, Assembly+Link time: 0 ms.

It is interesting how JIT time increased while generation time decreased.

@rolfbjarne
Copy link
Member

I suggest enabling dedup optimization in fullAOT mode only: xamarin/xamarin-macios#20687

That got changed here:

xamarin/xamarin-macios#20936

with a follow up fix here:

xamarin/xamarin-macios#20945

and a final fix here:

xamarin/xamarin-macios#20953

so I'm a bit unsure as to exactly when it's safe/correct to enable/disable dedup at this point.

That said, I have a tentative PR up to disable dedup if interpreting every assembly: xamarin/xamarin-macios#21661.

@jeremy-visionaid
Copy link
Author

@rolfbjarne That's awesome, thanks. Maybe the conditions can be improved/tailored, but if it's useful to anyone, I find the following gives a pretty decent productivity bump as a workaround until the above patch gets released.

<_IsDedupEnabled Condition="'$(Configuration)' == 'Debug'">false</_IsDedupEnabled>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants