
Add support for dehydrated runtime data structures #77884

Merged
6 commits merged on Nov 18, 2022

Conversation

MichalStrehovsky
Member

@MichalStrehovsky MichalStrehovsky commented Nov 4, 2022

This adds support for dehydrating pointer-rich data structures at compile time and rehydrating them at runtime.

The NativeAOT compiler generates several pointer-heavy data structures (the worst offender being MethodTable). These data structures get emitted at compile time and used at runtime to support e.g. casting or virtual method dispatch.

We want to be able to generate structures that have real pointers in them because e.g. virtual method dispatch needs to be fast, and we don't want to be doing extra math to compute the destination (just dereference a pointer in the data structure the compiler generated and call it).

But pointers are big, and on top of each one the OS needs extra relocation metadata in the executable file (2 bytes per relocation on Windows, 24 (!!) bytes on Linux/ELF).

This adds support for "dehydrating" the data structures with pointers at compile time (representing pointers more efficiently) and "rehydrating" them at runtime.

The rehydration is quite fast - I'm seeing 2.2 GB/s throughput on my machine. Hello world rehydrates in under a millisecond.

The size savings are significant: 7+% on Windows, 30+% on Linux.

Depends on #77972 getting through (with the update from llvm-project repo).

Cc @dotnet/ilc-contrib

@VSadov
Member

VSadov commented Nov 7, 2022

So far I assumed we could have method table pointers emitted in the native code and relocations would take care of them when loaded. But how would that work with rehydration of the method tables?
For example, if I have if (o.GetType() == typeof(string)), would I have the address of string's method table baked into the native code? What happens when I new something? How does the code know where the rehydrated method table is located? Or is the rehydration done in place (i.e. we reserve the space and store enough data to later rehydrate/patch it up)? How do we guarantee the data fits (i.e. that the compression ratio is better than 1)?

I am obviously missing some important parts. I think some more explanation on how this works could be helpful.

@MichalStrehovsky
Member Author

The comments are somewhat scattered across the code, but the key part is:

                        // If the object node is getting dehydrated, emit it into a zero-initialized
                        // data section along with all its symbols.
                        // Dehydrated data will be emitted elsewhere.

So the executable now defines an extra region within the .bss section. Code points to this region. The OS zero-fills it at startup, and we "decompress" the compressed data structures into it during very early startup. At runtime there's no difference in the efficiency of accessing this area - it's still all direct pointers. The only difference is that previously a reference would point into the "const data" section, whereas now it points into "uninitialized data" that we initialized ourselves.

@VSadov
Member

VSadov commented Nov 7, 2022

Code points to this region. It is zero filled at startup by the OS and we "decompress" the compressed data structures into it during very early startup.

I see. Makes sense.
Alternatively, it could just be writable data filled with the compressed representation plus padding to reserve the correct size, but then we would need to be sure the compressed representation actually fits, and the obj file would need to store the padding.

int command, payload;

byte[] buf = new byte[5];
Debug.Assert(Encode(1, 0, buf) == 1);
Member


This looks like IfFailGo-style macro code. Why not just throw on error instead of returning error codes?

Member Author


This is under #if false - it was my debugging helper to make sure I didn't mess up the bit packing (this file can be compiled on its own with the region enabled).

I'm still debating whether to check it in, delete it, or spend more time converting it to a unit test that is never going to fail because nobody will want to touch this anyway.

Member

@sbomer sbomer left a comment


Neat! LGTM as someone not super familiar with the code. :)

Member

@LakshanF LakshanF left a comment


Thanks for walking through the code and doing this work to reduce the size!

I took a look at the Encode and Decode routines on the DehydratedDataCommand class and the testing there. The testing seems to cover the 1-byte boundary for EncodeShort well, and I couldn't think of any missing cases. I also took a look at DehydratedDataNode & StartupCodeHelpers where this is used, and while I wasn't fully able to follow the pointer manipulation in RehydrateData for the last 2 options, I assume the lookup table logic holds (and will blow up the program if it doesn't :-))

Hopefully the dependent update merge happens soon so the larger testing matrix can follow.

switch (command)
{
case DehydratedDataCommand.Copy:
// TODO: can we do any kind of memcpy here?
Member


Buffer.MemoryCopy ?

Member Author


We can't call that from Test.CoreLib. I'm also a little worried - this code runs during very early startup, when casting, typeof, newobj, and many other things are not available. Buffer.MemoryCopy feels far enough up the stack that I'm not sure it would be safe to call.

Member


Yes, this runs very early. I'm not sure whether Buffer.MemoryCopy does anything complex, but since the data here is unmanaged, it would eventually just call something like InternalCalls.memmove.

Are these copied chunks long enough to involve memmove? If they are typically just 10-20 bytes, it may not get any faster.

Member

@VSadov VSadov left a comment


LGTM! Nice!

Co-authored-by: Sven Boemer <sbomer@gmail.com>
@lambdageek
Member

/cc @ivanpovazan we should rerun our sample after this is merged and see if we get more size reduction

@eerhardt
Member

I can confirm this change makes a major size reduction on Linux. Publishing a simple ASP.NET WebAPI app with NativeAOT:

Before: 29.31 MB
After: 19.78 MB

This is a 32.5% size reduction!

MichalStrehovsky added a commit that referenced this pull request Nov 20, 2022
This is a follow-up to #77884. In the original pull request, all relocation targets went into a lookup table. This is not a very efficient way to represent rarely used relocs. In this update, I'm extending the dehydration format to allow representing relocations inline - instead of indirecting through the lookup table, the target immediately follows the instruction. I'm changing the emitter to emit this form when there are fewer than 3 references to the reloc.

This produces a ~0.5% size saving. It likely also speeds up the decoding at runtime since there's less cache thrashing. On a hello world, the lookup table originally had about 11k entries. With this change, the lookup table only has 1700 entries.

If multiple relocations follow one after another, a single command is generated with the payload specifying the number of subsequent relocations. This saves an additional 0.1%.
@MichalStrehovsky
Member Author

I can confirm this change makes a major size reduction on Linux. Publishing NativeAOT a simple ASP.NET WebAPI app:

Nice! Thank you for measuring it! #78545 should bring another maybe ~1% saving, and I have a list of a couple more savings (not quite 32.5%, but maybe another 5%).

@eerhardt
Member

Just a quick clarification - I later discovered that these numbers also included #78198, which cut ~1 MB of that app size off. So the full 32.5% decrease was with both of these changes. #77884 contributed the vast majority of it though.

@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 21, 2022

8 participants