Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Singlefile: enabling compression for managed assemblies. #50817

Merged
merged 9 commits into from
Apr 15, 2021

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Apr 6, 2021

The second part of singlefile compression change which enables compression/decompression of managed assemblies.

=== Implementation details:

When creating a PE layout for a compressed file, we start with the same steps as before - with memory mapping the region of the singlefile app that contains the file we need. No changes here.

For obvious reasons we cannot use the direct mapping as-is since that contains compressed data. So we create another intermediate anonymous (memory-only) mapping of appropriate size and decompress the file into the mapping.
After that we can release the original mapping/view that were created for the compressed data.

The PE layout with the anonymous mapping will manage the life time of the decompressed data (same as before) and can be used when flat image is sufficient, or can be further transformed into "mapped" executable form if the file contains native executable sections (such as R2R).

Such conversion was already performed on Windows, while Unix could always get away with virtually mapping to the original data with appropriate alignment and access. Now the conversion will be used on Unix as well (when the original data is compressed).

@ghost
Copy link

ghost commented Apr 6, 2021

Tagging subscribers to this area: @agocke, @vitek-karas, @VSadov
See info in area-owners.md if you want to be subscribed.

Issue Details

In progress.

Author: VSadov
Assignees: -
Labels:

area-Single-File

Milestone: -

@VSadov VSadov changed the title [WIP] Singlefile: enabling compression for managed assemblies. Singlefile: enabling compression for managed assemblies. Apr 9, 2021
@VSadov VSadov marked this pull request as ready for review April 9, 2021 23:45
@VSadov
Copy link
Member Author

VSadov commented Apr 9, 2021

I think this PR is ready to be reviewed.

Copy link
Member

@vitek-karas vitek-karas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume most of the test coverage is provided by the existing tests, right?

@@ -126,7 +126,7 @@ CORECLR_HOSTING_API(coreclr_execute_assembly,
//
// Callback types used by the hosts
//
typedef bool(CORECLR_CALLING_CONVENTION BundleProbeFn)(const char* path, int64_t* offset, int64_t* size);
typedef bool(CORECLR_CALLING_CONVENTION BundleProbeFn)(const char* path, int64_t* offset, int64_t* sizes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we pass two separate parameters?

Copy link
Member Author

@VSadov VSadov Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about framework-dependent scenarios where we may be calling the 5.0 version of the probe. Having the same signature allows the old clients to be forwards-compatible with new runtimes.
I.E. 5.0 probe will use only the first element of the array, which contains the file size, which is compatible with the new {size, uncompressedSize}.

Passing an extra parameter "may work", depending on calling conventions. Passing an array does not need to rely on that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW - if we may need another piece of information from the probe in the future, we can continue adding optional array elements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But framework dependent app should use hostpolicy.dll which is next to the runtime - we should not be statically linking hostpolicy anywhere but superhost. So far we've made the assumption that hostpolicy/runtime interface is not "public" and can be change at any time without versioning considerations, because we always ship hostpolicy and runtime together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I think I saw adding an extra parameter causing failures in framework-dependent 5.0 tests on OSX.
Although we also had uninitialized data problem as well and it could be just that. Let me think about this.

I thought we could have a case where 6.0 clr calls the probe and there is 5.0 hostpolicy on the other end. Perhaps this can't happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are correct. The probe is always in sync with runtime. hostpolicy is either bundled together with runtime in superhost or picked up next to the runtime. Either way it will match the version/build.

That allows just using another parameter, so I have reverted the change that packs sizes in an array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reverted the change that packs sizes into an array. The failures in framework-dependent 5.0 tests on OSX were indeed caused by uninitialized data (although they looked like a probe mismatch)

if (sizes[1] != 0)
{
loc.Size = sizes[1];
loc.UncompresedSize = sizes[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just occurred to me - technically there's no reason to store the uncompressed size in the bundle headers. We would really need just a boolean. I assume it makes the code more efficient to know how much to allocate upfront to uncompress.

But what if the size in the header doesn't match the actual uncompressed size - we should have a test for that at the very least.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing uncompressed allows to set up the storage for uncompressed data precisely, so it is slightly more convenient than a bool.

Copy link
Member Author

@VSadov VSadov Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have corrupted data we will not overrun the buffer, since we provide the size limit to the zlib.
It does make sense to add a check though and failfast if uncompressed size is less than expected, or compressed data was not all consumed. Good point!

@@ -502,6 +519,7 @@ MappedImageLayout::MappedImageLayout(PEImage* pOwner)

HANDLE hFile = pOwner->GetFileHandle();
INT64 offset = pOwner->GetOffset();
_ASSERTE(!pOwner->GetUncompressedSize());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Personally I would prefer to have a full == 0 in this case. I hate the C++ tricks of treating numericals as booleans - it confuses me when reading the code.


#else
_ASSERTE(!"Failure extracting contents of the application bundle. Compressed files used with a standalone (not singlefile) apphost.");
ThrowLastError();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this actually fail correctly - ThrowLastError presumably throws the last Win32 error code, but in this case there was no OS call which failed before this... so the last error code is effectively random value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a failfast for "impossible" situation. Something is badly corrupted and we cannot recover, if we hit this.
I will check what we use in such cases.

@@ -126,7 +126,7 @@ CORECLR_HOSTING_API(coreclr_execute_assembly,
//
// Callback types used by the hosts
//
typedef bool(CORECLR_CALLING_CONVENTION BundleProbeFn)(const char* path, int64_t* offset, int64_t* size);
typedef bool(CORECLR_CALLING_CONVENTION BundleProbeFn)(const char* path, int64_t* offset, int64_t* sizes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Windows not allow unicode characters in the path? Or is this really a byte stream?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly - all runtime/host interactions use UTF8 strings.

@@ -13,16 +13,26 @@ public static class Program
// The bundle-probe callback is only called from native code in the product
// Therefore the type on this test is adjusted to circumvent the failure.
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
public delegate byte BundleProbeDelegate([MarshalAs(UnmanagedType.LPUTF8Str)] string path, IntPtr size, IntPtr offset);
public unsafe delegate byte BundleProbeDelegate([MarshalAs(UnmanagedType.LPUTF8Str)] string path, IntPtr offset, long* sizes);
Copy link
Member

@agocke agocke Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use IntPtr for offset here, but the function pointer at the top is uint64_t. Won't that be the wrong size on x86?

Never mind, forgot the pointer part.

src/coreclr/utilcode/pedecoder.cpp Show resolved Hide resolved
@@ -891,6 +895,11 @@ endif(CLR_CMAKE_TARGET_WIN32)
set (VM_SOURCES_WKS_SPECIAL
codeman.cpp
ceemain.cpp
peimagelayout.cpp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why couldn't it stay in the common list? I have originally structured the lists so that no file needed to be duplicated in multiple lists.

Copy link
Member Author

@VSadov VSadov Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common list is compiled by the build only once. That includes almost every file. And we reuse objects for singlefile and regular coreclr.

Very few files need to know about singlefile and have parts conditionally compiled for CORECLR_EMBEDDED. peimagelayout.cpp cannot and does not need to support decompression in regular coreclr mode, so it needs to use #ifdef CORECLR_EMBEDDED

@@ -887,7 +887,9 @@ void PEImage::EnumMemoryRegions(CLRDataEnumMemoryFlags flags)


PEImage::PEImage():
m_path(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these additions really needed? Isn't the parameter-less constructor used by default?

Copy link
Member Author

@VSadov VSadov Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if you do not have a constructor, compiler may generate one. Once you have a constructor, compiler just calls it.

new PEImage() would leave these fields uninitialized, but typically we were lucky except for OSX with optimizations (i.e. does not fail in Debug, but started failing in Checked/Release)

#if defined(CORECLR_EMBEDDED)
extern "C"
{
#include "../../../libraries/Native/AnyOS/zlib/pal_zlib.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like long relative paths in general, they are a burden when structure of the source tree changes. It seems it would be nicer to add the ../../../libraries/Native/AnyOS/zlib to the include paths in cmake scripts and have just

#include "pal_zlib.h"

here. It would be also nice to define the libraries path at one place in the cmake files and use it as a basis for this and the path in src/coreclr/vm/CMakeLists.txt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was another long pall_zlib.h include in apphost/static. I have changed that too.

@agocke
Copy link
Member

agocke commented Apr 13, 2021

Any perf numbers for this change? Size changes? Startup perf changes? Working set?

@VSadov
Copy link
Member Author

VSadov commented Apr 13, 2021

@agocke - no measurements yet. So far I was more concerned about making it work correctly. The size will definitely be smaller, but how much of a startup penalty it will require is yet to be seen.
It would be a lot easier to measure a range of scenarios once this change meets its SDK counterpart (dotnet/sdk#16838)

@VSadov
Copy link
Member Author

VSadov commented Apr 13, 2021

@vitek-karas @janvorli - I think I have addressed all the concerns. Can you take another look?

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@@ -94,15 +101,21 @@ PEImageLayout* PEImageLayout::Map(PEImage* pOwner)
}
CONTRACT_END;

PEImageLayoutHolder pAlloc(new MappedImageLayout(pOwner));
PEImageLayoutHolder pAlloc = pOwner->GetUncompressedSize() ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar as the other place - I would prefer pOwner->GetUncompressedSize() != 0 - using numbers as booleans is confusing.

src/coreclr/vm/peimagelayout.cpp Outdated Show resolved Hide resolved
Co-authored-by: Vitek Karas <vitek.karas@microsoft.com>
@ghost
Copy link

ghost commented Apr 14, 2021

Hello @VSadov!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@VSadov VSadov merged commit 7114248 into dotnet:main Apr 15, 2021
@VSadov VSadov deleted the compression02 branch April 15, 2021 02:00
@ghost ghost locked as resolved and limited conversation to collaborators May 15, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants