Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single-File: Process bundles in the framework #34274

Merged
merged 18 commits into from
Apr 8, 2020

Conversation

swaroop-sridhar
Copy link
Contributor

This change implements the host changes proposed in the design

The main changes for single-file bundles are:

  • Bundle processing code is moved from apphost to hostpolicy
  • HostFxr and HostPolicy process deps.json and runtimeconfig.json files directly from the bundle.
  • HostPolicy performs verification wrt deps.json based on the contents of the single-file bundle.
  • AppContext.BaseDirectory is set as explained here

Currently, all files except deps.json and runtimeconfig.json are extracted to disk.
Once the runtime is able to processing assemblies directly from the bundle, they will no longer be extracted.

Notable details:

  • The bundle driver (formarly runner.cpp) is divided into two parts:
    • bundle::info which describes basic information about the bundle available from the headers only
    • bundle::runner which has information about all embedded files, and the ability to drive extraction
      This facilitates linking only parts of the bundle handling code with hostfxr, while all code is linked with hostpolicy.
  • The AppHost only links with bundle_marker to identify itself as a single-file bundle.
  • If the AppHost is a single-file bundle, it notifies hostfxr using the new hostfxr_main_bundle_startup_info() API
  • The HostFxr comminucates the single-file-information with HostPolicy using the host_interface_t structure.

Fixes #32821

@@ -46,7 +46,7 @@ class CursorStreamWrapper : public GenericStreamWrapper<InputStream, Encoding> {
// counting line and column number
Ch Take() {
Ch ch = this->is_.Take();
if(ch == '\n') {
if (ch == '\n') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a harmless change, but I'd prefer if rapidjson were kept as a pristine copy from upstream, as it's a 3rd-party component. (If we need to actually fix things in there, I'd prefer that we also sent patches their way instead of maintaining a fork.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK @lpereira I'll back off the change from this PR.
I've submitted a PR for rapidjson: Tencent/rapidjson#1685
We should subsequently fix it in our code too.
I usually do if( for( while( searches before submitting PRs, and it would be nice to not hit these cases every time.

Comment on lines 162 to 163
realloc_buffer(location->size);
memcpy(m_json.data(), data, location->size);
Copy link
Contributor

@lpereira lpereira Apr 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually see the idiom of using the swap() method to initialize a std::vector from an array, so I was curious to see how Clang would generate the code. It's identical. I'm impressed. :)

However, you can avoid this copy/allocation altogether here if this file is mapped with MAP_PRIVATE. In fact, this will simplify how this function works, since it won't need the #ifdef. Pseudocode:

if (is_single_file) {
    bundled_file_t bundled{path, location};   // RAII to map/unmap
    return parse_json(bundled.data(), context);
}

pal::ifstream_t file{path};
return parse_stream(file, path);

Then you don't need to split parse_json() into a parse/validate combo, and parse_json() can always work with a mutable char*. This also gets rid of duplicate, implementation-detail specific comments.

Comment on lines 160 to 161
// On Linux, copy the input data, since m_document.ParseInsitu
// requires a mutable data input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// On Linux, copy the input data, since m_document.ParseInsitu
// requires a mutable data input.
// On other platforms, copy the input data, since m_document.ParseInsitu
// requires a mutable data input.

@@ -398,13 +401,13 @@ bool runtime_config_t::ensure_parsed()
trace::verbose(_X("Did not successfully parse the runtimeconfig.dev.json"));
}

if (!pal::file_exists(m_path))
json_parser_t json;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
json_parser_t json;

{
// Not existing is not an error.
return true;
}

json_parser_t json;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

Suggested change
json_parser_t json;
json_parser_t json;

Comment on lines +100 to +103
// * Files can only be memory mapped at page-aligned offsets, and in whole page units.
// Therefore, mapping only portions of the bundle will involve align-down/round-up calculations, and associated offset adjustments.
// We choose the simpler approach of rounding to the whole file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea to comment that this might be a problem in 32-bit systems and very large applications due to exhaustion of address space.

private:
pal::string_t m_path;
const location_t *m_location;
} json_info;
Copy link
Contributor

@lpereira lpereira Apr 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json_info doesn't seem to be used from a quick glance.

{
struct config_t
{
config_t() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't initialize m_location, and config_t::matches() will dereference that pointer. When info_t::m_deps_jsonand info_t::m_runtimeconfig_json are constructed, they use this constructor here.

@@ -12,7 +12,9 @@ manifest_t manifest_t::read(reader_t& reader, int32_t num_files)

for (int32_t i = 0; i < num_files; i++)
{
manifest.files.emplace_back(file_entry_t::read(reader));
file_entry_t entry = file_entry_t::read(reader);
manifest.files.emplace_back(entry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is there a benefit for using emplace_back()here? Maybe file_entry_t should have a move constructor and std::movebe used instead?

const int8_t* info_t::config_t::map(const pal::string_t& path, const location_t* &location)
{
const bundle::info_t* app = bundle::info_t::the_app;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the_app is static and initialized to nullptr, might be a good idea to assert if it's not nullptr where it's dereferenced

pal::string_t file_base = entry->needs_extraction() ? app->extraction_path() : app->base_path();

// Reserve space for the path below
full_path.reserve(file_base.length() + relative_path.length() + 3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the 3 for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is code copied from https://github.com/dotnet/runtime/blob/master/src/installer/corehost/cli/deps_entry.cpp#L32.
Two characters are required for DIR_SEPARATOR and null-terminator. Not sure why the third spot is necessary.

Comment on lines 28 to 32
pal::string_t pal_relative_path = asset.relative_path;
if (_X('/') != DIR_SEPARATOR)
{
replace_char(&pal_relative_path, _X('/'), DIR_SEPARATOR);
}
Copy link
Contributor

@lpereira lpereira Apr 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pal::string_t pal_relative_path = asset.relative_path;
if (_X('/') != DIR_SEPARATOR)
{
replace_char(&pal_relative_path, _X('/'), DIR_SEPARATOR);
}
pal::string_t pal_relative_path = normalize_dir_separator(asset.relative_path);


if(address == nullptr)
if (address == nullptr)
Copy link
Contributor

@lpereira lpereira Apr 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (address == nullptr)
close(fd);
if (address == MAP_FAILED)


if(address == nullptr)
if (address == nullptr)
{
trace::error(_X("Failed to map file. mmap(%s) failed with error %d"), path.c_str(), errno);
close(fd);
Copy link
Contributor

@lpereira lpereira Apr 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
close(fd);

(And also remove the other close(fd)below, which GitHub doesn't allow me to comment on.)

{
if (input->app_bundle != nullptr)
{
static bundle::runner_t bundle_runner(input->app_bundle);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep reading this as Blade Runner. :)

@@ -2094,11 +2094,11 @@ class GenericValue {

const SizeType len1 = GetStringLength();
const SizeType len2 = rhs.GetStringLength();
if(len1 != len2) { return false; }
if (len1 != len2) { return false; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@vitek-karas vitek-karas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have yet to look at tests. The most important feedback is about the UX when we run 5.0 single-file on machine which doesn't have 5.0. We need to get that right.

src/installer/corehost/cli/fxr/hostfxr.cpp Outdated Show resolved Hide resolved
@@ -238,7 +238,8 @@ bool hostpolicy_resolver::try_get_dir(

// Resolve hostpolicy version out of the deps file.
pal::string_t version = resolve_hostpolicy_version_from_deps(resolved_deps);
if (trace::is_enabled() && version.empty() && pal::file_exists(resolved_deps))
if (trace::is_enabled() && version.empty() &&
(bundle::info_t::config_t::probe(resolved_deps)|| pal::file_exists(resolved_deps)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(bundle::info_t::config_t::probe(resolved_deps)|| pal::file_exists(resolved_deps)))
(bundle::info_t::config_t::probe(resolved_deps) || pal::file_exists(resolved_deps)))

src/installer/corehost/cli/fxr/hostpolicy_resolver.cpp Outdated Show resolved Hide resolved
@@ -343,6 +343,12 @@ bool deps_resolver_t::probe_deps_entry(const deps_entry_t& entry, const pal::str
else
{
// Non-rid assets, lookup in the published dir.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we only handle bundle for non-rid assets as we expect the app to be rid-specific (since it's an apphost after all). But I think this is potentially dangerous. For framework depend apps it is in theory possible to have an app which is published for linux-x64 but carries different assets for different distros. I don't know how SDK would handle that, but it feels possible. At least add this scenario to your list of things to validate.
I would probably also prefer if we actually checked the path even in the rid-specific case above and maybe failed if we can't handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vitek-karas, regarding rid-specific assets:

  • I've added the bundle-check in the rid-specific assets case too, as you've recommended.
  • However, when an app is published specific to a RID, my understanding is that the SDK will never create assets for other RIDs. The SDK will find the best compatible RID assets and copy them to the publish directory. So, there won't be a <rid>\publish\runtimes\<rid> structure.

For example, if we have a package (libuv 1.9.0 is a very good example)

runtimes
   |- debian-x64/native/lib.so
   |- rhel-x64/native/lib.so
   |- win-x64/native/lib.dll

dotnet publish will generate the above directory structure within the bin/.net5/publish directory.

However, dotnet publish -r debian-x64 --self-contained=false will generate bin/.net5/debian-x64/publish that only contains lib.so from runtime/debian-x64/native.

dotnet publish -r ubuntu-x64 --self-contained=false will generate bin/.net5/ubuntu-x64/publish that contains lib.so from runtime/debian-x64/native.

dotnet publish -r linux-x64 --self-contained=false will generate bin/.net5/linux-x64/publish without lib.so

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dotnet publish -r linux-x64 --self-contained=false will generate bin/.net5/linux-x64/publish without lib.so

This sounds like a problem/bug. I assume the app would not run in that case at all... right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there is an underlying problem, because the app may fail to run because of missing assemblies in this situation. However, I'm not sure it is an SDK bug, but rather a product issue.

The missing RID-asset (here linux-x64) in a package may actually mean that the library (lib.so) is unnecessary for that platform -- although it is weird that the lib is not necessary at a higher RID but necessary in a lower RID. This is somewhat similar to missing an assembly that is only loaded at runtime for reflection -- where the SDK cannot know the exact requirements.

@dsplaisted @wli3: Should we consider guards in the SDK to check for this case? (ex: missing assets at a parent RID)? Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swaroop-sridhar I have a feeling trying to add guards for this type of thing would be more complex and complicated than would be warranted for the benefit we would get out of it. But you can file an issue in the SDK repo if you want to discuss it further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/installer/corehost/cli/bundle/info.cpp Outdated Show resolved Hide resolved
src/installer/corehost/cli/bundle/info.cpp Outdated Show resolved Hide resolved
src/installer/corehost/corehost.cpp Outdated Show resolved Hide resolved
src/installer/corehost/corehost.cpp Outdated Show resolved Hide resolved
src/installer/corehost/corehost.cpp Outdated Show resolved Hide resolved
swaroop-sridhar and others added 9 commits April 4, 2020 00:00
This change implements the host changes proposed in the [design](https://github.com/dotnet/designs/blob/master/accepted/2020/single-file/design.md#startup)

The main changes for single-file bundles are:
* Bundle processing code is moved from apphost to hostpolicy
* HostFxr and HostPolicy process deps.json and runtimeconfig.json files directly from the bundle.
* HostPolicy performs verification wrt deps.json based on the contents of the single-file bundle.
* AppContext.BaseDirectory is set as explained [here](https://github.com/dotnet/designs/blob/master/accepted/2020/single-file/design.md#appcontextbasedirectory)

Currently, all files except deps.json and runtimeconfig.json are extracted to disk.
Once the runtime is able to processing assemblies directly from the bundle, they will no longer be extracted.

Notable details:
* The bundle driver (formarly runner.cpp) is divided into two parts:
   * bundle::info which describes basic information about the bundle available from the headers only
   * bundle::runner which has information about all embedded files, and the ability to drive extraction
  This facilitates linking only parts of the bundle handling code with hostfxr, while all code is linked with hostpolicy.
* The AppHost only links with bundle_marker to identify itself as a single-file bundle.
* If the AppHost is a single-file bundle, it notifies hostfxr using the new hostfxr_main_bundle_startup_info() API
* The HostFxr comminucates the single-file-information with HostPolicy using the host_interface_t structure.

Fixes dotnet#32821
The rename test would hang forever if AppWithWait() fails before creating the resume file.
So, while waiting, check that the app-process is still running.
* Fix HostFxr startup function naming in CoreHost
* deps_resolver: Lookup up bundle even for rid-specific assets within the app
* Ensure CUI/GUI error output when expected framework is not found for single file apps.
* Other minor fixes
Copy link
Member

@vitek-karas vitek-karas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I 'm struggling to understand what is the hostpolicy/coreclr interface. In the design doc https://github.com/dotnet/designs/blob/master/accepted/2020/single-file/design.md#dependency-resolution it says:

The assemblies bundled within the single-file are not enumerated in TRUSTED_PLATFORM_ASSEMBLIES. The absolute-path of assemblies on disk are listed in TRUSTED_PLATFORM_ASSEMBLIES as usual.

But if I read the code correctly it will populate TPA with full paths even for bundled assemblies - relative to the app's base.

Similar comment on resource paths.

src/installer/corehost/cli/deps_entry.cpp Outdated Show resolved Hide resolved
src/installer/corehost/cli/deps_entry.cpp Outdated Show resolved Hide resolved
src/installer/corehost/cli/bundle/runner.cpp Show resolved Hide resolved
src/installer/corehost/cli/deps_entry.cpp Show resolved Hide resolved
@swaroop-sridhar
Copy link
Contributor Author

I think I 'm struggling to understand what is the hostpolicy/coreclr interface. In the design doc https://github.com/dotnet/designs/blob/master/accepted/2020/single-file/design.md#dependency-resolution it says:

The assemblies bundled within the single-file are not enumerated in TRUSTED_PLATFORM_ASSEMBLIES. The absolute-path of assemblies on disk are listed in TRUSTED_PLATFORM_ASSEMBLIES as usual.

But if I read the code correctly it will populate TPA with full paths even for bundled assemblies - relative to the app's base.

Similar comment on resource paths.

There is a small nuance to this:

  • The assmeblies bundled into the single-file, that are not extracted to disk are not entered to the TPA (since there is no associated path "real" path). The runtime will know to probe for these assemblies and then load them directly from bundle.
  • The assemblies on disk (whether extracted to disk or left on disk) will remain on the TPA as usual.
  • We do not anticipate having to extract assemblies to disk once the runtime is able to process assemblies from bundle. So, the design doesn't explicitly mention this.
  • We could (depending on feedback) extract assemblies to disk in netcoreapp3.0 compatibility mode.

Also update a few comments to clarify that assemblies are currently extracted to disk.
@@ -58,6 +59,7 @@ struct host_interface_t
const pal::char_t* host_info_host_path;
const pal::char_t* host_info_dotnet_root;
const pal::char_t* host_info_app_path;
const bundle::info_t *app_bundle;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bundle::info_t does not look to be plain data, so using it cross-binary like this where hostpolicy reads the class allocated by hostfxr could lead to compat issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @elinor-fung . I've fixed this in 12b63ba

@@ -12,6 +12,7 @@
#include <utils.h>

#include "json_parser.h"
#include "bundle/info.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed?

Copy link
Member

@vitek-karas vitek-karas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - the one pending question I have is the RID specific asset handling.

Host::info* is not POD, so pass the bundle-header offset instead.
The deps-json and runtime-config json locations are recomputed in hostpolicy.
Since hostpolicy needs to re-map the bundle, recomputing these locations is not expensive than passing them through the interface.
This also keeps the host_interface cleaner.
@swaroop-sridhar
Copy link
Contributor Author

swaroop-sridhar commented Apr 8, 2020

Looks good

Thanks a lot for the detailed feedback @vitek-karas
I have a few minor fixes in the last couple of commits. It'd be great if you can take a look at these too.

the one pending question I have is the RID specific asset handling.

I've now answered this in #34274 (comment)

Copy link
Member

@vitek-karas vitek-karas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Please followup on the issue with RID-specific assets. I do agree that vast majority of the apps won't have the problem. If SDK will never add a RID-specific asset to a RID-specific build, then this is not an issue for host/runtime (yet), but it seems to be an SDK issue.

@swaroop-sridhar swaroop-sridhar merged commit 78b303d into dotnet:master Apr 8, 2020
swaroop-sridhar added a commit to swaroop-sridhar/sdk that referenced this pull request Apr 30, 2020
…sing extraction

Customer Scenario
Self-contained Apps published as a single-file fail at run-time.

Problem

Single-file bundles are now processed in the framework rather than the apphost (dotnet/runtime#34274).
This means that hostpolicy and hostfxr DLLs are excluded from being bundled themselves.
In the case of self-contained single-file apps, these files need to be separate files until static-apphost is available.
This needs to be ensured by the SDK; otherwise app execution will fail.

Solution

This change fixes the problem by adapting the SDK to:
* Pass the correct target-OS settings (otherwise cross-targetting builds will not work correctly)
* Copy files excluded by the Bundler to the publish directory.

The stage-0 netcoreapp is updated to preview4, because preview2 apphost is not compatible with preview4 bundler.

Risk
Low
swaroop-sridhar added a commit to swaroop-sridhar/runtime that referenced this pull request Apr 30, 2020
This commit reverts:
Revert "Single-File: Process bundles in the framework (dotnet#34274)"
This reverts commit 78b303d.

Revert "Single-File Bundler: Add a FileSize test (dotnet#35149)"
This reverts commit 779588a.
swaroop-sridhar added a commit that referenced this pull request May 1, 2020
This commit reverts:
Revert "Single-File: Process bundles in the framework (#34274)"
This reverts commit 78b303d.

Revert "Single-File Bundler: Add a FileSize test (#35149)"
This reverts commit 779588a.

*Customer Scenario*

Publishing apps as a self-contained single-file doesn't work as expected.

* Publish needs to generate hostpolicy and hostfxr separate from the single file bundle
* Cross-platform publishing is incorrect

*Problem*

Since Static-apphost is not yet ready, processing bundle content in hostpolicy means that  hostpolicy and hostfxr DLLs need to be separate from the bundle. This causes self-contained single-file apps to not be a "single file" temporarily. 

The change also requires supporting changes from the SDK, to publish hostfxr and hostpolicy as separate files, and to invoke HostModel library with arguments that facilitate cross-platform publishing.

*Solution*

To solve these, problem, this change reverts:

Revert "Single-File: Process bundles in the framework (#34274)" commit 78b303d.

and a dependent test-only change:

Revert "Single-File Bundler: Add a FileSize test (#35149)" commit 779588a.

*Risk*

Medium
The change is contained to only host components: apphost, hostpolicy, and hostfxr.
However, the change is big, and needs testing in runtime and SDK repos.

*Testing*

Manually tested the SDK by inserting apphost, hostfxr, hostpolicy, and hostmodel library from this build into the `dotnet/packs` preview-4 SDK from dotnet/sdk#11518 build.

Verified that:
* Singlefile apps can be published and run OK for  { Windows, Linux, Osx } x {netcoreapp3.0, netcoreapp3.1, netcoreapp5.0} 
* Cross-targeting builds of single-file apps build and run OK (ex: built on Windos, run on Mac).
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Single-File: Move bundle-processing code into the framework
6 participants