Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add switch: Do not resolve URLs by defaults for XML #84169

Merged
merged 6 commits into from
Jul 18, 2023

Conversation

krwq
Copy link
Member

@krwq krwq commented Mar 31, 2023

Fixes: #83495

The approach is adding new resolver which resolves URIs with file scheme (accessible with XmlResolver.FileSystemResolver) and switch which changes so that we use it rather than XmlUrlResolver. In most situation we never do any resolving for XMLs but XmlUrlResolver is still used to resolve URIs for root level APIs, the switch changes so that we use file resolver instead and never access the network.

This:

  • adds XmlFileSystemResolver (accessible with XmlResolver.FileSystemResolver)
  • adds a switch (System.Xml.XmlResolver.IsNetworkingEnabledByDefault) to use it

Creating PR with initial version since I wanted to collect feedback on this approach

Open questions:

  • current design explicitly does not prefix switch name with "Switch." but other XML switches do, other projects do not use that prefix, which convention should be followed? Are old switches mostly relevant to netfx?

TODOs:

  • API review
  • tests

Size impact:

MStat numbers:

  • Types Total Size 883,620 => 502,900
  • Methods Total Size 3,628,957 => 1,835,091
  • Blobs Total Size 2,056,593 => 1,284,014

App for measurement (with switch set to false):

using System.Xml.Linq;

XDocument doc = XDocument.Load("file.xml");

string value = doc.Descendants("some-element").Single().Value;
if (value == "test-value")
{
    Console.WriteLine("SUCCESS");
    return 0;
}

return 1;

file.xml:

<?xml version="1.0" encoding="utf-8" ?>
<root>
  <some-element>test-value</some-element>
</root>

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost ghost assigned krwq Mar 31, 2023
@ghost
Copy link

ghost commented Mar 31, 2023

Tagging subscribers to this area: @dotnet/area-system-xml
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes: #83495

This:

  • adds XmlFileSystemResolver
  • adds a switch to use it

Creating PR with initial version since I wanted to collect feedback on this approach

TODOs:

  • API review
  • tests

Size impact:

MStat numbers:

  • Types Total Size 883,620 => 502,900
  • Methods Total Size 3,628,957 => 1,835,091
  • Blobs Total Size 2,056,593 => 1,284,014
Author: krwq
Assignees: -
Labels:

area-System.Xml, new-api-needs-documentation

Milestone: -

@MichalStrehovsky
Copy link
Member

I got intrigued by the LocalAppContextSwitches.AllowDefaultResolver and found https://stackoverflow.com/questions/54764271/different-behavior-for-full-framework-and-net-core-for-xml-schema-compilation. Is it true that the resolver in general is already an opt in thing? Is there any way we could hook into that? (Unless the user does the opt in gesture, this is trimmed.)

I guess if the actual API to enable the resolvers is AppContext.SetSwitch("Switch.System.Xml.AllowDefaultResolver", true); there's probably nothing we can do but it still feels like a loss that we need to introduce another gesture around this.


namespace System.Xml
{
public abstract partial class XmlResolver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this really need to be in a new file? Why can't it just be in the main XmlResolver.cs file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed convention for the ThrowingResolver

{
if (absoluteUri.Scheme == "file")
{
return await Task.Run<Stream>(() => new FileStream(absoluteUri.LocalPath, FileMode.Open, FileAccess.Read, FileShare.Read, 1, useAsync: true))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting... Does FileStream's ctor really require a separate thread to run?

Why wouldn't this just be return Task.FromResult(new FileStream(...));?

Copy link
Member Author

@krwq krwq Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted this logic from XmlDownloadManager.GetStreamAsync, I think we should be able to change this without issues in both places though since I don't think ctors can be async (unless there is some magic behind the back to make that work)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub - is there anything I'm missing here? I see you modified the XmlDownloadManager code in dotnet/corefx#41111, but the Task.Run<Stream>(...) code was already there. Is there a reason to create the FileStream on a separate thread?

cc @adamsitnik

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to create the FileStream on a separate thread?

Maybe someone was concerned that in these scenarios the file would be living on a remote file share that could take a non-trivial amount of time to open?

@krwq
Copy link
Member Author

krwq commented Apr 4, 2023

I got intrigued by the LocalAppContextSwitches.AllowDefaultResolver and found https://stackoverflow.com/questions/54764271/different-behavior-for-full-framework-and-net-core-for-xml-schema-compilation. Is it true that the resolver in general is already an opt in thing? Is there any way we could hook into that? (Unless the user does the opt in gesture, this is trimmed.)

I guess if the actual API to enable the resolvers is AppContext.SetSwitch("Switch.System.Xml.AllowDefaultResolver", true); there's probably nothing we can do but it still feels like a loss that we need to introduce another gesture around this.

It's true but there are 2 levels of default resolver:

  • root level APIs - since user explicitly uses specific path/url we assume it is safe - the old switch doesn't block that and also old switch would also block file names in those APIs making them effectively useless
  • references etc. - anything which loads when there is something in the payload - this is what old switch blocks for security reasons - for some places like schema it's a bit borderline that we blocked that because for schema you could argue that schema is constant and trusted while payload is not

also for this particular scenario when we use default resolver we might still want to opt-out of URLs which this switch enables.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
get
{
return GetCachedSwitchValue("Switch.System.Xml.IgnoreEmptyKeySequencess", ref s_ignoreEmptyKeySequences);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: there used to be a typo in the switch name here, I have fixed it here according to how it's documented:
https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/appcontextswitchoverrides-element

ReferenceOutputAssembly="false"
SetTargetFramework="TargetFramework=netstandard2.0"
OutputItemType="Analyzer" />
<ProjectReference Include="$(LibrariesProjectRoot)System.Text.RegularExpressions\gen\System.Text.RegularExpressions.Generator.csproj" ReferenceOutputAssembly="false" SetTargetFramework="TargetFramework=netstandard2.0" OutputItemType="Analyzer" />
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried reverting this but VS keeps on changing this back

{
return XmlDownloadManager.GetStream(absoluteUri, _credentials, _proxy);
}

throw new XmlException(SR.Xml_UnsupportedClass, string.Empty);
}

// Maps a URI to an Object containing the actual resource.
public override async Task<object> GetEntityAsync(Uri absoluteUri, string? role, Type? ofObjectToReturn)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this without changes, for some reason we had separate file for async version (same for XmlDownloadManager)

{
if (absoluteUri.Scheme == "file")
{
return Task.FromResult<object>(new FileStream(absoluteUri.LocalPath, FileMode.Open, FileAccess.Read, FileShare.Read, 1, useAsync: true));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this method is moved from *Async file which got removed. The only difference here is that we use Task.FromResult rather than Task.Run

<linker>
<assembly fullname="System.Private.Xml">
<type fullname="System.Xml.LocalAppContextSwitches">
<method signature="System.Boolean get_AllowResolvingUrlsByDefault()" body="stub" value="false"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File name URLs are still valid URLs.

Should the name rather be AllowResolvingNonFileSystemUrlsByDefault?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API review decided on System.Xml.XmlResolver.IsNetworkingEnabledByDefault

@ghost
Copy link

ghost commented May 4, 2023

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@ghost
Copy link

ghost commented Jun 4, 2023

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@ghost ghost closed this Jun 4, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 4, 2023
@krwq krwq reopened this Jul 17, 2023
@krwq krwq force-pushed the xml-trimming-not-pull-networking branch from 063d2a7 to 5c49b26 Compare July 17, 2023 13:23
@krwq krwq marked this pull request as ready for review July 17, 2023 14:07
krwq and others added 2 commits July 18, 2023 08:10
Co-authored-by: David Cantú <dacantu@microsoft.com>
…olverDefaults.IsNetworkingEnabledByDefault.cs

Co-authored-by: David Cantú <dacantu@microsoft.com>
@krwq
Copy link
Member Author

krwq commented Jul 18, 2023

merging since the failures are unrelated to my changes

@krwq krwq merged commit c2bd0c9 into dotnet:main Jul 18, 2023
@MichalStrehovsky
Copy link
Member

Real world example from these savings: #89489. The size of ilc.exe shrinking by almost 30% from 15.2 MB to 11.1 MB! Awesome savings!

(It also speeds up the inner loop of rebuilding the AOT compiler.)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make non-file URL support optional in XmlUrlResolver
6 participants