Inconsistency when combining file: URIs #85229

Closed
gregsdennis opened this issue Apr 23, 2023 · 11 comments
@gregsdennis
Contributor

gregsdennis commented Apr 23, 2023

Description

It seems that the Uri(baseUri, newUri) constructor behaves inconsistently when the base is a file: URI depending on how that URI was created, specifically whether the file:/// protocol is prefixed onto the string.

This occurs in my JsonSchema.Net library when loading schemas from files. When I do this, I need to set the base URI for the schema as the file path, so I've been doing this:

var text = File.ReadAllText(fileName);
var schema = FromText(text, options);
schema.BaseUri = new Uri(Path.GetFullPath(fileName));

This creates a Uri object that reveals the correct string, but it doesn't combine with JSON Pointer fragments correctly. Instead of appending a fragment, it replaces the file name with the # from the pointer fragment.

Reproduction Steps

var filePath = "C:\\Folder\\Issue435_schema.json";

var withoutProtocol = new Uri(filePath);
var withProtocol = new Uri($"file:///{filePath}");

var fragment = new Uri("#/$defs/DerivedType", UriKind.RelativeOrAbsolute);

var withoutProtocolResult = new Uri(withoutProtocol, fragment);
var fileUriResult = new Uri(withProtocol, fragment);

Console.WriteLine("File path: {0}", filePath);
Console.WriteLine();
Console.WriteLine("Without protocol: {0}", withoutProtocol);
Console.WriteLine("With protocol:    {0}", withProtocol);
Console.WriteLine();
Console.WriteLine("Combined, Without: {0}", withoutProtocolResult);
Console.WriteLine("Combined, With:    {0}", fileUriResult);

Expected behavior

File path: C:\Folder\Issue435_schema.json

Without protocol: file:///C:/Folder/Issue435_schema.json
With protocol:    file:///C:/Folder/Issue435_schema.json

Combined, Without: file:///C:/Folder/Issue435_schema.json#/$defs/DerivedType
Combined, With:    file:///C:/Folder/Issue435_schema.json#/$defs/DerivedType

Actual behavior

File path: C:\Folder\Issue435_schema.json

Without protocol: file:///C:/Folder/Issue435_schema.json
With protocol:    file:///C:/Folder/Issue435_schema.json

Combined, Without: file:///C:/Folder/%23/$defs/DerivedType
Combined, With:    file:///C:/Folder/Issue435_schema.json#/$defs/DerivedType

Regression?

Unknown

Known Workarounds

For now, I can update my URI creation to explicitly include the file: protocol, but I don't think I should have to. If the Uri class is smart enough to recognize that a file path has been passed to it (which it is because it internally prepends the file: protocol), then these two cases should behave the same.
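As a rough sketch of that workaround, the helper below builds a base URI that carries an explicit file: scheme by re-parsing the AbsoluteUri of the path-based Uri. The names FileUriHelper and ToExplicitFileUri are hypothetical and not part of JsonSchema.Net or .NET. It assumes that a Uri parsed from a string starting with file:// is treated as a regular URI that may carry a fragment.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of JsonSchema.Net or the BCL.
public static class FileUriHelper
{
    // Converts a file system path into a Uri whose original string
    // explicitly starts with the file: scheme.
    public static Uri ToExplicitFileUri(string path)
    {
        // Parsing a bare path produces a file URI with an implied scheme.
        var fromPath = new Uri(Path.GetFullPath(path));

        // AbsoluteUri is the escaped "file:///..." form; parsing that string
        // again gives a Uri created from an input with the scheme spelled out.
        return new Uri(fromPath.AbsoluteUri);
    }
}
```

With this, something like new Uri(FileUriHelper.ToExplicitFileUri(fileName), "#/$defs/DerivedType") should keep the file name and append the fragment, matching the "With protocol" result above.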

Configuration

My library is .Net Standard. I've tested this running .Net Core 3.1, .Net 6, and .Net 7.

I'm running Windows 11 x64. It seems to work fine in my online evaluator (Blazor) at https://json-everything.net/json-schema, so I'm moderately certain it's contained to Windows, but I haven't tested anywhere else.

Other information

Discovered with help from users in gregsdennis/json-everything#436.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Apr 23, 2023
@ghost

ghost commented Apr 23, 2023

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: gregsdennis
Assignees: -
Labels: area-System.IO

Milestone: -

@ghost

ghost commented May 19, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: gregsdennis
Assignees: -
Labels: area-System.Net, untriaged

Milestone: -

@gregsdennis
Contributor Author

@karelz This has been unaddressed for over a month. Can I get some direction or feedback on this, please?

@karelz
Member

karelz commented Jun 6, 2023

@gregsdennis I fear this may be by design: when you pass a file path without the file: prefix, it tries to combine it as file paths, where fragments do not exist, which is the right result IMO. When you have the file: prefix, it is recognized as a URI with all the URI features, like fragments.
I will let @MihaZupan, our expert in the space, comment -- warning: it might take a few weeks to get a response as he is away at the moment.

That said, Uri is extremely tricky and has been the source of many security issues and breaking changes over time. It was designed with some unfortunate default behaviors which keep biting us :( ... Even if we agree it is undesired behavior, we might still triage it as Won't Fix. Given the low impact (first report so far), the tricky area, and the load on the team at the moment, there is nearly no chance it will be addressed in 8.0. (just setting expectations)

@wfurt
Member

wfurt commented Jun 6, 2023

This does not seem like a regression, and Uri has historical quirks that are kept for compatibility reasons. It does not seem critical for 8.0, so moving to Future.

@wfurt wfurt removed the untriaged New issue has not been triaged by the area owner label Jun 6, 2023
@wfurt wfurt added this to the Future milestone Jun 6, 2023
@gregsdennis
Contributor Author

gregsdennis commented Jun 6, 2023

Thanks for the comments. I'm not really worried about it being a regression. I'm more just confused by the behavior.

If this isn't something that's going to be "fixed" (whatever that may mean), I think I'd be satisfied with an explanation as to why it works this way.

@gregsdennis
Contributor Author

Just pinging back since it's been 6 weeks or so.

@MihaZupan can you provide insight into what's going on here? Am I doing something I shouldn't be?

@gregsdennis
Contributor Author

@MihaZupan just pinging again.

@karelz
Member

karelz commented Sep 8, 2023

@MihaZupan ping (bringing it to the top of your notifications ;))

@MihaZupan
Member

MihaZupan commented Sep 8, 2023

Sorry about the late reply. As you've discovered, Uri treats file path inputs differently depending on whether they are explicitly specifying the file:// scheme, or if they just look like file paths and the file scheme is implied.
The fact that Uri supports implicit file paths is unfortunate as their behavior is often confusing.

Effectively, inputs that specified the scheme are treated as "real Uris", and will behave similarly to what you might expect for an http:// url.
For implicit file paths, they are considered "just paths" and have different semantics in a bunch of places, often treated more like plain strings than a structured format (e.g. no query/fragment).

In your example, the input with the scheme specified is considered as a real Uri, and is as such allowed to contain a fragment.
For the implicit file path on the other hand, the # and everything following it is considered as just part of the path.

new Uri("file:///C:/Folder/Issue435_schema.json#/$defs/DerivedType").Fragment // #/$defs/DerivedType
new Uri("C:/Folder/Issue435_schema.json#/$defs/DerivedType").Fragment // <empty>

You can try swapping out # for ? and you should see similar behavior.

When it then comes to combining the two inputs, we'll check if the base supports having a fragment, and if so, just replace the fragment section. This is what happens for the explicit file path input here:

// Check for a simple fragment in relative part
if (relativeStr[0] == '#' && !baseUri.IsImplicitFile && baseUri.Syntax!.InFact(UriSyntaxFlags.MayHaveFragment))
{
    newUriString = baseUri.GetParts(UriComponents.AbsoluteUri & ~UriComponents.Fragment,
        UriFormat.UriEscaped) + relativeStr;
    return null;
}

For the implicit file path, the # is treated as just another boring input character, and as such you skip all the way to merging paths. This is where you come to regular Uri path merging rules and behavior matches what would happen even for Http Uris. Consider how the below example looks the same as what you see for #, just with the character swapped out with an a.

new Uri(new Uri("file:///C:/Folder/Issue435_schema.json"), "a/$defs/DerivedType").AbsoluteUri // file:///C:/Folder/a/$defs/DerivedType
new Uri(new Uri("http://foo/Folder/Issue435_schema.json"), "a/$defs/DerivedType").AbsoluteUri // http://foo/Folder/a/$defs/DerivedType

That is, C:/Folder is considered the base directory for the merging (the file name on the base is ignored when merging paths), and a/$defs/DerivedType is the relative path.
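The merging rule described above can be tried in isolation with a small sketch. MergeDemo and Resolve are hypothetical names introduced here for illustration; the code only uses the public Uri(Uri, string) constructor and the AbsoluteUri property, and sticks to http inputs so that no implicit file path handling is involved.

```csharp
using System;

// Hypothetical demo type for illustration only.
public static class MergeDemo
{
    // Resolves a relative reference against an absolute base URI string
    // and returns the resulting absolute URI as a string.
    public static string Resolve(string baseUri, string relative)
    {
        // The base is parsed with its scheme spelled out, so it is treated
        // as a regular URI rather than as an implicit file path.
        var parsedBase = new Uri(baseUri);

        // A relative path replaces the last segment of the base path;
        // a reference starting with '#' only replaces the fragment.
        return new Uri(parsedBase, relative).AbsoluteUri;
    }
}
```

For example, MergeDemo.Resolve("http://foo/Folder/Issue435_schema.json", "a/$defs/DerivedType") gives http://foo/Folder/a/$defs/DerivedType, while passing "#/$defs/DerivedType" as the relative part keeps the file name and yields http://foo/Folder/Issue435_schema.json#/$defs/DerivedType.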


TL;DR I think the behavior observed here is "by design", and we'd recommend you use explicit file:// URIs (with the scheme spelled out) whenever possible.

@karelz karelz modified the milestones: Future, 9.0.0 Sep 11, 2023
@karelz
Member

karelz commented Sep 11, 2023

Closing as By Design per above explanation.

@karelz karelz closed this as completed Sep 11, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Oct 11, 2023