Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize URI #692

Open
BoykoAlex opened this issue Jan 19, 2023 · 6 comments
Open

Normalize URI #692

BoykoAlex opened this issue Jan 19, 2023 · 6 comments

Comments

@BoykoAlex
Copy link

The common case of setting URI on lsp4j message object is URI#toString() call. For example LSP4E sets uri on the TextDocumentIdentifier via URI#toString() call rather than URI#toASCIIString(). Therefore if the file name is ClassWithSpécialCharacter.java then only URI#toASCIIString() would produce the valid URI with escaped non-ASCII char é

It'd be great if TextDocumentIdentifier (and perhaps other classes) taking string URIs would normalize the input URI to guard against the issue described above.

@nixel2007
Copy link
Contributor

In bsl language server we've created a helper class for handling non-canonical uri: https://github.com/1c-syntax/utils/blob/master/src/main/java/com/github/_1c_syntax/utils/Absolute.java

@BoykoAlex
Copy link
Author

BoykoAlex commented Jan 19, 2023

Yes, we could do this as well (thanks for the sharing your solution!) but this would require a sweep kind of change, ie. URI.create(strUri) vs Absolute.uri(strUri). The point is to try to avoid a code sweep change if possible... sort of to guard against further use of URI#toString() (or URI.create(String) vs Absolute.uri(String))

@jonahgraham
Copy link
Contributor

jonahgraham commented Feb 16, 2023

@BoykoAlex Are you recommending adding a constructor to TextDocumentIdentifier that takes a java.net.URI (in addition to a String)? Or do you want TextDocumentIdentifier to escape the input string? Can the latter part be done reliably?

@BoykoAlex
Copy link
Author

@jonahgraham I'd add a constructor that takes java.net.URI which would call uri.toASCIIString() to set the string uri field. In addition I'd make the constructor taking String creating a java.net.URI from string and then delegate to the constructor taking java.net.URI. It'd be great to do the same to setUri(...) unless this seems risky...

@jonahgraham
Copy link
Contributor

@jonahgraham I'd add a constructor that takes java.net.URI which would call uri.toASCIIString() to set the string uri field.

This part seems fine.

In addition I'd make the constructor taking String creating a java.net.URI from string and then delegate to the constructor taking java.net.URI.

This part worries me because of double-escaping. But then looking at your above comment that would still need a sweeping change if we didn't adopt this part.

It'd be great to do the same to setUri(...) unless this seems risky...

I don't see any issue with adding a new setUri(java.net.URI) method.

However having a getUri... that returns a java.net.URI seems problematic because its constructor throws URISyntaxException.

I guess the real question is why not have TextDocumentIdentifier store a java.net.URI all the time? Is there some things that LSP URIs can store (as a string) that can't be represented properly by java.net.URI?

@BoykoAlex
Copy link
Author

BoykoAlex commented Feb 16, 2023

I think having java.net.URI rather than String would be ideal... Somehow, I suspect it is String due to perhaps some JSON serialization/deserialization limitations?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants