Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VFS-524] A URI with an IPv6 address can't be parsed out correctly #438

Merged

Conversation

fyudanov
Copy link
Contributor

https://issues.apache.org/jira/browse/VFS-524

The fix is based on the one, suggested in the Jira ticket, but also includes some corner cases covered and a number of additional unit tests, also tested end-to-end.

@@ -94,7 +93,7 @@ protected Http5FileObject(final AbstractFileName name, final FS fileSystem,
final FileSystemOptions fileSystemOptions = fileSystem.getFileSystemOptions();
urlCharset = builder.getUrlCharset(fileSystemOptions);
final String pathEncoded = ((GenericURLFileName) name).getPathQueryEncoded(getUrlCharset());
internalURI = URIUtils.resolve(fileSystem.getInternalBaseURI(), pathEncoded);
internalURI = fileSystem.getInternalBaseURI().resolve(pathEncoded);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URIUtils.resolve() from HttpClient5.1 has an issue of escaping host symbols that is not working properly with IPv6 hosts. HttpClient4 and native Java URI.resolve() don't have this problem.

@garydgregory garydgregory changed the title [VFS-524] The uri include ipv6 address can't be parsed out correctly [VFS-524] The URI include ipv6 address can't be parsed out correctly Oct 22, 2023
@garydgregory garydgregory changed the title [VFS-524] The URI include ipv6 address can't be parsed out correctly [VFS-524] The URI include IPv6 address can't be parsed out correctly Oct 22, 2023
@garydgregory garydgregory changed the title [VFS-524] The URI include IPv6 address can't be parsed out correctly [VFS-524] A URI with an IPv6 address can't be parsed out correctly Oct 22, 2023
if (isIPv6Host) {
return ch == ']';
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fyudanov

Thank you for this PR! 😄

The expression below does not look right to me or the method is misnamed, but the code sure could use some comments 😉

Below, we have the test for the last char of a host name:

 return ch == '/' || ch == ';' || ch == '?' || ch == ':' || ch == '@' || ch == '&' || ch == '=' || ch == '+'
                || ch == '$' || ch == ',';

Reading https://datatracker.ietf.org/doc/html/rfc3986#page-49:

host          = IP-literal / IPv4address / reg-name
...
reg-name      = *( unreserved / pct-encoded / sub-delims )
...
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

For example, shouldn't ~ be in the test. Also the method needs comments if you intend to also test for a separator char with rest of the path like / and ?. Or am I missing something?

At least this PR needs more testing with host names and URI that contain all the these special characters.

Copy link
Contributor Author

@fyudanov fyudanov Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @garydgregory

I agree that the char list from

ch == '/' || ch == ';' || ch == '?' || ch == ':' || ch == '@' || ch == '&' || ch == '=' || ch == '+'
                || ch == '$' || ch == ',';

looks a bit inconsistent with RFC (at first site at least), but it wasn't introduced or changed in this PR, I've just moved it to the separate method.

Not sure, if we want to revisit it in the scope of the current change, because the IPv6 host parsing algorithm doesn't actually depend on this list :)

I could add few more tests with chars like ~, if it would be sufficient. But to me, this would cover previously existing IPv4 cases rather than the current change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More tests would be great.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've added few new test cases. Those that are commented out are the cases that doesn't work like I would expect according to RFC.

However, I've ensured the behavior on the tests is exactly the same as it was before this PR, so I'm not sure we want to cover missed cases here.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fyudanov
Please see my comment in the code.

@garydgregory garydgregory merged commit e9549ca into apache:master Oct 25, 2023
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants