Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return to using stdlib's URL escaping #14

Open
ghost opened this issue Aug 23, 2015 · 5 comments · May be fixed by #40
Open

Return to using stdlib's URL escaping #14

ghost opened this issue Aug 23, 2015 · 5 comments · May be fixed by #40

Comments

@ghost
Copy link

ghost commented Aug 23, 2015

Issue 5684 is now closed, so it might be time to revert PR 9.

@mna
Copy link
Member

mna commented Aug 30, 2015

Yeah, it is now fixed in Go1.5 but it's unlikely people will all upgrade that fast, so I'd be tempted to leave it in for a while, at least until Go1.6, maybe even another release after that. I think a conditional build tag could be used, I may take a look at that when I have a moment, so I'll leave that open.

Thanks for the heads up!

@beeker1121
Copy link
Contributor

beeker1121 commented Nov 15, 2016

I tried making this change locally (returning u.String() in the NormalizeURL() method and using url.QueryEscape in the sortQuery() method), however quite a bit of the tests fail after doing so.

Should this be expected, and the tests just need to be fixed? I'm just unsure which package (net/url or urlesc) is handling everything correctly, as per the RFC stuff.

Here's the output of a portion of the tests:

purell_test.go:715: running LowerScheme...
purell_test.go:715: running LowerScheme2...
purell_test.go:715: running LowerHost...
purell_test.go:715: running UpperEscapes...
purell_test.go:731: UpperEscapes - FAIL expected 'http://www.whatever.com/Some%AA%20Special%8Ecases/', got 'http://www.whatever.com/Some%aa%20Special%8Ecases/'
purell_test.go:715: running UnnecessaryEscapes...
purell_test.go:731: UnnecessaryEscapes - FAIL expected 'http://www.toto.com/AB.D/23R-/_~', got 'http://www.toto.com/%41%42%2E%44/%32%33%52%2D/%5f%7E'
purell_test.go:715: running RemoveDefaultPort...
purell_test.go:715: running RemoveDefaultPort2...
purell_test.go:715: running RemoveDefaultPort3...
purell_test.go:715: running Safe...
purell_test.go:731: Safe - FAIL expected 'http://www.src.ca/to%1Ato%8B%EE/OKnowABC~', got 'http://www.src.ca/to%1ato%8b%ee/OKnow%41%42%43%7e'
purell_test.go:715: running BothLower...
purell_test.go:731: BothLower - FAIL expected 'http://www.src.ca:80/to%1Ato%8B%EE/OKnowABC~', got 'http://www.src.ca:80/to%1ato%8b%ee/OKnow%41%42%43%7e'
purell_test.go:715: running RemoveTrailingSlash...
purell_test.go:715: running RemoveTrailingSlash2...
purell_test.go:715: running RemoveTrailingSlash3...
purell_test.go:715: running AddTrailingSlash...
purell_test.go:731: AddTrailingSlash - FAIL expected 'http://www.SRC.ca:80/', got 'http://www.SRC.ca:80%2F'

Edit: After researching more into it, it appears that this commit golang/go@874a605 changed how the String() method decides how to encode the Path field. A new RawPath field was introduced in this commit, and this field is used to compare against the Path field. If RawPath is a valid encoding of Path, it simply uses it instead of Path.

One fix that resolves most of the test errors is just setting u.RawPath = u.Path before calling u.String() (if taking urlesc package out of the picture). However, while this solves all of the main tests, the TestDecodeUnnecessaryEscapesAll and TestEncodeNecessaryEscapesAll tests fail.

While the standard String() method gets the majority of the escaping correct after doing the path fix, it still does not handle the !, ', (, ), [, ], and * characters as we would probably want. I believe this is due to Go trying to follow the newest RFC specs. RFC 2396 labels these characters (marks) as unreserved (except [ and ] which most browsers accept, so we include), while RFC 3986 does not allow them anymore.

So, what do you think would be best? AFAIK, most browsers and other clients will still allow the unreserved characters labeled in RFC 2396, which the current urlesc package handles escaping correctly. Maybe it's best to keep using that package and forget about the standard lib for now?

@mna
Copy link
Member

mna commented Nov 16, 2016

Yeah, I did give it a shot a while ago and ran into those issues but I didn't dig deeper at that moment, sorry I should've documented it a bit here. Thanks for looking into it, I think that regardless of what browsers support, the change should try to maintain behaviour with what purell does now.

If it's down to only those few characters, maybe it could be done in purell itself, modifying the *url.URL struct directly, so we could get rid of the external dep.

@AlekSi
Copy link
Contributor

AlekSi commented Aug 10, 2017

Would like to see the latest comment implemented.

@mna
Copy link
Member

mna commented Aug 10, 2017

Me too! I'd be happy to accept a PR that implements this and makes all tests pass.

@natasha-jarus natasha-jarus linked a pull request Mar 19, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants