Strip extraneous characters at end of URLs #5715
* URLs that end in %20) have these two characters removed
* Add test_handle_404_error_strip_extraneous_chars
Do you think we could do this with a regular expression matching any of those characters multiple times at the end of a string, and then stripping the match? Something like this (pseudocode, I haven't tried it):
import re
extraneous_char_re = re.compile(r'[!#$%&()*+,-.:;<=>?@\[\]^_`{|}~]+$')
request.path = re.sub(extraneous_char_re, '', request.path)
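A runnable version of that suggestion might look like the sketch below. The helper name `strip_extraneous` is hypothetical, and the hyphen is escaped here so the character class does not rely on `,-.` being read as a range (which happens to cover exactly comma, hyphen, and period).

```python
import re

# Punctuation that sometimes gets stuck to the end of a pasted URL.
# The hyphen is escaped so it is a literal rather than a range operator.
EXTRANEOUS_CHAR_RE = re.compile(r"[!#$%&()*+,\-.:;<=>?@\[\]^_`{|}~]+$")


def strip_extraneous(path):
    """Return path with any trailing run of punctuation removed (hypothetical helper)."""
    return EXTRANEOUS_CHAR_RE.sub("", path)
```

Only a run at the very end of the string is removed, so hyphens and periods inside the path are left alone.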
I had a similar thought, Will. Or if we wanted to avoid regex, we could I was testing something to that effect yesterday afternoon, but ran into some weirdness with how browsers convert spaces to %20 that I couldn't resolve before the end of the day.
I haven't tested, but I'd assume that
Probably true.
New logic goes as follows:
1. Lowercase the path.
2. Check for and remove extraneous characters at the end of the path.
   - List of extraneous characters now includes curly quotes, fancy dashes, and ellipses.
3. If the path has changed, try resolving the path.
   1. If it resolves, redirect to it.
   2. If it doesn't, return a 404 for the original path.
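The steps in this commit message can be sketched in plain Python as follows. This is not the actual handle_404_error code: `plan_404_response`, the `resolves` callable (a stand-in for a URL resolver), and the exact set of extra Unicode characters are all assumptions made for illustration.

```python
import re

# Assumed character list: ASCII punctuation plus curly quotes (U+2018-U+201D),
# en/em dashes (U+2013, U+2014), and the ellipsis character (U+2026).
EXTRANEOUS_CHAR_RE = re.compile(
    r"[!#$%&()*+,\-.:;<=>?@\[\]^_`{|}~"
    r"\u2018\u2019\u201c\u201d\u2013\u2014\u2026]+$"
)


def plan_404_response(path, resolves):
    """Decide what a 404 handler should do for path.

    resolves is a hypothetical callable that returns True when a path
    matches a known URL pattern. Returns ("redirect", new_path) or
    ("404", original_path).
    """
    # Steps 1 and 2: lowercase, then strip trailing extraneous characters.
    cleaned = EXTRANEOUS_CHAR_RE.sub("", path.lower())
    # Step 3: only redirect if something changed and the new path resolves.
    if cleaned != path and resolves(cleaned):
        return ("redirect", cleaned)
    return ("404", path)
```

For example, with `known = {"/about-us/"}`, calling `plan_404_response("/About-Us/)", known.__contains__)` yields a redirect to `/about-us/`, while an unchanged or unresolvable path falls through to the 404.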
This refactoring of handle_404_error looks great as it redirects to a lowercase path and strips off extraneous characters from the end. A unit test should be added that has a mixed-case URL with one or two characters from extraneous_char_re at the end of the URL.
Co-authored-by: Andy Chosak <andy.chosak@cfpb.gov>
Discovered that `resolve` will always return a successful result because any URLs that don't match a standard pattern match the Wagtail fallback pattern. Didn't catch this before because a bug in the slash-appending logic was causing a slash to always be appended, which was "correctly" failing to resolve some URLs. @chosak and I agreed that it wasn't worth the complexity to test for both Django and Wagtail URLs (which would involve getting the current site and testing with one of its class methods), so falling back to just doing a redirect if any transformation of the URL occurred.
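To illustrate why the resolve check stopped being informative, here is a toy resolver built from plain regexes. It is not Django's or Wagtail's actual URL configuration; `URL_PATTERNS`, `toy_resolve`, and the route names are hypothetical, and the fallback pattern is deliberately simplified to match every path.

```python
import re

# Hypothetical, simplified URL table: specific routes first, broad fallback last.
URL_PATTERNS = [
    ("admin", re.compile(r"^/admin/$")),
    ("search", re.compile(r"^/search/$")),
    ("cms_fallback", re.compile(r"^/.*$")),  # matches any path, real page or not
]


def toy_resolve(path):
    """Return the name of the first pattern that matches path, or None."""
    for name, pattern in URL_PATTERNS:
        if pattern.match(path):
            return name
    return None
```

With a fallback like this in the table, `toy_resolve` never returns None for a path starting with a slash, so a successful resolve says nothing about whether the page exists. That is the situation the commit message describes, and why the simpler "redirect if the path changed" rule was chosen.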
@cwdavies and @willbarton Retreating to just doing a single redirect if the path changed at all. Reasons are detailed in the 8b495e0 commit message. Let me know if you have questions or concerns.
@Scotchester nope, that sounds reasonable.
(Note: Still need to write more tests before merging.)
New commit pushed with what I think are adequate tests. Ready for final review, @cwdavies @chosak @willbarton @higs4281.
@Scotchester has provided really good unit tests that check one, two, and multiple extraneous characters at the end of the URL, in addition to mixed case, for the new handle_404_error feature.
Requested change was made, and subsequently refactored away
Update handle_404_error to remove the last character from the URL if that character is in the extraneous_char_list
Additions
Removals
Testing
Checklist