[BasicNormalizer] sorting the Query Parameters #246

Closed
Chaiavi opened this issue Jul 11, 2019 · 1 comment

@Chaiavi
Member

Chaiavi commented Jul 11, 2019

When normalizing a URL, should these two URLs be normalized to the same URL?
http://shekhargulati.com?lang=en&article=fred
http://shekhargulati.com?article=fred&lang=en

This can be achieved by sorting the query parameters.

The two URLs above were taken from UrlCleaner.

It is an actual unit test there (one which we currently fail :-( ); the test name is shouldSortQueryParameters.
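
To illustrate the idea, here is a minimal sketch of sorting query parameters on a raw URL string. The class and method names (`QuerySortSketch`, `sortQueryParameters`) are hypothetical and are not part of crawler-commons or UrlCleaner. The sketch sorts whole `name=value` pairs lexicographically and does no decoding or re-encoding, which a real normalizer would probably want to handle as well.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the BasicURLNormalizer implementation.
final class QuerySortSketch {

    // Returns the URL with its query parameters in lexicographic order.
    // Assumption: parameters are separated by '&' and compared as raw strings.
    static String sortQueryParameters(String url) {
        // Split off the fragment first so a '?' inside it is not mistaken for a query.
        int fragmentStart = url.indexOf('#');
        String fragment = fragmentStart < 0 ? "" : url.substring(fragmentStart);
        String base = fragmentStart < 0 ? url : url.substring(0, fragmentStart);

        int queryStart = base.indexOf('?');
        if (queryStart < 0) {
            return url; // no query string, nothing to sort
        }

        String[] params = base.substring(queryStart + 1).split("&");
        Arrays.sort(params); // deterministic order regardless of input order

        return base.substring(0, queryStart + 1) + String.join("&", params) + fragment;
    }
}
```

With this sketch, both URLs from the example above come out as `http://shekhargulati.com?article=fred&lang=en`, so they would compare equal after normalization.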

@kkrugler
Contributor

We sort query parameters when normalizing (in other projects), so yes, I think that's true.

@Chaiavi Chaiavi added enhancement normalizer Issues concerning our URL normalizer labels Jul 12, 2019
@jnioche jnioche added this to the 1.2 milestone Sep 15, 2020
aecio added a commit to aecio/crawler-commons that referenced this issue Jan 4, 2021
- Sort query parameters (fix crawler-commons#246)
- Allows to (optionally) remove common irrelevant query parameters
- Consistently encode query parameters with
'application/x-www-form-urlencoded'
aecio added a commit to aecio/crawler-commons that referenced this issue Jan 5, 2021
- Sort query parameters (fix crawler-commons#246)
- Allows to (optionally) remove common irrelevant query parameters
- Consistently encode query parameters with
'application/x-www-form-urlencoded'
sebastian-nagel added a commit that referenced this issue Sep 21, 2021
…oses #309

- rebase to master and squash commits
- fix failing sitemaps unit tests with URL filtering using BasicURLNormalizer
  (sort query params in test sitemap)
- CHANGES.txt: updated to follow style, added missing entry for preceding commit