Add a builder API for configuring the BasicURLNormalizer #324

Merged (3 commits) on Oct 5, 2021

@aecio commented Oct 4, 2021

Usage example:
```java
// asList is assumed to be a static import of java.util.Arrays.asList
BasicURLNormalizer normalizer = BasicURLNormalizer.newBuilder()
  .idnNormalization(IdnNormalization.PUNYCODE)
  .queryParamsToRemove(
    asList("sid", "phpsessid", "sessionid", "jsessionid")
  )
  .build();
```

Closes crawler-commons#321.
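
For readers unfamiliar with the `queryParamsToRemove` option, the following is a minimal, self-contained sketch of the general idea: dropping session-ID parameters from a query string. It is not the crawler-commons implementation. The class `QueryParamRemovalSketch`, the method `removeParams`, and the case-insensitive name matching are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of crawler-commons.
class QueryParamRemovalSketch {

    // Same parameter names as in the usage example.
    static final Set<String> SESSION_PARAMS = new HashSet<>(
            Arrays.asList("sid", "phpsessid", "sessionid", "jsessionid"));

    // Drops every name=value pair whose name is in namesToRemove.
    // Assumption: names are compared case-insensitively.
    static String removeParams(String query, Set<String> namesToRemove) {
        List<String> kept = new ArrayList<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as in "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = (eq >= 0) ? pair.substring(0, eq) : pair;
            if (!namesToRemove.contains(name.toLowerCase(Locale.ROOT))) {
                kept.add(pair);
            }
        }
        return String.join("&", kept);
    }

    public static void main(String[] args) {
        System.out.println(removeParams("a=1&sid=abc&b=2", SESSION_PARAMS));    // a=1&b=2
        System.out.println(removeParams("PHPSESSID=x&q=test", SESSION_PARAMS)); // q=test
    }
}
```

Removing such parameters lets URLs that differ only in a session ID normalize to the same string, which is presumably why these four names were chosen for the example.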
- allow to normalize host names to Unicode
@sebastian-nagel

Thanks, @aecio! I've added the normalization to Unicode. In rare situations it may do unnecessary work, namely when `xn--` appears somewhere other than at the beginning of a host name segment.
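
To make the caveat above concrete, here is a small sketch using only `java.net.IDN` from the JDK. It assumes, without confirmation from the merged code, that the conversion is triggered by a simple substring check for `xn--`; the class `IdnHostSketch` and its methods are hypothetical names. A stricter per-label check is shown for comparison.

```java
import java.net.IDN;
import java.util.Locale;

// Hypothetical illustration; not the code merged in this pull request.
class IdnHostSketch {

    // Loose check: true whenever "xn--" occurs anywhere in the host name.
    static boolean containsAcePrefix(String host) {
        return host.toLowerCase(Locale.ROOT).contains("xn--");
    }

    // Stricter check: true only if a dot-separated label starts with "xn--".
    static boolean hasAceLabel(String host) {
        for (String label : host.toLowerCase(Locale.ROOT).split("\\.")) {
            if (label.startsWith("xn--")) {
                return true;
            }
        }
        return false;
    }

    // Converts Punycode labels to Unicode, guarded by the loose check.
    static String toUnicodeHost(String host) {
        return containsAcePrefix(host) ? IDN.toUnicode(host) : host;
    }

    public static void main(String[] args) {
        // A real Punycode label is decoded to its Unicode form.
        System.out.println(toUnicodeHost("xn--bcher-kva.example"));
        // "xn--" in the middle of a label passes the loose check, so
        // IDN.toUnicode is called but returns the host unchanged.
        System.out.println(toUnicodeHost("abcxn--def.example"));
    }
}
```

Under this assumption the result is still correct for the second host; the only cost is the redundant call to `IDN.toUnicode`.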

@aecio commented Oct 4, 2021

Sounds good, thanks! Do you want to add a line to CHANGES.txt mentioning the IDN-to-Unicode normalization? This actually sounds like a new feature.

- allow to normalize host names to Unicode (add to changelog)
@sebastian-nagel merged commit ec1f2e5 into crawler-commons:master on Oct 5, 2021
@sebastian-nagel

Ok. Updated Changelog and merged. Thanks, @aecio!

@sebastian-nagel added this to the 1.2 milestone on Oct 5, 2021
Linked issue: [BasicNormalizer] Provide builder to configure the normalizer