sebastian-nagel commented Nov 28, 2016

Restore the default behavior from before NUTCH-1712 and make the use of URL filters and normalizers configurable via command-line options:

  • -filterNormalizeAll : normalize and filter all URLs, including the URLs of existing CrawlDb records
  • -noNormalize and -noFilter : do not normalize or filter, respectively, any URLs (newly injected or existing)
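
For illustration, the way these options interact could be modeled roughly as below. This is a minimal sketch and not the code from this pull request: the class `InjectOptions` and its methods are hypothetical names, and it assumes that the restored default applies normalizers and filters only to newly injected URLs, and that -noNormalize / -noFilter take precedence over -filterNormalizeAll.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of the option semantics described above.
 * Not Nutch's actual Injector code; names and precedence are assumptions.
 */
final class InjectOptions {
    private final boolean normalize;       // apply URL normalizers at all
    private final boolean filter;          // apply URL filters at all
    private final boolean includeExisting; // also process existing CrawlDb records

    private InjectOptions(boolean normalize, boolean filter, boolean includeExisting) {
        this.normalize = normalize;
        this.filter = filter;
        this.includeExisting = includeExisting;
    }

    /** Derive the settings from the command-line flags named in the description. */
    static InjectOptions parse(String... args) {
        List<String> flags = Arrays.asList(args);
        boolean normalize = !flags.contains("-noNormalize");
        boolean filter = !flags.contains("-noFilter");
        boolean includeExisting = flags.contains("-filterNormalizeAll");
        return new InjectOptions(normalize, filter, includeExisting);
    }

    /** Should a URL be normalized? Existing records only if -filterNormalizeAll was given. */
    boolean normalizes(boolean existingRecord) {
        return normalize && (!existingRecord || includeExisting);
    }

    /** Should a URL be filtered? Same rule as for normalization. */
    boolean filters(boolean existingRecord) {
        return filter && (!existingRecord || includeExisting);
    }

    public static void main(String[] args) {
        InjectOptions opts = parse(args);
        System.out.println("normalize injected: " + opts.normalizes(false));
        System.out.println("normalize existing: " + opts.normalizes(true));
        System.out.println("filter injected:    " + opts.filters(false));
        System.out.println("filter existing:    " + opts.filters(true));
    }
}
```

Under these assumptions, running the sketch with no flags processes only injected URLs, adding -filterNormalizeAll extends processing to existing CrawlDb records, and -noNormalize or -noFilter switches the respective step off for every URL.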

sebastian-nagel merged commit 5945db2 into apache:master on Apr 6, 2017
sebastian-nagel deleted the NUTCH-2335 branch on August 12, 2017 14:43