NUTCH-2996 Use new SimpleRobotRulesParser API entry point crawler-commons 1.4 #766

Merged
merged 2 commits into apache:master on Aug 22, 2023

sebastian-nagel
Contributor

Note: because NUTCH-2996 requires the upgrade to crawler-commons 1.4, this PR also includes all changes from NUTCH-2995. Only commit bc5326d contains the changes for NUTCH-2996.

  • split and lowercase agent names (if multiple) at configuration time and pass as collection to SimpleRobotRulesParser
  • update RobotRulesParser command-line help
  • update unit tests to use new API
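
The first bullet above (split and lowercase the agent names once, at configuration time) could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `AgentNames`, the method `parse`, and the sample configuration value are hypothetical, and the actual call into `SimpleRobotRulesParser` is left out because it needs the crawler-commons dependency.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch or crawler-commons.
public class AgentNames {

  /**
   * Split a comma-separated list of agent names, trim surrounding
   * whitespace, lowercase each name, and drop empty entries and
   * duplicates while keeping the configured order.
   */
  public static Collection<String> parse(String configured) {
    Set<String> names = new LinkedHashSet<>();
    if (configured == null) {
      return names;
    }
    for (String name : configured.split(",")) {
      String normalized = name.trim().toLowerCase(Locale.ROOT);
      if (!normalized.isEmpty()) {
        names.add(normalized);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    // Assumed example value; a real setup would read it from the configuration.
    Collection<String> agents = parse("MyBot, MyBot-Images ,mybot");
    // The resulting collection is what would be handed to the robots.txt
    // parser, instead of a single comma-separated string.
    System.out.println(agents); // prints [mybot, mybot-images]
  }
}
```

Normalizing once when the configuration is read means the parser is not asked to split and lowercase the same names again for every robots.txt it processes.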

…mmons 1.4)

- update description of Nutch properties to reflect the changes due to
  the usage of the new API entry point and the upgrade to crawler-commons 1.4
@sebastian-nagel
Contributor Author

Rebased on top of current master which already includes NUTCH-2995.

@sebastian-nagel sebastian-nagel merged commit 070c115 into apache:master Aug 22, 2023
1 check passed
@sebastian-nagel sebastian-nagel deleted the NUTCH-2996 branch August 22, 2023 09:39