Implement Robots Exclusion Protocol (REP) IETF Draft: port unit tests #360

Conversation

sebastian-nagel
Contributor

Port unit tests from https://github.com/google/robotstxt, specifically https://github.com/google/robotstxt/blob/master/robots_test.cc

  • not all tests pass yet!
  • some TODOs remain:
    • tests for which we have no API endpoints yet (we may add them in the future)
    • tests that exercise methods used in the tests (maybe skip them)
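
For readers unfamiliar with the upstream test suite, here is a minimal sketch of the kind of check such ported tests make: feed a robots.txt, a user-agent name and a path to a matcher and assert the verdict. `RepSketch` and its `isAllowed` method are hypothetical stand-ins written for this illustration, not the crawler-commons or Google API, and the matcher is deliberately simplified (plain prefix rules only, no `*` or `$` wildcards, no fallback to the `*` group).

```java
import java.util.Locale;

/** Hypothetical stand-in for a REP matcher; NOT the crawler-commons API. */
public class RepSketch {

    /**
     * Returns true if `path` may be fetched by `agent` according to `robotsTxt`.
     * Assumptions: only plain prefix rules are supported, only groups naming
     * the agent (case-insensitive) are considered, and the longest matching
     * rule wins, with "allow" winning ties.
     */
    public static boolean isAllowed(String robotsTxt, String agent, String path) {
        boolean inGroup = false;      // currently inside a group for `agent`
        boolean lastWasAgent = false; // previous directive was a user-agent line
        int bestLen = -1;             // length of the longest matching rule so far
        boolean bestAllow = true;     // verdict of that rule; default is "allowed"

        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "key: value" line
            }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (key.equals("user-agent")) {
                boolean matches = value.equalsIgnoreCase(agent);
                // consecutive user-agent lines belong to the same group
                inGroup = lastWasAgent ? (inGroup || matches) : matches;
                lastWasAgent = true;
                continue;
            }
            lastWasAgent = false;

            boolean allow = key.equals("allow");
            if (!inGroup || !(allow || key.equals("disallow"))) {
                continue;
            }
            if (value.isEmpty() || !path.startsWith(value)) {
                continue; // an empty rule or a non-matching prefix has no effect
            }
            if (value.length() > bestLen || (value.length() == bestLen && allow)) {
                bestLen = value.length();
                bestAllow = allow;
            }
        }
        return bestAllow;
    }

    public static void main(String[] args) {
        String robotsTxt = "user-agent: FooBot\n"
                + "disallow: /x/\n"
                + "allow: /x/page.html\n";
        System.out.println(isAllowed(robotsTxt, "FooBot", "/x/page.html"));  // true
        System.out.println(isAllowed(robotsTxt, "FooBot", "/x/other.html")); // false
        System.out.println(isAllowed(robotsTxt, "BarBot", "/x/other.html")); // true
    }
}
```

The actual ported tests call the project's parser instead of this stand-in; the sketch only shows the robots.txt, agent, path and expected verdict pattern they follow.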

@sebastian-nagel
Contributor Author

Rebased on top of #362. Five unit tests still fail:

@crawler-commons crawler-commons deleted a comment from wbartussek Jun 24, 2023
@crawler-commons crawler-commons deleted a comment from wbartussek Jun 24, 2023
@sebastian-nagel sebastian-nagel marked this pull request as ready for review July 10, 2023 09:25
@sebastian-nagel
Contributor Author

The unit tests should succeed once rebased on top of #430.

- port unit tests from https://github.com/google/robotstxt
- adapt "Google-only" unit tests dealing with overlong lines
  and non-standard user-agent names
- port unit tests from https://github.com/google/robotstxt
- adapt unit tests dealing with overlong lines and percent-encoded
  URL paths where the behavior of SimpleRobotRulesParser is not
  wrong and may even be seen as an improvement compared to the
  restrictions the Google robots.txt parser puts on API input params
@sebastian-nagel
Contributor Author

Thanks for the review, @rzo1!

@sebastian-nagel sebastian-nagel merged commit 6fb34cf into crawler-commons:master Jul 12, 2023
4 checks passed