Handle robots.txt with missing sections (and implicit master rules) #114
Comments
Here's a saved copy of the robots.txt file...
sebastian-nagel added a commit to sebastian-nagel/crawler-commons that referenced this issue on Jun 16, 2023:
…ster rules) - add unit test to verify solution of crawler-commons#114
sebastian-nagel added a commit to sebastian-nagel/crawler-commons that referenced this issue on Jun 16, 2023:
…gents, closes crawler-commons#390 [Robots.txt] Handle robots.txt with missing sections (and implicit master rules), fixes crawler-commons#114
- do not close rule blocks / groups on directives other than those specified in RFC 9309: groups are only closed on a user-agent line if at least one allow/disallow line was read before
- set Crawl-delay independently from grouping, but never override or set the value for a specific agent using a value defined for the wildcard agent
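The grouping rule described in the commit message above can be sketched in a few lines. This is a minimal illustration, not the actual crawler-commons implementation: consecutive user-agent lines are merged into one group until at least one allow/disallow rule has been read, and other directives (such as crawl-delay) do not close a group.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupingSketch {

    // Sketch of RFC 9309 grouping: return the list of user-agent groups.
    // A user-agent line starts a new group only if at least one
    // allow/disallow line was read since the previous user-agent line.
    public static List<List<String>> groupAgents(List<String> lines) {
        List<List<String>> groups = new ArrayList<>();
        List<String> currentAgents = new ArrayList<>();
        boolean sawRule = false;
        for (String line : lines) {
            String lower = line.trim().toLowerCase();
            if (lower.startsWith("user-agent:")) {
                String agent = lower.substring("user-agent:".length()).trim();
                if (sawRule) {
                    // close the previous group only after at least one rule
                    groups.add(currentAgents);
                    currentAgents = new ArrayList<>();
                    sawRule = false;
                }
                currentAgents.add(agent);
            } else if (lower.startsWith("allow:") || lower.startsWith("disallow:")) {
                sawRule = true;
            }
            // other directives (e.g. crawl-delay, sitemap) do NOT close a group
        }
        if (!currentAgents.isEmpty()) {
            groups.add(currentAgents);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> robots = List.of(
            "User-agent: a",
            "User-agent: b",      // merged with "a": no rule seen yet
            "Disallow: /private", // applies to both a and b
            "Crawl-delay: 5",     // does not close the group
            "User-agent: c",      // starts a new group
            "Allow: /");
        System.out.println(groupAgents(robots)); // prints [[a, b], [c]]
    }
}
```

In crawler-commons itself this logic lives in the robots.txt parser; the sketch only shows why a crawl-delay line between agent sections no longer splits a group after the fix.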
sebastian-nagel
added a commit
to sebastian-nagel/crawler-commons
that referenced
this issue
Jul 10, 2023
…ster rules) - add unit test to verify solution of crawler-commons#114
The robots.txt file at http://www.scotsman.com/robots.txt has a number of issues...
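A hypothetical fragment (not the actual scotsman.com file) showing the kind of structure the issue title describes, where a section has no allow/disallow rules and a crawl-delay sits between agent lines:

```
User-agent: somebot
Crawl-delay: 10

User-agent: *
Disallow: /search
```

Under a strict reading of RFC 9309, the crawl-delay line must not close the first group, and an agent section with no rules of its own falls back to the wildcard ("master") rules.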