This repository has been archived by the owner on Nov 25, 2023. It is now read-only.

Please add support for the robots.txt file #2

Closed
ygguser opened this issue Apr 3, 2023 · 2 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments


ygguser commented Apr 3, 2023

Please make the crawler read the robots.txt file and restrict crawling based on its contents.

d47081 (Collaborator) commented Apr 3, 2023

Nice thinking! I forgot about this important point.

d47081 (Collaborator) commented Apr 7, 2023

Hello, after implementing #3, I have added basic robots.txt support.

I may have reinvented the wheel here, but it is what it is.

For now, the crawler supports only the User-agent: * section and Allow/Disallow rules (see the sketch below).
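
To make that subset concrete, here is a minimal PHP sketch of this kind of matching: only the User-agent: * section is parsed, only Allow/Disallow lines are honored, and the longest matching rule wins. The class and method names (RobotsSketch, uriAllowed) are made up for this example; this is not the code from library/robots.php. Requires PHP 8 for str_starts_with().

```php
<?php

// Minimal robots.txt matcher: reads only the "User-agent: *" section
// and honors only Allow/Disallow rules. Illustration only; names and
// behavior are assumptions, not the actual library/robots.php class.
class RobotsSketch
{
    private array $allow    = [];
    private array $disallow = [];

    public function __construct(string $robotsTxt)
    {
        $inStarSection = false;

        foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {

            // Drop comments and surrounding whitespace
            $line = trim(preg_replace('/#.*$/', '', $line));

            if ('' === $line) {
                continue;
            }

            if (preg_match('/^user-agent:\s*(.+)$/i', $line, $match)) {
                // Track whether we are inside the wildcard section
                $inStarSection = ('*' === trim($match[1]));
                continue;
            }

            if (!$inStarSection) {
                continue;
            }

            if (preg_match('/^allow:\s*(\S+)/i', $line, $match)) {
                $this->allow[] = $match[1];
            } elseif (preg_match('/^disallow:\s*(\S+)/i', $line, $match)) {
                $this->disallow[] = $match[1];
            }
        }
    }

    // A path is crawlable unless a Disallow prefix matches it and no
    // Allow prefix of equal or greater length overrides the match
    public function uriAllowed(string $uri): bool
    {
        $verdict = true;
        $matched = 0;

        foreach ($this->disallow as $rule) {
            if (str_starts_with($uri, $rule) && strlen($rule) > $matched) {
                $verdict = false;
                $matched = strlen($rule);
            }
        }

        foreach ($this->allow as $rule) {
            if (str_starts_with($uri, $rule) && strlen($rule) >= $matched) {
                $verdict = true;
                $matched = strlen($rule);
            }
        }

        return $verdict;
    }
}

// Usage example
$robots = new RobotsSketch("User-agent: *\nDisallow: /private/\nAllow: /private/public/");

var_dump($robots->uriAllowed('/private/page.html'));        // bool(false)
var_dump($robots->uriAllowed('/private/public/page.html')); // bool(true)
```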

I have also added the CRAWL_ROBOTS_DEFAULT_RULES directive to the config file, so a search provider can set its own default rules for websites that do not provide this file (illustrated below).
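
For illustration, the fallback might be wired up like this; the value format for CRAWL_ROBOTS_DEFAULT_RULES and the fetch logic shown here are assumptions for the sketch, not the project's actual config handling:

```php
<?php

// Hypothetical config excerpt: default rules applied when a website
// does not provide robots.txt (value format is an assumption)
define('CRAWL_ROBOTS_DEFAULT_RULES', "Disallow: /cgi-bin/\nDisallow: /tmp/");

// Sketch of the fallback in the crawler
$robotsTxt = @file_get_contents('https://example.com/robots.txt');

if (false === $robotsTxt) {
    $robotsTxt = "User-agent: *\n" . CRAWL_ROBOTS_DEFAULT_RULES;
}
```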

For contributors, the library class is here:
https://github.com/YGGverse/YGGo/blob/main/library/robots.php

Please reopen if I missed something.

@d47081 d47081 closed this as completed Apr 7, 2023
@d47081 d47081 added the enhancement label Apr 7, 2023
d47081 pushed a commit that referenced this issue Apr 9, 2023