Order of precedence for google user agents #46

Closed
kennylajara opened this issue Apr 6, 2017 · 7 comments

Comments

@kennylajara
Contributor

kennylajara commented Apr 6, 2017

Some crawlers (like Googlebot) have multiple user agents, and one of the names acts as a common name that should also match for the more specific ones (this is easier to understand by reading "Order of precedence for user agents" in Google's Robots.txt Specifications).

It would be nice if one could pass an ordered array in the useragent param of the validator in order to replicate that behavior.

I can work on this later if nobody takes the job. I don't have the time right now.

@bopoda
Owner

bopoda commented Apr 6, 2017

Do you maybe have an example that works incorrectly? It would be very good to check it.
Currently a URL should be validated against the rules for the given crawler name if rules for that crawler exist, otherwise against the rules for '*'.
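
To make the current behavior concrete, here is a minimal sketch of that two-step lookup. It is written in Python only for illustration and is not the library's actual code: the function name `select_rules`, the dictionary layout, and the lowercase keys are all assumptions.

```python
# Hypothetical sketch of the lookup described above; not the library's code.
# Assumption: parsed rules are stored per user agent, keyed in lowercase.

def select_rules(rules_by_agent, user_agent):
    """Return the rules for user_agent if present, otherwise the rules for '*'."""
    key = user_agent.lower()
    if key in rules_by_agent:
        return rules_by_agent[key]
    # No group for this crawler: fall back to the wildcard group.
    return rules_by_agent.get("*", [])


rules = {
    "googlebot": ["Disallow: /private/"],
    "*": ["Disallow: /tmp/"],
}

print(select_rules(rules, "Googlebot"))       # exact group exists
print(select_rules(rules, "Googlebot-News"))  # no exact group, wildcard is used
```

With a single string as input, the wildcard group is the only possible fallback.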

@kennylajara
Contributor Author

kennylajara commented Apr 6, 2017

Well, I have not tested it, but looking at the code I see the function expects a string, so * is the only fallback.

The idea is that (using Google's example), when Googlebot-News checks the robots.txt, it looks for Googlebot-News; if that does not exist, it looks for Googlebot; and if that does not exist either, it looks for *. This script, as far as I can see in the code (not tested, and I can't right now), would jump directly from Googlebot-News to * without stopping at Googlebot.

What I suggest is to open this up to replicate Google's behavior.
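
A rough sketch of the suggested ordered fallback could look like the following. It is in Python for illustration only and is not the library's code: the function name `select_rules_ordered` and the lowercase-keyed dictionary are assumptions, and the agent names follow Google's documented tokens (Googlebot-News, Googlebot).

```python
# Hypothetical sketch of an ordered user-agent fallback; not the library's code.
# Assumption: parsed rules are stored per user agent, keyed in lowercase.

def select_rules_ordered(rules_by_agent, user_agents):
    """Return the rules of the first listed agent that has a group, else '*'."""
    # Accept a single string too, so existing single-agent calls keep working.
    if isinstance(user_agents, str):
        user_agents = [user_agents]
    for agent in user_agents:
        key = agent.lower()
        if key in rules_by_agent:
            return rules_by_agent[key]
    # None of the listed agents has its own group: use the wildcard group.
    return rules_by_agent.get("*", [])


rules = {
    "googlebot": ["Disallow: /private/"],
    "*": ["Disallow: /tmp/"],
}

# Most specific name first, then the common name.
print(select_rules_ordered(rules, ["Googlebot-News", "Googlebot"]))
# A plain string still falls straight through to '*'.
print(select_rules_ordered(rules, "Googlebot-News"))
```

Passing the names in order of precedence lets the caller decide the chain, so nothing Google-specific has to be hard-coded in the lookup itself.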

@bopoda
Owner

bopoda commented Apr 6, 2017

I understand what you mean. Yes, you are right.

It seems better to implement this only for Google user agents, not for all of them.

@kennylajara
Contributor Author

I can do it later if you are open to the possibility. I think it wouldn't add more than 5 or 6 lines to the code, and it looks like a really good, Google-level feature to me.

What do you think? Would you merge it?

@bopoda
Owner

bopoda commented Apr 6, 2017

Yes, that would be great.
The only thing I suggest is to include in the PR a simple PHPUnit test that checks how this functionality works.

@kennylajara
Contributor Author

kennylajara commented Apr 6, 2017

Lol... Ok...
I don't really know phpunit, but I want to learn, so... Do you know some simple tutorial or something?

@bopoda
Owner

bopoda commented Apr 6, 2017

Feel free to create the PR without a test.
You can see other tests in the tests/ directory.
Run the tests using:

$ cd /repo
$ phpunit  # to run all tests

or:

$ phpunit tests/HostTest.php  # to run a single test

@bopoda bopoda changed the title Order of precedence for user agents Order of precedence for google user agents Apr 8, 2017