Fake User-Agent if response is 403? #159

Closed
claudep opened this issue Feb 5, 2023 · 3 comments · Fixed by #169

Comments

@claudep
Contributor

claudep commented Feb 5, 2023

I have some sites that return a 403 response to both HEAD and GET requests when the User-Agent is not one of a set of whitelisted strings. Here's an example: https://www.cairn.info/revue-l-economie-politique-2005-3-page-60.htm It's probably a measure to block bot traffic.

If you set the User-Agent as looking like a browser (e.g. 'Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0'), it returns a 200 status code.

Would it be acceptable (in the ethical sense) to try with a "fake" user agent in the case of a 403 response?
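A minimal sketch of the proposed fallback, assuming a Python link checker: check the URL with the checker's normal User-Agent and retry once with the browser-like string quoted above only if the first attempt returns 403. `DEFAULT_USER_AGENT`, `fetch_status` and `check_link` are hypothetical names, not the project's actual API; the `fetch` parameter exists only so the retry logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical default User-Agent for the link checker (not the project's real value).
DEFAULT_USER_AGENT = "linkcheck-sketch/1.0"
# Browser-like User-Agent quoted in this issue.
BROWSER_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0"


def fetch_status(url: str, user_agent: str, method: str = "HEAD") -> int:
    """Return the HTTP status code for `url`, sent with the given User-Agent."""
    request = urllib.request.Request(url, method=method, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError; their status is in `code`.
        # Network failures (URLError) are left to propagate to the caller.
        return exc.code


def check_link(url: str, fetch: Callable[[str, str], int] = fetch_status) -> int:
    """Check `url`, retrying once with a browser-like User-Agent on 403."""
    status = fetch(url, DEFAULT_USER_AGENT)
    if status == 403:
        # Only a 403 triggers the retry; every other status is reported as-is.
        status = fetch(url, BROWSER_USER_AGENT)
    return status
```

Retrying only on 403 keeps the extra request limited to sites that appear to filter by User-Agent, so well-behaved sites still see the checker's own identifier.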

@claudep
Contributor Author

claudep commented Feb 5, 2023

https://www.un.org/ is another example.

@timobrembeck
Contributor

Would it be acceptable (in the ethical sense) to try with a "fake" user agent in the case of a 403 response?

In my opinion, this would be justified. Since we minimize traffic by checking each link only once, no matter how often it occurs in the content, and by rechecking URLs at most once every EXTERNAL_RECHECK_INTERVAL, this cannot be considered malicious behavior. Compared to the website's regular visitors and all the traffic caused by enumeration tools (which usually also fake their user agent), our checks will probably not put excessive load on the servers.
Also, at least in our use case, it's in the website operators' best interest for their links to be considered valid; otherwise our clients might remove the link from their content.
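The traffic-limiting argument above can be sketched as follows, assuming the checker knows when each URL was last checked: duplicate occurrences are collapsed, and a URL is only due again once the recheck interval has elapsed. `urls_due_for_check` is a hypothetical helper, and representing EXTERNAL_RECHECK_INTERVAL as a `timedelta` is an assumption for illustration, not how the project stores that setting.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def urls_due_for_check(
    found_urls: Iterable[str],
    last_checked: Dict[str, datetime],
    recheck_interval: timedelta,
    now: datetime,
) -> List[str]:
    """Return each distinct URL that was never checked or whose last check is
    at least `recheck_interval` old, in order of first occurrence."""
    due: List[str] = []
    seen = set()
    for url in found_urls:
        # A link that appears many times in the content is considered only once.
        if url in seen:
            continue
        seen.add(url)
        checked_at = last_checked.get(url)
        # Never checked, or the recheck interval has elapsed since the last check.
        if checked_at is None or now - checked_at >= recheck_interval:
            due.append(url)
    return due
```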

@claudep
Contributor Author

claudep commented Feb 19, 2023

Thanks, I appreciate your point of view. It makes much sense.
