Closed
Description
We have a mechanism that allows subscription to Google Scholar URLs. We then use our regular feed-fetching tools to poll the Scholar URL periodically and parse the contents.
However, Google is somewhat aggressive about identifying "bot" traffic. As such, we should introduce mitigations that keep an IP/client from getting blocked. A few initial ideas that come to mind:
- Automatic polling/fetching for Scholar feeds should be limited. Instead of our default interval, perhaps once daily would be sufficient.
- Keep track of Google Scholar requests from the WP installation within a given period, and cap them. For example, only one or two per hour, or 10 per day, or some other limiting/spacing mechanism.
- We have a manual button 'Refresh Feed Items', which should also respect these limits.
- I don't want to be scummy, but if there's something we could set in our request headers, such as a user agent string indicating that we're not a malicious bot, it would be worth exploring.
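To make the capping and spacing ideas above concrete, here is a rough, language-agnostic sketch written in Python rather than the plugin's own code. The names `ScholarRequestLimiter` and `refresh_scholar_feed` are hypothetical, and the defaults (10 requests per day, at least 30 minutes apart) are just the example numbers floated above, not settled values. It assumes both the scheduled poll and the 'Refresh Feed Items' button go through the same gate.

```python
import time
from collections import deque

SECONDS_PER_DAY = 86400


class ScholarRequestLimiter:
    """Hypothetical limiter: a rolling 24-hour cap plus a minimum gap between requests."""

    def __init__(self, max_per_day=10, min_spacing_seconds=1800, clock=time.time):
        self.max_per_day = max_per_day                  # assumed cap, e.g. 10 per day
        self.min_spacing_seconds = min_spacing_seconds  # assumed gap, e.g. two per hour
        self.clock = clock                              # injectable so it can be tested
        self._sent = deque()                            # timestamps of allowed requests

    def _prune(self, now):
        # Forget requests that fell out of the rolling 24-hour window.
        while self._sent and now - self._sent[0] >= SECONDS_PER_DAY:
            self._sent.popleft()

    def try_acquire(self):
        """Return True and record the request if it is allowed right now."""
        now = self.clock()
        self._prune(now)
        if len(self._sent) >= self.max_per_day:
            return False  # daily cap reached
        if self._sent and now - self._sent[-1] < self.min_spacing_seconds:
            return False  # too soon after the previous request
        self._sent.append(now)
        return True


def refresh_scholar_feed(limiter, fetch):
    """Single entry point for both scheduled polling and the manual refresh button.

    `fetch` is whatever callable actually requests and parses the Scholar URL.
    Returns None without calling `fetch` when the limiter refuses.
    """
    if not limiter.try_acquire():
        return None
    return fetch()
```

In a real WP installation the recorded timestamps would have to be persisted between requests (for instance in an option or transient) rather than held in memory, and a refused manual refresh should probably tell the user when the next attempt is allowed.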
Happy to hear (or see) other suggestions for best practices.