robots.txt for AI-related crawlers and bots #3900

Closed
zackkrida opened this issue Mar 12, 2024 · 0 comments · Fixed by #4077
Labels
🗄️ aspect: data: Concerns the data in our catalog and/or databases
✨ goal: improvement: Improvement to an existing user-facing feature
🟨 priority: medium: Not blocking but should be addressed soon
🧱 stack: frontend: Related to the Nuxt frontend
🔒 staff only: Restricted to staff members

@zackkrida
Member

Problem

We currently receive a fair amount of bot traffic that, while not disrupting service, places considerable load on our servers.

Most of this traffic takes the form of client-side searches on the frontend.

One recent example is the reverse image search site https://imagesift.com/, a service run by the AI company https://thehive.ai/.

Description

Consider adding new robots.txt rules to block most of these platforms. This behavior violates our terms of service; these users should instead use our API, which applies throttling rules.

This blog post by Neil Clarke lists example rules for blocking these bots:

https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website
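A minimal sketch of the kind of rules proposed here, written in Python so the result can be checked with the standard library's `urllib.robotparser`. The user-agent tokens are assumptions drawn from commonly published lists of AI crawlers (including `ImagesiftBot`, assumed to be the ImageSift crawler's token); each should be confirmed against the vendor's own documentation before shipping. `build_robots_txt` and `is_allowed` are hypothetical helpers, and `example.org` stands in for the real site. Keep in mind that robots.txt is advisory: it only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Assumed AI-related crawler user-agent tokens; verify each one
# against the operator's published documentation.
AI_CRAWLERS = [
    "GPTBot",
    "ChatGPT-User",
    "CCBot",
    "Google-Extended",
    "anthropic-ai",
    "ClaudeBot",
    "Bytespider",
    "ImagesiftBot",
]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Hypothetical helper: disallow every path for the listed agents
    and allow everything for all other crawlers."""
    # Several User-agent lines followed by one rule form a single group.
    lines = [f"User-agent: {agent}" for agent in blocked_agents]
    lines.append("Disallow: /")
    lines.append("")  # a blank line separates groups
    lines.append("User-agent: *")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: check a URL against the rules with the
    stdlib parser, as a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(AI_CRAWLERS)
    print(robots)
    # An AI crawler is refused; an ordinary browser-like agent is not.
    print(is_allowed(robots, "GPTBot/1.0", "https://example.org/search?q=cat"))
    print(is_allowed(robots, "Mozilla/5.0", "https://example.org/search?q=cat"))
```

Grouping all blocked agents over a single `Disallow: /` keeps the file short; every listed crawler receives the same rule, while the `*` group leaves regular crawlers unaffected.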

@zackkrida zackkrida added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 🔒 staff only Restricted to staff members 🧱 stack: frontend Related to the Nuxt frontend 🗄️ aspect: data Concerns the data in our catalog and/or databases labels Mar 12, 2024
@sarayourfriend sarayourfriend self-assigned this Apr 9, 2024