robots.txt for AI-related crawlers and bots #3900
Labels
🗄️ aspect: data - Concerns the data in our catalog and/or databases
✨ goal: improvement - Improvement to an existing user-facing feature
🟨 priority: medium - Not blocking but should be addressed soon
🧱 stack: frontend - Related to the Nuxt frontend
🔒 staff only - Restricted to staff members
Problem
We currently get a significant amount of bot traffic that, while not currently disrupting service, does place considerable load on our servers.
This primarily comes in the form of frontend, client-side searches.
One recent example is https://imagesift.com/, a reverse image search site run by the AI company https://thehive.ai/.
Description
Consider adding new robots.txt rules to block the majority of these platforms. This scraping behavior violates our terms of service; these users should instead be using our API, which enforces throttling rules.
This blog post from Neil Clarke shows some examples:
https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website
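As a rough sketch, the rules could look like the following. The user agent names below (GPTBot, CCBot, ImagesiftBot, Google-Extended, anthropic-ai) are drawn from commonly published lists of AI crawlers, including the blog post above, and should be verified against each vendor's current documentation before shipping:

```
# OpenAI's training-data crawler
User-agent: GPTBot
Disallow: /

# Common Crawl's bot; its corpus is widely used for AI training
User-agent: CCBot
Disallow: /

# Hive's crawler, which feeds the imagesift.com reverse image search
User-agent: ImagesiftBot
Disallow: /

# Google's opt-out token for AI training (does not affect Search indexing)
User-agent: Google-Extended
Disallow: /

# Anthropic's training-data crawler
User-agent: anthropic-ai
Disallow: /
```

Note that robots.txt is purely advisory: badly behaved bots can ignore it, so server-side rate limiting or user-agent blocking may still be needed for crawlers that do not comply.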