dmitrizzle/disallow-ai

Maintained robots.txt disallow list for known AI scraper tools

Opt your web properties out from known AI scraper tools.

yarn add disallow-ai or npm i disallow-ai

This is an opinionated, maintained list of known user agents used by scraper bots that train AI models on web content.

This package helps webmasters automatically opt a property's content out of AI/machine-learning model training while keeping it visible to search engines and productivity tools. It's optimized for Node.js servers (e.g., Express or Next.js), but you can also copy-paste the contents of src/robots.txt directly into the robots.txt file on any web server.
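
For a sense of the output, the generated file is a series of per-agent disallow rules, roughly like the following. The agent names here are illustrative examples only, not the full maintained list:

# Illustrative entries; the real list is longer and maintained in src/robots.txt
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /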

API.

  • printRobotsTXT(options) returns the robots.txt rules as a plain-text string (one disallow block per listed agent, like the snippet above).
    • options.path pass a value to this key if you want the disallow path to be something other than /.
  • userAgents is a direct reference to an array of objects with all the user agent info (see the sketch after this list).
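
A minimal sketch using both exports. The exact shape of each userAgents entry isn't documented above, so this only counts the array; treat any deeper field access as an assumption to verify against the source:

const { printRobotsTXT, userAgents } = require("disallow-ai");

// Generate rules that disallow a sub-path instead of the site root.
console.log(printRobotsTXT({ path: "/private" }));

// userAgents is a plain array, so standard array methods work on it.
console.log(userAgents.length, "known AI user agents");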

Example.

const express = require("express");
const { printRobotsTXT } = require("disallow-ai");

const server = express();

// Serve the generated disallow list as plain text at /robots.txt.
server.get("/robots.txt", (req, res, next) => {
    res.type("text/plain");
    res.send(printRobotsTXT());
});

server.listen();

You can run an example server with node ./example/server.js.
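
Since the package also mentions Next.js, here is a hedged sketch of the equivalent wiring using a Next.js App Router Route Handler. The file location app/robots.txt/route.js and this setup are assumptions for illustration, not part of the package:

// app/robots.txt/route.js (assumed location for this sketch)
import { printRobotsTXT } from "disallow-ai";

// Responds to GET /robots.txt with the generated disallow list.
export function GET() {
    return new Response(printRobotsTXT(), {
        headers: { "Content-Type": "text/plain" },
    });
}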

Sources.

https://darkvisitors.com/

Contributions welcome.
