Feature Description
Since AI scrapers are terrorizing the web and flooding innocent Gitea instances, it would make sense to have an option to only allow expensive endpoints (like `/src/commit` or `/blame`) for logged-in users.

What I have observed is that crawlers like ClaudeBot and Bytespider don't respect my robots.txt and decide to crawl every single file from every single commit. For big repositories this can become a massive performance hit, since Gitea has to run git to serve these requests, which has a lot of overhead. I even enabled a Redis cache, but since they hit new files all the time it didn't help much.
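For reference, the kind of robots.txt rules these crawlers ignore would look roughly like the sketch below. The paths are an assumption based on Gitea's usual `/{owner}/{repo}/src/commit/...` and `/{owner}/{repo}/blame/...` URL layout, and wildcard support varies by crawler:

```text
# Sketch of robots.txt rules targeting the expensive endpoints;
# the crawlers described above ignore rules like these anyway.
User-agent: *
Disallow: /*/*/src/commit/
Disallow: /*/*/blame/
```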
As a workaround, I have configured my nginx reverse proxy to redirect these endpoints to an Anubis instance (https://anubis.techaro.lol/), which seems to kill most of the scrapers, or at least wastes their time long enough to make their DDoS (because that's what it is, really!) less annoying.
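A minimal sketch of that kind of nginx setup is below. The hostnames, ports, and the exact path regex are illustrative assumptions rather than the exact values from my config; the idea is that only the expensive endpoints go through the Anubis challenge, which then forwards verified clients on to Gitea:

```nginx
# Sketch: route only the expensive endpoints through Anubis, everything else
# straight to Gitea. Ports and hostnames below are placeholders.
upstream gitea  { server 127.0.0.1:3000; }   # Gitea's default HTTP port
upstream anubis { server 127.0.0.1:8923; }   # wherever Anubis listens

server {
    listen 443 ssl;
    server_name git.example.com;

    # Commit views and blame pages: send through the Anubis challenge first.
    location ~ ^/[^/]+/[^/]+/(src/commit|blame)/ {
        proxy_pass http://anubis;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Everything else goes directly to Gitea.
    location / {
        proxy_pass http://gitea;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```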
However, since this solution works by proxying with nginx, every user sees the Anubis challenge before being able to look at commits, even if they are logged in. Therefore it would be preferable to have a native option that restricts these endpoints to logged-in users. If someone external wants to look at the commits, they can just check out the repository and look at the history there.
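Until such an option exists, the closest proxy-level approximation I can think of is sketched below: deny the expensive endpoints to clients that carry no Gitea session cookie at all. This assumes Gitea's default session cookie name (`i_like_gitea`), and a cookie by itself doesn't prove the user is actually logged in, so it is only a rough stopgap, not the requested feature:

```nginx
# Rough proxy-level approximation of the requested option, NOT the feature
# itself. Assumes Gitea's default session cookie name "i_like_gitea"; a
# session cookie does not prove a login, so this only turns away clients
# that never establish a session at all.

# http context: no cookie -> treat the client as anonymous
map $cookie_i_like_gitea $anonymous {
    ""      1;
    default 0;
}

server {
    # ... existing listen/server_name/proxy settings ...

    # Deny the expensive, git-backed endpoints to cookie-less clients.
    location ~ ^/[^/]+/[^/]+/(src/commit|blame)/ {
        if ($anonymous) {
            return 403;
        }
        proxy_pass http://127.0.0.1:3000;
    }
}
```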
Screenshots
No response