Feature request: @block_robots decorator for views #42

Open
groovecoder opened this issue Aug 27, 2015 · 3 comments

Comments
@groovecoder

It would be nice if django-robots included a decorator to block robots from views based on User-agent (like robots.txt). It would help Django apps outright prevent robots - even misbehaving ones that don't follow robots.txt - from accessing views that they shouldn't.

@SalahAdDin

👍

@yakky
Member

yakky commented Dec 22, 2015

I think it's out of the scope of this application.
IMHO a decorator that blocks "rogue" robots from accessing a view is an application in itself, as you would need to implement and maintain the list of robot UA strings (and even then I doubt "rogue" robots use a specific UA string).

@some1ataplace

some1ataplace commented Mar 27, 2023

1. In the django-robots project directory, create a new file called block_robots.py with the following code:
import re
from functools import wraps
from django.http import HttpResponseForbidden

def block_robots(view_func):
    @wraps(view_func)
    def _wrapped_view(request, *args, **kwargs):
        # Update the list of blocked user agents accordingly
        blocked_agents = [
            'Googlebot',
            'Bingbot',
            'Slurp',
            'DuckDuckBot',
            'Baiduspider',
            'YandexBot',
            'Sogou',
            'Exabot',
            'Facebot',
            'ia_archiver'
        ]
        user_agent = request.META.get('HTTP_USER_AGENT', "")

        if any(re.search(agent, user_agent, re.IGNORECASE) for agent in blocked_agents):
            return HttpResponseForbidden("Forbidden for robots")

        return view_func(request, *args, **kwargs)
    return _wrapped_view

2. Now you can use the @block_robots decorator in your views.py:
from django.http import HttpResponse
from .block_robots import block_robots

@block_robots
def my_protected_view(request):
    return HttpResponse("This view is protected from robots.")

This code defines a block_robots decorator that first checks whether the User-agent of the incoming request matches any of the blocked agents in the list. If a match is found, an HTTP 403 Forbidden response is returned. If no match is found, the request is allowed to continue to the wrapped view.
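
Below is a minimal sketch, written as a Django test, of how that behaviour could be verified; the module layout (views.py and tests.py in the same app) and the user-agent strings are assumptions on my part, not part of the original suggestion:

# tests.py
from django.test import RequestFactory, SimpleTestCase

from .views import my_protected_view

class BlockRobotsTests(SimpleTestCase):
    def setUp(self):
        self.factory = RequestFactory()

    def test_bot_user_agent_is_forbidden(self):
        # A crawler user-agent should be rejected with 403 before the view runs.
        request = self.factory.get(
            '/protected/',
            HTTP_USER_AGENT='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
        )
        self.assertEqual(my_protected_view(request).status_code, 403)

    def test_browser_user_agent_is_allowed(self):
        # An ordinary browser user-agent should reach the wrapped view.
        request = self.factory.get(
            '/protected/',
            HTTP_USER_AGENT='Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0',
        )
        self.assertEqual(my_protected_view(request).status_code, 200)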

Feel free to customize the list of blocked agents according to your requirements. The code uses regular expressions with partial, case-insensitive matching, so you can include regex patterns in the blocked agents list as needed.

Remember that even though this workaround prevents misbehaving bots from accessing your views, the ideal method of restricting access is still employing a properly configured robots.txt file.


Here is sample code for an alternative approach that blocks robots from views based on a user-agent list defined in settings:

# views.py
from django.http import HttpResponse, HttpResponseForbidden
from django.conf import settings
# Note: django-robots does not ship a view decorator; check_robots_txt below is
# a placeholder for one you would provide yourself.
from django_robots.decorators import check_robots_txt

def my_view(request):
    # view logic here
    return HttpResponse('This is my view!')

@check_robots_txt
def my_view_with_robot_block(request):
    if robot_blocked(request.META.get('HTTP_USER_AGENT', '')):
        return HttpResponseForbidden()
    # view logic here
    return HttpResponse('This is my view with robot block!')


def robot_blocked(user_agent):
    # Substring match so e.g. 'googlebot' matches a full Googlebot user-agent string.
    blocked_robots = getattr(settings, 'BLOCKED_ROBOTS', [])
    user_agent = user_agent.lower()
    return any(robot in user_agent for robot in blocked_robots)

You would need to define the BLOCKED_ROBOTS list in your Django settings file with the user-agent substrings of the robots you want to block. The @check_robots_txt decorator is included only to illustrate a view that also respects robots.txt; django-robots itself does not provide such a decorator, so either implement one yourself or leave it off. You could apply such a decorator to any view you want to respect the robots.txt file, even if it doesn't need to block robots.

Here's an example of how you could define the BLOCKED_ROBOTS in your Django settings file:

settings.py

BLOCKED_ROBOTS = [
    'googlebot',
    'bingbot',
    'yahoo',
    # add more robots here as needed
]

Note that this example is case-insensitive and matches substrings, so any user-agent string containing "googlebot" will be blocked, regardless of whether it's spelled in uppercase or lowercase letters. If you want to make it case-sensitive, remove the lower() call in the robot_blocked function.
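
If you prefer the decorator style from the first suggestion, the same settings-driven check can be wrapped up so individual views don't repeat it. This is only a rough sketch of mine; the block_robots name mirrors the earlier example and is not something django-robots provides:

# decorators.py
from functools import wraps

from django.conf import settings
from django.http import HttpResponseForbidden

def block_robots(view_func):
    @wraps(view_func)
    def _wrapped_view(request, *args, **kwargs):
        user_agent = request.META.get('HTTP_USER_AGENT', '').lower()
        blocked_robots = getattr(settings, 'BLOCKED_ROBOTS', [])
        # Substring match against the BLOCKED_ROBOTS entries from settings.
        if any(robot in user_agent for robot in blocked_robots):
            return HttpResponseForbidden('Forbidden for robots')
        return view_func(request, *args, **kwargs)
    return _wrapped_view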
