Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary get_token_info #3514

Closed
sarayourfriend opened this issue Dec 11, 2023 · 4 comments · Fixed by #3528
Closed

Remove unnecessary get_token_info #3514

sarayourfriend opened this issue Dec 11, 2023 · 4 comments · Fixed by #3528
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users help wanted Open to participation from the community 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API

Comments

@sarayourfriend
Copy link
Contributor

sarayourfriend commented Dec 11, 2023

Current Situation

get_token_info entirely duplicates the functionality of the Oauth2 provider backend we use. All the information from TokenInfo is available from an authenticated request by way of DRF's implementation.

The Oauth2 provider sets auth (by way of setting access_token) on the DRF request object to the access_token object, which already has application and user populated. The validator also calls access_token.is_valid which already checks the token expiry and does not populate request.auth if it is expired. Every aspect of get_token_info is an unnecessary duplication of what is already happening in the oauth request authentication.

DRF sets auth based on access_token here: https://github.com/encode/django-rest-framework/blob/0f39e0124d358b0098261f070175fa8e0359b739/rest_framework/request.py#L391

request.auth.application contains all the information that the TokenInfo object get_token_info returns has (and more!).

Most routes use the default throttle classes (only thumbnails and a select few others have special throttles, everything else including all search endpoints use the default throttle classes).

There are 7 default throttle classes that run on these requests: burst and sustained for each anon, standard oauth and enhanced oauth (6), in addition to the exempt oauth throttle. Each of these call get_auth_token (through the base class's has_valid_token method). Each call to get_auth_token causes up to two database queries but at minimum one. Each of these queries will be identical each time a throttle calls get_auth_token for a given request. That is, the response will be exactly the same. Replacing these usages with request.auth.<whatever> instead of using the token info object from get_auth_token will remove at least 7 unnecessary additional database queries to every single search request (wow!).

Suggested Improvement

Remove get_token_info and replace all usages of it with request.auth. request.auth will be an instance of AccessToken with application populated with the ThrottledApplication and the User instance for the token. Replace calls to token_info.<whatever> with request.auth.<whatever>. If request.auth is None, then either no token existed on the request or it was invalid. Nothing else needs to be checked.

We can also replace checks in serializers request.user and request.user.is_anonymous with just request.auth is not None, which should simplify the understanding of authentication in those code paths.

Benefit

Remove a significant number of unnecessary queries in the throttle classes that happen on almost every single request.

Also we get to delete code, yay!

@sarayourfriend sarayourfriend added help wanted Open to participation from the community 🟨 priority: medium Not blocking but should be addressed soon 💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: api Related to the Django API labels Dec 11, 2023
@kb-0311
Copy link
Contributor

kb-0311 commented Dec 14, 2023

@sarayourfriend can you please assign this to me?

@kb-0311
Copy link
Contributor

kb-0311 commented Dec 14, 2023

@sarayourfriend I was making changes to the application. To replace the function calls to get_token_info with request.auth.application I made some changes to remove all the assertion errors during test for eg this is the code I wrote:

if request.auth:
            if not "application" in str(request.auth) or not request.auth.application:
                return None
            client_id, _, verified = (request.auth.application.client_id,request.auth.application.rate_limit_model , request.auth.application.verified)
            if client_id and verified:
                return None

All the auth test cases passed however there were certain assertion failures during testing :
Screenshot from 2023-12-14 11-53-31

And the checks that were failed due to this code were all because of this assertion error
Screenshot from 2023-12-14 11-52-55

@kb-0311
Copy link
Contributor

kb-0311 commented Dec 14, 2023

@sarayourfriend WHat if instead of removing the get_token_info() calls we just add a check within that function itself to see if the request.auth.application contains the .<whatever> we want to fetch and pre-mature return it before the network call? This will minimize the code changes needed and optimize network calls the main purpose of this issue?

for eg:

def get_token_info(token: str):
    """
    Recover an OAuth2 application client ID and rate limit model from an access token.

    :param token: An OAuth2 access token.
    :return: If the token is valid, return the client ID associated with the
    token, rate limit model, and email verification status as a tuple; else
    return ``(None, None, None)``.
    """
    logger = parent_logger.getChild("get_token_info")
    
    CHECK FOR WHETHER THE request.auth.application HAS (client_id, tare_limit_model, verified) TUPLE IF YES return (...request.auth.application.<whatever>)
    try:
        token = AccessToken.objects.get(token=token)
    except AccessToken.DoesNotExist:
        return _no_result

    try:
        application = models.ThrottledApplication.objects.get(accesstoken=token)

@sarayourfriend
Copy link
Contributor Author

The 429 response is a rate limited response, so I don't think it has to do with the change you've made to remove get_auth_token. It might need to work off of @dhruvkb's PR in #3505 where he's fixed the throttling in the unit tests.

I understand the change to get_token_info and see how it is tempting, but I think it is ultimately code smell: there are no cases where request.auth should be empty unless we expect it to be empty, in which case we should be anticipating that by checking hasattr(request, "auth"). There are essentially zero cases where request.auth would be empty but there was anything we could reasonably do to get an AccessToken object from the request. Either the Oauth2 middleware/authentication class already retrieved that information or it didn't, either becasue the bearer token did not exist or it was invalid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users help wanted Open to participation from the community 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants