API client pagination - can't use 'x-total-pages' for GitLab.com #41

Closed
cdboer opened this issue May 9, 2021 · 2 comments · Fixed by #43
Labels
bug Something isn't working
Milestone

Comments

@cdboer
Collaborator

cdboer commented May 9, 2021

The API client does not request more than the first page of some resources for projects hosted on GitLab.com.
The pagination approach relies on the x-total or x-total-pages header being present in the response.

The GitLab API docs state the following about x-total and x-total-pages:

> For performance reasons, if a query returns more than 10,000 records, GitLab doesn’t return the following headers:
>
> - x-total.
> - x-total-pages.
> - rel="last" link.

If both headers are missing from the response, gitlab2prov naively assumes that the requested resource has only one page.
Not all GitLab.com projects are affected. Updates will follow.

See also this section in the official GitLab documentation.
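A minimal sketch of a check that does not rely solely on the total-count headers: it uses x-total-pages when GitLab sends it and otherwise falls back to x-next-page, which is not among the omitted headers quoted above. The helper name next_page_number and the plain dict of lowercase header names are assumptions for illustration, not gitlab2prov's actual API.

```python
from typing import Mapping, Optional


def next_page_number(headers: Mapping[str, str], current_page: int) -> Optional[int]:
    """Return the next page number, or None if current_page is the last page.

    Hypothetical helper: expects lowercase header names.
    """
    total_pages = headers.get("x-total-pages")
    if total_pages:
        # GitLab reported the total page count: compare against it directly.
        return current_page + 1 if current_page < int(total_pages) else None
    # Fallback: x-total and x-total-pages are omitted for queries with more
    # than 10,000 records, but x-next-page is not (it is empty on the last page).
    next_page = headers.get("x-next-page")
    return int(next_page) if next_page else None


print(next_page_number({"x-next-page": "5"}, 4))  # 5
```

If neither header is present, the sketch still treats the page as the last one, so the single-page assumption only remains as a final fallback.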

cdboer added the bug label May 9, 2021
@cdboer
Collaborator Author

cdboer commented May 16, 2021

This issue requires a rework of the gitlab2prov.api.request_handler class. We can no longer rely on the x-total-pages header and therefore cannot create all request coroutines a priori. Instead, a request coroutine could check whether more pages are left and, if so, create a coroutine for the next page and add it to a centralized queue.

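```python
from typing import Optional

import aiohttp
from yarl import URL


class RequestHandler:

    def __init__(self, client: Optional[aiohttp.ClientSession] = None):
        self.client = client
        # Request coroutines for pages discovered so far.
        self.queue = []

    async def request_page(self, url: URL, page_number: int):
        # Set (or replace) the "page" query parameter.
        url = url.update_query({"page": page_number})
        async with self.client.get(url) as resp:
            data = await resp.json()
            # x-next-page is still sent when x-total-pages is omitted;
            # it is empty on the last page.
            next_page = resp.headers.get("x-next-page")
        if next_page:
            # More pages are left: queue a request for the next one.
            self.queue.append(self.request_page(url, int(next_page)))
        return data
```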
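The queue-based idea can be exercised without a GitLab instance. This sketch is hypothetical: QueueingRequestHandler, FAKE_PAGES and fetch stand in for the real handler and HTTP client, and the driver loop in request_all is one possible way to drain the centralized queue, not the gitlab2prov implementation.

```python
import asyncio
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory stand-in for a paginated endpoint:
# page number -> (records on that page, value of the x-next-page header).
FAKE_PAGES: Dict[int, Tuple[List[str], str]] = {
    1: (["a", "b"], "2"),
    2: (["c", "d"], "3"),
    3: (["e"], ""),  # last page: empty x-next-page
}


class QueueingRequestHandler:
    """Sketch: each page request queues the request for the next page."""

    def __init__(self) -> None:
        self.queue: List[Any] = []  # pending request coroutines

    async def fetch(self, page: int):
        # Stand-in for an HTTP GET; returns (JSON body, response headers).
        await asyncio.sleep(0)
        records, next_page = FAKE_PAGES[page]
        return records, {"x-next-page": next_page}

    async def request_page(self, page: int):
        records, headers = await self.fetch(page)
        next_page = headers.get("x-next-page")
        if next_page:
            # Another page exists: queue its request on demand.
            self.queue.append(self.request_page(int(next_page)))
        return records

    async def request_all(self, first_page: int = 1):
        self.queue.append(self.request_page(first_page))
        results = []
        while self.queue:
            # Run everything queued so far; finished coroutines may queue more.
            batch, self.queue = self.queue, []
            for page_records in await asyncio.gather(*batch):
                results.extend(page_records)
        return results


print(asyncio.run(QueueingRequestHandler().request_all()))
# ['a', 'b', 'c', 'd', 'e']
```

Because each response only reveals the next page number, the pages of a single resource are still discovered one after another; the queue mainly keeps scheduling in one place, so one loop can drain requests for several resources at once.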

@cdboer
Collaborator Author

cdboer commented May 16, 2021

In the future we should move to keyset-based pagination, as GitLab is gradually adding support for it to its API endpoints. This should reduce response times for requests to bigger projects.
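As a rough sketch of keyset-based pagination: the client requests the first page with pagination=keyset and then follows the rel="next" URL from the Link header until none is returned, so no page counts are needed at all. The query parameters follow the keyset example for the /projects endpoint in the GitLab REST API docs; which endpoints support keyset pagination, and which order_by values they accept, varies, so treat them as assumptions. gitlab.example.com is a placeholder and both helper names are hypothetical. The regular expression assumes rel directly follows each URL, as in the docs' examples; it is not a full RFC 8288 parser.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Matches one Link header entry such as: <https://...>; rel="next"
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def first_keyset_url(base: str, per_page: int = 100) -> str:
    """Build the first request URL (assumed parameters, see lead-in)."""
    params = {"pagination": "keyset", "per_page": per_page, "order_by": "id", "sort": "asc"}
    return f"{base}?{urlencode(params)}"


def next_keyset_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


print(first_keyset_url("https://gitlab.example.com/api/v4/projects", per_page=50))
# https://gitlab.example.com/api/v4/projects?pagination=keyset&per_page=50&order_by=id&sort=asc
```

A request loop would call first_keyset_url once and then keep requesting whatever next_keyset_url returns for each response's Link header, stopping at None.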

cdboer added this to the 0.5 milestone May 16, 2021
cdboer added a commit that referenced this issue May 16, 2021
Rewrite of RequestHandler:

Requests are now queued on demand.
Added a pagination fallback to the 'x-next-page' header so as not to rely solely on 'x-total-pages'.
Added an initial exception system (see TODO notes). Exceptions are meant to be handled in the GitlabClient methods.