
Async support #1538

Open
OlegYurchik opened this issue May 24, 2020 · 19 comments

Comments

@OlegYurchik

What do you think about PyGithub async support? If I add async support to PyGithub, would you accept it?

@s-t-e-v-e-n-k
Collaborator

In what way would you add in async support? Given most operations are done remotely on GitHub and our code is waiting for a response or a JSON blob back, how would it help?

@victorgveloso

Given most operations are done remotely on GitHub and our code is waiting for a response

That's a good reason for introducing async! I mean... PyGithub should handle API access as coroutines; otherwise, most of the time it's waiting for the server's response instead of doing something useful. It's a common concept in Node.js and it's currently supported by @octokit/rest.js.
I'm surprised nobody asked for this feature before...

Note: maybe I misunderstood how PyGithub currently works, but I think sequential, synchronous API wrappers are less efficient than async ones.

@OlegYurchik
Author

I could add a new AsyncRequester class and add asynchronous methods to all the other classes for interacting with it, while preserving the existing logic.

Asynchronous code could help in tasks where you need to issue a large number of queries, for example in a search. It would also let you work quickly with multiple accounts on GitHub.
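
To make the idea concrete, here is a rough sketch of what such an AsyncRequester could look like; the class name, the httpx dependency, and the method names are my own assumptions, not PyGithub's actual internals:

```python
# Hypothetical sketch only -- not PyGithub's real Requester API.
import asyncio
import httpx

class AsyncRequester:
    """Minimal async counterpart to PyGithub's Requester (names assumed)."""

    def __init__(self, token: str, base_url: str = "https://api.github.com"):
        self._client = httpx.AsyncClient(
            base_url=base_url,
            headers={"Authorization": f"token {token}"},  # "<token>" is a placeholder
        )

    async def request_json(self, verb: str, url: str, **kwargs) -> dict:
        # Issue the request without blocking the event loop.
        response = await self._client.request(verb, url, **kwargs)
        response.raise_for_status()
        return response.json()

    async def aclose(self) -> None:
        await self._client.aclose()

async def main() -> None:
    requester = AsyncRequester("<token>")
    # Independent calls can now overlap instead of running one after another.
    repo, user = await asyncio.gather(
        requester.request_json("GET", "/repos/PyGithub/PyGithub"),
        requester.request_json("GET", "/users/OlegYurchik"),
    )
    print(repo["full_name"], user["login"])
    await requester.aclose()

asyncio.run(main())
```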

@s-t-e-v-e-n-k
Collaborator

Which sounds like a complete redesign, along with adding support for utilising multiple accounts. I love your enthusiasm, but I think it's an awful lot of work for not enough gain.

@victorgveloso

victorgveloso commented May 25, 2020

It might involve a lot of work, but I'd love to see it implemented. Currently I'm stuck with JS in my research because @octokit/rest.js's performance beats PyGithub's. If some help is wanted, I would be glad to work on this too.

Btw, I think multiple-accounts support would be too much! Isn't async alone a tremendous first step towards performance gains?

@OlegYurchik
Author

Maybe I can make some edits and show them in a pull request, as a test?

@abhn

abhn commented Jun 6, 2020

Asyncio sounds like a good idea given my use case. I am trying to read all files in a repository recursively, and synchronous requests are just too slow (I might be missing something like rate limiting on GitHub's API, but we could definitely make such operations faster).
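
For what it's worth, until the library itself is async, one workaround is to keep PyGithub's blocking calls but fan them out over a thread pool. A rough sketch, with the repository name and token as placeholders (rate limits still apply):

```python
# Workaround sketch: parallelise PyGithub's blocking calls with threads.
from concurrent.futures import ThreadPoolExecutor
from github import Github

gh = Github("<token>")  # placeholder token
repo = gh.get_repo("PyGithub/PyGithub")  # example repository

def list_files(path=""):
    """Recursively collect file paths, descending into subdirectories in parallel."""
    entries = repo.get_contents(path)
    files = [e.path for e in entries if e.type == "file"]
    dirs = [e.path for e in entries if e.type == "dir"]
    if dirs:
        with ThreadPoolExecutor(max_workers=8) as pool:
            for sub_files in pool.map(list_files, dirs):
                files.extend(sub_files)
    return files

print(len(list_files()))
```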

@s-t-e-v-e-n-k
Collaborator

I would strongly suggest using something like GitPython for that rather than requesting everything via the GitHub API.

@abhn

abhn commented Jun 6, 2020

Thanks for the interesting suggestion, I'll give it a try as it does make sense to do it that way.

@basnijholt

I don't think this issue should be closed. Using asyncio for IO-bound operations is the perfect use-case.

@victorgveloso

victorgveloso commented Jun 27, 2021

This short YouTube video demonstrates how asyncio could improve HTTP request performance in Python programs: https://youtu.be/m_a0fN48Alw

Please @OlegYurchik, reopen this issue!

@Hyperx837

Hyperx837 commented Sep 17, 2021

For my use case, I'm creating a Discord bot with a few commands that interact with GitHub. discord.py has an event-based asyncio system, so any library that doesn't support async/await is unusable there. Even outside this kind of use case, asyncio's advantage is no smaller: for example, if getting the content of the first comment of an issue takes 1 second, processing 100 issues sequentially would take 100 seconds, whereas with asyncio those 100 requests would take 2-3 seconds, about 98 seconds faster than the current implementation with no extra resource bottleneck. That said, I really don't have much of a performance problem in my case; the real issue is that we can't use blocking packages inside async frameworks like discord.py and FastAPI, because at a higher scale the program might miss events while it's blocked.
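
As an illustration of both points, blocking PyGithub calls can already be pushed off the event loop with asyncio.to_thread and gathered concurrently; the repository, token, and issue numbers below are placeholders:

```python
# Sketch only: wrap PyGithub's blocking calls so an asyncio event loop
# (e.g. discord.py or FastAPI) never blocks, and run 100 of them concurrently.
import asyncio
from github import Github

gh = Github("<token>")  # placeholder token
repo = gh.get_repo("PyGithub/PyGithub")  # example repository

def first_comment_body(issue_number: int):
    """Blocking helper: fetch the body of the first comment on an issue."""
    comments = repo.get_issue(issue_number).get_comments()
    return comments[0].body if comments.totalCount else None

async def main() -> None:
    # Sequentially this is ~100 round trips; gathered in threads they overlap.
    bodies = await asyncio.gather(
        *(asyncio.to_thread(first_comment_body, n) for n in range(1, 101))
    )
    print(sum(b is not None for b in bodies))

asyncio.run(main())
```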

@dineshr93

Especially all the PaginatedLists of various objects (issues, pull requests, etc.) could be made faster with HTTPX and asyncio.

This short YouTube video demonstrates how asyncio could improve HTTP request performance in Python programs: https://youtu.be/m_a0fN48Alw

Please @OlegYurchik, reopen this issue!

@victorgveloso

Especially all the PaginatedList

Aren't PaginatedLists necessarily sequential? I mean, the point of paginated lists is querying only pages whose data you intend to use. How would you implement async PaginatedLists? What do you do when the pages you will need aren't known ahead of time? And what about situations where you want to iterate in a specific order?

In my opinion, methods returning PaginatedLists could take an optional parameter giving the number of pages to prefetch asynchronously. The returned object could be of another type (let's say it's named AsyncList) that implements PaginatedList's interface and hence could be used interchangeably. This solves the aforementioned problem by letting the user decide between performance and predictability.
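
A rough sketch of that idea, built on the existing PaginatedList.get_page() and threads; AsyncList and the prefetch parameter are hypothetical, not part of PyGithub:

```python
# Hypothetical sketch: prefetch the first N pages of a PaginatedList
# concurrently, while later pages are still fetched lazily on demand.
import asyncio
from github import Github

class AsyncList:
    def __init__(self, paginated_list, prefetch_pages: int = 0):
        self._source = paginated_list
        self._prefetch_pages = prefetch_pages
        self._cache = {}

    async def prefetch(self) -> None:
        # PaginatedList.get_page() is blocking, so push each call to a thread.
        pages = await asyncio.gather(
            *(asyncio.to_thread(self._source.get_page, i)
              for i in range(self._prefetch_pages))
        )
        self._cache = dict(enumerate(pages))

    def get_page(self, index: int):
        # Serve prefetched pages from the cache, fall back to lazy fetching.
        return self._cache.get(index) or self._source.get_page(index)

async def main() -> None:
    repo = Github("<token>").get_repo("PyGithub/PyGithub")  # placeholders
    issues = AsyncList(repo.get_issues(state="all"), prefetch_pages=3)
    await issues.prefetch()
    print(len(issues.get_page(0)))

asyncio.run(main())
```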

@dineshr93

Especially all the PaginatedList

Aren't PaginatedLists necessarily sequential? I mean, the point of paginated lists is querying only pages whose data you intend to use. How would you implement async PaginatedLists? What do you do when the pages you will need aren't known ahead of time? And what about situations where you want to iterate in a specific order?

In my opinion, methods returning PaginatedLists could take an optional parameter giving the number of pages to prefetch asynchronously. The returned object could be of another type (let's say it's named AsyncList) that implements PaginatedList's interface and hence could be used interchangeably. This solves the aforementioned problem by letting the user decide between performance and predictability.

Yes, that's a good idea. Since this lib is slow for big GitHub projects, in addition I am using ghapi, where we can query issues and commits using the page= and per_page= parameters. Just in case it helps anyone.
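
If I remember ghapi's generated interface correctly, explicit pagination looks roughly like this (owner, repo, and token are placeholders; check ghapi's docs for the exact operation names):

```python
# Sketch, assuming ghapi's generated operation names mirror the REST API.
from ghapi.all import GhApi

api = GhApi(token="<token>")  # placeholder token
# Fetch one specific page of issues, 100 items per page.
issues = api.issues.list_for_repo(
    owner="PyGithub", repo="PyGithub", state="all", per_page=100, page=2
)
print(len(issues))
```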

@OlegYurchik reopened this Apr 20, 2022
@ahonnecke

I'd like to see this implemented and would be interested in helping.

@basnijholt

There seem to be some alternatives now for async:

@samj1912

I have been relying on https://github.com/yanyongyu/githubkit, which is the most feature-complete of the Python GitHub libraries. It has a fully typed API using Pydantic and supports both sync and async requests using httpx. It works wonderfully with something like FastAPI.

@Ousret

Ousret commented May 9, 2024

Hello!

There's an opportunity that can be considered.
As this project is tied to Requests, you may look at Niquests.

Why?

  • Drop-in & faster replacement for Requests
  • Both synchronous and asynchronous interfaces, perfectly synchronized
  • HTTP/2 by default with proper multiplexing

You could generate the async code from your existing sync code, thus avoiding the "extra" maintenance burden.
I am willing to help.
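
For reference, a minimal sketch of how Niquests' paired sync and async interfaces look, if I read its docs correctly; the endpoint is just an example and this is not PyGithub code:

```python
# Sketch only: Niquests mirrors the Requests API and adds an async session.
import asyncio
import niquests

def fetch_sync(url: str) -> dict:
    # Synchronous call, drop-in for requests.get().
    return niquests.get(url).json()

async def fetch_async(url: str) -> dict:
    # Asynchronous call using Niquests' AsyncSession.
    async with niquests.AsyncSession() as session:
        response = await session.get(url)
        return response.json()

if __name__ == "__main__":
    url = "https://api.github.com/repos/PyGithub/PyGithub"
    print(fetch_sync(url)["full_name"])
    print(asyncio.run(fetch_async(url))["full_name"])
```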
