
search_issues() result has totalCount maxed out to 1000 #1309

Open
dleach02 opened this issue Dec 5, 2019 · 12 comments

Comments


dleach02 commented Dec 5, 2019

The paginated list returned from search_issues() reports a totalCount that maxes out at 1000 when the search query matches more than 1000 items. The totalCount property of PaginatedList fetches headers and data via requester.requestJsonAndCheck() on the first URL. The logic then checks whether 'link' is absent from the headers; in my case a 'link' header is present, so it falls into the else branch and parses the 'page' parameter out of the last URL, which yields 1000.

The problem is that in my search, the original 'data' structure has a valid 'total_count' field of 3041, so I'm not sure why the logic tries to derive the count from the last URL, which produces an incorrect value.

And if I iterate through the returned PaginatedList I count only 1020 items, so I cannot iterate through all 3041. Note that I put rate-limiting code in the iteration loop that sleeps until get_rate_limit().search.remaining becomes nonzero, as sketched below.
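
Below is a minimal sketch of the behavior being reported; the token and query string are placeholders, since the original report's query was not given:

import time
from github import Github

g = Github("<token>")  # hypothetical token

# Hypothetical query; any search matching more than 1,000 items will do.
results = g.search_issues("repo:example/repo is:issue")

print(results.totalCount)  # reports 1000, even though total_count is 3041

count = 0
for issue in results:
    count += 1
    # The rate-limiting loop described above: sleep until the search
    # quota recovers before continuing the iteration.
    while g.get_rate_limit().search.remaining == 0:
        time.sleep(1)

print(count)  # stops at roughly 1020, well short of 3041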


dleach02 commented Dec 6, 2019

This may be some limit on the number of pages the search will support... pretty much all of my test runs return 1000, i.e. 10 pages of 100 entries.

djwgit commented Jan 29, 2020

Possible to set this to high priority? Thanks.


dleach02 commented Jan 29, 2020

> Possible to set this to high priority? Thanks.

@djwgit, not sure I'm following. Are you asking for this issue to be marked high priority, or are you suggesting there is something I could do in my code to make my search request high priority?

stale bot commented Mar 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Mar 29, 2020
dleach02 commented

I guess I still had an open question to @djwgit: I wasn't sure if they wanted me to mark this issue as high priority.

djwgit commented May 6, 2020

@dleach02
Sorry for the late reply, and thanks for opening this issue.
Currently I am working around it by querying several times over time ranges, then putting the results together. It would be nice to have this solved properly, without the time-slicing workaround.
Yes, it would be nice if this issue (#1309) could be marked as high priority (oh, @sfdye already did :-) ).
Thanks again for this awesome package!


dleach02 commented May 6, 2020

@djwgit

Yes, my workaround does the same thing: time-slice the search request. Time permitting, I was also going to investigate the v4 API to see if that would be easier... but I would like a nice Python wrapper ;-)
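
Below is a minimal sketch of the time-slicing workaround both commenters describe, assuming the query can be partitioned with the created: qualifier and that each slice stays under the 1,000-result cap; the token, dates, and window size are illustrative:

from datetime import date, timedelta
from github import Github

g = Github("<token>")  # hypothetical token

def search_issues_sliced(query, start, end, step_days=30):
    """Run the search over consecutive created-date windows and
    concatenate the results, so no single query hits the 1,000 cap."""
    results = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=step_days - 1), end)
        sliced = f"{query} created:{cursor.isoformat()}..{window_end.isoformat()}"
        results.extend(g.search_issues(sliced))
        cursor = window_end + timedelta(days=1)
    return results

issues = search_issues_sliced(
    "repo:example/repo is:issue", date(2019, 1, 1), date(2019, 12, 31)
)

Shrinking step_days is the knob to turn if a single window still matches more than 1,000 items.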

janeklb commented Oct 19, 2020

Just bumped into this too. I thought it was an issue with the PyGithub library, but it looks like it's just a GitHub API limitation:

> To satisfy that need, the GitHub Search API provides up to 1,000 results for each search.

https://docs.github.com/en/free-pro-team@latest/rest/reference/search
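
For illustration, the cap can be observed directly against the REST endpoint: asking for a page past the 1,000th result is rejected. This probe is an assumption about the raw API's behavior, not code from the thread:

import requests

# Request results 1001-1100; the search endpoint refuses once the
# requested page crosses the 1,000-result cap.
resp = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "language:python", "per_page": 100, "page": 11},
    headers={"Accept": "application/vnd.github+json"},
)
print(resp.status_code)  # 422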

psybers commented Jun 30, 2021

So just FYI, if I am understanding the bug correctly, I think there is a simple workaround:

results = g.search_repositories('test')
results.get_page(0)
print('total: ' + str(results.totalCount))

Basically, if you call get_page(0) before requesting the total count, totalCount shows the actual number of matches (e.g., 50k) rather than how many results the API will actually let you access (1k). At least that seems to work for me.

psybers commented Jun 30, 2021

What I would recommend the library author do is store the data["total_count"] value in another field and allow accessing it. That is the number of items matching the search query, which can be useful to people searching!

If the search returns fewer than 1k results, that value matches the totalCount value, so it is mostly useful when there are more than 1k results.
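
As an illustration of where that value lives, the raw search response already carries it in its body; this sketch reads it directly with requests, as an assumption about the REST API rather than PyGithub code:

import requests

resp = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "language:python", "per_page": 1},
    headers={"Accept": "application/vnd.github+json"},
)
# total_count is the full match count, independent of the 1,000-item
# cap on how many results can actually be paged through.
print(resp.json()["total_count"])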

vladimirtelepov commented

Any updates?

SaashaJoshi commented Nov 1, 2022

Hey,
The totalCount property for paginated lists still maxes out at 1000. It might be weird, but when I do the following, I encounter a bug:

>>> repos = g.search_repositories(query = "language:python")
>>> print(repos.totalCount)
1000
>>> print(repos[1].totalCount)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Repository' object has no attribute 'totalCount'
>>> print(repos.totalCount)
7924416

I am not sure what is happening! The number on the third try might be the correct value, but there is no way to verify it. Accessing repos[1] forces a page fetch, so this looks consistent with the get_page(0) observation above.
