
search_issues() result has totalCount maxed out to 1000 #1309

Open
dleach02 opened this issue Dec 5, 2019 · 12 comments

Comments


dleach02 commented Dec 5, 2019

The paginated list returned from search_issues() reports a totalCount that maxes out at 1000 when the search query matches more than 1000 items. The totalCount property of PaginatedList fetches headers and data via requester.requestJsonAndCheck() on the first URL. The logic then checks whether 'link' is absent from the headers; in my case a 'link' header is present, so it falls into the else branch and parses the 'page' parameter out of the last URL, which yields 1000.

The problem is that in my search, the original 'data' structure has a valid 'total_count' field of 3041, so I'm not sure why the logic tries to derive the count from the last URL, which produces an incorrect value.

And if I iterate through the returned PaginatedList I count only 1020 items, so I cannot iterate through all 3041. Note that I put rate-limiting code in the iteration loop that sleeps until get_rate_limit().search.remaining becomes nonzero, as sketched below.
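
Below is a minimal sketch of the behavior being reported; the token and query string are placeholders, since the original report's query was not given:

import time
from github import Github

g = Github("<token>")  # hypothetical token

# Hypothetical query; any search matching more than 1,000 items will do.
results = g.search_issues("repo:example/repo is:issue")

print(results.totalCount)  # reports 1000, even though total_count is 3041

count = 0
for issue in results:
    count += 1
    # The rate-limiting loop described above: sleep until the search
    # quota recovers before continuing the iteration.
    while g.get_rate_limit().search.remaining == 0:
        time.sleep(1)

print(count)  # stops at roughly 1020, well short of 3041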


dleach02 commented Dec 6, 2019

This may be some limit on the number of pages the search will support... pretty much all of my test runs return 1000, i.e. 10 pages of 100 entries.

djwgit commented Jan 29, 2020

Possible to set this to high priority? Thanks.


dleach02 commented Jan 29, 2020

> Possible to set this to high priority? Thanks.

@djwgit, not sure I'm following. Are you asking for this issue to be marked high priority, or are you suggesting there is something I could do in my code to make my search request high priority?

stale bot commented Mar 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Mar 29, 2020
dleach02 commented

I guess I still had an open question to @djwgit: I wasn't sure if they wanted me to mark this issue as high priority.

djwgit commented May 6, 2020

@dleach02
Sorry for the late reply, and thanks for opening this issue.
Currently I am working around it by querying several times over time ranges, then putting the results together. It would be nice to have this solved properly, without the time-slicing workaround.
Yes, it would be nice if this issue (#1309) could be marked as high priority (oh, @sfdye already did :-) ).
Thanks again for this awesome package!


dleach02 commented May 6, 2020

@djwgit

Yes, my workaround does the same thing: time-slice the search request. Time permitting, I was also going to investigate the v4 API to see if that would be easier... but I would like a nice Python wrapper ;-)
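
Below is a minimal sketch of the time-slicing workaround both commenters describe, assuming the query can be partitioned with the created: qualifier and that each slice stays under the 1,000-result cap; the token, dates, and window size are illustrative:

from datetime import date, timedelta
from github import Github

g = Github("<token>")  # hypothetical token

def search_issues_sliced(query, start, end, step_days=30):
    """Run the search over consecutive created-date windows and
    concatenate the results, so no single query hits the 1,000 cap."""
    results = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=step_days - 1), end)
        sliced = f"{query} created:{cursor.isoformat()}..{window_end.isoformat()}"
        results.extend(g.search_issues(sliced))
        cursor = window_end + timedelta(days=1)
    return results

issues = search_issues_sliced(
    "repo:example/repo is:issue", date(2019, 1, 1), date(2019, 12, 31)
)

Shrinking step_days is the knob to turn if a single window still matches more than 1,000 items.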

janeklb commented Oct 19, 2020

Just bumped into this too. I thought it was an issue with the PyGithub library, but it looks like it's just a GitHub API limitation:

> To satisfy that need, the GitHub Search API provides up to 1,000 results for each search.

https://docs.github.com/en/free-pro-team@latest/rest/reference/search
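
For illustration, the cap can be observed directly against the REST endpoint: asking for a page past the 1,000th result is rejected. This probe is an assumption about the raw API's behavior, not code from the thread:

import requests

# Request results 1001-1100; the search endpoint refuses once the
# requested page crosses the 1,000-result cap.
resp = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "language:python", "per_page": 100, "page": 11},
    headers={"Accept": "application/vnd.github+json"},
)
print(resp.status_code)  # 422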

psybers commented Jun 30, 2021

So just FYI, if I am understanding the bug correctly, I think there is a simple workaround:

results = g.search_repositories('test')
results.get_page(0)
print('total: ' + str(results.totalCount))

Basically, if you call get_page(0) before requesting the total count, totalCount shows the actual number of matches (e.g., 50k) rather than how many results the API will actually let you access (1k). At least that seems to work for me.

psybers commented Jun 30, 2021

What I would recommend the library author do is store the data["total_count"] value in another field and allow accessing it. That is the number of items matching the search query, which can be useful to people searching!

If the search returns fewer than 1k results, that value matches the totalCount value, so it is mostly useful when there are more than 1k results.
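
As an illustration of where that value lives, the raw search response already carries it in its body; this sketch reads it directly with requests, as an assumption about the REST API rather than PyGithub code:

import requests

resp = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "language:python", "per_page": 1},
    headers={"Accept": "application/vnd.github+json"},
)
# total_count is the full match count, independent of the 1,000-item
# cap on how many results can actually be paged through.
print(resp.json()["total_count"])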

vladimirtelepov commented

Any updates?

SaashaJoshi commented Nov 1, 2022

Hey,
The totalCount property for paginated lists still maxes out at 1000. It might be weird, but when I do the following, I encounter a bug:

>>> repos = g.search_repositories(query = "language:python")
>>> print(repos.totalCount)
1000
>>> print(repos[1].totalCount)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Repository' object has no attribute 'totalCount'
>>> print(repos.totalCount)
7924416

I am not sure what is happening! The number on the third try might be the correct value, but there is no way to verify it. Accessing repos[1] forces a page fetch, so this looks consistent with the get_page(0) observation above.
