Querying more than (500/5000) results in an automated way? #39
Comments
Can you provide me with the category you are looking to pull from? The category tree does get all records, but this could be something to add in.
I'm looking at this: Thanks, I gave categorytree a try as well, but strangely it only gives me a short list of pages (the entire first page and halfway through the second, up to Anemone altaica, which is 500 entries).
The categorymembers query also returns only 500 entries if I set `results` to anything higher than 500. I'm calling it like this:
Forgive me if I'm overlooking something obvious. If there's something that needs changing here, I'm also happy to contribute.
You should be able to get all the category members using the following:

```python
from mediawiki import MediaWiki

wiki = MediaWiki('http://practicalplants.org/w/api.php')
all_plant_names = wiki.categorymembers('Plants', results=None, subcategories=False)
```

After a bit of poking around, it seems as though pymediawiki is looking for

```python
if 'continue' not in raw_res or last_cont == raw_res['continue']:
    break
```

It may take me some time to get this working and to see if this is a long-term change or if `query-continue` is an older construct.
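The break condition above can be sketched against a fake paged API to show how the continuation token has to be carried forward between requests. This is an offline illustration only: `fake_api` and the 1200-title category are invented here, and the real API's token is an opaque string rather than an integer offset.

```python
def fake_api(params):
    """Simulate a MediaWiki categorymembers query, 500 titles per page."""
    all_titles = ["Plant %d" % i for i in range(1200)]  # invented category
    start = int(params.get("cmcontinue", 0))
    page = all_titles[start:start + 500]
    res = {"query": {"categorymembers": [{"title": t} for t in page]}}
    if start + 500 < len(all_titles):
        # The real API returns an opaque continuation dict until exhausted.
        res["continue"] = {"cmcontinue": start + 500, "continue": "-||"}
    return res

def all_category_members():
    titles, params, last_cont = [], {}, {}
    while True:
        raw_res = fake_api(params)
        titles += [m["title"] for m in raw_res["query"]["categorymembers"]]
        # Stop when the API stops returning a continuation token, or the
        # token no longer advances (guards against an infinite loop).
        if "continue" not in raw_res or raw_res["continue"] == last_cont:
            break
        last_cont = raw_res["continue"]
        params = dict(last_cont)  # carry the token into the next request
    return titles

print(len(all_category_members()))  # 1200
```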
@lovelaced I seem to have tracked down the issue to a difference between actions and props and how they handle the continuation. With this, if you set the number of results to return to None, it will pull back all the category members.
I pushed the change to PyPI as version 0.3.17; if you would upgrade pymediawiki and test again, that would be great! I actually used this category in the test suite.
@lovelaced I am going to close this issue. If something is still not working, please reopen and let me know!
* Add fix to use the `query-continue` parameter to continue to pull category members [issue #39](#39)
* Better handle large categorymember selections
* Add better handling of exception attributes including adding them to the documentation
* Correct the pulling of the section titles without additional markup [#42](#42)
* Handle memoization of unicode parameters in python 2.7
* ***Change default timeout*** for HTTP requests to 15 seconds
Hi,
I'm attempting to get the results of a query (categorymembers) where the number of results is more than 500. I'm cool with doing multiple queries, but is there a way to continue where I left off? I know the "cmcontinue" param is available in raw_res in the categorymembers function, but I'm not sure whether I can leverage it directly to get the results I want, or if I'm missing something.
For example, let's say I want to use the user default max (500) to get a list of all the pages that exist in a category, but there are 8000 pages in the category. Is it possible to loop a query to get all the pages?
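Looping the raw API is possible: each response carries a `continue` dict (containing `cmcontinue`) that you merge back into the next request's parameters until it stops appearing. A minimal sketch using only the standard library; the endpoint and category name come from this thread, and the generator is not a confirmed pymediawiki API:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "http://practicalplants.org/w/api.php"

def category_pages(category, limit=500):
    """Yield every page title in `category`, up to `limit` per request,
    following the continuation token until the API stops returning one."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:%s" % category,
        "cmlimit": limit,
        "format": "json",
    }
    while True:
        with urlopen("%s?%s" % (API, urlencode(params))) as resp:
            raw_res = json.load(resp)
        for member in raw_res["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in raw_res:
            break  # no token means we've reached the last page
        params.update(raw_res["continue"])  # carry cmcontinue forward

# e.g.: titles = list(category_pages("Plants"))
```

This is the same loop pymediawiki runs internally when `results=None`; the manual version is only needed if you want per-page control over the requests.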