categorymembers doesn't return files #97

Closed
tbm opened this issue Dec 18, 2020 · 7 comments · Fixed by #100

Comments

@tbm
Contributor

tbm commented Dec 18, 2020

I'd like to iterate through category pages. The API has categorymembers for this. An example query: https://en.wiktionary.org/w/api.php?action=query&list=categorymembers&cmtitle=Category%3ASwahili_lemmas&format=json

Is that something you could add?

@barrust
Owner

barrust commented Dec 18, 2020

Currently there is a categorymembers function that does, I believe, what you are looking for. Or is this something different?

@tbm
Contributor Author

tbm commented Dec 21, 2020

Sorry, I'm not sure how I missed it.

It doesn't seem to work, though. I get an empty list as the result.

Test case:

  from mediawiki import MediaWiki
  
  wiki = MediaWiki("https://commons.wikimedia.org/w/api.php")
  wiki.user_agent = "MartinWikiQuery/1.0 (tbm@cyrius.com)"
  
  p = wiki.page("File:Sw-ke-shangaliwa.flac")
  print(p.pageid)
  print(p.title)
  for category in p.categories:
      print(f"Category {category}")
      pages = wiki.categorymembers(category, subcategories=False)
      print(pages)

Does it work for you?

@tbm
Contributor Author

tbm commented Dec 21, 2020

Oh, it doesn't work because you only query for page (or page|subcat) whereas these are all of type file.

This change makes it work:

-            "cmtype": ("page|subcat" if subcategories else "page"),
+            "cmtype": ("page|file|subcat" if subcategories else "file|page"),

@tbm
Contributor Author

tbm commented Dec 21, 2020

https://www.mediawiki.org/wiki/API:Categorymembers

type
    Adds the type that the page has been categorised as (page, subcat or file).
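
For reference, a minimal sketch of the raw query with the type filter widened to include files. The helper name categorymembers_url is hypothetical (it is not part of the library), and the category is the one from the example query at the top of this issue; the cm* parameters are the ones the patch in this thread already uses.

```python
from urllib.parse import urlencode


def categorymembers_url(api_url, category, member_types=("page", "file")):
    """Build a list=categorymembers query URL restricted to the given types.

    Hypothetical helper for illustration only.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmprop": "ids|title|type",
        # cmtype takes a pipe-separated subset of page, subcat and file
        "cmtype": "|".join(member_types),
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


url = categorymembers_url("https://en.wiktionary.org/w/api.php", "Swahili_lemmas")
print(url)
```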

@tbm
Contributor Author

tbm commented Dec 21, 2020

I'm not sure whether you want an argument that controls whether files are included, or whether files should be returned as part of pages or as a separate list in the return tuple, but this quick hack works for me:

--- mediawiki/mediawiki.py
+++ mediawiki/mediawiki.py
@@ -672,7 +672,7 @@ class MediaWiki(object):
         search_params = {
             "list": "categorymembers",
             "cmprop": "ids|title|type",
-            "cmtype": ("page|subcat" if subcategories else "page"),
+            "cmtype": ("page|file|subcat" if subcategories else "page|file"),
             "cmlimit": (min(results, max_pull) if results is not None else max_pull),
             "cmtitle": "{0}:{1}".format(self.category_prefix, category),
         }
@@ -690,7 +690,7 @@ class MediaWiki(object):
 
             current_pull = len(raw_res["query"]["categorymembers"])
             for rec in raw_res["query"]["categorymembers"]:
-                if rec["type"] == "page":
+                if rec["type"] in ("page", "file"):
                     pages.append(rec["title"])
                 elif rec["type"] == "subcat":
                     tmp = rec["title"]
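
To make the two return options concrete, here is a rough sketch of the filtering step run on a hand-written response fragment. The function split_members, its files_with_pages flag, and the sample records (page IDs and the non-file titles) are made up for illustration; only the "type" values and the file title come from this thread.

```python
def split_members(records, files_with_pages=True):
    """Sort categorymembers records into pages, files and subcategories.

    Hypothetical helper: with files_with_pages=True, files are merged into
    the pages list (as in the quick hack above); otherwise they are
    returned in their own list.
    """
    pages, files, subcats = [], [], []
    for rec in records:
        if rec["type"] == "subcat":
            subcats.append(rec["title"])
        elif rec["type"] == "file" and not files_with_pages:
            files.append(rec["title"])
        elif rec["type"] in ("page", "file"):
            pages.append(rec["title"])
    return pages, files, subcats


# Hand-written sample records, not real API output
records = [
    {"pageid": 1, "title": "Example page", "type": "page"},
    {"pageid": 2, "title": "File:Sw-ke-shangaliwa.flac", "type": "file"},
    {"pageid": 3, "title": "Category:Example subcategory", "type": "subcat"},
]

merged = split_members(records)
separate = split_members(records, files_with_pages=False)
print(merged)
print(separate)
```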

@tbm tbm changed the title from "Iterate through category pages (categorymembers)" to "categorymembers doesn't return files" on Dec 21, 2020
@barrust
Owner

barrust commented Dec 21, 2020

I don't generally have an issue with files being added to the normal pages. My only concern: if someone loops over the returned pages, will they need to know which ones are files so they can prepend "File:" (or the equivalent prefix in other languages)?

What if we added a boolean parameter to include files in the category members? That way, someone like you who wants them can pull them, and anyone who doesn't need them can leave the default of False. What do you think?

def categorymembers(category, results=10, subcategories=True, files=False):
    cmtype = "page"
    if subcategories:
        cmtype += "|subcat"
    if files:
        cmtype += "|file"
    # do the remaining work
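
Pulled out on its own, the cmtype construction from the proposal above can be run in isolation. This is only a sketch: build_cmtype is a hypothetical helper name, and the defaults simply mirror the proposed signature.

```python
def build_cmtype(subcategories=True, files=False):
    """Return the pipe-separated cmtype value for a categorymembers query.

    Hypothetical helper mirroring the proposed subcategories/files flags.
    """
    cmtype = "page"
    if subcategories:
        cmtype += "|subcat"
    if files:
        cmtype += "|file"
    return cmtype


# Every combination of the two flags
for subcategories in (True, False):
    for files in (True, False):
        print(subcategories, files, build_cmtype(subcategories, files))
```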

@barrust
Copy link
Owner

barrust commented Dec 21, 2020

UGH! Not sure I like putting files into their own list. The API does add the "File:" prefix, so that should work as just another "link". I think I can get to this pretty quickly and update all the tests.
