Non-existent category proposed #22

Closed
nicolas-raoul opened this issue Oct 20, 2015 · 22 comments

Comments

@nicolas-raoul
Member

In the categories selection dialog, when I start typing "flapja" I get two choices: "Flapjack" and "Flapjack image". Neither of these categories exists on Commons; the only real choice is "Flapjacks".

I tried selecting "Flapjack", and as a result the image was assigned that wrong category, resulting in a red link.

That category does not even seem to have been renamed recently, so I am not sure what is going on: https://commons.wikimedia.org/w/index.php?title=Category:Flapjacks&action=history

I had never searched for that category (or any with a similar name) before.

Where the request for categories from Commons seems to be made: https://github.com/nicolas-raoul/apps-android-commons/blob/master/commons/src/main/java/fr/free/nrw/commons/category/CategorizationFragment.java#L169

The same (mostly, I think?) request on the API sandbox: https://commons.wikimedia.org/wiki/Special:ApiSandbox#action=query&list=allcategories&format=json&acfrom=flapja&acdir=descending . As you can see, the results are "Flap Jack" and "FlapJack Image". Not sure where "Flapjack" came from.
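For reference, a minimal sketch of how such a request URL can be assembled. The helper name `allcategories_url` is hypothetical and not taken from the app; `list=allcategories`, `acfrom`, `acprefix`, `acdir`, and `aclimit` are the parameters that appear in this thread. As documented for the API, `acfrom` only sets where the enumeration starts, whereas `acprefix` restricts results to titles beginning with the typed text, so the two can return quite different lists.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def allcategories_url(typed, use_prefix=True, limit=25):
    """Build a list=allcategories query URL for the text typed so far.

    Hypothetical helper for illustration only.
    """
    params = {
        "action": "query",
        "list": "allcategories",
        "format": "json",
        "aclimit": limit,
    }
    if use_prefix:
        # Only titles that begin with the typed text.
        params["acprefix"] = typed
    else:
        # Enumerate from the typed text, as in the sandbox link above.
        params["acfrom"] = typed
        params["acdir"] = "descending"
    return API + "?" + urlencode(params)
```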

@nicolas-raoul nicolas-raoul changed the title Wrong categories proposed Non-existent category proposed Oct 20, 2015
@nicolas-raoul
Member Author

https://commons.wikimedia.org/wiki/Category:Badge actually does not exist and does not seem to have ever existed:

[Screenshot: screenshot_2016-03-08-22-32-43]

With latest version as of 2016 March 08

@misaochan
Member

Hmm. Is this fixable on our end though? We're just returning the categories found via the Commons API. Could there be an issue with the way the API works?

@nicolas-raoul
Member Author

Is this a limitation documented in the API's documentation?

This is a problem that affects us, and since it is not a problem with our code, we have to report the problem upstream. Probably the best place would be Phabricator? Maybe there is even an existing issue about it already?

@misaochan
Member

Okay, I'll ask about it on the #Commons project board. That should be the best place, right?

@nicolas-raoul
Member Author

Do you mean https://commons.wikimedia.org/wiki/Commons:Village_pump ?
People there know about content policies, but often not much about technical matters.

The present problem is probably deep inside the MediaWiki software, and the file search feature is used on many wikis, not just Commons, so I think Phabricator is a better place to ask. It also lowers the probability that the topic gets forgotten :-)

@misaochan
Member

Oh, no, I meant which project board I should tag my Phabricator task with. Like https://phabricator.wikimedia.org/tag/commons/

@nicolas-raoul
Member Author

nicolas-raoul commented Jul 8, 2016

Oh I understand! Yes sounds good :-)

@misaochan
Member

Posted at https://phabricator.wikimedia.org/T139911 , tagging #Commons and #MediaWiki-API. Please feel free to modify my post as needed. :)

@nicolas-raoul
Member Author

nicolas-raoul commented Jul 11, 2016

Thanks!

@jayvdb
Contributor

jayvdb commented Jul 11, 2016

So, categories do exist when an image uses them, even if a page for the category doesn't exist.

For example, there is one image here:
https://commons.m.wikimedia.org/wiki/Category:Flapjack

@nicolas-raoul
Member Author

@jayvdb : Thanks for the investigation! https://commons.wikimedia.org/wiki/Category:Flapjack says "This page does not currently exist". The Commons search API seems to return any category name that at least one image uses, regardless of whether the category page exists or not.

If confirmed, I would call that a bug in the API. Or at least we should have the option to ask for only categories with existing pages, imho.

@jayvdb
Contributor

jayvdb commented Jul 11, 2016

In MediaWiki terminology, the category exists even if the page about the category does not.

You can add extra parameters & properties to the results to filter out those categories, and maybe filter out redirects, etc.
Or maybe you want list=allpages&alnamespace=14

See https://m.mediawiki.org/wiki/API:Allcategories and https://m.mediawiki.org/wiki/API:Allpages
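A rough sketch of the `list=allpages` alternative, restricted to the Category namespace (14). Note that the parameters of `list=allpages` use the `ap` prefix, so the namespace filter is `apnamespace`. The helper name `category_pages_url` is hypothetical. `apfilterredir=nonredirects` is included on the assumption that hard redirects should be skipped; as far as I know it does not catch soft redirects made with a template such as `{{Category redirect}}`.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def category_pages_url(prefix, limit=25):
    """Build a list=allpages query limited to existing Category pages.

    Hypothetical helper for illustration only.
    """
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": 14,                # Category namespace
        "apprefix": prefix,               # title prefix, without "Category:"
        "apfilterredir": "nonredirects",  # skip hard redirects
        "aplimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)
```

Because `allpages` enumerates pages rather than category-table entries, it only returns categories whose page actually exists.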

@misaochan
Member

Posted by anomie (who also closed the Phab task):

The definition of "category" for list=allcategories includes anything that was ever a category on a page, even if it doesn't have any members anymore. This is independent of whether a corresponding page ever existed in the Category namespace.

See also T28411: Entries for non-existent categories should be deleted from the 'category' table.

@nicolas-raoul
Member Author

As an upload client, I guess what we want is not just "categories" as defined in the two previous comments. Instead, we want "categories that are currently in use".

@jayvdb: https://commons.wikimedia.org/w/api.php?action=query&list=allcategories&acprefix=badge&aclimit=25&alnamespace=14 says Unrecognized parameter: 'alnamespace'... any idea how to modify the API request to get what we want? Thanks a lot! :-)

Linking to the Phabricator issue for this MediaWiki bug: https://phabricator.wikimedia.org/T28411

@jayvdb
Contributor

jayvdb commented Jul 12, 2016

I still don't know 100% what you want, but we are getting close.
I suspect https://commons.wikimedia.org/w/api.php?action=query&list=allcategories&acprefix=badge&acmin=2&acprop=size , with client-side ordering, would be appropriate.

But if you have a fixed-size UI list to fill, or you want to get the best results first (responsive app) and fetch lower-value results second, you could issue
https://commons.wikimedia.org/w/api.php?action=query&list=allcategories&acprefix=badge&acmin=100&acprop=size&aclimit=25 first, and if there are fewer than 25 results, send another query using acmin=2&acmax=9 to fill the available UI space. Categories with only one member are likely to be unsuitable.
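The two-step approach described above could be sketched roughly as follows. This is only an illustration: `fetch_tiered` and the `fetch` callable are hypothetical names, and `fetch(prefix, acmin, acmax, limit)` is assumed to run one `list=allcategories` request with those values and return a list of result entries. The default tiers simply mirror the two example queries in the comment; sizes 10 to 99 fall into neither of them, so a real client would probably choose contiguous ranges.

```python
def fetch_tiered(fetch, prefix, wanted=25, tiers=((100, None), (2, 9))):
    """Fill up to `wanted` suggestions, querying large categories first.

    `fetch` is a hypothetical callable: fetch(prefix, acmin, acmax, limit)
    returns a list of category entries for one API request.
    """
    results = []
    for acmin, acmax in tiers:
        remaining = wanted - len(results)
        if remaining <= 0:
            break  # the list is already full, skip lower-value tiers
        batch = fetch(prefix, acmin, acmax, remaining)
        results.extend(batch[:remaining])
    return results
```

Passing the request function in as a parameter keeps the tier logic independent of any HTTP library and easy to test with canned results.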

@nicolas-raoul
Member Author

Interesting!
Server-side ordering would really be more efficient than performing many requests and then dropping most of the results...

Showing categories that contain many files first does sound like a good idea. Showing categories that are close to the root (even if they have only sub-categories and no files) would probably be another good idea.
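Until server-side ordering is available, the "many files first" part could be approximated on the client. A minimal sketch, assuming each entry comes from a query with `acprop=size` and therefore carries a `files` count, with the category name under `"*"` as in the default JSON format; `order_by_file_count` is a hypothetical name. It does not attempt the "close to the root" idea.

```python
def order_by_file_count(entries):
    """Sort category entries so that those with the most files come first.

    Ties are broken alphabetically by category name. Entries without a
    "files" field (assumed to come from acprop=size) are treated as empty.
    """
    return sorted(entries, key=lambda e: (-e.get("files", 0), e.get("*", "")))
```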

@nicolas-raoul
Member Author

Using version 2.9.0, the category https://commons.wikimedia.org/wiki/Category:Azabujuban_summer_festival was proposed, even though it is a category redirect to the real category https://commons.wikimedia.org/wiki/Category:Azabu-J%C5%ABban_N%C5%8Dry%C5%8D_Matsuri

{{Category redirect|Azabu-Jūban Nōryō Matsuri}}

@ashishkumar468
Collaborator

Seems to be fixed. Feel free to re-open if this still happens on the latest releases.

@nicolas-raoul
Member Author

It is still happening unfortunately.
When I searched for trees in the app I saw this suspicious category: "Trees,nature..."
[Screenshot: 20201012_082628]

Sure enough, Upload Wizard does not allow it:
[Screenshot: Screen Shot 2020-10-12 at 08 24 56]
For reference, here is how the Upload Wizard looks when typing a valid category name:
[Screenshot: Screen Shot 2020-10-12 at 08 24 45]

The problem is probably that one picture wrongly uses that category name: https://commons.wikimedia.org/wiki/File:Forest_!!.jpg
Upload Wizard's category search is inferior to our category search, so replacing our search algorithm with theirs is not a good solution. But maybe we should double-check that categories exist before listing them?
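One possible way to do such a double-check, sketched under assumptions: ask `action=query` about the candidate titles and drop those the API reports as missing. The helper names are hypothetical. The parsing assumes the default JSON format, where `query.pages` is an object and a non-existent page carries a `missing` key. The API accepts only a limited number of titles per request (50 for ordinary users), so a suggestion list of 25 fits in one call.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def existence_url(names):
    """Build one query asking about several category pages at once."""
    titles = "|".join("Category:" + name for name in names)
    params = {"action": "query", "titles": titles, "format": "json"}
    return API + "?" + urlencode(params)


def existing_categories(response):
    """Return the titles whose page exists, given the decoded JSON response."""
    pages = response.get("query", {}).get("pages", {})
    return [
        page["title"]
        for page in pages.values()
        # Missing pages and invalid titles are flagged by the API.
        if "title" in page and "missing" not in page and "invalid" not in page
    ]
```

This costs one extra request per suggestion list, and it would not by itself detect categories that exist only as `{{Category redirect}}` pages.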

@sivaraam
Member

Showing a category regardless of whether a page exists for it is how the allcategories API we use is designed to behave. I'm not sure there is anything we can do on our end about this.

The cases you observe would eventually get fixed when the community removes those categories. For instance, the last case mentioned even seems fixed now. So, can we close this issue?

@nicolas-raoul
Member Author

Indeed it does not seem to happen anymore.
