Suggest depictions using image recognition #75

nicolas-raoul · 2016-02-25T03:43:00Z

It would be great if I was proposed the item Elephant when I take a picture of an elephant.

There are some APIs for this, not sure if any is usable for free.
The API would provide a few words such as {elephant, zoo} and we would perform a Wikidata item search on these words and add the resulting items to the list of suggestions.

In using an online service, the feature should probably be opt-in since the privacy policy of the API will most probably be incompatible with the Wikimedia privacy policy.

nicolas-raoul · 2017-03-25T15:29:45Z

A friend has left Google to create an AI company and is looking for people to test his library. He promises to open source it soon. Unlike Google libraries, it is usable offline.

This looks like a great opportunity to develop this feature, since no such library existed so far (as far as I know)
Anyone interested in working on this right now? I can send you the library. Thanks a lot!

misaochan · 2017-03-25T15:39:17Z

Sounds great, and yeah should definitely be opt-in. I could chuck this into my IEG renewal proposal, but that probably won't be for another couple of months, so anyone who wants to work on it sooner is most welcome.

nicolas-raoul · 2018-02-06T12:31:45Z

There is a grant proposal to create an API for that:
https://meta.wikimedia.org/wiki/Grants:Project/AICAT

misaochan · 2018-02-06T13:58:19Z

@nicolas-raoul Sounds very useful! How did you hear of it? I wanted to post an endorsement, but their community notifications section is still empty so I was hesitant. :)

nicolas-raoul · 2018-02-06T15:33:52Z

@misaochan I learned about it here: https://www.wikidata.org/wiki/Wikidata:Project_chat#Interesting_initiative
I added an endorsement.

misaochan · 2018-02-06T15:47:25Z

I did the same. :) Even if the grant is approved though, it will probably be about a year before the API is usable (the grant is 8 months, and I believe the next Project Grant round starts in July).

alexeymorgunov · 2018-02-19T14:34:06Z

Thanks for the endorsement @nicolas-raoul! I am one of the guys behind the proposal. We welcome any suggestions and advice!

nicolas-raoul · 2018-03-13T10:27:10Z

Recent WMF blog post https://blog.wikimedia.org.uk/2018/02/structured-data-on-commons-is-the-most-important-development-in-wikimedias-usability/ :

show[...] the user images with suggested ‘fields’, allowing the user to then swipe left or right to say whether or not the image should be tagged with the suggested category. This would allow the community to help organise the uncategorised images on Commons much more efficiently.

This sounds very similar to the present issue.
Categories will become Structured Commons properties in the future, but that does not make that much difference from the point of view of this issue.

The idea of swiping left/right is interesting, let's gather the pros/cons:
Pros of swiping:

Cool gesture
The whole screen space is available to show infomation about the category (property), for instance textual explanation or example images that have it (which is the topic of When selecting category, show sample images #1244)

Cons of swiping:

No global view, for instance after taking a picture of a red car in Italy, you get the suggestion "Car" and you swipe Yes, then "red car" and you swipe Yes again, then "red car in Italy" and you swipe Yes again. If you had seen all of them from the beginning, you would have selected only the last (most precise) category. With Structured Commons this should not be a problem as color/country/etc are perpendicular properties.
Takes more time. The current suggestion screen shows like 50 suggestions. With swiping you can not reasonably expect the user to swipe more than 10 times for a single upload.

The other new idea we can steal from this blog is that category suggestion could be used not only for the picture I just uploaded, but also for uncategorized pictures uploaded by other people.

aaronpp65 · 2018-03-19T07:48:39Z

Hai ,
My name is Aaron.So i am interested in contributing to the Commons App for GSoC18 to allow users to browse.So i was wondering if i could use image processing,Like when the user uses the camera to take a photo,the app scans the area and gives possible suggestions which could include users see other people's work etc.
we could use TensorFlow lite
and an image processing model like Inception-v3
Inception-v3 has already been tested successfully in TensorFlow lite, they say and i quote "the model is guaranteed to work out of the box"
Do you think this could work?Looking forward to suggestions.....

nicolas-raoul · 2018-03-19T08:20:01Z

@aaronpp65 basic questions about this solution:

Does it work offline?
What is the size (kilobytes) of the part we must embed in our app's APK?
What is the license of the whole thing? (if it does not work offline, please cite the license of the server part too)
Thanks :-)

Also, if I understand correctly that library gives you a word like "leopard" or "container ship", right? How do you propose matching these strings to:

Wikimedia Commons categories (see https://commons.wikimedia.org/wiki/Category:Topics)
Wikidata entities (see https://www.wikidata.org/wiki/Special:Random)

aaronpp65 · 2018-03-19T12:37:51Z

*It’s machine-learning on the go, without the need for connectivity.
*TensorFlow Lite is < 300KB in size when all operators are linked and <= 200KB when using only the operators needed for standard supported models (MobileNet and InceptionV3).
*TensorFlow is an open-source software,it was released under the Apache 2.0 open source license

aaronpp65 · 2018-03-20T08:56:08Z

Yes the library gives you a word like "leopard" or "container ship" but it happens when we use a pre-trained Incpetion v3. Its trained using Imagnet data set.
So instead of using a pretrained model, we can train the Inception model using our own wikimedia commons dataset.Hence we will get strings similar to that of the commons.Then we can query this string in the commons database and retrieve other people work .
But then as you asked before we will need connectivity to do this part of querying the database

nicolas-raoul · 2018-03-20T10:53:06Z

@aaronpp65 Very impressive, thanks!
Requiring connectivity during training is no problem, of course.
But using Commons as a training set unfortunately sounds difficult, because:

Most Commons category have only 10 files or less, which is not enough for training.
Commons images are usually not fit for training, for instance these are images in https://commons.wikimedia.org/wiki/Category:Container_ships :

$2016-08-05_frachtschiffreise_stockwerkbetten_auf_containerfeeder_ms_dornbusch$

So I guess we'd be better off trying to match from ImageNet categories to Commons or Wikidata.
https://opendata.stackexchange.com/questions/12541/mapping-between-imagenet-and-wikimedia-commons-categories
https://opendata.stackexchange.com/questions/12542/mapping-between-imagenet-and-wikidata-entities

aaronpp65 · 2018-03-20T17:27:51Z

Yeah.....So mapping ImageNet with commons should do the trick

aaronpp65 · 2018-03-20T18:29:15Z

@nicolas-raoul will you please check my draft and give possible feedbacks.
Thanks

nicolas-raoul · 2018-03-22T08:24:14Z

@aaronpp65 Could you please post a link to your draft? Thanks!

aaronpp65 · 2018-03-22T16:56:02Z

https://docs.google.com/document/d/1am3EbhBrwaYn2_LLKAmnrXlzTGVWgttCdALAV4fy_NU/edit?usp=sharing
@nicolas-raoul here is the link to the draft. I should make one in phabricator too right?

nicolas-raoul · 2018-03-23T04:34:30Z

Yes, please post it on Phabricator, thanks :-)

…

On Fri, Mar 23, 2018 at 1:56 AM, aaronpp65 ***@***.***> wrote: https://docs.google.com/document/d/1am3EbhBrwaYn2_ LLKAmnrXlzTGVWgttCdALAV4fy_NU/edit?usp=sharing @nicolas-raoul <https://github.com/nicolas-raoul> here is the link to the draft. I should make one in phabricator too right? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#75 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAGFBg4ZoDeKOeG6RxeFEOexRjr-wttXks5tg9eigaJpZM4HiYl2> .

nicolas-raoul · 2018-03-23T04:41:03Z

Could you please explain in more details the following steps: *- Convert the model to the TensorFlow Lite file format.* *- Integrating the converted model into Android application* *Also, please add a step-by-step description of what the user will see, what screen they will go to, what button they click, so that we understand what this project will bring to the app. Feel free to include hand-drawn screens to make it clearer if necessary.* *Thanks! :-)* On Fri, Mar 23, 2018 at 1:34 PM, Nicolas Raoul <nicolas.raoul@gmail.com> wrote:

…

Yes, please post it on Phabricator, thanks :-) On Fri, Mar 23, 2018 at 1:56 AM, aaronpp65 ***@***.***> wrote: > https://docs.google.com/document/d/1am3EbhBrwaYn2_LLKAmnrXlz > TGVWgttCdALAV4fy_NU/edit?usp=sharing > @nicolas-raoul <https://github.com/nicolas-raoul> here is the link to > the draft. I should make one in phabricator too right? > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#75 (comment)>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AAGFBg4ZoDeKOeG6RxeFEOexRjr-wttXks5tg9eigaJpZM4HiYl2> > . >

aaronpp65 · 2018-03-23T17:13:57Z

@nicolas-raoul i have made the required changes and have added a basic wireframe
Thanks for the feedback.

nicolas-raoul · 2018-05-21T02:12:33Z

Here is a web-based tool that suggests categories for any image: https://youtu.be/Y9lvXVJCiyc?t=1932
It seems to work quite well, judging from the demo.

Image labelling and category suggester
Phab: https://phabricator.wikimedia.org/T155538 (not exactly all of the things this ticket wants).
User script that finds labels for the image and suggests categories
I will use the provided laptop
Niharika
Demo a user script that uses a Google (?) image recoginition API to detect contents of an image and suggest possible categories. Works, but not perfect. (hilarious skeleton example) you can play with it yourself
https://commons.wikimedia.org/wiki/User:NKohli_(WMF)/sandbox - https://commons.wikimedia.org/wiki/User:NKohli_(WMF)/imagery.js

If I understand correctly the wiki page calls a mediawiki API which in turn calls a third-party image recognition tool. Having mediawiki in the middle allows the IP address of the user to not be leaked, so I guess we could actually use this right now.

whym · 2019-01-08T11:08:54Z

https://commons.wikimedia.org/wiki/User:NKohli_(WMF)/sandbox - https://commons.wikimedia.org/wiki/User:NKohli_(WMF)/imagery.js

It looks like this uses a Toolforge tool (https://tools.wmflabs.org/imagery/api.php) which is currently down(?) - it returns a 500 error on a query from the script for me. It's been a long time, I believe it was meant to be a proof of concept that was not going to be maintained as it was.

nicolas-raoul · 2019-01-08T11:15:33Z

it was meant to be a proof of concept

I hope the source code is still available somewhere and someone turns it into a more permanent tool :-)

nicolas-raoul · 2019-03-20T10:46:05Z

My understanding is that we still need to find either:

A Wikimedia-hosted API that provides image classification (We can not use third party API like Azure directly for privacy reasons. Calling a third-party API from a Wikimedia server would be OK as long as it does not cost money).
An embeddable image classification JAR which is small (at most a few megabytes) and open source.

The API or library must output either Commons category(ies) (example: "the submitted image contains a https://commons.wikimedia.org/wiki/Category:Dogs") or Wikipedia/Wikidata item(s) (example: "the submitted image contains a https://www.wikidata.org/wiki/Q144").

madhurgupta10 · 2019-03-20T11:03:35Z

@nicolas-raoul I agree that using third party API such as Azure will be a concern for privacy. There is an alternative to it https://wadehuang36.github.io/2017/07/20/offline-image-classifier-on-android.html

nicolas-raoul · 2019-03-29T09:07:43Z

Thanks @madhurgupta10 !
This seems to be the best fork of that project: https://github.com/mnnyang/tensorflow_mobilenet_android_example
Unfortunately I did not manage to build it, there must be some necessary step that I am not thinking of.

madhurgupta10 · 2019-03-29T13:42:14Z

@nicolas-raoul I managed to build it, If you would like, I shared the apk file

.

nicolas-raoul · 2019-04-01T04:12:03Z

Thanks! Did you modify any of that project's files? If yes please fork/commit and push your fork to GitHub, thanks :-)

wow 32 MB is very big. I am sure the tensorflow libraries contain many unused classes/network types/etc. Ideally image recognition should not add more than a few MB to the total size of our app's APK. Anyone willing to take on this issue for GSoC/Outreachy, please include that trimming task in your schedule, thanks!

madhurgupta10 · 2019-04-01T04:28:10Z

@nicolas-raoul Sure, I will add that in my proposal :) and will commit the files soon, also the TF 2.0 is out so it would be much more optimized and would be better than this example, it's pretty old.

nicolas-raoul · 2019-04-09T03:39:10Z

I believe the project above uses the normal Tensorflow. Using Tensorflow Lite will certainly reduce size a lot, but still not enough I am afraid. Other things to try:
https://developer.android.com/topic/performance/reduce-apk-size
https://stackoverflow.com/questions/51784882/how-to-decrease-the-size-of-android-native-shared-libaries-so-files/51814290#51814290

madhurgupta10 · 2019-04-09T08:50:41Z

@nicolas-raoul
Thanks for the links, I will explore them 👍

nicolas-raoul · 2019-07-02T04:23:13Z

There is a pre-built APK at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android#bazel
I just tried it, the app itself is rather buggy, but when it works it is super-fast and not too bad at recognizing things, I mean the 3 first guesses contains the depicted object 50% of the time, so showing them as suggestions would be helpful.
The APK is 23 MB unfortunately.

maskaravivek · 2019-07-02T08:17:06Z

I would take a look at this and hopefully, we can incorporate this in the app for category suggestions and later maybe to suggest depicts for Wikidata.

nicolas-raoul · 2019-07-02T09:43:35Z

Most image classification implementations output WordNet 3.0 concepts.

I just wrote this query that shows the mapping between WordNet concepts, Wikidata items, and Commons categories. It takes a while to execute, so here is a screenshot:

There are currently 474 mappings, and it has not increased in a year. I will try to motivate people to add more mappings.

nicolas-raoul · 2019-09-18T02:30:43Z

Good news, this is starting to get implemented on commons.wikimedia.org :
https://commons.wikimedia.org/wiki/Commons:Structured_data/Computer-aided_tagging

nicolas-raoul · 2020-03-13T07:41:09Z

This page seems to do exactly what we want: https://commons.wikimedia.org/wiki/Special:SuggestedTags
Everyone please try it out, and post here how often at least one useful suggestion is shown (for instance 50% of the time, etc). Other thoughts welcome too of course. Thanks! :-)

I have asked whether an API could be made for us: https://commons.wikimedia.org/wiki/Commons_talk:Structured_data/Computer-aided_tagging/Archive_2020#API_to_retrieve_%22depicts%22_suggestions_from_Commons_app? (no reply unfortunately)

maskaravivek · 2020-03-13T09:14:38Z

Wow! The suggestions look quite useful. For each of the first 5 images, I found at least 1 relevant tag suggested.

misaochan · 2020-03-13T10:03:48Z

Their algorithm is fantastic IMO!! Had at least 2 relevant tags for 3/3 of the photos I saw. @macgills , is this something that you and Mark have identified as a future potential task for you (after getting our SDC branch merged)?

macgills · 2020-03-13T10:14:34Z

I couldn't say! Will for sure discuss it with him at our next meeting on monday.

misaochan · 2020-03-13T10:24:27Z

Awesome! Let us know how that goes. :)

sivaraam · 2020-03-13T12:23:50Z

Everyone please try it out, and post here how often at least one useful suggestion is shown (for instance 50% of the time, etc).

In my attempt at testing 6 or 7 images, the suggestions were mostly relevant. In cases there were even 10 appropriate suggestion! Also, none of the suggestions can be called totally irrelevant. This looks great!

Their algorithm is fantastic IMO!!

Yeah, guess what they are using in the backend, Google cloud vision. 😎 [ref]

On a related note, the Wikipedia app is adding a new option in their Suggested edits feature that allows users to tag Commons images with suggested image tags [ref 1] [ref 2] [ref 3]. This is already in their alpha app. Not sure if it's in the production version, though. I suppose they're using an API related to the Special:SuggestedTags page.

nicolas-raoul · 2024-01-12T07:19:12Z

Ideally in the future we could use on-device models to do this. This would remove the need to either call a we service or embed a bulky model in our APK.

https://ai.google.dev/tutorials/android_aicore :

The APIs for accessing Gemini Nano support text-to-text modality, with more modalities coming in the future.
[apps using this can] provide a LoRA fine-tuning block to improve performance of the model for your use case.

Hopefully image-to-text will come soon.

sivaraam · 2024-01-12T13:59:04Z

Ideally in the future we could use on-device models to do this.

The idea is nice. I'm nust unsure what the community consensus is about using machine assistance in order to edit depictions. Do you happen to be aware of any guidelines about it Nicolas?

nicolas-raoul · 2024-01-13T01:19:59Z

@sivaraam I don't think there are any guidelines about this currently. The AICAT experiment was stopped due to some strong opposing voices, but I believe our app is a very different use case. In our app:

The tagger is the person who is uploading the picture, they have all of the necessary context.
It was tempting for AICAT users to apply all of the suggested tags, because often all were plausible. Our users already know that not all suggestions should be selected, because for years already they have been shown a wide range of suggestions, some based on nearby items, some based on previously uploaded pictures, etc. In other words, unlike AICAT our users understand that what is shown is just suggestions, and that they have to select wisely.

nicolas-raoul · 2024-10-15T02:03:20Z

Embedded models would be a good way to avoid privacy issues (everything is done on the device), but currently it only supports text-to-text, not image-to-text: https://developer.android.com/ai/gemini-nano https://ai.google.dev/edge

nicolas-raoul added enhancement categorization labels Feb 25, 2016

nicolas-raoul changed the title ~~Categorize with image recognition~~ Categorize (or Structured Commons equivalent) with image recognition Mar 13, 2018

This comment has been minimized.

Sign in to view

nicolas-raoul mentioned this issue Apr 1, 2019

ClassNotFoundException: Didn't find class "dev.wadehuang.mobilenetexample.MainActivity" weironghuang31/tensorflow_mobilenet_android_example#1

Open

nicolas-raoul mentioned this issue May 15, 2019

Our team at Wikimedia Hackathon 2019 #1962

Closed

nicolas-raoul added the structured-data label May 15, 2019

nicolas-raoul changed the title ~~Categorize (or Structured Commons equivalent) with image recognition~~ Suggest depictions using image recognition Aug 7, 2019

nicolas-raoul added the machine-learning label Jan 27, 2020

Suggest depictions using image recognition #75

Suggest depictions using image recognition #75

Comments

nicolas-raoul commented Feb 25, 2016 • edited Loading

nicolas-raoul commented Mar 25, 2017 • edited Loading

misaochan commented Mar 25, 2017

nicolas-raoul commented Feb 6, 2018

misaochan commented Feb 6, 2018

nicolas-raoul commented Feb 6, 2018

misaochan commented Feb 6, 2018

alexeymorgunov commented Feb 19, 2018

nicolas-raoul commented Mar 13, 2018 • edited Loading

aaronpp65 commented Mar 19, 2018

nicolas-raoul commented Mar 19, 2018 • edited Loading

aaronpp65 commented Mar 19, 2018

aaronpp65 commented Mar 20, 2018

nicolas-raoul commented Mar 20, 2018

aaronpp65 commented Mar 20, 2018

aaronpp65 commented Mar 20, 2018

nicolas-raoul commented Mar 22, 2018

aaronpp65 commented Mar 22, 2018

nicolas-raoul commented Mar 23, 2018 via email

nicolas-raoul commented Mar 23, 2018 via email

aaronpp65 commented Mar 23, 2018

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

nicolas-raoul commented May 21, 2018

whym commented Jan 8, 2019

nicolas-raoul commented Jan 8, 2019

nicolas-raoul commented Mar 20, 2019 • edited Loading

madhurgupta10 commented Mar 20, 2019

nicolas-raoul commented Mar 29, 2019

madhurgupta10 commented Mar 29, 2019 • edited Loading

nicolas-raoul commented Apr 1, 2019

madhurgupta10 commented Apr 1, 2019

nicolas-raoul commented Apr 9, 2019

madhurgupta10 commented Apr 9, 2019 • edited Loading

nicolas-raoul commented Jul 2, 2019

maskaravivek commented Jul 2, 2019

nicolas-raoul commented Jul 2, 2019

nicolas-raoul commented Sep 18, 2019

nicolas-raoul commented Mar 13, 2020 • edited Loading

maskaravivek commented Mar 13, 2020

misaochan commented Mar 13, 2020 • edited Loading

macgills commented Mar 13, 2020

misaochan commented Mar 13, 2020

sivaraam commented Mar 13, 2020 • edited Loading

nicolas-raoul commented Jan 12, 2024

sivaraam commented Jan 12, 2024 • edited Loading

nicolas-raoul commented Jan 13, 2024

nicolas-raoul commented Oct 15, 2024

nicolas-raoul commented Feb 25, 2016 •

edited

Loading

nicolas-raoul commented Mar 25, 2017 •

edited

Loading

nicolas-raoul commented Mar 13, 2018 •

edited

Loading

nicolas-raoul commented Mar 19, 2018 •

edited

Loading

nicolas-raoul commented Mar 20, 2019 •

edited

Loading

madhurgupta10 commented Mar 29, 2019 •

edited

Loading

madhurgupta10 commented Apr 9, 2019 •

edited

Loading

nicolas-raoul commented Mar 13, 2020 •

edited

Loading

misaochan commented Mar 13, 2020 •

edited

Loading

sivaraam commented Mar 13, 2020 •

edited

Loading

sivaraam commented Jan 12, 2024 •

edited

Loading