
label score per image/'primary label' #202

Open
dobkeratops opened this issue Sep 17, 2018 · 13 comments

@dobkeratops

dobkeratops commented Sep 17, 2018

Expanding on the idea of a "primary label" per image: imagine giving each label a score. I suspect the image sites have something like this already.

imagine asking these questions:

  • For an image, which single word/label describes it best?
  • For a word/label - which of the available images describes it best?
  • sort the labels on an image, by prominence
  • for a label, sort the images by how well they represent it (after the best, which is the next best..)

This is subjective, and could yield a floating point value representing the degree of consensus/certainty.

I'm not sure if that's 2 separate values, or one value ("the coupling between the label and the image")... could two values be combined?

Such a value could be used for:

  • sorting the tasks?
  • conveying uncertain assignments (e.g. can't tell if this dog on grass is in a park or garden? .. give both a value of 0.5)
  • sorting any search results
  • the expected value for training an image wide net
  • also un-verified labels could start off at some assumed value (increase or decrease per validation/invalidation)

This sort of thing might become more useful as both the image set and label set grow. I imagine many un-annotatable labels still being useful for image-wide training, e.g. distinctions between various overlapping terms describing an image, such as person vs crowd, city vs street, city vs building, city vs town, street vs road, garden vs park.
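To make the idea above concrete, here is a minimal Go sketch of per-image label scores (the type names and the 0-1 score range are my assumptions, not anything in the existing API):

```go
package main

import (
	"fmt"
	"sort"
)

// LabelScore couples a label with a prominence/consensus value in [0, 1].
// Hypothetical structure for illustration only.
type LabelScore struct {
	Label string
	Score float64
}

// primaryLabel answers "which single word/label describes the image best?"
// by picking the highest-scoring label.
func primaryLabel(scores []LabelScore) LabelScore {
	best := scores[0]
	for _, s := range scores[1:] {
		if s.Score > best.Score {
			best = s
		}
	}
	return best
}

// byProminence answers "sort the labels on an image by prominence".
func byProminence(scores []LabelScore) []LabelScore {
	out := append([]LabelScore(nil), scores...)
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	img := []LabelScore{
		{"park", 0.5}, // can't tell park vs. garden:
		{"dog", 0.9},
		{"garden", 0.5}, // both get 0.5
	}
	fmt.Println(primaryLabel(img).Label) // dog
	fmt.Println(byProminence(img))
}
```

The same sorted view, transposed per label across images, would give "which image represents this label best".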

@bbernhard
Collaborator

Interesting idea!

For an image, which single word/label describes it best?

Would you give the user the possibility to also add a new word/label, or does he need to choose the best matching one from the existing labels? Would you restrict that to one word? I think picking one word definitely has its charm, but I am not sure whether we should enforce that or not. If we do not restrict it and allow free input, my gut feeling is that people will automatically end up writing small descriptive sentences/phrases (woman dancing in rain, barking dog, ...) - I think picking just one word requires you to think about it a bit longer, while describing what you see is easier in most cases. I wonder if we can get similar results if we let the users type sentences? (What would be really cool is if we could train a neural net that can describe the image - I guess that could be quite helpful for blind people; but I am not sure what's more helpful for that - single words or small sentences/phrases? I guess in the case of phrases the neural net might also learn some grammar?)

@dobkeratops
Author

dobkeratops commented Sep 17, 2018

would you give the user the possibility to also add a new word/label, or does he need to choose best matching one from the existing labels?

I'm hoping 'scene labels' will suffice most of the time, but some of the images will focus on an object label or even a part (focus on a person's face, etc.). We will have the same issues with spam... we could curate a scene label list if need be.

I think picking just one word requires you to think about it a bit longer, while describing what you see is easier in most cases.

Right, I was thinking that too. Perhaps it could be structured as A <relation> B, defaulting to 'in', e.g. 'dog in park', with the second part greyed out until you assign something - to avoid needing a whole natural language parser... but there is a lot of progress on exactly that kind of natural language description going on (neural nets learning grammar...).

Another idea is to just sort the labels... it might be obvious enough that a foreground object goes first (where is your eye drawn): "dog, park" (focussed on one dog) vs "park, dogs" (dogs in the distance).

@bbernhard
Collaborator

On a related note: I scrolled through a bunch of papers today where they tried to train a neural net that is able to generate image descriptions. I haven't read the papers completely, but it looks like some of the implementations used phrases/sentences as input. https://cs.stanford.edu/people/karpathy/deepimagesent/

Something that I wasn't aware of either: it looks like the COCO dataset also contains a descriptive text for each image - I always thought they were only labeling images.

(I am not completely through the papers, but they sound quite interesting).

@bbernhard
Collaborator

bbernhard commented Sep 24, 2018

short update: I started working on this one a few days ago. The first version will be pretty basic - it simply allows adding one (or more) image descriptions. There will be no (semantic) parsing of the input at the moment - users can enter any text they want (I wanted to keep the first version as simple as possible).

After the user enters some text, the image description is still in a locked state and needs to be unlocked by a moderator. Not sure if the manual unlocking scales (I guess at some point it won't be possible anymore for moderators to keep up), but I want to keep that mechanism as long as possible.

I am a bit afraid that we will end up with a lot of garbage in our dataset if we skip content moderation completely. I have seen that so often in public discussion boards/forums... no matter how good your spam prevention mechanisms are, at some point bots will find a way through and mess with your board. If the dataset reaches a certain size, I guess it's not even possible to spot that by picking random samples, so I guess our safest bet at the moment is to unlock every image description manually.

If that doesn't scale anymore, I think we can also introduce some sort of trust level. If a user has contributed a lot of valid data, he will automatically become a "trustworthy user", i.e. we don't need to validate every contribution of that user anymore; instead we randomly pick every n-th contribution and validate that (to guard against a user "turning bad" after becoming a trustworthy user).
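The trust-level policy could be sketched like this in Go (all names and thresholds here are hypothetical; the real moderation logic would of course live server-side):

```go
package main

import "fmt"

// needsReview decides whether a contribution still requires manual
// unlocking by a moderator. Hypothetical policy sketch: users below the
// trust threshold are always reviewed; trusted users only get every
// n-th contribution spot-checked, so a user who "turns bad" after
// becoming trustworthy is still caught eventually.
func needsReview(acceptedContributions, contributionNum, trustThreshold, n int) bool {
	if acceptedContributions < trustThreshold {
		return true // not yet a "trustworthy user"
	}
	return contributionNum%n == 0 // periodic spot check
}

func main() {
	fmt.Println(needsReview(3, 7, 50, 10))    // new user: true (always reviewed)
	fmt.Println(needsReview(120, 7, 50, 10))  // trusted, off-cycle: false
	fmt.Println(needsReview(120, 20, 50, 10)) // trusted, spot check: true
}
```

A production version would probably randomize which contributions get spot-checked rather than using a fixed modulus, so the pattern can't be gamed.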

Here are some screenshots:

Add new image descriptions:

image_description_1

image_description_2

edit: I am not totally happy yet with the UI controls. Ideally, there should be only one UI element for both the label and the image description input. But I don't want to mix up labels and image descriptions - I think it's better to keep them separated in the dataset.

Moderators screen (to unlock image descriptions):

image_description_3

@dobkeratops
Author

ok interesting idea.. I guess this moderation stuff is absolutely essential if you're going to have extensive user descriptions going in

I was also going to make a couple of suggestions toward this..

ideas like
(i) an asterisk prefix in label entry to indicate primary - hard for users to discover, but this is the easiest for me to give you examples with: *dog (the image focusses on a dog), *cityscape, *gym (suggests the label and marks it as primary)
(ii) aliased 'blah blah scene' hints to initialise primary labels
(iii) some labels like 'kitchen' would probably be safe to assume as scene labels until told otherwise; there are some distinct examples like street, airport, landscape, harbour, wilderness, farm

(iv) a square bracket language for label relations forming a scene description, e.g.
[[man] holding [smartphone]]
[chef behind [[food on plate] on table]]
although that might be overkill... I can imagine an 'approved vocabulary' emerging which would give quite a bit of descriptive power (on, in, behind, above, below, inside, outside of, holding, eating, ...), combinable with the labels.
*The main reason for the square brackets would be to allow extracting labels and label suggestions directly from the scene description statements, and to distinguish them in the text entry box.
(i.e. say "[[chef] behind [[food on [pasta bowl]] on [wooden table]]]" and you'd get chef, food, pasta bowl, wooden table all added to the label list as well.)
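To illustrate, here is a rough Go sketch of extracting label suggestions from such a bracketed description. I'm taking one possible reading of the syntax - only innermost brackets become labels - so a compound leaf like "food on [pasta bowl]" would need a fuller parse to also yield "food":

```go
package main

import (
	"fmt"
	"strings"
)

// extractLeafLabels pulls the innermost [bracketed] phrases out of a
// scene description and treats them as label suggestions. Spans that
// contain nested brackets are skipped here; a real parser could also
// emit those as relations between labels.
func extractLeafLabels(desc string) []string {
	type frame struct {
		text      string
		hasNested bool
	}
	var labels []string
	var stack []frame
	for _, r := range desc {
		switch r {
		case '[':
			stack = append(stack, frame{})
		case ']':
			if len(stack) == 0 {
				continue // unbalanced input: ignore stray close
			}
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if len(stack) > 0 {
				stack[len(stack)-1].hasNested = true
			}
			if t := strings.TrimSpace(top.text); t != "" && !top.hasNested {
				labels = append(labels, t)
			}
		default:
			if len(stack) > 0 {
				stack[len(stack)-1].text += string(r)
			}
		}
	}
	return labels
}

func main() {
	desc := "[[chef] behind [[food on [pasta bowl]] on [wooden table]]]"
	fmt.Println(extractLeafLabels(desc)) // [chef pasta bowl wooden table]
}
```

As mentioned in the thread, stripping the brackets back out to recover the plain sentence is trivial with any modern string library.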

(v) there are also very vague 'thematic labels' which would definitely always be un-annotatable: military (soldiers, weapons, vehicles, ...), sport ("this image contains sports players or sports equipment"), retail ("this image has shop exteriors or interiors or shopping bags or ..."), food and drink ("this image focusses on food preparation, dishes, or people in cafes/restaurants")... These might be nice to make available for searches and as easy whole-image training signals. (There's a label home, but now I realise that's probably too ambiguous; a thematic label domestic could be clearer.) I have these in my HD directories of course. My reasoning is that it will be a nice easy training boost; subtle things like the difference between forest and jungle could force it to look for interesting details.

@dobkeratops
Author

(FYI I'm having a little Haskell sidetrack at the minute... occasionally I dip back into it. Such a fascinating/addictive language, but I still find it slower to write code. It might be another option for exploring things like the image clustering idea.)

@bbernhard
Collaborator

Many thanks for the suggestions, will definitely consider them! I think your list contains a few "easy wins", which we could implement pretty easily - will keep those on my mental todo list :)

FYI I'm having a little Haskell sidetrack at the minute... occasionally I dip back into it. Such a fascinating/addictive language, but I still find it slower to write code. It might be another option for exploring things like the image clustering idea.

Haskell is also a language that really fascinates me. Unfortunately, I never had enough motivation and perseverance to dig deeper into the language concepts. But I've heard many times from my colleagues that once you mastered the language you get a really powerful tool in your hands :)

@dobkeratops
Author

So basically, with more intent toward 3d models, I'm writing some Haskell clustering code, but it's using a plugin vector type, so I could still throw in a 32*32*3 element vector from an image. With 3d models you can physically see the clusters (draw bounding shapes around them).

(i'm just remembering your 'image galaxy' idea again now..)

bbernhard pushed a commit that referenced this issue Sep 30, 2018
* moved database specific parts to database/
* moved parser specific parts to parser/
* moved common parts to commons/

see #202
@bbernhard
Collaborator

bbernhard commented Oct 3, 2018

short status update:

I just pushed a new version to production which includes a lot of changes. Here's a short list:

  • tried to make the browse mode more prominent
  • restructured navigation bar a bit
  • added possibility to describe images
  • added moderator tab (moderators can now accept/decline image descriptions)
  • fixed a few small UI bugs (and hopefully didn't introduce new ones)

A few more words about the image description feature: Currently, only the bare minimum is implemented. I think there is still a lot of room for improvement, but I intentionally wanted to keep that whole thing as simple as possible, in order to evaluate whether that's something that's worth investing more time into.

Possible improvements:

  • add support for different languages:
    Currently, I am implicitly assuming that every image description will be in English. But I think it would make sense to add support for other languages as well. (e.g.: I am way more comfortable describing an image in German... I am always struggling to get the English grammar right. But in order to train a neural net on these image descriptions, I think it's crucial to have sentences that are grammatically correct.)

  • add possibility to change image description (moderator functionality):
    At the moment it's not possible to change an image description. So, if someone adds an image description that has a spelling mistake or is grammatically wrong, the only option is to decline it. I think it would be quite useful if moderators could correct an image description.

  • validation functionality: I think we also need some sort of validation/voting functionality (maybe something similar to the "validate label"/"validate annotation"?)

@dobkeratops I gave you moderator rights to accept/decline image descriptions. When you are logged in, you should see a notification badge in the top right corner:

moderation

The notification badge shows the number of pending requests (i.e. requests that need moderator action). Any help with the moderation of the image descriptions is really appreciated... as you are living in the UK, I think it might be easier for you to spot grammar mistakes ;) But I also totally understand if you don't wanna do this... so please don't feel forced to ;).

edit: I tried to test everything carefully, but in case I broke some existing stuff, please let me know :)

@dobkeratops
Author

Great to hear.. as you can probably guess I've been going deeper into haskell-land recently but haven't forgotten this at all.

I have several areas I wish to connect and some hugely ambitious ideas that I want to get going on (which can connect with what you're doing here.. I think it's an awesome resource).. I just need to rotate focus between them.

Not sure how long I'll stick with Haskell - it's amazingly addictive/elegant, but I think you got it right using Go for a website, and I still think Rust has a great balance between correctness/safety and low-level capability (mixing 'expression-based' code with real-world imperative & OOP styles). I'm basically on a pendulum swing with C++ at one end and Rust in the middle. I definitely don't agree with the 'pure-FP zealots'.

anyway now I have to decide if I should try some serious new ideas in haskell (there are definitely things I spent quite a bit of time on in the past that it does suit) or go back to Rust (which I think can do everything)...

@bbernhard
Collaborator

bbernhard commented Oct 7, 2018

Great to hear.. as you can probably guess I've been going deeper into haskell-land recently but haven't forgotten this at all.

I have several areas I wish to connect and some hugely ambitious ideas that I want to get going on (which can connect with what you're doing here.. I think it's an awesome resource).. I just need to rotate focus between them.

Great to hear - really looking forward to that! In my opinion, a good ecosystem of libraries and tools built around a service is at least as important as the service itself. I've seen so many great services over the years... some of them were really extraordinary and solved real problems. While the service itself was really good, the ecosystem built around it was... well... not that good. Often they had their way of doing something, and if you wanted to do something slightly different, it was either really complicated or not possible at all. I hope that by focusing on the API and third party integrations early in the process, we get a better feeling for what functionality we/other users actually need. So I think any effort in that direction is definitely worth it and pays off in the long run :)

Not sure how long I'll stick with haskell - it's amazingly addictive/elegant but I think you got it right using Go for a website,

Not sure if Go was the perfect choice, but I am pretty happy with it so far. In the past, I always used Python as a backend language for web applications (which I always regretted at some point). My main motivation for using Python was always "oh, that will be just a small application, so Python will do just fine". But most of the time, the application got bigger than expected. And at some point I wanted to refactor and restructure the application, which I always find really painful in Python due to dynamic typing.

I recently refactored the imagemonkey-core source code a bit (moved a lot of stuff to libraries) and I have to say: the experience was really smooth... way nicer than my experience with Python. The fact that Go is a statically typed language also helps a lot... no matter how many unit/integration tests I have in place, it makes me way more confident when the compiler runs through after refactoring.

I recently stumbled across Nim (https://nim-lang.org/), which also sounds really interesting. I am still looking for a use case though. I tend to re-write stuff in a different language just because the language itself fascinates me. But while rewriting the software, I often lose interest in the project itself, as it's already "solved" anyway, and then I end up with a bunch of unfinished code. So I've decided to only write new stuff in a new language... it's not always working out, but I got better at controlling the temptation :D

@dobkeratops
Author

dobkeratops commented Oct 20, 2018

(Update: I'm firmly back in Rust-land working on the renderer again. I haven't dropped so many contributions to your site lately, but I still try to dip in occasionally.)

Regarding 'describe-the-image', I hope my speculative use is OK: I've been dropping sentences with square brackets [] enclosing as much of the parse tree as possible, especially in the hope that the leaves [[a person] riding [an elephant]] can be turned directly into labels -> a person, an elephant, saving entry and of course giving some training data for natural-language processing. I figure at worst just stripping them out is trivial with the kind of text libraries modern languages have (I've just been impressed with what Rust has on that front, making error messages for shaders so much easier to process, etc.).

Back on my project... the time in Haskell helped me get out of the C++ comfort zone, but Rust is the best language for me overall;
my long term interest here is AI based texture/material enhancement/synthesis, and AI based assists for procedural modelling / using procedural models in recognition. Lots of pieces to connect. I still have an inkling to do the procedural parts in Haskell.

The two most immediate utilities I'm after are:
(i) auto-texture-tiling... a decent approximation can be done for simpler images. Step 1: splatting some pieces, choosing the best correlation from several tries. Step 2: more accurate, using that image tree. Step 3: full-on trained neural nets.

(ii) (a) microtexture and (b) 'corner/crease' texture selection, e.g. (a) breaking an image into one low-res structural image and one fine-grain image which can be repeated at a different rate - not really 'compression', but giving the same impression in a way that can be varied; (b) is about enhancement of image corners etc.

(this is why I'm so keen on things like material labels)
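Step 1 of the auto-texture-tiling idea could look roughly like this - a heavily simplified Go sketch under my own assumptions (flat grayscale float slices, mean squared difference instead of a proper correlation measure, random candidate offsets):

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// mse scores how well a pw*ph patch fits the canvas at offset (ox, oy),
// comparing only texels that are already filled. Canvas texels < 0 are
// treated as unfilled and impose no constraint.
func mse(canvas, patch []float64, w, pw, ph, ox, oy int) float64 {
	var sum float64
	n := 0
	for y := 0; y < ph; y++ {
		for x := 0; x < pw; x++ {
			c := canvas[(oy+y)*w+(ox+x)]
			if c < 0 {
				continue // unfilled: any patch value is acceptable
			}
			d := c - patch[y*pw+x]
			sum += d * d
			n++
		}
	}
	if n == 0 {
		return 0
	}
	return sum / float64(n)
}

// bestOffset "splats" the patch at several random offsets and keeps the
// one whose overlap with already-placed texels matches best.
func bestOffset(canvas, patch []float64, w, h, pw, ph, tries int) (int, int) {
	bx, by, best := 0, 0, math.Inf(1)
	for i := 0; i < tries; i++ {
		ox, oy := rand.Intn(w-pw+1), rand.Intn(h-ph+1)
		if e := mse(canvas, patch, w, pw, ph, ox, oy); e < best {
			best, bx, by = e, ox, oy
		}
	}
	return bx, by
}

func main() {
	// 8x8 canvas, initially unfilled (-1), with a 4x4 region already set.
	w, h := 8, 8
	canvas := make([]float64, w*h)
	for i := range canvas {
		canvas[i] = -1
	}
	for y := 0; y < 4; y++ {
		for x := 0; x < 4; x++ {
			canvas[y*w+x] = 0.5
		}
	}
	patch := []float64{0.5, 0.5, 0.5, 0.5} // 2x2 patch matching the region
	fmt.Println(mse(canvas, patch, w, 2, 2, 0, 0)) // 0 (perfect overlap)
	ox, oy := bestOffset(canvas, patch, w, h, 2, 2, 16)
	fmt.Println(ox, oy)
}
```

Steps 2 and 3 (the image tree, then trained nets) would replace the crude error metric, not this overall try-and-keep-the-best structure.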

@bbernhard
Collaborator

bbernhard commented Oct 21, 2018

Thanks for the update :) That's great to hear!

The two most immediate utilities I'm after are:
(i) auto-texture-tiling... a decent approximation can be done for simpler images. Step 1: splatting some pieces, choosing the best correlation from several tries. Step 2: more accurate, using that image tree. Step 3: full-on trained neural nets.

(ii) (a) microtexture and (b) 'corner/crease' texture selection...

That sounds very interesting - please keep me posted on your progress, really looking forward to it. I am hoping that once there are a few more applications that actually make use of the dataset, it gets easier to extend the site's functionality. I tried to keep the APIs as open and flexible as possible, but they are currently designed mostly around my own use cases (not sure if my use cases match other people's... so it's great to see other use cases too).

In case you need functionality which is currently missing, or you need some help with the API, let me know - I'll do my best to add it. :)
