label score per image/'primary label' #202
Interesting idea!
would you give the user the possibility to also add a new word/label, or do they need to choose the best-matching one from the existing labels? would you restrict that to one word? I think picking one word definitely has its charm, but I am not sure whether we should enforce that or not. If we do not restrict it and allow free input, my gut feeling is that people will automatically end up writing small descriptive sentences/phrases.
i'm hoping 'scene labels' will suffice most of the time, but some of the images will focus on an object label or even a part (focus on a person's face, etc). we will have the same issues with spam .. we could curate a scene label list if need be.
right, i was thinking that too - perhaps it could be structured. another idea is to just sort the labels.. it might be obvious enough that a foreground object goes first (where is your eye drawn): "dog, park" (focused on one dog) vs "park, dogs" (dogs in the distance).
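A minimal sketch of the label-ordering idea in Go. Everything here (`ProminentLabel`, the prominence values) is made up for illustration; the comment above only suggests that the order of labels could itself carry meaning:

```go
package main

import (
	"fmt"
	"sort"
)

// ProminentLabel pairs a label with a prominence value (roughly: how
// much the eye is drawn to it). The scoring scheme is hypothetical.
type ProminentLabel struct {
	Label      string
	Prominence float64
}

// orderLabels returns label names sorted most-prominent first, so
// "dog, park" (one dog in focus) reads differently from "park, dogs".
func orderLabels(labels []ProminentLabel) []string {
	sort.Slice(labels, func(i, j int) bool {
		return labels[i].Prominence > labels[j].Prominence
	})
	out := make([]string, len(labels))
	for i, l := range labels {
		out[i] = l.Label
	}
	return out
}

func main() {
	fmt.Println(orderLabels([]ProminentLabel{
		{"park", 0.3}, {"dog", 0.9},
	})) // [dog park]
}
```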
on a related note: I scrolled through a bunch of papers today where they tried to train a neural net that is able to generate image descriptions. I haven't read the papers completely, but it looks like some of the implementations used phrases/sentences as input. https://cs.stanford.edu/people/karpathy/deepimagesent/ Something I wasn't aware of either: it looks like the COCO dataset also contains a descriptive text for each image - I always thought they were only labeling images. (I am not completely through the papers, but they sound quite interesting.)
short update: I started working on this one a few days ago. The first version will be pretty basic - it simply allows adding one (or more) image descriptions. There is no (semantic) parsing of the input at the moment, so users can enter any text they want (I wanted to keep the first version as simple as possible). After the user enters some text, the image description is still in a locked state and needs to be unlocked by a moderator.

Not sure if the manual unlocking scales (I guess at some point it won't be possible anymore for moderators to keep up), but I want to keep that mechanism as long as possible. I am a bit afraid that we will end up with a lot of garbage in our dataset if we skip content moderation completely. I have seen that so often in public discussion boards/forums...no matter how good your spam prevention mechanisms are, at some point bots will find a way through and mess with your board. Once the dataset reaches a certain size, I guess it's not even possible to spot that by picking random samples, so I guess our safest bet at the moment is to unlock every image description manually.

If that doesn't scale anymore, I think we can also introduce some sort of trust level. If a user has contributed a lot of valid data, they will automatically become a "trustworthy user", i.e. we don't need to validate every contribution of that user anymore; instead we randomly pick every n-th contribution and validate that (to avoid that a user "turns bad" after becoming a trustworthy user).

Here are some screenshots:

Add new image descriptions:

edit: I am not totally happy yet with the UI controls. Ideally, there should be only one UI element for both the label and the image description input. But I don't want to mix up labels and image descriptions - I think it's better to keep them separated in the dataset.

Moderators screen (to unlock image descriptions):
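The trust-level idea could be sketched roughly like this in Go. Note that `User`, the threshold, and the spot-check interval are all hypothetical - the real ImageMonkey schema may look quite different:

```go
package main

import "fmt"

// User tracks moderation-relevant stats. All names here are
// hypothetical placeholders, not ImageMonkey's actual schema.
type User struct {
	Name               string
	ValidContributions int
	Trusted            bool
	contributionCount  int
}

const (
	trustThreshold = 100 // valid contributions before a user is auto-trusted
	spotCheckEvery = 10  // validate every n-th contribution of trusted users
)

// NeedsModeration reports whether the user's next contribution
// should be queued for manual review.
func (u *User) NeedsModeration() bool {
	u.contributionCount++
	if u.ValidContributions >= trustThreshold {
		u.Trusted = true
	}
	if !u.Trusted {
		return true // untrusted users: review everything
	}
	// trusted users: spot-check every n-th contribution, so a user
	// can't "turn bad" unnoticed after becoming trustworthy
	return u.contributionCount%spotCheckEvery == 0
}

func main() {
	newbie := &User{Name: "alice", ValidContributions: 3}
	veteran := &User{Name: "bob", ValidContributions: 250}
	fmt.Println(newbie.NeedsModeration())  // true (everything reviewed)
	fmt.Println(veteran.NeedsModeration()) // false (trusted, not an n-th one)
}
```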
ok, interesting idea.. I guess this moderation stuff is absolutely essential if you're going to have extensive user descriptions going in. I was also going to make a couple of suggestions in this direction, e.g. (iv) a square bracket language for label relations forming a scene description, and (v) very vague 'thematic labels' which would definitely always be un-annotatable.
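One possible reading of the square-bracket idea, sketched as a tiny parser in Go. The grammar (nested brackets expressing label relations like part-of/containment) is entirely an assumption, since the comment doesn't pin it down:

```go
package main

import (
	"fmt"
	"strings"
)

// Node is one label in a bracketed scene description; nested
// brackets express label relations (the semantics are speculative).
type Node struct {
	Label    string
	Children []*Node
}

// parseScene parses input like "[person [face]] [park]" into a
// forest of label nodes.
func parseScene(s string) []*Node {
	nodes, _ := parseNodes(s, 0)
	return nodes
}

// parseNodes consumes sibling nodes starting at index i and returns
// them together with the index just past the closing ']' (if any).
func parseNodes(s string, i int) ([]*Node, int) {
	var nodes []*Node
	for i < len(s) {
		switch s[i] {
		case '[':
			// read the label text up to the next bracket
			j := i + 1
			for j < len(s) && s[j] != '[' && s[j] != ']' {
				j++
			}
			n := &Node{Label: strings.TrimSpace(s[i+1 : j])}
			n.Children, i = parseNodes(s, j) // recurse into nested brackets
			nodes = append(nodes, n)
		case ']':
			return nodes, i + 1
		default:
			i++ // skip whitespace between siblings
		}
	}
	return nodes, i
}

func main() {
	for _, n := range parseScene("[person [face]] [park]") {
		fmt.Println(n.Label, len(n.Children))
	}
	// person 1
	// park 0
}
```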
(FYI i'm having a little Haskell sidetrack at the minute.. occasionally i dip back into it. Such a fascinating/addictive language but i still find it slower to write code. it might be another option for exploring things like the image clustering idea) |
Many thanks for the suggestions, I will definitely consider them! I think your list contains a few "easy wins" which we could implement pretty easily - will keep those on my mental todo list :)
Haskell is also a language that really fascinates me. Unfortunately, I never had enough motivation and perseverance to dig deeper into the language's concepts. But I've heard many times from my colleagues that once you've mastered the language, you have a really powerful tool in your hands :)
so basically, with more intent toward 3d models, i'm writing some haskell clustering code, but it's using a plugin vector type, so i could still throw in a 32×32×3 element vector from an image. with 3d models you can physically see the clusters (draw bounding shapes around them). (i'm just remembering your 'image galaxy' idea again now..)
* moved database specific parts to database/
* moved parser specific parts to parser/
* moved common parts to commons/

see #202
short status update: I just pushed a new version to production which includes a lot of changes. Here's a short list:
A few more words about the image description feature: Currently, only the bare minimum is implemented. I think there is still a lot of room for improvement, but I intentionally wanted to keep that whole thing as simple as possible, in order to evaluate whether that's something that's worth investing more time into. Possible improvements:
@dobkeratops I gave you moderator rights to accept/decline image descriptions. When you are logged in, you should see a notification badge in the top right corner. The notification badge shows the number of pending requests (i.e. requests that need moderator action). Any help with the moderation of the image descriptions is really appreciated...as you are living in the UK, I think it might be easier for you to spot grammar mistakes ;) But I also totally understand if you don't want to do this...so please don't feel forced to ;)

edit: I tried to test everything carefully, but in case I broke some existing stuff, please let me know :)
Great to hear.. as you can probably guess, I've been going deeper into haskell-land recently but haven't forgotten this at all. I have several areas I wish to connect and some hugely ambitious ideas that I want to get going on (which can connect with what you're doing here.. I think it's an awesome resource).. I just need to rotate focus between them. Not sure how long I'll stick with haskell - it's amazingly addictive/elegant, but I think you got it right using Go for a website, and I still think Rust has a great balance between correctness/safety and low-level capability (mixing 'expression-based' code with real-world imperative & OOP styles). I'm basically on a pendulum swing with C++ at one end and rust in the middle. I definitely don't agree with the 'pure-fp zealots'. Anyway, now I have to decide if I should try some serious new ideas in haskell (there are definitely things I spent quite a bit of time on in the past that it does suit) or go back to Rust (which I think can do everything)...
Great to hear - really looking forward to that! In my opinion, a good ecosystem of libraries and tools built around a service is at least as important as the service itself. I've seen so many great services over the years...some of them were really extraordinary and solved real problems. While the service itself was really good, the ecosystem built around it was...well..not that good. Often they had their way of doing something, and if you wanted to do something slightly different, it was either really complicated or not possible at all. I hope that by focusing on the API and third party integrations early in the process, we get a better feeling for what functionality we/other users actually need. So I think any effort in that direction is definitely worth it and pays off in the long run :)
Not sure if Go was the perfect choice, but I am pretty happy with it so far. In the past, I always used Python as a backend language for web applications (which I later always regretted at some point). My main motivation for using Python was always "oh, that will be just a small application, so Python will do just fine". But most of the time, the application got bigger than expected. And at some point I wanted to refactor and restructure the application, which I always find really painful in Python due to dynamic typing. I recently refactored the imagemonkey-core sourcecode a bit (moved a lot of stuff to libraries) and I have to say: the experience was really smooth...way nicer than my experience with Python. The fact that golang is a statically typed language also helps a lot...no matter how many unit/integration tests I have in place, it makes me way more confident when the compiler runs through after refactoring. I recently stumbled across nim (https://nim-lang.org/), which also sounds really interesting. I am still looking for a use case though. I tend to re-write stuff in a different language just because the language itself fascinates me. But while rewriting the software, I often lose interest in the project itself, as it's already "solved" anyway, and then I end up with a bunch of unfinished code. So I've decided to only write new stuff in a new language...it's not always working out, but I got better at controlling the temptation :D
( update: i'm firmly back in rust-land working on the renderer again. I haven't dropped so many contributions to your site lately, but I still try to dip in occasionally. regarding 'describe-the-image', I hope my speculative use is ok: I've been dropping sentences with square brackets [] enclosing as much of the parse tree as possible, especially in the hope that the leaves... back on my project, the time in haskell helped me get out of the c++ comfort zone, but Rust is the best language for me overall. The two most immediate utilities I'm after are: (ii) (a) microtexture and (b) 'corner/crease' texture selection, e.g. (ii)(a) breaking an image into one low-res structural image and one fine-grain image which can be repeated at a different rate - not really 'compression', but giving the same impression in a way that can be varied. (iii) is about enhancement of image corners etc. (this is why I'm so keen on things like material labels)
Thanks for the update :) That's great to hear!
That sounds very interesting - please keep me posted on your progress, really looking forward to that. I am hoping that once there are a few more applications that actually make use of the dataset, it gets easier to extend the site's functionality. I tried to keep the APIs as open and flexible as possible, but they are currently designed mostly around my own use cases (not sure if my use cases also match other people's use cases..so it's great to see other people's use cases too). In case you need functionality which is currently missing, or you need some help with the API, let me know. I'll do my best to add it. :)
expanding on the idea of a "primary label" per image - imagine giving each label a score;
I suspect the image sites would have something like this,
imagine asking these questions:
This is subjective, and could yield a floating point value representing the degree of consensus/certainty.
I'm not sure if that's two separate values, or one value ("the coupling between the label and the image").. could two values be combined?
Such a value could be used for:
This sort of thing might become more useful as both the image set and label set grow.. I imagine many un-annotatable labels still being useful for image-wide training (e.g. distinctions between describing an image primarily with various overlapping terms, such as...

- `person` vs `crowd`
- `city` vs `street`
- `city` vs `building`
- `city` vs `town`
- `street` vs `road`
- `garden` and `park`

)
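The per-label consensus score sketched above could be computed along these lines in Go. `Vote` and the one-primary-label-per-user voting model are assumptions for illustration, not anything the site implements:

```go
package main

import "fmt"

// Vote is one user's answer to "which label best describes this
// image?". The type and voting model are hypothetical.
type Vote struct {
	User  string
	Label string
}

// labelScores returns, for each label, the fraction of votes naming
// it as the primary label - a floating point consensus value in
// [0, 1], where values near 1 indicate strong agreement.
func labelScores(votes []Vote) map[string]float64 {
	scores := make(map[string]float64)
	if len(votes) == 0 {
		return scores
	}
	for _, v := range votes {
		scores[v.Label]++
	}
	for label := range scores {
		scores[label] /= float64(len(votes))
	}
	return scores
}

func main() {
	votes := []Vote{
		{"a", "dog"}, {"b", "dog"}, {"c", "park"}, {"d", "dog"},
	}
	fmt.Println(labelScores(votes)) // map[dog:0.75 park:0.25]
}
```

Such a score could then serve both as a ranking key (which label is "primary") and as a per-label certainty weight during training.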