
some confusion about classes in pretrained model on OpenImages #26

Closed
aurschmi opened this issue Dec 9, 2020 · 2 comments

Comments


aurschmi commented Dec 9, 2020

Hi, thanks for your nice repo.
I am experimenting with your pretrained network on OpenImages for my thesis, and I came across a mismatch between the names of the classes you trained your network on and the official ones from Open Images V6.
As I understand it, you saved the class names along with the model, and in infer.py they are loaded into the variable 'classes_list'. When I looked into that variable, the first 50 labels have a very strange string format (e.g. """Pig's organ soup"""), and the last 10 classes also seem to be damaged (e.g. and melon family' or "pentathlon""" -- this is the raw text as it appears in the list). I attached a dump of the damaged labels as a zipped CSV: classes_list.csv.zip (please view it in a text editor rather than Excel).
I wonder what the implications of these damaged classes are. Were these really the ones used during training?

To my understanding, the correct IDs to train on are listed here: https://storage.googleapis.com/openimages/v6/oidv6-classes-trainable.txt, and the corresponding class descriptions can be found here: https://storage.googleapis.com/openimages/v6/oidv6-class-descriptions.csv
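As a side note, the garbled strings (stray triple quotes, names cut at a comma) look like what happens when that class-descriptions CSV is split on commas by hand instead of parsed with a proper CSV reader, since many Open Images display names contain commas and apostrophes. A minimal sketch of loading it robustly (the function name and local file path are my own, not from the repo):

```python
import csv

# Hypothetical helper: load the official Open Images V6 class descriptions.
# csv.reader handles quoted fields containing commas/apostrophes
# (e.g. "Pig's organ soup") correctly, unlike naive line.split(',').
def load_class_descriptions(path="oidv6-class-descriptions.csv"):
    mid_to_name = {}
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the "LabelName,DisplayName" header row
        for mid, name in reader:
            mid_to_name[mid] = name
    return mid_to_name
```

Comparing the result of something like this against classes_list would show whether the damage is only cosmetic.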

Thanks in advance for your clarification and help.

mrT23 (Contributor) commented Dec 9, 2020

Hi aurschmi.
The short answer: thanks for the review; I believe it's fine.

The long answer:
Open Images is a complicated dataset: hard to download, hard to pre-process, and hard to train on. For example, not all the download links are still alive, and many classes have, in practice, no relevant pictures in the train set.

Out of the 9604 possible classes, I think the actual train (and test) sets only contain about ~5500 valid classes, so the detector won't predict all classes. Trust me, 5500 is plenty.

I recommend you experiment with the detector on actual images. A good recipe for inference: for each image, choose the best ~20 labels that crossed a threshold of 0.95-0.99. You will see that it outputs good results. If you find a consistent discrepancy in the detector's output, let us know.
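The recipe above (threshold first, then cap at the ~20 most confident labels) can be sketched as follows; the function and argument names are illustrative and not taken from infer.py:

```python
import numpy as np

# Sketch of the inference recipe: given per-class probabilities from the
# model, keep only labels that cross a high threshold, capped at the
# max_labels most confident ones, sorted by descending confidence.
def top_confident_labels(probs, classes_list, threshold=0.95, max_labels=20):
    idx = np.where(probs >= threshold)[0]       # indices above the threshold
    idx = idx[np.argsort(probs[idx])[::-1]]     # sort by confidence, descending
    idx = idx[:max_labels]                      # cap at max_labels
    return [(classes_list[i], float(probs[i])) for i in idx]
```

With a high threshold like 0.95-0.99, most images yield far fewer than 20 labels, so the cap mainly guards against label-dense scenes.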

aurschmi (Author) commented Dec 9, 2020

Hi mrT23,
First of all, thank you very much for your fast and helpful answer.
The reason I was interested in the label list is that I want to apply some NLP post-processing to filter out the most distinct labels.
So, if I understood you correctly, although some of the labels are nonsense, this has no practical implications, because those indices refer to highly unlikely (undertrained) labels anyway. In any case, it still holds that the network's output matches the provided index-to-class mapping.
If that's the case, this is perfectly fine for me. I totally agree that 5500 labels are more than sufficient and that the labels we get from running inference on your network are helpful.
I really appreciate your effort to provide a pretrained network on OI, given the problems with the dataset that you mentioned. To my knowledge, there are no other such networks available. So, from my side, we can close this issue, and I'd reopen it if I found a discrepancy.

aurschmi closed this as completed Dec 9, 2020