
Retry error on loading validation set images in multiclass project #32

Closed
jenspete opened this issue Jun 17, 2023 · 13 comments

jenspete commented Jun 17, 2023

I believe I have tracked the error down to line 336 in im_utils.py:

"all_annot_fnames = set(cur_annot_fnames + prev_annot_fnames)"

Reading line 338 makes me assume that the variable "all_annot_fnames" is supposed to match index-wise with all_dirs, as cur_annot_fnames does. However, cur_annot_fnames has repeated elements, because it contains filenames across multiple directories corresponding to each of the multiple classes. Moreover, the set data structure does not preserve order, so any correspondence between all_annot_fnames and all_dirs is lost. The end result is that filenames are sometimes assumed to be in directories they are not actually present in, which triggers the retry error. If my interpretation is correct, I also assume that a much smaller validation set is actually used than what is present, since the set collapses filenames that repeat across classes into a single entry.
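The ordering problem can be sketched in a few lines of Python; the directory and file names below are hypothetical, not taken from an actual project:

```python
# Hypothetical data: the same scan filename appears once per class directory,
# so cur_annot_fnames contains duplicates that line up index-wise with dirs.
all_dirs = ['annot_heart/train', 'annot_liver/train', 'annot_heart/train']
cur_annot_fnames = ['scan_01.nii.gz', 'scan_01.nii.gz', 'scan_02.nii.gz']
prev_annot_fnames = []

# Converting to a set collapses the duplicates and discards the ordering,
# so any index-wise correspondence with all_dirs is lost.
all_annot_fnames = set(cur_annot_fnames + prev_annot_fnames)

print(len(cur_annot_fnames))   # 3 entries, aligned with all_dirs
print(len(all_annot_fnames))   # only 2 entries survive the set conversion
```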

jenspete (Author) commented

While I am still concerned about the above code, it seems the exception can be traced to lines 276-279 in datasets.py. In my interpretation, the loop runs over all classes and assumes a filename is present in each, which I guess is not the case if not every class was annotated? Or is this now an assumption?

Abe404 (Owner) commented Jun 18, 2023

Thanks for the extra details. The code is wrong as far as I can tell - I think it should not assume that every class is annotated.

Just for my reference, this is related to: #30

I will look into both soon and propose a fix.

Abe404 (Owner) commented Jun 21, 2023

I've managed to reproduce the issue now, or at least I am getting errors about missing validation annotations that I should not be getting. I'm working on an automated test and will update when I have more progress.

Abe404 (Owner) commented Jun 22, 2023

Note:

We may want to consider the implications of the following code:

all_annot_fnames = set(cur_annot_fnames + prev_annot_fnames)

See how this line assumes that dirs and fnames correspond:
https://github.com/Abe404/RootPainter3D/blob/bedb9dc66d0c7d8cb9bfde2d0b4c2eedfd99cd6a/trainer/im_utils.py#LL338C8-L338C8

But we converted the dirs to a set before this. Can we rely on consistent ordering?

Also, it looks like we only get a single instance of each scan if we use a set. So the validation set would only include a single class for each scan, and annotations for the other classes would be ignored?
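One possible direction (a sketch only, not the fix that was eventually committed) is to de-duplicate explicit (dir, fname) pairs rather than bare filenames, which preserves both the ordering and the per-class entries. All names here are illustrative:

```python
# Sketch: de-duplicate (annot_dir, fname) pairs instead of filenames alone,
# so each fname stays tied to its directory and per-class entries survive.
# Function name and data are illustrative, not taken from im_utils.py.
def unique_annot_pairs(cur_pairs, prev_pairs):
    """Return (dir, fname) pairs with duplicates removed, order preserved."""
    seen = set()
    result = []
    for pair in cur_pairs + prev_pairs:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result

pairs = unique_annot_pairs(
    [('annot_heart/train', 'scan_01.nii.gz'),
     ('annot_liver/train', 'scan_01.nii.gz')],
    [('annot_heart/train', 'scan_01.nii.gz')])
# Both class annotations of scan_01 survive, in their original order.
```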

Abe404 self-assigned this Jul 16, 2023
Abe404 (Owner) commented Jul 17, 2023

I have written a failing integration test for this now:
67a18ff

The next step is a failing unit test that better isolates the problem in im_utils.py.

Abe404 (Owner) commented Jul 17, 2023

Failing unit test now implemented: 9552497

(It depends on the commit after it to run.)

It doesn't isolate the problem quite as much as I would like (that would require some refactoring, and I don't want to shave the yak too much), but it shows what is going wrong: the patch refs do not have the correct directories in a multi-class project.

Abe404 (Owner) commented Jul 17, 2023

Unit test now passing:
c528ba2

But it looks like there's still a problem with the integration test.

The validation code still assumes there's an annotation for every class for every image in the validation set.

def get_annots_for_image(self, annot_fname):
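A tolerant version might look roughly like the following. This is an illustrative sketch, not the repository's implementation; the real get_annots_for_image has a different signature, and the argument names and return shape here are assumptions:

```python
import os

# Sketch: collect annotations per class for one image, skipping classes
# that have no annotation file instead of assuming one exists for every
# class. annot_dirs and the return shape are illustrative assumptions.
def get_annots_for_image(annot_dirs, annot_fname):
    found_dirs = []
    annot_paths = []
    for annot_dir in annot_dirs:
        annot_path = os.path.join(annot_dir, annot_fname)
        if os.path.isfile(annot_path):  # tolerate missing class annotations
            found_dirs.append(annot_dir)
            annot_paths.append(annot_path)
    return found_dirs, annot_paths
```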

Abe404 (Owner) commented Jul 17, 2023

Integration test now passing also.
7b14aa0

I believe this is fixed but I will try some more manual testing with everything running connected to the GUI just to make sure.

Abe404 (Owner) commented Jul 17, 2023

I'm occasionally getting this issue when testing with the total segment dataset: #34

It is intermittent due to the test picking random images. I will aim to address this soon (as a separate issue).

Abe404 (Owner) commented Jul 20, 2023

#34 is almost finished. I will close both issues once all tests pass and I've tested it out manually.

I think #29 might also need a bit more work for the manual test to go well.

Abe404 (Owner) commented Jul 27, 2023

This is fixed now. Please let me know if you run into any more problems.

Abe404 closed this as completed Jul 27, 2023
Abe404 (Owner) commented Jul 30, 2023

I found another issue related to this, so I'm re-opening.

There is a problem with the function load_train_image_and_annot

def load_train_image_and_annot(dataset_dir, train_seg_dirs, train_annot_dirs, use_seg,

It is supposed to always return matching lists of class names and annotations. When force_fg is true, it may omit annotations that have no foreground (as these should not be used in training) while still returning the corresponding class, leading to a mismatch between the list of classes and the list of annotations, which causes an exception in the loss function.

This means training can crash for projects where only the background is annotated for some images, without any foreground annotated.
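The invariant can be sketched as follows: if an annotation is dropped, its class entry must be dropped with it. The function name and toy data below are illustrative (not the actual fix), assuming annotations are arrays where a truthy value marks foreground:

```python
# Sketch of the invariant described above: when force_fg drops an
# annotation with no foreground, the matching class entry must be
# dropped too, or the two lists go out of sync at the loss function.
def filter_annots(classes, annots, force_fg):
    """Filter classes and annotations together, preserving alignment."""
    if not force_fg:
        return classes, annots
    kept = [(c, a) for c, a in zip(classes, annots) if any(a)]
    kept_classes = [c for c, _ in kept]
    kept_annots = [a for _, a in kept]
    return kept_classes, kept_annots

classes = ['heart', 'liver']
annots = [[0, 0, 0], [0, 1, 0]]  # heart annotated with background only
classes, annots = filter_annots(classes, annots, force_fg=True)
assert len(classes) == len(annots)  # lists stay aligned
```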

This is the test that revealed the bug:
0a701df

(testing multi class training works on total segmentor dataset).

Abe404 reopened this Jul 30, 2023
Abe404 (Owner) commented Jul 30, 2023


Fixed with: 0449648

Abe404 closed this as completed Jul 30, 2023