MaskDino Fails to learn Precise Bounding Boxes on custom dataset but Dino does #242

Open
FabianSchuetze opened this issue Mar 21, 2023 · 6 comments

@FabianSchuetze

Thanks for the wonderful repo. It's a pleasure to work with it and to read the code.

When training MaskDino on a custom dataset, the bounding box predictions are not very good. Interestingly:

  • Dino learns good bounding boxes on the same dataset
  • The instance masks predicted by MaskDino are good too
  • MaskDino learns bounding boxes that score well at AP50, but the results at AP75 (and stricter IoU thresholds) are poor.
  • MaskDino errored at the end with a problem in the cost_matrix. See the logs for details.
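The attached logs are the authority on what actually failed in the matcher. As a debugging aid only, and assuming (the thread does not confirm this) that the error comes from non-finite values reaching the assignment step, a guard like the one below could be called just before the Hungarian matching. The function name `find_invalid_entries` and the nested-list cost matrix are hypothetical; the real matcher operates on tensors.

```python
import math


def find_invalid_entries(cost_matrix):
    """Return (row, col, value) for every NaN or infinite entry."""
    bad = []
    for i, row in enumerate(cost_matrix):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                bad.append((i, j, value))
    return bad


# Toy cost matrix: rows are queries, columns are ground-truth objects.
cost = [
    [0.2, 1.5, 0.9],
    [float("nan"), 0.4, 1.1],
    [0.7, float("inf"), 0.3],
]

invalid = find_invalid_entries(cost)
for i, j, value in invalid:
    print(f"query {i}, target {j}: {value}")
```

With AMP enabled, half-precision overflow is one plausible source of such values, so logging the offending query and target indices may help narrow the search.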

Does anybody have an idea what I could tune to generate good bb results?

Training Details:
I have slightly modified the training process (see this branch https://github.com/FabianSchuetze/detrex/tree/my_changes). I added AMP (automatic mixed precision) training and some gradient checkpointing. I train on one GPU with a batch size of four for MaskDino; Dino works with a batch size of 8. The learning rate is decayed linearly.

Data:
The instances are very dense, similar to the "iscrowd" scenes of COCO. There is only one class. I have adjusted the num_objects in the config files.

Logs:
Logs of the training runs are attached below. There are three logs:

  • One for MaskDino with the original noise scale (0.4)
  • One with a noise scale of 1.0 (Dino uses this value)
  • One for Dino (noise scale of 1.0)

Hyperparameters:
Comparing the parameters, the following aspects seem notable:

  • MaskDino has a higher Hungarian class loss (5 vs 2)
  • MaskDino has 1/3 of the queries (300 vs 900)

maskdino_0.4_noise_scale.txt
maskdino_1.0_noise_scale.txt
dino_log.txt

Does anybody have an idea how to debug the problem?

@FabianSchuetze
Author

To reproduce the results, I have used a public dataset with similar characteristics: the COB-3D dataset, see https://arxiv.org/abs/2210.07424 . I have extracted RGB images, bounding boxes, and instance masks in the COCO format. The dataset is a bit small (~6k images) and can be downloaded here. The original data is here. Please note that the data is published under a CC non-commercial license, see https://github.com/wyndwarrior/autoregressive-bbox/blob/main/LICENSE .
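For readers reproducing the conversion, here is a minimal sketch of a COCO-style annotation structure for a single-class dataset. The field names follow the COCO format; the file name, image size, category name and coordinates are made-up placeholders, not values from COB-3D, and `polygon_to_bbox` is a hypothetical helper.

```python
import json


def polygon_to_bbox(polygon):
    """Derive a COCO [x, y, width, height] box from a flat [x0, y0, x1, y1, ...] polygon."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# Placeholder instance outline (a rectangle, for simplicity).
polygon = [10.0, 20.0, 60.0, 20.0, 60.0, 50.0, 10.0, 50.0]
bbox = polygon_to_bbox(polygon)

coco = {
    "images": [{"id": 1, "file_name": "example.png", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "object"}],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 1,
        "segmentation": [polygon],
        "bbox": bbox,               # top-left x, top-left y, width, height
        "area": bbox[2] * bbox[3],  # equals the mask area here only because the polygon is a rectangle
        "iscrowd": 0,
    }],
}

print(json.dumps(coco["annotations"][0]["bbox"]))
```

Deriving the box from the mask in one place makes it easier to check that boxes and masks agree, and that the box is stored as `[x, y, width, height]` rather than as two corner points.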

An image of the MaskDino predictions and the ground truth:
image

The logs for dino and mask dino are uploaded below.
dino.txt
maskDino.txt

Interestingly:

  • The bounding-box mAP for Dino is much better than for MaskDino. The training is a bit short, but I noticed a similar difference after longer training.
  • However, when looking at the predictions, the visualized bounding boxes for Dino are not that much better. Both show somewhat low recall. I also uploaded the JSON predictions.
  • The results with standard Mask-RCNN heads are generally pretty good on this dataset. They have a good recall and good precision.

@HaoZhang534
Collaborator

Hello, I notice that the boxes predicted by MaskDINO are all shifted slightly toward the upper right. I guess there may be a bug in the postprocessing code.
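A small shift of this kind would be consistent with the metrics reported earlier in the thread: a slightly displaced box can still pass the IoU 0.5 threshold while failing at 0.75. The sketch below illustrates this with a made-up box size and a made-up 10-pixel offset; it does not use the actual predictions, and the helper names are hypothetical.

```python
def iou_xyxy(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def shift(box, dx, dy):
    """Translate a box by (dx, dy) without changing its size."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


gt = (100.0, 100.0, 200.0, 200.0)  # 100 x 100 ground-truth box
pred = shift(gt, 10.0, -10.0)      # right and up (image y axis points down)
iou = iou_xyxy(gt, pred)
print(f"IoU = {iou:.3f}, passes 0.50: {iou >= 0.5}, passes 0.75: {iou >= 0.75}")
```

For smaller objects the same pixel offset costs proportionally more IoU, so dense scenes with small instances would be affected most.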

@HaoZhang534
Collaborator

@FabianSchuetze When you have a relatively small dataset, Mask-RCNN usually does well enough. MaskDINO and DINO are better suited to relatively large datasets such as COCO.

@HaoZhang534
Collaborator

HaoZhang534 commented Apr 9, 2023

@FabianSchuetze We fixed a bug in #249. Maybe you can run it again to see whether this solves your problem. Please also refer to the discussion in #247.

@FabianSchuetze
Author

Thank you so much, @HaoZhang534! I will train the model again tomorrow and report back.

@FabianSchuetze
Author

@HaoZhang534, I have worked with the new commits, but the bounding boxes are still shifted. I have commented again in #247.

Furthermore, I am still not getting very good results. Maybe training is not really feasible with a batch size of just 4? I will try to train on MS COCO and see whether I can reproduce the original results. Could you attach a log of the original training run? That would be wonderful and would make a comparison easier.
