Bucket size warning #73

Open
rahulagrawal048 opened this issue Mar 13, 2022 · 5 comments


rahulagrawal048 commented Mar 13, 2022

[W reducer.cpp:347] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [19, 256, 1, 1], strides() = [256, 1, 256, 256]
bucket_view.sizes() = [19, 256, 1, 1], strides() = [256, 1, 1, 1] (function operator())

I trained the SegFormer-B0 model on the Cityscapes dataset using 4 GPUs with 2 samples per GPU and got the above warning. Has anyone faced a similar problem, or does anyone know where the issue might be?

My run reached only ~63 mIoU on the val set, compared with the 76.2 stated in the paper.
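
For anyone reading the two stride tuples in the warning: they differ only in dimensions 2 and 3, which both have size 1. The sketch below is plain Python, not PyTorch, and `element_offsets` is a hypothetical helper written for this check. It suggests that both layouts place every element at the same storage offset. It says nothing about where the mIoU gap comes from.

```python
from itertools import product

def element_offsets(sizes, strides):
    """Map every index tuple to its storage offset for the given strides."""
    return {
        idx: sum(i * s for i, s in zip(idx, strides))
        for idx in product(*(range(n) for n in sizes))
    }

sizes = (19, 256, 1, 1)
grad_strides = (256, 1, 256, 256)   # grad strides reported in the warning
bucket_strides = (256, 1, 1, 1)     # bucket_view strides reported in the warning

# True if every element sits at the same offset under both layouts.
same_layout = element_offsets(sizes, grad_strides) == element_offsets(sizes, bucket_strides)

# Dimensions whose strides differ between the two layouts.
differing_dims = [
    d for d, (a, b) in enumerate(zip(grad_strides, bucket_strides)) if a != b
]

print(same_layout, differing_dims)  # True [2, 3]
```

Since an index along a size-1 dimension is always 0, its stride never contributes to the offset, which is why the two tuples compare unequal while describing the same memory order.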


TheoPis commented Mar 14, 2022

I have had a similar warning when using B0, B1, and B5 on both Cityscapes and ADE20K. In my case, with B0 on Cityscapes I get a single-scale mIoU of 75.3 instead of the 76.2 stated in the paper. Per-batch training time also seems to be slowed down significantly when this warning appears. I would be grateful for any suggestions as to what may be causing this.

@rahulagrawal048
Author

With B0 on Cityscapes, what batch size per GPU did you use to get 75.3? Did you change anything else?


TheoPis commented Mar 14, 2022

@rahulagrawal048: I used a total batch size of 8 (4 GPUs, 2 per GPU). It's also important to mention that I do not use mmseg; rather, I carefully ported the MiT and SegFormer implementations from here into my own codebase. I also closely followed the config files in local_configs/ for B0. I thought the warning you mentioned was somehow related to my not using mmseg, but it may be a more general issue than that. Do you use mmseg in your reproduction?

@rahulagrawal048
Author

Yes, I am using mmseg throughout, and that might be the reason for the low mIoU I get.

@harshm121

Warning: Grad strides do not match bucket view strides.

Were you able to figure out what led to this warning?
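
One way to narrow down which parameters are involved might be to compare each gradient's strides with its parameter's strides after a backward pass. This is a minimal sketch, assuming PyTorch is installed. `find_stride_mismatches` is a hypothetical helper, not part of PyTorch or mmseg, and the toy 1x1 convolution only stands in for the real model. Comparing against the parameter's strides is an approximation of the bucket-view check that DDP performs, not the same check.

```python
import torch
import torch.nn as nn

def find_stride_mismatches(model):
    """List parameters whose gradient strides differ from the parameter strides."""
    mismatches = []
    for name, param in model.named_parameters():
        grad = param.grad
        if grad is not None and grad.stride() != param.stride():
            mismatches.append(
                (name, tuple(grad.shape), grad.stride(), param.stride())
            )
    return mismatches

# Toy stand-in: a 1x1 convolution with the same weight shape as in the
# warning, [19, 256, 1, 1].
model = nn.Conv2d(256, 19, kernel_size=1)
x = torch.randn(2, 256, 8, 8)
model(x).sum().backward()

mismatches = find_stride_mismatches(model)
for name, shape, grad_strides, param_strides in mismatches:
    print(name, shape, "grad strides:", grad_strides, "param strides:", param_strides)
```

Whether the toy model reports any mismatch depends on the backend and PyTorch version. On the real model, the helper would be called after `loss.backward()` in one training iteration.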
