
Incorporate BEVFusion's BEV Pooling operation into BEVDet #39

Closed
Divadi opened this issue Jul 8, 2022 · 9 comments

Divadi (Contributor) commented Jul 8, 2022

Hello, I have been trying to replace BEVDet's QuickCumsum operation with BEVFusion's BEV Pooling operation:
https://github.com/mit-han-lab/bevfusion/tree/main/mmdet3d/ops/bev_pool

To do so, I simply replaced

```python
x, geom_feats = QuickCumsum.apply(x, geom_feats, ranks)
# griddify (B x C x Z x X x Y)
final = torch.zeros((B, C, nx[2], nx[1], nx[0]), device=x.device)
final[geom_feats[:, 3], :, geom_feats[:, 2], geom_feats[:, 1], geom_feats[:, 0]] = x
# collapse Z
```

with

```python
x = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])
```

where bev_pool is BEVFusion's bev_pool CUDA operation.

However, I find that although there is a significant speed-up, the loss is not decreasing as expected (around 14 at the end of epoch 5, while it should be around 9.5).

Looking at the papers, the two seem to be equivalent pooling operations, but I was hoping for some guidance in case I missed something.
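
For concreteness, this is the plain scatter-add that I believe both operations should reduce to (just a sketch for sanity checking, not code from either repo; the geom_feats column order follows the snippet above, and bev_pool_reference is a name I made up):

```python
import torch

def bev_pool_reference(x, geom_feats, B, nx):
    """Naive scatter-sum of frustum point features into the BEV voxel grid.

    x:          (N, C) features of the kept points
    geom_feats: (N, 4) integer indices (x_idx, y_idx, z_idx, batch_idx)
    nx:         grid size per axis (nx_x, nx_y, nx_z)
    """
    C = x.shape[1]
    # accumulate with channels last so points falling into the same voxel are summed
    final = torch.zeros((B, nx[2], nx[1], nx[0], C), device=x.device, dtype=x.dtype)
    final.index_put_(
        (geom_feats[:, 3].long(), geom_feats[:, 2].long(),
         geom_feats[:, 1].long(), geom_feats[:, 0].long()),
        x, accumulate=True)
    return final.permute(0, 4, 1, 2, 3)  # (B, C, Z, Y, X), like the griddified tensor above
```

Comparing both paths against this (after matching the axis order) would be one way to narrow down where they diverge.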

Thank you!

HuangJunJie2017 (Owner) commented

@Divadi The acceleration modifications in both BEVDet (https://github.com/HuangJunJie2017/BEVDet#estimate-the-inference-speed-of-bevdet) and BEVFusion should be applied during testing only. The explanation can be found in the technical reports.

Divadi (Contributor, Author) commented Jul 8, 2022

Thank you for your prompt response!

I believe I found the relevant section in BEVDet:
[screenshot of the relevant section of the BEVDet technical report]

And for BEVFusion:
[screenshot of the relevant section of the BEVFusion technical report]

However, from BEVFusion's technical report and codebase, it seems that the "interval reduction" CUDA operation can also be used during training. Their code at https://github.com/mit-han-lab/bevfusion/tree/main/mmdet3d/ops/bev_pool defines backward CUDA operations as well.
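
A quick way to confirm that gradients flow through it (just a sketch; the import path and the expected coordinate dtype may differ from the actual code):

```python
import torch
from mmdet3d.ops import bev_pool  # assuming BEVFusion's op is exposed here

B, D, H, W, N, C = 1, 8, 128, 128, 1000, 64
feats = torch.randn(N, C, device='cuda', requires_grad=True)
coords = torch.stack([                     # (x_idx, y_idx, z_idx, batch_idx)
    torch.randint(0, H, (N,)),
    torch.randint(0, W, (N,)),
    torch.randint(0, D, (N,)),
    torch.zeros(N, dtype=torch.long),
], dim=1).cuda()

out = bev_pool(feats, coords, B, D, H, W)  # pooled voxel feature
out.sum().backward()                       # exercises the backward CUDA kernel
print(feats.grad.shape)                    # (N, C): gradients reach the point features
```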

Again, I apologize if I missed some details.

HuangJunJie2017 (Owner) commented

@Divadi I misunderstood the theory behind the acceleration in BEVFusion. One of the BEVFusion authors has confirmed that their acceleration can be used during training. I will check it and add support if it can accelerate the training process.

Divadi (Contributor, Author) commented Jul 8, 2022

@HuangJunJie2017 I understand; I will look into it as well.

HuangJunJie2017 added the enhancement label on Jul 8, 2022
HuangJunJie2017 (Owner) commented Jul 13, 2022

@Divadi
Hi, the problem has been solved and bev_pool is now supported in this repo. Please pull the latest code for this feature.
Note that the coordinate system definitions of the BEV feature differ between the two repos, so a transpose is required after the bev_pool operation.
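
Roughly, the change looks like this (a sketch of the idea rather than the exact repo code; it assumes bev_pool returns a (B, C, Z, H, W) tensor as in BEVFusion):

```python
# pool with BEVFusion's op, which lays out the spatial axes in its own convention
bev_feat = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])
# swap the last two spatial axes so the BEV feature follows this repo's
# (B, C, Z, Y, X) layout, then collapse Z as before
bev_feat = bev_feat.transpose(-1, -2).contiguous()
bev_feat = torch.cat(bev_feat.unbind(dim=2), 1)
```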

ruolinsss commented

Hi, it's great that BEVDet now also supports bev_pool! Have you already checked the results? Does the bev_pool operation degrade performance? Thanks a lot!

HuangJunJie2017 (Owner) commented

@ruolinsss Hi, I have checked consistency at test time, and there is no performance degradation. Training consistency is still being checked.

Divadi (Contributor, Author) commented Jul 13, 2022

@HuangJunJie2017 Thank you for the integration! I found that there is no change in the inference results, and training is sped up from 2.2 days to 1.5 days on 4 GPUs.
However, my original inference results themselves differ from the repository's; I'll make another post about that.

Have you experimented with mixed precision to speed up training? I find that just adding fp16 = dict(loss_scale='dynamic') can shave off another 0.5 days, but I have not completed the training (since I think it would require putting @force_fp32 in some places).
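
For reference, the kind of change I mean (a sketch following mmcv conventions; the head class and argument names below are hypothetical):

```python
# (1) in the experiment config: enable mixed-precision training with dynamic loss scaling
fp16 = dict(loss_scale='dynamic')

# (2) in numerically sensitive modules (e.g. a head's loss), cast the inputs back to
#     fp32 so the loss is computed in full precision
import torch.nn as nn
from mmcv.runner import force_fp32

class ExampleHead(nn.Module):  # hypothetical module, for illustration only
    @force_fp32(apply_to=('preds',))
    def loss(self, preds, targets):
        # preds arrive here as fp32 even when the rest of the model runs in fp16
        return nn.functional.l1_loss(preds, targets)
```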

HuangJunJie2017 (Owner) commented

@Divadi I started trying mixed precision today, just a few hours ago... Where to put @force_fp32 is a good question. I will refer to other algorithms for suggestions.
