Why do I run out of GPU memory when running stage 3? #28

Closed
Musicad opened this issue Aug 3, 2020 · 4 comments

Comments

@Musicad

Musicad commented Aug 3, 2020

The authors all seem to be Chinese, so I took the lazy route and wrote this issue in Chinese.
If I understand this implementation correctly, training is split into three stages, right? In stage 3, the local branch's feature maps assist training of the global branch through deep feature map sharing. GPU memory starts to balloon when the ensemble loss is computed at the end. I set the batch size to 1 so that only one full-size image is trained at a time, and found that as soon as the image is even slightly large the GPU runs out of memory, even with the sub batch size set to 2. My training images are not particularly large either; the longest side is under 4000 pixels. Given what the paper says about memory efficiency, this shouldn't happen. Is this implementation different from what the paper describes?
Hoping for a reply! I've been stuck on this for a very long time.
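
For reference, this is roughly how I measured where the spike happens, boiled down to a self-contained PyTorch snippet (the toy model, the 2048x2048 input, and the plain cross-entropy loss are stand-ins for the actual stage-3 code, not the repository's implementation):

```python
import torch
import torch.nn as nn

# Stand-ins for the real stage-3 model and data; only the measurement pattern matters here.
model = nn.Conv2d(3, 2, kernel_size=3, padding=1).cuda()
criterion = nn.CrossEntropyLoss()
image = torch.randn(1, 3, 2048, 2048, device="cuda")         # one "full-size" image, batch size 1
label = torch.randint(0, 2, (1, 2048, 2048), device="cuda")  # dummy segmentation target

torch.cuda.reset_peak_memory_stats()
before = torch.cuda.memory_allocated() / 1024**2
loss = criterion(model(image), label)                        # the step where I see the spike
loss.backward()
print(f"allocated before loss: {before:.0f} MiB, "
      f"peak during the step: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MiB")
```

The same `reset_peak_memory_stats` / `max_memory_allocated` pair can be wrapped around the ensemble-loss call in the real training script to pin down exactly which step drives the peak.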

@qinliuliuqin

Hi Musicad, have you solved this problem? I'm about to reproduce this work myself.

@EmmaSRH

EmmaSRH commented Aug 31, 2020

I have the same problem. In fact, GPU memory usage already increases noticeably in stage 2, and the code is written rather crudely; it feels a long way from the efficiency reported in the paper...

@chenwydj
Collaborator

Hi everyone!

Thank you for your interest in our work!

  1. The largest image size we tried in our work is 5000x5000. The core hyperparameters that affect memory usage during training are: 1) batch_size; 2) sub_batch_size; 3) the size of the cropped patches. For the 5000x5000 example, we used batch_size = 4, sub_batch_size = 6, and a crop size of 536x536, which costs about 10 GB of GPU memory during training (see the sketch after this list).
  2. Our main claim is test-time memory efficiency, not training-time memory efficiency.
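
For intuition, sub_batch_size works like the usual gradient-accumulation pattern over the cropped patches: only one sub-batch of crops lives on the GPU at a time, so the per-step footprint is governed by sub_batch_size and the crop size rather than by the raw image resolution. Here is a self-contained sketch of that pattern (a toy model and illustrative numbers, not our actual training code):

```python
import torch
import torch.nn as nn

# Illustrative numbers matching the 5000x5000 example above; the model is a toy stand-in.
batch_size, sub_batch_size, crop = 4, 6, 536
patches_per_image = 16          # however many crops each image is tiled into (illustrative)

model = nn.Conv2d(3, 2, kernel_size=3, padding=1).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# One training step: all cropped patches of the batch stay on the CPU;
# only sub_batch_size of them are moved to the GPU and processed at a time.
patches = torch.randn(batch_size * patches_per_image, 3, crop, crop)
targets = torch.randint(0, 2, (batch_size * patches_per_image, crop, crop))

torch.cuda.reset_peak_memory_stats()
optimizer.zero_grad()
for i in range(0, patches.size(0), sub_batch_size):
    x = patches[i:i + sub_batch_size].cuda(non_blocking=True)
    y = targets[i:i + sub_batch_size].cuda(non_blocking=True)
    # Scale so the accumulated gradient equals the mean over all patches.
    loss = criterion(model(x), y) * x.size(0) / patches.size(0)
    loss.backward()             # frees this sub-batch's activations before the next one
optimizer.step()

print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MiB")
```

In this sketch, raising sub_batch_size or the crop size is what pushes the peak up; the full set of patches never has to sit on the GPU at once.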

@qinliuliuqin

qinliuliuqin commented Aug 31, 2020


Hi Wuyang,

Thanks for your timely reply! I enjoyed your paper a lot and am now trying to extend your work to high-resolution volumetric medical image segmentation (e.g., a median volume size of 512x512x384). I will let you know whether GL-Net works in the medical imaging domain, where the memory issue is even more severe.

Qin

@chenwydj chenwydj closed this as completed Sep 8, 2020