Train detr3d_vovnet_train exceed the memory of 4*RTX3090 #21

Open
synsin0 opened this issue Mar 9, 2022 · 2 comments

Comments

@synsin0
synsin0 commented Mar 9, 2022

Environment: 4xRTX3090.
Failure: training detr3d with the resnet101 backbone uses about 21GB of memory on each card. Training detr3d with the vovnet backbone exceeds the memory limit. image_per_gpu is set to 1.
I read in your paper that your experiments used 8xRTX3090. How should I adapt my training setup?

@a1600012888
Hi synsin0.
The vovnet backbone is too large to fit on a 3090 as is.
If you want to fit it on a 3090, you can try:

  1. fp16 (mixed-precision) training.
  2. Memory checkpointing, as described in "Training Deep Nets with Sublinear Memory Cost". PyTorch provides a checkpoint implementation, torch.utils.checkpoint.checkpoint, see https://pytorch.org/docs/stable/checkpoint.html?highlight=checkpoint
  3. Freezing some layers of VoVNet, e.g. the first stage.
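
For reference, here is a minimal sketch of how the three suggestions above could be combined in plain PyTorch. It is not the DETR3D training code: `ToyBackbone` is a hypothetical stand-in for a multi-stage backbone, the tensor sizes are arbitrary, and `use_reentrant=False` assumes PyTorch 1.11 or newer. Mixed precision is only enabled when a CUDA device is present, so the snippet also runs on CPU.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ToyBackbone(nn.Module):
    """Hypothetical multi-stage backbone (a stand-in, not the real VoVNet)."""

    def __init__(self, channels=8, num_stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(num_stages)
        )

    def forward(self, x):
        for stage in self.stages:
            trainable = any(p.requires_grad for p in stage.parameters())
            if self.training and trainable:
                # Suggestion 2: do not store this stage's activations;
                # recompute them during the backward pass instead.
                x = checkpoint(stage, x, use_reentrant=False)
            else:
                x = stage(x)
        return x


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # fp16 autocast only on GPU in this sketch
amp_dtype = torch.float16 if use_amp else torch.bfloat16

model = ToyBackbone().to(device)

# Suggestion 3: freeze the first stage so it needs no gradients
# and no optimizer state.
for p in model.stages[0].parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01
)
# Suggestion 1: loss scaling keeps small fp16 gradients from underflowing.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

model.train()
images = torch.randn(1, 8, 32, 32, device=device)  # one image per GPU

optimizer.zero_grad()
with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=use_amp):
    loss = model(images).float().mean()  # dummy loss for illustration
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

Checkpointing trades extra compute for memory, since each checkpointed stage runs its forward pass a second time during backward. If the frozen stages contain batch norm layers, they would usually also be kept in eval mode so their running statistics do not change.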

@cgl-cell

Have you solved it?
