
About GPU memory usage #10

Closed

Fan-Yixuan opened this issue May 5, 2022 · 37 comments

@Fan-Yixuan

Thanks for your great work! I am trying to reimplement your work with the new version (v1.0.0) of mmdet3d. My environment:

sys.platform: linux
Python: 3.8.11 (default, Aug  3 2021, 15:09:35) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.1, V11.1.74
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.9.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.10.1
OpenCV: 4.5.3
MMCV: 1.5.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.23.0
MMSegmentation: 0.24.0
MMDetection3D: 1.0.0rc1+c7cde78

I have dealt with the coordinate system refactoring problem and also the img_fields issue, but I can only train with up to 50 query proposals at one sample per 24 GB RTX 3090 GPU; using the default config (nuScenes, LiDAR + camera, R50-FPN, SECOND LiDAR backbone, 200 queries) runs into CUDA OOM.

Noting your practice in #6 (comment), I am seeking help here. I also could not find where you use spconv; I hope you can provide more details.
Thanks a lot.

@XuyangBai
Owner

XuyangBai commented May 5, 2022

Hi @Fan-Yixuan, thanks for your interest in our work. I have tried training TransFusion on 8 3090 GPUs and it fit into memory, so I am not sure what happens in your environment. But you could try spconv 1.2 to reduce the memory usage. spconv is used in SparseEncoder. mmdet3d bundles spconv in its repo under mmdet3d/ops/spconv, but it is an old version. To use another version of spconv, I used to install it following the instructions here and replace

from .conv import (SparseConv2d, SparseConv3d, SparseConvTranspose2d,
                   SparseConvTranspose3d, SparseInverseConv2d,
                   SparseInverseConv3d, SubMConv2d, SubMConv3d)
from .modules import SparseModule, SparseSequential
from .pool import SparseMaxPool2d, SparseMaxPool3d
from .structure import SparseConvTensor, scatter_nd

by something like

from spconv import SparseConv2d, ...
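For concreteness, a sketch of what the replaced imports in mmdet3d/ops/spconv/__init__.py might look like with a pip-installed spconv 1.2.x; this assumes the class names are exposed at the package top level, which is the case for spconv 1.x:

# Sketch only: swap mmdet3d's bundled spconv for a pip-installed spconv 1.2.x,
# which exposes the same class names at the package top level.
from spconv import (SparseConv2d, SparseConv3d, SparseConvTranspose2d,
                    SparseConvTranspose3d, SparseInverseConv2d,
                    SparseInverseConv3d, SubMConv2d, SubMConv3d,
                    SparseModule, SparseSequential,
                    SparseMaxPool2d, SparseMaxPool3d,
                    SparseConvTensor)
from spconv import scatter_nd  # assumed to exist at top level in spconv 1.x; verify locally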

@Fan-Yixuan
Author

Thanks a lot for your help. I'm using the latest spconv (2.1.21), and I can now train with 200 queries at one sample per 3090 using ~22 GB of memory, although 2 samples per GPU is still not achievable. I will keep exploring to solve this problem more fully!
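Note (an assumption based on spconv 2.x's public API, not on this repo): with spconv 2.1.x the modules live under spconv.pytorch rather than the package top level, so the replacement imports would look roughly like:

# Sketch for spconv 2.x (e.g. 2.1.21): the classes are exposed under spconv.pytorch.
# scatter_nd is not re-exported there, so the bundled helper may need to be kept.
from spconv.pytorch import (SparseConv2d, SparseConv3d, SparseConvTranspose2d,
                            SparseConvTranspose3d, SparseInverseConv2d,
                            SparseInverseConv3d, SubMConv2d, SubMConv3d,
                            SparseModule, SparseSequential,
                            SparseMaxPool2d, SparseMaxPool3d,
                            SparseConvTensor)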

@Fan-Yixuan
Author

@XuyangBai Hi, I would like to ask whether TransFusion's prediction heads contain branches for attribute prediction (moving, stopped, parked vehicle, etc.). I'm not familiar with this task (nuScenes); why is it done this way instead of reducing mAAE by adding such branches?

@XuyangBai
Owner

XuyangBai commented May 6, 2022

I basically follow mmdet3d and handle attribute prediction with some post-processing rules; check the code here:

for i, box in enumerate(boxes):
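For context, the rule-based assignment roughly follows mmdet3d's NuScenesDataset; a sketch of the linked loop is below (illustrative only: boxes, mapped_class_names, and NuScenesDataset.DefaultAttribute come from the surrounding mmdet3d code, and the exact thresholds may differ between versions):

# Sketch of mmdet3d-style post-processing for nuScenes attributes: fast boxes
# get a "moving"/"with_rider" attribute, near-static boxes fall back to defaults.
import numpy as np

for i, box in enumerate(boxes):
    name = mapped_class_names[box.label]
    if np.sqrt(box.velocity[0] ** 2 + box.velocity[1] ** 2) > 0.2:
        if name in ('car', 'construction_vehicle', 'bus', 'truck', 'trailer'):
            attr = 'vehicle.moving'
        elif name in ('bicycle', 'motorcycle'):
            attr = 'cycle.with_rider'
        else:
            attr = NuScenesDataset.DefaultAttribute[name]
    else:
        if name == 'pedestrian':
            attr = 'pedestrian.standing'
        elif name == 'bus':
            attr = 'vehicle.stopped'
        else:
            attr = NuScenesDataset.DefaultAttribute[name]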

@Fan-Yixuan
Author

Yes, I noticed, but it seems strange to directly use the default attribute. Is there any official statement on why this is done?

@XuyangBai
Owner

Ah sorry, I just use it as the de facto approach and never thought carefully about this issue.

@Fan-Yixuan
Author

OK, since mmdet3d implements it like this, there is presumably a reason for it.

@Fan-Yixuan
Author

@XuyangBai Hi, I finished training transfusion_nusc_voxel_L and got a val set performance of 64.63 mAP / 69.99 NDS. The earlier GPU memory problem has been solved; it was because the images were not being resized to 448×800 due to a version issue.
However, training after adding the cameras ran into problems: mATE, mASE, mAOE, and mAVE all increase during training. Do you have any suggestions for possible causes? Specifically, it seems that GlobalRotScaleTrans and RandomFlip3D could cause a mismatch between LiDAR and camera?

@XuyangBai
Owner

Did you use the newest code? There were some bugs when changing the shape of the image features, leading to a mismatch between the two modalities; they are fixed in 8977b2b and 5187414.

@XuyangBai
Owner

GlobalRotScaleTrans and RandomFlip3D will not break the matching between LiDAR and camera, because every time we project the object queries (and the initial predictions) from 3D space onto the image plane, we first apply the inverse transformation, which converts the augmented 3D positions back to the original coordinates. See the following code:

if batch_size == 1:  # skip during inference to save time
    points = query_pos_3d_with_corners.T
else:
    points = apply_3d_transformation(
        query_pos_3d_with_corners.T, 'LIDAR',
        img_metas[sample_idx], reverse=True).detach()
num_points = points.shape[0]

By the way, I just realized that this might be the problem: here I assume batch size 1 means evaluation, so I skip apply_3d_transformation for faster inference. If you use samples_per_gpu=1 for training, you should remove this check and always apply apply_3d_transformation.
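One possible adjustment, a sketch only (it assumes this code lives in a torch nn.Module so self.training is available), is to gate on training mode rather than on batch size:

# Sketch: always undo the 3D augmentation during training, regardless of batch size;
# skip it only at inference, where no augmentation was applied.
if self.training:
    points = apply_3d_transformation(
        query_pos_3d_with_corners.T, 'LIDAR',
        img_metas[sample_idx], reverse=True).detach()
else:
    points = query_pos_3d_with_corners.T
num_points = points.shape[0]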

@Fan-Yixuan
Author

I'm using the latest version of the code with 2 samples per GPU. I have another question: RandomFlip3D's parent class, RandomFlip, doesn't seem to support flipping a list of images; does that matter?

@XuyangBai
Owner

XuyangBai commented May 11, 2022

Yes, it might be the reason. If flip in img_fields is set to True but the images are not actually flipped, the consistency between LiDAR and the images is broken. You can check the preprocessing classes to figure out how it works in my implementation; I do not remember exactly where I convert the list of images into an ndarray.

@Fan-Yixuan
Author

Fan-Yixuan commented May 11, 2022

My concern is that maybe
https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/pipelines/transforms.py#L465-L469
should be changed to loop over the list of images, as you did in MyResize etc., but then I don't understand why you and #7 (comment) are able to get correct training results with the current code.
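For illustration, such a per-image loop would look something like the sketch below (a hypothetical helper in the style of MyResize, not code from either repo):

# Hypothetical per-image flip, mirroring the per-image loop used in MyResize.
import mmcv

def _flip_img_list(results):
    for key in results.get('img_fields', ['img']):
        results[key] = [
            mmcv.imflip(img, direction=results['flip_direction'])
            for img in results[key]
        ]
    return results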

@XuyangBai
Owner

@Fan-Yixuan I find that mmcv.imflip does work for a list of images; see the following example:

[Screenshot 2022-05-12: interactive check showing mmcv.imflip applied to a list of images]

@Fan-Yixuan
Author

Fan-Yixuan commented May 12, 2022

Thanks for the explanation; that's true, but I still can't seem to solve my problem. The strangest thing I found is the behavior of loss_bbox during training, as shown in the figure: the orange line is the LiDAR-only result, and the red line is the LiDAR + camera result. Do you have any suggestions? Thanks a lot.
[Screenshot 2022-05-12: loss_bbox curves; orange = LiDAR-only, red = LiDAR + camera]
Also, I had missed the changes in train.py, i.e. I didn't freeze the LiDAR branch; combined with the figure above, I now think this is likely the reason.

@XuyangBai
Owner

It is really weird that the bbox loss starts to increase at some point; the curve before 10k iterations looks normal. I am not sure of the reason, but maybe you can first verify the projection of object queries onto the image through some visualization? If the LiDAR and image are not aligned well, the image features attached to the object queries will be wrong. By the way, you mentioned that mATE, mASE, and mAOE are all increasing; how about mAP?
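As a concrete starting point for such a check, here is a small sketch (not code from the repo) that projects LiDAR-frame points into one camera image using the lidar2img matrix that mmdet3d stores in img_metas:

# Sketch: project LiDAR-frame points (N, 3) into an image with a 4x4 lidar2img
# matrix and draw them, to eyeball whether LiDAR and camera are aligned.
import cv2
import numpy as np

def draw_projected_points(img, points_lidar, lidar2img):
    pts_h = np.concatenate([points_lidar, np.ones((len(points_lidar), 1))], axis=1)
    cam = pts_h @ lidar2img.T
    cam = cam[cam[:, 2] > 1e-3]          # keep points in front of the camera
    uv = cam[:, :2] / cam[:, 2:3]        # perspective divide
    for u, v in uv:
        cv2.circle(img, (int(u), int(v)), 3, (0, 0, 255), -1)
    return img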

@Fan-Yixuan
Author

Fan-Yixuan commented May 12, 2022

In the first three epochs after adding the camera, mAP was 62.49, 58.86, 59.66. I suspect the loss starts to increase because the learning rate becomes larger (I use 4×3090 with 2 samples per GPU and accumulate gradients over two forward passes before each parameter update, so the effective batch size is 16; the learning rate therefore reaches its maximum at around 40k iterations).
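For reference, one way to express this two-forward-pass accumulation in an mmdet-style config (an assumption about the setup, not the exact config used here; GradientCumulativeOptimizerHook is available in recent mmcv, e.g. the 1.5.0 listed above):

# Sketch: accumulate gradients over 2 iterations so 4 GPUs x 2 samples behaves
# like an effective batch size of 16, keeping the same gradient clipping.
optimizer_config = dict(
    type='GradientCumulativeOptimizerHook',
    cumulative_iters=2,
    grad_clip=dict(max_norm=0.1, norm_type=2))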

Do you think this is normal if the LiDAR branch is not frozen?

@XuyangBai
Owner

XuyangBai commented May 12, 2022

The learning rate should not be the reason; I have also trained with batch size 8×1.

Yes, I freeze the LiDAR branch when training TransFusion, as it is already well trained in the first stage. If you would like to jointly optimize the LiDAR branch and the fusion components, they should probably be optimized with different learning rates.
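For illustration, a minimal sketch of such a split (hypothetical module prefixes following mmdet3d's pts_* naming; the 1e-5 value is only an example):

# Sketch: give the pretrained LiDAR branch a smaller learning rate than the
# newly added fusion components via optimizer parameter groups.
# `model` is the built detector (illustrative).
import torch

lidar_prefixes = ('pts_voxel_encoder', 'pts_middle_encoder', 'pts_backbone', 'pts_neck')
lidar_params, fusion_params = [], []
for name, param in model.named_parameters():
    (lidar_params if name.startswith(lidar_prefixes) else fusion_params).append(param)

optimizer = torch.optim.AdamW(
    [{'params': lidar_params, 'lr': 1e-5},
     {'params': fusion_params, 'lr': 1e-4}],
    weight_decay=0.01)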

@Fan-Yixuan
Author

Hi, sorry for the late reply. I made two changes. The first follows your changes in the dataset definition file, though from what I understand this shouldn't have a real impact:
[Screenshot 2022-05-14: diff of the dataset definition file]

The second is to freeze the weights of the LiDAR branch. Now I get 66.75 mAP / 71.03 NDS on the nuScenes validation set, so I think the previous problem was caused by using too large a learning rate for the LiDAR branch, which had already been well trained.
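For reference, freezing the LiDAR branch can be done with something like the sketch below (assuming mmdet3d-style pts_* submodule names; the actual logic in the repo's train.py may differ):

# Sketch: freeze the pretrained LiDAR branch so only the fusion components train.
for name, param in model.named_parameters():
    if name.startswith(('pts_voxel_encoder', 'pts_middle_encoder',
                        'pts_backbone', 'pts_neck')):
        param.requires_grad = False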

@XuyangBai
Owner

Yes, the order of the images does not matter much, but freezing the backbone does.

@Fan-Yixuan
Author

OK, thank you for your patience and your excellent work. I'm closing this issue.

@nmll

nmll commented May 16, 2022

@Fan-Yixuan Hello! Could you tell me the maximum learning rate you used in the first and second training stages, respectively?

@Fan-Yixuan
Author

Hi, my experiments follow the configs given by the author; the optimizer and learning-rate schedule are the same for both stages:

optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.01)  # for 8 GPUs * 2 samples per GPU
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))
lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 0.0001),
    cyclic_times=1,
    step_ratio_up=0.4)

@nmll

nmll commented May 16, 2022

OK! Thanks!

@nmll

nmll commented May 28, 2022

(Quoting the earlier comment on apply_3d_transformation and the batch_size == 1 shortcut.)

Hello @XuyangBai, may I ask about this: apply_3d_transformation is only used when projecting the 3D queries to 2D, but not when fusing the BEV LiDAR features with the BEV image features for image-guided query initialization. Won't this cause a mismatch between the LiDAR and image modalities due to RandomFlip3D and GlobalRotScaleTrans?

@XuyangBai
Owner

XuyangBai commented Jun 1, 2022

Hi @nmll, that's a very good question that I hadn't considered previously. Intuitively, the point clouds should also be transformed using the inverse of the data augmentation when projecting image features onto the BEV plane (or, equivalently, I should apply a similar rotation and flip to the images, which is somewhat complicated). However, the network still works under the current settings. My guess is that the network is able to 1) leverage the contextual relationship between image features and LiDAR features to associate the two sets of features and thus perform the projection, and 2) ignore the geometric relationship implied by the position encodings of the image features and LiDAR features.

Furthermore, I ran another experiment that removes RandomFlip and GlobalRotScaleTrans during training to see whether forcing the two modalities to be consistent would further improve the results. In that case, the network can also leverage the geometric relationship to build the association. The observation is that the training loss decreases more rapidly than in the previous setting: the blue curve in the following figure is the one without RandomFlip & GlobalRotScaleTrans, while the gray curve is the original one. However, the final mAP and NDS are similar. So I assume that removing these two augmentations increases the convergence speed, but the final performance may already be saturated (although the heatmap loss could be further reduced, the object queries selected from the heatmap already have good locations, so the improvement is not remarkable in terms of final mAP and NDS).

[Screenshots 2022-06-01: training loss curves; blue = without RandomFlip & GlobalRotScaleTrans, gray = original]

I will remove RandomFlip and GlobalRotScaleTrans from the config files, which is more reasonable and gives better convergence speed. Thanks a lot for pointing out that issue.
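As an illustration of that config change (a sketch with placeholder entries, not the verbatim repo config):

# Sketch: drop the two geometric augmentations from the LiDAR-camera train
# pipeline so the two modalities stay consistent; other steps are placeholders.
train_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=5, use_dim=5),
    dict(type='LoadMultiViewImageFromFiles'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    # dict(type='GlobalRotScaleTrans', ...),  # removed
    # dict(type='RandomFlip3D', ...),         # removed
    # ... remaining resize / normalize / bundle / collect steps unchanged ...
]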

Best,
Xuyang

@heming7

heming7 commented Jun 16, 2022

(Quoting the earlier comment about the 448×800 resize fix and the increasing mATE/mASE/mAOE/mAVE.)

Hello @Fan-Yixuan

Can you tell me what you did to solve the version issue? I am facing the same problem now.

@Fan-Yixuan
Author

Hi @heming7, you need to make sure that results['img_fields'] is ['img'] and that type(results['img']) is list before these lines:

def _resize_img(self, results):
    """Resize images with ``results['scale']``."""
    for key in results.get('img_fields', ['img']):
        for idx in range(len(results['img'])):
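To make that precondition concrete, a tiny hypothetical helper (not from the repo) that could run before the resize step:

# Hypothetical helper: ensure multi-view images are stored as a list under 'img'
# and that 'img_fields' is set, so the per-image loop above sees what it expects.
def _ensure_img_list(results):
    if not isinstance(results.get('img'), list):
        results['img'] = [results['img']]
    results['img_fields'] = ['img']
    return results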

@heminghuang7

(Quoting the _resize_img suggestion above.)
Hello Yixuan

Thank you for the suggestion. I checked the code and I think the author has pushed a commit that fixes this. In any case, I managed to run it by reducing samples_per_gpu. Thank you so much for the help!

@yinjunbo

yinjunbo commented Oct 10, 2022

(Quoting the earlier comment about the 448×800 resize fix and the increasing error metrics.)

Hi @Fan-Yixuan, could you please share the torch/CUDA/mmdet3d/spconv environment you used to reproduce the nuScenes val performance (64.63 mAP and 69.99 NDS)? It seems you used 8×3090 with batch size 2 per GPU and lr 1e-4?

@Fan-Yixuan
Author

(Quoting the question above.)

Hi, my environment is in #10 (comment); my spconv is 2.1.21, my total batch size is 16, and the lr is 1e-4.

@yinjunbo

yinjunbo commented Oct 10, 2022

(Quoting the exchange above.)
@Fan-Yixuan, thanks for your quick reply! I'll have another try.
By the way, could you please share your training log so I can check my problem against it? (email: yinjunbocn@gmail.com)

@Fan-Yixuan
Author

Fan-Yixuan commented Oct 10, 2022

@yinjunbo
Sure. For LiDAR-only training, the first 15 epochs:
20220505_225100.log
and the last 5 epochs (fade strategy):
20220508_101828.log

@yinjunbo

(Quoting the reply above with the training logs.)

Thank you very much!
I find that my training loss is obviously larger than yours. Did you train a model before the coordinate system refactoring?

@Fan-Yixuan
Author

@yinjunbo Sorry, I didn't save the training logs from before the coordinate system modification, but if the coordinates were not aligned, it would perform very poorly.

@yinjunbo

(Quoting the reply above.)

I totally agree. Since my reproduced performance is only slightly lower (~2 points) than yours, it can't be caused by the coordinate system. I'll keep looking for the problem. Thanks!

@BoomSky0416

@Fan-Yixuan Hello, I am trying to reproduce TransFusion with mmdet3d 1.1.0, but I got incorrect results when training the LiDAR-camera fusion stage. Could you please share your training log for this stage? Thanks! (email: shoutian@umich.edu)
