
About the 2D backbone #7

Closed
SxJyJay opened this issue Apr 30, 2022 · 65 comments

Comments

@SxJyJay

SxJyJay commented Apr 30, 2022

Hi, I have some questions about training the TransFusion-LC.

  • You mentioned in the supplementary material that a 2D backbone pre-trained on the autonomous driving datasets is required and frozen while training TransFusion-LC (i.e., DLA-34 and ResNet-50 pre-trained on nuScenes and Waymo, respectively). However, I cannot find the relevant pre-trained models in the readme.md of this repo, or the relevant configuration entries in the config files (e.g., transfusion_nusc_voxel_LC.py). Or maybe you have provided them and I missed something important?

  • Could you please provide the relevant pre-trained 2D backbone models, or instructions for pre-training them? Thanks a lot!

@XuyangBai
Owner

Hi, sorry it seems I didn't make it clear in the readme.

  1. For the DLA-34 pretrained on 3D detection, I follow PointAugmenting and reuse the model provided by CenterNet. You can download the checkpoint from https://github.com/xingyizhou/CenterTrack/blob/master/readme/MODEL_ZOO.md#monocular-3d-detection-tracking.
  2. For the ResNet50+FPN pretrained on instance segmentation, I use the model provided by mmdet3d; you can download the checkpoints from https://github.com/open-mmlab/mmdetection3d/blob/v0.12.0/configs/nuimages/README.md (note that you should also use the checkpoints provided by mmdet3d v0.12.0). I choose the backbone from the Mask R-CNN that is pretrained only on ImageNet (the first one).
  3. For the ResNet50+FPN pretrained on 2D detection, I train the model with the same config file as (2) except for removing the mask head (a config sketch follows after the next paragraph).

And I use a similar step as (3) to train a 2D backbone for the waymo dataset. I can send you the relevant processing code and config file if needed.
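
To make (3) concrete, a minimal config sketch (assuming mmdet-style config inheritance and the nuImages Mask R-CNN config name from mmdet3d v0.12.0; the exact file name and how your mmdet version handles None overrides may differ):

# Hypothetical config: start from the nuImages Mask R-CNN config and drop the mask
# branch, leaving a plain Faster R-CNN (ResNet50 + FPN) 2D detector.
_base_ = './mask_rcnn_r50_fpn_1x_nuim.py'  # assumed base config name

model = dict(
    roi_head=dict(
        mask_roi_extractor=None,  # remove the mask RoI extractor
        mask_head=None))          # remove the mask head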

Best,
Xuyang.

@SxJyJay
Author

SxJyJay commented Apr 30, 2022

Thanks a lot for your reply! It is really clear!
Could you please send me the relevant code for training the 2D backbone for the waymo dataset if that doesn't bother you? My email is yanjay2future@gmail.com.

@XuyangBai
Owner

Hi, I have sent them to your email.

@SxJyJay
Author

SxJyJay commented May 1, 2022

Thanks! I received your email.
I still have some questions about re-implementation.

  • In the config file, I notice that you comment out the DLA-34 image backbone and replace it with ResNet-50. I am wondering whether these configuration parameters correspond to the full DLA-34, because I notice that the "heads" parameter is set to empty.
    img_backbone=dict(type='DLASeg', num_layers=34, heads={}, head_convs=-1),
  • I am reproducing TransFusion-L strictly following your config file and instructions, but the mAP on the nuScenes validation set is only 0.5985 at the 17th epoch (the whole training process hasn't finished yet). I don't know where I went wrong. Could you please send me the training logs for TransFusion-L and TransFusion-LC so that I can compare them with my training log?

Sorry to bother you again. Sincere appreciation!

@XuyangBai
Owner

Hi,

  1. R50+FPN gives a slightly better result than DLA-34 (as shown in Table 12 in the supplementary material). And I only use DLA-34 as the image feature extractor, so I do not load the task heads.
  2. Did you adopt the fade strategy (disable the copy-and-paste augmentation for the last 5 epochs)? That can have a remarkable effect on mAP by reducing false positives (see the sketch below).
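
A minimal sketch of the fade strategy (the ObjectSample transform name follows mmdet3d conventions; the keys and paths here are placeholders, not the literal config of this repo):

# Illustrative training pipeline fragment (placeholder values):
train_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=5, use_dim=5),
    dict(type='ObjectSample', db_sampler=dict()),  # GT copy-and-paste augmentation
    dict(type='GlobalRotScaleTrans', rot_range=[-0.3925, 0.3925], scale_ratio_range=[0.95, 1.05]),
]

# Fade strategy: for the final 5 epochs, drop the copy-and-paste augmentation and
# resume from the epoch-15 checkpoint.
train_pipeline_fade = [step for step in train_pipeline if step['type'] != 'ObjectSample']
data = dict(train=dict(pipeline=train_pipeline_fade))
resume_from = 'work_dirs/transfusion_nusc_voxel_L/epoch_15.pth'  # assumed checkpoint path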

Best,
Xuyang

@SxJyJay
Author

SxJyJay commented May 1, 2022

Oh, I got it. I forgot to adopt the fade strategy for the last 5 epochs.
Besides, I found that the NDS value is always lower than the mAP in my current validation runs.
e.g.,

  • mAP 0.5199; NDS 0.4856 at epoch 5;
  • mAP 0.5606; NDS 0.5244 at epoch 10;
  • mAP 0.5895; NDS 0.5453 at epoch 15
    I don't know if this is a normal phenomenon; I observe that the NDS value is generally higher than the mAP in others' results. Could you please provide some suggestions or point out where I might be wrong?
    Thanks!

Sincerely,
Jay

@XuyangBai
Owner

It is not normal. Could you provide the full results, such as mATE, mAOE, and mASE?

@XuyangBai
Owner

You can get a very bad mAOE and mASE if you use the newest version of mmdet3d to generate the .pkl files and then train TransFusion: mmdet3d had a large coordinate system refactoring in newer versions. See https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/compatibility.md#coordinate-system-refactoring
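
A quick sanity check (a sketch; my assumption is that the refactoring landed in the 1.0.0rc releases, while this codebase targets the older 0.x line used in this thread):

# Check that the mmdet3d used to generate the nuScenes .pkl files matches the one
# used for training (a pre-refactoring 0.x version in this thread).
import mmdet3d
print(mmdet3d.__version__)
# If the .pkl files were produced with a newer mmdet3d, regenerate them with the
# matching version, e.g.:
#   python tools/create_data.py nuscenes --root-path ./data/nuscenes \
#       --out-dir ./data/nuscenes --extra-tag nuscenes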

@SxJyJay
Author

SxJyJay commented May 1, 2022

OK, I list the TP metric results below:
at epoch 19 (without the fade strategy), mATE=0.2839; mASE=0.7090, mAOE=1.5609; mAVE=0.2707; mAAE=0.1913

It can have a very bad mAOE and mASE if you use the newest version mmdet3d to generate the .pkl and then train TransFusion. mmdet3d has a large coordinate system refactoring in the newer version. See https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/compatibility.md#coordinate-system-refactoring

I think this might be the key to my problem! I created the nuScenes metadata with the newest release of mmdet3d, and only downgraded the version after I found it mismatched the mmdet3d version required by the TransFusion repo.
Thanks for your valuable advice! I will re-create the metadata and see what happens.

@YunzeMan

YunzeMan commented May 2, 2022

Nice discussion above! Hi @XuyangBai, I have a follow-up question regarding training the -LC model.

To load the TransFusion-L model when training the -LC model, should we change the load_from key in the config file to the -L model checkpoint, or should we leave that empty and change the pretrained key in the TransFusionDetector field instead?

@XuyangBai
Owner

XuyangBai commented May 3, 2022

Hi @YunzeMan, I usually use the following code to combine the pretrained TransFusion-L and the 2D backbone:

import torch

# Load the pretrained 2D image backbone and the pretrained TransFusion-L (LiDAR-only) checkpoints.
img = torch.load('img_backbone.pth', map_location='cpu')
pts = torch.load('transfusionL.pth', map_location='cpu')

# Start from the TransFusion-L weights and copy the image backbone/neck weights in
# under the 'img_' prefix expected by the -LC model.
new_model = {"state_dict": pts["state_dict"]}
for k, v in img["state_dict"].items():
    if 'backbone' in k or 'neck' in k:
        new_model["state_dict"]['img_' + k] = v
        print('img_' + k)
torch.save(new_model, "fusion_model.pth")

And then set the load_from key to load both the pretrained 3D backbone and 2D backbone.
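
For example, in the -LC config (a sketch; the path is simply wherever you saved the merged checkpoint above):

# Point load_from at the merged checkpoint so both the 3D and 2D weights are loaded.
load_from = 'fusion_model.pth'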

@WWW2323

WWW2323 commented May 3, 2022

Hi, @XuyangBai @SxJyJay, it takes 4 days for me to train TransFusion-L (8 V100 GPUs, epoch=20, samples_per_gpu=2), which seems too long. How long did you spend training TransFusion-L? Thanks!!

@XuyangBai
Owner

@WWW2323 about 2 days for me using 8 V100 GPUs

@SxJyJay
Author

SxJyJay commented May 4, 2022

Hi, @XuyangBai @SxJyJay, it takes 4 days for me to train TransFusion-L (8 V100 GPUs, epoch=20, samples_per_gpu=2), which seems too long. How long did you spend training TransFusion-L? Thanks!!

Also about 2 days for me using 8 RTX3090 GPUs.

@SxJyJay
Author

SxJyJay commented May 4, 2022

@XuyangBai Hi, I have finished the whole training process of TransFusion. I made no modifications except for replacing DLA-34 with ResNet50+FPN, as you suggested. The final results on the nuScenes validation set are:
mAP=67.25, NDS=70.89, mATE=28.09, mASE=25.30, mAOE=28.58, mAVE=26.26, mAAE=19.15
The mAP and NDS are a little lower than the results on the nuScenes test set reported in the paper, whereas conventionally I would expect test-set results to be lower than validation-set results.

Besides, I find that the mAP drop may be caused by much lower AP on some classes such as trailer, traffic cone and barrier. I list the AP of my results (on the val set) vs the reported results (on the test set) below:
car (87.9 vs 87.1), truck (64.0 vs 60.0), bus (74.1 vs 68.3), trailer (43.5 vs 60.8), construction_vehicle (29.8 vs 33.1), pedestrian (88.3 vs 88.4), motorcycle (74.3 vs 73.6), bike (63.5 vs 52.9), traffic cone (77.1 vs 86.7), barrier (70.1 vs 78.1)

I don't know whether my results are within an acceptable error margin, or whether the gap is caused by the different image backbones (i.e., DLA-34 vs ResNet50+FPN).

@XuyangBai
Owner

XuyangBai commented May 5, 2022

Hi @SxJyJay, you can see the detailed results on the val set below.

mAP: 0.6727
mATE: 0.2721
mASE: 0.2517
mAOE: 0.2740
mAVE: 0.2536
mAAE: 0.1902
NDS: 0.7122

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.876   0.169   0.148   0.085   0.259   0.185
truck   0.620   0.302   0.182   0.102   0.228   0.221
bus     0.757   0.302   0.186   0.048   0.386   0.256
trailer 0.428   0.520   0.209   0.463   0.185   0.163
construction_vehicle    0.274   0.666   0.417   0.833   0.124   0.318
pedestrian      0.878   0.128   0.282   0.360   0.215   0.097
motorcycle      0.754   0.184   0.244   0.215   0.421   0.267
bicycle 0.631   0.150   0.263   0.300   0.212   0.016
traffic_cone    0.770   0.119   0.304   nan     nan     nan
barrier 0.739   0.182   0.281   0.059   nan     nan

I think it is within an acceptable error margin; the slightly worse performance might come from training variance. As for the gap between the validation and test sets, it is normal because they generally have different distributions. Also, you could try using more queries during inference to get a better result at the cost of longer inference time (see Table 13 in the supplementary material). Besides, if you are using a different version of mmdet3d, some data augmentation is effectively disabled (see the difference between LoadMultiViewImage in this codebase and in mmdet3d): if img_fields is not set, the RandomFlip augmentation is actually not working (see the sketch below).
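
A sketch of the img_fields point (the class name and where it sits in the pipeline are assumptions for illustration, not this repo's actual LoadMultiViewImage code):

from mmdet.datasets.builder import PIPELINES

@PIPELINES.register_module()
class SetImgFields:
    """Hypothetical pipeline step: register the loaded multi-view images under
    'img_fields' so that the 2D RandomFlip augmentation actually touches them
    (as discussed above, without this it silently has no effect)."""

    def __call__(self, results):
        results['img_fields'] = ['img']
        return results

# It would be inserted right after the image-loading step of the train pipeline, e.g.:
#   dict(type='LoadMultiViewImageFromFiles'),
#   dict(type='SetImgFields'),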

@304886938

Hello @XuyangBai, I want to use your results on the nuScenes validation set for an object tracking experiment, but I don't have enough computing power for training. I wonder if you could provide the JSON files of the validation set results? Here is my email: 304886938@qq.com. Looking forward to your reply!

@SxJyJay
Author

SxJyJay commented May 5, 2022

Thank you. On the validation set, the performance I reproduced is close to yours.
I also list my reproduced results on the val set below:

mAP: 0.6725
mATE: 0.2809
mASE: 0.2530
mAOE: 0.2858
mAVE: 0.2626
mAAE: 0.1915
NDS: 0.7089
Eval time: 110.1s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.879   0.168   0.148   0.087   0.259   0.196
truck   0.640   0.322   0.182   0.085   0.232   0.223
bus     0.741   0.326   0.181   0.041   0.407   0.244
trailer 0.435   0.509   0.203   0.495   0.213   0.159
construction_vehicle    0.298   0.723   0.445   0.817   0.123   0.324
pedestrian      0.883   0.128   0.285   0.376   0.217   0.093
motorcycle      0.743   0.183   0.232   0.216   0.451   0.281
bicycle 0.635   0.146   0.255   0.404   0.198   0.013
traffic_cone    0.771   0.118   0.311   nan     nan     nan
barrier 0.701   0.187   0.288   0.050   nan     nan

You have perfectly solved my problems! Hence, I am closing this issue.
Thanks again for your patience!

@SxJyJay SxJyJay closed this as completed May 5, 2022
@xxlbigbrother

Hi, I have sent them to your email.

Hi, I also plan to train a 2D backbone for Waymo and nuScenes. Could you please send me the relevant code for training the 2D backbone? It would be really helpful! My email is xxlbigbrother@gmail.com

@zzm-hl

zzm-hl commented May 11, 2022

Hi, @XuyangBai @SxJyJay, it takes 4 days for me to train TransFusion-L (8 V100 GPUs, epoch=20, samples_per_gpu=2), which seems too long. How long did you spend training TransFusion-L? Thanks!!

Also about 2 days for me using 8 RTX3090 GPUs.

Hi, could you please provide your CUDA, PyTorch, MMCV, mmdet, and mmdet3d environment? I am training on 4 A100s and the estimated training time shows 20 days, which confuses me; I want to exclude the influence of the environment. My environment is:

sys.platform: linux
Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA A100-SXM4-40GB
CUDA_HOME: /public/home/u212040344/usr/local/cuda-11.1
NVCC: Build cuda_11.1.TC455_06.29069683_0
GCC: gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
PyTorch: 1.8.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.9.0
OpenCV: 4.5.5
MMCV: 1.3.18
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.11.0
MMDetection3D: 0.12.0+5337046

@zzm-hl

zzm-hl commented May 11, 2022

Hi, could you please provide the environment (CUDA, PyTorch, MMCV, mmdet, mmdet3d) you used on the 3090 GPUs? I am training on 4 A100s and the estimated training time shows 20 days, which confuses me; I want to rule out the influence of the environment. (My environment is the one listed in my previous comment.)

@SxJyJay
Author

SxJyJay commented May 11, 2022

Hi, could you please provide the environment of your CUDA, PyTorch, MMCV, mmdet, and mmdet3d on the 3090 GPUs?

Hi, my runtime environment is shown below:

PyTorch: 1.8.0 (built with CUDA Runtime 11.1 and CuDNN 8.0.5; the build details are the same as in your dump above)
TorchVision: 0.9.0
OpenCV: 4.5.5
MMCV: 1.3.0
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.1
MMDetection: 2.10.0
MMDetection3D: 0.11.0+

Besides, I think you can check the time spent on fetching data and on one forward pass to identify the bottleneck. Maybe your problem is caused by slow IO.
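
A rough sketch of that check (dataloader and model stand in for your own objects; adapt the forward call to your model's signature):

import time
import torch

def profile_loader_vs_forward(model, dataloader, num_iters=20):
    """Time data fetching and the forward pass separately to locate the bottleneck."""
    data_time, fwd_time = 0.0, 0.0
    it = iter(dataloader)
    for _ in range(num_iters):
        t0 = time.perf_counter()
        batch = next(it)                      # time spent on IO + augmentation
        data_time += time.perf_counter() - t0

        t0 = time.perf_counter()
        with torch.no_grad():
            model(**batch)                    # adapt to your model's forward signature
        torch.cuda.synchronize()
        fwd_time += time.perf_counter() - t0
    print(f'avg data time: {data_time / num_iters:.3f}s, '
          f'avg forward time: {fwd_time / num_iters:.3f}s')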

@zzm-hl

zzm-hl commented May 11, 2022


Thanks for your reply! The strange thing is that my GPU usage stays at 100% and barely fluctuates; I don't know whether this means the CPU data-loading speed is normal.

@wzmsltw

wzmsltw commented May 30, 2022

@SxJyJay Hi, can you provide the trained TransFusion and TransFusion-L models? My reproduced results are 63.9 mAP (LiDAR) and 64.4 mAP (LiDAR+camera), which is strange. Thanks so much!

@SxJyJay
Author

SxJyJay commented May 30, 2022

@wzmsltw Hi, you can leave me your email, and I will send checkpoints to you.

@wzmsltw

wzmsltw commented May 30, 2022

@SxJyJay my email address is wzmsltw@gmail.com Thanks so much for your help!

@wzmsltw

wzmsltw commented May 31, 2022

@SxJyJay Hi, when will you send checkpoints? Really looking forward to it. Thanks again~

@SxJyJay
Author

SxJyJay commented May 31, 2022

@SxJyJay Hi, when will you send checkpoints? Really looking forward to it. Thanks again~

Sorry for the delay; I had something urgent yesterday. I have sent them to you!
Best,
Yang Jiao

@zzj403

zzj403 commented Aug 4, 2022

@maokp @kuangpanda @cxd520314wang I have sent you my reproduced checkpoints! Please check your email!

@SxJyJay Hi, I am a PhD student studying LiDAR-camera detection models. I've tried many times but I still cannot reproduce satisfying results. Could you please send me your checkpoints? Really looking forward to it. Thanks!
My email is 945937825@qq.com

@xpyqiubai

Hi, I have sent them to your email.

Hi, I also plan to train a 2D backbone for Waymo and nuScenes. Could you please send me the relevant code for training the 2D backbone for the Waymo and nuScenes datasets, if that doesn't bother you? (specifically the Waymo dataset) My email is xpydgqb@gmail.com

@HatakeKiki

Hi, I'm also trying to reproduce TransFusion-L but my mAP and NDS (60.34 & 66.46) are much lower than the author's. Could you please send me your training log of TransFusion-L? I notice an obvious drop in loss at epoch 16 when the fade strategy is applied in others' training, but mine shows no difference with and without the fade strategy. Thank you! My mail is: kiki_jiang@sjtu.edu.cn

@SxJyJay
Author

SxJyJay commented Aug 20, 2022

@JamesHao-ml @yangsijing1995 @wangyd-0312 @Young98CN @zzj403 @jqfromsjtu Hi, I have sent the checkpoints to you. Sorry for the late reply, as I just finished a deadline.

@SxJyJay
Author

SxJyJay commented Aug 20, 2022

@xpyqiubai @xxlbigbrother @kuangpanda Hi, I have sent the data processing code for Waymo and KITTI to you. Sorry for the late reply.

@xpyqiubai

@xpyqiubai @xxlbigbrother @kuangpanda Hi, I have sent the data processing code for Waymo and KITTI to you. Sorry for the late reply.

Thanks!

@yichen928

yichen928 commented Sep 27, 2022

@SxJyJay Hi SxJyJay, can you send me the trained checkpoints on nuScenes? I need the trained TransFusion and TransFusion-L models as well as the relevant data processing code. It would be greatly helpful for me since I may not have enough machines to train them myself. Thank you very much! My email is 1733834831@qq.com.

@SxJyJay
Author

SxJyJay commented Sep 29, 2022

@SxJyJay Hi SxJyJay, can you send the trained checkpoints on nuscenes to me? I need the trained TransFusion and TransFusion-L model as well as the relevant data processing code. It would be greatly helpful for me since I may not have enough machines to train it by myself. Thank you very much! My email is 1733834831@qq.com.

I have sent relevant checkpoints and data processing code to your email.

@yichen928

@SxJyJay Hi SxJyJay, can you send the trained checkpoints on nuscenes to me? I need the trained TransFusion and TransFusion-L model as well as the relevant data processing code. It would be greatly helpful for me since I may not have enough machines to train it by myself. Thank you very much! My email is 1733834831@qq.com.

I have sent relevant checkpoints and data processing code to your email.

Thank you very much!

@minrui-hust

Hi, @SxJyJay, I have reproduced TransFusion-L with 65.4 mAP; however, my reproduced TransFusion-LC model can only achieve 65.6 mAP, which is a large gap from yours (67.25). Can you send me your training logs and checkpoints of both TransFusion-L and TransFusion-LC so I can check where I went wrong? My email is hustminrui@126.com. Thank you!

@SxJyJay
Author

SxJyJay commented Nov 22, 2022

Hi, @SxJyJay, I have reproduced TransFusion-L with 65.4 mAP; however, my reproduced TransFusion-LC model can only achieve 65.6 mAP, which is a large gap from yours (67.25). Can you send me your training logs and checkpoints of both TransFusion-L and TransFusion-LC so I can check where I went wrong? My email is hustminrui@126.com. Thank you!

Hi, I have sent you relevant pretrained weights.

@minrui-hust

Thanks a lot

@carry-all-coder

@SxJyJay Hi SxJyJay, my reproduced TransFusion-LC results are quite low. Could you please send me the trained checkpoints on nuScenes? I need the trained TransFusion and TransFusion-L models as well as the relevant data processing code. Thank you very much! My email is 982330532@qq.com

@frogbam

frogbam commented Dec 19, 2022

@SxJyJay Hi, could you send the checkpoints to me? I need the trained TransFusion-L, TransFusion, and 2D backbone models, as well as the data processing code. My email is frogbam07@gmail.com. Many thanks.

@fanxlin

fanxlin commented Dec 21, 2022

@SxJyJay Hi, can you provide the trained TransFusion and TransFusion-L models?
I am a novice and want to use a single GPU to run the validation and testing with the models in order to learn.
Thanks so much! My email is fanxlin@gmail.com

@jiangchaokang

Hi, @SxJyJay, I have reproduced TransFusion-L with 65.4 mAP; however, my reproduced TransFusion-LC model can only achieve 65.6 mAP, which is a large gap from yours (67.25). Can you send me your training logs and checkpoints of both TransFusion-L and TransFusion-LC so I can check where I went wrong? My email is hustminrui@126.com. Thank you!

Hi, I have sent you relevant pretrained weights.

Hello, SxJyJay, I really need the trained models. I would be very grateful if you could send them to me. I look forward to your help. My email is ts20060079a31@cumt.edu.cn

@wang632846

Hi, @SxJyJay, I am trying to reproduce TransFusion-L but I can't reach the reported results.
Could you send me your checkpoints?
My email is hulled-stags-0b@icloud.com
Thank you so much for your work!

@SxJyJay
Author

SxJyJay commented Jan 28, 2023

I have uploaded my reproduced checkpoints to Google Drive. You can access them using the following links:
TransFusion-L: https://drive.google.com/file/d/1J7fTYsfqRovIdKPenEG5OHQObKq-tfrl/view?usp=sharing
TransFusion-LC: https://drive.google.com/file/d/1mv_JH0gqC3SrUZ9ik9qPEBlgqeeCh9Tb/view?usp=sharing

@wang632846

Hi @SxJyJay Thank you very much!

@TE-fanxl

@SxJyJay Thank you so much for your kind sharing!

@ajinkyakhoche

@maokp @kuangpanda @cxd520314wang I have sent you my reproduced checkpoints! Please check your email!

@maokp @kuangpanda @cxd520314wang @SxJyJay I am interested in training a 2D backbone on the Waymo dataset. Could you share the relevant code and checkpoints with me at khoche@kth.se? Thanks in advance!

@RostyslavUA

RostyslavUA commented Feb 28, 2023

@heminghuang7 You can comment the following part out:

mask_roi_extractor=dict(
    type='SingleRoIExtractor',
    roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
    out_channels=256,
    featmap_strides=[4, 8, 16, 32]),
mask_head=dict(
    type='FCNMaskHead',
    num_convs=4,
    in_channels=256,
    conv_out_channels=256,
    num_classes=80,
Thank you so much!

After commenting out that part, I get an error

  File "/home/b1-gpu/mmcv-1.2.4/mmcv/utils/registry.py", line 144, in build_from_cfg
    '`cfg` or `default_args` must contain the key "type", '
KeyError: '`cfg` or `default_args` must contain the key "type", but got {\'num_classes\': 10}\nNone'

How did you solve that?

@gopi-erabati

@xpyqiubai @xxlbigbrother @kuangpanda @SxJyJay I'm interested in training a 2D backbone on the Waymo dataset. Can you please share the relevant code and checkpoints (if possible) with gopi231091@gmail.com? Thank you very much!

@wqueree

wqueree commented Mar 23, 2023

I have uploaded my reproduced checkpoints to Google Drive. You can access them using the following links: TransFusion-L: https://drive.google.com/file/d/1J7fTYsfqRovIdKPenEG5OHQObKq-tfrl/view?usp=sharing TransFusion-LC: https://drive.google.com/file/d/1mv_JH0gqC3SrUZ9ik9qPEBlgqeeCh9Tb/view?usp=sharing

This is fantastic, thank you so much for sharing!

@ToothlessBDG

I have uploaded my reproduced checkpoints to Google Drive. You can access them using the following links: TransFusion-L: https://drive.google.com/file/d/1J7fTYsfqRovIdKPenEG5OHQObKq-tfrl/view?usp=sharing TransFusion-LC: https://drive.google.com/file/d/1mv_JH0gqC3SrUZ9ik9qPEBlgqeeCh9Tb/view?usp=sharing

Hello, thank you very much for sharing; this is very helpful for me since I only have one GPU. I also want to look at the parameters after training, so could you send me a TransFusion work_dir file? Thank you very much. gzr321654987@126.com

@friendship1

@xpyqiubai @xxlbigbrother @kuangpanda @SxJyJay Could you please provide the necessary code and any available checkpoints for training a 2D backbone on the Waymo dataset? If possible, send the information to friendship1@dgist.ac.kr. Your assistance is greatly appreciated!
