How to run single view detection on ScanNet? #72

Open
gyhandy opened this issue Aug 2, 2023 · 23 comments

Comments


gyhandy commented Aug 2, 2023

Hi, thank you for your excellent work! If I want to run single-view detection on ScanNet, are there any suggestions on how to modify the code? Thanks!

Contributor

filaPro commented Aug 3, 2023

Hi @gyhandy ,
You can just set n_images to 1 here.

Author

gyhandy commented Aug 3, 2023

Thank you for your reply! Can I also train ImVoxelNet with a single view on ScanNet? Thanks!

Contributor

filaPro commented Aug 3, 2023

Setting n_images to 1 in the training pipeline should also be OK.
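
As a rough illustration of the suggestion above, the sketch below walks an mmdetection-style pipeline (a list of dicts) and overrides `n_images` wherever it appears. The `MultiViewPipeline` step name, the nested `transforms` key, and the helper `set_n_images` are assumptions made for illustration; check them against the actual ScanNet config in this repo.

```python
import copy


def set_n_images(pipeline, n_images=1):
    """Return a copy of `pipeline` with every `n_images` entry overridden.

    `pipeline` is assumed to be an mmdetection-style list of dicts; nested
    lists under any key (e.g. `transforms`) are searched recursively.
    """
    pipeline = copy.deepcopy(pipeline)
    for step in pipeline:
        if not isinstance(step, dict):
            continue
        if 'n_images' in step:
            step['n_images'] = n_images
        for key, value in step.items():
            if isinstance(value, list):
                step[key] = set_n_images(value, n_images)
    return pipeline


# Hypothetical pipeline fragment; the real config may differ.
train_pipeline = [
    dict(type='LoadAnnotations3D'),
    dict(type='MultiViewPipeline', n_images=50, transforms=[
        dict(type='LoadImageFromFile'),
    ]),
]

single_view_pipeline = set_n_images(train_pipeline, n_images=1)
print(single_view_pipeline[1]['n_images'])
```

The helper returns a modified copy, so the original multi-view pipeline stays available for comparison runs.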

Author

gyhandy commented Aug 3, 2023

Many thanks for your quick response! If I train on ScanNet with a single view, do I still need step 3 of the ScanNet dataset preprocessing, quoted here?

"3. In this directory, extract RGB image with poses by running python extract_posed_images.py. This step is optional. Skip it if you don't plan to use multi-view RGB images. Add --max-images-per-scene -1 to disable limiting number of images per scene. ScanNet scenes contain up to 5000+ frames per each. After extraction, all the .jpg images require 2 Tb disk space. The recommended 300 images per scene require less then 100 Gb. For example multi-view 3d detector ImVoxelNet samples 50 and 100 images per training and test scene."

Contributor

filaPro commented Aug 3, 2023

Yes, you need it.

Author

gyhandy commented Aug 3, 2023

Thanks again! Which single view out of the 300 images of a scene will the model then use for training and testing?

Contributor

filaPro commented Aug 3, 2023

Just a random one for each train or test iteration.

Author

gyhandy commented Aug 3, 2023

For reproducibility, is it possible to fix the frame used for each scene? Thanks!

Contributor

filaPro commented Aug 3, 2023

You can add some workaround here.
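
One possible workaround, sketched under assumptions: derive the frame index from a stable hash of the scene name, so each scene always maps to the same frame across runs. The function `pick_fixed_frame` and the scene name format are hypothetical; the place where this repo samples frames is not shown in this thread.

```python
import hashlib


def pick_fixed_frame(scene_id, n_frames, seed=0):
    """Map a scene to one frame index in [0, n_frames), stable across runs.

    hashlib is used instead of the built-in hash(), whose value for strings
    changes between interpreter runs unless PYTHONHASHSEED is fixed.
    """
    if n_frames <= 0:
        raise ValueError('n_frames must be positive')
    digest = hashlib.sha256(f'{seed}:{scene_id}'.encode('utf-8')).digest()
    return int.from_bytes(digest[:8], 'big') % n_frames


# Hypothetical scene name, with the 300 extracted frames per scene.
idx = pick_fixed_frame('scene0000_00', n_frames=300)
print(idx)
```

Changing `seed` gives a different but still reproducible choice of frame per scene, which may be useful for repeating an experiment over several fixed views.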

Author

gyhandy commented Aug 3, 2023

I appreciate your help! I will try and update you here. Thanks!

Author

gyhandy commented Aug 3, 2023

Previously I used the MMDetection3D repo to run ImVoxelNet experiments, but it does not support ScanNet, so I switched to this repo. However, the conda environment I built for MMDetection3D cannot be used directly to run the code in this repo.
For instance, why do you need to constrain the mmcv version?
mmcv_minimum_version = '1.1.5'
mmcv_maximum_version = '1.3.0'

Another question: how can I change the MMDetection3D implementation of ImVoxelNet (which currently supports SUN RGB-D) to run on the ScanNet dataset?
Thanks!

Contributor

filaPro commented Aug 4, 2023

This repo is ~3 years old, and mmdetection3d has had several major releases since then, so it now has almost nothing in common with version 0.8.0 that we use here. The only way to run this repo is to follow the versions of all packages in our Dockerfile.
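
For context, a version guard of the kind quoted in the question can be sketched as follows. This is a simplified stand-in, not the repo's actual check: the helper names `version_tuple` and `is_supported` are hypothetical, and treating both bounds as inclusive is an assumption.

```python
import re


def version_tuple(version):
    """Turn '1.2.7' or '1.2.7+torch1.6.0' into a comparable tuple (1, 2, 7)."""
    release = version.split('+')[0]
    return tuple(int(part) for part in re.findall(r'\d+', release))


def is_supported(installed, minimum='1.1.5', maximum='1.3.0'):
    """Check minimum <= installed <= maximum (inclusive bounds assumed)."""
    low, high = version_tuple(minimum), version_tuple(maximum)
    return low <= version_tuple(installed) <= high


print(is_supported('1.2.7'))
print(is_supported('1.7.0'))
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where '1.10.0' sorts before '1.2.0' lexicographically.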

Author

gyhandy commented Aug 4, 2023

Thanks!
I built the Docker image (sudo docker build -t imvoxelnet .) and got the following error:

ERROR [ 5/16] RUN pip install mmdet==2.10.0 11.6s

[ 5/16] RUN pip install mmdet==2.10.0:
0.852 Collecting mmdet==2.10.0
0.958 Downloading mmdet-2.10.0-py3-none-any.whl (547 kB)
1.096 Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from mmdet==2.10.0) (1.14.0)
1.137 Collecting terminaltables
1.149 Downloading terminaltables-3.1.10-py2.py3-none-any.whl (15 kB)
1.266 Collecting mmpycocotools
1.282 Downloading mmpycocotools-12.0.3.tar.gz (23 kB)
1.565 Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from mmdet==2.10.0) (1.18.1)
2.022 Collecting matplotlib
2.036 Downloading matplotlib-3.5.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
2.413 Requirement already satisfied: setuptools>=18.0 in /opt/conda/lib/python3.7/site-packages (from mmpycocotools->mmdet==2.10.0) (46.4.0.post20200518)
3.211 Collecting cython>=0.27.3
3.226 Downloading Cython-3.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.5 MB)
3.565 Collecting kiwisolver>=1.0.1
3.578 Downloading kiwisolver-1.4.4-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.1 MB)
3.731 Collecting pyparsing>=2.2.1
3.742 Downloading pyparsing-3.1.1-py3-none-any.whl (103 kB)
3.826 Collecting packaging>=20.0
3.837 Downloading packaging-23.1-py3-none-any.whl (48 kB)
3.903 Collecting cycler>=0.10
3.916 Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
3.995 Collecting python-dateutil>=2.7
4.010 Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
4.036 Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib->mmdet==2.10.0) (7.1.2)
4.214 Collecting fonttools>=4.22.0
4.228 Downloading fonttools-4.38.0-py3-none-any.whl (965 kB)
4.347 Requirement already satisfied: typing-extensions; python_version < "3.8" in /opt/conda/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib->mmdet==2.10.0) (4.7.1)
4.348 Building wheels for collected packages: mmpycocotools
4.349 Building wheel for mmpycocotools (setup.py): started
4.830 Building wheel for mmpycocotools (setup.py): finished with status 'error'
4.830 ERROR: Command errored out with exit status 1:
......
......
Dockerfile:17
15 | # Install MMCV
16 | RUN pip install mmcv-full==1.2.7 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html
17 | >>> RUN pip install mmdet==2.10.0
18 |
19 | # Install MMDetection


ERROR: failed to solve: process "/bin/sh -c pip install mmdet==2.10.0" did not complete successfully: exit code: 1


Could you please help to provide a potential solution? Thanks!

Contributor

filaPro commented Aug 5, 2023

Can you try with RUN conda install cython before RUN pip install mmdet==2.10.0 and tell me if it helps?

Author

gyhandy commented Aug 6, 2023

Thanks for your reply! I will try it and update here. Another question: if we do monocular detection on ScanNet (you also show single-view test results) but only have ground-truth object labels for the whole scene, how do we know which objects in the scene are visible in the given single view? Thanks!

Author

gyhandy commented Aug 6, 2023

After adding RUN conda install cython, the Docker image builds, thanks! But I get a new error when I run
"bash tools/dist_train.sh configs/imvoxelnet/imvoxelnet_sunrgbd_fast.py 1". It looks like a CUDA or PyTorch error; here are the details. Do you have a recommended GPU? Thanks!

Traceback (most recent call last):
File "tools/train.py", line 166, in <module>
main()
File "tools/train.py", line 162, in main
meta=meta)
File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 82, in train_detector
find_unused_parameters=find_unused_parameters)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 333, in __init__
self.broadcast_bucket_size)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 549, in _distributed_broadcast_coalesced
dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3', '-u', 'tools/train.py', '--local_rank=0', 'configs/imvoxelnet/imvoxelnet_sunrgbd_fast.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

Contributor

filaPro commented Aug 6, 2023

"how to know which objects in the scene are visible in the given single view?"

Yes, that is not trivial and probably requires extracting not only the RGB image but also a depth image to check for occlusions. That's why we recommend training single-view detection only on SUN RGB-D.
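
To make the difficulty concrete, here is a minimal sketch of a per-view visibility test for a box center, assuming a pinhole camera. It is not part of this repo: the function names, the world-to-camera convention, the `(fx, fy, cx, cy)` intrinsics layout, and the `tolerance` threshold are all assumptions for illustration. Without a depth map it can only test whether the center falls inside the view frustum, which is why occlusion handling needs depth.

```python
def project_to_image(point_world, world_to_cam, intrinsics):
    """Project a 3D world point; return (u, v, depth) or None if behind camera.

    world_to_cam: 4x4 extrinsic as nested lists (assumed world -> camera).
    intrinsics: (fx, fy, cx, cy) of a pinhole camera.
    """
    x, y, z = point_world
    cam = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in world_to_cam[:3]]
    depth = cam[2]
    if depth <= 0:
        return None
    fx, fy, cx, cy = intrinsics
    return fx * cam[0] / depth + cx, fy * cam[1] / depth + cy, depth


def is_center_visible(center_world, world_to_cam, intrinsics, width, height,
                      depth_map=None, tolerance=0.5):
    """Rough visibility test for a box center.

    Without depth_map only the frustum is checked. With depth_map (metres,
    indexed [row][col]) the center counts as occluded when the observed
    surface is more than `tolerance` metres in front of it.
    """
    projected = project_to_image(center_world, world_to_cam, intrinsics)
    if projected is None:
        return False
    u, v, depth = projected
    if not (0 <= u < width and 0 <= v < height):
        return False
    if depth_map is None:
        return True
    observed = depth_map[int(v)][int(u)]
    # Zero depth is treated as a missing measurement, so it never occludes.
    return observed <= 0 or observed >= depth - tolerance


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
K = (500.0, 500.0, 320.0, 240.0)
print(is_center_visible((0.0, 0.0, 2.0), identity, K, 640, 480))
print(is_center_visible((0.0, 0.0, -1.0), identity, K, 640, 480))
```

Testing only the box center is a simplification; a partially visible object whose center lies outside the frame would be rejected by this check.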

"RuntimeError: CUDA error: no kernel image is available for execution on the device"

That is probably not a bug in our code. Did you check that PyTorch can, for example, create a single tensor in this Docker image on your hardware?
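
A minimal sketch of such a check is given below. A CPU tensor can be created even when the installed PyTorch build has no kernels for the card, so this version also runs one small operation on the GPU. The function name `gpu_smoke_test` is hypothetical; the `torch` calls it uses are standard ones.

```python
def gpu_smoke_test():
    """Collect basic PyTorch/CUDA facts and try one tiny GPU computation."""
    report = {}
    try:
        import torch
    except ImportError:
        report['torch'] = None
        return report

    report['torch'] = torch.__version__
    report['built_for_cuda'] = torch.version.cuda
    report['cuda_available'] = torch.cuda.is_available()
    if not report['cuda_available']:
        return report

    report['device'] = torch.cuda.get_device_name(0)
    report['capability'] = torch.cuda.get_device_capability(0)
    try:
        # A tiny kernel launch; fails on GPUs the build has no kernels for.
        total = (torch.ones(4, device='cuda') * 2).sum().item()
        report['gpu_tensor_ok'] = total == 8.0
    except RuntimeError as err:
        report['gpu_tensor_ok'] = False
        report['error'] = str(err)
    return report


for key, value in gpu_smoke_test().items():
    print(f'{key}: {value}')
```

Running this inside the container, with the GPU passed through, shows the CUDA version the PyTorch build targets next to the compute capability of the card, which helps judge whether the two match.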

Author

gyhandy commented Aug 6, 2023

Thank you for your reply! Do you have recommendations for other datasets (besides SUN RGB-D) that can be used for indoor single-view detection? Also, I am writing code to run single-view/multi-view detection on ScanNet in the MMDetection3D repo (currently MMDetection3D only supports single view on SUN RGB-D and does not support multi-view on ScanNet). Could you please help refine it? Or do you have a recommended strategy for porting the current code to the official MMDetection3D code? Thanks!

Author

gyhandy commented Aug 6, 2023

Yes, PyTorch can create the tensor.

But when we use the GPU, we still get various errors.
One thing to note: when building the Docker image,
we changed "RUN pip install mmcv-full==1.2.7+torch1.6.0+cu101 -f https://openmmlab.oss-accelerate.aliyuncs.com/mmcv/dist/index.html"
to "RUN pip install mmcv-full==1.2.7"

because of an error saying that no version matching "mmcv-full==1.2.7+torch1.6.0+cu101 -f https://openmmlab.oss-accelerate.aliyuncs.com/mmcv/dist/index.html" could be found.

Do you think this is what causes the errors? We tried both a 3090 and a 1080 Ti; neither works.

Here is the error on the 1080 Ti:

RuntimeError: unable to write to file </torch_729_3230123415>
/mmdetection3d/mmdet3d/models/dense_heads/imvoxel_head_v2.py:172: UserWarning: This overload of nonzero is deprecated:
nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
nonzero(Tensor input, *, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
flatten_valids
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7fb5d6811170>
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1101, in __del__
self._shutdown_workers()
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1075, in _shutdown_workers
w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 140, in join
res = self._popen.wait(timeout)
File "/opt/conda/lib/python3.7/multiprocessing/popen_fork.py", line 45, in wait
if not wait([self.sentinel], timeout):
File "/opt/conda/lib/python3.7/multiprocessing/connection.py", line 920, in wait
ready = selector.select(timeout)
File "/opt/conda/lib/python3.7/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 727) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
Traceback (most recent call last):
File "tools/train.py", line 166, in <module>
main()
File "tools/train.py", line 162, in main
meta=meta)
File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
epoch_runner(data_loaders[i], **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 46, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 247, in train_step
losses = self(**data)

Here is the error on the 3090:

Traceback (most recent call last):
File "tools/train.py", line 166, in <module>
main()
File "tools/train.py", line 162, in main
meta=meta)
File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 82, in train_detector
find_unused_parameters=find_unused_parameters)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 333, in __init__
self.broadcast_bucket_size)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 549, in _distributed_broadcast_coalesced
dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3', '-u', 'tools/train.py', '--local_rank=0', 'configs/imvoxelnet/imvoxelnet_sunrgbd_fast.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

Contributor

filaPro commented Aug 6, 2023

The 3090 is not expected to work with CUDA 10. For the 1080 Ti, just increase the shared memory of your Docker container, e.g. to 16 GB.
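
To verify the shared-memory limit from inside the container, a small check like the one below can be used. It is a sketch that assumes a Linux container with shared memory mounted at `/dev/shm`; the function name `shared_memory_gib` is hypothetical. Docker gives containers 64 MB of shared memory by default, and the limit is raised with the `--shm-size` option of `docker run` (for example `--shm-size=16g`).

```python
import os
import shutil


def shared_memory_gib(path='/dev/shm'):
    """Return (total, free) size of the shared-memory mount in GiB, or None."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.free / gib


sizes = shared_memory_gib()
if sizes is None:
    print('/dev/shm not found on this system')
else:
    print(f'/dev/shm total: {sizes[0]:.2f} GiB, free: {sizes[1]:.2f} GiB')
```

If the reported total is still close to the default after restarting the container, the option was probably not applied to the container that runs the training.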

Author

gyhandy commented Aug 6, 2023

Thanks for your reply. After increasing the shared memory on the 1080 Ti, there is a new error:

Traceback (most recent call last):
File "tools/train.py", line 166, in <module>
main()
File "tools/train.py", line 162, in main
meta=meta)
File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
epoch_runner(data_loaders[i], **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 46, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 247, in train_step
losses = self(**data)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
return old_func(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 181, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/mmdetection3d/mmdet3d/models/detectors/imvoxelnet.py", line 84, in forward_train
losses = self.bbox_head.forward_train(x, valids.float(), img_metas, gt_bboxes_3d, gt_labels_3d)
File "/mmdetection3d/mmdet3d/models/dense_heads/imvoxel_head_v2.py", line 62, in forward_train
losses = self.loss(*loss_inputs)
File "/mmdetection3d/mmdet3d/models/dense_heads/imvoxel_head_v2.py", line 104, in loss
gt_labels=gt_labels[i]
File "/mmdetection3d/mmdet3d/models/dense_heads/imvoxel_head_v2.py", line 180, in _loss_single
avg_factor=n_pos
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/focal_loss.py", line 177, in forward
avg_factor=avg_factor)
File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/focal_loss.py", line 86, in sigmoid_focal_loss
'none')
File "/opt/conda/lib/python3.7/site-packages/mmcv/ops/focal_loss.py", line 55, in forward
input, target, weight, output, gamma=ctx.gamma, alpha=ctx.alpha)
RuntimeError: SigmoidFocalLoss is not compiled with GPU support
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
main()
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3', '-u', 'tools/train.py', '--local_rank=0', 'configs/imvoxelnet/imvoxelnet_sunrgbd_fast.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

How can I solve this error?
RuntimeError: SigmoidFocalLoss is not compiled with GPU support

Should I reinstall mmcv?

Thanks!
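
A diagnostic sketch that may help narrow this down is shown below; it is not a confirmed fix from the maintainers. It reports which compiler and CUDA version the installed mmcv ops were built with, using the `get_compiler_version` and `get_compiling_cuda_version` helpers from `mmcv.ops`. The wrapper name `mmcv_build_info` is hypothetical. One possible cause, not confirmed in this thread, is that a plain `pip install mmcv-full==1.2.7` built the ops from source without CUDA, whereas the Dockerfile excerpt in the build log above installs a prebuilt wheel via `-f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html`.

```python
def mmcv_build_info():
    """Report how the installed mmcv was built, without raising if absent."""
    info = {}
    try:
        import mmcv
        info['mmcv'] = mmcv.__version__
    except ImportError:
        info['mmcv'] = None
        return info
    try:
        # These helpers need the compiled ops that ship with mmcv-full.
        from mmcv.ops import get_compiler_version, get_compiling_cuda_version
        info['compiler'] = get_compiler_version()
        info['compiled_cuda'] = get_compiling_cuda_version()
    except Exception as err:  # typically ImportError when ops are missing
        info['ops_error'] = str(err)
    return info


for key, value in mmcv_build_info().items():
    print(f'{key}: {value}')
```

If the reported CUDA version is missing or reads as unavailable, the ops were likely compiled for CPU only, and reinstalling mmcv-full from a wheel that matches the PyTorch and CUDA versions would be the next thing to try.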

Author

gyhandy commented Aug 7, 2023

If I convert ScanNet into the SUN RGB-D data format to do monocular object detection, I notice that the camera position in SUN RGB-D is always [0,0,0], the origin of the point cloud, while the ScanNet camera position may not be [0,0,0]. Should I transform the ScanNet point cloud so that the camera is always at [0,0,0]?

Also, conceptually, what is the format of the ImVoxelNet prediction? Given an input image, does it predict positions in camera coordinates and then transform the predicted 3D bounding boxes back to world coordinates using the extrinsics, so that the loss can be computed against the ground truth? Thanks!
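
For reference, the transformation described in the question can be sketched as below. Whether this repo expects it is not answered in this thread, so treat it purely as an illustration of the geometry. It assumes the camera pose is a 4x4 rigid camera-to-world matrix; the helper names and the example pose are hypothetical.

```python
def invert_rigid(pose):
    """Invert a 4x4 rigid transform [R | t] given as nested lists.

    The inverse is [R^T | -R^T t]; this is only valid when R is a rotation.
    """
    rot_t = [[pose[c][r] for c in range(3)] for r in range(3)]
    trans = [pose[r][3] for r in range(3)]
    new_t = [-sum(rot_t[r][c] * trans[c] for c in range(3)) for r in range(3)]
    return [rot_t[r] + [new_t[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transform_points(points, matrix):
    """Apply a 4x4 transform to a list of (x, y, z) points."""
    return [
        tuple(row[0] * x + row[1] * y + row[2] * z + row[3]
              for row in matrix[:3])
        for x, y, z in points
    ]


# Hypothetical camera-to-world pose: camera at (1, 2, 3), rotated 90 degrees
# about the z axis.
cam_to_world = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
world_to_cam = invert_rigid(cam_to_world)
# The camera center in world coordinates maps to the camera-frame origin.
print(transform_points([(1.0, 2.0, 3.0)], world_to_cam))
```

Applying `world_to_cam` to every point and box of a scene expresses them relative to the chosen view, so the camera sits at the origin for that frame.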

Contributor

filaPro commented Aug 15, 2023

Can you please follow #55 for camera position info on SUN RGB-D?
