Frequently Asked Questions #109

rentainhe · 2022-10-19T07:22:11Z

We keep this issue open to collect frequently asked questions and their solutions from the users.

Feel free to leave your comment here if you find any frequent issues and have ways to help others to solve them.

Notes

If you run into convergence problems when training with fewer GPUs, set a larger batch size (for example 8 or 16) through dataloader.train.total_batch_size, as discussed in this issue: Convergence problem on coco with less gpus. #219
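
As a rough illustration of the note above, here is a minimal sketch of how a total batch size maps to a per-GPU batch size. It assumes the total is split evenly across GPUs, as detectron2-style data loaders do; the helper name per_gpu_batch_size is hypothetical and not part of detrex.

```python
def per_gpu_batch_size(total_batch_size: int, num_gpus: int) -> int:
    """Images each GPU processes per step, assuming an even split across GPUs."""
    if total_batch_size <= 0 or num_gpus <= 0:
        raise ValueError("total_batch_size and num_gpus must be positive")
    if total_batch_size % num_gpus != 0:
        raise ValueError("total_batch_size must be divisible by num_gpus")
    return total_batch_size // num_gpus


# Keeping the total at 16 while using fewer GPUs means each GPU must hold more images.
for num_gpus in (8, 2, 1):
    print(f"{num_gpus} GPU(s): {per_gpu_batch_size(16, num_gpus)} images per GPU")
```

Whether a given per-GPU batch fits in memory depends on the model and the GPU, so treat the numbers as a starting point.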

FAQs

1. ImportError: Cannot import 'detrex._C', therefore 'MultiScaleDeformableAttention' is not available.

detrex needs the CUDA runtime to build the MultiScaleDeformableAttention operator. In most cases you do not need to set the CUDA_HOME environment variable yourself if CUDA is installed correctly, because the default CUDA path is /usr/local/cuda. If you find that your CUDA_HOME is None, you can fix it as follows:

If you've already installed CUDA runtime in your environments, specify the environment variable (here we take cuda-11.3 as an example):

export CUDA_HOME=/path/to/cuda-11.3/

If you cannot find a CUDA runtime in your environment, install it by following the CUDA Toolkit Installation guide, then set the CUDA_HOME environment variable.
After setting CUDA_HOME, rebuild detrex by running pip install -e .

You can also refer to these issues for more details: #98, #85
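
Before rebuilding, it can help to check which CUDA root your environment would resolve to. The sketch below is only an approximation: the lookup order (CUDA_HOME or CUDA_PATH, then nvcc on PATH, then /usr/local/cuda) is an assumption about how extension builders typically search, and guess_cuda_home is a hypothetical helper, not a detrex or PyTorch function.

```python
import os
import shutil
from typing import Mapping, Optional


def guess_cuda_home(env: Mapping[str, str] = os.environ,
                    default: str = "/usr/local/cuda") -> Optional[str]:
    """Return a likely CUDA root directory, or None if nothing plausible is found."""
    # 1. An explicitly exported variable wins.
    explicit = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if explicit:
        return explicit.rstrip("/") or "/"
    # 2. Otherwise look for nvcc on PATH; the CUDA root is two levels above bin/nvcc.
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Finally fall back to the conventional default location, if it exists.
    return default if os.path.isdir(default) else None


print("Likely CUDA root:", guess_cuda_home())
```

If this prints None, export CUDA_HOME as shown above before running pip install -e . again.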

2. How to avoid filtering out empty annotations during training.

There are three ways to keep images with empty annotations during training.

Modify the config in configs/common/data/coco_detr.py as follows:

dataloader.train = L(build_detection_train_loader)(
    dataset=L(get_detection_dataset_dicts)(names="coco_2017_train", filter_empty=False),
    ...,
)

Modify the config in your project, such as dino_r50_4scale_24ep.py:

# your config.py
dataloader = get_config("common/data/coco_detr.py").dataloader

# modify dataloader config
# do not filter out empty annotations during training
dataloader.train.dataset.filter_empty = False

Override the config on the command line when launching training:

cd detrex
python tools/train_net.py --config-file projects/dino/configs/path/to/config.py --num-gpus 8 dataloader.train.dataset.filter_empty=False

You can also refer to these issues for more details: #78 (comment)
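
To make the effect of filter_empty concrete, here is a small self-contained sketch on toy data. It assumes the flag behaves like detectron2's filtering, which drops images that have no non-crowd annotations; the helper names and the toy records are hypothetical and only mimic the dataset-dict layout.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def drop_images_without_usable_annotations(dataset_dicts: List[Record]) -> List[Record]:
    """Keep only images that have at least one non-crowd annotation."""
    def has_usable(anns: List[Dict[str, Any]]) -> bool:
        return any(ann.get("iscrowd", 0) == 0 for ann in anns)

    return [d for d in dataset_dicts if has_usable(d.get("annotations", []))]


def select_training_images(dataset_dicts: List[Record], filter_empty: bool = True) -> List[Record]:
    """Mimic the filter_empty switch: filter when True, keep everything when False."""
    if filter_empty:
        return drop_images_without_usable_annotations(dataset_dicts)
    return list(dataset_dicts)


toy = [
    {"file_name": "a.jpg", "annotations": [{"category_id": 1, "iscrowd": 0}]},
    {"file_name": "b.jpg", "annotations": []},                                  # empty
    {"file_name": "c.jpg", "annotations": [{"category_id": 1, "iscrowd": 1}]},  # crowd only
]

print(len(select_training_images(toy, filter_empty=True)))   # 1
print(len(select_training_images(toy, filter_empty=False)))  # 3
```

With filter_empty=False the background-only images stay in the training set, which is usually what you want when negatives carry information.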

3. RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:54980 (errno: 98 - Address already in use).

This means that a process you started earlier did not exit correctly. There are two solutions:

Kill the earlier process completely.
Change the port by setting --dist-url:

python tools/train_net.py \
    --config-file path/to/config.py \
    --num-gpus 8 \
    --dist-url tcp://127.0.0.1:12345
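
If you are unsure which port is free, one option is to ask the operating system for one. This is a minimal sketch using only the Python standard library; find_free_port is a hypothetical helper, and the port could in principle be taken by another process between this check and the start of training.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Bind to port 0 so the OS picks an unused port, then return that port number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
dist_url = f"tcp://127.0.0.1:{port}"
print(f"--dist-url {dist_url}")
```

Pass the printed value to tools/train_net.py in place of the fixed port above.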

4. DINO CPU inference

Please refer to this PR #157 for more details

5. Training coco-like custom dataset

Please refer to this PR #186 for more details.

hg6185 · 2023-07-25T11:05:10Z

Hello,
I'm trying to install detrex on an hpc with Nvidia V100. I managed to set the path CUDA_HOME to path/CUDA/11.8.0

When I run pip install -e . again, I'm getting the following warning and error:

warning: nvcc warning : incompatible redefinition for option 'std', the last value of this option was used (I think this relates to one argument -std=c++17)

error:
/.../miniconda3/envs/fps-bm/lib/python3.10/site-packages/torch/include/c10/util/Half.h(73): error: identifier "_castu32_f32" is undefined

/.../miniconda3/envs/fps-bm/lib/python3.10/site-packages/torch/include/c10/util/Half.h(89): error: identifier "_castf32_u32" is undefined

2 errors detected in the compilation of "/.../detrex/detrex/layers/csrc/DCNv3/dcnv3_cuda.cu".
error: command '.../software/CUDA/11.8.0/bin/nvcc' failed with exit code 2

Did you ever encounter this and do you know a fix?
My gcc is 11.3 and supports c++17
Thanks in advance

rentainhe · 2023-07-25T11:32:33Z

Hello @hg6185

It seems the dcn_v3 operator is not suitable for this environment. You can try these two options:

Search for related issues in the InternImage repo to see whether anyone has reported the same problem.
Remove this operator if you do not need to benchmark your model on the InternImage backbone, then recompile detrex.

this is InternImage's official repo: https://github.com/OpenGVLab/InternImage

Seems like they already have python package for this operator: https://github.com/OpenGVLab/InternImage/releases/tag/whl_files

We will update detrex soon to remove the compilation step for this operator.

hg6185 · 2023-07-25T17:37:37Z

Thanks for the quick reply @rentainhe!
Unfortunately, that's not the thing. I removed and reinstalled everything including detectron2 which now cannot be installed due to the same issue.
It seems to be a problem with c++ imports in PyTorch.

rentainhe · 2023-07-26T02:53:02Z

I'm sorry to hear that. I suggest trying a lower PyTorch version to see whether that works around the issue. @hg6185

hg6185 · 2023-07-26T06:40:36Z

Hi again @rentainhe ,
I found the problem. The GCC version was incompatible with CUDA. Note that you should use a GCC version below 10.
In my case, everything works fine with CUDA 11.3.1 and GCC 9.4.0. Thanks again for the quick support!

rentainhe · 2023-07-27T04:58:14Z

Would you like to add this situation to our FAQs here: #109 (comment)?

hg6185 · 2023-07-27T18:43:25Z

Hi @rentainhe ,

I can add this, but what do you mean? :D
Do you want me to write a comment that makes a little summary, so you can delete the rest?

rentainhe · 2023-07-28T03:38:41Z

Yes, I was wondering whether it's better to add it somewhere or just keep our conversation here to help others who run into the same problem.

hg6185 · 2023-08-01T14:57:54Z

hi @rentainhe
a summary of what fixed issue 1 for me: the 'latest' Detectron2 release requires a GCC version lower than 10.0.0. I am working on an HPC where I can load different CUDA and GCC versions, which is practical in this case.

In order to build Detectron2 and Detrex, I used a miniconda env with CUDA 11.3.1 and gcc 9.4.0. I use PyTorch 3.8 which can be installed by this command (I post it here, because you will have to search for it since it's older):
conda install pytorch torchvision torchaudio pytorch-cuda=11.3 -c pytorch -c nvidia

Don't forget to install the NVIDIA toolkit that matches your CUDA version.
Note that some libraries, like matplotlib, needed to be downgraded to match the older GCC and Python versions.
In general, you will probably encounter some issues along the way, but I managed to find a solution to all of them.

For instance, if you get an error with pycocotools, pip uninstall it and install it with conda instead (from conda-forge).
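
A quick way to check the compiler before building is sketched below. The threshold of major version 10 comes only from the experience reported in this thread, not from an official compatibility table, and the helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_gcc_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract a version such as (9, 4, 0) or (11,) from compiler output."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if not match:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def gcc_looks_compatible(version: Tuple[int, ...], max_major_exclusive: int = 10) -> bool:
    """True if the major version is below the threshold reported in this thread."""
    return version[0] < max_major_exclusive


def installed_gcc_version() -> Optional[Tuple[int, ...]]:
    """Ask the gcc found on PATH for its version; None if gcc is unavailable."""
    try:
        result = subprocess.run(["gcc", "-dumpversion"],
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_gcc_version(result.stdout)


version = installed_gcc_version()
if version is None:
    print("gcc not found on PATH")
else:
    status = "looks fine" if gcc_looks_compatible(version) else "may be too new"
    print("gcc", ".".join(map(str, version)), status)
```

On an HPC with environment modules, run this after loading the GCC and CUDA modules you intend to build with.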

rentainhe · 2023-08-02T02:19:40Z

Thank you so much for summarizing this! It's really useful!

rentainhe pinned this issue Oct 19, 2022

rentainhe added the question Further information is requested label Oct 19, 2022

rentainhe mentioned this issue Oct 27, 2022

Meet error when training DINO, Cannot import 'detrex._C', therefore 'MultiScaleDeformableAttention' is not available. #117

Closed

rentainhe assigned HaoZhang534, SlongLiu and FengLi-ust Dec 1, 2022

rentainhe mentioned this issue Dec 4, 2022

DINO inference on a CPU only machine fails #157

Open
