
AttributeError: module 'portalocker' has no attribute 'Lock' #4

Closed
Wei-i opened this issue Mar 26, 2021 · 31 comments

Comments

Wei-i commented Mar 26, 2021

Thanks for sharing your great work. Unfortunately, I hit a bug when I run
python ./tools/train_net.py --num-gpus 1 --config-file ./configs/yolof_R_50_C5_1x.yaml

The error log is below:

[03/26 07:38:03 d2.data.build]: Using training sampler TrainingSampler
[03/26 07:38:03 d2.data.common]: Serializing 117266 elements to byte tensors and concatenating them all ...
[03/26 07:38:10 d2.data.common]: Serialized dataset takes 451.21 MiB
[03/26 07:38:15 fvcore.common.checkpoint]: Loading checkpoint from detectron2://ImageNetPretrained/MSRA/R-50.pkl
Traceback (most recent call last):
File "./tools/train_net.py", line 234, in
args=(args,),
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/launch.py", line 62, in launch
main_func(*args)
File "./tools/train_net.py", line 215, in main
trainer.resume_or_load(resume=args.resume)
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 353, in resume_or_load
checkpoint = self.checkpointer.resume_or_load(self.cfg.MODEL.WEIGHTS, resume=resume)
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/fvcore/common/checkpoint.py", line 215, in resume_or_load
return self.load(path, checkpointables=[])
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/fvcore/common/checkpoint.py", line 140, in load
path = self.path_manager.get_local_path(path)
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/iopath/common/file_io.py", line 1100, in get_local_path
path, force=force, **kwargs
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/utils/file_io.py", line 29, in _get_local_path
return PathManager.get_local_path(self.S3_DETECTRON2_PREFIX + name, **kwargs)
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/iopath/common/file_io.py", line 1100, in get_local_path
path, force=force, **kwargs
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/iopath/common/file_io.py", line 755, in _get_local_path
with file_lock(cached):
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/iopath/common/file_io.py", line 82, in file_lock
return portalocker.Lock(path + ".lock", timeout=3600) # type: ignore
AttributeError: module 'portalocker' has no attribute 'Lock'

I would be grateful if you could give me some advice. Thanks.

Wei-i (Author) commented Mar 26, 2021

It seems something may be wrong with loading the checkpoint from detectron2://ImageNetPretrained/MSRA/R-50.pkl?

chensnathan (Owner) commented:

Could you check the version of portalocker in your environment, and run the following snippet to verify whether portalocker has the Lock attribute:

>>> import portalocker
>>> portalocker.__version__
>>> portalocker.Lock

Wei-i (Author) commented Mar 26, 2021

Sorry, there does seem to be something wrong with my portalocker:

(yolof) cw@MAC-3DGroup:~$ python
Python 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fvcore
>>> import portalocker
>>> 
>>> portalocker.__version__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'portalocker' has no attribute '__version__'

chensnathan (Owner) commented:

Try to re-install portalocker?

Wei-i (Author) commented Mar 26, 2021

First run
pip uninstall portalocker
and then
conda install portalocker
and the bug is fixed.
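
As a quick sanity check after reinstalling, a minimal sketch (the timeout argument simply mirrors the call inside iopath shown in the traceback above):

import portalocker

# A healthy install exposes both the version string and the Lock class
# that iopath's file_lock() helper wraps.
print(getattr(portalocker, "__version__", "no __version__ attribute"))
with portalocker.Lock("/tmp/portalocker_check.lock", timeout=5):
    print("portalocker.Lock works")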

Wei-i (Author) commented Mar 26, 2021

[03/26 08:50:52 d2.engine.hooks]: Total training time: 0:00:01 (0:00:00 on hooks)
[03/26 08:50:52 d2.utils.events]: iter: 0 lr: N/A max_mem: 622M
Traceback (most recent call last):
File "./tools/train_net.py", line 234, in <module>
args=(args,),
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/launch.py", line 62, in launch
main_func(*args)
File "./tools/train_net.py", line 221, in main
return trainer.train()
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 431, in train
super().train(self.start_iter, self.max_iter)
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 140, in train
self.run_step()
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 441, in run_step
self._trainer.run_step()
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 234, in run_step
loss_dict = self.model(data)
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cw/YOLOF/yolof/modeling/yolof.py", line 294, in forward
pred_logits, pred_anchor_deltas)
File "/home/cw/YOLOF/yolof/modeling/yolof.py", line 387, in losses
dist.all_reduce(num_foreground)
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 953, in all_reduce
_check_default_pg()
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 211, in _check_default_pg
"Default process group is not initialized"
AssertionError: Default process group is not initialized

How do I disable DDP?

Wei-i (Author) commented Mar 26, 2021

I think this is the last bug before I can train YOLOF...

chensnathan (Owner) commented Mar 26, 2021

Comment out the lines that use dist in the yolof.py file.
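
If you would rather keep that code path, a minimal sketch of an alternative (not the repository's actual fix; num_foreground below is just a stand-in for the value computed in losses) is to guard the collective call so it only runs when a process group exists:

import torch
import torch.distributed as dist

num_foreground = torch.tensor(128.0)  # stand-in for the count computed in losses()

# dist.all_reduce() asserts unless init_process_group has been called,
# so only reduce (and average) across processes when a group exists.
if dist.is_available() and dist.is_initialized():
    dist.all_reduce(num_foreground)
    num_foreground = num_foreground / dist.get_world_size()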

BTW, when you train with only one GPU, you should adjust the learning rate and batch size. Refer to this response.

chensnathan (Owner) commented:

Support training with one GPU in this commit.

Wei-i (Author) commented Mar 26, 2021

Thanks!
[03/26 09:10:09 d2.engine.hooks]: Overall training speed: 57 iterations in 0:00:18 (0.3159 s / it)
[03/26 09:10:09 d2.engine.hooks]: Total training time: 0:00:18 (0:00:00 on hooks)
[03/26 09:10:09 d2.utils.events]: eta: 2:01:11 iter: 59 total_loss: 2.067 loss_cls: 1.342 loss_box_reg: 0.7438 time: 0.3139 data_time: 0.0022 lr: 3.9308e-06 max_mem: 1076M
Traceback (most recent call last):
File "tools/train_net.py", line 234, in
args=(args,),
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/launch.py", line 62, in launch
main_func(*args)
File "tools/train_net.py", line 221, in main
return trainer.train()
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 431, in train
super().train(self.start_iter, self.max_iter)
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 140, in train
self.run_step()
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 441, in run_step
self._trainer.run_step()
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 234, in run_step
loss_dict = self.model(data)
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cw/YOLOF/yolof/modeling/yolof.py", line 295, in forward
pred_logits, pred_anchor_deltas)
File "/home/cw/YOLOF/yolof/modeling/yolof.py", line 397, in losses
pred_class_logits[valid_idxs],
RuntimeError: CUDA error: device-side assert triggered

Wei-i (Author) commented Mar 26, 2021

I'm really sorry, author; my experience is limited and a new bug has appeared. Could you tell me how to fix this?

chensnathan (Owner) commented:

Could you give more details about the command you used?

Wei-i (Author) commented Mar 26, 2021

Command:
CUDA_VISIBLE_DEVICES=1 python tools/train_net.py --num-gpus 1 --config-file ./configs/yolof_R_50_C5_1x.yaml

YAML config:

MODEL:
  META_ARCHITECTURE: "YOLOF"
  BACKBONE:
    NAME: "build_resnet_backbone"
  RESNETS:
    OUT_FEATURES: ["res5"]
DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)
DATALOADER:
  #NUM_WORKERS: 8
  NUM_WORKERS: 4
SOLVER:
  #IMS_PER_BATCH: 64
  IMS_PER_BATCH: 2
  #BASE_LR: 0.12
  BASE_LR: 0.00001
  WARMUP_FACTOR: 0.00066667
  WARMUP_ITERS: 1500
  #STEPS: (15000, 20000)
  STEPS: (480000, 640000)
  #MAX_ITER: 22500
  MAX_ITER: 720000
  CHECKPOINT_PERIOD: 2500
INPUT:
  MIN_SIZE_TRAIN: (800,)

chensnathan (Owner) commented:

Can you try these settings?

IMS_PER_BATCH: 8
BASE_LR: 0.03
WARMUP_FACTOR: 0.00066667
WARMUP_ITERS: 1500
STEPS: (120000, 160000)
MAX_ITER: 180000

Wei-i (Author) commented Mar 26, 2021 via email

Wei-i (Author) commented Mar 27, 2021

Good morning! When I tried your settings, the same bug still remains:

[03/26 17:38:53 d2.engine.train_loop]: Starting training from iteration 0

/opt/conda/conda-bld/pytorch_1607370169888/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [11,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

ERROR [03/26 17:39:00 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 140, in train
self.run_step()
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 441, in run_step
self._trainer.run_step()
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 234, in run_step
loss_dict = self.model(data)
File "/home/cw/miniconda3/envs/yolof/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cw/YOLOF/yolof/modeling/yolof.py", line 295, in forward
pred_logits, pred_anchor_deltas)
File "/home/cw/YOLOF/yolof/modeling/yolof.py", line 397, in losses
pred_class_logits[valid_idxs],
RuntimeError: CUDA error: device-side assert triggered

chensnathan (Owner) commented:

Sorry, the BASE_LR should be 0.015 (scaling the default 0.12 for a batch of 64 linearly down to a batch of 8 gives 0.12 × 8 / 64 = 0.015). That said, I can train with one GPU with an initial learning rate of either 0.03 or 0.015, so I cannot reproduce your error on my side.

Try warming up for more iterations, e.g.,

WARMUP_FACTOR: 0.0002
WARMUP_ITERS: 5000

Wei-i (Author) commented Mar 27, 2021

Thanks. It still does not work...

chensnathan (Owner) commented:

Could you upload your training log file?

Wei-i (Author) commented Mar 27, 2021

[03/26 21:44:54] detectron2 INFO: Rank of current process: 0. World size: 2
[03/26 21:44:55] detectron2 INFO: Environment info:


sys.platform linux
Python 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0]
numpy 1.19.1
detectron2 0.4 @/home/cw/detectron2/detectron2
Compiler GCC 5.4
CUDA compiler CUDA 9.0
detectron2 arch flags 6.1
DETECTRON2_ENV_MODULE
PyTorch 1.6.0 @/home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0,1,2 GeForce GTX 1080 Ti (arch=6.1)
CUDA_HOME /usr/local/cuda-9.0
Pillow 7.2.0
torchvision 0.7.0 @/home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0
fvcore 0.1.5.post20210327
iopath 0.1.7
cv2 4.4.0


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 9.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

[03/26 21:44:55] detectron2 INFO: Command line arguments: Namespace(config_file='./configs/yolof_R_50_C5_1x.yaml', dist_url='tcp://127.0.0.1:50159', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=['OUTPUT_DIR', '/hdd2/wh/cw/train/yolof/R_50_C5_1x/'], resume=False)
[03/26 21:44:55] detectron2 INFO: Contents of args.config_file=./configs/yolof_R_50_C5_1x.yaml:
BASE: "Base-YOLOF.yaml"
MODEL:
WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
RESNETS:
DEPTH: 50
OUTPUT_DIR: "output/yolof/R_50_C5_1x"

[03/26 21:44:55] detectron2 INFO: Running with full config:
CUDNN_BENCHMARK: False
DATALOADER:
ASPECT_RATIO_GROUPING: True
FILTER_EMPTY_ANNOTATIONS: True
NUM_WORKERS: 8
REPEAT_THRESHOLD: 0.0
SAMPLER_TRAIN: TrainingSampler
DATASETS:
PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: ()
PROPOSAL_FILES_TRAIN: ()
TEST: ('coco_2017_val',)
TRAIN: ('coco_2017_train',)
GLOBAL:
HACK: 1.0
INPUT:
CROP:
ENABLED: False
SIZE: [0.9, 0.9]
TYPE: relative_range
DISTORTION:
ENABLED: False
EXPOSURE: 1.5
HUE: 0.1
SATURATION: 1.5
FORMAT: BGR
JITTER_CROP:
ENABLED: False
JITTER_RATIO: 0.3
MASK_FORMAT: polygon
MAX_SIZE_TEST: 1333
MAX_SIZE_TRAIN: 1333
MIN_SIZE_TEST: 800
MIN_SIZE_TRAIN: (800,)
MIN_SIZE_TRAIN_SAMPLING: choice
MOSAIC:
ENABLED: False
MIN_OFFSET: 0.2
MOSAIC_HEIGHT: 640
MOSAIC_WIDTH: 640
NUM_IMAGES: 4
POOL_CAPACITY: 1000
RANDOM_FLIP: horizontal
RESIZE:
ENABLED: False
SCALE_JITTER: (0.8, 1.2)
SHAPE: (640, 640)
TEST_SHAPE: (608, 608)
SHIFT:
SHIFT_PIXELS: 32
MODEL:
ANCHOR_GENERATOR:
ANGLES: [[-90, 0, 90]]
ASPECT_RATIOS: [[1.0]]
NAME: DefaultAnchorGenerator
OFFSET: 0.0
SIZES: [[32, 64, 128, 256, 512]]
BACKBONE:
FREEZE_AT: 2
NAME: build_resnet_backbone
DARKNET:
DEPTH: 53
NORM: BN
OUT_FEATURES: ['res5']
RES5_DILATION: 1
WITH_CSP: True
DEVICE: cuda
FPN:
FUSE_TYPE: sum
IN_FEATURES: []
NORM:
OUT_CHANNELS: 256
KEYPOINT_ON: False
LOAD_PROPOSALS: False
MASK_ON: False
META_ARCHITECTURE: YOLOF
PANOPTIC_FPN:
COMBINE:
ENABLED: True
INSTANCES_CONFIDENCE_THRESH: 0.5
OVERLAP_THRESH: 0.5
STUFF_AREA_LIMIT: 4096
INSTANCE_LOSS_WEIGHT: 1.0
PIXEL_MEAN: [103.53, 116.28, 123.675]
PIXEL_STD: [1.0, 1.0, 1.0]
PROPOSAL_GENERATOR:
MIN_SIZE: 0
NAME: RPN
RESNETS:
DEFORM_MODULATED: False
DEFORM_NUM_GROUPS: 1
DEFORM_ON_PER_STAGE: [False, False, False, False]
DEPTH: 50
NORM: FrozenBN
NUM_GROUPS: 1
OUT_FEATURES: ['res5']
RES2_OUT_CHANNELS: 256
RES5_DILATION: 1
STEM_OUT_CHANNELS: 64
STRIDE_IN_1X1: True
WIDTH_PER_GROUP: 64
RETINANET:
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0)
FOCAL_LOSS_ALPHA: 0.25
FOCAL_LOSS_GAMMA: 2.0
IN_FEATURES: ['p3', 'p4', 'p5', 'p6', 'p7']
IOU_LABELS: [0, -1, 1]
IOU_THRESHOLDS: [0.4, 0.5]
NMS_THRESH_TEST: 0.5
NORM:
NUM_CLASSES: 80
NUM_CONVS: 4
PRIOR_PROB: 0.01
SCORE_THRESH_TEST: 0.05
SMOOTH_L1_LOSS_BETA: 0.1
TOPK_CANDIDATES_TEST: 1000
ROI_BOX_CASCADE_HEAD:
BBOX_REG_WEIGHTS: ((10.0, 10.0, 5.0, 5.0), (20.0, 20.0, 10.0, 10.0), (30.0, 30.0, 15.0, 15.0))
IOUS: (0.5, 0.6, 0.7)
ROI_BOX_HEAD:
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_LOSS_WEIGHT: 1.0
BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
CLS_AGNOSTIC_BBOX_REG: False
CONV_DIM: 256
FC_DIM: 1024
NAME:
NORM:
NUM_CONV: 0
NUM_FC: 0
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
SMOOTH_L1_BETA: 0.0
TRAIN_ON_PRED_BOXES: False
ROI_HEADS:
BATCH_SIZE_PER_IMAGE: 512
IN_FEATURES: ['res4']
IOU_LABELS: [0, 1]
IOU_THRESHOLDS: [0.5]
NAME: Res5ROIHeads
NMS_THRESH_TEST: 0.5
NUM_CLASSES: 80
POSITIVE_FRACTION: 0.25
PROPOSAL_APPEND_GT: True
SCORE_THRESH_TEST: 0.05
ROI_KEYPOINT_HEAD:
CONV_DIMS: (512, 512, 512, 512, 512, 512, 512, 512)
LOSS_WEIGHT: 1.0
MIN_KEYPOINTS_PER_IMAGE: 1
NAME: KRCNNConvDeconvUpsampleHead
NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: True
NUM_KEYPOINTS: 17
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
ROI_MASK_HEAD:
CLS_AGNOSTIC_MASK: False
CONV_DIM: 256
NAME: MaskRCNNConvUpsampleHead
NORM:
NUM_CONV: 0
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
RPN:
BATCH_SIZE_PER_IMAGE: 256
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_LOSS_WEIGHT: 1.0
BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0)
BOUNDARY_THRESH: -1
HEAD_NAME: StandardRPNHead
IN_FEATURES: ['res4']
IOU_LABELS: [0, -1, 1]
IOU_THRESHOLDS: [0.3, 0.7]
LOSS_WEIGHT: 1.0
NMS_THRESH: 0.7
POSITIVE_FRACTION: 0.5
POST_NMS_TOPK_TEST: 1000
POST_NMS_TOPK_TRAIN: 2000
PRE_NMS_TOPK_TEST: 6000
PRE_NMS_TOPK_TRAIN: 12000
SMOOTH_L1_BETA: 0.0
SEM_SEG_HEAD:
COMMON_STRIDE: 4
CONVS_DIM: 128
IGNORE_VALUE: 255
IN_FEATURES: ['p2', 'p3', 'p4', 'p5']
LOSS_WEIGHT: 1.0
NAME: SemSegFPNHead
NORM: GN
NUM_CLASSES: 54
WEIGHTS: detectron2://ImageNetPretrained/MSRA/R-50.pkl
YOLOF:
BOX_TRANSFORM:
ADD_CTR_CLAMP: True
BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0)
CTR_CLAMP: 32
DECODER:
ACTIVATION: ReLU
CLS_NUM_CONVS: 2
IN_CHANNELS: 512
NORM: BN
NUM_ANCHORS: 5
NUM_CLASSES: 80
PRIOR_PROB: 0.01
REG_NUM_CONVS: 4
DETECTIONS_PER_IMAGE: 100
ENCODER:
ACTIVATION: ReLU
BACKBONE_LEVEL: res5
BLOCK_DILATIONS: [2, 4, 6, 8]
BLOCK_MID_CHANNELS: 128
IN_CHANNELS: 2048
NORM: BN
NUM_CHANNELS: 512
NUM_RESIDUAL_BLOCKS: 4
LOSSES:
BBOX_REG_LOSS_TYPE: giou
FOCAL_LOSS_ALPHA: 0.25
FOCAL_LOSS_GAMMA: 2.0
MATCHER:
TOPK: 4
NEG_IGNORE_THRESHOLD: 0.7
NMS_THRESH_TEST: 0.6
POS_IGNORE_THRESHOLD: 0.15
SCORE_THRESH_TEST: 0.05
TOPK_CANDIDATES_TEST: 1000
OUTPUT_DIR: /hdd2/wh/cw/train/yolof/R_50_C5_1x/
SEED: -1
SOLVER:
AMP:
ENABLED: False
BACKBONE_MULTIPLIER: 0.334
BASE_LR: 0.003
BIAS_LR_FACTOR: 1.0
CHECKPOINT_PERIOD: 2500
CLIP_GRADIENTS:
CLIP_TYPE: value
CLIP_VALUE: 1.0
ENABLED: False
NORM_TYPE: 2.0
GAMMA: 0.1
IMS_PER_BATCH: 16
LR_SCHEDULER_NAME: WarmupMultiStepLR
MAX_ITER: 90000
MOMENTUM: 0.9
NESTEROV: False
REFERENCE_WORLD_SIZE: 0
STEPS: (60000, 80000)
WARMUP_FACTOR: 0.0002
WARMUP_ITERS: 5000
WARMUP_METHOD: linear
WEIGHT_DECAY: 0.0001
WEIGHT_DECAY_BIAS: 0.0001
WEIGHT_DECAY_NORM: 0.0
TEST:
AUG:
ENABLED: False
FLIP: True
MAX_SIZE: 4000
MIN_SIZES: (400, 500, 600, 700, 800, 900, 1000, 1100, 1200)
DETECTIONS_PER_IMAGE: 100
EVAL_PERIOD: 0
EXPECTED_RESULTS: []
KEYPOINT_OKS_SIGMAS: []
PRECISE_BN:
ENABLED: False
NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
[03/26 21:44:55] detectron2 INFO: Full config saved to /hdd2/wh/cw/train/yolof/R_50_C5_1x/config.yaml
[03/26 21:44:55] d2.utils.env INFO: Using a generated random seed 55931624
[03/26 21:44:56] d2.engine.defaults INFO: Model:
YOLOF(
(backbone): ResNet(
(stem): BasicStem(
(conv1): Conv2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
)
(res2): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv1): Conv2d(
64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
)
(res3): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv1): Conv2d(
256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
)
(res4): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv1): Conv2d(
512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(4): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(5): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
)
(res5): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv1): Conv2d(
1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
)
)
(encoder): DilatedEncoder(
(lateral_conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1))
(lateral_norm): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(fpn_conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_norm): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(dilated_encoder_blocks): Sequential(
(0): Bottleneck(
(conv1): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(conv2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(conv3): Sequential(
(0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
)
(1): Bottleneck(
(conv1): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(conv2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(4, 4), dilation=(4, 4))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(conv3): Sequential(
(0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
)
(2): Bottleneck(
(conv1): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(conv2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(conv3): Sequential(
(0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
)
(3): Bottleneck(
(conv1): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(conv2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(8, 8), dilation=(8, 8))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(conv3): Sequential(
(0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
)
)
)
(decoder): Decoder(
(cls_subnet): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
(bbox_subnet): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU(inplace=True)
(9): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(10): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): ReLU(inplace=True)
)
(cls_score): Conv2d(512, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(bbox_pred): Conv2d(512, 20, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(object_pred): Conv2d(512, 5, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(anchor_generator): DefaultAnchorGenerator(
(cell_anchors): BufferList()
)
(anchor_matcher): UniformMatcher()
)
[03/26 21:45:18] d2.data.datasets.coco INFO: Loading datasets/coco/annotations/instances_train2017.json takes 21.19 seconds.
[03/26 21:45:19] d2.data.datasets.coco INFO: Loaded 118287 images in COCO format from datasets/coco/annotations/instances_train2017.json
[03/26 21:45:31] d2.data.build INFO: Removed 1021 images with no usable annotations. 117266 images left.
[03/26 21:45:38] d2.data.build INFO: Distribution of instances among all 80 categories:
| category | #instances | category | #instances | category | #instances |
|:-------------:|:-------------|:------------:|:-------------|:-------------:|:-------------|
| person | 257253 | bicycle | 7056 | car | 43533 |
| motorcycle | 8654 | airplane | 5129 | bus | 6061 |
| train | 4570 | truck | 9970 | boat | 10576 |
| traffic light | 12842 | fire hydrant | 1865 | stop sign | 1983 |
| parking meter | 1283 | bench | 9820 | bird | 10542 |
| cat | 4766 | dog | 5500 | horse | 6567 |
| sheep | 9223 | cow | 8014 | elephant | 5484 |
| bear | 1294 | zebra | 5269 | giraffe | 5128 |
| backpack | 8714 | umbrella | 11265 | handbag | 12342 |
| tie | 6448 | suitcase | 6112 | frisbee | 2681 |
| skis | 6623 | snowboard | 2681 | sports ball | 6299 |
| kite | 8802 | baseball bat | 3273 | baseball gl.. | 3747 |
| skateboard | 5536 | surfboard | 6095 | tennis racket | 4807 |
| bottle | 24070 | wine glass | 7839 | cup | 20574 |
| fork | 5474 | knife | 7760 | spoon | 6159 |
| bowl | 14323 | banana | 9195 | apple | 5776 |
| sandwich | 4356 | orange | 6302 | broccoli | 7261 |
| carrot | 7758 | hot dog | 2884 | pizza | 5807 |
| donut | 7005 | cake | 6296 | chair | 38073 |
| couch | 5779 | potted plant | 8631 | bed | 4192 |
| dining table | 15695 | toilet | 4149 | tv | 5803 |
| laptop | 4960 | mouse | 2261 | remote | 5700 |
| keyboard | 2854 | cell phone | 6422 | microwave | 1672 |
| oven | 3334 | toaster | 225 | sink | 5609 |
| refrigerator | 2634 | book | 24077 | clock | 6320 |
| vase | 6577 | scissors | 1464 | teddy bear | 4729 |
| hair drier | 198 | toothbrush | 1945 | | |
| total | 849949 | | | | |
[03/26 21:45:38] d2.data.build INFO: Using training sampler TrainingSampler
[03/26 21:45:40] d2.data.common INFO: Serializing 117266 elements to byte tensors and concatenating them all ...
[03/26 21:45:46] d2.data.common INFO: Serialized dataset takes 451.21 MiB
[03/26 21:45:54] fvcore.common.checkpoint INFO: Loading checkpoint from detectron2://ImageNetPretrained/MSRA/R-50.pkl
[03/26 21:45:54] d2.checkpoint.c2_model_loading INFO: Renaming Caffe2 weights ......
[03/26 21:45:54] d2.checkpoint.c2_model_loading INFO: Following weights matched with submodule backbone:
| Names in Model | Names in Checkpoint | Shapes |
|:------------------|:-------------------------|:------------------------------------------------|
| res2.0.conv1.* | res2_0_branch2a_{bn_*,w} | (64,) (64,) (64,) (64,) (64,64,1,1) |
| res2.0.conv2.* | res2_0_branch2b_{bn_*,w} | (64,) (64,) (64,) (64,) (64,64,3,3) |
| res2.0.conv3.* | res2_0_branch2c_{bn_*,w} | (256,) (256,) (256,) (256,) (256,64,1,1) |
| res2.0.shortcut.* | res2_0_branch1_{bn_*,w} | (256,) (256,) (256,) (256,) (256,64,1,1) |
| res2.1.conv1.* | res2_1_branch2a_{bn_*,w} | (64,) (64,) (64,) (64,) (64,256,1,1) |
| res2.1.conv2.* | res2_1_branch2b_{bn_*,w} | (64,) (64,) (64,) (64,) (64,64,3,3) |
| res2.1.conv3.* | res2_1_branch2c_{bn_*,w} | (256,) (256,) (256,) (256,) (256,64,1,1) |
| res2.2.conv1.* | res2_2_branch2a_{bn_*,w} | (64,) (64,) (64,) (64,) (64,256,1,1) |
| res2.2.conv2.* | res2_2_branch2b_{bn_*,w} | (64,) (64,) (64,) (64,) (64,64,3,3) |
| res2.2.conv3.* | res2_2_branch2c_{bn_*,w} | (256,) (256,) (256,) (256,) (256,64,1,1) |
| res3.0.conv1.* | res3_0_branch2a_{bn_*,w} | (128,) (128,) (128,) (128,) (128,256,1,1) |
| res3.0.conv2.* | res3_0_branch2b_{bn_*,w} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res3.0.conv3.* | res3_0_branch2c_{bn_*,w} | (512,) (512,) (512,) (512,) (512,128,1,1) |
| res3.0.shortcut.* | res3_0_branch1_{bn_*,w} | (512,) (512,) (512,) (512,) (512,256,1,1) |
| res3.1.conv1.* | res3_1_branch2a_{bn_*,w} | (128,) (128,) (128,) (128,) (128,512,1,1) |
| res3.1.conv2.* | res3_1_branch2b_{bn_*,w} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res3.1.conv3.* | res3_1_branch2c_{bn_*,w} | (512,) (512,) (512,) (512,) (512,128,1,1) |
| res3.2.conv1.* | res3_2_branch2a_{bn_*,w} | (128,) (128,) (128,) (128,) (128,512,1,1) |
| res3.2.conv2.* | res3_2_branch2b_{bn_*,w} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res3.2.conv3.* | res3_2_branch2c_{bn_*,w} | (512,) (512,) (512,) (512,) (512,128,1,1) |
| res3.3.conv1.* | res3_3_branch2a_{bn_*,w} | (128,) (128,) (128,) (128,) (128,512,1,1) |
| res3.3.conv2.* | res3_3_branch2b_{bn_*,w} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res3.3.conv3.* | res3_3_branch2c_{bn_*,w} | (512,) (512,) (512,) (512,) (512,128,1,1) |
| res4.0.conv1.* | res4_0_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,512,1,1) |
| res4.0.conv2.* | res4_0_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.0.conv3.* | res4_0_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res4.0.shortcut.* | res4_0_branch1_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,512,1,1) |
| res4.1.conv1.* | res4_1_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,1024,1,1) |
| res4.1.conv2.* | res4_1_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.1.conv3.* | res4_1_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res4.2.conv1.* | res4_2_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,1024,1,1) |
| res4.2.conv2.* | res4_2_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.2.conv3.* | res4_2_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res4.3.conv1.* | res4_3_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,1024,1,1) |
| res4.3.conv2.* | res4_3_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.3.conv3.* | res4_3_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res4.4.conv1.* | res4_4_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,1024,1,1) |
| res4.4.conv2.* | res4_4_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.4.conv3.* | res4_4_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res4.5.conv1.* | res4_5_branch2a_{bn_*,w} | (256,) (256,) (256,) (256,) (256,1024,1,1) |
| res4.5.conv2.* | res4_5_branch2b_{bn_*,w} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.5.conv3.* | res4_5_branch2c_{bn_*,w} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res5.0.conv1.* | res5_0_branch2a_{bn_*,w} | (512,) (512,) (512,) (512,) (512,1024,1,1) |
| res5.0.conv2.* | res5_0_branch2b_{bn_*,w} | (512,) (512,) (512,) (512,) (512,512,3,3) |
| res5.0.conv3.* | res5_0_branch2c_{bn_*,w} | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1) |
| res5.0.shortcut.* | res5_0_branch1_{bn_*,w} | (2048,) (2048,) (2048,) (2048,) (2048,1024,1,1) |
| res5.1.conv1.* | res5_1_branch2a_{bn_*,w} | (512,) (512,) (512,) (512,) (512,2048,1,1) |
| res5.1.conv2.* | res5_1_branch2b_{bn_*,w} | (512,) (512,) (512,) (512,) (512,512,3,3) |
| res5.1.conv3.* | res5_1_branch2c_{bn_*,w} | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1) |
| res5.2.conv1.* | res5_2_branch2a_{bn_*,w} | (512,) (512,) (512,) (512,) (512,2048,1,1) |
| res5.2.conv2.* | res5_2_branch2b_{bn_*,w} | (512,) (512,) (512,) (512,) (512,512,3,3) |
| res5.2.conv3.* | res5_2_branch2c_{bn_*,w} | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1) |
| stem.conv1.norm.* | res_conv1_bn_* | (64,) (64,) (64,) (64,) |
| stem.conv1.weight | conv1_w | (64, 3, 7, 7) |
[03/26 21:45:54] fvcore.common.checkpoint INFO: Some model parameters or buffers are not found in the checkpoint:
anchor_generator.cell_anchors.0
decoder.bbox_pred.{bias, weight}
decoder.bbox_subnet.0.{bias, weight}
decoder.bbox_subnet.1.{bias, running_mean, running_var, weight}
decoder.bbox_subnet.10.{bias, running_mean, running_var, weight}
decoder.bbox_subnet.3.{bias, weight}
decoder.bbox_subnet.4.{bias, running_mean, running_var, weight}
decoder.bbox_subnet.6.{bias, weight}
decoder.bbox_subnet.7.{bias, running_mean, running_var, weight}
decoder.bbox_subnet.9.{bias, weight}
decoder.cls_score.{bias, weight}
decoder.cls_subnet.0.{bias, weight}
decoder.cls_subnet.1.{bias, running_mean, running_var, weight}
decoder.cls_subnet.3.{bias, weight}
decoder.cls_subnet.4.{bias, running_mean, running_var, weight}
decoder.object_pred.{bias, weight}
encoder.dilated_encoder_blocks.0.conv1.0.{bias, weight}
encoder.dilated_encoder_blocks.0.conv1.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.0.conv2.0.{bias, weight}
encoder.dilated_encoder_blocks.0.conv2.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.0.conv3.0.{bias, weight}
encoder.dilated_encoder_blocks.0.conv3.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.1.conv1.0.{bias, weight}
encoder.dilated_encoder_blocks.1.conv1.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.1.conv2.0.{bias, weight}
encoder.dilated_encoder_blocks.1.conv2.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.1.conv3.0.{bias, weight}
encoder.dilated_encoder_blocks.1.conv3.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.2.conv1.0.{bias, weight}
encoder.dilated_encoder_blocks.2.conv1.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.2.conv2.0.{bias, weight}
encoder.dilated_encoder_blocks.2.conv2.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.2.conv3.0.{bias, weight}
encoder.dilated_encoder_blocks.2.conv3.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.3.conv1.0.{bias, weight}
encoder.dilated_encoder_blocks.3.conv1.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.3.conv2.0.{bias, weight}
encoder.dilated_encoder_blocks.3.conv2.1.{bias, running_mean, running_var, weight}
encoder.dilated_encoder_blocks.3.conv3.0.{bias, weight}
encoder.dilated_encoder_blocks.3.conv3.1.{bias, running_mean, running_var, weight}
encoder.fpn_conv.{bias, weight}
encoder.fpn_norm.{bias, running_mean, running_var, weight}
encoder.lateral_conv.{bias, weight}
encoder.lateral_norm.{bias, running_mean, running_var, weight}
[03/26 21:45:54] fvcore.common.checkpoint INFO: The checkpoint state_dict contains keys that are not used by the model:
fc1000.{bias, weight}
stem.conv1.bias
[03/26 21:45:54] d2.engine.train_loop INFO: Starting training from iteration 0

Wei-i (Author) commented Mar 27, 2021

[03/26 21:45:54 d2.engine.train_loop]: Starting training from iteration 0
/opt/conda/conda-bld/pytorch_1595629408163/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [120,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1595629408163/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [61,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1595629408163/work/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f4c7325377d in /home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xb5d (0x7f4c734a3d9d in /home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f4c7323fb1d in /home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: + 0x53f0ea (0x7f4ca5ba30ea in /home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x1809da (0x5589712629da in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #5: + 0xfc039 (0x5589711de039 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #6: + 0xfa678 (0x5589711dc678 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #7: + 0xfa938 (0x5589711dc938 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #8: + 0xfa938 (0x5589711dc938 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #9: + 0xfa348 (0x5589711dc348 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #10: + 0xfadd8 (0x5589711dcdd8 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #11: + 0xfadec (0x5589711dcdec in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #12: + 0xfadec (0x5589711dcdec in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #13: + 0xfadec (0x5589711dcdec in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #14: + 0xfadec (0x5589711dcdec in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #15: + 0xfadec (0x5589711dcdec in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #16: + 0xfadec (0x5589711dcdec in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #17: + 0xfadec (0x5589711dcdec in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #18: + 0xfadec (0x5589711dcdec in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #19: + 0xfb238 (0x5589711dd238 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #20: + 0xfb2db (0x5589711dd2db in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #21: + 0x1dc923 (0x5589712be923 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x27b8 (0x5589712a6ea8 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #23: _PyFunction_FastCallKeywords + 0x187 (0x55897121a767 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #24: + 0x17f335 (0x558971261335 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #25: _PyEval_EvalFrameDefault + 0x611 (0x5589712a4d01 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #26: _PyFunction_FastCallKeywords + 0x187 (0x55897121a767 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #27: _PyEval_EvalFrameDefault + 0x3f5 (0x5589712a4ae5 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #28: _PyEval_EvalCodeWithName + 0x252 (0x5589711fadb2 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #29: _PyFunction_FastCallKeywords + 0x583 (0x55897121ab63 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #30: + 0x17f335 (0x558971261335 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x13fe (0x5589712a5aee in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #32: _PyEval_EvalCodeWithName + 0x252 (0x5589711fadb2 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #33: PyEval_EvalCode + 0x23 (0x5589711fc1e3 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #34: + 0x2271d2 (0x5589713091d2 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #35: PyRun_StringFlags + 0x7a (0x55897131417a in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #36: PyRun_SimpleStringFlags + 0x3c (0x5589713141dc in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #37: + 0x2322d9 (0x5589713142d9 in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #38: _Py_UnixMain + 0x3c (0x55897131467c in /home/cw/miniconda3/envs/py_dt2/bin/python)
frame #39: __libc_start_main + 0xf0 (0x7f4cbbde3830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #40: + 0x1d7101 (0x5589712b9101 in /home/cw/miniconda3/envs/py_dt2/bin/python)

Traceback (most recent call last):
File "./tools/train_net.py", line 234, in
args=(args,),
File "/home/cw/detectron2/detectron2/engine/launch.py", line 59, in launch
daemon=False,
File "/home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/home/cw/detectron2/detectron2/engine/launch.py", line 94, in _distributed_worker
main_func(*args)
File "/home/cw/YOLOF/tools/train_net.py", line 221, in main
return trainer.train()
File "/home/cw/detectron2/detectron2/engine/defaults.py", line 431, in train
super().train(self.start_iter, self.max_iter)
File "/home/cw/detectron2/detectron2/engine/train_loop.py", line 140, in train
self.run_step()
File "/home/cw/detectron2/detectron2/engine/defaults.py", line 441, in run_step
self._trainer.run_step()
File "/home/cw/detectron2/detectron2/engine/train_loop.py", line 234, in run_step
loss_dict = self.model(data)
File "/home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 511, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/cw/miniconda3/envs/py_dt2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cw/YOLOF/yolof/modeling/yolof.py", line 295, in forward
pred_logits, pred_anchor_deltas)
File "/home/cw/YOLOF/yolof/modeling/yolof.py", line 394, in losses
pred_class_logits[valid_idxs],
RuntimeError: copy_if failed to synchronize: device-side assert triggered

Wei-i (Author) commented Mar 27, 2021

Command:
python ./tools/train_net.py --num-gpus 2 --config-file ./configs/yolof_R_50_C5_1x.yaml OUTPUT_DIR /hdd2/wh/cw/train/yolof/R_50_C5_1x/

YAML config:

MODEL:
  META_ARCHITECTURE: "YOLOF"
  BACKBONE:
    NAME: "build_resnet_backbone"
  RESNETS:
    OUT_FEATURES: ["res5"]
DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)
DATALOADER:
  NUM_WORKERS: 8
SOLVER:
  #IMS_PER_BATCH: 64
  IMS_PER_BATCH: 16
  #BASE_LR: 0.12
  BASE_LR: 0.03
  WARMUP_FACTOR: 0.0002 # 0.00066667
  WARMUP_ITERS: 5000 # 1500
  #STEPS: (15000, 20000)
  STEPS: (60000, 80000)
  #MAX_ITER: 22500
  MAX_ITER: 90000
  CHECKPOINT_PERIOD: 2500
INPUT:
  MIN_SIZE_TRAIN: (800,)

OUTPUT_DIR: '/hdd2/wh/cw/train/yolof/R_50_C5_1x'

Wei-i (Author) commented Mar 27, 2021

Hi author, my rough guess is that this is a data-related error, perhaps an out-of-bounds array index somewhere? But my dataset is COCO 2017, so this error shouldn't happen; I haven't changed anything else, and I also re-upgraded my original detectron2.
I noticed that some input images cause no error, and the network can still iterate a few times and print the loss.

chensnathan (Owner) commented:

From the error, an index goes out of bounds during indexing. It's strange, though: I've run this many times and never hit this problem, and a quick look at your log file doesn't show anything obviously wrong either, so it doesn't make much sense... Does this error show up every single time you run it?

Wei-i (Author) commented Mar 27, 2021

Yes, it goes out of bounds on every single run... I did some debugging at the failing spot in yolof.py, e.g. printing pred_class_logits[valid_idxs].size().

Wei-i (Author) commented Mar 27, 2021

print("gt_classes >= 0", gt_classes[gt_classes >= 0].size())

Wei-i (Author) commented Mar 27, 2021

1.

gt_class.size() torch.Size([38000])
gt_classes >= 0 torch.Size([37818])
valid_idxs torch.Size([38000])
pred_class_logits.size() torch.Size([38000, 80])
pred_class_logits[valid_idxs] torch.Size([37818, 80])

2.

gt_class.size() torch.Size([42000])
gt_classes >= 0 torch.Size([41866])
valid_idxs torch.Size([42000])
pred_class_logits.size() torch.Size([42000, 80])
pred_class_logits[valid_idxs] torch.Size([41866, 80])

3.

gt_class.size() torch.Size([38000])
(no output was produced by print("gt_classes >= 0", gt_classes[gt_classes >= 0].size()))

/opt/conda/conda-bld/pytorch_1595629408163/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [9,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1595629408163/work/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [11,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
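
For reference, a small hypothetical diagnostic along these lines can localize a bad label before the CUDA assert fires (the names gt_classes / pred_class_logits and the 80-class setting are taken from the prints above; this is not code from the repository, and in a real run the checks would go into losses() right before the failing line):

import torch

num_classes = 80  # NUM_CLASSES in the config above

# Stand-in tensors with the shapes printed above; replace with the real ones.
gt_classes = torch.randint(-1, num_classes + 1, (38000,))
pred_class_logits = torch.randn(38000, num_classes)

# Device-side asserts from IndexKernel.cu usually mean some index lies outside
# the tensor it indexes, so report any label outside the expected range
# [-1, num_classes] (ignore marker, foreground classes, background).
bad = (gt_classes < -1) | (gt_classes > num_classes)
print("labels out of range:", gt_classes[bad].unique() if bad.any() else "none")

valid_idxs = gt_classes >= 0
print("pred_class_logits[valid_idxs]:", pred_class_logits[valid_idxs].shape)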

chensnathan (Owner) commented:

This part shouldn't be going wrong. I suggest running

CUDA_LAUNCH_BLOCKING=1 python ./tools/train_net.py --num-gpus 2 --config-file ./configs/yolof_R_50_C5_1x.yaml OUTPUT_DIR /hdd2/wh/cw/train/yolof/R_50_C5_1x/

to see exactly where the error occurs.
Alternatively, set up the environment on another machine and try again; in principle it should run out of the box.

Wei-i (Author) commented Mar 27, 2021

Still no luck. I may just have to try another machine; I'm not sure whether my CUDA 9.0 has anything to do with it.

Wei-i (Author) commented Mar 27, 2021

Hi author, I switched to a machine with CUDA 10.1 and repeated my previous steps (basically a direct install; the dataset was also copied straight over from the original server).
Now there is no problem at all... strange. For the moment I can only blame my previous CUDA 9.0 / cudatoolkit 9.2 for not working well with your code?
(AdelaiDet still runs fine on that machine; it hadn't been updated for over half a year, I updated it today, briefly modified train.py, and had no problems, haha.)

Wei-i (Author) commented Mar 27, 2021

One more small thing you might consider changing: the mish-cuda you recommend installing doesn't seem to build directly.
After git clone, you should move mish-cuda/external/CUDAApplyUtils.cuh to csrc/ before running python setup.py build install.

link-issue
