When I train on the VisDrone dataset, I get a CUDA out-of-memory error. Is the image size too large? I have already changed the input size to [511, 511]. Does anyone know how to fix this? The full log is below. Thanks!
loading all datasets...
using 4 threads
loading from cache file: cache/visdrone_train.pkl
loading annotations into memory...
Done (t=1.43s)
creating index...
index created!
loading from cache file: cache/visdrone_train.pkl
loading annotations into memory...
Done (t=1.59s)
creating index...
index created!
loading from cache file: cache/visdrone_train.pkl
loading annotations into memory...
Done (t=1.30s)
creating index...
index created!
loading from cache file: cache/visdrone_train.pkl
loading annotations into memory...
Done (t=1.47s)
creating index...
index created!
loading from cache file: cache/visdrone_val.pkl
loading annotations into memory...
Done (t=0.10s)
creating index...
index created!
system config...
{'batch_size': 8,
'cache_dir': 'cache',
'chunk_sizes': [2, 2, 2, 2],
'config_dir': 'config',
'data_dir': '/home/by/data',
'data_rng': <mtrand.RandomState object at 0x7f7f342904c8>,
'dataset': 'Visdrone',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 480000,
'nnet_rng': <mtrand.RandomState object at 0x7f7f34290510>,
'opt_algo': 'adam',
'prefetch_size': 6,
'pretrain': None,
'result_dir': 'results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CenterNet-104',
'stepsize': 450000,
'test_split': 'VisDrone2019-DET-test-dev',
'train_split': 'VisDrone2019-DET-train',
'val_iter': 500,
'val_split': 'VisDrone2019-DET-val',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 10,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [511, 511],
'kp_categories': 1,
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 70,
'weight_exp': 8}
len of db: 6471
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
module_file: models.CenterNet-104
total parameters: 210062960
setting learning rate to: 0.00025
training start...
0%| | 0/480000 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 203, in
train(training_dbs, validation_db, args.start_iter)
File "train.py", line 138, in train
training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
File "/home/by/s/pytorch/DET/CenterNet/nnet/py_factory.py", line 82, in train
loss_kp = self.network(xs, ys)
File "/home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/by/s/pytorch/DET/CenterNet/models/py_utils/data_parallel.py", line 69, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/by/s/pytorch/DET/CenterNet/models/py_utils/data_parallel.py", line 74, in replicate
return replicate(module, device_ids)
File "/home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast.apply(devices, *params)
File "/home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 19, in forward
outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
File "/home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/cuda/comm.py", line 40, in broadcast_coalesced
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: CUDA error: out of memory (allocate at /opt/conda/conda-bld/pytorch_1532581333611/work/aten/src/THC/THCCachingAllocator.cpp:510)
frame #0: THCStorage_resize + 0x123 (0x7f7f48085783 in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #1: THCTensor_resizeNd + 0x30f (0x7f7f4809341f in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: THCudaTensor_newWithStorage + 0xfa (0x7f7f480998fa in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: at::CUDAFloatType::th_tensor(at::ArrayRef) const + 0xa5 (0x7f7f47fb99d5 in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: at::native::tensor(at::Type const&, at::ArrayRef) + 0x3a (0x7f7f640d17da in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #5: at::Type::tensor(at::ArrayRef) const + 0x9 (0x7f7f642bfb69 in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #6: torch::autograd::VariableType::tensor(at::ArrayRef) const + 0x44 (0x7f7f65f38ea4 in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #7: torch::cuda::broadcast(at::Tensor const&, at::ArrayRef) + 0x194 (0x7f7f663eaf64 in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: torch::cuda::broadcast_coalesced(at::ArrayRef<at::Tensor>, at::ArrayRef, unsigned long) + 0xa10 (0x7f7f663ec200 in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #9: + 0xc4256b (0x7f7f663f056b in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #10: + 0x38a52b (0x7f7f65b3852b in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #21: THPFunction_apply(_object*, _object*) + 0x38f (0x7f7f65f16bcf in /home/by/APP/anaconda2/envs/CenterNet/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #63: __libc_start_main + 0xf0 (0x7f7f79b2b830 in /lib/x86_64-linux-gnu/libc.so.6)
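
For reference, the traceback shows the error is raised inside `Broadcast.apply` during `DataParallel.replicate`, i.e. while the model's weights are being copied onto each of the four GPUs implied by `chunk_sizes: [2, 2, 2, 2]`, before the first image is even processed. Reducing `input_size` therefore cannot help at this point; what does not fit is the ~210M-parameter model itself plus whatever is already resident on the cards. A minimal sketch of the arithmetic, using only the numbers printed in the log above (the 4x factor for gradients plus Adam state on the master GPU is an assumption, not something the log reports):

```python
# Sketch: rough per-GPU memory needed just to hold the CenterNet-104 weights,
# based on the numbers printed in the log above (not part of the repo's code).

num_params  = 210062960        # "total parameters" from the log
bytes_fp32  = 4                # float32 weights
batch_size  = 8                # system config
chunk_sizes = [2, 2, 2, 2]     # per-GPU share of the batch -> 4 GPUs expected

# The batch must split exactly across the per-GPU chunks.
assert batch_size == sum(chunk_sizes)

weights_gb = num_params * bytes_fp32 / 1024**3
print("weights per replica : ~%.2f GB" % weights_gb)   # ~0.78 GB on every GPU
print("images per GPU      : %d" % chunk_sizes[0])

# Assumption: on the master GPU, gradients plus Adam's two moment buffers
# roughly quadruple the weight footprint before any activations are counted.
print("master GPU (approx) : ~%.2f GB" % (4 * weights_gb))
```

If that does not fit alongside existing allocations, the usual first steps are to halve `batch_size` and `chunk_sizes` (e.g. `batch_size: 4` with `chunk_sizes: [1, 1, 1, 1]`), or to expose fewer GPUs via `CUDA_VISIBLE_DEVICES` and shrink `chunk_sizes` to match. In the CenterNet repo these values normally live in a JSON file under `config/` (e.g. `config/CenterNet-104.json`); the exact file used for a VisDrone run is an assumption here.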