root@5285e769b96f:/home/imaginaire# python -m torch.distributed.launch --nproc_per_node=8 train.py --config configs/projects/vid2vid/kitti/ampO1.yaml --logdir /home/logs
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
LMDB file at datasets/kitti/lmdb/train/images opened.
LMDB file at datasets/kitti/lmdb/train/seg_maps opened.
LMDB file at datasets/kitti/lmdb/val/images opened.
LMDB file at datasets/kitti/lmdb/val/seg_maps opened.
(The four "LMDB file ... opened." lines repeat once per rank; repeats omitted.)
Make folder /home/logs
cudnn benchmark: True
cudnn deterministic: False
Num datasets: 1
Num sequences: 19
Max sequence length: 1065
Epoch length: 19
Num datasets: 1
Num sequences: 8
Max sequence length: 83
Epoch length: 8
Train dataset length: 19
Val dataset length: 8
Concatenate images: ext: png num_channels: 3 interpolator: BILINEAR normalize: True pre_aug_ops: None post_aug_ops: None use_dont_care: False computed_on_the_fly: False for input.
Concatenate seg_maps: ext: png num_channels: 35 interpolator: NEAREST normalize: False pre_aug_ops: None post_aug_ops: None use_dont_care: False computed_on_the_fly: False for input.
Num. of channels in the input label: 35
Concatenate images: ext: png num_channels: 3 interpolator: BILINEAR normalize: True pre_aug_ops: None post_aug_ops: None use_dont_care: False computed_on_the_fly: False for input.
Num. of channels in the input image: 3
Concatenate images: ext: png num_channels: 3 interpolator: BILINEAR normalize: True pre_aug_ops: None post_aug_ops: None use_dont_care: False computed_on_the_fly: False for input.
Num. of channels in the input image: 3
(The "Concatenate ... / Num. of channels ..." block repeats once per rank; repeats omitted.)
Initialize net_G and net_D weights using type: xavier gain: 0.02
net_G parameter count: 121,904,182
net_D parameter count: 1,422,658
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.
Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Setup trainer.
GAN mode: hinge
Perceptual loss:
	Mode: vgg19
Perceptual loss is evaluated in the fp16 mode.
FlowNet2 is running in fp16 mode.
(The FlowNet2 line repeats once per rank; repeats omitted.)
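The O1 banner above ("Insert automatic casts around Pytorch functions and Tensor methods", `patch_torch_functions : True`) means Apex monkey-patches whitelisted ops so their inputs are cast to fp16 before the call. A toy stdlib-only illustration of that cast-wrapping idea, using plain Python floats in place of tensors (none of these names are Apex APIs; this is a sketch of the concept, not Apex's implementation):

```python
import functools
import struct


def to_fp16(x):
    """Round-trip a Python float through IEEE-754 half precision ('e'
    struct format), mimicking the precision loss of an fp16 cast."""
    return struct.unpack('e', struct.pack('e', x))[0]


def cast_inputs(fn, cast):
    """Return fn wrapped so its float arguments are cast first -- the
    same idea as patching a whitelisted torch function at opt_level O1."""
    @functools.wraps(fn)
    def wrapped(*args):
        return fn(*[cast(a) if isinstance(a, float) else a for a in args])
    return wrapped


# A stand-in "op" patched to run in reduced precision.
mul_fp16 = cast_inputs(lambda a, b: a * b, to_fp16)
```

Values exactly representable in half precision (like 2.0 and 3.0) survive the cast unchanged, while 0.1 gets rounded, which is exactly why dynamic loss scaling (seen later in this log) is needed alongside the casts.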
Loss GAN Weight 1.0
Loss FeatureMatching Weight 10.0
Loss Perceptual Weight 10.0
Loss Flow Weight 10.0
Loss Flow_L1 Weight 10.0
Loss Flow_Warp Weight 10.0
Loss Flow_Mask Weight 10.0
Load from: /home/logs/epoch_00025_iteration_000000075_checkpoint.pt
Done with loading the checkpoint.
Epoch 25 ...
Epoch length: 19
------- Updating sequence length to 8 -------
Gradient overflow. Skipping step, loss scaler 1 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 1 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 1 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 1 reducing loss scale to 4096.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2048.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 512.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow. Skipping step, loss scaler 1 reducing loss scale to 2048.0
(Each overflow line appears eight times in the raw log, once per rank; repeats omitted.)
Epoch: 26, total time: 75.287018.
Epoch 26 ...
Epoch: 27, total time: 44.580074.
Epoch 27 ...
Epoch: 28, total time: 43.974533.
Epoch 28 ...
Epoch: 29, total time: 43.974683.
Epoch 29 ...
Epoch: 30, total time: 43.891855.
Save output images to /home/logs/images/epoch_00030_iteration_000000090.jpg
Save checkpoint to /home/logs/epoch_00030_iteration_000000090_checkpoint.pt
Computing FID.
Get FID mean and cov and save to /home/logs/regular_fid/epoch_00030_iteration_000000090.npy
Extract mean and covariance.
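The "Gradient overflow. Skipping step, loss scaler N reducing loss scale to ..." messages above come from Apex's dynamic loss scaling: when an inf/NaN gradient is detected, the optimizer step is skipped and the scale is halved (here 65536 down through 32768 ... 256); the two scalers presumably belong to the two optimizers, net_G's and net_D's. A toy stdlib-only sketch of that logic (not Apex's actual class; the growth interval below is an illustrative choice, not a claim about Apex's default):

```python
class DynamicLossScaler:
    """Minimal dynamic loss scaler: halve the scale and skip the step
    on overflow; cautiously double it after a run of good steps."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, scale_window=2000):
        self.scale = init_scale          # current loss-scale multiplier
        self.factor = factor             # shrink/grow factor
        self.scale_window = scale_window # good steps before growing
        self._good_steps = 0

    def update(self, found_overflow):
        """Call once per iteration; returns True if the step was taken."""
        if found_overflow:
            # Skip this optimizer step and reduce the loss scale.
            self.scale /= self.factor
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps % self.scale_window == 0:
            self.scale *= self.factor
        return True
```

Eight consecutive overflows take the scale from 65536 down to 256, matching the cascade logged by loss scaler 0 above.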
Number of videos used for evaluation: 8
Number of frames per video used for evaluation: 10
Load FID mean and cov from /home/logs/regular_fid/real_mean_cov.npz
Traceback (most recent call last):
  File "train.py", line 93, in <module>
    main()
  File "train.py", line 87, in main
    trainer.end_of_epoch(data, current_epoch, current_iteration)
  File "/home/imaginaire/imaginaire/trainers/base.py", line 402, in end_of_epoch
    self.write_metrics()
  File "/home/imaginaire/imaginaire/trainers/vid2vid.py", line 699, in write_metrics
    regular_fid, average_fid = self._compute_fid()
  File "/home/imaginaire/imaginaire/trainers/vid2vid.py", line 745, in _compute_fid
    is_video=True, few_shot_video=few_shot)
  File "/home/imaginaire/imaginaire/evaluation/fid.py", line 53, in compute_fid
    is_video, few_shot_video)
  File "/home/imaginaire/imaginaire/evaluation/fid.py", line 133, in load_or_compute_stats
    is_video, few_shot_video)
  File "/home/imaginaire/imaginaire/evaluation/fid.py", line 165, in get_inception_mean_cov
    sample_size, preprocess, few_shot_video)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/imaginaire/imaginaire/evaluation/common.py", line 99, in get_video_activations
    inception = inception.to('cuda')
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 611, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 358, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 358, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 380, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 609, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: the launch timed out and was terminated
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: the launch timed out and was terminated
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7fbf5ebed99b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc10 (0x7fbf5ee30280 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fbf5ebd5dfd in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5414e2 (0x7fbfe2bf54e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x19aaae (0x55c633390aae in /opt/conda/bin/python)
frame #5: <unknown function> + 0xf244f (0x55c6332e844f in /opt/conda/bin/python)
frame #6: <unknown function> + 0xf244f (0x55c6332e844f in /opt/conda/bin/python)
frame #7: <unknown function> + 0xf2828 (0x55c6332e8828 in /opt/conda/bin/python)
frame #8: <unknown function> + 0x19aa90 (0x55c633390a90 in /opt/conda/bin/python)
frame #9: <unknown function> + 0xf27f8 (0x55c6332e87f8 in /opt/conda/bin/python)
frame #10: <unknown function> + 0x19aa90 (0x55c633390a90 in /opt/conda/bin/python)
frame #11: <unknown function> + 0xf2247 (0x55c6332e8247 in /opt/conda/bin/python)
frame #12: <unknown function> + 0xf20d7 (0x55c6332e80d7 in /opt/conda/bin/python)
frame #13: <unknown function> + 0xf20ed (0x55c6332e80ed in /opt/conda/bin/python)
frame #14: PyDict_SetItem + 0x3da (0x55c63332ed7a in /opt/conda/bin/python)
frame #15: PyDict_SetItemString + 0x4f (0x55c633335c5f in /opt/conda/bin/python)
frame #16: PyImport_Cleanup + 0x99 (0x55c63339adc9 in /opt/conda/bin/python)
frame #17: Py_FinalizeEx + 0x61 (0x55c633405961 in /opt/conda/bin/python)
frame #18: Py_Main + 0x35e (0x55c63340fcae in /opt/conda/bin/python)
frame #19: main + 0xee (0x55c6332d9f2e in /opt/conda/bin/python)
frame #20: __libc_start_main + 0xe7 (0x7fc00c257b97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #21: <unknown function> + 0x1c327f (0x55c6333b927f in /opt/conda/bin/python)
(The same traceback and C++ frame dump are printed by the other crashing ranks; repeats omitted.)
Epoch 00030, Iteration 000000090, Regular FID 373.8494401389866
Computing FID.
Get FID mean and cov and save to /home/logs/average_fid/epoch_00030_iteration_000000090.npy
Extract mean and covariance.
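The FID stage above extracts the mean and covariance of Inception activations for real and generated videos and then evaluates the Fréchet distance between the two Gaussians. A minimal stdlib-only sketch of that final formula, simplified to diagonal covariances for readability (the actual `fid.py` computation uses full covariance matrices, which requires a matrix square root):

```python
import math


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet (FID) distance between Gaussians with diagonal covariances:
    |mu1 - mu2|^2 + Tr(S1 + S2 - 2 * (S1 S2)^(1/2)).
    For diagonal S the trace term reduces to an elementwise sum."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term
```

Identical statistics give a distance of zero; the large value logged here (Regular FID 373.85) simply reflects statistics computed from only 8 validation videos early in training.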
Number of videos used for evaluation: 8
Number of frames per video used for evaluation: 10
Traceback (most recent call last):
  File "train.py", line 93, in <module>
    main()
  File "train.py", line 87, in main
    trainer.end_of_epoch(data, current_epoch, current_iteration)
  File "/home/imaginaire/imaginaire/trainers/base.py", line 402, in end_of_epoch
    self.write_metrics()
  File "/home/imaginaire/imaginaire/trainers/vid2vid.py", line 699, in write_metrics
    regular_fid, average_fid = self._compute_fid()
  File "/home/imaginaire/imaginaire/trainers/vid2vid.py", line 745, in _compute_fid
    is_video=True, few_shot_video=few_shot)
  File "/home/imaginaire/imaginaire/evaluation/fid.py", line 53, in compute_fid
    is_video, few_shot_video)
  File "/home/imaginaire/imaginaire/evaluation/fid.py", line 133, in load_or_compute_stats
    is_video, few_shot_video)
  File "/home/imaginaire/imaginaire/evaluation/fid.py", line 165, in get_inception_mean_cov
    sample_size, preprocess, few_shot_video)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/imaginaire/imaginaire/evaluation/common.py", line 99, in get_video_activations
    inception = inception.to('cuda')
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 611, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 358, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 358, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 380, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 609, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: the launch timed out and was terminated
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: the launch timed out and was terminated
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7fc4a875899b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc10 (0x7fc4a899b280 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fc4a8740dfd in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5414e2 (0x7fc52c7604e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x19aaae (0x55da19891aae in /opt/conda/bin/python)
frame #5: <unknown function> + 0xf244f (0x55da197e944f in /opt/conda/bin/python)
frame #6: <unknown function> + 0xf244f (0x55da197e944f in /opt/conda/bin/python)
frame #7: <unknown function> + 0xf2828 (0x55da197e9828 in /opt/conda/bin/python)
frame #8: <unknown function> + 0x19aa90 (0x55da19891a90 in /opt/conda/bin/python)
frame #9: <unknown function> + 0xf27f8 (0x55da197e97f8 in /opt/conda/bin/python)
frame #10: <unknown function> + 0x19aa90 (0x55da19891a90 in /opt/conda/bin/python)
frame #11: <unknown function> + 0xf2247 (0x55da197e9247 in /opt/conda/bin/python)
frame #12: <unknown function> + 0xf20d7 (0x55da197e90d7 in /opt/conda/bin/python)
frame #13: <unknown function> + 0xf20ed (0x55da197e90ed in /opt/conda/bin/python)
frame #14: PyDict_SetItem + 0x3da (0x55da1982fd7a in /opt/conda/bin/python)
frame #15: PyDict_SetItemString + 0x4f (0x55da19836c5f in /opt/conda/bin/python)
frame #16: PyImport_Cleanup + 0x99 (0x55da1989bdc9 in /opt/conda/bin/python)
frame #17: Py_FinalizeEx + 0x61 (0x55da19906961 in /opt/conda/bin/python)
frame #18: Py_Main + 0x35e (0x55da19910cae in /opt/conda/bin/python)
frame #19: main + 0xee (0x55da197daf2e in /opt/conda/bin/python)
frame #20: __libc_start_main + 0xe7 (0x7fc555dc2b97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #21: <unknown function> + 0x1c327f (0x55da198ba27f in /opt/conda/bin/python)
Traceback (most recent call last):
  File "train.py",
line 93, in main() File "train.py", line 87, in main trainer.end_of_epoch(data, current_epoch, current_iteration) File "/home/imaginaire/imaginaire/trainers/base.py", line 402, in end_of_epoch self.write_metrics() File "/home/imaginaire/imaginaire/trainers/vid2vid.py", line 699, in write_metrics regular_fid, average_fid = self._compute_fid() File "/home/imaginaire/imaginaire/trainers/vid2vid.py", line 745, in _compute_fid is_video=True, few_shot_video=few_shot) File "/home/imaginaire/imaginaire/evaluation/fid.py", line 53, in compute_fid is_video, few_shot_video) File "/home/imaginaire/imaginaire/evaluation/fid.py", line 133, in load_or_compute_stats is_video, few_shot_video) File "/home/imaginaire/imaginaire/evaluation/fid.py", line 165, in get_inception_mean_cov sample_size, preprocess, few_shot_video) File "/opt/conda/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context return func(*args, **kwargs) File "/home/imaginaire/imaginaire/evaluation/common.py", line 99, in get_video_activations inception = inception.to('cuda') File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 611, in to return self._apply(convert) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 358, in _apply module._apply(fn) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 358, in _apply module._apply(fn) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 380, in _apply param_applied = fn(param) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 609, in convert return t.to(device, dtype if t.is_floating_point() else None, non_blocking) RuntimeError: CUDA error: the launch timed out and was terminated terminate called after throwing an instance of 'c10::Error' what(): CUDA error: the launch timed out and was terminated Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:687 (most recent call 
first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f2661e4c99b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so) frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc10 (0x7f266208f280 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so) frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f2661e34dfd in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so) frame #3: + 0x5414e2 (0x7f26e5e544e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so) frame #4: + 0x19aaae (0x55ee34244aae in /opt/conda/bin/python) frame #5: + 0xf244f (0x55ee3419c44f in /opt/conda/bin/python) frame #6: + 0xf244f (0x55ee3419c44f in /opt/conda/bin/python) frame #7: + 0xf2828 (0x55ee3419c828 in /opt/conda/bin/python) frame #8: + 0x19aa90 (0x55ee34244a90 in /opt/conda/bin/python) frame #9: + 0xf27f8 (0x55ee3419c7f8 in /opt/conda/bin/python) frame #10: + 0x19aa90 (0x55ee34244a90 in /opt/conda/bin/python) frame #11: + 0xf2247 (0x55ee3419c247 in /opt/conda/bin/python) frame #12: + 0xf20d7 (0x55ee3419c0d7 in /opt/conda/bin/python) frame #13: + 0xf20ed (0x55ee3419c0ed in /opt/conda/bin/python) frame #14: PyDict_SetItem + 0x3da (0x55ee341e2d7a in /opt/conda/bin/python) frame #15: PyDict_SetItemString + 0x4f (0x55ee341e9c5f in /opt/conda/bin/python) frame #16: PyImport_Cleanup + 0x99 (0x55ee3424edc9 in /opt/conda/bin/python) frame #17: Py_FinalizeEx + 0x61 (0x55ee342b9961 in /opt/conda/bin/python) frame #18: Py_Main + 0x35e (0x55ee342c3cae in /opt/conda/bin/python) frame #19: main + 0xee (0x55ee3418df2e in /opt/conda/bin/python) frame #20: __libc_start_main + 0xe7 (0x7f270f4b6b97 in /lib/x86_64-linux-gnu/libc.so.6) frame #21: + 0x1c327f (0x55ee3426d27f in /opt/conda/bin/python) Traceback (most recent call last): File "train.py", line 93, in main() File "train.py", line 87, in main trainer.end_of_epoch(data, current_epoch, current_iteration) File 
"/home/imaginaire/imaginaire/trainers/base.py", line 402, in end_of_epoch self.write_metrics() File "/home/imaginaire/imaginaire/trainers/vid2vid.py", line 699, in write_metrics regular_fid, average_fid = self._compute_fid() File "/home/imaginaire/imaginaire/trainers/vid2vid.py", line 745, in _compute_fid is_video=True, few_shot_video=few_shot) File "/home/imaginaire/imaginaire/evaluation/fid.py", line 53, in compute_fid is_video, few_shot_video) File "/home/imaginaire/imaginaire/evaluation/fid.py", line 133, in load_or_compute_stats is_video, few_shot_video) File "/home/imaginaire/imaginaire/evaluation/fid.py", line 165, in get_inception_mean_cov sample_size, preprocess, few_shot_video) File "/opt/conda/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context return func(*args, **kwargs) File "/home/imaginaire/imaginaire/evaluation/common.py", line 99, in get_video_activations inception = inception.to('cuda') File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 611, in to return self._apply(convert) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 358, in _apply module._apply(fn) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 358, in _apply module._apply(fn) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 380, in _apply param_applied = fn(param) File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 609, in convert return t.to(device, dtype if t.is_floating_point() else None, non_blocking) RuntimeError: CUDA error: the launch timed out and was terminated terminate called after throwing an instance of 'c10::Error' what(): CUDA error: the launch timed out and was terminated Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b 
(0x7fdc9bd0199b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so) frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc10 (0x7fdc9bf44280 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so) frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fdc9bce9dfd in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so) frame #3: + 0x5414e2 (0x7fdd1fd094e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so) frame #4: + 0x19aaae (0x55f9d45e0aae in /opt/conda/bin/python) frame #5: + 0xf244f (0x55f9d453844f in /opt/conda/bin/python) frame #6: + 0xf244f (0x55f9d453844f in /opt/conda/bin/python) frame #7: + 0xf2828 (0x55f9d4538828 in /opt/conda/bin/python) frame #8: + 0x19aa90 (0x55f9d45e0a90 in /opt/conda/bin/python) frame #9: + 0xf27f8 (0x55f9d45387f8 in /opt/conda/bin/python) frame #10: + 0x19aa90 (0x55f9d45e0a90 in /opt/conda/bin/python) frame #11: + 0xf2247 (0x55f9d4538247 in /opt/conda/bin/python) frame #12: + 0xf20d7 (0x55f9d45380d7 in /opt/conda/bin/python) frame #13: + 0xf20ed (0x55f9d45380ed in /opt/conda/bin/python) frame #14: PyDict_SetItem + 0x3da (0x55f9d457ed7a in /opt/conda/bin/python) frame #15: PyDict_SetItemString + 0x4f (0x55f9d4585c5f in /opt/conda/bin/python) frame #16: PyImport_Cleanup + 0x99 (0x55f9d45eadc9 in /opt/conda/bin/python) frame #17: Py_FinalizeEx + 0x61 (0x55f9d4655961 in /opt/conda/bin/python) frame #18: Py_Main + 0x35e (0x55f9d465fcae in /opt/conda/bin/python) frame #19: main + 0xee (0x55f9d4529f2e in /opt/conda/bin/python) frame #20: __libc_start_main + 0xe7 (0x7fdd4936bb97 in /lib/x86_64-linux-gnu/libc.so.6) frame #21: + 0x1c327f (0x55f9d460927f in /opt/conda/bin/python) Traceback (most recent call last): File "train.py", line 93, in main() File "train.py", line 87, in main trainer.end_of_epoch(data, current_epoch, current_iteration) File "/home/imaginaire/imaginaire/trainers/base.py", line 402, in end_of_epoch self.write_metrics() File 
"/home/imaginaire/imaginaire/trainers/vid2vid.py", line 699, in write_metrics regular_fid, average_fid = self._compute_fid() File "/home/imaginaire/imaginaire/trainers/vid2vid.py", line 745, in _compute_fid is_video=True, few_shot_video=few_shot) File "/home/imaginaire/imaginaire/evaluation/fid.py", line 45, in compute_fid is_video, few_shot_video) File "/home/imaginaire/imaginaire/evaluation/fid.py", line 133, in load_or_compute_stats is_video, few_shot_video) File "/home/imaginaire/imaginaire/evaluation/fid.py", line 165, in get_inception_mean_cov sample_size, preprocess, few_shot_video) File "/opt/conda/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context return func(*args, **kwargs) File "/home/imaginaire/imaginaire/evaluation/common.py", line 157, in get_video_activations batch_y = torch.cat(batch_y).cpu().data.numpy() File "/opt/conda/lib/python3.6/site-packages/apex/amp/wrap.py", line 28, in wrapper return orig_fn(*new_args, **kwargs) RuntimeError: CUDA error: the launch timed out and was terminated terminate called after throwing an instance of 'c10::Error' what(): CUDA error: the launch timed out and was terminated Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f5e37a6799b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so) frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc10 (0x7f5e37caa280 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so) frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f5e37a4fdfd in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so) frame #3: + 0x5414e2 (0x7f5ebba6f4e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so) frame #4: + 0x19aaae (0x55fd1871eaae in /opt/conda/bin/python) frame #5: + 0xf244f (0x55fd1867644f in /opt/conda/bin/python) frame 
#6: + 0xf2247 (0x55fd18676247 in /opt/conda/bin/python) frame #7: + 0xf20d7 (0x55fd186760d7 in /opt/conda/bin/python) frame #8: + 0xf20ed (0x55fd186760ed in /opt/conda/bin/python) frame #9: + 0xf20ed (0x55fd186760ed in /opt/conda/bin/python) frame #10: + 0xf20ed (0x55fd186760ed in /opt/conda/bin/python) frame #11: + 0xf20ed (0x55fd186760ed in /opt/conda/bin/python) frame #12: + 0xf20ed (0x55fd186760ed in /opt/conda/bin/python) frame #13: + 0xf20ed (0x55fd186760ed in /opt/conda/bin/python) frame #14: + 0xf20ed (0x55fd186760ed in /opt/conda/bin/python) frame #15: + 0xf20ed (0x55fd186760ed in /opt/conda/bin/python) frame #16: + 0xf20ed (0x55fd186760ed in /opt/conda/bin/python) frame #17: PyDict_SetItem + 0x3da (0x55fd186bcd7a in /opt/conda/bin/python) frame #18: PyDict_SetItemString + 0x4f (0x55fd186c3c5f in /opt/conda/bin/python) frame #19: PyImport_Cleanup + 0x99 (0x55fd18728dc9 in /opt/conda/bin/python) frame #20: Py_FinalizeEx + 0x61 (0x55fd18793961 in /opt/conda/bin/python) frame #21: Py_Main + 0x35e (0x55fd1879dcae in /opt/conda/bin/python) frame #22: main + 0xee (0x55fd18667f2e in /opt/conda/bin/python) frame #23: __libc_start_main + 0xe7 (0x7f5ee50d1b97 in /lib/x86_64-linux-gnu/libc.so.6) frame #24: + 0x1c327f (0x55fd1874727f in /opt/conda/bin/python) Traceback (most recent call last): File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code exec(code, run_globals) File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 261, in main() File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 257, in main cmd=cmd) subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'train.py', '--local_rank=7', '--config', 'configs/projects/vid2vid/kitti/ampO1.yaml', '--logdir', '/home/logs']' died with .
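For context, the final `subprocess.CalledProcessError` is raised by `torch.distributed.launch` itself: it spawns one `train.py` process per GPU and re-raises when any child exits abnormally, which is why the CUDA failure on the workers surfaces a second time at the launcher. A simplified, stdlib-only sketch of that supervision pattern (the real launcher also sets `RANK`/`WORLD_SIZE`/`MASTER_ADDR` environment variables; `launch_workers` is a hypothetical name, not the actual launcher API):

```python
import subprocess
import sys

def launch_workers(nproc, script_args):
    """Simplified sketch of torch.distributed.launch's supervision loop:
    spawn one worker process per slot, passing --local_rank, then raise
    CalledProcessError if any worker died -- mirroring the end of the log.
    Hypothetical helper, not the real launcher implementation."""
    procs = []
    for local_rank in range(nproc):
        # the real launcher builds a similar command line per rank
        cmd = [sys.executable, "-u"] + script_args + [f"--local_rank={local_rank}"]
        procs.append((cmd, subprocess.Popen(cmd)))
    for cmd, proc in procs:
        proc.wait()
        if proc.returncode != 0:
            # this is the error type seen at the very end of the log
            raise subprocess.CalledProcessError(returncode=proc.returncode, cmd=cmd)
```

With a worker that exits cleanly, `launch_workers(2, ["-c", "pass"])` returns without error; with a worker that dies, the first nonzero exit is re-raised as `CalledProcessError`, exactly the shape of the launcher traceback above.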