Could anyone explain why the following problem occurs?
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
/root/anaconda3/envs/0108/lib/python3.6/site-packages/torchvision/io/image.py:11: UserWarning: Failed to load image Python extension: /root/anaconda3/envs/0108/lib/python3.6/site-packages/torchvision/image.so: undefined symbol: _ZNK3c106IValue23reportToTensorTypeErrorEv
  warn(f"Failed to load image Python extension: {e}")
/root/anaconda3/envs/0108/lib/python3.6/site-packages/torchvision/io/image.py:11: UserWarning: Failed to load image Python extension: /root/anaconda3/envs/0108/lib/python3.6/site-packages/torchvision/image.so: undefined symbol: _ZNK3c106IValue23reportToTensorTypeErrorEv
  warn(f"Failed to load image Python extension: {e}")
01/21 05:42:18 AM Added key: store_based_barrier_key:1 to store for rank: 1
01/21 05:42:18 AM Added key: store_based_barrier_key:1 to store for rank: 0
01/21 05:42:18 AM Training in distributed mode with multiple processes, 1 GPU per process. Process 0, total 2.
01/21 05:42:18 AM Training in distributed mode with multiple processes, 1 GPU per process. Process 1, total 2.
01/21 05:42:20 AM Model slimmable_mbnet_v1_bn_uniform created, param count: 7676204
01/21 05:42:20 AM Data processing configuration for current model + dataset:
01/21 05:42:20 AM input_size: (3, 224, 224)
01/21 05:42:20 AM interpolation: bicubic
01/21 05:42:20 AM mean: (0.485, 0.456, 0.406)
01/21 05:42:20 AM std: (0.229, 0.224, 0.225)
01/21 05:42:20 AM crop_pct: 0.875
01/21 05:42:20 AM NVIDIA APEX not installed. AMP off.
01/21 05:42:21 AM Using torch DistributedDataParallel. Install NVIDIA Apex for Apex DDP.
01/21 05:42:21 AM Scheduled epochs: 40
01/21 05:42:21 AM Training folder does not exist at: images/train
01/21 05:42:21 AM Training folder does not exist at: images/train
Killing subprocess 239
Killing subprocess 240
Traceback (most recent call last):
  File "/root/anaconda3/envs/0108/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/anaconda3/envs/0108/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/0108/lib/python3.6/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/root/anaconda3/envs/0108/lib/python3.6/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/root/anaconda3/envs/0108/lib/python3.6/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/0108/bin/python', '-u', 'train.py', '--local_rank=1', 'images', '-c', './configs/mobilenetv1_bn_uniform_reset_bn.yml']' returned non-zero exit status 1.
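The log line "Training folder does not exist at: images/train" suggests that train.py looks for a `train` sub-folder under the dataset argument passed on the command line (`images`), resolved relative to the current working directory. A minimal pre-flight check under that assumption is sketched below. `missing_splits` is a hypothetical helper, not part of train.py.

```python
import os


def missing_splits(root, splits=("train",)):
    """Return the expected split sub-folders of `root` that are not directories.

    Assumption (hypothetical helper): the training script reads its images
    from <root>/train, as the log line
    "Training folder does not exist at: images/train" suggests.
    """
    return [
        os.path.join(root, split)
        for split in splits
        if not os.path.isdir(os.path.join(root, split))
    ]


# Run from the same working directory used for the launch command.
for path in missing_splits("images"):
    print("Missing folder:", path)
```

If this prints `images/train`, the relative path `images` does not point at the dataset from the directory where the command was launched.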