-
Notifications
You must be signed in to change notification settings - Fork 573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tesla A100 40gb on Colab training ERROR #46
Comments
I've got the same error. |
Same. It dies on: noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample |
I found the solution. Don't move the focus from that window. If you do, the colab gets disconnected. You will have to restart the process and this time keep the window focused until the cell finishes running. |
@OleksandrKorovii noted in #51 that install the CUDA toolkit solved the issue:
|
and need create links: sudo ln -sf <path_to_libcudart.so.???> /usr/lib/libcudart.so |
I finally got a A100 40gb on colab but this error appeared on training :(
The following values were not passed to
accelerate launch
and had defaults used instead:--num_processes
was set to a value of1
--num_machines
was set to a value of1
--mixed_precision
was set to a value of'no'
--num_cpu_threads_per_process
was set to6
to improve out-of-box performanceTo avoid this warning pass in values for each of the problematic parameters or run
accelerate config
.===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:99: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
f'{candidate_env_vars["LD_LIBRARY_PATH"]} did not contain '
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('["--ip=172.28.0.2"],"debugAdapterMultiplexerPath"'), PosixPath('{"kernelManagerProxyPort"'), PosixPath('6000,"kernelManagerProxyHost"'), PosixPath('"/usr/local/bin/dap_multiplexer","enableLsp"'), PosixPath('true}'), PosixPath('"172.28.0.3","jupyterArgs"')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//ipykernel.pylab.backend_inline')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
"WARNING: The following directories listed in your path were found to "
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 111
CUDA SETUP: Loading binary /usr/local/lib/python3.7/dist-packages/bitsandbytes/libbitsandbytes_cuda111.so...
Steps: 0% 0/1000 [00:00<?, ?it/s]Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/data/ibaisks', '--class_data_dir=/content/data/person', '--output_dir=/content/models/ibaisks', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=ibaisks', '--class_prompt=person', '--seed=1337', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--sample_batch_size=4', '--max_train_steps=1000']' died with <Signals.SIGABRT: 6>.
The text was updated successfully, but these errors were encountered: