Attempting to register factory for plugin cuDNN/cuFFT/cuBLAS on Linux install #2263
Comments
I'm not sure I fully understand what changes you are asking for. There is no requirements_nvidia.txt file, so what requirements do you actually want to see in it?
Which requirements file needs the upgrade to TensorFlow 2.16? How do you install the GUI? Do you use special parameters to specify the requirements file? I don't use Linux, so I am not familiar with how the current solution needs to change to work properly on your platform.
I switched to:
tensorboard==2.16.2
tensorflow[and-cuda]==2.16.1
Since not every GPU is an NVIDIA one, I suggested adding a requirements_nvidia.txt that contains the CUDA package by default, while other setups keep using the normal requirements_linux.txt.
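A minimal sketch of how a setup script could choose between the two files proposed here. Assumptions: requirements_nvidia.txt is the hypothetical new file (it does not exist in the repository), and treating `nvidia-smi` on PATH as a sign of an NVIDIA setup is an illustrative heuristic, not what setup.sh actually does.

```python
import shutil

# Hypothetical: requirements_nvidia.txt is only a proposal in this thread.
NVIDIA_REQUIREMENTS = "requirements_nvidia.txt"
# The file that non-NVIDIA Linux setups would keep using.
DEFAULT_REQUIREMENTS = "requirements_linux.txt"


def has_nvidia_gpu() -> bool:
    """Heuristic (assumption): an nvidia-smi binary on PATH means an NVIDIA driver is installed."""
    return shutil.which("nvidia-smi") is not None


def choose_requirements(nvidia: bool) -> str:
    """Return the requirements file to pass to `pip install -r`."""
    return NVIDIA_REQUIREMENTS if nvidia else DEFAULT_REQUIREMENTS


if __name__ == "__main__":
    print("pip install -r", choose_requirements(has_nvidia_gpu()))
```

Keeping detection separate from selection means an explicit flag could override the heuristic on machines where it guesses wrong.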
But how will you call this? Will setup.sh handle it as is? I think the best option would be for you to create a pull request proposing all the code changes needed to make this work properly. That way I can merge it and others will be able to use it.
I run it by pulling kohya_ss onto the Ubuntu system. Before running setup.sh, please modify the requirements in both requirements_linux.txt and requirements_linux_docker.txt.
Python 3.10.11
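One way to apply that change to both files without editing them by hand is a small script that rewrites existing tensorboard/tensorflow lines to the pins quoted earlier in the thread. This is a sketch under assumptions: it assumes those packages are already pinned in the two files, it leaves every other line (other packages, comments, `-r` includes, blank lines) untouched, and it does not add packages that are missing. `update_pins` and `update_file` are hypothetical helpers, not part of kohya_ss.

```python
import re
from pathlib import Path

# Pins quoted in this thread (tensorboard 2.16.2, tensorflow[and-cuda] 2.16.1).
PINS = {
    "tensorboard": "tensorboard==2.16.2",
    "tensorflow": "tensorflow[and-cuda]==2.16.1",
}

# Leading distribution name, stopping before extras ("[") or a version specifier.
NAME_RE = re.compile(r"\s*([A-Za-z0-9_.-]+)")


def update_pins(text: str, pins: dict) -> str:
    """Replace each requirement line whose package name is a key of `pins`."""
    out = []
    for line in text.splitlines():
        m = NAME_RE.match(line)
        name = m.group(1).lower() if m else None
        out.append(pins.get(name, line))
    # Preserve a trailing newline if the original file had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


def update_file(path: Path, pins: dict = PINS) -> None:
    """Rewrite a requirements file in place."""
    path.write_text(update_pins(path.read_text(), pins))
```

Run `update_file(Path("requirements_linux.txt"))` and `update_file(Path("requirements_linux_docker.txt"))` from the repository root before setup.sh. Note that `tensorflow-io` and similar packages are not matched, because the whole name has to equal a key in `PINS`.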
Installing
Now I can get
As for TensorRT, you may check this: download and extract the tar file, set up LD_LIBRARY_PATH in gui.sh, and use a symlink if needed.
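A sketch of the LD_LIBRARY_PATH part of that advice, written as a helper that builds the environment for launching gui.sh. The variable has to be set before the process starts, because the dynamic loader reads it at startup; that is why the helper returns an environment for a child process instead of modifying `os.environ` in a running interpreter. `with_library_path` is a hypothetical name, and the TensorRT directory in the usage comment is a placeholder for wherever the tar file was extracted.

```python
import os
from typing import Dict, Optional


def with_library_path(lib_dir: str, env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `env` (default: os.environ) with lib_dir first in LD_LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    existing = new_env.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is colon-separated on Linux; drop empty entries and duplicates of lib_dir.
    parts = [lib_dir] + [p for p in existing.split(":") if p and p != lib_dir]
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env


# Usage (the path is a placeholder for the extracted TensorRT lib directory):
#   import subprocess
#   subprocess.run(["./gui.sh"], env=with_library_path("/path/to/TensorRT/lib"), check=True)
```

Exporting the same variable at the top of gui.sh achieves the same effect for shell users.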
kohya uses PyTorch for GPU training, so any messages from TensorFlow saying "unable to register ____ factory" or "could not find cuda drivers" can be ignored. There's no practical use in installing a CUDA-enabled build of TensorFlow; it's only pulled in as a dependency of TensorBoard.
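If the noise is distracting, a wrapper could drop those two kinds of lines from the training output. This is a minimal sketch, not something kohya_ss provides: the patterns are copied from the log in this issue, and every other line (including real errors) passes through unchanged. The function names are hypothetical.

```python
import re
from typing import Iterable, Iterator

# The two TensorFlow start-up messages named in the comment above, as they
# appear in the log attached to this issue.
HARMLESS_TF_PATTERNS = (
    re.compile(r"Unable to register \w+ factory"),
    re.compile(r"Could not find cuda drivers on your machine"),
)


def is_harmless_tf_message(line: str) -> bool:
    """True if the line matches one of the known-harmless TensorFlow messages."""
    return any(p.search(line) for p in HARMLESS_TF_PATTERNS)


def filter_log(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line except the known-harmless TensorFlow messages."""
    for line in lines:
        if not is_harmless_tf_message(line):
            yield line
```

For example, `sys.stdout.writelines(filter_log(sys.stdin))` in a small script would filter output piped from `accelerate launch ... 2>&1`.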
Hi there,
I have now tried every feasible way to install the WebUI on a Linux server with multiple GPUs.
I identified a few smaller issues:
When I run the common commands for checking the CUDA version and verifying the installation, only the TensorFlow and torch checks fail.
This problem was fixed for Ubuntu in TensorFlow 2.16.
It seems it was first reported by WSL users, but it still appears on other Ubuntu installations.
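For reference, a minimal check of what each framework actually sees. It uses documented calls (`torch.cuda.is_available`, `torch.cuda.device_count`, `torch.version.cuda`, `tf.config.list_physical_devices`) and reports a missing package instead of failing. Going by the maintainer's note that training runs on PyTorch, the torch result is the one that matters for kohya_ss; the dictionary keys are arbitrary names chosen for this sketch.

```python
from typing import Any, Dict


def check_torch() -> Dict[str, Any]:
    """Report the installed PyTorch build and whether it can use CUDA."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


def check_tensorflow() -> Dict[str, Any]:
    """Report the installed TensorFlow version and how many GPUs it detects."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"installed": False}
    gpus = tf.config.list_physical_devices("GPU")
    return {"installed": True, "version": tf.__version__, "gpu_count": len(gpus)}


if __name__ == "__main__":
    print("torch:", check_torch())
    print("tensorflow:", check_tensorflow())
```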
Server:
Errors:
accelerate launch --mixed_precision="fp16" --num_processes=1 --num_machines=1 --num_cpu_threads_per_process=2 "/home/excel/kohya_ss/sd-scripts/train_network.py" --bucket_no_upscale --bucket_reso_steps=64 --cache_latents --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --huber_c="0.1" --huber_schedule="snr" --learning_rate="0.0001" --logging_dir="/home/excel/kohya_ss/logs" --loss_type="l2" --lr_scheduler="cosine" --lr_scheduler_num_cycles="1" --lr_warmup_steps="57" --max_data_loader_n_workers="0" --max_grad_norm="1" --resolution="512,512" --max_train_steps="570" --min_timestep=0 --mixed_precision="fp16" --network_alpha="1" --network_dim=8 --network_module=networks.lora --optimizer_type="AdamW8bit" --output_dir="/home/excel/kohya_ss/outputs/hedgeforest" --output_name="hedgeforest" --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --save_every_n_epochs="1" --save_model_as=safetensors --save_precision="fp16" --text_encoder_lr=0.0001 --train_batch_size="1" --train_data_dir="/home/excel/bildersets/hedgehogs/images" --unet_lr=0.0001 --xformers
2024-04-11 16:08:53.335832: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-11 16:08:53.376158: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-11 16:08:53.376187: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-11 16:08:53.377607: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-11 16:08:53.384410: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-11 16:08:53.384622: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-11 16:08:54.317341: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
So I kindly request an upgrade to TensorFlow 2.16, and that the "cuda" option for the pip package installation be added as the default in a requirements_nvidia.txt.