Error output when WD14 Captioning #188

Closed
ThanapatSornsrivichai opened this issue Feb 17, 2023 · 14 comments

Comments

@ThanapatSornsrivichai

I get this error when I try to caption with WD14. The images are larger than 1000x1000.
GPU: RTX 3090
I tried running accelerate config and updating again, but it still does not work.

Captioning files in D:/Kohya/dataset/yorra ench...
accelerate launch "./finetune/tag_images_by_wd14_tagger.py" --batch_size="1" --thresh="0.35" --caption_extension=".txt" "D:/Kohya/dataset/yorra ench"
2023-02-18 01:25:45.516228: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-02-18 01:25:45.516363: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
using existing wd14 tagger model
found 19 images.
loading model and labels
2023-02-18 01:25:50.592676: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "D:\Kohya\kohya_ss\finetune\tag_images_by_wd14_tagger.py", line 200, in <module>
main(args)
File "D:\Kohya\kohya_ss\finetune\tag_images_by_wd14_tagger.py", line 96, in main
model = load_model(args.model_dir)
File "D:\Kohya\kohya_ss\venv\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "D:\Kohya\kohya_ss\venv\lib\site-packages\tensorflow\python\eager\context.py", line 622, in ensure_initialized
context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.
Traceback (most recent call last):
File "C:\Users\thana\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\thana\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\Kohya\kohya_ss\venv\Scripts\python.exe', './finetune/tag_images_by_wd14_tagger.py', '--batch_size=1', '--thresh=0.35', '--caption_extension=.txt', 'D:/Kohya/dataset/yorra ench']' returned non-zero exit status 1.
...captioning done

@bmaltais
Owner

This is really strange. I have never seen this error and I have no idea what it might be. It appears to be related to the "Status: cudaGetErrorString symbol not found" failure when loading the model. I will keep an eye on it.

@tetsuoo-online

tetsuoo-online commented Feb 19, 2023

I'm having exactly the same issue, with an RTX 3060.
The optional cuDNN 8.6 has been installed; maybe I shouldn't have done that? I have no idea how to rebuild TensorFlow, let alone "with the appropriate compiler flags"; this is way beyond my skills.
Edit: the workaround for now is to use the WD14 Tagger extension for WebUI, which has a batch option :)
https://github.com/toriato/stable-diffusion-webui-wd14-tagger.git

@williamkmlau

Same problem here.

@williamkmlau

williamkmlau commented Feb 23, 2023

OK, after more than an hour of testing different methods, I resolved the issue.

First, ensure you have the Microsoft Visual C++ Redistributable for Visual Studio 2015-2022 installed (I already had it, but its absence may be one of the reasons this fails).

Edit: Please note that I am unsure whether the following is a good solution, because TensorFlow versions above 2.10 cannot use the GPU on native Windows.

In my case, installing the latest version of TensorFlow resolved the error.

Run in PowerShell / cmd at the project root:

.\venv\Scripts\activate
pip install tf-nightly

Does this mean it's a problem with the TensorFlow version?
Hopefully this helps identify the problem, @bmaltais.

@williamkmlau

OK, here's a different solution. I installed CUDA v11.2 (only this exact version works with the TensorFlow 2.10 required by this project) and cuDNN (I used v8.1.1; 8.5+ is probably compatible too, but I haven't tested it). This got rid of the errors and let the script continue as intended.

@manofletters

An alternative for the time being: https://github.com/toriato/stable-diffusion-webui-wd14-tagger

@treksis

treksis commented Mar 6, 2023

I'm using Windows with a 3090.

I had the same error with the WD tagger, but it worked out with @williamkmlau's magic:

.\venv\Scripts\activate
pip install tf-nightly

@yunghoy

yunghoy commented Mar 6, 2023

treksis commented Mar 5, 2023

I have a dependency issue. I set up cleanly and then installed tf-nightly.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
xformers 0.0.14.dev0 requires pyre-extensions==0.0.23, which is not installed.
tensorflow 2.10.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.22.0 which is incompatible.
tensorboard 2.10.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.22.0 which is incompatible.
tensorboard 2.10.1 requires tensorboard-data-server<0.7.0,>=0.6.0, but you have tensorboard-data-server 0.7.0 which is incompatible.

I think using wd14-tagger, the public extension for automatic1111, is a good alternative for now.
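
The protobuf conflict in the pip output above can be checked by hand. This is only a minimal sketch that assumes plain dotted numeric versions; pip itself applies the full PEP 440 rules. The helper names parse_version and satisfies are hypothetical, and 3.19.6 is just an example of a version inside the allowed range.

```python
def parse_version(text):
    """Turn '3.9.2' into (3, 9, 2). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, lower_inclusive, upper_exclusive):
    """True if lower_inclusive <= installed < upper_exclusive."""
    version = parse_version(installed)
    return parse_version(lower_inclusive) <= version < parse_version(upper_exclusive)


# tensorflow 2.10.1 requires protobuf<3.20,>=3.9.2 (taken from the pip output).
print(satisfies("4.22.0", "3.9.2", "3.20"))  # False: this is the reported conflict
print(satisfies("3.19.6", "3.9.2", "3.20"))  # True: an example version in range
```

Installing tf-nightly pulled in protobuf 4.22.0, which falls outside the range that the pinned tensorflow 2.10.1 and tensorboard 2.10.1 accept.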

@kumpuu

kumpuu commented Mar 10, 2023

I don't know if it helps anyone, but I had the same error when installing:
2023-02-18 01:25:45.516228: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-02-18 01:25:45.516363: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

The issue was that a newer version of torch/torchvision had somehow been installed, which does not seem to include cudart64_110.dll.
Running pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116 again fixed the error for me.
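
One way to confirm whether the installed torch build ships the DLL is to look inside its lib folder without importing torch. This is a rough sketch, assuming the DLL lives in site-packages\torch\lib (the location mentioned elsewhere in this thread); torch_lib_dir and has_dll are hypothetical helper names.

```python
import importlib.util
from pathlib import Path

DLL_NAME = "cudart64_110.dll"


def torch_lib_dir():
    """Return <site-packages>/torch/lib for this interpreter, or None if torch is absent."""
    spec = importlib.util.find_spec("torch")  # locates the package without importing it
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent / "lib"


def has_dll(lib_dir, name=DLL_NAME):
    """True if lib_dir is a directory path that contains a file called name."""
    return lib_dir is not None and (Path(lib_dir) / name).is_file()


lib_dir = torch_lib_dir()
print("torch lib dir:", lib_dir)
print(DLL_NAME, "bundled with torch:", has_dll(lib_dir))
```

Run it with the venv's Python so that the venv's torch is the one inspected.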

@Zaki-XL

Zaki-XL commented Apr 15, 2023

This is a monkey patch. Please consider a permanent solution.

\venv\Scripts\activate.bat

rem set PATH=%VIRTUAL_ENV%\Scripts;%PATH%
set PATH=%VIRTUAL_ENV%\Scripts;%VIRTUAL_ENV%\Lib\site-packages\torch\lib;%PATH%
set VIRTUAL_ENV_PROMPT=(venv)

In short, the problem is that the PATH set by the venv does not include the directory containing the cudart64_110.dll installed in site-packages. I tried to solve this with os.add_dll_directory(), but I couldn't get the directory added inside the venv environment.

I think those who don't encounter this problem have cudart64_110.dll located in a directory that is already on their PATH.

I got some hints from this issue:
tensorflow/tensorflow#43193

@Zaki-XL

Zaki-XL commented Apr 15, 2023

If you want to debug this problem, it's a good idea to check the location of cudart64_110.dll and run print(os.environ['PATH']).
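
That check can be scripted. The sketch below prints each PATH entry and reports which of them actually contain the DLL; find_on_path is a hypothetical helper name, not part of the project.

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return every directory in path_value (default: current PATH) that contains filename."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


# Print PATH one entry per line, which is easier to read than the raw string.
for entry in os.environ.get("PATH", "").split(os.pathsep):
    print(entry)

print("cudart64_110.dll found in:", find_on_path("cudart64_110.dll"))
```

An empty list from the venv's Python, together with a copy of the DLL present under site-packages\torch\lib, would match the situation described above.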

@Zaki-XL

Zaki-XL commented Apr 15, 2023

This approach is better because it does not require modifying the venv.

gui.bat

@echo off

:: Activate the virtual environment
call .\venv\Scripts\activate.bat
set PATH=%PATH%;%~dp0venv\Lib\site-packages\torch\lib

:: Validate the requirements and store the exit code
python.exe .\tools\validate_requirements.py

:: If the exit code is 0, run the kohya_gui.py script with the command-line arguments
if %errorlevel% equ 0 (
    python.exe kohya_gui.py %*
)
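
The same PATH adjustment can be sketched in Python for anyone launching the GUI from a script instead of gui.bat. This is an untested assumption-laden sketch, not the project's launcher: env_with_torch_lib and launch are hypothetical names, it skips the validate_requirements.py step, and it assumes it is run with the venv's own python.exe.

```python
import os
import subprocess
import sys
from pathlib import Path


def env_with_torch_lib(project_root, base_env=None):
    """Copy the environment and append <project_root>/venv/Lib/site-packages/torch/lib to PATH."""
    env = dict(os.environ if base_env is None else base_env)
    torch_lib = Path(project_root) / "venv" / "Lib" / "site-packages" / "torch" / "lib"
    current = env.get("PATH", "")
    # Appending (not prepending) mirrors `set PATH=%PATH%;...torch\lib` in the batch file.
    env["PATH"] = f"{current}{os.pathsep}{torch_lib}" if current else str(torch_lib)
    return env


def launch(project_root, args):
    """Run kohya_gui.py with the adjusted PATH and return its exit code."""
    env = env_with_torch_lib(project_root)
    command = [sys.executable, "kohya_gui.py", *args]
    return subprocess.call(command, env=env, cwd=str(project_root))
```

Calling launch(Path(__file__).resolve().parent, sys.argv[1:]) from a script placed in the project root would pass command-line arguments through, as `%*` does in the batch file.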

@bmaltais
Owner

Thanks, I will add it to the bat file and also add the equivalent to the ps1 file.

@bmaltais bmaltais mentioned this issue Apr 18, 2023
@DarksealStudios

DarksealStudios commented Apr 29, 2023

cudart64_110.dll not found, 21.5.5

I'm new to all of this, so I don't have a solid understanding of how to get it in there, but I have tried:
.\venv\Scripts\activate
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
python.exe -m pip install --upgrade pip
