In Docker running without CUDA, gives CUDA-related error #247

Open
jfhc opened this issue Jun 15, 2024 · 2 comments

jfhc commented Jun 15, 2024

On Windows, running docker run -e COQUI_TOS_AGREED=1 -v D:/.local/share/tts:/root/.local/share/tts -v ${PWD}:/root -w /root ghcr.io/aedocw/epub2tts:release 'The Emergence of Social Space_ - Kristin Ross.epub' --engine xtts --speaker "Royston Min" prints "Using CPU" but then fails because the CUDA_HOME environment variable is not set. How can I address this?

Not enough VRAM on GPU or CUDA not found. Using CPU
Loading model: /root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2
 > Downloading model to /root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2
100%|██████████| 1.87G/1.87G [02:38<00:00, 11.8MiB/s]
100%|██████████| 4.37k/4.37k [00:00<00:00, 5.86kiB/s]
100%|██████████| 361k/361k [00:00<00:00, 413kiB/s]
100%|██████████| 32.0/32.0 [00:00<00:00, 32.7iB/s]
100%|██████████| 7.75M/7.75M [00:18<00:00, 13.2MiB/s]
 > Model's license - CPML
 > Check https://coqui.ai/cpml.txt for more info.
 > Using model: xtts
[2024-06-15 19:35:18,347] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-15 19:35:19,205] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.12.6, git-hash=unknown, git-branch=unknown
[2024-06-15 19:35:19,208] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2024-06-15 19:35:19,209] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-06-15 19:35:19,210] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py310_cu121/transformer_inference...
Detected CUDA files, patching ldflags
Traceback (most recent call last):
  File "/opt/epub2tts/epub2tts.py", line 746, in <module>
    main()
  File "/opt/epub2tts/epub2tts.py", line 735, in main
    mybook.read_book(
  File "/opt/epub2tts/epub2tts.py", line 384, in read_book
    self.model.load_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/TTS/tts/models/xtts.py", line 783, in load_checkpoint
    self.gpt.init_gpt_for_inference(kv_cache=self.args.kv_cache, use_deepspeed=use_deepspeed)
  File "/usr/local/lib/python3.10/dist-packages/TTS/tts/layers/xtts/gpt.py", line 224, in init_gpt_for_inference
    self.ds_engine = deepspeed.init_inference(
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 342, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 158, in __init__
    self._apply_injection_policy(config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/inference/engine.py", line 418, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 342, in replace_transformer_layer
    replaced_module = replace_module(model=model,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 586, in replace_module
    replaced_module, _ = _replace_module(model, policy, state_dict=sd)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 646, in _replace_module
    _, layer_id = _replace_module(child,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 646, in _replace_module
    _, layer_id = _replace_module(child,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 622, in _replace_module
    replaced_module = policies[child.__class__][0](child,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 298, in replace_fn
    new_module = replace_with_policy(child,
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/replace_module.py", line 247, in replace_with_policy
    _container.create_module()
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/module_inject/containers/gpt2.py", line 20, in create_module
    self.module = DeepSpeedGPTInference(_config, mp_group=self.mp_group)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/model_implementations/transformers/ds_gpt.py", line 20, in __init__
    super().__init__(config, mp_group, quantize_scales, quantize_groups, merge_count, mlp_extra_grouping)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 58, in __init__
    inference_module = builder.load()
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/op_builder/builder.py", line 458, in load
    return self.jit_load(verbose)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/op_builder/builder.py", line 502, in jit_load
    op_module = load(name=self.name,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1308, in load
    return _jit_compile(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1800, in _write_ninja_file_and_build_library
    extra_ldflags = _prepare_ldflags(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1899, in _prepare_ldflags
    if (not os.path.exists(_join_cuda_home(extra_lib_dir)) and
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2416, in _join_cuda_home
    raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
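
As far as I can tell, the contradiction is that TTS itself falls back to CPU, but DeepSpeed still auto-detects a cuda accelerator (the "Setting ds_accelerator to cuda (auto detect)" line above) and then tries to JIT-compile its transformer_inference CUDA extension, which is what needs CUDA_HOME. A minimal sketch of the kind of guard that would avoid the build, using the same use_deepspeed flag the traceback shows flowing into init_gpt_for_inference() (this is my reading of the TTS API, not the actual epub2tts code):

import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Model path taken from the log above; adjust for your own volume mount.
MODEL_DIR = "/root/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2"

config = XttsConfig()
config.load_json(f"{MODEL_DIR}/config.json")

model = Xtts.init_from_config(config)
# Only enable DeepSpeed when a CUDA device is actually usable. With
# use_deepspeed=False, load_checkpoint never reaches deepspeed.init_inference(),
# so the transformer_inference extension (and its CUDA_HOME check) is skipped.
model.load_checkpoint(
    config,
    checkpoint_dir=MODEL_DIR,
    use_deepspeed=torch.cuda.is_available(),
)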
aedocw (Owner) commented Jun 16, 2024

I started work on a CUDA-ready Docker container (https://github.com/aedocw/epub2tts/blob/main/Dockerfile.cuda12), but it didn't work as expected and unfortunately I don't have a good test environment to take it further. Someone else here may have answers, but I wanted to let you know I won't be able to address this myself.

Your best bet would be to run this in a Python virtual environment in WSL.
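
If you go that route, a quick sanity check from inside the venv (plain PyTorch calls, nothing epub2tts-specific) will tell you whether the GPU is visible before you try the xtts engine:

import torch

# Report the installed torch build and whether a CUDA device is visible.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))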

jfhc (Author) commented Jun 16, 2024

Thanks, @aedocw - I've got it working in WSL now with the default model, but xtts is still not working. I may raise another issue for that.
