
Cuda Error with low_mem #36

Closed · PhysWiz314 opened this issue Jul 4, 2019 · 16 comments · Fixed by #536 or elliotthwang/Real-Time-Voice-Cloning#1

@PhysWiz314

Hi,
I have come across the following error when using the toolbox in low memory mode.
[screenshot of the error: CUDA_ERROR_NOT_INITIALIZED]
On this computer, my GPU has only 2 GB of VRAM, so I need to use this mode.

I have tested this on another computer whose GPU has 4 GB of VRAM. The toolbox works perfectly in normal mode, but when I turn on low_mem, I run into the same error.

I'm not sure what other information you would need to look into this so please let me know what else I can provide to help out.

@CorentinJ
Owner

What does demo_cli.py give?

@PhysWiz314
Author

Without the low_mem flag, it says that all tests pass. With the low_mem flag, I get the same CUDA_ERROR_NOT_INITIALIZED.

@CorentinJ
Owner

Hmm, I don't have a solution right now. I'll have to look into it.

@nitram147

Same problem.

"demo_cli.py" output without "--low_mem" flag:
Found 1 GPUs available. Using GPU 0 (GeForce MX150) of compute capability 6.1 with 2.1Gb total memory.
...
Use standard file APIs to check for files with this prefix.
Testing the vocoder...
All test passed! You can now synthesize speech.

"demo_cli.py" output with "--low_mem" flag:

Use standard file APIs to check for files with this prefix.
E0707 15:55:23.503854 139813249828608 driver.py:321] Call to cuInit results in CUDA_ERROR_NOT_INITIALIZED
...
multiprocess.pool.RemoteTraceback:
  File ".../python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 233, in initialize
    raise CudaSupportError("Error at driver init: \n%s:" % e)
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
[3] Call to cuInit results in CUDA_ERROR_NOT_INITIALIZED:

  File ".../multiprocess/pool.py", line 644, in get
    raise self._value
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
[3] Call to cuInit results in CUDA_ERROR_NOT_INITIALIZED:

@haoqizhenhao

I have the same error.

@CorentinJ
Owner

Can you try pulling from this new branch and see what gives?

@nitram147

The new branch code without the "--low_mem" flag gets stuck at "Interactive generation loop" after "All test passed".
Output:

W0710 12:49:35.993566 140170637424384 deprecation.py:323] From /Real-Time-Voice-Cloning/env/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Testing the vocoder...
All test passed! You can now synthesize speech.
This is a GUI-less example of interface to SV2TTS. The purpose of this script is to show how you can interface this project easily with your own. See the source code for an explanation of what is happening.
Interactive generation loop

With "--low_mem" flag same cuda init error.
Output:
https://pastebin.com/PFBEbEnz

@PhysWiz314
Author

I still run into the same issue as well with the --low_mem flag.

@haoqizhenhao

When you get stuck at "Interactive generation loop", you should input your own audio file; then the code will work (you can read the source code). But I don't know why the prompt message didn't appear.

@ahmed-menshawy

I have the same error with a 2.1 GB GPU!

@CorentinJ
Owner

I can't reproduce this, unfortunately. I'm running a Windows 10 install on two computers (one with 2 GB VRAM) and tensorflow-gpu=1.13.1. I'm going to need more information to look into it.

@PhysWiz314
Author

I am running Linux on both computers: Ubuntu 18.04 with nvidia-430 and CUDA 10.2 (2 GB VRAM) on one, and Linux Mint 19.1 with nvidia-410 and CUDA 10.0 (4 GB VRAM) on the other.

I have been playing around a bit with the code and it seems that in vocoder/inference.py (almost) any call to numba.cuda results in the error.
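
To see this failure mode outside the toolbox, here is a minimal reproduction sketch. This is hypothetical code, not from this thread, and it assumes a Linux machine with a working numba/CUDA install: touch the CUDA driver in the parent process, then call numba.cuda from a forked worker.

# Hypothetical reproduction sketch (not part of the repository). On Linux,
# "fork" is the default start method: the forked child inherits CUDA driver
# state it cannot reuse, and cuInit fails there with
# CUDA_ERROR_NOT_INITIALIZED, as in the logs above.
import multiprocessing as mp

import numba.cuda

def query_gpu(_):
    # Runs in the forked child, with stale driver state inherited from the parent.
    return numba.cuda.get_current_device().name

if __name__ == "__main__":
    numba.cuda.detect()  # touches (initializes) the driver in the parent
    with mp.get_context("fork").Pool(1) as pool:
        print(pool.map(query_gpu, [None]))  # expected to raise CudaSupportError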

@lilydjwg

lilydjwg commented Sep 9, 2019

I've found out why and how to fix it: multiprocess uses forked workers by default, and a forked worker inherits state that CUDA doesn't expect. Switching to spawned workers fixes it.

Patch:
diff --git a/synthesizer/inference.py b/synthesizer/inference.py
index 99fb778..b9cc9c0 100644
--- a/synthesizer/inference.py
+++ b/synthesizer/inference.py
@@ -2,12 +2,12 @@ from synthesizer.tacotron2 import Tacotron2
 from synthesizer.hparams import hparams
 from multiprocess.pool import Pool  # You're free to use either one
 #from multiprocessing import Pool   # 
+from multiprocess.context import SpawnContext
 from synthesizer import audio
 from pathlib import Path
 from typing import Union, List
 import tensorflow as tf
 import numpy as np
-import numba.cuda
 import librosa
 
 
@@ -80,13 +80,15 @@ class Synthesizer:
             # Low memory inference mode: load the model upon every request. The model has to be 
             # loaded in a separate process to be able to release GPU memory (a simple workaround 
             # to tensorflow's intricacies)
-            specs, alignments = Pool(1).starmap(Synthesizer._one_shot_synthesize_spectrograms, 
-                                                [(self.checkpoint_fpath, embeddings, texts)])[0]
+            specs, alignments = Pool(1, context=SpawnContext()
+                                    ).starmap(Synthesizer._one_shot_synthesize_spectrograms, 
+                                              [(self.checkpoint_fpath, embeddings, texts)])[0]
     
         return (specs, alignments) if return_alignments else specs
 
     @staticmethod
     def _one_shot_synthesize_spectrograms(checkpoint_fpath, embeddings, texts):
+        import numba.cuda
         # Load the model and forward the inputs
         tf.reset_default_graph()
         model = Tacotron2(checkpoint_fpath, hparams)
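
For readers who want the idea without the repo's code, here is a standalone sketch of the same fix using the stock multiprocessing module (the repo uses the multiprocess fork, whose API mirrors it; one_shot_job is a hypothetical stand-in for Synthesizer._one_shot_synthesize_spectrograms):

import multiprocessing as mp

def one_shot_job(payload):
    # CUDA-touching imports belong inside the worker so the driver is
    # initialized in the fresh child process instead of being inherited:
    # import numba.cuda  # uncomment on a machine with CUDA
    return "synthesized from %r in a fresh process" % (payload,)

if __name__ == "__main__":
    # "spawn" starts the worker from a clean process image, the same intent
    # as Pool(1, context=SpawnContext()) in the patch above.
    ctx = mp.get_context("spawn")
    with ctx.Pool(1) as pool:
        print(pool.map(one_shot_job, ["embeddings+texts"])[0])

A spawned worker re-imports its module and inherits no driver state, so cuInit succeeds inside it; the cost is slower worker startup, which is acceptable for low_mem's load-the-model-per-request design.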

@CorentinJ
Owner

I've implemented it in a rush on the low_mem_fix branch. Can you try it and see what gives?

@PhysWiz314
Author

demo_cli.py now passes, but I've run into another bug with demo_toolbox.py, so I can't verify that one.

@PiotrDabrowskey

I've got the same problem and can confirm that @lilydjwg's solution solved it.

@ghost ghost mentioned this issue Jun 25, 2020
@ghost ghost mentioned this issue Sep 29, 2020
@ghost ghost closed this as completed in #536 Sep 30, 2020
ghost pushed a commit that referenced this issue Sep 30, 2020
* For low_mem, use spawned workers instead of forked workers (resolves #36)
Used implementation from @lilydjwg: #36 (comment)

* Different method of passing the seed for low_mem inference
Resolves #491, #529, #535
AIProphet added a commit to AIProphet/Real-Time-Voice-Cloning that referenced this issue Jul 12, 2023