RuntimeError: CUDA out of memory. #1005
-
Greetings! I'm very new to the TTS area, so I have a question about GPU memory usage. I started training a TTS model from a pre-trained model. Below are my system specifications:
Then, to start training, I typed the following into PowerShell:
And below is the output, followed by the error message that I want to understand:
Note that there is a message saying:
I solved this problem by reducing the
And here are my questions:
Thank you!
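When this kind of OOM appears, the usual first aid is to shrink the batch and cap the sequence length in the training config. Below is a minimal sketch, assuming a Coqui-TTS-style `config.json`; the exact key names vary between versions and models, so treat these as illustrative, not authoritative:

```json
{
  "batch_size": 16,
  "eval_batch_size": 8,
  "max_seq_len": 150
}
```

Halve `batch_size` until the error disappears, then check whether the remaining batch is still large enough for stable training.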
Replies: 4 comments 5 replies
-
> Are these errors above really due to a GPU memory issue?

Yes, the error was due to lack of GPU memory.
> Is my GPU able to handle training of TTS models, or do I need a better machine with a better GPU to achieve good results on a TTS task?

4GB is really not enough for training TTS models, and requirements also depend heavily on the model you are interested in; Glow-TTS in particular is memory-hungry. Besides, the sentences in your dataset are long (max 823, avg 238 phones), and longer sentences consume more memory. You might be able to train a Tacotron2 with max_seq_len=150 and batch_size=10, but you may lose a lot of your sentences. A smaller batch_size is OK, but it is good to keep it above 10.
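To make the trade-off concrete, here is a hypothetical sketch (plain Python, with synthetic lengths; only the max/avg phone counts of 823/238 come from this discussion) of how a max_seq_len filter thins out a dataset:

```python
import random

# Fake phone-length distribution, roughly matching the reported stats
# (max 823, avg 238 phones). These numbers are made up for illustration.
random.seed(0)
lengths = [min(823, max(20, int(random.gauss(238, 120)))) for _ in range(1000)]

# Applying a max_seq_len filter, as one would to fit a 4GB card:
max_seq_len = 150
kept = [n for n in lengths if n <= max_seq_len]
print(f"kept {len(kept)} of {len(lengths)} utterances "
      f"({100 * len(kept) / len(lengths):.1f}%)")
```

With an average length well above the cutoff, most utterances are dropped, which is the "you may lose a lot of your sentences" warning in numbers.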
> If my GPU cannot handle this kind of task, what kind of GPU or machine can? What does the community normally use?

A GPU with more than 8GB of memory is enough, 11GB is better, and more than 16GB is preferable. I have trained many of my models on an Nvidia GTX 1080 Ti (11GB). I recently got an RTX 3090 (24GB): training is about 1.8x faster and there is less headache over batch_size. The RTX 3060 Ti (16GB, upcoming), 2080 Ti (12GB), A5000 (24GB), and A6000 (48GB) are also good choices, I think.
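As a rough back-of-envelope sketch of why parameter count alone eats into these memory budgets (my own illustrative numbers, not measurements of any particular model), assuming plain fp32 Adam training:

```python
# Static per-parameter training state under fp32 Adam:
# weight (4B) + gradient (4B) + Adam first moment (4B) + second moment (4B).
params = 30_000_000          # e.g. a Tacotron2-sized model, ~30M params (assumed)
bytes_per_param = 4          # fp32 weight
optimizer_overhead = 4 + 4 + 4  # gradient + Adam m + Adam v

static_bytes = params * (bytes_per_param + optimizer_overhead)
print(f"static training state: {static_bytes / 1024**3:.2f} GiB")
```

Activations come on top of this and grow with both batch_size and sequence length, which is why long sentences blow past a 4GB card long before the parameters themselves do.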
> I thought a poor GPU could handle this kind of task, only taking more time to complete it than a good GPU would. Am I correct?

You are correct. Memory size is the other bottleneck of a "poor" GPU when training a large deep-learning model (with millions of parameters).
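One workaround worth knowing about on a small card is gradient accumulation: stepping the optimizer once per several micro-batches approximates a larger effective batch_size at the cost of speed. Here is a framework-free toy sketch of the underlying identity, that averaging gradients over equal-sized micro-batches equals the full-batch gradient of a mean loss:

```python
def grad_mse_linear(w, xs, ys):
    # Gradient of mean squared error for a 1-parameter linear model:
    # d/dw mean((w*x - y)^2) = mean(2*(w*x - y)*x)
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Full-batch gradient in one shot
full = grad_mse_linear(w, xs, ys)

# Same gradient accumulated over two micro-batches of size 2, then averaged
acc = 0.0
micro = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
for mx, my in micro:
    acc += grad_mse_linear(w, mx, my)
acc /= len(micro)

assert abs(full - acc) < 1e-12  # identical gradient, half the peak batch size
```

In a real trainer this means calling `backward()` per micro-batch and the optimizer step only every N micro-batches; memory scales with the micro-batch, not the effective batch.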
-
The A4000 (16GB) is also good for the price compared to other cards carrying the same amount of memory.
-
@loganhart420 how is your experience with the A4000? Looking at the specs and price, it seems like a good card for the money, with 140W power consumption.
-
Just an extra comment to the above discussion: many of the TTS training recipes include the
I don't think this was the OP's issue, as I don't see them using it.