I've been running training on a set of audio files and am wondering how I should assess how training is going.
After about 24 hours, I'm at about 13,000 epochs. I'm not sure how to interpret the TensorBoard visualizations; any pointers would be very much appreciated.
/content/drive/MyDrive/RAVE_COLLAB
Recursive search in /content/drive/MyDrive/RAVE_COLLAB/resampled/parbass/
audio_00158_00000.wav: 100% 159/159 [00:04<00:00, 33.67it/s]
/content/miniconda/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Restoring states from the checkpoint path at /content/drive/MyDrive/RAVE_COLLAB/runs/parbass/rave/version_2/checkpoints/last-v1.ckpt
/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:342: UserWarning: The dirpath has changed from 'runs/parbass/rave/version_2/checkpoints' to 'runs/parbass/rave/version_3/checkpoints', therefore `best_model_score`, `kth_best_model_path`, `kth_value`, `last_model_path` and `best_k_models` won't be reloaded. Only `best_model_path` will be reloaded.
warnings.warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
------------------------------------------------------
0 | pqmf | CachedPQMF | 4.2 K
1 | loudness | Loudness | 0
2 | encoder | Encoder | 4.8 M
3 | decoder | Generator | 12.8 M
4 | discriminator | StackDiscriminators | 16.9 M
------------------------------------------------------
34.5 M Trainable params
0 Non-trainable params
34.5 M Total params
138.092 Total estimated model params size (MB)
Restored all states from the checkpoint file at /content/drive/MyDrive/RAVE_COLLAB/runs/parbass/rave/version_2/checkpoints/last-v1.ckpt
/content/miniconda/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py:1933: PossibleUserWarning: The number of training batches (19) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Epoch 11571: 0% 0/20 [00:00<00:00, -106397.56it/s]/content/miniconda/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Epoch 12623: 95% 19/20 [00:04<?, ?it/s, v_num=3]
Validation: 0it [00:00, ?it/s]
Validation: 0% 0/1 [00:00<?, ?it/s]
Validation DataLoader 0: 0% 0/1 [00:00<?, ?it/s]
Epoch 12623: 100% 20/20 [00:04<00:00, 4.59s/it, v_num=3]
Epoch 12624: 0% 0/19 [00:00<00:00, -111926.65it/s, v_num=3]/content/miniconda/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Epoch 12706: 63% 12/19 [00:03<-1:59:57, -2.16it/s, v_num=3]
In my experience, training progress is measured by the number of steps rather than epochs, at least as of the last time I trained, which was several months ago. The default was 6,000,000 steps, which can be changed by setting the appropriate flag on the `rave train` command.
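If steps are the budget, you can do a back-of-the-envelope progress estimate from the log above: steps ≈ epochs × batches per epoch. This is just a sketch using numbers that appear in this thread (≈19 batches per epoch, ≈13,000 epochs, and the 6,000,000-step default mentioned above), not an official RAVE utility:

```python
# Rough progress estimate: total steps = epochs completed * batches per epoch.
# All three numbers below are taken from this thread's log and comments,
# so adjust them to match your own run.
batches_per_epoch = 19        # "19/19" shown in the progress bars above
epochs_completed = 13_000     # roughly where this run is after ~24h
default_max_steps = 6_000_000 # default step budget mentioned above

steps_completed = epochs_completed * batches_per_epoch
progress = steps_completed / default_max_steps

print(f"{steps_completed:,} steps completed, "
      f"about {progress:.1%} of the default step budget")
```

So despite the large epoch count, a run like this is only a few percent of the way through the default step budget, which is why it can feel like training is taking a long time.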