Are my val_loss values valid? #69

Closed
steven8274 opened this issue Apr 12, 2023 · 2 comments

@steven8274

Hi Nils, thanks for your great work on deep noise suppression. I ran into a training problem that confuses me.
I followed the training steps in 'README.md' to train the DTLN model, but the val_loss values I get after each epoch are always positive numbers around 45. All the val_loss values people have reported here are negative numbers around -16. Am I doing something wrong? I set the training and validation set file paths as follows:

path_to_train_mix = '/home/xxx/DNS-Challenge/training_set/train/noisy'
path_to_train_speech = '/home/xxx/DNS-Challenge/training_set/train/clean'
path_to_val_mix = '/home/xxx/DNS-Challenge/training_set/val/noisy'
path_to_val_speech = '/home/xxx/DNS-Challenge/training_set/val/clean'

My training logs:

None
Epoch 1/200
2023-04-12 15:02:54.045877: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-04-12 15:03:46.368859: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2023-04-12 15:05:01.907399: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2023-04-12 15:10:02.601697: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 65280, output: ptxas fatal   : Value 'sm_86' is not defined for option 'gpu-name'

Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
3000/3000 [==============================] - ETA: 0s - loss: 0.0015       
Epoch 00001: val_loss improved from inf to 45.35575, saving model to ./models_DTLN_model/DTLN_model.h5
3000/3000 [==============================] - 1332s 444ms/step - loss: 0.0015 - val_loss: 45.3558 - lr: 0.0010
Epoch 2/200
3000/3000 [==============================] - ETA: 0s - loss: 0.0049   
Epoch 00002: val_loss did not improve from 45.35575
3000/3000 [==============================] - 1333s 444ms/step - loss: 0.0049 - val_loss: 45.4326 - lr: 0.0010
Epoch 3/200
3000/3000 [==============================] - ETA: 0s - loss: 0.0143   
Epoch 00003: val_loss did not improve from 45.35575
3000/3000 [==============================] - 1336s 445ms/step - loss: 0.0143 - val_loss: 45.4434 - lr: 0.0010
Epoch 4/200
3000/3000 [==============================] - ETA: 0s - loss: 0.0434   
Epoch 00004: val_loss improved from 45.35575 to 42.06635, saving model to ./models_DTLN_model/DTLN_model.h5
3000/3000 [==============================] - 1329s 443ms/step - loss: 0.0434 - val_loss: 42.0663 - lr: 0.0010
Epoch 5/200
3000/3000 [==============================] - ETA: 0s - loss: 0.0482   
Epoch 00005: val_loss did not improve from 42.06635
3000/3000 [==============================] - 1332s 444ms/step - loss: 0.0482 - val_loss: 43.5876 - lr: 0.0010
Epoch 6/200
3000/3000 [==============================] - ETA: 0s - loss: 0.0778   
Epoch 00006: val_loss did not improve from 42.06635
3000/3000 [==============================] - 1329s 443ms/step - loss: 0.0778 - val_loss: 46.5396 - lr: 0.0010
Epoch 7/200
3000/3000 [==============================] - ETA: 0s - loss: 0.0847   
Epoch 00007: val_loss did not improve from 42.06635
3000/3000 [==============================] - 1328s 443ms/step - loss: 0.0847 - val_loss: 46.8308 - lr: 0.0010
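
For context on why negative values are the expected ones: as far as I understand it, DTLN is trained with a negative signal-to-noise ratio (SNR) in dB, so a val_loss of -16 would correspond to an SNR of about 16 dB. The sketch below is a plain-Python illustration of that kind of loss, not the repository's actual implementation; the function name `negative_snr_db`, the `eps` value, and the test signals are assumptions made for the example.

```python
import math


def negative_snr_db(clean, estimate, eps=1e-7):
    """Negative SNR in dB between a clean signal and its estimate.

    Illustrative only: the name and the eps value are assumptions.
    """
    n = len(clean)
    # Mean power of the clean target signal.
    signal_power = sum(s * s for s in clean) / n
    # Mean power of the residual error (clean minus estimate).
    error_power = sum((s - e) ** 2 for s, e in zip(clean, estimate)) / n
    # Higher SNR is better, so the loss is the negated SNR.
    return -10.0 * math.log10(signal_power / (error_power + eps))


# One second of a 440 Hz sine sampled at 16 kHz as a stand-in for speech.
clean = [math.sin(2 * math.pi * 440 * k / 16000) for k in range(16000)]

# A close estimate (10 % amplitude error) gives a negative loss.
close_estimate = [0.9 * s for s in clean]

# A badly wrong estimate (inverted sign) gives a positive loss.
wrong_estimate = [-s for s in clean]

print(round(negative_snr_db(clean, close_estimate), 2))
print(round(negative_snr_db(clean, wrong_estimate), 2))
```

Under this reading, a positive loss around 45 would mean the error power is far larger than the speech power itself, which points to something being broken in the computation rather than to a model that is merely undertrained.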
@steven8274
Author

2023-04-12 15:10:02.601697: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 65280, output: ptxas fatal   : Value 'sm_86' is not defined for option 'gpu-name'

Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.

Does this error invalidate the training run?

@steven8274
Author

That's the reason!
I was using an RTX 3060 Ti GPU, which is not compatible with CUDA 10.1. After switching to CUDA 11.2 and TensorFlow 2.5.0, the val_loss is now negative.
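
The mismatch described above can be checked before training starts. The RTX 3060 Ti has compute capability 8.6 (the `sm_86` in the ptxas message), and toolkit support for that target arrived with CUDA 11.1. Below is a minimal sketch of such a check, assuming a small hand-written lookup table; the names `MIN_CUDA_FOR_COMPUTE_CAPABILITY`, `parse_version`, and `cuda_supports_gpu` are hypothetical helpers, not part of TensorFlow or the DTLN repository.

```python
# Earliest CUDA toolkit release that can compile for each compute capability.
# Only a few common architectures are listed; extend as needed.
MIN_CUDA_FOR_COMPUTE_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series, e.g. RTX 3060 Ti)
}


def parse_version(text):
    """Turn a version string such as '10.1' into the tuple (10, 1)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_supports_gpu(cuda_version, compute_capability):
    """Return True if the CUDA toolkit version can target the given GPU.

    Raises KeyError for compute capabilities missing from the table.
    """
    required = MIN_CUDA_FOR_COMPUTE_CAPABILITY[compute_capability]
    return parse_version(cuda_version) >= required


# The two setups from this thread, both with an sm_86 GPU.
print(cuda_supports_gpu("10.1", (8, 6)))  # original setup
print(cuda_supports_gpu("11.2", (8, 6)))  # setup after the upgrade
```

In recent TensorFlow 2.x GPU builds, the inputs can likely be read at runtime from `tf.sysconfig.get_build_info()["cuda_version"]` and from the `compute_capability` entry of `tf.config.experimental.get_device_details(gpu)`. Treat that as a suggestion to verify against the installed version, since older releases such as the one used with CUDA 10.1 may not provide these calls.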
