
fix softmax nan problem #598

Merged
nudles merged 2 commits into apache:dev from XJDKC:SoftMax_Algorithm
Feb 13, 2020

Conversation

@XJDKC
Member

XJDKC commented Feb 12, 2020

Here I change the algorithm used by the cudnnSoftmaxForward function.

The original algorithm was CUDNN_SOFTMAX_FAST, which can overflow when the input values are too large. Changing it to CUDNN_SOFTMAX_ACCURATE solves this problem.

For example, if the cudnn softmax is used in mlp.py, it triggers this problem.
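To illustrate the difference (a minimal NumPy sketch, not SINGA code): CUDNN_SOFTMAX_FAST exponentiates the inputs directly, so a large input makes exp() overflow to inf and the normalization produces inf/inf = nan, while CUDNN_SOFTMAX_ACCURATE first subtracts the per-row maximum so every exponent is at most 0.

```python
import numpy as np

def softmax_fast(x):
    # Analogous to CUDNN_SOFTMAX_FAST: exponentiate directly.
    # For large x, np.exp overflows to inf and inf/inf gives nan.
    with np.errstate(over="ignore", invalid="ignore"):
        e = np.exp(x)
        return e / e.sum()

def softmax_accurate(x):
    # Analogous to CUDNN_SOFTMAX_ACCURATE: subtract the max first,
    # so every exponent is <= 0 and cannot overflow.
    e = np.exp(x - x.max())
    return e / e.sum()

x = np.array([1000.0, 1001.0, 1002.0])
print(softmax_fast(x))      # all nan: exp(1000) overflows to inf
print(softmax_accurate(x))  # finite probabilities summing to 1
```

Subtracting the maximum leaves the result mathematically unchanged, since softmax is invariant to adding a constant to all inputs; only the floating-point behavior differs.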

@chrishkchris
Contributor

I strongly support this change for two reasons:

  1. The current setting fails on some network structures/datasets, which makes the function not universal across applications.
  2. Softmax forward takes only a very small fraction of CNN training time; e.g. ResNet-50 has many conv layers but only one softmax layer, so the time spent in softmax is not the bottleneck.

@chrishkchris
Contributor

Sorry, could you also change CUDNN_SOFTMAX_FAST to CUDNN_SOFTMAX_ACCURATE in SoftMaxbackward in tensor_math_cuda.cc? I am not sure whether both need to match.

@chrishkchris
Contributor

Thanks. I think it is ready to merge.

@XJDKC
Member Author

XJDKC commented Feb 12, 2020

Thank you for the reminder.

@nudles nudles merged commit bc5df6e into apache:dev Feb 13, 2020
@XJDKC XJDKC deleted the SoftMax_Algorithm branch February 13, 2020 17:16
@XJDKC XJDKC restored the SoftMax_Algorithm branch February 13, 2020 17:17
Labels: none. Projects: none. 3 participants.