
hotfix: cpp softmax not impl #572

Merged: 3 commits into apache:master on Jan 18, 2020
Conversation

@dcslin (Member) commented Jan 13, 2020

Moved the softmax(..., axis) implementation one level up, from cuda to tensor. After the change, the softmax(..., axis) implementation can serve both cuda and cpp. Tests are added accordingly.
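
As a rough illustration of what "moving the implementation one level up" means, here is a minimal NumPy sketch (not the actual SINGA C++ code) of an axis-aware softmax written as a generic tensor operation, so that one code path can serve both the cpp and cuda backends:

import numpy as np

def softmax(x, axis=1):
    # Flatten the tensor to 2D around `axis`, run a row-wise softmax,
    # then restore the original shape. Doing this at the generic tensor
    # level is the idea behind the PR: the same logic can back both
    # the cpp and cuda devices.
    x = np.asarray(x, dtype=np.float32)
    moved = np.moveaxis(x, axis, -1)
    flat = moved.reshape(-1, moved.shape[-1])
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(flat - flat.max(axis=1, keepdims=True))
    out = e / e.sum(axis=1, keepdims=True)
    return np.moveaxis(out.reshape(moved.shape), -1, axis)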

@dcslin (Member, Author) commented Jan 13, 2020

Hi @chrishkchris, thank you for pointing this out. Could you please help test it on your side?

@chrishkchris (Contributor) commented:

Thanks. I will test it soon.

@chrishkchris (Contributor) commented Jan 13, 2020

@dcslin I tested examples/autograd/mlp.py (multilayer perceptron) and got the following error:

ubuntu@ip-172-31-26-47:~/singa/examples/autograd$ python3 mlp.py
train_data_shape: (400, 2)
train_label_shape: (400, 2)
training loss =  6.682101
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0113 09:20:05.180658 12032 tensor_math_cpp.h:357] Check failed: a > 0.f (-nan vs. 0)
*** Check failure stack trace: ***
Aborted (core dumped)

However, if I use softmax_cross_entropy instead of softmax + cross_entropy, it works fine, i.e.
loss = autograd.softmax_cross_entropy(x, target)
(note: softmax_cross_entropy does not use axis)
The good output is:

ubuntu@ip-172-31-26-47:~/singa/examples/autograd$ python3 mlp.py
train_data_shape: (400, 2)
train_label_shape: (400, 2)
training loss =  0.6908062
training loss =  0.5960194
training loss =  0.57797414
training loss =  0.55334115
training loss =  0.48568404
training loss =  0.38458923
training loss =  0.30776194
training loss =  0.24188559
training loss =  0.18657134
training loss =  0.15864176
training loss =  0.13929243
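
For reference, a minimal NumPy sketch (not the SINGA code) of why the separate softmax + cross_entropy path can abort while the fused op does not: in float32 the softmax output can underflow to exactly 0, so a subsequent log() produces -inf/nan, which is consistent with the "Check failed: a > 0.f" message above; a fused log-softmax never materializes that zero.

import numpy as np

x = np.array([0.0, 200.0], dtype=np.float32)

# Separate softmax, then log (what softmax followed by cross_entropy does):
p = np.exp(x - x.max())
p /= p.sum()
print(p)          # [0. 1.] -- the small probability underflows to exactly 0
print(np.log(p))  # [-inf 0.] -- log(0) is what trips the a > 0 check

# Fused log-softmax (what softmax_cross_entropy effectively computes):
# log p_i = x_i - max(x) - log(sum_j exp(x_j - max(x))), which stays finite.
log_p = x - x.max() - np.log(np.exp(x - x.max()).sum())
print(log_p)      # [-200.    0.] -- no -inf, no nan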

@chrishkchris (Contributor) commented Jan 13, 2020

@dcslin Another observation: in mnist_cnn.py, if I change softmax_cross_entropy to softmax + cross_entropy, it fails no matter which axis is selected for SoftMax:

ubuntu@ip-172-31-26-47:~/singa/examples/autograd$ python3 mnist_cnn.py
Starting Epoch 0:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0113 09:57:47.470569 14016 tensor_math_cpp.h:357] Check failed: a > 0.f (0 vs. 0)
*** Check failure stack trace: ***
Aborted (core dumped)

@chrishkchris (Contributor) commented:

@dcslin I used a simple test script to compare with pytorch when the input vector is of size [1, 5]. The output is the same as pytorch's, so I suppose the difference may come from cross_entropy.

  1. singa
    ubuntu@ip-172-31-26-47:~$ python3 softmaxtest.py
    x = [1. 3. 5. 7. 9.]
    dy = [1. 3. 5. 7. 9.]
    y = [[2.9007587e-04 2.1433870e-03 1.5837606e-02 1.1702495e-01 8.6470395e-01]]
    dx = [[-0.00222993 -0.01219034 -0.05839987 -0.19747007 0.27029037]]

  2. pytorch
    (pytorch_p36) ubuntu@ip-172-31-16-63:~/pytorch_ops_test$ python softmaxtest.py
    x = [[1. 3. 5. 7. 9.]]
    dy = [[1. 3. 5. 7. 9.]]
    y = [2.9007587e-04 2.1433870e-03 1.5837606e-02 1.1702495e-01 8.6470395e-01]
    dx = [-0.00222993 -0.01219034 -0.05839989 -0.19747011 0.27029008]
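
The content of softmaxtest.py is not shown in this thread; a hypothetical PyTorch snippet that reproduces the numbers above (same input and same upstream gradient dy) could look like this:

# Hypothetical reconstruction -- the actual softmaxtest.py is not shown here.
import torch

x = torch.tensor([[1., 3., 5., 7., 9.]], requires_grad=True)
dy = torch.tensor([[1., 3., 5., 7., 9.]])

y = torch.softmax(x, dim=1)
y.backward(dy)   # back-propagate the upstream gradient dy through softmax

print("x  =", x.detach().numpy())
print("dy =", dy.numpy())
print("y  =", y.detach().numpy())
print("dx =", x.grad.numpy())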

@chrishkchris (Contributor) commented:

To sum up, this PR fixes the cpp SoftMax not-implemented error. After checking the result against pytorch, it is okay, so this fix is ready for merge.
Some future work may be to check cross_entropy.
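
For the suggested follow-up on cross_entropy, one quick sanity check (a sketch only, not tied to the SINGA API) is to verify numerically that chaining the softmax gradient with the cross-entropy gradient reduces to p - t, which is exactly what the fused softmax_cross_entropy backward computes:

import numpy as np

np.random.seed(0)
z = np.random.randn(4, 3).astype(np.float32)        # logits
t = np.eye(3, dtype=np.float32)[[0, 2, 1, 1]]       # one-hot targets

p = np.exp(z - z.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)                   # softmax probabilities

# Chain rule: for L = -sum(t * log p), dL/dp = -t / p, and pushing that
# through the softmax Jacobian gives p * (g - sum(g * p)) = p - t.
g = -t / p
dz_chained = p * (g - (g * p).sum(axis=1, keepdims=True))

dz_fused = p - t                                     # fused softmax + cross-entropy gradient

print(np.allclose(dz_chained, dz_fused, atol=1e-6))  # True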

@dcslin (Member, Author) commented Jan 14, 2020

Hi @chrishkchris, thank you for the fix on mlp.py.

@nudles merged commit 7e72548 into apache:master on Jan 18, 2020