New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA + UI + linux: incorrect scores reported #3026

Closed
AlexDBlack opened this Issue Mar 13, 2017 · 4 comments

Comments

Projects
None yet
4 participants
@AlexDBlack
Copy link
Member

AlexDBlack commented Mar 13, 2017

Running the UI example below with CUDA results in the wrong score being reported.
https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/userInterface/UIExample.java

(Tested with 0.8.0 RC)

The network actually learns correctly, and gradient calculations etc seem fine.
Note that this isn't just a UI issue - the score reported by the score iteration listener is also incorrect, when the UI is added.
Remove the UI and the reported scores are fine.
UI + native is fine. CUDA + UI + windows is fine.

UI example with UI:
o.d.o.l.ScoreIterationListener - Score at iteration 0 is 2.3099706480672184
o.d.o.l.ScoreIterationListener - Score at iteration 1 is 57.59049713113546
o.d.o.l.ScoreIterationListener - Score at iteration 2 is 43.5893727864821
o.d.o.l.ScoreIterationListener - Score at iteration 3 is 57.147427348591776
o.d.o.l.ScoreIterationListener - Score at iteration 4 is 2.301530882267607
o.d.o.l.ScoreIterationListener - Score at iteration 5 is 45.450468038118764
o.d.o.l.ScoreIterationListener - Score at iteration 6 is 2.273451054067395
o.d.o.l.ScoreIterationListener - Score at iteration 7 is 6.671467979763633
o.d.o.l.ScoreIterationListener - Score at iteration 8 is 35.249274251986726
o.d.o.l.ScoreIterationListener - Score at iteration 9 is 23.989292983265166
o.d.o.l.ScoreIterationListener - Score at iteration 10 is 2.2144895093555252
o.d.o.l.ScoreIterationListener - Score at iteration 11 is 10.462500661294342
o.d.o.l.ScoreIterationListener - Score at iteration 12 is 33.02288713427966
o.d.o.l.ScoreIterationListener - Score at iteration 13 is 51.283211913055055
o.d.o.l.ScoreIterationListener - Score at iteration 14 is 2.1723118761399474
o.d.o.l.ScoreIterationListener - Score at iteration 15 is 2.1423628905991228
o.d.o.l.ScoreIterationListener - Score at iteration 16 is 80.28157114939405

UI example without UI:
o.d.o.l.ScoreIterationListener - Score at iteration 0 is 2.3099706480672184
o.d.o.l.ScoreIterationListener - Score at iteration 1 is 2.32872985003214
o.d.o.l.ScoreIterationListener - Score at iteration 2 is 2.3161250741083754
o.d.o.l.ScoreIterationListener - Score at iteration 3 is 2.300891257900553
o.d.o.l.ScoreIterationListener - Score at iteration 4 is 2.301530882267607
o.d.o.l.ScoreIterationListener - Score at iteration 5 is 2.274193996033132
o.d.o.l.ScoreIterationListener - Score at iteration 6 is 2.273451054067395
o.d.o.l.ScoreIterationListener - Score at iteration 7 is 2.2462024168840666
o.d.o.l.ScoreIterationListener - Score at iteration 8 is 2.2303728638152496
o.d.o.l.ScoreIterationListener - Score at iteration 9 is 2.2329697559935116
o.d.o.l.ScoreIterationListener - Score at iteration 10 is 2.2144895093555252
o.d.o.l.ScoreIterationListener - Score at iteration 11 is 2.1957475090079637
o.d.o.l.ScoreIterationListener - Score at iteration 12 is 2.1831812955429855
o.d.o.l.ScoreIterationListener - Score at iteration 13 is 2.1686362385216014
o.d.o.l.ScoreIterationListener - Score at iteration 14 is 2.1723118761399474
o.d.o.l.ScoreIterationListener - Score at iteration 15 is 2.1423628905991228
o.d.o.l.ScoreIterationListener - Score at iteration 16 is 2.1302781722290822

screenshot from 2017-03-13 15-44-56

@AlexDBlack AlexDBlack added the Bug label Mar 13, 2017

@crockpotveggies

This comment has been minimized.

Copy link
Collaborator

crockpotveggies commented Mar 28, 2017

Is cuDNN a factor in this issue? or is it just nd4j-cuda?

@AlexDBlack

This comment has been minimized.

Copy link
Member

AlexDBlack commented Mar 28, 2017

CuDNN wasn't used (as far as I recall) when I collected/posted those results.

@AlexDBlack

This comment has been minimized.

Copy link
Member

AlexDBlack commented Apr 26, 2017

Update: this seems to be fixed on current master - I'm unable to reproduce it as per the original issue.

(I suspected I was running into it again on another model, but this doesn't appear to be the case).

@AlexDBlack AlexDBlack closed this Apr 26, 2017

@lock

This comment has been minimized.

Copy link

lock bot commented Sep 29, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Sep 29, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.