Spark evaluation NPEs #3970
Hi! I'm experiencing the same issue in the evaluation phase, using DL4J 0.9.1 with Spark 2.1.
Is there any workaround I could try to make it work? I tried running it with a different number of workers, with no success. Thanks!
@GregaVrbancic The only thing I can suggest: take the code from here (lines 596 to 600 in 802ea94) and adapt it in your project to use the fix from this PR (i.e., the modified IEvaluationReduceFunction), roughly as sketched below.
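A rough sketch of what that adaptation might look like. Everything here is hypothetical scaffolding, not DL4J's actual code: `EvaluationMergeWorkaround` and `mergeEvaluations` are made-up names, and `partitionEvals` stands in for the `JavaRDD<Evaluation[]>` of per-partition results that the linked evaluation code produces.

```java
import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.eval.Evaluation;

public class EvaluationMergeWorkaround {

    // Hypothetical helper: combine per-partition Evaluation arrays with a
    // null-tolerant merge, in place of the stock IEvaluationReduceFunction step.
    public static Evaluation[] mergeEvaluations(JavaRDD<Evaluation[]> partitionEvals) {
        return partitionEvals.reduce((a, b) -> {
            if (a == null) return b;   // left side came from an empty partition
            if (b == null) return a;   // right side came from an empty partition
            for (int i = 0; i < a.length; i++) {
                a[i].merge(b[i]);      // accumulate b's statistics into a
            }
            return a;
        });
    }
}
```

Note that `JavaRDD.reduce` skips empty partitions on its own, so the null checks are belt-and-braces in this exact form; they matter more in aggregation paths where a null accumulator can reach the merge function.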
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Under some rare circumstances, Spark evaluation can lead to an NPE here:
https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-scaleout/spark/dl4j-spark/src/main/java/org/deeplearning4j/spark/impl/multilayer/evaluation/IEvaluationReduceFunction.java#L19-L21
I'm not totally sure of the cause here; I suspect it can occur with empty partitions, or with fewer partitions than executors (10 objects on 16 workers causes an NPE, but 10 objects on 8 workers doesn't). Either way, a defensive null check in that merge function should help (see the sketch below).
Tested with Spark 2.1.0.
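For illustration, a minimal sketch of such a defensive check, assuming the merge function keeps the shape shown at the link above (a Spark `Function2` that merges two arrays of evaluation results); the class name here is hypothetical:

```java
import org.apache.spark.api.java.function.Function2;
import org.deeplearning4j.eval.IEvaluation;

// Sketch of a null-tolerant variant of the linked merge function. If one
// side is null (e.g. the result for an empty partition), return the other
// side unchanged instead of dereferencing null.
public class NullSafeEvaluationReduceFunction<T extends IEvaluation>
        implements Function2<T[], T[], T[]> {
    @Override
    public T[] call(T[] eval1, T[] eval2) throws Exception {
        if (eval1 == null) return eval2;
        if (eval2 == null) return eval1;
        for (int i = 0; i < eval1.length; i++) {
            eval1[i].merge(eval2[i]);  // merge the i-th evaluation's statistics
        }
        return eval1;
    }
}
```

Returning the non-null side unchanged is safe here, since merging with a missing result should be a no-op for the accumulated statistics.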