
Spark evaluation NPEs #3970

Closed
AlexDBlack opened this issue Aug 29, 2017 · 4 comments
Labels
Bug Bugs and problems

Comments

@AlexDBlack
Contributor

Under some rare circumstances, Spark evaluation can lead to an NPE here:
https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-scaleout/spark/dl4j-spark/src/main/java/org/deeplearning4j/spark/impl/multilayer/evaluation/IEvaluationReduceFunction.java#L19-L21

I'm not totally sure of the cause here. I suspect this can occur with empty partitions, or with fewer partitions than executors (10 objects on 16 workers causes an NPE, but 10 objects on 8 workers doesn't). Either way, a defensive null check in that merge function should help.
Tested with Spark 2.1.0.

Caused by: java.lang.NullPointerException
	at org.deeplearning4j.spark.impl.multilayer.evaluation.IEvaluationReduceFunction.call(IEvaluationReduceFunction.java:22)
	at org.deeplearning4j.spark.impl.multilayer.evaluation.IEvaluationReduceFunction.call(IEvaluationReduceFunction.java:16)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction2$1.apply(JavaPairRDD.scala:1037)
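A defensive version of that merge could look like the sketch below. `SimpleEval` and its `merge` method are hypothetical stand-ins for DL4J's `IEvaluation` implementations; only the null-check structure is the point here, not the actual DL4J API.

```java
// Hypothetical stand-in for an IEvaluation implementation:
// it just accumulates a count so the merge is observable.
class SimpleEval {
    int count;
    SimpleEval(int count) { this.count = count; }
    void merge(SimpleEval other) { this.count += other.count; }
}

public class NullSafeReduce {
    // Null-safe merge of two evaluation arrays, sketching the
    // defensive check proposed for IEvaluationReduceFunction.call().
    static SimpleEval[] reduce(SimpleEval[] a, SimpleEval[] b) {
        if (a == null) return b;   // left side came from an empty partition
        if (b == null) return a;   // right side came from an empty partition
        for (int i = 0; i < a.length; i++) {
            a[i].merge(b[i]);
        }
        return a;
    }

    public static void main(String[] args) {
        SimpleEval[] left = { new SimpleEval(2) };
        SimpleEval[] right = { new SimpleEval(3) };
        // First call simulates combining with a null zero value.
        SimpleEval[] merged = reduce(reduce(null, left), right);
        System.out.println("merged count: " + merged[0].count);
    }
}
```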
@AlexDBlack AlexDBlack added the Bug Bugs and problems label Aug 29, 2017
@huitseeker huitseeker self-assigned this Aug 31, 2017
@GregaVrbancic

Hi! I'm experiencing the same issue in the evaluation phase, using DL4J 0.9.1 with Spark 2.1:

Caused by: java.lang.NullPointerException
	at org.deeplearning4j.spark.impl.multilayer.evaluation.IEvaluationReduceFunction.call(IEvaluationReduceFunction.java:19)
	at org.deeplearning4j.spark.impl.multilayer.evaluation.IEvaluationReduceFunction.call(IEvaluationReduceFunction.java:13)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction2$1.apply(JavaPairRDD.scala:1037)

Is there any workaround I could try to make it work? I tried running it with a different number of workers, with no success.

Thanks!

@AlexDBlack
Contributor Author

@GregaVrbancic the only thing I can suggest is to take the code from here:

	public <T extends IEvaluation> T[] doEvaluation(JavaRDD<DataSet> data, int evalBatchSize, T... emptyEvaluations) {
		IEvaluateFlatMapFunction<T> evalFn = new IEvaluateFlatMapFunction<>(false, sc.broadcast(conf.toJson()),
				sc.broadcast(network.params()), evalBatchSize, emptyEvaluations);
		JavaRDD<T[]> evaluations = data.mapPartitions(evalFn);
		return evaluations.treeAggregate(null, new IEvaluateAggregateFunction<T>(), new IEvaluationReduceFunction<T>());
	}

and adapt it in your project to use the fix from this PR (i.e., the modified IEvaluationReduceFunction):
#4394
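For context on why the merge sees nulls at all: the `null` zero value passed to `treeAggregate` is plausibly handed back unchanged by partitions that contain no data, so the combiner can receive `null` on one side. The following plain-Java simulation of that aggregate pattern (no Spark involved; the partition/combine structure is an assumption, not Spark's actual implementation) shows the effect:

```java
import java.util.Arrays;
import java.util.List;

public class NullZeroValueDemo {
    public static void main(String[] args) {
        // Two non-empty partitions plus one empty one, as might happen
        // with fewer objects than executors.
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.<Integer>asList());

        Integer zero = null; // mirrors treeAggregate(null, ...)
        Integer result = zero;
        for (List<Integer> partition : partitions) {
            // seqOp: fold each partition into the zero value.
            Integer partial = zero;
            for (Integer x : partition) {
                partial = (partial == null) ? x : partial + x;
            }
            // combOp: an empty partition still hands back null here,
            // which is why the combiner must be null-safe.
            if (partial == null) {
                System.out.println("combiner saw null");
                continue;
            }
            result = (result == null) ? partial : result + partial;
        }
        System.out.println("total: " + result);
    }
}
```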

@AlexDBlack
Contributor Author

#4394

@lock

lock bot commented Sep 23, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot unassigned huitseeker Sep 23, 2018
@lock lock bot locked and limited conversation to collaborators Sep 23, 2018