
Spark evaluation NPEs #3970

Closed
AlexDBlack opened this issue Aug 29, 2017 · 4 comments
Labels
Bug Bugs and problems

Comments

@AlexDBlack
Contributor

Under some rare circumstances, Spark evaluation can lead to an NPE here:
https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-scaleout/spark/dl4j-spark/src/main/java/org/deeplearning4j/spark/impl/multilayer/evaluation/IEvaluationReduceFunction.java#L19-L21

I'm not totally sure of the cause here. I suspect this can occur with empty partitions, or with fewer partitions than executors (10 objects on 16 workers causes an NPE, but 10 objects on 8 workers doesn't). Either way, a defensive null check in that merge function should help.
Tested with Spark 2.1.0.

Caused by: java.lang.NullPointerException
	at org.deeplearning4j.spark.impl.multilayer.evaluation.IEvaluationReduceFunction.call(IEvaluationReduceFunction.java:22)
	at org.deeplearning4j.spark.impl.multilayer.evaluation.IEvaluationReduceFunction.call(IEvaluationReduceFunction.java:16)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction2$1.apply(JavaPairRDD.scala:1037)
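A defensive version of that merge could look like the sketch below. `SimpleEval` and its `merge` method are hypothetical stand-ins for DL4J's `IEvaluation` implementations; only the null-check structure is the point here, not the actual DL4J API.

```java
// Hypothetical stand-in for an IEvaluation implementation:
// it just accumulates a count so the merge is observable.
class SimpleEval {
    int count;
    SimpleEval(int count) { this.count = count; }
    void merge(SimpleEval other) { this.count += other.count; }
}

public class NullSafeReduce {
    // Null-safe merge of two evaluation arrays, sketching the
    // defensive check proposed for IEvaluationReduceFunction.call().
    static SimpleEval[] reduce(SimpleEval[] a, SimpleEval[] b) {
        if (a == null) return b;   // left side came from an empty partition
        if (b == null) return a;   // right side came from an empty partition
        for (int i = 0; i < a.length; i++) {
            a[i].merge(b[i]);
        }
        return a;
    }

    public static void main(String[] args) {
        SimpleEval[] left = { new SimpleEval(2) };
        SimpleEval[] right = { new SimpleEval(3) };
        // First call simulates combining with a null zero value.
        SimpleEval[] merged = reduce(reduce(null, left), right);
        System.out.println("merged count: " + merged[0].count);
    }
}
```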
@AlexDBlack AlexDBlack added the Bug Bugs and problems label Aug 29, 2017
@huitseeker huitseeker self-assigned this Aug 31, 2017
@GregaVrbancic

Hi! I'm experiencing the same issue in the evaluation phase, using DL4J 0.9.1 with Spark 2.1:

Caused by: java.lang.NullPointerException
	at org.deeplearning4j.spark.impl.multilayer.evaluation.IEvaluationReduceFunction.call(IEvaluationReduceFunction.java:19)
	at org.deeplearning4j.spark.impl.multilayer.evaluation.IEvaluationReduceFunction.call(IEvaluationReduceFunction.java:13)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction2$1.apply(JavaPairRDD.scala:1037)

Is there any workaround I could try to make it work? I tried running it with a different number of workers, with no success.

Thanks!

@AlexDBlack
Contributor Author

@GregaVrbancic the only thing I can suggest is to take the code from here:

	public <T extends IEvaluation> T[] doEvaluation(JavaRDD<DataSet> data, int evalBatchSize, T... emptyEvaluations) {
		IEvaluateFlatMapFunction<T> evalFn = new IEvaluateFlatMapFunction<>(false, sc.broadcast(conf.toJson()),
				sc.broadcast(network.params()), evalBatchSize, emptyEvaluations);
		JavaRDD<T[]> evaluations = data.mapPartitions(evalFn);
		return evaluations.treeAggregate(null, new IEvaluateAggregateFunction<T>(), new IEvaluationReduceFunction<T>());
	}

and adapt it in your project to use the fix from this PR (i.e., the modified IEvaluationReduceFunction):
#4394
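For context on why the merge sees nulls at all: the `null` zero value passed to `treeAggregate` is plausibly handed back unchanged by partitions that contain no data, so the combiner can receive `null` on one side. The following plain-Java simulation of that aggregate pattern (no Spark involved; the partition/combine structure is an assumption, not Spark's actual implementation) shows the effect:

```java
import java.util.Arrays;
import java.util.List;

public class NullZeroValueDemo {
    public static void main(String[] args) {
        // Two non-empty partitions plus one empty one, as might happen
        // with fewer objects than executors.
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.<Integer>asList());

        Integer zero = null; // mirrors treeAggregate(null, ...)
        Integer result = zero;
        for (List<Integer> partition : partitions) {
            // seqOp: fold each partition into the zero value.
            Integer partial = zero;
            for (Integer x : partition) {
                partial = (partial == null) ? x : partial + x;
            }
            // combOp: an empty partition still hands back null here,
            // which is why the combiner must be null-safe.
            if (partial == null) {
                System.out.println("combiner saw null");
                continue;
            }
            result = (result == null) ? partial : result + partial;
        }
        System.out.println("total: " + result);
    }
}
```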

@AlexDBlack
Contributor Author

#4394

@lock

lock bot commented Sep 23, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot unassigned huitseeker Sep 23, 2018
@lock lock bot locked and limited conversation to collaborators Sep 23, 2018