
[SPARK-20466][CORE] HadoopRDD#addLocalConfiguration throws NPE #19413

Closed
wants to merge 1 commit into from

Conversation

sahilTakiar

What changes were proposed in this pull request?

Fix for SPARK-20466; the full description of the issue is in the JIRA. To summarize, HadoopRDD uses a metadata cache to cache JobConf objects. The cache uses soft references, which means the JVM can delete entries from the cache whenever there is GC pressure. HadoopRDD#getJobConf had a bug where it would first check whether the cache contained the JobConf and, if it did, then get the JobConf from the cache and return it. This doesn't work when soft references are used, because the JVM can clear the entry between the existence check and the get call.
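The check-then-get race described above can be illustrated with a minimal sketch. This is not Spark's actual code; the cache, keys, and method names here are hypothetical stand-ins for the pattern in HadoopRDD:

```scala
import java.lang.ref.SoftReference
import scala.collection.concurrent.TrieMap

object SoftCacheSketch {
  // Illustrative soft-reference cache, analogous in spirit to HadoopRDD's
  // metadata cache; the names here are hypothetical, not Spark's fields.
  private val cache = TrieMap.empty[String, SoftReference[AnyRef]]

  def put(key: String, value: AnyRef): Unit =
    cache.put(key, new SoftReference[AnyRef](value))

  // UNSAFE pattern the PR removes: the GC may clear the soft reference
  // between the contains() check and the read, so the caller can still
  // observe null and fail later with an NPE.
  def unsafeLookup(key: String): AnyRef =
    if (cache.contains(key)) cache(key).get() else null

  // SAFE pattern: a single read, with the cleared-reference case folded
  // into the miss path via Option.
  def safeLookup(key: String, orElse: => AnyRef): AnyRef =
    cache.get(key).flatMap(ref => Option(ref.get())).getOrElse(orElse)
}
```

The key point is that `safeLookup` reads the cache exactly once, so a concurrently cleared reference simply falls through to the miss path instead of producing a null.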

How was this patch tested?

Haven't thought of a good way to test this yet, given that the issue only occurs intermittently, during high GC pressure. Was thinking of using mocks to verify that #getJobConf is doing the right thing. I deleted the method HadoopRDD#containsCachedMetadata so that we don't hit this issue again.

@vanzin
Contributor

vanzin commented Oct 2, 2017

ok to test

@SparkQA

SparkQA commented Oct 2, 2017

Test build #82397 has finished for PR 19413 at commit 680f32c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sahilTakiar
Author

Fixed the style check.

@SparkQA

SparkQA commented Oct 3, 2017

Test build #82399 has finished for PR 19413 at commit fd1df3d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@jiangxb1987 left a comment


The fix looks good, only a minor issue. Since the repro involves GC, maybe there is no straightforward way to construct a test case here. Also cc @cloud-fan

    initLocalJobConfFuncOpt.foreach(f => f(newJobConf))
    HadoopRDD.putCachedMetadata(jobConfCacheKey, newJobConf)
    newJobConf

    val newJobConf = HadoopRDD.getCachedMetadata(jobConfCacheKey)
Contributor


You are using two vals with the same name here; normally we don't recommend that. How about following vanzin's suggestion in the JIRA issue, like this:

Option(HadoopRDD.getCachedMetadata(jobConfCacheKey)).getOrElse(
  HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
    ......
  })

Author


I could do that, but then I would have to remove the logDebug("Re-using cached JobConf") statement, unless there is some way to get around that in Scala.

Otherwise, I can rename the val to cachedJobConf.

Contributor


You could do that without the val:

Option(HadoopRDD.getCachedMetadata(jobConfCacheKey))
  .map { conf =>
     logDebug(...)
     conf.asInstanceOf[...]
  }
  .getOrElse {
     // code in the "else" block
  }
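The map/getOrElse suggestion above can be filled out into a self-contained sketch. The cache and the println-backed logDebug here are hypothetical stand-ins, not Spark's actual members; the point is that the debug log survives while the cache is read exactly once:

```scala
object GetJobConfSketch {
  // Hypothetical stand-ins so the sketch is runnable on its own.
  private val cache = new java.util.concurrent.ConcurrentHashMap[String, AnyRef]()
  private def logDebug(msg: String): Unit = println(s"DEBUG: $msg")

  // One atomic read of the cache: the Some branch keeps the debug log,
  // the None branch builds and caches a fresh value.
  def getConf(key: String, create: => AnyRef): AnyRef =
    Option(cache.get(key))
      .map { conf =>
        logDebug("Re-using cached JobConf")
        conf
      }
      .getOrElse {
        val newConf = create
        cache.put(key, newConf)
        newConf
      }
}
```

Because ConcurrentHashMap#get returns null on a miss, Option(...) turns both a missing key and a cleared value into the None branch, avoiding the check-then-get window entirely.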

Author


Thanks for the tip. Updated the patch.

@SparkQA

SparkQA commented Oct 3, 2017

Test build #82429 has finished for PR 19413 at commit 079a4e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Oct 3, 2017

LGTM. Merging to master and back to 2.0 unless I hit a conflict.

asfgit pushed a commit that referenced this pull request Oct 3, 2017
Author: Sahil Takiar <stakiar@cloudera.com>

Closes #19413 from sahilTakiar/master.

(cherry picked from commit e36ec38)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
asfgit pushed a commit that referenced this pull request Oct 3, 2017
@vanzin
Contributor

vanzin commented Oct 3, 2017

(FYI, didn't merge to 2.0.)

@asfgit asfgit closed this in e36ec38 Oct 3, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018