Lazy init of fullyQualifiedStorageDirectory in HDFS pusher #5684
Conversation
@jon-wei LGTM but I don't think this is a bug, I bet it is a config issue.
```diff
@@ -230,4 +231,19 @@ public String makeIndexPathName(DataSegment dataSegment, String indexName)
         indexName
     );
   }
+
+  private void initFullyQualifiedStorageDirectory()
```
Would you please add a comment about why this should be lazily initialized?
Whoops, forgot to push that change, added a comment
@b-slim Thanks for the review, I removed the bug label for now
```diff
@@ -53,7 +53,8 @@

   private final Configuration hadoopConfig;
   private final ObjectMapper jsonMapper;
-  private final String fullyQualifiedStorageDirectory;
+  private final Path storageDir;
+  private String fullyQualifiedStorageDirectory;
```
This should be volatile since the object is shared amongst multiple threads.
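For illustration, a minimal sketch of what the volatile suggestion implies, i.e. double-checked locking over the field. This is a reconstruction rather than the code that was eventually merged; storageDir and hadoopConfig refer to the pusher's existing fields, and getFullyQualifiedStorageDirectory is a hypothetical accessor.

```java
// Sketch of the reviewer's "volatile" suggestion (not the merged code): double-checked
// locking so that a concurrent caller never observes a partially initialized value.
private volatile String fullyQualifiedStorageDirectory;

private String getFullyQualifiedStorageDirectory() throws IOException
{
  String resolved = fullyQualifiedStorageDirectory;
  if (resolved == null) {
    synchronized (this) {
      resolved = fullyQualifiedStorageDirectory;
      if (resolved == null) {
        // Resolve the configured storage directory against the filesystem only on first use.
        resolved = storageDir.getFileSystem(hadoopConfig)
                             .makeQualified(storageDir)
                             .toUri()
                             .toString();
        fullyQualifiedStorageDirectory = resolved;
      }
    }
  }
  return resolved;
}
```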
Went with the memoize approach
```java
  private void initFullyQualifiedStorageDirectory()
  {
    try {
      if (fullyQualifiedStorageDirectory == null) {
```
If it was a conscious decision to not synchronize this, say why in a comment. Or if it wasn't a conscious decision, consider synchronizing it.
Maybe https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Suppliers.html#memoize(com.google.common.base.Supplier)? Might be cleaner than calling init in places too
Went with the memoize approach
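For reference, a self-contained sketch of the memoize idiom that was adopted. The class below is illustrative only (it is not the actual HdfsDataSegmentPusher), though the field name follows the diff.

```java
import java.io.IOException;

import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Illustration of lazily resolving the fully qualified storage directory via
// Guava's Suppliers.memoize instead of a plain String field plus manual init calls.
class LazyStorageDirectoryExample
{
  private final Supplier<String> fullyQualifiedStorageDirectory;

  LazyStorageDirectoryExample(final Configuration hadoopConfig, final Path storageDir)
  {
    // The delegate runs at most once, on the first get(); memoize caches the result
    // and is thread-safe, so no explicit volatile/synchronized handling is needed.
    this.fullyQualifiedStorageDirectory = Suppliers.memoize(
        () -> {
          try {
            return storageDir.getFileSystem(hadoopConfig)
                             .makeQualified(storageDir)
                             .toUri()
                             .toString();
          }
          catch (IOException e) {
            throw new RuntimeException(e);
          }
        }
    );
  }

  // Call sites read get() instead of a raw field; nothing touches HDFS until then.
  String getFullyQualifiedStorageDirectory()
  {
    return fullyQualifiedStorageDirectory.get();
  }
}
```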
```diff
@@ -230,4 +231,22 @@ public String makeIndexPathName(DataSegment dataSegment, String indexName)
         indexName
     );
   }
+
+
+  // We lazily initialiize fullQualifiedStorageDirectory to avoid potential issues with Hadoop namenode HA.
```
initialize (spelling)
Fixed
* Lazy init of fullyQualifiedStorageDirectory in HDFS pusher
* Comment
* Fix test
* PR comments
Some users have encountered issues when running Druid 0.12.0 against a Hadoop cluster with namenode HA enabled, where the mapper encounters an UnknownHostException when it tries to interpret the logical nameservice string as a hostname (e.g., #5552). The users encountering this issue also report that their Hadoop ingestion was working in Druid 0.10.0 with the same Hadoop configs.
In a local test environment, I was able to run a Hadoop ingestion task successfully using namenode HA (#5552 (comment)), so this issue might be a misconfiguration in some cases.
However, in another case, we observed this issue in a customer environment where the Hadoop .xml files in the Druid classpath were correctly configured for namenode HA, but the mapper failed to pick up the configurations. The root cause was unclear.
Given that the mapper does not actually need the DataSegmentPusher, this PR is a workaround that lazily evaluates the fullyQualifiedStorageDirectory variable in HdfsDataSegmentPusher, to unblock users who are encountering this issue until a more complete understanding of it is reached. I suspect that these users did not encounter issues in 0.10.0 because in that version HadoopDruidIndexerConfig did not have the DataSegmentPusher as a field (see #4116).