Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy init of fullyQualifiedStorageDirectory in HDFS pusher #5684

Merged
merged 4 commits into from
Apr 29, 2018

Conversation

jon-wei
Copy link
Contributor

@jon-wei jon-wei commented Apr 24, 2018

Some users have encountered issues when running Druid 0.12.0 against a Hadoop cluster with namenode HA enabled, where the mapper encounters an UnknownHostException when it tries to interpret the logical nameservice string as a hostname (e.g.: #5552). The users encountering issue also report that their hadoop ingestion was working in Druid 0.10.0 with the same hadoop configs.

In a local test environment, I was able to run a hadoop ingestion task successfully using namenode HA (#5552 (comment)), so this issue might be a misconfiguration in some cases.

However, in another case, we observed this issue in a customer environment where the Hadoop .xml files in the Druid classpath were correctly configured for namenode HA, but the mapper failed to pick up the configurations. The root cause was unclear.

Given that the mapper does not actually need the DataSegmentPusher, this PR is a workaround that lazily evaluates the the fullyQualifiedStorageDirectory variable in HdfsDataSegmentPusher, to unblock users who are encountering this issue, until a more complete understanding of the issue is reached.

I suspect that these users did not encounter issues in 0.10.0 because in that version HadoopDruidIndexerConfig did not have the DataSegmentPusher as a field (see #4116).

@b-slim
Copy link
Contributor

b-slim commented Apr 24, 2018

@jon-wei LGTM but i don't this this is a bug, i bet it is a config issue.
Also am not really sure how the Reducer will be able to pick the right configs if my memory is correct the classpath is the same.

@@ -230,4 +231,19 @@ public String makeIndexPathName(DataSegment dataSegment, String indexName)
indexName
);
}

private void initFullyQualifiedStorageDirectory()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please add a comment about why this should be lazily initialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, forgot to push that change, added a comment

@jon-wei jon-wei removed the Bug label Apr 24, 2018
@jon-wei
Copy link
Contributor Author

jon-wei commented Apr 24, 2018

@b-slim Thanks for the review, I removed the bug label for now

@@ -53,7 +53,8 @@

private final Configuration hadoopConfig;
private final ObjectMapper jsonMapper;
private final String fullyQualifiedStorageDirectory;
private final Path storageDir;
private String fullyQualifiedStorageDirectory;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be volatile since the object is shared amongst multiple threads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with the memoize approach

private void initFullyQualifiedStorageDirectory()
{
try {
if (fullyQualifiedStorageDirectory == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was a conscious decision to not synchronize this, say why in a comment. Or if it wasn't a conscious decision, consider synchronizing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with the memoize approach

@@ -230,4 +231,22 @@ public String makeIndexPathName(DataSegment dataSegment, String indexName)
indexName
);
}


// We lazily initialiize fullQualifiedStorageDirectory to avoid potential issues with Hadoop namenode HA.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initialize (spelling)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@b-slim b-slim merged commit 513fab7 into apache:master Apr 29, 2018
sathishsri88 pushed a commit to sathishs/druid that referenced this pull request May 8, 2018
* Lazy init of fullyQualifiedStorageDirectory in HDFS pusher

* Comment

* Fix test

* PR comments
@dclim dclim added this to the 0.13.0 milestone Oct 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants