Hadoop index task stopped working with 0.10.1-rc1 #4536
Comments
Anyone have any idea what might cause this? I can try all kinds of things to fix it, but I'm kind of lost right now.
@erikdubbelboer after #4116 users can now propagate selected properties to the Hadoop workers.
@b-slim It seems like this is broken. When I set
So it doesn't seem to propagate to the actual job.
@b-slim Is it because these methods pass
This has fixed the issue for me:
diff --git a/indexing-hadoop/src/main/java/io/druid/indexer/HadoopTuningConfig.java b/indexing-hadoop/src/main/java/io/druid/indexer/HadoopTuningConfig.java
index e64c0e7..b131a5e 100644
--- a/indexing-hadoop/src/main/java/io/druid/indexer/HadoopTuningConfig.java
+++ b/indexing-hadoop/src/main/java/io/druid/indexer/HadoopTuningConfig.java
@@ -267,7 +267,7 @@ public class HadoopTuningConfig implements TuningConfig
numBackgroundPersistThreads,
forceExtendableShardSpecs,
useExplicitVersion,
- null
+ allowedHadoopPrefix
);
}
@@ -292,7 +292,7 @@ public class HadoopTuningConfig implements TuningConfig
numBackgroundPersistThreads,
forceExtendableShardSpecs,
useExplicitVersion,
- null
+ allowedHadoopPrefix
);
}
@@ -317,7 +317,7 @@ public class HadoopTuningConfig implements TuningConfig
numBackgroundPersistThreads,
forceExtendableShardSpecs,
useExplicitVersion,
- null
+ allowedHadoopPrefix
);
}
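For context, here is a minimal, self-contained sketch (hypothetical class and field names, not Druid's actual HadoopTuningConfig) of the bug pattern the diff above fixes: an immutable "with" copy method that silently drops a field by hard-coding null instead of carrying the current value over.

```java
import java.util.List;

// Sketch of an immutable config with "with" copy methods.
// Field and class names are hypothetical, for illustration only.
class TuningConfigSketch {
    private final int rowFlushBoundary;
    private final List<String> allowedHadoopPrefix;

    TuningConfigSketch(int rowFlushBoundary, List<String> allowedHadoopPrefix) {
        this.rowFlushBoundary = rowFlushBoundary;
        this.allowedHadoopPrefix = allowedHadoopPrefix;
    }

    // Buggy version: the new copy loses allowedHadoopPrefix entirely.
    TuningConfigSketch withRowFlushBoundaryBuggy(int boundary) {
        return new TuningConfigSketch(boundary, null); // bug: drops the field
    }

    // Fixed version: the existing value is carried into the copy.
    TuningConfigSketch withRowFlushBoundary(int boundary) {
        return new TuningConfigSketch(boundary, allowedHadoopPrefix);
    }

    List<String> getAllowedHadoopPrefix() {
        return allowedHadoopPrefix;
    }
}
```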
You want me to make a pull request?
Yeah sure, will review that. @erikdubbelboer thanks.
How were the properties passed down before? I thought that pre 0.10.1, properties were generally not passed down to the hadoop cluster. So it seems weird that now they'd need to be. Or is the difference that now in 0.10.1, the druid.storage.type property is passed down, which means now druid.google. needs to be as well if you're using the google extension? If that's the case, I think we should either add druid.google. to the default list, or give the deep storage extensions some way to say what properties they need passed down; otherwise upgrading will be too much of a pain for people using the extension.
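To make the propagation being discussed concrete, here is a rough, self-contained sketch (hypothetical names, not Druid's actual implementation) of prefix-based property filtering: only properties whose keys start with an allowed prefix are copied into the configuration handed down to the Hadoop workers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: filter a property map down to the entries whose
// keys match one of the allowed prefixes (e.g. "druid.storage.",
// "druid.google."), which is roughly what "pushing properties down to
// Hadoop workers" amounts to.
class PropertyPropagation {
    static Map<String, String> propagate(Map<String, String> props, List<String> allowedPrefixes) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            for (String prefix : allowedPrefixes) {
                if (e.getKey().startsWith(prefix)) {
                    out.put(e.getKey(), e.getValue());
                    break; // first matching prefix is enough
                }
            }
        }
        return out;
    }
}
```

With this shape, adding druid.google. to the default prefix list would make the Google extension's properties reach the workers without any per-task configuration.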
@gianm I think you are right, now that we push down the storage properties, Guice tries to bind things up, hence the issues. I am not sure what the best fix is, but it seems like we need to push down at least all the properties of the selected storage module. Anyway, I will send a first PR to fix the null issue (I guess @erikdubbelboer is in a different timezone) and then will focus on how to solve this. @gianm, should this be part of 0.10.1?
@b-slim I think we should at least have something in the release notes explaining the situation. Pushing down the storage properties would be even better. Perhaps we could do this by adding a
I don't want to block the RC for this issue though; I will get to work on it maybe next week.
I'm trying to post at the druid-development list but my messages keep being automatically deleted (even without any links). I don't think this only affects the Google Storage extension. I think it affects all storage extensions, as all of them use different prefixes for their properties and all of them inject their config into their SegmentPushers?
Most of them put their properties under
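As a rough illustration of that pattern (hypothetical names, deliberately Guice-free, not the extensions' actual code): each storage extension reads its settings from properties under its own prefix and hands the resulting config to its segment pusher, which is why missing druid.google.* properties surface as a failure on the Hadoop side.

```java
import java.util.Properties;

// Simplified sketch of how a storage extension's config is populated from
// prefixed properties. Field names are hypothetical.
class GoogleAccountConfigSketch {
    final String bucket;
    final String pathPrefix;

    GoogleAccountConfigSketch(Properties props) {
        this.bucket = props.getProperty("druid.google.bucket");
        this.pathPrefix = props.getProperty("druid.google.prefix");
    }
}

// The pusher depends on that config; if the properties were never
// propagated to the Hadoop-side process, construction fails there.
class GoogleSegmentPusherSketch {
    private final GoogleAccountConfigSketch config;

    GoogleSegmentPusherSketch(GoogleAccountConfigSketch config) {
        this.config = config;
        if (config.bucket == null) {
            throw new IllegalStateException("druid.google.bucket is not set");
        }
    }
}
```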
They got stuck in the list's spam filter for some reason. I marked them as not-spam and whitelisted your email address. |
Looks like the old messages are deleted though, so you might have to re-post them. |
I guess if you want to store your segments in S3 you also need druid.s3: https://github.com/druid-io/druid/blob/60cdf946778d0624db6ddce0366e5fdb131437b2/extensions-core/s3-extensions/src/main/java/io/druid/storage/s3/S3StorageDruidModule.java#L77
I think S3 should be ok, since Hadoop's S3 FileSystem is used for the connection from Hadoop to S3, not Druid's S3 client. So even though Druid's S3 client won't be properly instantiated, it wouldn't matter. Has anyone tried this with 0.10.1-rc1 to confirm? I think we have, but I'm not totally sure.
I just ran a batch ingestion task with 0.10.1-rc1 on EMR, using S3 as deep storage, and the task worked.
@erikdubbelboer could you try #4562 to see if that works for you, without having to specify allowedHadoopPrefix in the task? |
@gianm It looks like #4562 can't be applied to the 0.10.1 branch we are running. I tried upgrading only our middleManager node to master, but this just resulted in lots of
@erikdubbelboer please try #4567, which is a backport of that to 0.10.1. That said, exceptions when middleManagers are upgraded while the rest of the cluster is not are concerning, since we do want to support upgrading them first as part of a rolling update. Could you please paste the full error message and stack traces you got when you saw those errors? It might be something we need to fix.
@gianm with #4567 everything works fine. There is only one exception that repeats a couple of times at the start but it doesn't seem to affect anything:
The exception I'm seeing over and over again when upgrading only the middleManager to master is:
The JNDI exception at startup can be ignored. For the second exception, based on the line numbers in ServerDiscoverySelector, it looks like you've pulled in some post-0.10.1 changes (i.e., in master but not in 0.10.1-rc1 or rc2), specifically this PR that adds TLS support: #4270. I'm guessing what's happening is that the service announcement changes that set the TLS-enabled port property are missing because only the middleManager got the upgrade, but this newer code running on the middleManager expects that property to be non-null and gets an NPE here. Can you try basing your custom branch on 0.10.1-rc2?
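One hedged sketch of how such a rolling-upgrade NPE could be avoided (hypothetical method, not the actual Druid discovery code): treat a missing TLS port in a node's announcement as "node not yet upgraded" and fall back to the plaintext port instead of dereferencing null.

```java
// Illustrative null-tolerant port selection for a rolling upgrade:
// old nodes announce no TLS port (null), so a TLS-preferring client
// must fall back to the plaintext port rather than NPE.
class PortSelection {
    static int choosePort(Integer tlsPort, int plaintextPort, boolean preferTls) {
        if (preferTls && tlsPort != null) {
            return tlsPort; // safe unboxing: checked non-null above
        }
        return plaintextPort;
    }
}
```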
Is that what happens when #4270 is rolled out to MMs but not overlords? If so, that makes it impossible to do a proper rolling update, and we should address it before 0.11.0. |
With the previous version of Druid everything worked fine. But after the upgrade to 0.10.1-rc1 we are getting:
It seems that somehow the druid.google.* properties aren't propagated to the Hadoop task like they were before?