-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JobHelper.ensurePaths will set job properties from config (tuningConf… #1484
Conversation
import static org.junit.Assert.assertEquals; | ||
|
||
/** | ||
* Created by michael.schiff on 6/26/15. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntelliJ does this by default but it is standard practice in Druid to remove this.
Thanks for the contrib @michaelschiff ! Would you mind putting a comment above addInputPaths indicating that addJobProperties should be called first? Since the general workflow of HadoopDruidIndexerConfig is to only operate on a limited set of data per method, I think it is a reasonable onus on the caller to ensure proper configuration (as per this PR), and such configuration cannot reliably be guessed if it has or has not been performed (unless we mandate configuration be done through the HadoopDruidIndexerConfig class). After minor concerns related to comments, I'm good here. |
330b66e
to
58cb4e4
Compare
import java.util.HashMap; | ||
import java.util.Map; | ||
|
||
import static org.junit.Assert.assertEquals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not using static import will be more consistent with the druid codebase.
👍 |
} | ||
|
||
@Before | ||
public void setup() throws Exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in druid code, generally, setup methods are kept before the test methods.
58cb4e4
to
feb36ff
Compare
…ig.jobProperties) before adding input paths to the config. Adding input paths will create Path and FileSystem instances which may depend on the values in the job config. This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files. IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properies set) JobHelperTest confirms that JobHelper.ensurePaths adds job properties javadoc for addInputPaths to explain relationship with addJobProperties
feb36ff
to
6ad451a
Compare
@@ -316,6 +316,14 @@ public HadoopyShardSpec getShardSpec(Bucket bucket) | |||
return schema.getTuningConfig().getShardSpecs().get(bucket.time).get(bucket.partitionNum); | |||
} | |||
|
|||
/** | |||
* Job instance should have Configuration set (by calling {@link #addJobProperties(Job)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you think of a way we can programmatically check for that, so that we can throw an ISE in case the configuration is not set before calling addInputPaths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of my concerns is that "Use defaults" could be a valid case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was my main concern regarding forcing addJobProperties. It does not seem incorrect to addInputPaths without adding job properties (e.g. "Use defaults" case).
If we were sure that any jobProperties configured in the spec should have highest precedence, addInputPaths could set the the job properties directly... Then the "Use Defaults" case is handled by not overriding anything in the Spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything else you would like to see before this is ready for merge?
@drcrallen @xvrl any more concerns? |
@fjy no particular concerns. Maybe we can remove the temporary folder in the tests to limit I/O during tests runs, they don't seem to be used other than for their paths. But I don't want to hold up the PR on this, so feel free to merge. We can clean up IO in a different PR. |
JobHelper.ensurePaths will set job properties from config (tuningConf…
Fix for #1463
JobHelper.ensurePaths will set job properties from config (tuningConfig.jobProperties) before adding input paths to the config.
Adding input paths will create Path and FileSystem instances which may depend on the values in the job config.
This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files.