Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JobHelper.ensurePaths will set job properties from config (tuningConf… #1484

Merged
merged 1 commit into from
Jul 7, 2015

Conversation

michaelschiff
Copy link
Contributor

Fix for #1463

JobHelper.ensurePaths will set job properties from config (tuningConfig.jobProperties) before adding input paths to the config.
Adding input paths will create Path and FileSystem instances which may depend on the values in the job config.
This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files.

IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properies set)

JobHelperTest confirms that JobHelper.ensurePaths adds job properties

import static org.junit.Assert.assertEquals;

/**
* Created by michael.schiff on 6/26/15.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IntelliJ does this by default but it is standard practice in Druid to remove this.

@drcrallen
Copy link
Contributor

Thanks for the contrib @michaelschiff !

Would you mind putting a comment above addInputPaths indicating that addJobProperties should be called first?

Since the general workflow of HadoopDruidIndexerConfig is to only operate on a limited set of data per method, I think it is a reasonable onus on the caller to ensure proper configuration (as per this PR), and such configuration cannot reliably be guessed if it has or has not been performed (unless we mandate configuration be done through the HadoopDruidIndexerConfig class).

After minor concerns related to comments, I'm good here.

import java.util.HashMap;
import java.util.Map;

import static org.junit.Assert.assertEquals;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not using static import will be more consistent with the druid codebase.

@drcrallen
Copy link
Contributor

👍

}

@Before
public void setup() throws Exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in druid code, generally, setup methods are kept before the test methods.

…ig.jobProperties) before adding input paths to the config.

Adding input paths will create Path and FileSystem instances which may depend on the values in the job config.
This allows all properties to be set from the spec file, avoiding having to directly edit cluster xml files.

IndexGeneratorJob.run adds job properties before adding input paths (adding input paths may depend on having job properies set)

JobHelperTest confirms that JobHelper.ensurePaths adds job properties

javadoc for addInputPaths to explain relationship with
addJobProperties
@@ -316,6 +316,14 @@ public HadoopyShardSpec getShardSpec(Bucket bucket)
return schema.getTuningConfig().getShardSpecs().get(bucket.time).get(bucket.partitionNum);
}

/**
* Job instance should have Configuration set (by calling {@link #addJobProperties(Job)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you think of a way we can programmatically check for that, so that we can throw an ISE in case the configuration is not set before calling addInputPaths?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of my concerns is that "Use defaults" could be a valid case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my main concern regarding forcing addJobProperties. It does not seem incorrect to addInputPaths without adding job properties (e.g. "Use defaults" case).

If we were sure that any jobProperties configured in the spec should have highest precedence, addInputPaths could set the the job properties directly... Then the "Use Defaults" case is handled by not overriding anything in the Spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything else you would like to see before this is ready for merge?

@fjy
Copy link
Contributor

fjy commented Jul 7, 2015

@drcrallen @xvrl any more concerns?

@xvrl
Copy link
Member

xvrl commented Jul 7, 2015

@fjy no particular concerns. Maybe we can remove the temporary folder in the tests to limit I/O during tests runs, they don't seem to be used other than for their paths. But I don't want to hold up the PR on this, so feel free to merge. We can clean up IO in a different PR.

drcrallen added a commit that referenced this pull request Jul 7, 2015
JobHelper.ensurePaths will set job properties from config (tuningConf…
@drcrallen drcrallen merged commit b2bc46b into apache:master Jul 7, 2015
@kzadorozhny kzadorozhny deleted the feature/1463 branch October 8, 2015 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants