Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SAMZA-1143 Universal config support for localized resource #90

Closed
wants to merge 6 commits into from

Conversation

@fredji97
Copy link

fredji97 commented Mar 16, 2017

More details in https://issues.apache.org/jira/browse/SAMZA-1143

Tests: ./gradlew clean check successful and all unit tests passed

* The key will be localizer.resource.<resourceVisibility>.<resourceType>.<resourceName>
* The value corresponding to the key will be the URI.
*/
public static final String LOCALIZER_RESOURCE_PREFIX = "yarn.localizer.resource.";

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 17, 2017

Contributor

Wouldn't it be nicer to follow the standard Samza configs? (atleast we're consistent with the rest of the config that way) Also,I wonder if we should not have to encode information on the nature of the value directly in the key.

For instance
configs can explicitly be:
yarn.resources.myResource.visibility = PUBLIC, yarn.resources.myResource.uri = http://myuri and yarn.resources.resource-name.type = FILE
instead of

yarn.resource.PUBLIC.FILE.myResource = uri

The following nice properties fall out:

  • You don't have to change a config-key should you decide to change the scope of the resource from public to private / or from a file to an archive.
  • It's consistent with Samza configs for the stores.store-name..., serdes.serde-name..., streams.stream-name... namespaces.
  • You can define sensible defaults (in case, one of these scopes is not needed)
  • You can get-rid of the regex parsing logic , and instead rely on simple k-v pair style configs.

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 21, 2017

Author

Thanks for the suggestion. I am changing to this pattern of config.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.samza.clustermanager.*;

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 17, 2017

Contributor

strip off blanket imports.

This comment has been minimized.

Copy link
@fredji97
this.localResourceMap = new HashMap<>();
}

public void map() throws LocalizerResourceException {

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 17, 2017

Contributor

Don't need an explicit map here. I wonder if we can use invoke map internally inside the constructor.

  • Not sure I see a use-case to invoke map multiple times in this class.
  • We can make localResourceMap immutable

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 21, 2017

Author

made map() private now and made localResourceMap immutable


}

public Map<String, LocalResource> getResourceMap() {

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 17, 2017

Contributor
  1. I wonder if this class can instead be made an util / or a function.
  2. Should we return an Immutable collection or a copy here? That way, external callers don't mutate the map, and this class is immune to those changes.

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 21, 2017

Author

I have put the map() function into the constructor, and now the caller just need to initialize the object and then call getResourceMap(). I think it is simple enough. Although we can make it one static function, but that is not very testing friendly.
Regarding the immutable map, Thanks for the suggestion, I changed it to ImmutableMap.

@@ -120,6 +119,11 @@ public YarnClusterResourceManager(Config config, JobModelManager jobModelManager
hConfig = new YarnConfiguration();
hConfig.set("fs.http.impl", HttpFileSystem.class.getName());

if (config.containsKey("fs.certfs.impl.override")) { //TODO: change to use constants from CertFSConstants once that CertFSConstants.java is in

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 17, 2017

Contributor

Perhaps, I'm missing something. Why do we need 2 configs - one for fs.certfs.impl and the override config?

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 21, 2017

Author

updated to use the same config name and made it general for any scheme.

@@ -47,6 +49,8 @@
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.*;

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 17, 2017

Contributor

remove blanket imports

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 21, 2017

Author

removed


// include the resources from the universal resource configurations
try {
LocalizerResourceMapper resourceMapper = new LocalizerResourceMapper(config, yarnConfiguration);

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 17, 2017

Contributor

Would n't it be nicer to standardize all localizer related configs in one-place? Currently, they appear to be in here, as well as in LocalizerResourceMapper. I wonder if we can consolidate all of them into a single function that takes in a config , and returns a map of local resources we can directly pass in to SamzaContainerContext.?

Something along the lines of :

public Map<String, LocalResource> getLocalResources(Config config) {

}

You can use the existing LocalizerResourceMapper class, or provide a static function (in one of the util classes)

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 21, 2017

Author

Currently, it calls the one from LocalizerResourceMapper and returns the localResourceMap for the context.
There might be some legacy for package due to historical reason, and we plan to remove it later (TODO: SAMZA-1144 in the code), so I chose not to touch them now.


log.debug("setting package to {}", packageResource);
log.debug("setting localResourceMap to {}", localResourceMap);

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 17, 2017

Contributor

thanks for carefully changing in all the logging statements 👍


if (config.containsKey("fs.certfs.impl.override")) { //TODO: change to use constants from CertFSConstants once that CertFSConstants.java is in
hConfig.set("fs.certfs.impl", config.get("fs.certfs.impl.override"))
logger.info("samza job config fs.certfs.impl.override is used for yarn.")

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 17, 2017

Contributor

I'm probably missing some context here.

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 21, 2017

Author

changed to use fs..impl and made the same config name for both samza job config and yarn config.

Fred Ji added 2 commits Mar 21, 2017
…n.resources.<resourceName>.*) for each resource instead of just one config
…n.resources.<resourceName>.*) for each resource instead of just one config

public FsImplConfigManager(final Config config) {
if (null == config) {
this.config = new MapConfig();

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor

Issue: null configs should ideally throw an IllegalArgumentException. Is there a valid case where this could be null?

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 22, 2017

Author

changed to throw IllegalArgumentException instead.

* Get all schemes
* @return List of schemes in strings
*/
public List<String> getSchemes() {

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor

Issue: where is this public API used? Don't find an usage of this anywhere else (apart from tests.) Prefer to have this package private (with a comment that /* package private */)

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 22, 2017

Author

after removing overrideYarnConfiguration, this should be public API now.

* Take the fs.&lt;scheme&gt;impl from the samza job configuration and override the values in yarn configuration
* @param yarnConfiguration
*/
public void overrideYarnConfiguration(YarnConfiguration yarnConfiguration) {

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor

Issue: Don't think we need this mutating method here. Prefer to keep this class simple, methods as functional.

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 21, 2017

Author

This is to avoid the duplicate code in YarnJobFactory and YarnClusterResourceManager.

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor

If the responsibility of this class is parsing and returning a subset of configs, I'm not sure overrideYarnConfiguration belongs here. In fact, this should not have to do anything with Yarn.

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 22, 2017

Author

removed overrideYarnConfiguration.

* PUBLIC, PRIVATE, or APPLICATION.
* If not set, the default value is is APPLICATION.
*/
public class LocalizerResourceConfigManager{

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor

Thanks for refactoring this class as per the new config scheme!

*/
package org.apache.samza.job.yarn;

public class LocalizerResourceException extends Exception {

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor

Should we instead be throwing RunTimeExceptions instead of Exceptions?

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 21, 2017

Author

Nice catch. Will change. Thanks.

private LocalizerResourceConfigManager resourceConfigManager;
private ImmutableMap<String, LocalResource> localResourceMap;

public LocalizerResourceMapper(Config config, YarnConfiguration yarnConfiguration) throws LocalizerResourceException {

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor

Can this take in LocalizerResourceConfig here explicitly instead of taking in the entire config object?

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 22, 2017

Author

yeah, I changed to take LocalizerResourceConfig

map(); // set up localResourceMap
}

private void map() throws LocalizerResourceException {

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor
  • Prefer methods that are functional instead of private void functions that mutate internal state.
  • I wonder if this can instead be Map<String, LocalResource> buildResourceMapping()?

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 21, 2017

Author

updated based on the comment.


}

public ImmutableMap<String, LocalResource> getResourceMap() {

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor

Prefer general types when returning values from functions - Map<String, LocalResource> . The Map implementation returned might very well be an ImmutableMap but the API can still return Map. That way, clients don't have to depend on guava.

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 22, 2017

Author

updated.

@@ -144,13 +147,15 @@ public void runContainer(int samzaContainerId, Container container, CommandBuild
* Runs a command as a process on the container. All binaries needed by the physical process are packaged in the URL
* specified by packagePath.
*/
private void startContainer(Path packagePath,
private void startContainer(Path packagePath, //keep packagePath out of resourceNamePathMap for backward compatibility purpose

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor

I don't understand why this prevents or allows backwards compatibility. It's a private internal API anyways. We should take in Map<String, LocalResource> as a parameter right?

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 22, 2017

Author

Removed this comment for avoiding confusion.

@@ -190,13 +196,24 @@ private void startContainer(Path packagePath,
throw new SamzaContainerLaunchException("IO Exception when writing credentials to output buffer");
}

Map<String, LocalResource> localResourceMap = new HashMap<>();

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 21, 2017

Contributor

Not sure I understand completely. I was proposing to remove this duplication by refactoring the logic that populates the package resource inside LocalizerResourceMapper .

  • The logic for parsing configs is now spread across 2 places (one to construct the mapping for yarn.package.path and the other to construct this resources map). I'm wondering if the two can be in the same class.

This comment has been minimized.

Copy link
@fredji97

fredji97 Mar 22, 2017

Author

the old yarn.package.path does not fit in the general LocalizerResourceConfig and LocalizerResourceMapper. As we would like to see if we can deprecate this one in favor of using yarn.resources.package.path which follows the general localizer resource config, I prefer not to hack LocalizerResourceConfig and LocalizerResourceMapper to support this old config.

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 22, 2017

Contributor

I'm fine if this is not done right away. We agree that once we deprecate yarn.package.path, the code path will be unified as per my previous comment.

…nfigManager and also updated based on the additional review comments
Copy link
Contributor

vjagadish1989 left a comment

Thanks for addressing all issues raised in previous drafts.

Mostly nits (in this final pass). Also, some classes are missing docs. In some cases, docs could be more concise. Feel free to take a call, and re-word them as appropriate.

Fix It, and Ship It!


@Test
public void testGetJobWithDefaultFsImpl() {

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 22, 2017

Contributor

nit: Get rid off unnecessary new lines.

@@ -190,13 +196,24 @@ private void startContainer(Path packagePath,
throw new SamzaContainerLaunchException("IO Exception when writing credentials to output buffer");
}

Map<String, LocalResource> localResourceMap = new HashMap<>();

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 22, 2017

Contributor

I'm fine if this is not done right away. We agree that once we deprecate yarn.package.path, the code path will be unified as per my previous comment.



/**
* FsImplConfigManager is intented to manage the Samza config for fs.&lt;scheme&gt;impl.

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 22, 2017

Contributor

nit: java docs seem to be still referring to FsImplConfigManager

* FsImplConfigManager is intented to manage the Samza config for fs.&lt;scheme&gt;impl.
* e.g. fs.http.impl
*
* Usually this config exist in yarn config,

This comment has been minimized.

Copy link
@vjagadish1989

vjagadish1989 Mar 22, 2017

Contributor

nit: Java docs can be clearer and concise.

@asfgit asfgit closed this in e4cfeeb Mar 27, 2017
prateekm pushed a commit to prateekm/samza that referenced this pull request Mar 31, 2017
More details in https://issues.apache.org/jira/browse/SAMZA-1143

Tests: ./gradlew clean check successful and all unit tests passed

Author: Fred Ji <fji@linkedin.com>

Reviewers: Jagadish <jagadish@apache.org>

Closes apache#90 from fredji97/universalLocalizer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.