Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-9028][flip6] perform parameters checking before starting cluster #5726

Conversation

sihuazhou
Copy link
Contributor

What is the purpose of the change

Perform parameters checking before starting cluster to prevent to setup a problematic cluster.

Brief change log

  • perform parameters checking before starting cluster

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

no

@sihuazhou sihuazhou changed the title [FLINK-FLINK-9028][flip6] perform parameters checking before starting cluster [FLINK-9028][flip6] perform parameters checking before starting cluster Mar 20, 2018
@sihuazhou
Copy link
Contributor Author

CC: @tillrohrmann

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @sihuazhou. The fix is a good idea. I think we can still improve it a bit by refactoring the verification logic and moving the check to AbstractYarnClusterDescriptor#deployInternal.

// aim to check the parameters
ContaineredTaskManagerParameters.create(
effectiveConfiguration,
effectiveConfiguration.getInteger(TaskManagerOptions.TASK_MANAGER_HEAP_MEMORY), 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should pull the verification logic out of the ContaineredTaskManagerParameters#create method and place it somewhere else. I would also suggest to move this check to AbstractYarnClusterDescriptor#deployInternal and use the provided ClusterSpecification#getTaskManagerMemory for the memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I will pull it out and add a test to guard it.

@sihuazhou
Copy link
Contributor Author

@tillrohrmann I made a fixup according to your comments, could you please have a look again?

@sihuazhou sihuazhou force-pushed the flip6_check_parameter_before_start_cluster branch from efd1ec5 to b39bd8b Compare March 20, 2018 11:39
@sihuazhou sihuazhou force-pushed the flip6_check_parameter_before_start_cluster branch from b39bd8b to c984370 Compare March 20, 2018 11:42
Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some more comments.

/**
* Tests to guard {@link ResourceManagerRuntimeServices}.
*/
public class ResourceManagerRuntimeServicesTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing extends TestLogger

@@ -423,6 +431,9 @@ public void terminateCluster(ApplicationId applicationId) throws FlinkException
@Nullable JobGraph jobGraph,
boolean detached) throws Exception {

// ------------------ Check if configuration is valid --------------------
checkConfig(clusterSpecification);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not abbreviate and name it validateClusterSpecification

@@ -409,6 +411,12 @@ public void terminateCluster(ApplicationId applicationId) throws FlinkException
}
}

private void checkConfig(ClusterSpecification clusterSpecification) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should throw a checked exception. Something like InvalidConfigurationException or so. But not just fail with an IllegalStateException.

private void checkConfig(ClusterSpecification clusterSpecification) {
long taskManagerMemorySize = clusterSpecification.getTaskManagerMemoryMB();
long cutoff = ResourceManagerRuntimeServices.calculateCutoffMB(flinkConfiguration, taskManagerMemorySize);
TaskManagerServices.calculateHeapSizeMB(taskManagerMemorySize - cutoff, flinkConfiguration);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somehow don't like that we calculate some value just in order to check whether an unchecked exception is thrown. This should be imo more explicit. Or at least it should get a comment explaining what's going on here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...This is a bit tricky, the calculation logic is a bit hard to be separated from the check logical so clearly, because when we perform the checking we also need to do the calculation by the way. So pull the check logical from the calculateFunction to be a separated function means we will have some
code duplication. So I'd like to add a comment here, if you still think we should pull out a new function I would just do it ;).


final long javaMemorySizeMB = containerMemoryMB - cutoff;
// (1) try to compute how much memory used by container
final long cutoffMB = ResourceManagerRuntimeServices.calculateCutoffMB(config, containerMemoryMB);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep this method in the ContaineredTaskManagerParameters class.

@sihuazhou sihuazhou force-pushed the flip6_check_parameter_before_start_cluster branch from b8776f2 to 2e65190 Compare March 20, 2018 14:10
@sihuazhou
Copy link
Contributor Author

@tillrohrmann I have addressed your comments, could you please review it again.

@tillrohrmann
Copy link
Contributor

Thanks for your contribution @sihuazhou. Changes look good to me. Merging this PR once Travis gave green light.

@sihuazhou sihuazhou force-pushed the flip6_check_parameter_before_start_cluster branch from 2e65190 to f6e1105 Compare March 20, 2018 14:40
tillrohrmann pushed a commit to tillrohrmann/flink that referenced this pull request Mar 20, 2018
@sihuazhou
Copy link
Contributor Author

...found a checkstyle bug in my local built, I made a fixup.

@tillrohrmann
Copy link
Contributor

Have seen it as well and fixed it. So no worries.

tillrohrmann pushed a commit to tillrohrmann/flink that referenced this pull request Mar 21, 2018
@asfgit asfgit closed this in 38aa863 Mar 21, 2018
asfgit pushed a commit that referenced this pull request Mar 21, 2018
sampathBhat pushed a commit to sampathBhat/flink that referenced this pull request Jul 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants