Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GOBBLIN-875: Emit container health metrics when running in cluster mode #2729

Closed
wants to merge 7 commits into from

Conversation

sv2000
Copy link
Contributor

@sv2000 sv2000 commented Sep 9, 2019

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
    This task implements a service that emits CPU/Memory health metrics from the JVM when running in the cluster mode.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Added unit test in ContainerHealthMetricsServiceTest.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@@ -954,4 +954,5 @@
*/
public static final String AVRO_SCHEMA_CHECK_STRATEGY = "avro.schema.check.strategy";
public static final String AVRO_SCHEMA_CHECK_STRATEGY_DEFAULT = "org.apache.gobblin.util.schema_check.AvroSchemaCheckDefaultStrategy";

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line, another one inGobblinApplicationMaster.java, line 116

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -196,6 +196,11 @@ public GobblinTaskRunner(String applicationName,
this.services.addAll(suite.getServices());

this.services.addAll(getServices());

if (ConfigUtils.getBoolean(this.config, GobblinClusterConfigurationKeys.CONTAINER_HEALTH_METRICS_SERVICE_ENABLED, false)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we package this service as part of TaskRunnerSuiteBase instead of having a this.services.addAll(getServices) plus another service which is used for metric-reporting outside?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TaskRunnerSuiteBase is an abstract class with two implementations, process model and thread model. Wanted to leave getServices() in TaskRunnerSuiteBase as an abstract method so as not to change the contract of the class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I should not mentioned TaskRunnerSuiteBase. What I meant to say is, does it make more sense to add this service inside getServices method in GobblinTaskRunner ?

The comment of getServices is :
Creates and returns a {@link List} of additional {@link Service}s that should be run in this {@link GobblinTaskRunner}. Sub-classes that need additional {@link Service}s to run, should override this method

@codecov-io
Copy link

codecov-io commented Sep 9, 2019

Codecov Report

Merging #2729 into master will increase coverage by 0.08%.
The diff coverage is 70.42%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #2729      +/-   ##
============================================
+ Coverage     44.99%   45.08%   +0.08%     
- Complexity     8742     8759      +17     
============================================
  Files          1884     1886       +2     
  Lines         70295    70377      +82     
  Branches       7715     7718       +3     
============================================
+ Hits          31629    31726      +97     
+ Misses        35735    35709      -26     
- Partials       2931     2942      +11
Impacted Files Coverage Δ Complexity Δ
...bblin/cluster/GobblinClusterConfigurationKeys.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...che/gobblin/yarn/GobblinYarnConfigurationKeys.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...apache/gobblin/cluster/ContainerHealthMetrics.java 0% <0%> (ø) 0 <0> (?)
.../apache/gobblin/cluster/GobblinClusterManager.java 53.91% <0%> (-0.51%) 27 <0> (ø)
...org/apache/gobblin/yarn/GobblinYarnTaskRunner.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...a/org/apache/gobblin/aws/GobblinAWSTaskRunner.java 0% <0%> (ø) 0 <0> (ø) ⬇️
.../apache/gobblin/yarn/GobblinApplicationMaster.java 18.84% <100%> (+2.42%) 3 <0> (ø) ⬇️
...n/java/org/apache/gobblin/yarn/YarnHelixUtils.java 20% <100%> (+2.75%) 3 <1> (+1) ⬆️
.../org/apache/gobblin/cluster/GobblinTaskRunner.java 64.35% <50%> (-0.91%) 28 <1> (-1)
...gobblin/cluster/ContainerHealthMetricsService.java 78.57% <78.57%> (ø) 5 <5> (?)
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a409908...621c530. Read the comment docs.

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly minor comments / questions.

@@ -167,4 +167,5 @@

public static final String HELIX_JOB_STOPPING_STATE_TIMEOUT_SECONDS = GOBBLIN_CLUSTER_PREFIX + "job.stoppingStateTimeoutSeconds";
public static final long DEFAULT_HELIX_JOB_STOPPING_STATE_TIMEOUT_SECONDS = 300;
public static final String CONTAINER_HEALTH_METRICS_SERVICE_ENABLED = GOBBLIN_CLUSTER_PREFIX + "container.health.metrics.service.enabled" ;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a default.

@@ -196,6 +196,11 @@ public GobblinTaskRunner(String applicationName,
this.services.addAll(suite.getServices());

this.services.addAll(getServices());

if (ConfigUtils.getBoolean(this.config, GobblinClusterConfigurationKeys.CONTAINER_HEALTH_METRICS_SERVICE_ENABLED, false)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a single static constant for the default value for this config, so you don't have to say false in two places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -133,4 +131,8 @@ public static void addFileAsLocalResource(FileSystem fs, Path destFilePath, Loca

return environmentVariableMap;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added javadoc.

@@ -133,4 +131,8 @@ public static void addFileAsLocalResource(FileSystem fs, Path destFilePath, Loca

return environmentVariableMap;
}

public static String getContainerNum(String containerId) {
return "container-" + containerId.substring(containerId.lastIndexOf("_") + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we doing + 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want the substring starting from the char immediately following the last "_". e.g. if containerId = "container_e94_1567552810874_2132400_01_000001", we want to return
"container-000001". Added javadoc to make the behavior clear.

long processCpuTime2 = service.processCpuTime.get();
Assert.assertTrue(processCpuTime1 < processCpuTime2);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing newline at end of file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added newline.

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Why is Travis unhappy?

Copy link
Contributor

@autumnust autumnust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments. LGTM.


/**
* A utility class that periodically emits system level metrics that report the health of the container.
* Reported metrics include CPU/Memory usage of the JVM, system load, file descriptors used etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which parameters are concerning with file descriptors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the javadoc.

@asfgit asfgit closed this in 9aa9e6e Sep 16, 2019
Will-Lo pushed a commit to Will-Lo/incubator-gobblin that referenced this pull request Sep 18, 2019
jhsenjaliya pushed a commit to jhsenjaliya/incubator-gobblin that referenced this pull request Apr 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants