GOBBLIN-875: Emit container health metrics when running in cluster mode #2729
@@ -954,4 +954,5 @@
*/
public static final String AVRO_SCHEMA_CHECK_STRATEGY = "avro.schema.check.strategy";
public static final String AVRO_SCHEMA_CHECK_STRATEGY_DEFAULT = "org.apache.gobblin.util.schema_check.AvroSchemaCheckDefaultStrategy";
Empty line; there is another one in GobblinApplicationMaster.java, line 116.
Fixed.
@@ -196,6 +196,11 @@ public GobblinTaskRunner(String applicationName,
this.services.addAll(suite.getServices());

this.services.addAll(getServices());

if (ConfigUtils.getBoolean(this.config, GobblinClusterConfigurationKeys.CONTAINER_HEALTH_METRICS_SERVICE_ENABLED, false)) {
Should we package this service as part of TaskRunnerSuiteBase instead of having a this.services.addAll(getServices) plus another service which is used for metric-reporting outside?
TaskRunnerSuiteBase is an abstract class with two implementations: process model and thread model. I wanted to leave getServices() in TaskRunnerSuiteBase as an abstract method so as not to change the contract of the class.
Sorry, I should not have mentioned TaskRunnerSuiteBase. What I meant to say is: does it make more sense to add this service inside the getServices method in GobblinTaskRunner?
The comment on getServices is:
Creates and returns a {@link List} of additional {@link Service}s that should be run in this {@link GobblinTaskRunner}. Sub-classes that need additional {@link Service}s to run, should override this method
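To make the suggestion concrete, here is a minimal sketch of registering the optional service from inside getServices rather than next to the addAll call in the constructor. SketchService and SketchTaskRunner are hypothetical stand-ins written for this illustration; they are not the actual Gobblin or Guava types, and the real GobblinTaskRunner may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a service type; not the real Guava Service interface.
interface SketchService {
  String name();
}

// Hypothetical stand-in for a task runner; not the real GobblinTaskRunner.
class SketchTaskRunner {
  private final boolean healthMetricsEnabled;

  SketchTaskRunner(boolean healthMetricsEnabled) {
    this.healthMetricsEnabled = healthMetricsEnabled;
  }

  /**
   * Returns the additional services to run in this runner. The optional
   * health-metrics service is registered here, so the constructor only
   * needs a single addAll(getServices()) call.
   */
  protected List<SketchService> getServices() {
    List<SketchService> services = new ArrayList<>();
    if (healthMetricsEnabled) {
      services.add(() -> "ContainerHealthMetricsService");
    }
    return services;
  }
}
```

One trade-off with this shape: a subclass that overrides getServices has to call super.getServices() to keep the health-metrics service.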
Codecov Report
@@             Coverage Diff              @@
##             master    #2729      +/-   ##
============================================
+ Coverage     44.99%   45.08%   +0.08%
- Complexity     8742     8759      +17
============================================
  Files          1884     1886       +2
  Lines         70295    70377      +82
  Branches       7715     7718       +3
============================================
+ Hits          31629    31726      +97
+ Misses        35735    35709      -26
- Partials       2931     2942      +11
mostly minor comments / questions.
@@ -167,4 +167,5 @@

public static final String HELIX_JOB_STOPPING_STATE_TIMEOUT_SECONDS = GOBBLIN_CLUSTER_PREFIX + "job.stoppingStateTimeoutSeconds";
public static final long DEFAULT_HELIX_JOB_STOPPING_STATE_TIMEOUT_SECONDS = 300;
public static final String CONTAINER_HEALTH_METRICS_SERVICE_ENABLED = GOBBLIN_CLUSTER_PREFIX + "container.health.metrics.service.enabled" ;
}
no default?
Added a default.
@@ -196,6 +196,11 @@ public GobblinTaskRunner(String applicationName,
this.services.addAll(suite.getServices());

this.services.addAll(getServices());

if (ConfigUtils.getBoolean(this.config, GobblinClusterConfigurationKeys.CONTAINER_HEALTH_METRICS_SERVICE_ENABLED, false)) {
Create a single static constant for the default value for this config, so you don't have to say false in two places?
Fixed.
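For illustration, the single-constant pattern suggested above could look roughly like the sketch below. The constant name DEFAULT_CONTAINER_HEALTH_METRICS_SERVICE_ENABLED and the "gobblin.cluster." prefix value are assumptions made for this example, and java.util.Properties stands in for the Typesafe Config and ConfigUtils used in the actual diff so that the snippet is self-contained.

```java
import java.util.Properties;

final class HealthMetricsConfigSketch {
  // Assumed prefix value; the diff only shows the GOBBLIN_CLUSTER_PREFIX identifier.
  static final String GOBBLIN_CLUSTER_PREFIX = "gobblin.cluster.";

  static final String CONTAINER_HEALTH_METRICS_SERVICE_ENABLED =
      GOBBLIN_CLUSTER_PREFIX + "container.health.metrics.service.enabled";

  // Hypothetical constant name: the one place where the default value is stated.
  static final boolean DEFAULT_CONTAINER_HEALTH_METRICS_SERVICE_ENABLED = false;

  private HealthMetricsConfigSketch() {
  }

  /** Reads the flag, falling back to the shared default when the key is absent. */
  static boolean isHealthMetricsServiceEnabled(Properties props) {
    String value = props.getProperty(CONTAINER_HEALTH_METRICS_SERVICE_ENABLED);
    return value == null
        ? DEFAULT_CONTAINER_HEALTH_METRICS_SERVICE_ENABLED
        : Boolean.parseBoolean(value.trim());
  }
}
```

Every call site then refers to the constant instead of repeating a literal false, so changing the default later touches one line.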
@@ -133,4 +131,8 @@ public static void addFileAsLocalResource(FileSystem fs, Path destFilePath, Loca

return environmentVariableMap;
}
javadoc missing.
Added javadoc.
@@ -133,4 +131,8 @@ public static void addFileAsLocalResource(FileSystem fs, Path destFilePath, Loca

return environmentVariableMap;
}

public static String getContainerNum(String containerId) {
return "container-" + containerId.substring(containerId.lastIndexOf("_") + 1);
why are we doing + 1?
We want the substring starting from the char immediately following the last "_". E.g., if containerId = "container_e94_1567552810874_2132400_01_000001", we want to return "container-000001". Added javadoc to make the behavior clear.
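The behavior described in this reply can be checked with a small standalone version of the method. The method body is taken from the diff; the wrapper class ContainerIdSketch and the main method are added here only for illustration.

```java
final class ContainerIdSketch {
  private ContainerIdSketch() {
  }

  /**
   * Returns "container-" followed by the part of the container id that comes
   * after the last underscore.
   */
  static String getContainerNum(String containerId) {
    // lastIndexOf("_") points at the underscore itself; + 1 starts the
    // substring at the character immediately after it.
    return "container-" + containerId.substring(containerId.lastIndexOf("_") + 1);
  }

  public static void main(String[] args) {
    // prints container-000001
    System.out.println(getContainerNum("container_e94_1567552810874_2132400_01_000001"));
  }
}
```

Without the + 1 the result would keep the underscore ("container-_000001"). If the id contains no underscore at all, lastIndexOf returns -1, so the + 1 yields index 0 and the whole id is appended after the prefix.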
long processCpuTime2 = service.processCpuTime.get();
Assert.assertTrue(processCpuTime1 < processCpuTime2);
}
}
missing newline at end of file.
Added newline.
LGTM!
Why is Travis unhappy?
Minor comments. LGTM.
/**
* A utility class that periodically emits system level metrics that report the health of the container.
* Reported metrics include CPU/Memory usage of the JVM, system load, file descriptors used etc.
Which parameters relate to file descriptors?
Fixed the javadoc.
… in cluster mode Closes apache#2729 from sv2000/metrics
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
This task implements a service that emits CPU/Memory health metrics from the JVM when running in cluster mode.
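As a rough illustration of what such a service involves, the sketch below samples a few JVM and OS values on a fixed schedule using the standard java.lang.management beans. It is not the ContainerHealthMetricsService from this PR: the class name HealthMetricsSketch, the field names other than processCpuTime (which appears in the unit test), and the sampling logic are all assumptions, and the real service presumably reports through Gobblin's metrics framework rather than plain fields.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the service added by this PR.
final class HealthMetricsSketch implements AutoCloseable {
  final AtomicLong processCpuTime = new AtomicLong();
  final AtomicLong heapUsedBytes = new AtomicLong();
  volatile double systemLoadAverage;

  private final OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
  private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(runnable -> {
        Thread thread = new Thread(runnable, "health-metrics-sketch");
        thread.setDaemon(true); // do not keep the JVM alive just for sampling
        return thread;
      });

  /** Starts periodic sampling with the given period in milliseconds. */
  void start(long periodMillis) {
    scheduler.scheduleAtFixedRate(this::sample, 0, periodMillis, TimeUnit.MILLISECONDS);
  }

  /** Takes one sample of heap usage, system load, and process CPU time. */
  void sample() {
    heapUsedBytes.set(memory.getHeapMemoryUsage().getUsed());
    // Negative when the platform cannot provide a load average.
    systemLoadAverage = os.getSystemLoadAverage();
    // Process CPU time is only exposed by the com.sun.management extension,
    // which is available on HotSpot/OpenJDK but not guaranteed on every JVM.
    if (os instanceof com.sun.management.OperatingSystemMXBean) {
      processCpuTime.set(((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime());
    }
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

A caller would typically invoke start with a reporting interval when the container comes up and close on shutdown; sample can also be called directly, which is convenient in unit tests.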
Tests
Added unit test in ContainerHealthMetricsServiceTest.
Commits