From 99ab8feb8536a8f22f9abd8a83b1d854d80a513f Mon Sep 17 00:00:00 2001 From: yew1eb Date: Fri, 20 Oct 2017 15:55:21 +0800 Subject: [PATCH 1/3] add a column that the type of metrics --- docs/monitoring/metrics.md | 185 ++++++++++++++++++++++++++----------- 1 file changed, 129 insertions(+), 56 deletions(-) diff --git a/docs/monitoring/metrics.md b/docs/monitoring/metrics.md index 595051c9861fe..044ee753a78fe 100644 --- a/docs/monitoring/metrics.md +++ b/docs/monitoring/metrics.md @@ -519,10 +519,11 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -531,10 +532,12 @@ Thus, in order to infer the metric identifier: + +
ScopeInfixMetricsScopeInfixMetrics DescriptionType
Status.JVM.CPU Load The recent CPU usage of the JVM.Gauge
Time The CPU time used by the JVM.Gauge
@@ -543,10 +546,11 @@ Thus, in order to infer the metric identifier: - - - - + + + + + @@ -554,51 +558,63 @@ Thus, in order to infer the metric identifier: - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + +
ScopeInfixMetricsDescriptionScopeInfixMetricsDescriptionType
Job-/TaskManager Status.JVM.Memory Heap.UsedThe amount of heap memory currently used.The amount of heap memory currently used (in bytes).Gauge
Heap.CommittedThe amount of heap memory guaranteed to be available to the JVM.The amount of heap memory guaranteed to be available to the JVM (in bytes).Gauge
Heap.MaxThe maximum amount of heap memory that can be used for memory management.The maximum amount of heap memory that can be used for memory management (in bytes).Gauge
NonHeap.UsedThe amount of non-heap memory currently used.The amount of non-heap memory currently used (in bytes).Gauge
NonHeap.CommittedThe amount of non-heap memory guaranteed to be available to the JVM.The amount of non-heap memory guaranteed to be available to the JVM (in bytes).Gauge
NonHeap.MaxThe maximum amount of non-heap memory that can be used for memory management.The maximum amount of non-heap memory that can be used for memory management (in bytes).Gauge
Direct.CountThe number of buffers in the direct buffer pool.The number of buffers in the direct buffer pool (in bytes).Gauge
Direct.MemoryUsedThe amount of memory used by the JVM for the direct buffer pool.The amount of memory used by the JVM for the direct buffer pool (in bytes).Gauge
Direct.TotalCapacityThe total capacity of all buffers in the direct buffer pool.The total capacity of all buffers in the direct buffer pool (in bytes).Gauge
Mapped.CountThe number of buffers in the mapped buffer pool.The number of buffers in the mapped buffer pool (in bytes).Gauge
Mapped.MemoryUsedThe amount of memory used by the JVM for the mapped buffer pool.The amount of memory used by the JVM for the mapped buffer pool (in bytes).Gauge
Mapped.TotalCapacityThe number of buffers in the mapped buffer pool.The total capacity of all buffers in the mapped buffer pool (in bytes).Gauge
@@ -607,10 +623,11 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -619,6 +636,7 @@ Thus, in order to infer the metric identifier: +
ScopeInfixMetricsScopeInfixMetrics DescriptionType
Status.JVM.ClassLoader Threads.Count The total number of live threads.Gauge
@@ -627,10 +645,11 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -639,10 +658,12 @@ Thus, in order to infer the metric identifier: + +
ScopeInfixMetricsScopeInfixMetrics DescriptionType
Status.JVM.GarbageCollector <GarbageCollector>.Count The total number of collections that have occurred.Gauge
<GarbageCollector>.Time The total time spent performing garbage collection.Gauge
@@ -651,10 +672,11 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -663,10 +685,12 @@ Thus, in order to infer the metric identifier: + +
ScopeInfixMetricsScopeInfixMetrics DescriptionType
Status.JVM.ClassLoader ClassesLoaded The total number of classes loaded since the start of the JVM.Gauge
ClassesUnloaded The total number of classes unloaded since the start of the JVM.Gauge
@@ -675,10 +699,11 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -687,46 +712,56 @@ Thus, in order to infer the metric identifier: + + + + + + + + + +
ScopeInfixMetricsScopeInfixMetrics DescriptionType
Status.Network AvailableMemorySegments The number of unused memory segments.Gauge
TotalMemorySegments The number of allocated memory segments.Gauge
Task buffers inputQueueLength The number of queued input buffers.Gauge
outputQueueLength The number of queued output buffers.Gauge
inPoolUsage An estimate of the input buffers usage.Gauge
outPoolUsage An estimate of the output buffers usage.Gauge
Network.<Input|Output>.<gate>
(only available if taskmanager.net.detailed-metrics config option is set)
totalQueueLen Total number of queued buffers in all input/output channels.Gauge
minQueueLen Minimum number of queued buffers in all input/output channels.Gauge
maxQueueLen Maximum number of queued buffers in all input/output channels.Gauge
avgQueueLen Average number of queued buffers in all input/output channels.Gauge
@@ -735,9 +770,10 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -745,18 +781,22 @@ Thus, in order to infer the metric identifier: + + + +
ScopeMetricsDescriptionScopeMetricsDescriptionType
JobManager numRegisteredTaskManagers The number of registered taskmanagers.Gauge
numRunningJobs The number of running jobs.Gauge
taskSlotsAvailable The number of available task slots.Gauge
taskSlotsTotal The total number of task slots.Gauge
@@ -765,34 +805,39 @@ Thus, in order to infer the metric identifier: - - - + + + + - + + + + - + +
ScopeMetricsDescriptionScopeMetricsDescriptionType
Job (only available on JobManager) restartingTimeThe time it took to restart the job, or how long the current restart has been in progress.The time it took to restart the job, or how long the current restart has been in progress (in milliseconds).Gauge
uptime The time that the job has been running without interruption. -

Returns -1 for completed jobs.

+

Returns -1 for completed jobs; otherwise the value is in milliseconds.

Gauge
downtime For jobs currently in a failing/recovering situation, the time elapsed during this outage. -

Returns 0 for running jobs and -1 for completed jobs.

+

Returns 0 for running jobs and -1 for completed jobs; otherwise the value is in milliseconds.

Gauge
fullRestartsThe total number of full restarts since this job was submitted.The total number of full restarts since this job was submitted.Gauge
@@ -801,53 +846,64 @@ Thus, in order to infer the metric identifier: - - - + + + + - + + - + + + - + + - + + + + + + - + +
ScopeMetricsDescriptionScopeMetricsDescriptionType
Job (only available on JobManager) lastCheckpointDurationThe time it took to complete the last checkpoint.The time it took to complete the last checkpoint (in milliseconds).Gauge
lastCheckpointSizeThe total size of the last checkpoint.The total size of the last checkpoint (in bytes).Gauge
lastCheckpointExternalPath The path where the last external checkpoint was stored.Gauge
lastCheckpointRestoreTimestampTimestamp when the last checkpoint was restored at the coordinator.Timestamp when the last checkpoint was restored at the coordinator (in milliseconds).Gauge
lastCheckpointAlignmentBufferedThe number of buffered bytes during alignment over all subtasks for the last checkpoint.The number of buffered bytes during alignment over all subtasks for the last checkpoint (in bytes).Gauge
numberOfInProgressCheckpoints The number of in progress checkpoints.Gauge
numberOfCompletedCheckpoints The number of successfully completed checkpoints.Gauge
numberOfFailedCheckpoints The number of failed checkpoints.Gauge
totalNumberOfCheckpoints The number of total checkpoints (in progress, completed, failed).Gauge
Task checkpointAlignmentTimeThe time in nanoseconds that the last barrier alignment took to complete, or how long the current alignment has taken so far.The time that the last barrier alignment took to complete, or how long the current alignment has taken so far (in nanoseconds).Gauge
@@ -856,66 +912,80 @@ Thus, in order to infer the metric identifier: - - - + + + + - + + + + + + + + + + + + + +
ScopeMetricsDescriptionScopeMetricsDescriptionType
Task currentLowWatermarkThe lowest watermark this task has received.The lowest watermark this task has received (in milliseconds).Gauge
numBytesInLocal The total number of bytes this task has read from a local source.Counter
numBytesInLocalPerSecond The number of bytes this task reads from a local source per second.Meter
numBytesInRemote The total number of bytes this task has read from a remote source.Counter
numBytesInRemotePerSecond The number of bytes this task reads from a remote source per second.Meter
numBytesOut The total number of bytes this task has emitted.Counter
numBytesOutPerSecond The number of bytes this task emits per second.Meter
Task/Operator numRecordsIn The total number of records this operator/task has received.Counter
numRecordsInPerSecond The number of records this operator/task receives per second.Meter
numRecordsOut The total number of records this operator/task has emitted.Counter
numRecordsOutPerSecond The number of records this operator/task sends per second.Meter
Operator latency The latency distributions from all incoming sources.Histogram
numSplitsProcessed The total number of InputSplits this data source has processed (if the operator is a data source).Gauge
@@ -926,9 +996,10 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -936,11 +1007,13 @@ Thus, in order to infer the metric identifier: + +
ScopeMetricsDescriptionScopeMetricsDescriptionType
Operator commitsSucceeded Kafka offset commit success count if Kafka commit is turned on and checkpointing is enabled.Counter
Operator commitsFailed Kafka offset commit failure count if Kafka commit is turned on and checkpointing is enabled.Counter
From 219a47027b00d16922e23c2cb64cadbfff8dd6f8 Mon Sep 17 00:00:00 2001 From: yew1eb Date: Fri, 20 Oct 2017 15:55:21 +0800 Subject: [PATCH 2/3] add a column that the type of metrics --- docs/monitoring/metrics.md | 191 ++++++++++++++++++++++++++----------- 1 file changed, 135 insertions(+), 56 deletions(-) diff --git a/docs/monitoring/metrics.md b/docs/monitoring/metrics.md index 595051c9861fe..09f2924edce0e 100644 --- a/docs/monitoring/metrics.md +++ b/docs/monitoring/metrics.md @@ -519,10 +519,11 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -531,10 +532,12 @@ Thus, in order to infer the metric identifier: + +
ScopeInfixMetricsScopeInfixMetrics DescriptionType
Status.JVM.CPU Load The recent CPU usage of the JVM.Gauge
Time The CPU time used by the JVM.Gauge
@@ -543,10 +546,11 @@ Thus, in order to infer the metric identifier: - - - - + + + + + @@ -554,51 +558,63 @@ Thus, in order to infer the metric identifier: - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + +
ScopeInfixMetricsDescriptionScopeInfixMetricsDescriptionType
Job-/TaskManager Status.JVM.Memory Heap.UsedThe amount of heap memory currently used.The amount of heap memory currently used (in bytes).Gauge
Heap.CommittedThe amount of heap memory guaranteed to be available to the JVM.The amount of heap memory guaranteed to be available to the JVM (in bytes).Gauge
Heap.MaxThe maximum amount of heap memory that can be used for memory management.The maximum amount of heap memory that can be used for memory management (in bytes).Gauge
NonHeap.UsedThe amount of non-heap memory currently used.The amount of non-heap memory currently used (in bytes).Gauge
NonHeap.CommittedThe amount of non-heap memory guaranteed to be available to the JVM.The amount of non-heap memory guaranteed to be available to the JVM (in bytes).Gauge
NonHeap.MaxThe maximum amount of non-heap memory that can be used for memory management.The maximum amount of non-heap memory that can be used for memory management (in bytes).Gauge
Direct.CountThe number of buffers in the direct buffer pool.The number of buffers in the direct buffer pool.Gauge
Direct.MemoryUsedThe amount of memory used by the JVM for the direct buffer pool.The amount of memory used by the JVM for the direct buffer pool (in bytes).Gauge
Direct.TotalCapacityThe total capacity of all buffers in the direct buffer pool.The total capacity of all buffers in the direct buffer pool (in bytes).Gauge
Mapped.CountThe number of buffers in the mapped buffer pool.The number of buffers in the mapped buffer pool.Gauge
Mapped.MemoryUsedThe amount of memory used by the JVM for the mapped buffer pool.The amount of memory used by the JVM for the mapped buffer pool (in bytes).Gauge
Mapped.TotalCapacityThe number of buffers in the mapped buffer pool.The total capacity of all buffers in the mapped buffer pool (in bytes).Gauge
@@ -607,10 +623,11 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -619,6 +636,7 @@ Thus, in order to infer the metric identifier: +
ScopeInfixMetricsScopeInfixMetrics DescriptionType
Status.JVM.ClassLoader Threads.Count The total number of live threads.Gauge
@@ -627,10 +645,11 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -639,10 +658,12 @@ Thus, in order to infer the metric identifier: + +
ScopeInfixMetricsScopeInfixMetrics DescriptionType
Status.JVM.GarbageCollector <GarbageCollector>.Count The total number of collections that have occurred.Gauge
<GarbageCollector>.Time The total time spent performing garbage collection.Gauge
@@ -651,10 +672,11 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -663,10 +685,12 @@ Thus, in order to infer the metric identifier: + +
ScopeInfixMetricsScopeInfixMetrics DescriptionType
Status.JVM.ClassLoader ClassesLoaded The total number of classes loaded since the start of the JVM.Gauge
ClassesUnloaded The total number of classes unloaded since the start of the JVM.Gauge
@@ -675,10 +699,11 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -687,46 +712,56 @@ Thus, in order to infer the metric identifier: + + + + + + + + + +
ScopeInfixMetricsScopeInfixMetrics DescriptionType
Status.Network AvailableMemorySegments The number of unused memory segments.Gauge
TotalMemorySegments The number of allocated memory segments.Gauge
Task buffers inputQueueLength The number of queued input buffers.Gauge
outputQueueLength The number of queued output buffers.Gauge
inPoolUsage An estimate of the input buffers usage.Gauge
outPoolUsage An estimate of the output buffers usage.Gauge
Network.<Input|Output>.<gate>
(only available if taskmanager.net.detailed-metrics config option is set)
totalQueueLen Total number of queued buffers in all input/output channels.Gauge
minQueueLen Minimum number of queued buffers in all input/output channels.Gauge
maxQueueLen Maximum number of queued buffers in all input/output channels.Gauge
avgQueueLen Average number of queued buffers in all input/output channels.Gauge
@@ -735,9 +770,10 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -745,18 +781,22 @@ Thus, in order to infer the metric identifier: + + + +
ScopeMetricsDescriptionScopeMetricsDescriptionType
JobManager numRegisteredTaskManagers The number of registered taskmanagers.Gauge
numRunningJobs The number of running jobs.Gauge
taskSlotsAvailable The number of available task slots.Gauge
taskSlotsTotal The total number of task slots.Gauge
@@ -765,34 +805,42 @@ Thus, in order to infer the metric identifier: - - - + + + + - + + + + - + +
ScopeMetricsDescriptionScopeMetricsDescriptionType
Job (only available on JobManager) restartingTimeThe time it took to restart the job, or how long the current restart has been in progress. + The time it took to restart the job, +

or how long the current restart has been in progress (in milliseconds).

+
Gauge
uptime The time that the job has been running without interruption. -

Returns -1 for completed jobs.

+

Returns -1 for completed jobs; otherwise the value is in milliseconds.

Gauge
downtime For jobs currently in a failing/recovering situation, the time elapsed during this outage. -

Returns 0 for running jobs and -1 for completed jobs.

+

Returns 0 for running jobs and -1 for completed jobs; otherwise the value is in milliseconds.

Gauge
fullRestartsThe total number of full restarts since this job was submitted.The total number of full restarts since this job was submitted.Gauge
@@ -801,53 +849,67 @@ Thus, in order to infer the metric identifier: - - - + + + + - + + - + + + - + + - + + + + + + - + +
ScopeMetricsDescriptionScopeMetricsDescriptionType
Job (only available on JobManager) lastCheckpointDurationThe time it took to complete the last checkpoint.The time it took to complete the last checkpoint (in milliseconds).Gauge
lastCheckpointSizeThe total size of the last checkpoint.The total size of the last checkpoint (in bytes).Gauge
lastCheckpointExternalPath The path where the last external checkpoint was stored.Gauge
lastCheckpointRestoreTimestampTimestamp when the last checkpoint was restored at the coordinator.Timestamp when the last checkpoint was restored at the coordinator (in milliseconds).Gauge
lastCheckpointAlignmentBufferedThe number of buffered bytes during alignment over all subtasks for the last checkpoint.The number of buffered bytes during alignment over all subtasks for the last checkpoint (in bytes).Gauge
numberOfInProgressCheckpoints The number of in progress checkpoints.Gauge
numberOfCompletedCheckpoints The number of successfully completed checkpoints.Gauge
numberOfFailedCheckpoints The number of failed checkpoints.Gauge
totalNumberOfCheckpoints The number of total checkpoints (in progress, completed, failed).Gauge
Task checkpointAlignmentTimeThe time in nanoseconds that the last barrier alignment took to complete, or how long the current alignment has taken so far. + The time that the last barrier alignment took to complete, +

or how long the current alignment has taken so far (in nanoseconds).

+
Gauge
@@ -856,66 +918,80 @@ Thus, in order to infer the metric identifier: - - - + + + + - + + + + + + + + + + + + + +
ScopeMetricsDescriptionScopeMetricsDescriptionType
Task currentLowWatermarkThe lowest watermark this task has received.The lowest watermark this task has received (in milliseconds).Gauge
numBytesInLocal The total number of bytes this task has read from a local source.Counter
numBytesInLocalPerSecond The number of bytes this task reads from a local source per second.Meter
numBytesInRemote The total number of bytes this task has read from a remote source.Counter
numBytesInRemotePerSecond The number of bytes this task reads from a remote source per second.Meter
numBytesOut The total number of bytes this task has emitted.Counter
numBytesOutPerSecond The number of bytes this task emits per second.Meter
Task/Operator numRecordsIn The total number of records this operator/task has received.Counter
numRecordsInPerSecond The number of records this operator/task receives per second.Meter
numRecordsOut The total number of records this operator/task has emitted.Counter
numRecordsOutPerSecond The number of records this operator/task sends per second.Meter
Operator latency The latency distributions from all incoming sources.Histogram
numSplitsProcessed The total number of InputSplits this data source has processed (if the operator is a data source).Gauge
@@ -926,9 +1002,10 @@ Thus, in order to infer the metric identifier: - - - + + + + @@ -936,11 +1013,13 @@ Thus, in order to infer the metric identifier: + +
ScopeMetricsDescriptionScopeMetricsDescriptionType
Operator commitsSucceeded Kafka offset commit success count if Kafka commit is turned on and checkpointing is enabled.Counter
Operator commitsFailed Kafka offset commit failure count if Kafka commit is turned on and checkpointing is enabled.Counter
From 77d632b45d174416240a6872768fdff4b6a380bb Mon Sep 17 00:00:00 2001 From: yew1eb Date: Thu, 26 Oct 2017 18:06:50 +0800 Subject: [PATCH 3/3] add time unit for latency metric --- docs/monitoring/metrics.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/monitoring/metrics.md b/docs/monitoring/metrics.md index 044ee753a78fe..864b9e93d36e5 100644 --- a/docs/monitoring/metrics.md +++ b/docs/monitoring/metrics.md @@ -588,7 +588,7 @@ Thus, in order to infer the metric identifier: Direct.Count - The number of buffers in the direct buffer pool (in bytes). + The number of buffers in the direct buffer pool. Gauge @@ -603,7 +603,7 @@ Thus, in order to infer the metric identifier: Mapped.Count - The number of buffers in the mapped buffer pool (in bytes). + The number of buffers in the mapped buffer pool. Gauge @@ -979,7 +979,7 @@ Thus, in order to infer the metric identifier: Operator latency - The latency distributions from all incoming sources. + The latency distributions from all incoming sources (in milliseconds). Histogram