Skip to content

Commit

Permalink
HBASE-24405 : Document hbase:slowlog related operations (#1747)
Browse files Browse the repository at this point in the history
Signed-off-by: ramkrish86 <ramkrishna@apache.org>
Signed-off-by: Anoop Sam John <anoopsamjohn@apache.org>
  • Loading branch information
virajjasani committed May 22, 2020
1 parent 9d9f07b commit e1d7dc8
Show file tree
Hide file tree
Showing 4 changed files with 151 additions and 0 deletions.
12 changes: 12 additions & 0 deletions hbase-common/src/main/resources/hbase-default.xml
Original file line number Diff line number Diff line change
Expand Up @@ -1947,6 +1947,18 @@ possible configurations would overwhelm and obscure the important.
responses with complete data.
</description>
</property>
<property>
<name>hbase.regionserver.slowlog.systable.enabled</name>
<value>false</value>
<description>
Should be enabled only if hbase.regionserver.slowlog.buffer.enabled is enabled. If enabled
(true), all slow/large RPC logs would be persisted to system table hbase:slowlog (in addition
to in-memory ring buffer at each RegionServer). The records are stored in increasing
order of time. Operators can scan the table with various combination of ColumnValueFilter.
More details are provided in the doc section:
"Get Slow/Large Response Logs from System table hbase:slowlog"
</description>
</property>
<property>
<name>hbase.master.metafixer.max.merge.count</name>
<value>64</value>
Expand Down
18 changes: 18 additions & 0 deletions src/main/asciidoc/_chapters/hbase-default.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -2246,6 +2246,24 @@ The percent of region server RPC threads failed to abort RS.
`false`


[[hbase.regionserver.slowlog.systable.enabled]]
*`hbase.regionserver.slowlog.systable.enabled`*::
+
.Description

Should be enabled only if hbase.regionserver.slowlog.buffer.enabled is enabled.
If enabled (true), all slow/large RPC logs would be persisted to system table
hbase:slowlog (in addition to in-memory ring buffer at each RegionServer).
The records are stored in increasing order of time.
Operators can scan the table with various combination of ColumnValueFilter and
time range.
More details are provided in the doc section:
"Get Slow/Large Response Logs from System table hbase:slowlog"

+
.Default
`false`


[[hbase.master.metafixer.max.merge.count]]
*`hbase.master.metafixer.max.merge.count`*::
Expand Down
3 changes: 3 additions & 0 deletions src/main/asciidoc/_chapters/ops_mgt.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -2079,6 +2079,9 @@ Examples:
----

include::slow_log_responses_from_systable.adoc[]


=== Block Cache Monitoring

Starting with HBase 0.98, the HBase Web UI includes the ability to monitor and report on the performance of the block cache.
Expand Down
118 changes: 118 additions & 0 deletions src/main/asciidoc/_chapters/slow_log_responses_from_systable.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
////
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
////
[[slow_log_responses_from_systable]]
==== Get Slow/Large Response Logs from System table hbase:slowlog
The above section provides details about Admin APIs:
* get_slowlog_responses
* get_largelog_responses
* clear_slowlog_responses
All of the above APIs access online in-memory ring buffers from
individual RegionServers and accumulate logs from ring buffers to display
to end user. However, since the logs are stored in memory, after RegionServer is
restarted, all the objects held in memory of that RegionServer will be cleaned up
and previous logs are lost. What if we want to persist all these logs forever?
What if we want to store them in such a manner that operator can get all historical
records with some filters? e.g get me all large/slow RPC logs that are triggered by
user1 and are related to region:
cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. ?
If we have a system table that stores such logs in increasing (not so strictly though)
order of time, it can definitely help operators debug some historical events
(scan, get, put, compaction, flush etc) with detailed inputs.
Config which enabled system table to be created and store all log events is
`hbase.regionserver.slowlog.systable.enabled`.
The default value for this config is `false`. If provided `true`
(Note: `hbase.regionserver.slowlog.buffer.enabled` should also be `true`),
a cron job running in every RegionServer will persist the slow/large logs into
table hbase:slowlog. By default cron job runs every 10 min. Duration can be configured
with key: `hbase.slowlog.systable.chore.duration`. By default, RegionServer will
store upto 1000(config key: `hbase.regionserver.slowlog.systable.queue.size`)
slow/large logs in an internal queue and the chore will retrieve these logs
from the queue and perform batch insertion in hbase:slowlog.
hbase:slowlog has single ColumnFamily: `info`
`info` contains multiple qualifiers which are the same attributes present as
part of `get_slowlog_responses` API response.
* info:call_details
* info:client_address
* info:method_name
* info:param
* info:processing_time
* info:queue_time
* info:region_name
* info:response_size
* info:server_class
* info:start_time
* info:type
* info:username
And example of 2 rows from hbase:slowlog scan result:
[source]
----
\x024\xC1\x03\xE9\x04\xF5@ column=info:call_details, timestamp=2020-05-16T14:58:14.211Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
\x024\xC1\x03\xE9\x04\xF5@ column=info:client_address, timestamp=2020-05-16T14:58:14.211Z, value=172.20.10.2:57347
\x024\xC1\x03\xE9\x04\xF5@ column=info:method_name, timestamp=2020-05-16T14:58:14.211Z, value=Scan
\x024\xC1\x03\xE9\x04\xF5@ column=info:param, timestamp=2020-05-16T14:58:14.211Z, value=region { type: REGION_NAME value: "hbase:meta,,1" } scan { column { family: "info" } attribute { name: "_isolationle
vel_" value: "\x5C000" } start_row: "cluster_test,33333333,99999999999999" stop_row: "cluster_test,," time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks
: true max_result_size: 2097152 reversed: true caching: 10 include_stop_row: true readType: PREAD } number_of_rows: 10 close_scanner: false client_handles_partials: true client_
handles_heartbeats: true track_scan_metrics: false
\x024\xC1\x03\xE9\x04\xF5@ column=info:processing_time, timestamp=2020-05-16T14:58:14.211Z, value=18
\x024\xC1\x03\xE9\x04\xF5@ column=info:queue_time, timestamp=2020-05-16T14:58:14.211Z, value=0
\x024\xC1\x03\xE9\x04\xF5@ column=info:region_name, timestamp=2020-05-16T14:58:14.211Z, value=hbase:meta,,1
\x024\xC1\x03\xE9\x04\xF5@ column=info:response_size, timestamp=2020-05-16T14:58:14.211Z, value=1575
\x024\xC1\x03\xE9\x04\xF5@ column=info:server_class, timestamp=2020-05-16T14:58:14.211Z, value=HRegionServer
\x024\xC1\x03\xE9\x04\xF5@ column=info:start_time, timestamp=2020-05-16T14:58:14.211Z, value=1589640743732
\x024\xC1\x03\xE9\x04\xF5@ column=info:type, timestamp=2020-05-16T14:58:14.211Z, value=ALL
\x024\xC1\x03\xE9\x04\xF5@ column=info:username, timestamp=2020-05-16T14:58:14.211Z, value=user2
\x024\xC1\x06X\x81\xF6\xEC column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
\x024\xC1\x06X\x81\xF6\xEC column=info:client_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348
\x024\xC1\x06X\x81\xF6\xEC column=info:method_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan
\x024\xC1\x06X\x81\xF6\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION_NAME value: "cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { a
ttribute { name: "_isolationlevel_" value: "\x5C000" } start_row: "cccccccc" time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks: true max_result_size: 2
097152 caching: 2147483647 include_stop_row: false } number_of_rows: 2147483647 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_met
rics: false
\x024\xC1\x06X\x81\xF6\xEC column=info:processing_time, timestamp=2020-05-16T14:59:58.764Z, value=24
\x024\xC1\x06X\x81\xF6\xEC column=info:queue_time, timestamp=2020-05-16T14:59:58.764Z, value=0
\x024\xC1\x06X\x81\xF6\xEC column=info:region_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.
\x024\xC1\x06X\x81\xF6\xEC column=info:response_size, timestamp=2020-05-16T14:59:58.764Z, value=211227
\x024\xC1\x06X\x81\xF6\xEC column=info:server_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer
\x024\xC1\x06X\x81\xF6\xEC column=info:start_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932
\x024\xC1\x06X\x81\xF6\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL
\x024\xC1\x06X\x81\xF6\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=user1
----
Operator can use ColumnValueFilter to filter records based on region_name, username,
client_address etc.
Time range based queries will also be very useful.
Example:
[source]
----
scan 'hbase:slowlog', { TIMERANGE => [1589621394000, 1589637999999] }
----

0 comments on commit e1d7dc8

Please sign in to comment.