142 changes: 122 additions & 20 deletions docs/_docs/extensions-and-integrations/performance-statistics.adoc
@@ -23,16 +23,20 @@ link:#print-statistics[print statistics] from the collected files.

== Building the Report

Ignite provides a tool to generate the report from the performance statistics files.
The extension provides a tool to generate an offline HTML report from performance statistics files.

Follow these steps to build the performance report:

1. Stop collecting statistics and place files from all nodes under an empty directory. For example:

/path_to_files/
├── node-162c7147-fef8-4ea2-bd25-8653c41fc7fa.prf
├── node-162c7147-fef8-4ea2-bd25-8653c41fc7fa-system-views.prf
├── node-7b8a7c5c-f3b7-46c3-90da-e66103c00001.prf
└── node-faedc6c9-3542-4610-ae10-4ff7e0600000.prf
└── node-7b8a7c5c-f3b7-46c3-90da-e66103c00001-system-views.prf

+
`node-\*-system-views*.prf` files are stored next to regular performance statistics files. If the system view files are too large, you can omit them. The report will still be generated, but without `System views` data.

2. Run the script from the release package of the tool:

@@ -45,34 +49,108 @@ The performance report is created in the new directory under the performance sta
`path_to_files/report_yyyy-MM-dd_HH-mm-ss/`.
Open `report_yyyy-MM-dd_HH-mm-ss/index.html` in the browser to see the report.
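Step 1 above (placing files from all nodes under an empty directory) can be scripted. The following is a minimal Python sketch and not part of the tool: the function name `gather_prf_files`, the per-node source directories, and the `include_system_views` switch are hypothetical. It copies `node-*.prf` files into an empty target directory and can leave out `node-*-system-views*.prf` files when they are too large.

```python
import shutil
import tempfile
from pathlib import Path


def gather_prf_files(node_dirs, target_dir, include_system_views=True):
    """Copy node-*.prf files from each source directory into one empty directory.

    Hypothetical helper: directory layout and argument names are assumptions.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    if any(target.iterdir()):
        raise ValueError(f"{target} must be empty before collecting files")

    copied = []
    for node_dir in node_dirs:
        for src in sorted(Path(node_dir).glob("node-*.prf")):
            # System view files are optional input for the report.
            if not include_system_views and "-system-views" in src.name:
                continue
            shutil.copy2(src, target / src.name)
            copied.append(src.name)
    return copied


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    node_a = root / "node_a" / "perf_stat"
    node_a.mkdir(parents=True)
    (node_a / "node-162c7147-fef8-4ea2-bd25-8653c41fc7fa.prf").touch()
    (node_a / "node-162c7147-fef8-4ea2-bd25-8653c41fc7fa-system-views.prf").touch()

    copied = gather_prf_files([node_a], root / "path_to_files", include_system_views=False)
    print(copied)
```

With `include_system_views=False`, the report would still be generated from the copied files, but without `System views` data.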

The report includes `Cluster info`, `Cache operations`, `Transactions`, `SQL queries`, `Scan queries`,
`Index queries`, `Tasks and jobs`, and `System views`. When `node-\*-system-views*.prf` files are available in the
input directory, the `System views` tab is populated with their data.

For more details, run the help command:

[source,shell]
----
performance-statistics-tool/build-report.sh --help
----

=== Performance Statistics Report Example
=== HTML Report Tabs

:perf_stat_url: images/integrations/performance-statistics
:perf_stat_image_url: /docs/images/integrations/performance-statistics

==== Cluster info

Cluster topology and caches found in the input files: `Cluster nodes` and `Started caches` tables.

image::{perf_stat_url}/perf_stat_cluster_info.png[Cluster info report, link={perf_stat_image_url}/perf_stat_cluster_info.png]

==== Cache operations

Cache and cache store activity for the selected cache and node. Cache and node selectors filter all charts on the page.

A bar chart summarizes the distribution of cache operation counts by type:

image::{perf_stat_url}/perf_stat_cache_operations.png[Distribution of cache operations, link={perf_stat_image_url}/perf_stat_cache_operations.png]

For each operation type, a line chart shows the operation count over time:

image::{perf_stat_url}/perf_stat_cache_operations_1.png[Cache operation count over time, link={perf_stat_image_url}/perf_stat_cache_operations_1.png]

The `Cache store operations` section provides the same breakdowns for cache store calls. A distribution chart
summarizes cache store operation counts by type:

image::{perf_stat_url}/perf_stat_cache_operations_2.png[Distribution of cache store operations, link={perf_stat_image_url}/perf_stat_cache_operations_2.png]

Line charts show cache store operation counts over time:

image::{perf_stat_url}/perf_stat_cache_operations_3.png[Cache store operation count over time, link={perf_stat_image_url}/perf_stat_cache_operations_3.png]

==== Transactions

Transaction activity for the selected cache and node. A histogram shows the distribution of transaction durations:

image::{perf_stat_url}/perf_stat_transactions.png[Transactions histogram, link={perf_stat_image_url}/perf_stat_transactions.png]

Below it, line charts plot `commit` and `rollback` counts over time:

image::{perf_stat_url}/perf_stat_transactions_1.png[Transactions commit chart, link={perf_stat_image_url}/perf_stat_transactions_1.png]

==== SQL queries

Aggregated SQL query statistics. The `Overall statistics` table shows total executions, duration, and reads for each
query. Each row expands into a `Properties` subtable with map and reduce phase plans, and a `Rows` subtable with
rows fetched on mapper and reducer nodes:

image::{perf_stat_url}/perf_stat_sql_queries.png[SQL queries overall statistics, link={perf_stat_image_url}/perf_stat_sql_queries.png]

The `Top of slowest queries` table lists slow query executions with the same expandable details:

[tabs]
--
tab:Cache operations[]
image:{perf_stat_url}/perf_stat_2.jpg[Cache operations report]
tab:Transactions[]
image:{perf_stat_url}/perf_stat_3.jpg[Transactions report]
tab:Queries[]
image:{perf_stat_url}/perf_stat_4.jpg[Queries report]
tab:Compute[]
image:{perf_stat_url}/perf_stat_5.jpg[Compute report]
tab:Cluster info[]
image:{perf_stat_url}/perf_stat_1.jpg[Cluster info report]
--
image::{perf_stat_url}/perf_stat_sql_queries_1.png[Top of slowest SQL queries, link={perf_stat_image_url}/perf_stat_sql_queries_1.png]

==== Scan queries

Aggregated scan query statistics by cache and the slowest scan queries: `Overall statistics` and `Top of slowest
queries` tables.

image::{perf_stat_url}/perf_stat_scan_queries.png[Scan queries report, link={perf_stat_image_url}/perf_stat_scan_queries.png]

==== Index queries

Aggregated index query statistics and the slowest index queries: `Overall statistics` and `Top of slowest queries`
tables.

image::{perf_stat_url}/perf_stat_index_queries.png[Index queries report, link={perf_stat_image_url}/perf_stat_index_queries.png]

==== Tasks and jobs

Aggregated task execution statistics. The `Overall statistics` table shows executions, total duration, and job
counts per task:

image::{perf_stat_url}/perf_stat_tasks_and_jobs.png[Tasks and jobs overall statistics, link={perf_stat_image_url}/perf_stat_tasks_and_jobs.png]

The `Top of slowest tasks` table lists slow task executions. Each row expands into a `Jobs` subtable with per-node
job durations:

image::{perf_stat_url}/perf_stat_tasks_and_jobs_1.png[Top of slowest tasks with jobs subtable, link={perf_stat_image_url}/perf_stat_tasks_and_jobs_1.png]

==== System views

Rows from collected system views. The tab provides a node selector, a system view selector, and a searchable
paginated table for the selected view.

image::{perf_stat_url}/perf_stat_system_views.png[System views report, link={perf_stat_image_url}/perf_stat_system_views.png]

== Print Statistics

Ignite provides a tool to print statistics to a console or to a file in JSON format.
The extension provides a tool to print raw performance statistics records to the console or to a file in JSON format.
Use this tool when you need access to individual records.

Run the script from the release package of the tool to print statistics:

@@ -81,16 +159,40 @@ Run the script from the release package of the tool to print statistics:
performance-statistics-tool/print-statistics.sh path_to_files
----

Note that `path_to_files` is a path to the performance statistics file or files directory.
Note that `path_to_files` can point either to a single performance statistics file or to a directory with multiple
files.

The tool supports the following options:

The script provides the ability to filter operations by operation's type, time, or cache. For more details run the
help command:
[cols="1,3",opts="header"]
|===
|Option |Description

|`--out out_file` |Appends JSON output to the specified file.
|`--ops op_types` |Prints only the specified comma-separated operation types.
|`--from start_time_from` |Prints only records whose start time is greater than or equal to the specified value (Unix epoch milliseconds).
|`--to start_time_to` |Prints only records whose start time is less than or equal to the specified value (Unix epoch milliseconds).
|`--caches cache_names` |Prints only records related to the specified comma-separated cache names.
|===
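The `--from` and `--to` values in the table above are Unix epoch milliseconds. As a minimal sketch, the following Python snippet converts a timezone-aware timestamp into that form. The helper name `to_epoch_millis` and the chosen one-hour window are illustrative assumptions, not part of the tool.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Illustrative one-hour window in UTC.
window_start = datetime(2024, 4, 21, 11, 46, 40, tzinfo=timezone.utc)
window_end = window_start + timedelta(hours=1)

from_ms = to_epoch_millis(window_start)
to_ms = to_epoch_millis(window_end)
print(f"--from {from_ms} --to {to_ms}")
```

The printed arguments can be appended to a `print-statistics.sh` invocation to restrict the output to records that started within that window.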

For more details, run the help command:

[source,shell]
----
performance-statistics-tool/print-statistics.sh --help
----

Examples:

[source,shell]
----
performance-statistics-tool/print-statistics.sh path_to_files --ops QUERY,QUERY_ROWS,QUERY_PROPERTY
performance-statistics-tool/print-statistics.sh path_to_files --ops SYSTEM_VIEW_ROW
performance-statistics-tool/print-statistics.sh path_to_files --ops CHECKPOINT,PAGES_WRITE_THROTTLE
performance-statistics-tool/print-statistics.sh path_to_files --from 1713700000000 --to 1713703600000
performance-statistics-tool/print-statistics.sh path_to_files --caches cache1,cache2 --out perfstat.json
----

See the output example below:

{"op":"CACHE_GET","nodeId":"955130d1-5218-4e46-87f6-62755e92e9b4","cacheId":-1809642915,"startTime":1616837094237,"duration":64992213}
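Printed records can be post-processed with any JSON-aware tool. The following Python sketch counts records and sums `duration` per operation type. It assumes that the tool emits one JSON object per line and that records carry the `op` and `duration` fields shown in the example above; the function name `summarize` is hypothetical, and the unit of `duration` is not stated here, so the totals are reported as raw values.

```python
import json
from collections import defaultdict


def summarize(lines):
    """Return {op: (record_count, total_duration)} for JSON records, one per line."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        entry = totals[record["op"]]
        entry[0] += 1
        # Assumption: records without a duration contribute nothing to the sum.
        entry[1] += record.get("duration", 0)
    return {op: (count, total) for op, (count, total) in totals.items()}


# The sample record is copied from the output example above.
sample = [
    '{"op":"CACHE_GET","nodeId":"955130d1-5218-4e46-87f6-62755e92e9b4",'
    '"cacheId":-1809642915,"startTime":1616837094237,"duration":64992213}',
]
summary = summarize(sample)
print(summary)
```

To process a file written with `--out`, pass an open file object instead of the `sample` list, for example `summarize(open("perfstat.json"))`.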
Binary file not shown.
94 changes: 84 additions & 10 deletions docs/_docs/monitoring-metrics/performance-statistics.adoc
@@ -21,25 +21,80 @@ Ignite provides a built-in tool for cluster profiling.
You can link:#collecting-statistics[collect] performance statistics from the cluster and then
link:#building-the-report[build] the performance report.

== What Is Collected

Performance statistics include the following records:

[cols="1,3",opts="header"]
|===
|Category | Records
|Cache operations | Cache API and cache store operations: `get`, `put`, `remove`, `invoke`, `lock`, bulk operations,
conflict operations, and cache store load, write, and delete operations
|Cache metadata | Cache start records with cache IDs and cache names
|Transactions | Transaction commit and rollback records with affected cache IDs, start time, and duration
|Queries | SQL, SCAN, and INDEX query records with query text or cache/index details, query ID, start time, duration,
and success flag
|Query details | Query logical and physical reads, processed rows, and custom query properties
|Tasks and jobs | Compute task and job records with task names, session IDs, execution time, queue time,
affinity partition, and timeout flag
|Persistence | Checkpoint timing records and page write throttling records
|System views | System view schemas and rows captured when statistics collection starts (some views are skipped, see
the list below)
|===

The following system views are skipped because they are large or duplicate data that is already collected as
performance records:

* `baseline.node.attributes`
* `node.attributes`
* `metrics`
* `caches`
* `sql.queries`
* `nodes`
* `cacheGroupPageLists`
* `dataRegionPageLists`
* `partitionStates`
* `statisticsPartitionData`
* `metastorage`
* `distributed.metastorage`

== Collecting Statistics

link:#jmx[JMX interface] and link:#control-script[Control Script] are used to start and stop statistics collecting.
Use the link:#jmx[JMX interface] or the link:#control-script[Control Script] to manage statistics collection.

Each node collects performance statistics in binary files. These files are placed under
the `Ignite_work_directory/perf_stat/` directory. Regular performance events are written to
`node-{nodeId}.prf`. If a file with the same name already exists, Ignite adds an index to the file name:
`node-{nodeId}-{index}.prf`.

When statistics collection starts, each node also writes a snapshot of system views
(link:monitoring-metrics/system-views[System Views]) to `node-{nodeId}-system-views.prf`.
If a file with the same name already exists, Ignite adds an index:
`node-{nodeId}-system-views-{index}.prf`. The `rotate` operation rotates only regular performance event files.

[WARNING]
====
[discrete]
=== Sensitive Information

Each node collects performance statistics in a binary file. This file is placed under
the `Ignite_work_directory/perf_stat/` directory. The name mask is `node-{nodeId}-{index}.prf`.
Performance statistics files can contain sensitive information, including query text, cache names, task names,
node attributes, and values exposed by system views. Store and share these files according to your security policies.
====

Performance statistics files are used to build the report offline.

Nodes use off-heap cyclic buffer to temporarily store serialized statistics. The writer thread flushes buffer to the
file when the flush size is reached. Some statistics are skipped if the buffer overflows due to a slow disk. See
the link:#system-properties[properties] section for customization.
For regular performance events, nodes use an off-heap cyclic buffer to temporarily store serialized statistics. The
writer thread flushes the buffer to the file when the flush size is reached. Some statistics are skipped if the buffer
overflows due to a slow disk. See the link:#system-properties[properties] section for customization.
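The interaction between buffer size, flush size, and skipped statistics can be illustrated with a toy model. The Python sketch below is not Ignite's implementation and is not a real ring buffer: the class `StatsBuffer` and its methods are hypothetical, and the sizes are tiny for readability. It only shows why records are dropped when the writer falls behind.

```python
class StatsBuffer:
    """Toy bounded buffer: rejects records when full, flushes in batches."""

    def __init__(self, capacity, flush_size):
        self.capacity = capacity      # analogous to the buffer size
        self.flush_size = flush_size  # analogous to the flush size
        self.pending = bytearray()
        self.flushed = []             # stands in for the statistics file
        self.skipped = 0

    def write(self, record):
        """Store a record, or skip it if it does not fit."""
        if len(self.pending) + len(record) > self.capacity:
            self.skipped += 1
            return False
        self.pending += record
        return True

    def flush_if_ready(self):
        """Simulate the writer thread: flush once enough bytes are pending."""
        if len(self.pending) >= self.flush_size:
            self.flushed.append(bytes(self.pending))
            self.pending.clear()
            return True
        return False


buffer = StatsBuffer(capacity=32, flush_size=8)

# The writer has not run yet (slow disk), so the fourth record overflows.
accepted = [buffer.write(b"x" * 10) for _ in range(4)]
flushed = buffer.flush_if_ready()
accepted_after_flush = buffer.write(b"x" * 10)
print(accepted, buffer.skipped, flushed, accepted_after_flush)
```

In this model, a larger capacity tolerates longer writer stalls, while a smaller flush size makes the writer drain the buffer more often.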

Each statistics collection process creates a new file on nodes. Each next file has the same name with the
corresponding index. See the examples below:
Each statistics collection run creates new files on each node. Each subsequent file has the same name with an index
appended. See the examples below:

* `node-faedc6c9-3542-4610-ae10-4ff7e0600000.prf`
* `node-faedc6c9-3542-4610-ae10-4ff7e0600000-1.prf`
* `node-faedc6c9-3542-4610-ae10-4ff7e0600000-2.prf`
* `node-faedc6c9-3542-4610-ae10-4ff7e0600000-system-views.prf`
* `node-faedc6c9-3542-4610-ae10-4ff7e0600000-system-views-1.prf`
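The naming scheme above can be checked mechanically, for example when sorting collected files before building a report. The following Python sketch is an illustration only: the regular expression is derived from the file names listed above, and the function name `classify` and the `events`/`system-views` labels are hypothetical.

```python
import re

# node-{nodeId}[-system-views][-{index}].prf, with nodeId in lowercase UUID form.
PRF_NAME = re.compile(
    r"^node-(?P<node_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"(?P<system_views>-system-views)?"
    r"(?:-(?P<index>\d+))?\.prf$"
)


def classify(file_name):
    """Return (node_id, kind, index) for a statistics file name, or None."""
    match = PRF_NAME.match(file_name)
    if match is None:
        return None
    kind = "system-views" if match.group("system_views") else "events"
    # Files without a numeric suffix have no index.
    index = int(match.group("index")) if match.group("index") else None
    return match.group("node_id"), kind, index


for name in [
    "node-faedc6c9-3542-4610-ae10-4ff7e0600000.prf",
    "node-faedc6c9-3542-4610-ae10-4ff7e0600000-2.prf",
    "node-faedc6c9-3542-4610-ae10-4ff7e0600000-system-views-1.prf",
]:
    print(name, "->", classify(name))
```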

== Building the Report

@@ -54,8 +109,11 @@ Follow these steps to build the performance report:

/path_to_files/
├── node-162c7147-fef8-4ea2-bd25-8653c41fc7fa.prf
├── node-162c7147-fef8-4ea2-bd25-8653c41fc7fa-system-views.prf
├── node-7b8a7c5c-f3b7-46c3-90da-e66103c00001.prf
└── node-faedc6c9-3542-4610-ae10-4ff7e0600000.prf
├── node-7b8a7c5c-f3b7-46c3-90da-e66103c00001-system-views.prf
├── node-faedc6c9-3542-4610-ae10-4ff7e0600000.prf
└── node-faedc6c9-3542-4610-ae10-4ff7e0600000-system-views.prf

2. Run the script from the release package of the tool:

@@ -65,6 +123,22 @@ The performance report is created in the new directory under the performance sta
path: `path_to_files/report_yyyy-MM-dd_HH-mm-ss/`. Open `report_yyyy-MM-dd_HH-mm-ss/index.html` in the browser to see
the report.

You can also print raw performance statistics records to the console or to a file by using `print-statistics.sh` from
the `performance-statistics-ext` release package. For example, to print checkpoint timing and page write throttling
records:

[source,shell]
----
performance-statistics-tool/print-statistics.sh path_to_files --ops CHECKPOINT,PAGES_WRITE_THROTTLE
----

To write the output to a file, use the `--out` parameter:

[source,shell]
----
performance-statistics-tool/print-statistics.sh path_to_files --ops CHECKPOINT,PAGES_WRITE_THROTTLE --out checkpoints.json
----

== Management

The following section provides information on JMX, Control Script and system properties.
@@ -122,6 +196,6 @@ Parameters:
statistics collection is stopped when the file size is exceeded.
|IGNITE_PERF_STAT_BUFFER_SIZE | Integer | 32 MB | Performance statistics off-heap buffer size in bytes.
|IGNITE_PERF_STAT_FLUSH_SIZE | Integer | 8 MB | Minimum performance statistics batch size to be flushed, in bytes.
|IGNITE_PERF_STAT_CACHED_STRINGS_THRESHOLD | Integer | 1024 | Maximum performance statistics cached strings threshold.
|IGNITE_PERF_STAT_CACHED_STRINGS_THRESHOLD | Integer | 10240 | Maximum performance statistics cached strings threshold.
String caching is stopped when the threshold is exceeded.
|===