This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

Hints to overload protection and ANALYZE command #325

Merged
merged 2 commits into from
Nov 16, 2023

2 changes: 2 additions & 0 deletions docs/conf.py
@@ -1,6 +1,8 @@
from crate.theme.rtd.conf.crate_howtos import *

linkcheck_ignore = [
# Server not available on 2023-11-16.
r"https://.+\.r-project.org/.*",
# Forbidden by WordPress
"https://crate.io/wp-content/uploads/2018/11/copy_from_population_data.zip",
r'http://localhost:\d+/',
5 changes: 5 additions & 0 deletions docs/performance/inserts/methods.rst
@@ -290,6 +290,10 @@ To test :ref:`bulk operations <inserts_bulk_operations>`, you should:

Try out different setups and re-run the test.

Please note that ``INSERT INTO`` statements using a ``query`` and the ``COPY FROM``
statement are subject to overload protection, which preserves the performance of
other queries running in parallel. Refer to the `Overload Protection`_
documentation to learn how to modify the corresponding settings.

At the end of this process, you will have a better understanding of the
throughput of your cluster with different setups and under different loads.

@@ -307,3 +311,4 @@ throughput of your cluster with different setups and under different loads.
.. _translog.durability: https://crate.io/docs/crate/reference/en/latest/sql/reference/create_table.html#translog-durability
.. _UNNEST reference documentation: https://crate.io/docs/crate/reference/en/latest/sql/statements/insert.html?highlight=unnest#description
.. _UNNEST: https://crate.io/docs/crate/reference/en/latest/sql/table_functions.html#unnest-array-array
.. _Overload Protection: https://cratedb.com/docs/crate/reference/en/latest/config/cluster.html#overload-protection
Member

JFYI: We usually sort those alphabetically, instead of just adding new links at the bottom.

30 changes: 30 additions & 0 deletions docs/performance/inserts/tuning.rst
@@ -100,6 +100,20 @@ gets flushed after every operation. Setting this to ``ASYNC`` will improve
insert performance, but it also worsens durability. If a node crashes before a
translog has been synced, those operations will be lost.

Overload Protection
-------------------

The `Overload Protection`_ settings control how many resources operations like
``INSERT INTO`` with a query or ``COPY FROM`` can use.

The default values serve as a starting point for an algorithm that dynamically
adapts the effective concurrency limit based on the round-trip time of requests.
Whenever one of these settings is updated, the previously calculated effective
concurrency is reset.

Adjust these settings to match your workload, especially if you are benchmarking
insert performance.
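
As a minimal sketch, the settings can be changed at runtime with ``SET GLOBAL``.
The setting names below follow the `Overload Protection`_ reference as we
understand it, and the values are illustrative assumptions rather than
recommendations. Please verify both against the documentation for your CrateDB
version before using them.

.. code-block:: sql

    -- Assumed setting names; the values are examples only.
    -- Starting point for the adaptive concurrency limit.
    SET GLOBAL TRANSIENT "overload_protection.dml.initial_concurrency" = 10;

    -- Upper bound the adaptive algorithm may reach.
    SET GLOBAL TRANSIENT "overload_protection.dml.max_concurrency" = 200;

    -- Return to the defaults once the benchmark is finished.
    RESET GLOBAL "overload_protection.dml.initial_concurrency";
    RESET GLOBAL "overload_protection.dml.max_concurrency";

Because updating any of these settings resets the previously calculated
effective concurrency, it is best to apply them before a benchmark run starts,
not during it.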

Comment on lines +103 to +116
Member

A few trailing spaces sneaked in here. Can I humbly ask you to configure your editor to strip those, or take extra care when committing?


Refresh interval
----------------

@@ -113,6 +127,20 @@ If you know that your client application can tolerate a higher refresh
interval, you can expect to see performance improvements if you increase this
value.

Calculating statistics
----------------------

After loading larger amounts of data into new or existing tables, it is recommended
to re-calculate the statistics by executing the ``ANALYZE`` command.
The statistics will be used by the query optimizer to generate better execution plans.

The calculation of statistics happens periodically. The bandwidth used for collecting
statistics is throttled, based on the maximum number of bytes per second that can
be read from data nodes.

Please refer to the `ANALYZE`_ documentation for further information on how to change
the calculation interval, and how to configure the throttling settings.
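
For illustration, a sketch of refreshing the statistics manually after a bulk
load, and of adjusting the periodic collection. The setting names
``stats.service.interval`` and ``stats.service.max_bytes_per_sec`` are assumed
from the `ANALYZE`_ reference, and the values are examples only. Please check
them against the documentation for your CrateDB version.

.. code-block:: sql

    -- Re-calculate the statistics right after loading the data.
    ANALYZE;

    -- Assumed setting names; the values are examples only.
    -- Collect statistics more often than the default schedule.
    SET GLOBAL PERSISTENT "stats.service.interval" = '1h';

    -- Raise the read throttle applied while collecting statistics.
    SET GLOBAL PERSISTENT "stats.service.max_bytes_per_sec" = '80mb';

Running ``ANALYZE`` manually is useful directly after a large import, because
the next periodic run may still be some time away.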

Manual optimizing
-----------------

@@ -132,9 +160,11 @@ even specific partitions) on your own schedule. If so, you can use the
.. _fulltext indexes: https://crate.io/docs/crate/reference/en/latest/sql/fulltext.html
.. _natural primary key: https://en.wikipedia.org/wiki/Natural_key
.. _OPTIMIZE: https://crate.io/docs/crate/reference/en/latest/sql/reference/optimize.html
.. _ANALYZE: https://cratedb.com/docs/crate/reference/en/latest/sql/statements/analyze.html
.. _refresh_interval: https://crate.io/docs/crate/reference/en/latest/sql/reference/create_table.html#refresh-interval
.. _Solid-State Drives: https://en.wikipedia.org/wiki/Solid-state_drive
.. _surrogate primary key: https://en.wikipedia.org/wiki/Surrogate_key
.. _system column: https://crate.io/docs/crate/reference/en/latest/sql/administration/system_columns.html
.. _translog.durability: https://crate.io/docs/crate/reference/en/latest/sql/reference/create_table.html#translog-durability
.. _turning column indexes off: https://crate.io/docs/crate/reference/en/latest/sql/ddl/indices_full_search.html#disable-indexing
.. _Overload Protection: https://cratedb.com/docs/crate/reference/en/latest/config/cluster.html#overload-protection