Performance: Add hints to overload protection and ANALYZE command
Christian Kurze authored and amotl committed Nov 18, 2023
1 parent 3cc03e2 commit b1c5922
Showing 2 changed files with 37 additions and 1 deletion.
5 changes: 5 additions & 0 deletions docs/handbook/performance/inserts/methods.rst
@@ -290,6 +290,11 @@ To test :ref:`bulk operations <inserts_bulk_operations>`, you should:

Try out different setups and re-run the test.

Please note that ``INSERT INTO`` statements using a query, and the ``COPY FROM``
statement, are subject to overload protection, so that other queries running in
parallel keep performing well. Refer to the
:ref:`Overload Protection <crate-reference:overload_protection>` documentation
to learn how to adjust the corresponding settings.

At the end of this process, you will have a better understanding of the
throughput of your cluster with different setups and under different loads.

33 changes: 32 additions & 1 deletion docs/handbook/performance/inserts/tuning.rst
@@ -98,7 +98,22 @@ Translog
If `translog.durability`_ is set to ``REQUEST`` (the default), the translog
gets flushed after every operation. Setting this to ``ASYNC`` will improve
insert performance, but it also worsens durability. If a node crashes before a
translog has been synced, those operations will be lost.
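
As a minimal sketch, the durability mode can be changed per table with
``ALTER TABLE``. The table name ``doc.sensor_readings`` is hypothetical, and
whether ``ASYNC`` is acceptable depends on how much data loss you can tolerate
after a node crash.

.. code-block:: sql

    -- Hypothetical table name: trade durability for insert throughput
    -- while a bulk load is running.
    ALTER TABLE doc.sensor_readings SET ("translog.durability" = 'ASYNC');

    -- Switch back to the default once the bulk load has finished.
    ALTER TABLE doc.sensor_readings SET ("translog.durability" = 'REQUEST');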

Overload Protection
-------------------

The :ref:`Overload Protection <crate-reference:overload_protection>` settings
control the amount of resources that operations such as ``INSERT INTO``
statements using a query, or ``COPY``, can use.

The default values serve as a starting point for an algorithm that dynamically
adapts the effective concurrency limit based on the round-trip time of requests.
Whenever one of these settings is updated, the previously calculated effective
concurrency is reset.

Adjust these settings to match your workload, especially when you are
benchmarking insert performance.
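
The following sketch shows how such an adjustment might look. It assumes the
setting names ``overload_protection.dml.initial_concurrency`` and
``overload_protection.dml.max_concurrency`` as listed in the reference
documentation. The values are illustrative only and are not recommendations,
so please verify names and defaults against the version you are running.

.. code-block:: sql

    -- Assumed setting names, illustrative values.
    -- Start the adaptive algorithm from a higher concurrency limit.
    SET GLOBAL TRANSIENT "overload_protection.dml.initial_concurrency" = 10;

    -- Raise the upper bound that the algorithm may adapt towards.
    SET GLOBAL TRANSIENT "overload_protection.dml.max_concurrency" = 200;

Because updating any of these settings resets the previously calculated
effective concurrency, it may be best to change them before a benchmark run
starts rather than during it.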

Refresh interval
----------------
@@ -113,6 +128,21 @@ If you know that your client application can tolerate a higher refresh
interval, you can expect to see performance improvements if you increase this
value.

Calculating statistics
----------------------

After loading larger amounts of data into new or existing tables, it is
recommended to re-calculate the statistics by executing the ``ANALYZE``
command. The statistics will be used by the query optimizer to generate
better execution plans.

The calculation of statistics also happens periodically. The bandwidth used for
collecting statistics is throttled, based on the maximum number of bytes per
second that can be read from data nodes.

Please refer to the `ANALYZE`_ documentation for further information on how to
change the calculation interval, and how to configure the throttling settings.
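
For illustration, a bulk load could be followed by statements like the ones
below. The setting names ``stats.service.interval`` and
``stats.service.max_bytes_per_sec`` are assumed from the reference
documentation, and the values are examples rather than recommendations.

.. code-block:: sql

    -- Re-calculate the table statistics right after a large import.
    ANALYZE;

    -- Assumed setting name: run the periodic calculation more often.
    SET GLOBAL PERSISTENT "stats.service.interval" = '12h';

    -- Assumed setting name: allow more read bandwidth for collecting statistics.
    SET GLOBAL PERSISTENT "stats.service.max_bytes_per_sec" = '80mb';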

Manual optimizing
-----------------

@@ -129,6 +159,7 @@ However, if you are doing a lot of inserts, you may want to optimize tables (or
even specific partitions) on your own schedule. If so, you can use the
`OPTIMIZE`_ command.

.. _ANALYZE: https://cratedb.com/docs/crate/reference/en/latest/sql/statements/analyze.html
.. _fulltext indexes: https://crate.io/docs/crate/reference/en/latest/sql/fulltext.html
.. _natural primary key: https://en.wikipedia.org/wiki/Natural_key
.. _OPTIMIZE: https://crate.io/docs/crate/reference/en/latest/sql/reference/optimize.html