Performance: Add hints to overload protection and ANALYZE command #6

Merged
merged 1 commit into from
Nov 18, 2023
5 changes: 5 additions & 0 deletions docs/handbook/performance/inserts/methods.rst
@@ -290,6 +290,11 @@ To test :ref:`bulk operations <inserts_bulk_operations>`, you should:

Try out different setups and re-run the test.

Please note that ``INSERT INTO`` statements using a query, and the ``COPY FROM``
statement, are subject to overload protection, which preserves the performance
of other queries running in parallel. Refer to the
:ref:`Overload Protection <crate-reference:overload_protection>` documentation
on how to adjust its settings.

At the end of this process, you will have a better understanding of the
throughput of your cluster with different setups and under different loads.

33 changes: 32 additions & 1 deletion docs/handbook/performance/inserts/tuning.rst
@@ -98,7 +98,22 @@ Translog
If `translog.durability`_ is set to ``REQUEST`` (the default), the translog
gets flushed after every operation. Setting this to ``ASYNC`` will improve
insert performance, but it also worsens durability. If a node crashes before a
translog has been synced, those opperations will be lost.
translog has been synced, those operations will be lost.
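
As a minimal sketch, assuming a hypothetical table named ``my_table``, the
durability mode could be changed per table and later reset to its default:

.. code-block:: sql

    -- ``my_table`` is a placeholder name, not taken from this handbook.
    -- Trade durability for insert throughput on this table.
    ALTER TABLE my_table SET ("translog.durability" = 'ASYNC');

    -- Return to the default (``REQUEST``) once the bulk load is done.
    ALTER TABLE my_table RESET ("translog.durability");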

Overload Protection
-------------------

The :ref:`Overload Protection <crate-reference:overload_protection>` settings
Comment on lines +103 to +106
Member Author

@amotl amotl Nov 16, 2023

JFYI @ckurze: This shows how you can directly link to another documentation section which is also based on Sphinx, maintaining an index of its references for consumption by others. It is more accurate than using plain HTTP links.

control how many resources operations like ``INSERT INTO FROM ...`` or ``COPY``
can use.

The default values serve as a starting point for an algorithm that dynamically
adapts the effective concurrency limit based on the round-trip time of requests.
Whenever one of these settings is updated, the previously calculated effective
concurrency is reset.

Adjust these settings to match your workload, especially when benchmarking
insert performance.
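
The following is a sketch only. It assumes the cluster settings are named
``overload_protection.dml.initial_concurrency`` and
``overload_protection.dml.max_concurrency``, as listed in the CrateDB
reference at the time of writing. The values are illustrative, not
recommendations. Please verify both names and defaults against the reference
for your version.

.. code-block:: sql

    -- Assumed setting names; values are examples only.
    -- Raise the starting point of the adaptive concurrency limit.
    SET GLOBAL TRANSIENT "overload_protection.dml.initial_concurrency" = 10;

    -- Raise the upper bound the algorithm may reach.
    SET GLOBAL TRANSIENT "overload_protection.dml.max_concurrency" = 200;

    -- Restore a default after benchmarking.
    RESET GLOBAL "overload_protection.dml.max_concurrency";

As described above, each change resets the previously calculated effective
concurrency, so allow some warm-up time before taking measurements.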

Refresh interval
----------------
@@ -113,6 +128,21 @@ If you know that your client application can tollerate a higher refresh
interval, you can expect to see performance improvements if you increase this
value.

Calculating statistics
----------------------

After loading larger amounts of data into new or existing tables, it is
recommended to re-calculate the statistics by executing the ``ANALYZE``
command. The statistics will be used by the query optimizer to generate
better execution plans.

Statistics are also recalculated periodically. To limit the bandwidth used for
collecting them, reads from data nodes are throttled to a maximum number of
bytes per second.

Please refer to the `ANALYZE`_ documentation for further information on how to
change the calculation interval, and how to configure throttling settings.
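
A minimal sketch of what this could look like after a bulk load follows. The
setting names ``stats.service.interval`` and
``stats.service.max_bytes_per_sec`` are assumed from the CrateDB reference,
and the values are illustrative only.

.. code-block:: sql

    -- Recalculate statistics right after a large import.
    ANALYZE;

    -- Assumed setting names; values are examples only.
    -- Run the periodic calculation more often.
    SET GLOBAL "stats.service.interval" = '1h';

    -- Allow statistics collection to read more bytes per second.
    SET GLOBAL "stats.service.max_bytes_per_sec" = '80mb';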

Manual optimizing
-----------------

@@ -129,6 +159,7 @@ However, if you are doing a lot of inserts, you may want to optimize tables (or
even specific partitions) on your own schedule. If so, you can use the
`OPTIMIZE`_ command.

.. _ANALYZE: https://cratedb.com/docs/crate/reference/en/latest/sql/statements/analyze.html
.. _fulltext indexes: https://crate.io/docs/crate/reference/en/latest/sql/fulltext.html
.. _natural primary key: https://en.wikipedia.org/wiki/Natural_key
.. _OPTIMIZE: https://crate.io/docs/crate/reference/en/latest/sql/reference/optimize.html