Skip to content

Commit

Permalink
Document citus.enable_binary_protocol (#952)
Browse files Browse the repository at this point in the history
Document citus.enable_binary_protocol

Co-authored-by: Joe Nelson <jonels@microsoft.com>
  • Loading branch information
JelteF and jonels-msft committed Sep 5, 2020
1 parent 3993aec commit e12faa2
Show file tree
Hide file tree
Showing 2 changed files with 29 additions and 0 deletions.
15 changes: 15 additions & 0 deletions develop/api_guc.rst
Original file line number Diff line number Diff line change
Expand Up @@ -414,6 +414,21 @@ citus.enable_repartitioned_insert_select (boolean)

By default, an INSERT INTO … SELECT statement that cannot be pushed down will attempt to repartition rows from the SELECT statement and transfer them between workers for insertion. However, if the target table has too many shards then repartitioning will probably not perform well. The overhead of processing the shard intervals when determining how to partition the results is too great. Repartitioning can be disabled manually by setting ``citus.enable_repartitioned_insert_select`` to false.

citus.enable_binary_protocol (boolean)
**************************************

Setting this parameter to true instructs the coordinator node to use
PostgreSQL's binary serialization format (when applicable) to transfer data
with workers. Some column types do not support binary serialization.

Enabling this parameter is mostly useful when the workers must return large
amounts of data. Examples are when a lot of rows are requested, the rows have
many columns, or they use big types such as ``hll`` from the postgresql-hll
extension.

The default value is false, which means all results are encoded and transferred
in text format.

Adaptive executor configuration
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

Expand Down
14 changes: 14 additions & 0 deletions performance/performance_tuning.rst
Original file line number Diff line number Diff line change
Expand Up @@ -309,6 +309,20 @@ There are two configuration parameters which relate to the format in which inter

However, for certain data types like hll or hstore arrays, the cost of serializing and deserializing data is pretty high. In such cases, using binary format for transferring intermediate data can improve query performance due to reduced CPU usage. There are two configuration parameters which can be used to tune this behaviour, citus.binary_master_copy_format and citus.binary_worker_copy_format. Enabling the former uses binary format to transfer intermediate query results from the workers to the coordinator while the latter is useful in queries which require dynamic shuffling of intermediate data between workers.

Binary protocol
---------------

In some cases, a large part of query time is spent in sending query results
from workers to the coordinator. This mostly happens when queries request many
rows (such as ``select * from table``), or when result columns use big types
(like ``hll`` or ``tdigest`` from the postgresql-hll and tdigest extensions).

In those cases it can be beneficial to set ``citus.enable_binary_protocol`` to
``true``, which will change the encoding of the results to binary, rather than
using text encoding. Binary encoding significantly reduce bandwidth for types
that have a compact binary representation, such as ``hll``, ``tdigest``,
``timestamp`` and ``double precision``.

Adaptive Executor
-------------------------------

Expand Down

0 comments on commit e12faa2

Please sign in to comment.