Skip to content

Commit

Permalink
CTE pushdown (#908)
Browse files Browse the repository at this point in the history
  • Loading branch information
jonels-msft committed Mar 11, 2020
1 parent 1f811d0 commit 3272d89
Show file tree
Hide file tree
Showing 3 changed files with 7 additions and 12 deletions.
2 changes: 1 addition & 1 deletion develop/api_guc.rst
Original file line number Diff line number Diff line change
Expand Up @@ -268,7 +268,7 @@ Use the binary copy format to transfer data between coordinator and the workers.
citus.max_intermediate_result_size (integer)
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

The maximum size in KB of intermediate results for CTEs and complex subqueries. The default is 1GB, and a value of -1 means no limit. Queries exceeding the limit will be canceled and produce an error message.
The maximum size in KB of intermediate results for CTEs that are unable to be pushed down to worker nodes for execution, and for complex subqueries. The default is 1GB, and a value of -1 means no limit. Queries exceeding the limit will be canceled and produce an error message.

DDL
-------------------------------------------------------------------
Expand Down
16 changes: 5 additions & 11 deletions performance/performance_tuning.rst
Original file line number Diff line number Diff line change
Expand Up @@ -212,19 +212,13 @@ If the query has subqueries or CTEs that exceed this limit, the query will be ca
DETAIL: Citus restricts the size of intermediate results of complex subqueries and CTEs to avoid accidentally pulling large result sets into once place.
HINT: To run the current query, set citus.max_intermediate_result_size to a higher value or -1 to disable.

If this happens, consider whether you can move limits or filters inside CTEs/subqueries. For instance
When using CTEs, or joins between CTEs and distributed tables, you can avoid push-pull execution by following these rules:

.. code-block:: sql
-- It's slow to retrieve all rows and limit afterward
WITH cte_slow AS (SELECT * FROM users_table)
SELECT * FROM cte_slow LIMIT 10;
-- Limiting inside makes the intermediate results small
* Tables should be colocated
* The CTE queries should not require any merge steps (e.g., LIMIT or GROUP BY on a non-distribution key)
* Tables and CTEs should be joined on distribution keys

WITH cte_fast AS (SELECT * FROM users_table LIMIT 10)
SELECT * FROM cte_fast;
Also PostgreSQL 12 or above allows Citus to take advantage of *CTE inlining* to push CTEs down to workers in more circumstances. The inlining behavior can be controlled with the ``MATERIALIZED`` keyword -- see the `PostgreSQL docs <https://www.postgresql.org/docs/current/queries-with.html>`_ for details.

.. _advanced_performance_tuning:

Expand Down
1 change: 1 addition & 0 deletions sharding/data_modeling.rst
Original file line number Diff line number Diff line change
Expand Up @@ -244,6 +244,7 @@ The full list of Citus features that are unlocked by co-location are:
* Aggregation through INSERT..SELECT
* Foreign keys
* Distributed outer joins
* Pushdown CTEs (requires PostgreSQL >=12)

Data co-location is a powerful technique for providing both horizontal scale and support to relational data models. The cost of migrating or building applications using a distributed database that enables relational operations through co-location is often substantially lower than moving to a restrictive data model (e.g. NoSQL) and, unlike a single-node database, it can scale out with the size of your business. For more information about migrating an existing database see :ref:`Transitioning to a Multi-Tenant Data Model <transitioning_mt>`.

Expand Down

0 comments on commit 3272d89

Please sign in to comment.