0.202/admin/properties.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>4.3. Properties Reference &#8212; Presto 0.202 Documentation</title>
    
    <link rel="stylesheet" href="../_static/presto.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../',
        VERSION:     '0.202',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/underscore.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="top" title="Presto 0.202 Documentation" href="../index.html" />
    <link rel="up" title="4. Administration" href="../admin.html" />
    <link rel="next" title="4.4. Spill to Disk" href="spill.html" />
    <link rel="prev" title="4.2. Tuning Presto" href="tuning.html" /> 
  </head>
  <body role="document">
<div class="header">
    <h1 class="heading"><a href="../index.html">
        <span>Presto 0.202 Documentation</span></a></h1>
    <h2 class="heading"><span>4.3. Properties Reference</span></h2>
</div>
<div class="topnav">
    
<p class="nav">
    <span class="left">
        &laquo; <a href="tuning.html">4.2. Tuning Presto</a>
    </span>
    <span class="right">
        <a href="spill.html">4.4. Spill to Disk</a> &raquo;
    </span>
</p>

</div>
<div class="content">
    
  <div class="section" id="properties-reference">
<h1>4.3. Properties Reference</h1>
<p>This section describes the most important config properties that
may be used to tune Presto or alter its behavior when required.</p>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference internal" href="#general-properties" id="id2">General Properties</a></li>
<li><a class="reference internal" href="#spilling-properties" id="id3">Spilling Properties</a></li>
<li><a class="reference internal" href="#exchange-properties" id="id4">Exchange Properties</a></li>
<li><a class="reference internal" href="#task-properties" id="id5">Task Properties</a></li>
<li><a class="reference internal" href="#node-scheduler-properties" id="id6">Node Scheduler Properties</a></li>
<li><a class="reference internal" href="#optimizer-properties" id="id7">Optimizer Properties</a></li>
<li><a class="reference internal" href="#regular-expression-function-properties" id="id8">Regular Expression Function Properties</a></li>
</ul>
</div>
<div class="section" id="general-properties">
<h2>General Properties</h2>
<div class="section" id="distributed-joins-enabled">
<h3><code class="docutils literal"><span class="pre">distributed-joins-enabled</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>Use hash distributed joins instead of broadcast joins. Distributed joins
require redistributing both tables using a hash of the join key. This can
be slower (sometimes substantially) than broadcast joins, but allows much
larger joins. Broadcast joins require that the tables on the right side of
the join after filtering fit in memory on each node, whereas distributed joins
only need to fit in distributed memory across all nodes. This can also be
specified on a per-query basis using the <code class="docutils literal"><span class="pre">distributed_join</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="redistribute-writes">
<h3><code class="docutils literal"><span class="pre">redistribute-writes</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>This property enables redistribution of data before writing. This can
eliminate the performance impact of data skew when writing by hashing it
across nodes in the cluster. It can be disabled when it is known that the
output data set is not skewed in order to avoid the overhead of hashing and
redistributing all the data across the network. This can also be specified
on a per-query basis using the <code class="docutils literal"><span class="pre">redistribute_writes</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="resources-reserved-system-memory">
<h3><code class="docutils literal"><span class="pre">resources.reserved-system-memory</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">JVM</span> <span class="pre">max</span> <span class="pre">memory</span> <span class="pre">*</span> <span class="pre">0.4</span></code></li>
</ul>
<p>The amount of JVM memory reserved, for accounting purposes, for things
that are not directly attributable to or controllable by a user query.
For example, output buffers, code caches, etc. This also accounts for
memory that is not tracked by the memory tracking system.</p>
<p>The purpose of this property is to prevent the JVM from running out of
memory (OOM). The default value is suitable for smaller JVM heap sizes or
clusters with many concurrent queries. If running fewer queries with a
large heap, a smaller value may work. Basically, set this value large
enough that the JVM does not fail with <code class="docutils literal"><span class="pre">OutOfMemoryError</span></code>.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="spilling-properties">
<span id="tuning-spilling"></span><h2>Spilling Properties</h2>
<div class="section" id="experimental-spill-enabled">
<h3><code class="docutils literal"><span class="pre">experimental.spill-enabled</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">false</span></code></li>
</ul>
<p>Try spilling memory to disk to avoid exceeding memory limits for the query.</p>
<p>Spilling works by offloading memory to disk. This process can allow a query with a large memory
footprint to pass at the cost of slower execution times. Currently, spilling is supported only for
aggregations and joins (inner and outer), so this property will not reduce memory usage required for
window functions, sorting and other join types.</p>
<p>Be aware that this is an experimental feature and should be used with care.</p>
<p>This config property can be overridden by the <code class="docutils literal"><span class="pre">spill_enabled</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="experimental-spiller-spill-path">
<h3><code class="docutils literal"><span class="pre">experimental.spiller-spill-path</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">string</span></code></li>
<li><strong>No default value.</strong> Must be set when spilling is enabled</li>
</ul>
<p>Directory where spilled content will be written. It can be a comma separated
list to spill simultaneously to multiple directories, which helps to utilize
multiple drives installed in the system.</p>
<p>It is not recommended to spill to system drives. Most importantly, do not spill
to the drive on which the JVM logs are written, as disk overutilization might
cause JVM to pause for lengthy periods, causing queries to fail.</p>
</div></blockquote>
</div>
<div class="section" id="experimental-spiller-max-used-space-threshold">
<h3><code class="docutils literal"><span class="pre">experimental.spiller-max-used-space-threshold</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">double</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">0.9</span></code></li>
</ul>
<p>If disk space usage ratio of a given spill path is above this threshold,
this spill path will not be eligible for spilling.</p>
</div></blockquote>
</div>
<div class="section" id="experimental-spiller-threads">
<h3><code class="docutils literal"><span class="pre">experimental.spiller-threads</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">4</span></code></li>
</ul>
<p>Number of spiller threads. Increase this value if the default is not able
to saturate the underlying spilling device (for example, when using RAID).</p>
</div></blockquote>
</div>
<div class="section" id="experimental-max-spill-per-node">
<h3><code class="docutils literal"><span class="pre">experimental.max-spill-per-node</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">100</span> <span class="pre">GB</span></code></li>
</ul>
<p>Max spill space to be used by all queries on a single node.</p>
</div></blockquote>
</div>
<div class="section" id="experimental-query-max-spill-per-node">
<h3><code class="docutils literal"><span class="pre">experimental.query-max-spill-per-node</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">100</span> <span class="pre">GB</span></code></li>
</ul>
<p>Max spill space to be used by a single query on a single node.</p>
</div></blockquote>
</div>
<div class="section" id="experimental-aggregation-operator-unspill-memory-limit">
<h3><code class="docutils literal"><span class="pre">experimental.aggregation-operator-unspill-memory-limit</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">4</span> <span class="pre">MB</span></code></li>
</ul>
<p>Limit for memory used for unspilling a single aggregation operator instance.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="exchange-properties">
<h2>Exchange Properties</h2>
<p>Exchanges transfer data between Presto nodes for different stages of
a query. Adjusting these properties may help to resolve inter-node
communication issues or improve network utilization.</p>
<div class="section" id="exchange-client-threads">
<h3><code class="docutils literal"><span class="pre">exchange.client-threads</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">25</span></code></li>
</ul>
<p>Number of threads used by exchange clients to fetch data from other Presto
nodes. A higher value can improve performance for large clusters or clusters
with very high concurrency, but excessively high values may cause a drop
in performance due to context switches and additional memory usage.</p>
</div></blockquote>
</div>
<div class="section" id="exchange-concurrent-request-multiplier">
<h3><code class="docutils literal"><span class="pre">exchange.concurrent-request-multiplier</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">3</span></code></li>
</ul>
<p>Multiplier determining the number of concurrent requests relative to
available buffer memory. The maximum number of requests is determined
using a heuristic of the number of clients that can fit into available
buffer space based on average buffer usage per request times this
multiplier. For example, with an <code class="docutils literal"><span class="pre">exchange.max-buffer-size</span></code> of <code class="docutils literal"><span class="pre">32</span> <span class="pre">MB</span></code>
and <code class="docutils literal"><span class="pre">20</span> <span class="pre">MB</span></code> already used and average size per request being <code class="docutils literal"><span class="pre">2MB</span></code>,
the maximum number of clients is
<code class="docutils literal"><span class="pre">multiplier</span> <span class="pre">*</span> <span class="pre">((32MB</span> <span class="pre">-</span> <span class="pre">20MB)</span> <span class="pre">/</span> <span class="pre">2MB)</span> <span class="pre">=</span> <span class="pre">multiplier</span> <span class="pre">*</span> <span class="pre">6</span></code>. Tuning this
value adjusts the heuristic, which may increase concurrency and improve
network utilization.</p>
</div></blockquote>
</div>
<div class="section" id="exchange-max-buffer-size">
<h3><code class="docutils literal"><span class="pre">exchange.max-buffer-size</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">32MB</span></code></li>
</ul>
<p>Size of buffer in the exchange client that holds data fetched from other
nodes before it is processed. A larger buffer can increase network
throughput for larger clusters and thus decrease query processing time,
but will reduce the amount of memory available for other usages.</p>
</div></blockquote>
</div>
<div class="section" id="exchange-max-response-size">
<h3><code class="docutils literal"><span class="pre">exchange.max-response-size</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1MB</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">16MB</span></code></li>
</ul>
<p>Maximum size of a response returned from an exchange request. The response
will be placed in the exchange client buffer which is shared across all
concurrent requests for the exchange.</p>
<p>Increasing the value may improve network throughput if there is high
latency. Decreasing the value may improve query performance for large
clusters as it reduces skew due to the exchange client buffer holding
responses for more tasks (rather than hold more data from fewer tasks).</p>
</div></blockquote>
</div>
<div class="section" id="sink-max-buffer-size">
<h3><code class="docutils literal"><span class="pre">sink.max-buffer-size</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">32MB</span></code></li>
</ul>
<p>Output buffer size for task data that is waiting to be pulled by upstream
tasks. If the task output is hash partitioned, then the buffer will be
shared across all of the partitioned consumers. Increasing this value may
improve network throughput for data transferred between stages if the
network has high latency or if there are many nodes in the cluster.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="task-properties">
<span id="id1"></span><h2>Task Properties</h2>
<div class="section" id="task-concurrency">
<h3><code class="docutils literal"><span class="pre">task.concurrency</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Restrictions:</strong> must be a power of two</li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">16</span></code></li>
</ul>
<p>Default local concurrency for parallel operators such as joins and aggregations.
This value should be adjusted up or down based on the query concurrency and worker
resource utilization. Lower values are better for clusters that run many queries
concurrently because the cluster will already be utilized by all the running
queries, so adding more concurrency will result in slow downs due to context
switching and other overhead. Higher values are better for clusters that only run
one or a few queries at a time. This can also be specified on a per-query basis
using the <code class="docutils literal"><span class="pre">task_concurrency</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="task-http-response-threads">
<h3><code class="docutils literal"><span class="pre">task.http-response-threads</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">100</span></code></li>
</ul>
<p>Maximum number of threads that may be created to handle HTTP responses. Threads are
created on demand and are cleaned up when idle, thus there is no overhead to a large
value if the number of requests to be handled is small. More threads may be helpful
on clusters with a high number of concurrent queries, or on clusters with hundreds
or thousands of workers.</p>
</div></blockquote>
</div>
<div class="section" id="task-http-timeout-threads">
<h3><code class="docutils literal"><span class="pre">task.http-timeout-threads</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">3</span></code></li>
</ul>
<p>Number of threads used to handle timeouts when generating HTTP responses. This value
should be increased if all the threads are frequently in use. This can be monitored
via the <code class="docutils literal"><span class="pre">com.facebook.presto.server:name=AsyncHttpExecutionMBean:TimeoutExecutor</span></code>
JMX object. If <code class="docutils literal"><span class="pre">ActiveCount</span></code> is always the same as <code class="docutils literal"><span class="pre">PoolSize</span></code>, increase the
number of threads.</p>
</div></blockquote>
</div>
<div class="section" id="task-info-update-interval">
<h3><code class="docutils literal"><span class="pre">task.info-update-interval</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">duration</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1ms</span></code></li>
<li><strong>Maximum value:</strong> <code class="docutils literal"><span class="pre">10s</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">3s</span></code></li>
</ul>
<p>Controls staleness of task information, which is used in scheduling. Larger values
can reduce coordinator CPU load, but may result in suboptimal split scheduling.</p>
</div></blockquote>
</div>
<div class="section" id="task-max-partial-aggregation-memory">
<h3><code class="docutils literal"><span class="pre">task.max-partial-aggregation-memory</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">16MB</span></code></li>
</ul>
<p>Maximum size of partial aggregation results for distributed aggregations. Increasing this
value can result in less network transfer and lower CPU utilization by allowing more
groups to be kept locally before being flushed, at the cost of additional memory usage.</p>
</div></blockquote>
</div>
<div class="section" id="task-max-worker-threads">
<h3><code class="docutils literal"><span class="pre">task.max-worker-threads</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">Node</span> <span class="pre">CPUs</span> <span class="pre">*</span> <span class="pre">2</span></code></li>
</ul>
<p>Sets the number of threads used by workers to process splits. Increasing this number
can improve throughput if worker CPU utilization is low and all the threads are in use,
but will cause increased heap space usage. Setting the value too high may cause a drop
in performance due to a context switching. The number of active threads is available
via the <code class="docutils literal"><span class="pre">RunningSplits</span></code> property of the
<code class="docutils literal"><span class="pre">com.facebook.presto.execution.executor:name=TaskExecutor.RunningSplits</span></code> JXM object.</p>
</div></blockquote>
</div>
<div class="section" id="task-min-drivers">
<h3><code class="docutils literal"><span class="pre">task.min-drivers</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">task.max-worker-threads</span> <span class="pre">*</span> <span class="pre">2</span></code></li>
</ul>
<p>The target number of running leaf splits on a worker. This is a minimum value because
each leaf task is guaranteed at least <code class="docutils literal"><span class="pre">3</span></code> running splits. Non-leaf tasks are also
guaranteed to run in order to prevent deadlocks. A lower value may improve responsiveness
for new tasks, but can result in underutilized resources. A higher value can increase
resource utilization, but uses additional memory.</p>
</div></blockquote>
</div>
<div class="section" id="task-writer-count">
<h3><code class="docutils literal"><span class="pre">task.writer-count</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Restrictions:</strong> must be a power of two</li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
</ul>
<p>The number of concurrent writer threads per worker per query. Increasing this value may
increase write speed, especially when a query is not I/O bound and can take advantage
of additional CPU for parallel writes (some connectors can be bottlenecked on CPU when
writing due to compression or other factors). Setting this too high may cause the cluster
to become overloaded due to excessive resource utilization. This can also be specified on
a per-query basis using the <code class="docutils literal"><span class="pre">task_writer_count</span></code> session property.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="node-scheduler-properties">
<h2>Node Scheduler Properties</h2>
<div class="section" id="node-scheduler-max-splits-per-node">
<h3><code class="docutils literal"><span class="pre">node-scheduler.max-splits-per-node</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">100</span></code></li>
</ul>
<p>The target value for the total number of splits that can be running for
each worker node.</p>
<p>Using a higher value is recommended if queries are submitted in large batches
(e.g., running a large group of reports periodically) or for connectors that
produce many splits that complete quickly. Increasing this value may improve
query latency by ensuring that the workers have enough splits to keep them
fully utilized.</p>
<p>Setting this too high will waste memory and may result in lower performance
due to splits not being balanced across workers. Ideally, it should be set
such that there is always at least one split waiting to be processed, but
not higher.</p>
</div></blockquote>
</div>
<div class="section" id="node-scheduler-max-pending-splits-per-task">
<h3><code class="docutils literal"><span class="pre">node-scheduler.max-pending-splits-per-task</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">10</span></code></li>
</ul>
<p>The number of outstanding splits that can be queued for each worker node
for a single stage of a query, even when the node is already at the limit for
total number of splits. Allowing a minimum number of splits per stage is
required to prevent starvation and deadlocks.</p>
<p>This value must be smaller than <code class="docutils literal"><span class="pre">node-scheduler.max-splits-per-node</span></code>,
will usually be increased for the same reasons, and has similar drawbacks
if set too high.</p>
</div></blockquote>
</div>
<div class="section" id="node-scheduler-min-candidates">
<h3><code class="docutils literal"><span class="pre">node-scheduler.min-candidates</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">10</span></code></li>
</ul>
<p>The minimum number of candidate nodes that will be evaluated by the
node scheduler when choosing the target node for a split. Setting
this value too low may prevent splits from being properly balanced
across all worker nodes. Setting it too high may increase query
latency and increase CPU usage on the coordinator.</p>
</div></blockquote>
</div>
<div class="section" id="node-scheduler-network-topology">
<h3><code class="docutils literal"><span class="pre">node-scheduler.network-topology</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">string</span></code></li>
<li><strong>Allowed values:</strong> <code class="docutils literal"><span class="pre">legacy</span></code>, <code class="docutils literal"><span class="pre">flat</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">legacy</span></code></li>
</ul>
</div></blockquote>
</div>
</div>
<div class="section" id="optimizer-properties">
<h2>Optimizer Properties</h2>
<div class="section" id="optimizer-dictionary-aggregation">
<h3><code class="docutils literal"><span class="pre">optimizer.dictionary-aggregation</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">false</span></code></li>
</ul>
<p>Enables optimization for aggregations on dictionaries. This can also be specified
on a per-query basis using the <code class="docutils literal"><span class="pre">dictionary_aggregation</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="optimizer-optimize-hash-generation">
<h3><code class="docutils literal"><span class="pre">optimizer.optimize-hash-generation</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>Compute hash codes for distribution, joins, and aggregations early during execution,
allowing result to be shared between operations later in the query. This can reduce
CPU usage by avoiding computing the same hash multiple times, but at the cost of
additional network transfer for the hashes. In most cases it will decrease overall
query processing time. This can also be specified on a per-query basis using the
<code class="docutils literal"><span class="pre">optimize_hash_generation</span></code> session property.</p>
<p>It is often helpful to disable this property when using <a class="reference internal" href="../sql/explain.html"><span class="doc">EXPLAIN</span></a> in order
to make the query plan easier to read.</p>
</div></blockquote>
</div>
<div class="section" id="optimizer-optimize-metadata-queries">
<h3><code class="docutils literal"><span class="pre">optimizer.optimize-metadata-queries</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">false</span></code></li>
</ul>
<p>Enable optimization of some aggregations by using values that are stored as metadata.
This allows Presto to execute some simple queries in constant time. Currently, this
optimization applies to <code class="docutils literal"><span class="pre">max</span></code>, <code class="docutils literal"><span class="pre">min</span></code> and <code class="docutils literal"><span class="pre">approx_distinct</span></code> of partition
keys and other aggregation insensitive to the cardinality of the input (including
<code class="docutils literal"><span class="pre">DISTINCT</span></code> aggregates). Using this may speed up some queries significantly.</p>
<p>The main drawback is that it can produce incorrect results if the connector returns
partition keys for partitions that have no rows. In particular, the Hive connector
can return empty partitions if they were created by other systems (Presto cannot
create them).</p>
</div></blockquote>
</div>
<div class="section" id="optimizer-optimize-single-distinct">
<h3><code class="docutils literal"><span class="pre">optimizer.optimize-single-distinct</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>The single distinct optimization will try to replace multiple <code class="docutils literal"><span class="pre">DISTINCT</span></code> clauses
with a single <code class="docutils literal"><span class="pre">GROUP</span> <span class="pre">BY</span></code> clause, which can be substantially faster to execute.</p>
</div></blockquote>
</div>
<div class="section" id="optimizer-push-aggregation-through-join">
<h3><code class="docutils literal"><span class="pre">optimizer.push-aggregation-through-join</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>When an aggregation is above an outer join and all columns from the outer side of the join
are in the grouping clause, the aggregation is pushed below the outer join. This optimization
is particularly useful for correlated scalar subqueries, which get rewritten to an aggregation
over an outer join. For example:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">item</span> <span class="n">i</span>
    <span class="k">WHERE</span> <span class="n">i</span><span class="p">.</span><span class="n">i_current_price</span> <span class="o">&gt;</span> <span class="p">(</span>
        <span class="k">SELECT</span> <span class="k">AVG</span><span class="p">(</span><span class="n">j</span><span class="p">.</span><span class="n">i_current_price</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">item</span> <span class="n">j</span>
            <span class="k">WHERE</span> <span class="n">i</span><span class="p">.</span><span class="n">i_category</span> <span class="o">=</span> <span class="n">j</span><span class="p">.</span><span class="n">i_category</span><span class="p">);</span>
</pre></div>
</div>
<p>Enabling this optimization can substantially speed up queries by reducing
the amount of data that needs to be processed by the join.  However, it may slow down some
queries that have very selective joins. This can also be specified on a per-query basis using
the <code class="docutils literal"><span class="pre">push_aggregation_through_join</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="optimizer-push-table-write-through-union">
<h3><code class="docutils literal"><span class="pre">optimizer.push-table-write-through-union</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>Parallelize writes when using <code class="docutils literal"><span class="pre">UNION</span> <span class="pre">ALL</span></code> in queries that write data. This improves the
speed of writing output tables in <code class="docutils literal"><span class="pre">UNION</span> <span class="pre">ALL</span></code> queries because these writes do not require
additional synchronization when collecting results. Enabling this optimization can improve
<code class="docutils literal"><span class="pre">UNION</span> <span class="pre">ALL</span></code> speed when write speed is not yet saturated. However, it may slow down queries
in an already heavily loaded system. This can also be specified on a per-query basis
using the <code class="docutils literal"><span class="pre">push_table_write_through_union</span></code> session property.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="regular-expression-function-properties">
<h2>Regular Expression Function Properties</h2>
<p>The following properties allow tuning the <a class="reference internal" href="../functions/regexp.html"><span class="doc">Regular Expression Functions</span></a>.</p>
<div class="section" id="regex-library">
<h3><code class="docutils literal"><span class="pre">regex-library</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">string</span></code></li>
<li><strong>Allowed values:</strong> <code class="docutils literal"><span class="pre">JONI</span></code>, <code class="docutils literal"><span class="pre">RE2J</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">JONI</span></code></li>
</ul>
<p>Which library to use for regular expression functions.
<code class="docutils literal"><span class="pre">JONI</span></code> is generally faster for common usage, but can require exponential
time for certain expression patterns. <code class="docutils literal"><span class="pre">RE2J</span></code> uses a different algorithm
which guarantees linear time, but is often slower.</p>
</div></blockquote>
</div>
<div class="section" id="re2j-dfa-states-limit">
<h3><code class="docutils literal"><span class="pre">re2j.dfa-states-limit</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">2</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">2147483647</span></code></li>
</ul>
<p>The maximum number of states to use when RE2J builds the fast
but potentially memory intensive deterministic finite automaton (DFA)
for regular expression matching. If the limit is reached, RE2J will fall
back to the algorithm that uses the slower, but less memory intensive
non-deterministic finite automaton (NFA). Decreasing this value decreases the
maximum memory footprint of a regular expression search at the cost of speed.</p>
</div></blockquote>
</div>
<div class="section" id="re2j-dfa-retries">
<h3><code class="docutils literal"><span class="pre">re2j.dfa-retries</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">0</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">5</span></code></li>
</ul>
<p>The number of times that RE2J will retry the DFA algorithm when
it reaches a states limit before using the slower, but less memory
intensive NFA algorithm for all future inputs for that search. If hitting the
limit for a given input row is likely to be an outlier, you want to be able
to process subsequent rows using the faster DFA algorithm. If you are likely
to hit the limit on matches for subsequent rows as well, you want to use the
correct algorithm from the beginning so as not to waste time and resources.
The more rows you are processing, the larger this value should be.</p>
</div></blockquote>
</div>
</div>
</div>


</div>
<div class="bottomnav">
    
<p class="nav">
    <span class="left">
        &laquo; <a href="tuning.html">4.2. Tuning Presto</a>
    </span>
    <span class="right">
        <a href="spill.html">4.4. Spill to Disk</a> &raquo;
    </span>
</p>

</div>

    <div class="footer" role="contentinfo">
    </div>
  </body>
</html>