-
Notifications
You must be signed in to change notification settings - Fork 16
/
Copy pathproperties.html
645 lines (632 loc) · 38.6 KB
/
properties.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>4.3. Properties Reference — Presto 0.202 Documentation</title>
<link rel="stylesheet" href="../_static/presto.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '../',
VERSION: '0.202',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="top" title="Presto 0.202 Documentation" href="../index.html" />
<link rel="up" title="4. Administration" href="../admin.html" />
<link rel="next" title="4.4. Spill to Disk" href="spill.html" />
<link rel="prev" title="4.2. Tuning Presto" href="tuning.html" />
</head>
<body role="document">
<div class="header">
<h1 class="heading"><a href="../index.html">
<span>Presto 0.202 Documentation</span></a></h1>
<h2 class="heading"><span>4.3. Properties Reference</span></h2>
</div>
<div class="topnav">
<p class="nav">
<span class="left">
« <a href="tuning.html">4.2. Tuning Presto</a>
</span>
<span class="right">
<a href="spill.html">4.4. Spill to Disk</a> »
</span>
</p>
</div>
<div class="content">
<div class="section" id="properties-reference">
<h1>4.3. Properties Reference</h1>
<p>This section describes the most important config properties that
may be used to tune Presto or alter its behavior when required.</p>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference internal" href="#general-properties" id="id2">General Properties</a></li>
<li><a class="reference internal" href="#spilling-properties" id="id3">Spilling Properties</a></li>
<li><a class="reference internal" href="#exchange-properties" id="id4">Exchange Properties</a></li>
<li><a class="reference internal" href="#task-properties" id="id5">Task Properties</a></li>
<li><a class="reference internal" href="#node-scheduler-properties" id="id6">Node Scheduler Properties</a></li>
<li><a class="reference internal" href="#optimizer-properties" id="id7">Optimizer Properties</a></li>
<li><a class="reference internal" href="#regular-expression-function-properties" id="id8">Regular Expression Function Properties</a></li>
</ul>
</div>
<div class="section" id="general-properties">
<h2>General Properties</h2>
<div class="section" id="distributed-joins-enabled">
<h3><code class="docutils literal"><span class="pre">distributed-joins-enabled</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>Use hash distributed joins instead of broadcast joins. Distributed joins
require redistributing both tables using a hash of the join key. This can
be slower (sometimes substantially) than broadcast joins, but allows much
larger joins. Broadcast joins require that the tables on the right side of
the join after filtering fit in memory on each node, whereas distributed joins
only need to fit in distributed memory across all nodes. This can also be
specified on a per-query basis using the <code class="docutils literal"><span class="pre">distributed_join</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="redistribute-writes">
<h3><code class="docutils literal"><span class="pre">redistribute-writes</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>This property enables redistribution of data before writing. This can
eliminate the performance impact of data skew when writing by hashing it
across nodes in the cluster. It can be disabled when it is known that the
output data set is not skewed in order to avoid the overhead of hashing and
redistributing all the data across the network. This can also be specified
on a per-query basis using the <code class="docutils literal"><span class="pre">redistribute_writes</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="resources-reserved-system-memory">
<h3><code class="docutils literal"><span class="pre">resources.reserved-system-memory</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">JVM</span> <span class="pre">max</span> <span class="pre">memory</span> <span class="pre">*</span> <span class="pre">0.4</span></code></li>
</ul>
<p>The amount of JVM memory reserved, for accounting purposes, for things
that are not directly attributable to or controllable by a user query.
For example, output buffers, code caches, etc. This also accounts for
memory that is not tracked by the memory tracking system.</p>
<p>The purpose of this property is to prevent the JVM from running out of
memory (OOM). The default value is suitable for smaller JVM heap sizes or
clusters with many concurrent queries. If running fewer queries with a
large heap, a smaller value may work. Basically, set this value large
enough that the JVM does not fail with <code class="docutils literal"><span class="pre">OutOfMemoryError</span></code>.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="spilling-properties">
<span id="tuning-spilling"></span><h2>Spilling Properties</h2>
<div class="section" id="experimental-spill-enabled">
<h3><code class="docutils literal"><span class="pre">experimental.spill-enabled</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">false</span></code></li>
</ul>
<p>Try spilling memory to disk to avoid exceeding memory limits for the query.</p>
<p>Spilling works by offloading memory to disk. This process can allow a query with a large memory
footprint to pass at the cost of slower execution times. Currently, spilling is supported only for
aggregations and joins (inner and outer), so this property will not reduce memory usage required for
window functions, sorting and other join types.</p>
<p>Be aware that this is an experimental feature and should be used with care.</p>
<p>This config property can be overridden by the <code class="docutils literal"><span class="pre">spill_enabled</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="experimental-spiller-spill-path">
<h3><code class="docutils literal"><span class="pre">experimental.spiller-spill-path</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">string</span></code></li>
<li><strong>No default value.</strong> Must be set when spilling is enabled</li>
</ul>
<p>Directory where spilled content will be written. It can be a comma separated
list to spill simultaneously to multiple directories, which helps to utilize
multiple drives installed in the system.</p>
<p>It is not recommended to spill to system drives. Most importantly, do not spill
to the drive on which the JVM logs are written, as disk overutilization might
cause JVM to pause for lengthy periods, causing queries to fail.</p>
</div></blockquote>
</div>
<div class="section" id="experimental-spiller-max-used-space-threshold">
<h3><code class="docutils literal"><span class="pre">experimental.spiller-max-used-space-threshold</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">double</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">0.9</span></code></li>
</ul>
<p>If disk space usage ratio of a given spill path is above this threshold,
this spill path will not be eligible for spilling.</p>
</div></blockquote>
</div>
<div class="section" id="experimental-spiller-threads">
<h3><code class="docutils literal"><span class="pre">experimental.spiller-threads</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">4</span></code></li>
</ul>
<p>Number of spiller threads. Increase this value if the default is not able
to saturate the underlying spilling device (for example, when using RAID).</p>
</div></blockquote>
</div>
<div class="section" id="experimental-max-spill-per-node">
<h3><code class="docutils literal"><span class="pre">experimental.max-spill-per-node</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">100</span> <span class="pre">GB</span></code></li>
</ul>
<p>Max spill space to be used by all queries on a single node.</p>
</div></blockquote>
</div>
<div class="section" id="experimental-query-max-spill-per-node">
<h3><code class="docutils literal"><span class="pre">experimental.query-max-spill-per-node</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">100</span> <span class="pre">GB</span></code></li>
</ul>
<p>Max spill space to be used by a single query on a single node.</p>
</div></blockquote>
</div>
<div class="section" id="experimental-aggregation-operator-unspill-memory-limit">
<h3><code class="docutils literal"><span class="pre">experimental.aggregation-operator-unspill-memory-limit</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">4</span> <span class="pre">MB</span></code></li>
</ul>
<p>Limit for memory used for unspilling a single aggregation operator instance.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="exchange-properties">
<h2>Exchange Properties</h2>
<p>Exchanges transfer data between Presto nodes for different stages of
a query. Adjusting these properties may help to resolve inter-node
communication issues or improve network utilization.</p>
<div class="section" id="exchange-client-threads">
<h3><code class="docutils literal"><span class="pre">exchange.client-threads</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">25</span></code></li>
</ul>
<p>Number of threads used by exchange clients to fetch data from other Presto
nodes. A higher value can improve performance for large clusters or clusters
with very high concurrency, but excessively high values may cause a drop
in performance due to context switches and additional memory usage.</p>
</div></blockquote>
</div>
<div class="section" id="exchange-concurrent-request-multiplier">
<h3><code class="docutils literal"><span class="pre">exchange.concurrent-request-multiplier</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">3</span></code></li>
</ul>
<p>Multiplier determining the number of concurrent requests relative to
available buffer memory. The maximum number of requests is determined
using a heuristic of the number of clients that can fit into available
buffer space based on average buffer usage per request times this
multiplier. For example, with an <code class="docutils literal"><span class="pre">exchange.max-buffer-size</span></code> of <code class="docutils literal"><span class="pre">32</span> <span class="pre">MB</span></code>
and <code class="docutils literal"><span class="pre">20</span> <span class="pre">MB</span></code> already used and average size per request being <code class="docutils literal"><span class="pre">2MB</span></code>,
the maximum number of clients is
<code class="docutils literal"><span class="pre">multiplier</span> <span class="pre">*</span> <span class="pre">((32MB</span> <span class="pre">-</span> <span class="pre">20MB)</span> <span class="pre">/</span> <span class="pre">2MB)</span> <span class="pre">=</span> <span class="pre">multiplier</span> <span class="pre">*</span> <span class="pre">6</span></code>. Tuning this
value adjusts the heuristic, which may increase concurrency and improve
network utilization.</p>
</div></blockquote>
</div>
<div class="section" id="exchange-max-buffer-size">
<h3><code class="docutils literal"><span class="pre">exchange.max-buffer-size</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">32MB</span></code></li>
</ul>
<p>Size of buffer in the exchange client that holds data fetched from other
nodes before it is processed. A larger buffer can increase network
throughput for larger clusters and thus decrease query processing time,
but will reduce the amount of memory available for other usages.</p>
</div></blockquote>
</div>
<div class="section" id="exchange-max-response-size">
<h3><code class="docutils literal"><span class="pre">exchange.max-response-size</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1MB</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">16MB</span></code></li>
</ul>
<p>Maximum size of a response returned from an exchange request. The response
will be placed in the exchange client buffer which is shared across all
concurrent requests for the exchange.</p>
<p>Increasing the value may improve network throughput if there is high
latency. Decreasing the value may improve query performance for large
clusters as it reduces skew due to the exchange client buffer holding
responses for more tasks (rather than hold more data from fewer tasks).</p>
</div></blockquote>
</div>
<div class="section" id="sink-max-buffer-size">
<h3><code class="docutils literal"><span class="pre">sink.max-buffer-size</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">32MB</span></code></li>
</ul>
<p>Output buffer size for task data that is waiting to be pulled by upstream
tasks. If the task output is hash partitioned, then the buffer will be
shared across all of the partitioned consumers. Increasing this value may
improve network throughput for data transferred between stages if the
network has high latency or if there are many nodes in the cluster.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="task-properties">
<span id="id1"></span><h2>Task Properties</h2>
<div class="section" id="task-concurrency">
<h3><code class="docutils literal"><span class="pre">task.concurrency</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Restrictions:</strong> must be a power of two</li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">16</span></code></li>
</ul>
<p>Default local concurrency for parallel operators such as joins and aggregations.
This value should be adjusted up or down based on the query concurrency and worker
resource utilization. Lower values are better for clusters that run many queries
concurrently because the cluster will already be utilized by all the running
queries, so adding more concurrency will result in slow downs due to context
switching and other overhead. Higher values are better for clusters that only run
one or a few queries at a time. This can also be specified on a per-query basis
using the <code class="docutils literal"><span class="pre">task_concurrency</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="task-http-response-threads">
<h3><code class="docutils literal"><span class="pre">task.http-response-threads</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">100</span></code></li>
</ul>
<p>Maximum number of threads that may be created to handle HTTP responses. Threads are
created on demand and are cleaned up when idle, thus there is no overhead to a large
value if the number of requests to be handled is small. More threads may be helpful
on clusters with a high number of concurrent queries, or on clusters with hundreds
or thousands of workers.</p>
</div></blockquote>
</div>
<div class="section" id="task-http-timeout-threads">
<h3><code class="docutils literal"><span class="pre">task.http-timeout-threads</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">3</span></code></li>
</ul>
<p>Number of threads used to handle timeouts when generating HTTP responses. This value
should be increased if all the threads are frequently in use. This can be monitored
via the <code class="docutils literal"><span class="pre">com.facebook.presto.server:name=AsyncHttpExecutionMBean:TimeoutExecutor</span></code>
JMX object. If <code class="docutils literal"><span class="pre">ActiveCount</span></code> is always the same as <code class="docutils literal"><span class="pre">PoolSize</span></code>, increase the
number of threads.</p>
</div></blockquote>
</div>
<div class="section" id="task-info-update-interval">
<h3><code class="docutils literal"><span class="pre">task.info-update-interval</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">duration</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1ms</span></code></li>
<li><strong>Maximum value:</strong> <code class="docutils literal"><span class="pre">10s</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">3s</span></code></li>
</ul>
<p>Controls staleness of task information, which is used in scheduling. Larger values
can reduce coordinator CPU load, but may result in suboptimal split scheduling.</p>
</div></blockquote>
</div>
<div class="section" id="task-max-partial-aggregation-memory">
<h3><code class="docutils literal"><span class="pre">task.max-partial-aggregation-memory</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">data</span> <span class="pre">size</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">16MB</span></code></li>
</ul>
<p>Maximum size of partial aggregation results for distributed aggregations. Increasing this
value can result in less network transfer and lower CPU utilization by allowing more
groups to be kept locally before being flushed, at the cost of additional memory usage.</p>
</div></blockquote>
</div>
<div class="section" id="task-max-worker-threads">
<h3><code class="docutils literal"><span class="pre">task.max-worker-threads</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">Node</span> <span class="pre">CPUs</span> <span class="pre">*</span> <span class="pre">2</span></code></li>
</ul>
<p>Sets the number of threads used by workers to process splits. Increasing this number
can improve throughput if worker CPU utilization is low and all the threads are in use,
but will cause increased heap space usage. Setting the value too high may cause a drop
in performance due to a context switching. The number of active threads is available
via the <code class="docutils literal"><span class="pre">RunningSplits</span></code> property of the
<code class="docutils literal"><span class="pre">com.facebook.presto.execution.executor:name=TaskExecutor.RunningSplits</span></code> JXM object.</p>
</div></blockquote>
</div>
<div class="section" id="task-min-drivers">
<h3><code class="docutils literal"><span class="pre">task.min-drivers</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">task.max-worker-threads</span> <span class="pre">*</span> <span class="pre">2</span></code></li>
</ul>
<p>The target number of running leaf splits on a worker. This is a minimum value because
each leaf task is guaranteed at least <code class="docutils literal"><span class="pre">3</span></code> running splits. Non-leaf tasks are also
guaranteed to run in order to prevent deadlocks. A lower value may improve responsiveness
for new tasks, but can result in underutilized resources. A higher value can increase
resource utilization, but uses additional memory.</p>
</div></blockquote>
</div>
<div class="section" id="task-writer-count">
<h3><code class="docutils literal"><span class="pre">task.writer-count</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Restrictions:</strong> must be a power of two</li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
</ul>
<p>The number of concurrent writer threads per worker per query. Increasing this value may
increase write speed, especially when a query is not I/O bound and can take advantage
of additional CPU for parallel writes (some connectors can be bottlenecked on CPU when
writing due to compression or other factors). Setting this too high may cause the cluster
to become overloaded due to excessive resource utilization. This can also be specified on
a per-query basis using the <code class="docutils literal"><span class="pre">task_writer_count</span></code> session property.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="node-scheduler-properties">
<h2>Node Scheduler Properties</h2>
<div class="section" id="node-scheduler-max-splits-per-node">
<h3><code class="docutils literal"><span class="pre">node-scheduler.max-splits-per-node</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">100</span></code></li>
</ul>
<p>The target value for the total number of splits that can be running for
each worker node.</p>
<p>Using a higher value is recommended if queries are submitted in large batches
(e.g., running a large group of reports periodically) or for connectors that
produce many splits that complete quickly. Increasing this value may improve
query latency by ensuring that the workers have enough splits to keep them
fully utilized.</p>
<p>Setting this too high will waste memory and may result in lower performance
due to splits not being balanced across workers. Ideally, it should be set
such that there is always at least one split waiting to be processed, but
not higher.</p>
</div></blockquote>
</div>
<div class="section" id="node-scheduler-max-pending-splits-per-task">
<h3><code class="docutils literal"><span class="pre">node-scheduler.max-pending-splits-per-task</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">10</span></code></li>
</ul>
<p>The number of outstanding splits that can be queued for each worker node
for a single stage of a query, even when the node is already at the limit for
total number of splits. Allowing a minimum number of splits per stage is
required to prevent starvation and deadlocks.</p>
<p>This value must be smaller than <code class="docutils literal"><span class="pre">node-scheduler.max-splits-per-node</span></code>,
will usually be increased for the same reasons, and has similar drawbacks
if set too high.</p>
</div></blockquote>
</div>
<div class="section" id="node-scheduler-min-candidates">
<h3><code class="docutils literal"><span class="pre">node-scheduler.min-candidates</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">1</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">10</span></code></li>
</ul>
<p>The minimum number of candidate nodes that will be evaluated by the
node scheduler when choosing the target node for a split. Setting
this value too low may prevent splits from being properly balanced
across all worker nodes. Setting it too high may increase query
latency and increase CPU usage on the coordinator.</p>
</div></blockquote>
</div>
<div class="section" id="node-scheduler-network-topology">
<h3><code class="docutils literal"><span class="pre">node-scheduler.network-topology</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">string</span></code></li>
<li><strong>Allowed values:</strong> <code class="docutils literal"><span class="pre">legacy</span></code>, <code class="docutils literal"><span class="pre">flat</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">legacy</span></code></li>
</ul>
</div></blockquote>
</div>
</div>
<div class="section" id="optimizer-properties">
<h2>Optimizer Properties</h2>
<div class="section" id="optimizer-dictionary-aggregation">
<h3><code class="docutils literal"><span class="pre">optimizer.dictionary-aggregation</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">false</span></code></li>
</ul>
<p>Enables optimization for aggregations on dictionaries. This can also be specified
on a per-query basis using the <code class="docutils literal"><span class="pre">dictionary_aggregation</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="optimizer-optimize-hash-generation">
<h3><code class="docutils literal"><span class="pre">optimizer.optimize-hash-generation</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>Compute hash codes for distribution, joins, and aggregations early during execution,
allowing result to be shared between operations later in the query. This can reduce
CPU usage by avoiding computing the same hash multiple times, but at the cost of
additional network transfer for the hashes. In most cases it will decrease overall
query processing time. This can also be specified on a per-query basis using the
<code class="docutils literal"><span class="pre">optimize_hash_generation</span></code> session property.</p>
<p>It is often helpful to disable this property when using <a class="reference internal" href="../sql/explain.html"><span class="doc">EXPLAIN</span></a> in order
to make the query plan easier to read.</p>
</div></blockquote>
</div>
<div class="section" id="optimizer-optimize-metadata-queries">
<h3><code class="docutils literal"><span class="pre">optimizer.optimize-metadata-queries</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">false</span></code></li>
</ul>
<p>Enable optimization of some aggregations by using values that are stored as metadata.
This allows Presto to execute some simple queries in constant time. Currently, this
optimization applies to <code class="docutils literal"><span class="pre">max</span></code>, <code class="docutils literal"><span class="pre">min</span></code> and <code class="docutils literal"><span class="pre">approx_distinct</span></code> of partition
keys and other aggregation insensitive to the cardinality of the input (including
<code class="docutils literal"><span class="pre">DISTINCT</span></code> aggregates). Using this may speed up some queries significantly.</p>
<p>The main drawback is that it can produce incorrect results if the connector returns
partition keys for partitions that have no rows. In particular, the Hive connector
can return empty partitions if they were created by other systems (Presto cannot
create them).</p>
</div></blockquote>
</div>
<div class="section" id="optimizer-optimize-single-distinct">
<h3><code class="docutils literal"><span class="pre">optimizer.optimize-single-distinct</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>The single distinct optimization will try to replace multiple <code class="docutils literal"><span class="pre">DISTINCT</span></code> clauses
with a single <code class="docutils literal"><span class="pre">GROUP</span> <span class="pre">BY</span></code> clause, which can be substantially faster to execute.</p>
</div></blockquote>
</div>
<div class="section" id="optimizer-push-aggregation-through-join">
<h3><code class="docutils literal"><span class="pre">optimizer.push-aggregation-through-join</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>When an aggregation is above an outer join and all columns from the outer side of the join
are in the grouping clause, the aggregation is pushed below the outer join. This optimization
is particularly useful for correlated scalar subqueries, which get rewritten to an aggregation
over an outer join. For example:</p>
<div class="highlight-sql"><div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">item</span> <span class="n">i</span>
<span class="k">WHERE</span> <span class="n">i</span><span class="p">.</span><span class="n">i_current_price</span> <span class="o">></span> <span class="p">(</span>
<span class="k">SELECT</span> <span class="k">AVG</span><span class="p">(</span><span class="n">j</span><span class="p">.</span><span class="n">i_current_price</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">item</span> <span class="n">j</span>
<span class="k">WHERE</span> <span class="n">i</span><span class="p">.</span><span class="n">i_category</span> <span class="o">=</span> <span class="n">j</span><span class="p">.</span><span class="n">i_category</span><span class="p">);</span>
</pre></div>
</div>
<p>Enabling this optimization can substantially speed up queries by reducing
the amount of data that needs to be processed by the join. However, it may slow down some
queries that have very selective joins. This can also be specified on a per-query basis using
the <code class="docutils literal"><span class="pre">push_aggregation_through_join</span></code> session property.</p>
</div></blockquote>
</div>
<div class="section" id="optimizer-push-table-write-through-union">
<h3><code class="docutils literal"><span class="pre">optimizer.push-table-write-through-union</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">boolean</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">true</span></code></li>
</ul>
<p>Parallelize writes when using <code class="docutils literal"><span class="pre">UNION</span> <span class="pre">ALL</span></code> in queries that write data. This improves the
speed of writing output tables in <code class="docutils literal"><span class="pre">UNION</span> <span class="pre">ALL</span></code> queries because these writes do not require
additional synchronization when collecting results. Enabling this optimization can improve
<code class="docutils literal"><span class="pre">UNION</span> <span class="pre">ALL</span></code> speed when write speed is not yet saturated. However, it may slow down queries
in an already heavily loaded system. This can also be specified on a per-query basis
using the <code class="docutils literal"><span class="pre">push_table_write_through_union</span></code> session property.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="regular-expression-function-properties">
<h2>Regular Expression Function Properties</h2>
<p>The following properties allow tuning the <a class="reference internal" href="../functions/regexp.html"><span class="doc">Regular Expression Functions</span></a>.</p>
<div class="section" id="regex-library">
<h3><code class="docutils literal"><span class="pre">regex-library</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">string</span></code></li>
<li><strong>Allowed values:</strong> <code class="docutils literal"><span class="pre">JONI</span></code>, <code class="docutils literal"><span class="pre">RE2J</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">JONI</span></code></li>
</ul>
<p>Which library to use for regular expression functions.
<code class="docutils literal"><span class="pre">JONI</span></code> is generally faster for common usage, but can require exponential
time for certain expression patterns. <code class="docutils literal"><span class="pre">RE2J</span></code> uses a different algorithm
which guarantees linear time, but is often slower.</p>
</div></blockquote>
</div>
<div class="section" id="re2j-dfa-states-limit">
<h3><code class="docutils literal"><span class="pre">re2j.dfa-states-limit</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">2</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">2147483647</span></code></li>
</ul>
<p>The maximum number of states to use when RE2J builds the fast
but potentially memory intensive deterministic finite automaton (DFA)
for regular expression matching. If the limit is reached, RE2J will fall
back to the algorithm that uses the slower, but less memory intensive
non-deterministic finite automaton (NFA). Decreasing this value decreases the
maximum memory footprint of a regular expression search at the cost of speed.</p>
</div></blockquote>
</div>
<div class="section" id="re2j-dfa-retries">
<h3><code class="docutils literal"><span class="pre">re2j.dfa-retries</span></code></h3>
<blockquote>
<div><ul class="simple">
<li><strong>Type:</strong> <code class="docutils literal"><span class="pre">integer</span></code></li>
<li><strong>Minimum value:</strong> <code class="docutils literal"><span class="pre">0</span></code></li>
<li><strong>Default value:</strong> <code class="docutils literal"><span class="pre">5</span></code></li>
</ul>
<p>The number of times that RE2J will retry the DFA algorithm when
it reaches a states limit before using the slower, but less memory
intensive NFA algorithm for all future inputs for that search. If hitting the
limit for a given input row is likely to be an outlier, you want to be able
to process subsequent rows using the faster DFA algorithm. If you are likely
to hit the limit on matches for subsequent rows as well, you want to use the
correct algorithm from the beginning so as not to waste time and resources.
The more rows you are processing, the larger this value should be.</p>
</div></blockquote>
</div>
</div>
</div>
</div>
<div class="bottomnav">
<p class="nav">
<span class="left">
« <a href="tuning.html">4.2. Tuning Presto</a>
</span>
<span class="right">
<a href="spill.html">4.4. Spill to Disk</a> »
</span>
</p>
</div>
<div class="footer" role="contentinfo">
</div>
</body>
</html>