Merged
35 commits
f0be7e8
Document REPLACE() and SPLIT()
jnturton Aug 9, 2020
2cbc324
Document BOOL_AND and BOOL_OR
jnturton Aug 9, 2020
2e8fdd9
Document the SUBSTR(string, regexp) variant.
jnturton Aug 9, 2020
d100238
Document STDDEV and VARIANCE.
jnturton Aug 9, 2020
e4963ac
Document APPROX_COUNT_DUPS.
jnturton Aug 10, 2020
27db5a7
Document that a FROM clause is now optional.
jnturton Aug 10, 2020
e6f7a74
Migrate _tools/createdatadocs.py to Python 3
jnturton Oct 29, 2020
70ed038
Document those of Calcite's SQL dialect compat functions which are im…
jnturton Oct 29, 2020
e56068a
Move format plugins to the Data Source and File Formats section
jnturton Oct 30, 2020
1094b24
Version to 1.18, remove deprecated redcarpet Markdown processor
jnturton Oct 30, 2020
376830c
Move _docs/img to images/doc
jnturton Oct 30, 2020
291260a
Correct instances of Markdown heading chars (#) not followed by a space.
jnturton Oct 30, 2020
00016fa
Fix syntax in 2020-09-05-drill-1.18-released.md and add author Abishek.
jnturton Nov 1, 2020
1215789
Change query output table chars to Markdown.
jnturton Nov 1, 2020
35ab19e
Document the SPSS format plugin.
jnturton Nov 2, 2020
bab91e2
Document the ESRI Shapefile format plugin.
jnturton Nov 2, 2020
e9eb691
Document the Excel Format Plugin.
jnturton Nov 2, 2020
9ba39fb
Document the HDF5 Format Plugin.
jnturton Nov 2, 2020
b7637af
Provide a link to the KramDown quick reference in README.md
jnturton Nov 2, 2020
922e4df
Document the Druid Storage Plugin.
jnturton Nov 2, 2020
044d864
Document the HTTP Storage Plugin.
jnturton Nov 2, 2020
9ae8a20
Document GIS SQL functions.
jnturton Nov 2, 2020
8f57664
Document time series analysis functions.
jnturton Nov 2, 2020
d90f6b5
Remove Developer Notes from Druid Storage Plugin page.
jnturton Nov 2, 2020
ed2b865
Populate release notes for 1.18 from JIRA. Remove executable bit from
jnturton Nov 3, 2020
6ee8dad
Remove executable flag from *.{md,html}
jnturton Nov 3, 2020
b2ccd6f
Add HTTP Storage Plugin images to images/docs/
jnturton Nov 3, 2020
9fc9961
Remove date line broken link to docpage.css from docpage.html
jnturton Nov 3, 2020
6483b10
Change permalinks from docs/:title to docs/:slug
jnturton Nov 3, 2020
e86dd37
Add a slug: line to, and remove the date: line from, every docpage.
jnturton Nov 3, 2020
be7ae3f
Reflect that the HTTP Storage Plugin can also accept CSV response data.
jnturton Nov 3, 2020
48d9214
Add Markdown code block language tags.
jnturton Nov 24, 2020
47e89b3
Troubleshooting info for JDBC connection with non-default cluster-id.
jnturton Dec 9, 2020
0f64d32
Tidy up image-metadata-format-plugin.md
jnturton Dec 9, 2020
e496e7a
Update httpd-format-plugin.md using the latest README.md in the Drill
jnturton Dec 9, 2020
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@ Please make sure that specific versions of libraries are installed since buildin

# Documentation Guidelines

The documentation pages are placed under `_docs`. You can modify existing .md files, or you can create new .md files to add to the Apache Drill documentation site. Create pull requests to submit your documentation updates.
The documentation pages are placed under `_docs`. You can modify existing .md files, or you can create new .md files to add to the Apache Drill documentation site. Create pull requests to submit your documentation updates. The Kramdown MarkDown processor employed by Jekyll supports [a dialect of MarkDown](https://kramdown.gettalong.org/quickref.html) which is a superset of standard MarkDown.

## Creating New MarkDown Files

14 changes: 7 additions & 7 deletions _config.yml
@@ -11,25 +11,25 @@ description: > # this means to ignore newlines until "baseurl:"

baseurl: "/drill" # Base URL when hosted on GitHub Pages (drill repository under apache account)
noindex: 1
markdown: redcarpet

redcarpet:
extensions: ["no_intra_emphasis", "fenced_code_blocks", "autolink", "tables", "with_toc_data"]
# Uncomment to use the redcarpet Markdown processor (not supported by GH Pages) instead of Kramdown
#markdown: redcarpet
#redcarpet:
# extensions: ["no_intra_emphasis", "fenced_code_blocks", "autolink", "tables", "with_toc_data"]

collections:
docs:
output: true
permalink: /docs/:title/
permalink: /docs/:slug/

defaults:
-
scope:
- scope:
type: docs # This defines the default for anything in the docs collection. An alternative would be to use "path: _docs" here.
values:
layout: docpage

sass:
style: :compressed

gems:
plugins:
- jekyll-redirect-from
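
The permalink change above routes each docs page through its `slug:` front matter value rather than its title. As a rough illustration of how a value such as "ODBC/JDBC Interfaces" could end up as a URL segment, here is a minimal Python sketch. It assumes Jekyll's default slugify behaviour (lowercase, with runs of non-alphanumeric characters collapsed to a hyphen); the helper names `approximate_slug` and `docs_permalink` are hypothetical and are not part of this repository or of Jekyll.

```python
import re


def approximate_slug(value: str) -> str:
    """Approximate Jekyll's default slugify (an assumption, not Jekyll's own code)."""
    # Collapse every run of characters that is not an ASCII letter or digit
    # into a single hyphen.
    hyphenated = re.sub(r"[^A-Za-z0-9]+", "-", value)
    # Drop hyphens left at either end, then lowercase the result.
    return hyphenated.strip("-").lower()


def docs_permalink(slug_value: str, baseurl: str = "/drill") -> str:
    """Build a site path in the shape of `permalink: /docs/:slug/` (hypothetical helper)."""
    return f"{baseurl}/docs/{approximate_slug(slug_value)}/"


if __name__ == "__main__":
    for front_matter_slug in ("Getting Started", "ODBC/JDBC Interfaces", "Drill-on-YARN"):
        print(docs_permalink(front_matter_slug))
```

Jekyll's real slugify rules handle more cases (for example non-ASCII letters), so treat this only as a way to sanity-check expected URLs.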
6 changes: 6 additions & 0 deletions _data/authors.json
@@ -40,5 +40,11 @@
"title": "Committer",
"org": "MapR Technologies",
"email": "bbevens@mapr.com"
},
"agirish": {
"name": "Abhishek Girish",
"title": "Committer",
"org": "MapR Technologies",
"email": "agirish@mapr.com"
}
}
29,421 changes: 14,973 additions & 14,448 deletions _data/docs.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _docs/010-getting-started.md
@@ -1,3 +1,4 @@
---
title: "Getting Started"
slug: "Getting Started"
---
1 change: 1 addition & 0 deletions _docs/020-architecture.md
@@ -1,3 +1,4 @@
---
title: "Architecture"
slug: "Architecture"
---
1 change: 1 addition & 0 deletions _docs/030-tutorials.md
@@ -1,4 +1,5 @@
---
title: "Tutorials"
slug: "Tutorials"
nocontent: true
---
1 change: 1 addition & 0 deletions _docs/031-drill-on-yarn.md
@@ -1,4 +1,5 @@
---
title: "Drill-on-YARN"
slug: "Drill-on-YARN"
nocontent: true
---
1 change: 1 addition & 0 deletions _docs/040-install-drill.md
@@ -1,5 +1,6 @@
---
title: "Install Drill"
slug: "Install Drill"
---


1 change: 1 addition & 0 deletions _docs/045-configure-drill.md
@@ -1,5 +1,6 @@
---
title: "Configure Drill"
slug: "Configure Drill"
---


1 change: 1 addition & 0 deletions _docs/050-connect-a-data-source.md
@@ -1,3 +1,4 @@
---
title: "Connect a Data Source"
slug: "Connect a Data Source"
---
1 change: 1 addition & 0 deletions _docs/060-odbc-jdbc-interfaces.md
@@ -1,3 +1,4 @@
---
title: "ODBC/JDBC Interfaces"
slug: "ODBC/JDBC Interfaces"
---
1 change: 1 addition & 0 deletions _docs/070-query-data.md
@@ -1,5 +1,6 @@
---
title: "Query Data"
slug: "Query Data"
---


1 change: 1 addition & 0 deletions _docs/072-performance-tuning.md
@@ -1,5 +1,6 @@
---
title: "Performance Tuning"
slug: "Performance Tuning"
---


1 change: 1 addition & 0 deletions _docs/073-log-and-debug.md
@@ -1,3 +1,4 @@
---
title: "Log and Debug"
slug: "Log and Debug"
---
1 change: 1 addition & 0 deletions _docs/080-sql-reference.md
100755 → 100644
@@ -1,3 +1,4 @@
---
title: "SQL Reference"
slug: "SQL Reference"
---
1 change: 1 addition & 0 deletions _docs/090-data-sources-and-file-formats.md
@@ -1,5 +1,6 @@
---
title: "Data Sources and File Formats"
slug: "Data Sources and File Formats"
---


1 change: 1 addition & 0 deletions _docs/100-develop-custom-functions.md
@@ -1,4 +1,5 @@
---
title: "Develop Custom Functions"
slug: "Develop Custom Functions"
---

6 changes: 4 additions & 2 deletions _docs/110-troubleshooting.md
@@ -1,6 +1,6 @@
---
title: "Troubleshooting"
date: 2017-02-24 22:12:36 UTC
slug: "Troubleshooting"
---

You may experience certain known issues when using Drill. This document lists some known issues and resolutions for each.
@@ -182,8 +182,10 @@ Turn on ODBC driver debug logging to better understand failure.
### JDBC/ODBC Connection Issues with ZooKeeper

Symptom: Client cannot resolve ZooKeeper host names for JDBC/ODBC.
Symptom: "IllegalStateException: No active Drillbit endpoint found from ZooKeeper. Check connection parameters?"

Solution: Ensure that Zookeeper is up and running. Verify that Drill has the correct `drill-override.conf` settings for the Zookeeper quorum. If `cluster-id` in file drill-override.conf is not the default value, it must be specified in the JDBC connection string. See [Using the JDBC Driver]({{ site.baseurl }}/docs/using-the-jdbc-driver/).

Solution: Ensure that Zookeeper is up and running. Verify that Drill has the correct `drill-override.conf` settings for the Zookeeper quorum.
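
To make the `cluster-id` point concrete, the following Python sketch assembles a ZooKeeper-based Drill JDBC connection string of the form `jdbc:drill:zk=<hosts>/<zk root>/<cluster-id>`. The helper name `drill_jdbc_url` is hypothetical, and the defaults shown (`drill` for the ZooKeeper root and `drillbits1` for the cluster ID) are assumptions about a stock installation; check them against your own `drill-override.conf`.

```python
def drill_jdbc_url(zk_hosts, zk_root="drill", cluster_id="drillbits1"):
    """Build a ZooKeeper-based Drill JDBC URL (hypothetical helper).

    zk_hosts   -- list of "host:port" strings for the ZooKeeper quorum
    zk_root    -- ZooKeeper root path used by the cluster (assumed default)
    cluster_id -- value of cluster-id in drill-override.conf (assumed default)
    """
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    quorum = ",".join(zk_hosts)
    return f"jdbc:drill:zk={quorum}/{zk_root}/{cluster_id}"


if __name__ == "__main__":
    # A cluster whose drill-override.conf sets a non-default cluster-id.
    print(drill_jdbc_url(["zk1:2181", "zk2:2181"], cluster_id="my-cluster"))
```

If the cluster ID in the URL does not match the one the Drillbits registered under, the client looks in the wrong ZooKeeper path, which is consistent with the "No active Drillbit endpoint found" symptom described above.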

### Metadata Queries Take a Long Time to Return

1 change: 1 addition & 0 deletions _docs/120-developer-information.md
@@ -1,5 +1,6 @@
---
title: "Developer Information"
slug: "Developer Information"
---


1 change: 1 addition & 0 deletions _docs/130-rn.md
@@ -1,3 +1,4 @@
---
title: "Release Notes"
slug: "Release Notes"
---
1 change: 1 addition & 0 deletions _docs/140-sample-datasets.md
@@ -1,3 +1,4 @@
---
title: "Sample Datasets"
slug: "Sample Datasets"
---
2 changes: 1 addition & 1 deletion _docs/170-bylaws.md
@@ -1,6 +1,6 @@
---
title: "Project Bylaws"
date: 2018-11-02
slug: "Project Bylaws"
---
## Introduction

1 change: 1 addition & 0 deletions _docs/171-ecosystem.md
@@ -1,5 +1,6 @@
---
title: "Ecosystem"
slug: "Ecosystem"
---


3 changes: 2 additions & 1 deletion _docs/architecture/010-architecture-introduction.md
@@ -1,6 +1,7 @@
---
title: "Architecture Introduction"
date: 2018-12-08
slug: "Architecture Introduction"
slug: "Architecture Introduction"
parent: "Architecture"
---
Apache Drill is a low latency distributed query engine for large-scale
16 changes: 8 additions & 8 deletions _docs/architecture/015-drill-query-execution.md
@@ -1,32 +1,32 @@
---
title: "Drill Query Execution"
date: 2018-12-08
slug: "Drill Query Execution"
parent: "Architecture"
---

When you submit a Drill query, a client or an application sends the query in the form of an SQL statement to a Drillbit in the Drill cluster. A Drillbit is the process running on each active Drill node that coordinates, plans, and executes queries, as well as distributes query work across the cluster to maximize data locality.

The following image represents the communication between clients, applications, and Drillbits:

![]({{ site.baseurl }}/docs/img/query-flow-client.png)
![]({{ site.baseurl }}/images/docs/query-flow-client.png)

The Drillbit that receives the query from a client or application becomes the Foreman for the query and drives the entire query. A parser in the Foreman parses the SQL, applying custom rules to convert specific SQL operators into a specific logical operator syntax that Drill understands. This collection of logical operators forms a logical plan. The logical plan describes the work required to generate the query results and defines which data sources and operations to apply.

The Foreman sends the logical plan into a cost-based optimizer to optimize the order of SQL operators in a statement and read the logical plan. The optimizer applies various types of rules to rearrange operators and functions into an optimal plan. The optimizer converts the logical plan into a physical plan that describes how to execute the query.

![]({{ site.baseurl }}/docs/img/client-phys-plan.png)
![]({{ site.baseurl }}/images/docs/client-phys-plan.png)

A parallelizer in the Foreman transforms the physical plan into multiple phases, called major and minor fragments. These fragments create a multi-level execution tree that rewrites the query and executes it in parallel against the configured data sources, sending the results back to the client or application.

![]({{ site.baseurl }}/docs/img/execution-tree.PNG)
![]({{ site.baseurl }}/images/docs/execution-tree.PNG)


## Major Fragments
A major fragment is a concept that represents a phase of the query execution. A phase can consist of one or multiple operations that Drill must perform to execute the query. Drill assigns each major fragment a MajorFragmentID.

For example, to perform a hash aggregation of two files, Drill may create a plan with two major phases (major fragments) where the first phase is dedicated to scanning the two files and the second phase is dedicated to the aggregation of the data.

![]({{ site.baseurl }}/docs/img/ex-operator.png)
![]({{ site.baseurl }}/images/docs/ex-operator.png)

Drill uses an exchange operator to separate major fragments. An exchange is a change in data location and/or parallelization of the physical plan. An exchange is composed of a sender and a receiver to allow data to move between nodes.

@@ -37,15 +37,15 @@ You can work with major fragments within the physical plan by capturing a JSON r
## Minor Fragments
Each major fragment is parallelized into minor fragments. A minor fragment is a logical unit of work that runs inside a thread. A logical unit of work in Drill is also referred to as a slice. The execution plan that Drill creates is composed of minor fragments. Drill assigns each minor fragment a MinorFragmentID.

![]({{ site.baseurl }}/docs/img/min-frag.png)
![]({{ site.baseurl }}/images/docs/min-frag.png)

The parallelizer in the Foreman creates one or more minor fragments from a major fragment at execution time, by breaking a major fragment into as many minor fragments as it can usefully run at the same time on the cluster.

Drill executes each minor fragment in its own thread as quickly as possible based on its upstream data requirements. Drill schedules the minor fragments on nodes with data locality. Otherwise, Drill schedules them in a round-robin fashion on the existing, available Drillbits.
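
The placement rule in the paragraph above (prefer a node with data locality, otherwise fall back to round-robin) can be pictured with a toy Python model. This is only a sketch of the stated policy, not Drill's scheduler: the function name `assign_minor_fragments` and the representation of a fragment as "the node that holds its data, or `None`" are assumptions made for illustration.

```python
from itertools import cycle


def assign_minor_fragments(fragment_local_nodes, drillbits):
    """Assign each minor fragment to a Drillbit (toy model, not Drill code).

    fragment_local_nodes -- one entry per minor fragment: the node holding
                            its data, or None when there is no locality
    drillbits            -- list of available Drillbit node names
    """
    if not drillbits:
        raise ValueError("no Drillbits available")
    round_robin = cycle(drillbits)
    assignments = []
    for local_node in fragment_local_nodes:
        if local_node in drillbits:
            # Data locality: run the fragment where its data lives.
            assignments.append(local_node)
        else:
            # No usable locality: take the next Drillbit in rotation.
            assignments.append(next(round_robin))
    return assignments


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    print(assign_minor_fragments(["node2", None, None, "node9"], nodes))
```

In this model a fragment whose data sits on an unknown node (`node9`) is treated like a fragment with no locality at all.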

Minor fragments contain one or more relational operators. An operator performs a relational operation, such as scan, filter, join, or group by. Each operator has a particular operator type and an OperatorID. Each OperatorID defines its relationship within the minor fragment to which it belongs. See [Physical Operators]({{ site.baseurl }}/docs/physical-operators/).

![]({{ site.baseurl }}/docs/img/operators.png)
![]({{ site.baseurl }}/images/docs/operators.png)

For example, when performing a hash aggregation of two files, Drill breaks the first phase dedicated to scanning into two minor fragments. Each minor fragment contains scan operators that scan the files. Drill breaks the second phase dedicated to aggregation into four minor fragments. Each of the four minor fragments contains hash aggregate operators that perform the hash aggregation operations on the data.

@@ -60,7 +60,7 @@ Intermediate fragments start work when data is available or fed to them from oth

The leaf fragments scan tables in parallel and communicate with the storage layer or access data on local disk. The leaf fragments pass partial results to the intermediate fragments, which perform parallel operations on intermediate results.

![]({{ site.baseurl }}/docs/img/leaf-frag.png)
![]({{ site.baseurl }}/images/docs/leaf-frag.png)

Drill only plans queries that have concurrent running fragments. For example, if 20 available slices exist in the cluster, Drill plans a query that runs no more than 20 minor fragments in a particular major fragment. Drill is optimistic and assumes that it can complete all of the work in parallel. All minor fragments for a particular major fragment start at the same time based on their upstream data dependency.
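
The slice limit in the paragraph above amounts to a simple cap, sketched here in Python. The function name `planned_minor_fragments` and the idea of a single "useful parallelism" number per major fragment are simplifying assumptions; Drill's actual parallelizer weighs more factors than this.

```python
def planned_minor_fragments(useful_parallelism: int, available_slices: int) -> int:
    """Cap the minor fragments of one major fragment at the slices available.

    useful_parallelism -- how many minor fragments could usefully run at once
                          (hypothetical input for this sketch)
    available_slices   -- slices currently available in the cluster
    """
    return min(useful_parallelism, available_slices)


if __name__ == "__main__":
    # With 20 available slices, a major fragment never gets more than 20
    # minor fragments, however much parallelism it could use.
    print(planned_minor_fragments(50, 20))
    print(planned_minor_fragments(8, 20))
```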

4 changes: 2 additions & 2 deletions _docs/architecture/020-core-modules.md
@@ -1,11 +1,11 @@
---
title: "Core Modules"
date: 2018-11-02
slug: "Core Modules"
parent: "Architecture"
---
The following image represents components within each Drillbit:

![drill query flow]({{ site.baseurl }}/docs/img/DrillbitModules.png)
![drill query flow]({{ site.baseurl }}/images/docs/DrillbitModules.png)

The following list describes the key components of a Drillbit:

4 changes: 2 additions & 2 deletions _docs/architecture/030-performance.md
@@ -1,6 +1,6 @@
---
title: "Performance"
date: 2018-11-02
slug: "Performance"
parent: "Architecture"
---
Drill is designed from the ground up for high performance on large datasets.
@@ -42,7 +42,7 @@ generates highly efficient custom code for every single query.
The following image shows the Drill compilation/code generation
process:

![drill compiler]({{ site.baseurl }}/docs/img/58.png)
![drill compiler]({{ site.baseurl }}/images/docs/58.png)

**_Optimistic and pipelined query execution_**

2 changes: 1 addition & 1 deletion _docs/configure-drill/010-configure-drill-introduction.md
@@ -1,6 +1,6 @@
---
title: "Configure Drill Introduction"
date: 2018-12-08
slug: "Configure Drill Introduction"
parent: "Configure Drill"
---

6 changes: 3 additions & 3 deletions _docs/configure-drill/020-configuring-drill-memory.md
@@ -1,6 +1,6 @@
---
title: "Configuring Drill Memory"
date: 2018-03-26 17:37:50 UTC
slug: "Configuring Drill Memory"
parent: "Configure Drill"
---

@@ -90,7 +90,7 @@ Additionally, if the available free memory is less than the allocation, the foll
[WARN] Drillbit will start up, but can potentially crash due to oversubscribing of system memory.


##Modifying Memory Allocated to Queries
## Modifying Memory Allocated to Queries

You can configure the amount of memory that Drill allocates to each query as a hard limit or a percentage of the total direct memory. The `planner.memory.max_query_memory_per_node` and `planner.memory.percent_per_query` options set the amount of memory that Drill can allocate to a query on a node. Both options are enabled by default. Of these two options, Drill picks the setting that provides the most memory. For more information about these options, see [Sort-Based and Hash-Based Memory Constrained Operators](https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/).
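
The "picks the setting that provides the most memory" rule can be written out as a one-line comparison. The Python sketch below assumes that `planner.memory.percent_per_query` is expressed as a fraction (for example `0.05`) of total direct memory and ignores any other adjustments Drill may apply; the function name `query_memory_per_node` is hypothetical.

```python
def query_memory_per_node(max_query_memory_bytes: int,
                          percent_per_query: float,
                          total_direct_memory_bytes: int) -> int:
    """Return the larger of the two per-query limits (illustrative only).

    max_query_memory_bytes    -- planner.memory.max_query_memory_per_node
    percent_per_query         -- planner.memory.percent_per_query, taken
                                 here as a fraction of direct memory
    total_direct_memory_bytes -- total direct memory on the node
    """
    from_percentage = int(total_direct_memory_bytes * percent_per_query)
    # Whichever option yields more memory wins.
    return max(max_query_memory_bytes, from_percentage)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # 5% of 8 GiB is smaller than the 2 GiB hard limit, so 2 GiB applies.
    print(query_memory_per_node(2 * GIB, 0.05, 8 * GIB))
    # 50% of 8 GiB is 4 GiB, which exceeds the 2 GiB hard limit.
    print(query_memory_per_node(2 * GIB, 0.5, 8 * GIB))
```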

@@ -109,7 +109,7 @@ Use the ALTER SYSTEM SET command to change the settings, as shown:

ALTER SYSTEM SET `drill.exec.memory.operator.output_batch_size` = <value>;

##Bounds Checking
## Bounds Checking

If performance is an issue, add -Dbounds=false, as shown in the following example:

@@ -1,6 +1,6 @@
---
title: "Configuring a Multitenant Cluster Introduction"
date: 2016-11-21 21:27:46 UTC
slug: "Configuring a Multitenant Cluster Introduction"
parent: "Configuring a Multitenant Cluster"
---

1 change: 1 addition & 0 deletions _docs/configure-drill/031-securing-drill.md
@@ -1,5 +1,6 @@
---
title: "Securing Drill"
slug: "Securing Drill"
parent: "Configure Drill"
---

@@ -1,5 +1,6 @@
---
title: "Configuring a Multitenant Cluster"
slug: "Configuring a Multitenant Cluster"
parent: "Configure Drill"
---

@@ -1,6 +1,6 @@
---
title: "Configuring Multitenant Resources"
date: 2015-12-29 01:12:25 UTC
slug: "Configuring Multitenant Resources"
parent: "Configuring a Multitenant Cluster"
---
Drill operations are memory and CPU-intensive. Currently, Drill resources are managed outside of any cluster management service. In a multitenant or any other type of cluster, YARN-enabled or not, you configure memory and memory usage limits for Drill by modifying the `drill-env.sh` file as described in ["Configuring Drill Memory"]({{site.baseurl}}/docs/configuring-drill-memory).
4 changes: 2 additions & 2 deletions _docs/configure-drill/060-configuring-a-shared-drillbit.md
@@ -1,11 +1,11 @@
---
title: "Configuring Resources for a Shared Drillbit"
date: 2016-11-21 22:42:10 UTC
slug: "Configuring Resources for a Shared Drillbit"
parent: "Configuring a Multitenant Cluster"
---
To manage a cluster in which multiple users share a Drillbit, you configure Drill queuing and parallelization in addition to memory, as described in the previous section, ["Configuring Drill Memory"]({{site.baseurl}}/docs/configuring-drill-memory/).

##Configuring Query Queuing
## Configuring Query Queuing

Set [options in sys.options]({{site.baseurl}}/docs/configuration-options-introduction/) to enable and manage query queuing, which is turned off by default. There are two types of queues: large and small. You configure a maximum number of queries that each queue allows by configuring the following options in the `sys.options` table:

1 change: 1 addition & 0 deletions _docs/configure-drill/080-configuration-options.md
@@ -1,5 +1,6 @@
---
title: "Configuration Options"
slug: "Configuration Options"
parent: "Configure Drill"
---

2 changes: 1 addition & 1 deletion _docs/configure-drill/100-ports-used-by-drill.md
@@ -1,6 +1,6 @@
---
title: "Ports Used by Drill"
date: 2018-12-08
slug: "Ports Used by Drill"
parent: "Configure Drill"
---
