From aead20b6e1a94ffe540772ec97b85765fd5808e6 Mon Sep 17 00:00:00 2001
From: Hyunsik Choi
Date: Tue, 23 Sep 2014 18:29:07 -0700
Subject: [PATCH 1/3] TAJO-1068: Add SQL Query documentation.

---
 .../main/sphinx/sql_language/data_model.rst   |   8 +-
 .../src/main/sphinx/sql_language/queries.rst  | 215 +++++++++++++++++-
 2 files changed, 212 insertions(+), 11 deletions(-)

diff --git a/tajo-docs/src/main/sphinx/sql_language/data_model.rst b/tajo-docs/src/main/sphinx/sql_language/data_model.rst
index e01c6d43e1..a0c5856614 100644
--- a/tajo-docs/src/main/sphinx/sql_language/data_model.rst
+++ b/tajo-docs/src/main/sphinx/sql_language/data_model.rst
@@ -7,7 +7,7 @@ Data Types
 ===============
 
 +-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
-| Supported | SQL Type Name | Alias | Size (byte) | Description | Range |
+| Support | SQL Type Name | Alias | Size (byte) | Description | Range |
 +===========+================+============================+=============+===================================================+==========================================================================+
 | O | boolean | bool | 1 | | true/false |
 +-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
@@ -41,13 +41,13 @@ Data Types
 +-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
 | O | blob | bytea | | variable-length binary string | |
 +-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
-| | date | | | | |
+| O | date | | | | |
 +-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
-| | time | | | | |
+| O | time | | | | |
 +-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
 | | timetz | time with time zone | | | |
 +-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
-| | timestamp | | | | |
+| O | timestamp | | | | |
 +-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
 | | timestamptz | | | | |
 +-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
diff --git a/tajo-docs/src/main/sphinx/sql_language/queries.rst b/tajo-docs/src/main/sphinx/sql_language/queries.rst
index 8a212a59b2..c2e897c364 100644
--- a/tajo-docs/src/main/sphinx/sql_language/queries.rst
+++ b/tajo-docs/src/main/sphinx/sql_language/queries.rst
@@ -11,7 +11,7 @@ Overview
 .. code-block:: sql
 
   SELECT [distinct [all]] * | [[AS] ] [, ...]
-  [FROM [[AS] ] [, ...]]
+  [FROM [[AS] ] [, ...]]
   [WHERE ]
   [GROUP BY [, ...]]
   [HAVING ]
@@ -23,16 +23,217 @@ Overview
 From Clause
 =====================
 
+*Synopsis*
+
+.. code-block:: sql
+
+  [FROM [[AS] ] [, ...]]
+
+
+The ``FROM`` clause specifies one or more other tables given in a comma-separated table reference list.
+A table reference can be a relation name , or a subquery, a table join, or complex combinations of them.
+
+----------------------
+Table and Table Aliase
+----------------------
+
+A temporary name can be given to tables and complex table references to be used
+for references to the derived table in the rest of the query. This is called a table alias.
+
+To create a a table alias, please use ``AS``:
+
+.. code-block:: sql
+
+  FROM table_reference AS alias
+
+or
+
+.. code-block:: sql
+
+  FROM table_reference alias
+
+The ``AS`` keyword can be omitted, and *Alias* can be any identifier.
+
+A typical application of table aliases is to give short names to a long table references. For example:
+
+.. code-block:: sql
+
+  SELECT * FROM long_table_name_1234 s JOIN another_long_table_name_5678 a ON s.id = a.num;
+
+-------------
+Joined Tables
+-------------
+
+Tajo supports all kinds of join types.
+
+Join Type
+~~~~~~~~~
+
+Cross Join
+^^^^^^^^^^
+
+.. code-block:: sql
+
+  FROM T1 CROSS JOIN T2
+
+Cross join, also called *Catesian product*, results in every possible combination of rows from T1 and T2.
+
+``FROM T1 CROSS JOIN T2`` is equivalent to ``FROM T1, T2``.
+
+Qualified joins
+^^^^^^^^^^^^^^^
+
+Qualified joins implicitly or explicitly have join conditions. Inner/Outer/Natural Joins all are qualified joins.
+Except for natural join, ``ON`` or ``USING`` clause in each join is used to specify a join condition.
+A join condition must include at least one boolean expression, and it can also include just filter conditions.
+
+**Inner Join**
+
+.. code-block:: sql
+
+  T1 [INNER] JOIN T2 ON boolean_expression
+  T1 [INNER] JOIN T2 USING (join column list)
+
+``INNER`` keyword is the default, and so ``INNER`` can be omitted when you use inner join.
+
+**Outer Join**
+
+.. code-block:: sql
+
+  T1 (LEFT|RIGHT|FULL) OUTER JOIN T2 ON boolean_expression
+  T1 (LEFT|RIGHT|FULL) OUTER JOIN T2 USING (join column list)
+
+One of ``LEFT``, ``RIGHT``, or ``FULL`` must be specified for outer joins.
+Join conditions in outer join will have different behavior according to corresponding table references of join conditions.
+To know outer join behavior in more detail, please refer to
+`Advanced outer join constructs `_.
+
+**Natural Join**
+
+.. code-block:: sql
+
+  T1 NATURAL JOIN T2
+
+``NATURAL`` is a short form of ``USING``. It forms a ``USING`` list consisting of all common column names that appear in
+both join tables. These common columns appear only once in the output table. If there are no common columns,
+``NATURAL`` behaves like ``CROSS JOIN``.
+
+**Subqueries**
+
+Subqueries allow users to specify a derived table. It requires enclosing a SQL statement in parentheses and an alias name.
+For example:
+
+.. code-block:: sql
+
+  FROM (SELECT * FROM table1) AS alias_name
 
 
 =====================
 Where Clause
 =====================
+The syntax of the WHERE Clause is
 
-=====================
-Groupby Clause
-=====================
+*Synopsis*
 
-=====================
-Select list
-=====================
\ No newline at end of file
+.. code-block:: sql
+
+  WHERE search_condition
+
+``search_condition`` can be any boolean expression.
+In order to know additional predicates, please refer to :doc:`/sql_language/predicates`.
+
+==========================
+Groupby and Having Clauses
+==========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  SELECT select_list
+  FROM ...
+  [WHERE ...]
+  GROUP BY grouping_column_reference [, grouping_column_reference]...
+  [HAVING boolean_expression]
+
+The rows which passes ``WHERE`` filter may be subject to grouping, specified by ``GROUP BY`` clause.
+Grouping combines a set of rows having common values into one group, and then computes rows in the group with aggregation functions.
+``HAVING`` clause can be used with only ``GROUP BY`` clause. It eliminates the unqualified result rows of grouping.
+
+``grouping_column_reference`` can be a column reference, a complex expression including scalar functions and arithmetic operations.
+
+==========================
+Orderby and Limit Clauses
+==========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  FROM ... ORDER BY [(ASC|DESC)] [NULL (FIRST|LAST) [,...]
+
+``sort_expr`` can be a column reference, aliased column reference, or a complex expression.
+``ASC`` indicates an ascending order of ``sort_expr`` values. ``DESC`` indicates an descending order of ``sort_expr`` values.
+``ASC`` is the default order.
+
+``NULLS FIRST`` and ``NULLS LAST`` options can be used to determine whether nulls values appear
+before or after non-null values in the sort ordering. By default, null values are dealt as if larger than any non-null value;
+that is, ``NULLS FIRST`` is the default for ``DESC`` order, and ``NULLS LAST`` otherwise.
+
+==========================
+Window Functions
+==========================
+
+A window function performs a calculation across multiple table rows that belong to some window frame.
+
+*Synopsis*
+
+.. code-block:: sql
+
+  SELECT ...., func(param) OVER ([PARTITION BY partition-expr [, ...]] [ORDER BY sort-expr [, ...]]), ...., FROM
+
+The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of
+the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into
+the same partition as the current row.
+
+We will briefly explain some examples using window functions.
+
+---------
+Examples
+---------
+
+Multiple window functions can be used in a SQL statement as follows:
+
+.. code-block:: sql
+
+  SELECT l_orderkey, sum(l_discount) OVER (PARTITION BY l_orderkey), sum(l_quantity) OVER (PARTITION BY l_orderkey) FROM LINEITEM;
+
+If ``OVER()`` clause is empty as following, it makes all table rows into one window frame.
+
+.. code-block:: sql
+
+  SELECT salary, sum(salary) OVER () FROM empsalary;
+
+Also, ``ORDER BY`` clause can be used without ``PARTITION BY`` clause as follows:
+
+.. code-block:: sql
+
+  SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
+
+Also, any expressions and aggregation functions are allowed in ``ORDER BY`` clause as follows:
+
+.. code-block:: sql
+
+  select
+    l_orderkey,
+    count(*) as cnt,
+    row_number() over (partition by l_orderkey order by count(*) desc)
+    row_num
+  from
+    lineitem
+  group by
+    l_orderkey
+
+.. note::
+
+  Currently, Tajo does not support multiple different partition-expressions in one SQL statement.
\ No newline at end of file

From fe07894499a53b5fcd31ac296768ee15ee346039 Mon Sep 17 00:00:00 2001
From: Hyunsik Choi
Date: Thu, 25 Sep 2014 22:31:10 -0700
Subject: [PATCH 2/3] Improved the explanation of GROUPBY.

---
 .../src/main/sphinx/sql_language/queries.rst  | 21 +++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/tajo-docs/src/main/sphinx/sql_language/queries.rst b/tajo-docs/src/main/sphinx/sql_language/queries.rst
index c2e897c364..b30b2a6f62 100644
--- a/tajo-docs/src/main/sphinx/sql_language/queries.rst
+++ b/tajo-docs/src/main/sphinx/sql_language/queries.rst
@@ -157,11 +157,28 @@ Groupby and Having Clauses
   [HAVING boolean_expression]
 
 The rows which passes ``WHERE`` filter may be subject to grouping, specified by ``GROUP BY`` clause.
-Grouping combines a set of rows having common values into one group, and then computes rows in the group with aggregation functions.
-``HAVING`` clause can be used with only ``GROUP BY`` clause. It eliminates the unqualified result rows of grouping.
+Grouping combines a set of rows having common values into one group, and then computes rows in the group with aggregation functions. ``HAVING`` clause can be used with only ``GROUP BY`` clause. It eliminates the unqualified result rows of grouping.
 
 ``grouping_column_reference`` can be a column reference, a complex expression including scalar functions and arithmetic operations.
 
+.. code-block:: sql
+
+  SELECT l_orderkey, SUM(l_quantity) AS quantity FROM lineitem GROUP BY l_orderkey;
+
+  SELECT substr(l_shipdate,1,4) as year, SUM(l_orderkey) AS total2 FROM lineitem GROUP BY substr(l_shipdate,1,4);
+
+If a SQL statement includes ``GROUP BY`` clause, expressions in select list must be either grouping_column_reference or aggregation function. For example, the following example query is not allowed because ``l_orderkey`` does not occur in ``GROUP BY`` clause.
+
+.. code-block:: sql
+
+  SELECT l_orderkey, l_partkey, SUM(l_orderkey) AS total FROM lineitem GROUP BY l_partkey;
+
+Aggregation functions can be used with ``DISTINCT`` keywords. It forces an individual aggregate function to take only distinct values of the argument expression. ``DISTINCT`` keyword is used as follows:
+
+.. code-block:: sql
+
+  SELECT l_partkey, COUNT(distinct l_quantity), SUM(distinct l_extendedprice) AS total FROM lineitem GROUP BY l_partkey;
+
 ==========================
 Orderby and Limit Clauses
 ==========================

From 2cbf3ef25a3b7d1d2908f35fb222e136db8ee59c Mon Sep 17 00:00:00 2001
From: Hyunsik Choi
Date: Wed, 1 Oct 2014 14:54:33 -0700
Subject: [PATCH 3/3] Reflected Mai and Henry's comments.

---
 .../src/main/sphinx/sql_language/queries.rst  | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tajo-docs/src/main/sphinx/sql_language/queries.rst b/tajo-docs/src/main/sphinx/sql_language/queries.rst
index b30b2a6f62..5adfdaa793 100644
--- a/tajo-docs/src/main/sphinx/sql_language/queries.rst
+++ b/tajo-docs/src/main/sphinx/sql_language/queries.rst
@@ -33,9 +33,9 @@ From Clause
 The ``FROM`` clause specifies one or more other tables given in a comma-separated table reference list.
 A table reference can be a relation name , or a subquery, a table join, or complex combinations of them.
 
-----------------------
-Table and Table Aliase
-----------------------
+-----------------------
+Table and Table Aliases
+-----------------------
 
 A temporary name can be given to tables and complex table references to be used
 for references to the derived table in the rest of the query. This is called a table alias.
@@ -54,7 +54,7 @@ or
 
 The ``AS`` keyword can be omitted, and *Alias* can be any identifier.
 
-A typical application of table aliases is to give short names to a long table references. For example:
+A typical application of table aliases is to give short names to long table references. For example:
 
 .. code-block:: sql
 
@@ -66,8 +66,8 @@ Joined Tables
 
 Tajo supports all kinds of join types.
 
-Join Type
-~~~~~~~~~
+Join Types
+~~~~~~~~~~
 
 Cross Join
 ^^^^^^^^^^
@@ -76,7 +76,7 @@ Cross Join
 
   FROM T1 CROSS JOIN T2
 
-Cross join, also called *Catesian product*, results in every possible combination of rows from T1 and T2.
+Cross join, also called *Cartesian product*, results in every possible combination of rows from T1 and T2.
 
 ``FROM T1 CROSS JOIN T2`` is equivalent to ``FROM T1, T2``.
 
@@ -190,7 +190,7 @@ Orderby and Limit Clauses
   FROM ... ORDER BY [(ASC|DESC)] [NULL (FIRST|LAST) [,...]
 
 ``sort_expr`` can be a column reference, aliased column reference, or a complex expression.
-``ASC`` indicates an ascending order of ``sort_expr`` values. ``DESC`` indicates an descending order of ``sort_expr`` values.
+``ASC`` indicates an ascending order of ``sort_expr`` values. ``DESC`` indicates a descending order of ``sort_expr`` values.
 ``ASC`` is the default order.
 
 ``NULLS FIRST`` and ``NULLS LAST`` options can be used to determine whether nulls values appear
@@ -237,7 +237,7 @@ Also, ``ORDER BY`` clause can be used without ``PARTITION BY`` clause as follows
 
   SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
 
-Also, any expressions and aggregation functions are allowed in ``ORDER BY`` clause as follows:
+Also, all expressions and aggregation functions are allowed in ``ORDER BY`` clause as follows:
 
 .. code-block:: sql
 