
@haohui haohui commented Jun 1, 2017

This is the first iteration of the documentation. Comments are appreciated.

@alpinegizmo alpinegizmo left a comment

This is mostly quite clear. Just a few suggestions.


SQL queries are specified using the `sql()` method of the `TableEnvironment`. The method returns the result of the SQL query as a `Table` which can be converted into a `DataSet` or `DataStream`, used in subsequent Table API queries, or written to a `TableSink` (see [Writing Tables to External Sinks](#writing-tables-to-external-sinks)). SQL and Table API queries can seamlessly mixed and are holistically optimized and translated into a single DataStream or DataSet program.
Flink supports specifying DataStream or DataSet programs with SQL queries using the `sql()` method of the `TableEnvironment`. The method returns the result of the SQL query as a `Table`. A `Table` can be used in the subsequent SQL / Table API queries, be converted into a `DataSet` or `DataStream`, used in subsequent Table API queries or written to a `TableSink` (see [Writing Tables to External Sinks](common.html#emit-to-a-tablesink)). SQL and Table API queries can seamlessly mixed and are holistically optimized and translated into a single program.

A Table can be used in subsequent SQL / Table API queries, be converted into a DataSet or DataStream, or written to a TableSink ...

(drop "the" and the redundant "used in subsequent Table API queries")
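The entry point described in the quoted paragraphs might be sketched as follows (the `Orders` table, its schema, and the `orderStream` variable are assumptions for illustration, not part of this PR):

```java
// Sketch (Flink 1.3-era Table API). Assumes a DataStream<Tuple2<String, Integer>>
// named `orderStream` with fields (user, amount) already exists in scope.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// Register the stream so that SQL queries can refer to it by name.
tableEnv.registerDataStream("Orders", orderStream, "user, amount");

// sql() returns the query result as a Table ...
Table result = tableEnv.sql(
    "SELECT user, SUM(amount) AS total FROM Orders GROUP BY user");

// ... which can be converted back into a DataStream.
DataStream<Tuple2<Boolean, Row>> retractStream =
    tableEnv.toRetractStream(result, Row.class);
```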

To access the data in the SQL queries, users must register data sources, including `Table`, `DataSet`, `DataStream` or external `TableSource`, in the `TableEnvironment` (see [Registering Tables](common.html#register-a-table-in-the-catalog)). Alternatively, users can also register external catalogs in the `TableEnvironment` to specify the location of the data sources.

*Note: Flink's SQL support is not feature complete, yet. Queries that include unsupported SQL features will cause a `TableException`. The limitations of SQL on batch and streaming tables are listed in the following sections.*
For convenience `Table.toString()` will automatically register an unique table name under the `Table`'s `TableEnvironment` and return the table name. So it allows to call SQL directly on tables in a string concatenation (see examples below).

... a unique table name ...

To access the data in the SQL queries, users must register data sources, including `Table`, `DataSet`, `DataStream` or external `TableSource`, in the `TableEnvironment` (see [Registering Tables](common.html#register-a-table-in-the-catalog)). Alternatively, users can also register external catalogs in the `TableEnvironment` to specify the location of the data sources.

*Note: Flink's SQL support is not feature complete, yet. Queries that include unsupported SQL features will cause a `TableException`. The limitations of SQL on batch and streaming tables are listed in the following sections.*
For convenience `Table.toString()` will automatically register an unique table name under the `Table`'s `TableEnvironment` and return the table name. So it allows to call SQL directly on tables in a string concatenation (see examples below).

This allows SQL to be called directly on tables in a string concatenation (see examples below).

For convenience `Table.toString()` will automatically register an unique table name under the `Table`'s `TableEnvironment` and return the table name. So it allows to call SQL directly on tables in a string concatenation (see examples below).
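In use, the convenience described here might look like the following sketch (`tableEnv` and a registered `Orders` table are assumed):

```java
// `table` comes from the Table API and has no registered name of its own.
Table table = tableEnv.scan("Orders").select("user, amount");

// String concatenation calls Table.toString(), which registers the table
// under a generated unique name in its TableEnvironment and returns that name.
Table filtered = tableEnv.sql("SELECT * FROM " + table + " WHERE amount > 10");
```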

**TODO: Rework intro. Move some parts below.**
*Note: Flink's SQL support is not feature complete, yet. Queries that include unsupported SQL features will cause a `TableException`. The limitations of SQL on batch and streaming tables are listed in the following sections.*

Flink's SQL support is not yet feature complete.

Specifying a Query
---------------

Here are a few examples on how to specify a DataStream / DataSet program using SQL:

examples of how to

----------------

Flink uses [Apache Calcite](https://calcite.apache.org/docs/reference.html) for SQL parsing. Currently, Flink SQL only supports query-related SQL syntax and only a subset of the comprehensive SQL standard. The following BNF-grammar describes the supported SQL features:
Flink parses SQL using [Apache Calcite](https://calcite.apache.org/docs/reference.html). Flink supports the standard ANSI SQL but it provides no supports for DML and DDL. The following BNF-grammar describes the supported SQL features:

Flink supports standard ANSI SQL, but it provides no support for DML or DDL.

Flink supports specifying DataStream or DataSet programs with SQL queries using the `sql()` method of the `TableEnvironment`. The method returns the result of the SQL query as a `Table`. A `Table` can be used in the subsequent SQL / Table API queries, be converted into a `DataSet` or `DataStream`, used in subsequent Table API queries or written to a `TableSink` (see [Writing Tables to External Sinks](common.html#emit-to-a-tablesink)). SQL and Table API queries can seamlessly mixed and are holistically optimized and translated into a single program.

A `Table`, `DataSet`, `DataStream`, or external `TableSource` must be registered in the `TableEnvironment` in order to be accessible by a SQL query (see [Registering Tables](#registering-tables)). For convenience `Table.toString()` will automatically register an unique table name under the `Table`'s `TableEnvironment` and return the table name. So it allows to call SQL directly on tables in a string concatenation (see examples below).
To access the data in the SQL queries, users must register data sources, including `Table`, `DataSet`, `DataStream` or external `TableSource`, in the `TableEnvironment` (see [Registering Tables](common.html#register-a-table-in-the-catalog)). Alternatively, users can also register external catalogs in the `TableEnvironment` to specify the location of the data sources.

This reads a bit awkwardly. How about this?

Before using data in a SQL query, the data source(s) must first be registered in the TableEnvironment (see Registering Tables). Possible data sources include Tables, DataSets, DataStreams, and external TableSources. Alternatively, ...
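A sketch of the registration step this wording describes (the file path, field names, and types are placeholders):

```java
// Register an external TableSource, here a CSV file ...
CsvTableSource csvSource = new CsvTableSource(
    "/path/to/orders.csv",
    new String[] {"user", "amount"},
    new TypeInformation<?>[] {
        BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO});
tableEnv.registerTableSource("CsvOrders", csvSource);

// ... or a DataSet / DataStream under a name that SQL queries can use
// (`orderDataSet` is assumed to exist).
tableEnv.registerDataSet("BatchOrders", orderDataSet, "user, amount");
```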

@fhueske fhueske left a comment


Thanks for the PR @haohui! I left a few comments and suggestions.

It would also be good if you could check whether the Built-in Function limitations are still up to date. Some of the functions might have been added in the meantime.

What do you think?
Thanks, Fabian


SQL queries are specified using the `sql()` method of the `TableEnvironment`. The method returns the result of the SQL query as a `Table` which can be converted into a `DataSet` or `DataStream`, used in subsequent Table API queries, or written to a `TableSink` (see [Writing Tables to External Sinks](#writing-tables-to-external-sinks)). SQL and Table API queries can seamlessly mixed and are holistically optimized and translated into a single DataStream or DataSet program.
Flink supports specifying DataStream or DataSet programs with SQL queries using the `sql()` method of the `TableEnvironment`. The method returns the result of the SQL query as a `Table`. A `Table` can be used in subsequent SQL / Table API queries, be converted into a `DataSet` or `DataStream`, or written to a `TableSink` (see [Writing Tables to External Sinks](common.html#emit-to-a-tablesink)). SQL and Table API queries can seamlessly mixed and are holistically optimized and translated into a single program.

I would not mention the translation into DataStream and DataSet programs (see another comment below).

Flink supports specifying DataStream or DataSet programs with SQL queries using the `sql()` method of the `TableEnvironment`. The method returns the result of the SQL query as a `Table`. A `Table` can be used in subsequent SQL / Table API queries, be converted into a `DataSet` or `DataStream`, or written to a `TableSink` (see [Writing Tables to External Sinks](common.html#emit-to-a-tablesink)). SQL and Table API queries can seamlessly mixed and are holistically optimized and translated into a single program.

A `Table`, `DataSet`, `DataStream`, or external `TableSource` must be registered in the `TableEnvironment` in order to be accessible by a SQL query (see [Registering Tables](#registering-tables)). For convenience `Table.toString()` will automatically register an unique table name under the `Table`'s `TableEnvironment` and return the table name. So it allows to call SQL directly on tables in a string concatenation (see examples below).
Before using data in a SQL query, the data source(s) must first be registered in the `TableEnvironment` (see [Registering Tables](common.html#register-a-table-in-the-catalog)). Possible data sources include Tables, DataSets, DataStreams, and external TableSources. Alternatively, users can also register external catalogs in the `TableEnvironment` to specify the location of the data sources.

Before using data in a SQL query, the data source(s) must first be registered in the `TableEnvironment` (see [Registering Tables](common.html#register-a-table-in-the-catalog)). Possible data sources include Tables, DataSets, DataStreams, and external TableSources. Alternatively, users can also register external catalogs in the `TableEnvironment` to specify the location of the data sources.

*Note: Flink's SQL support is not feature complete, yet. Queries that include unsupported SQL features will cause a `TableException`. The limitations of SQL on batch and streaming tables are listed in the following sections.*
For convenience `Table.toString()` will automatically register a unique table name under the `Table`'s `TableEnvironment` and return the table name. This allows SQL to be called directly on tables in a string concatenation (see examples below).

... will automatically register the table under a unique name in its TableEnvironment and ...?

Specifying a Query
---------------

Here are a few examples on specifying a DataStream / DataSet program using SQL:

I would not mention that SQL queries are translated into DataStream / DataSet programs. This is useful information for some users and it should be mentioned in an internals section. I would assume that most users do not care about this and might be even confused.

What do you think @haohui ?

----------------

Flink uses [Apache Calcite](https://calcite.apache.org/docs/reference.html) for SQL parsing. Currently, Flink SQL only supports query-related SQL syntax and only a subset of the comprehensive SQL standard. The following BNF-grammar describes the supported SQL features:
Flink parses SQL using [Apache Calcite](https://calcite.apache.org/docs/reference.html). Flink supports standard ANSI SQL, but it provides no supports for DML and DDL. The following BNF-grammar describes the supported SQL features:

We should mention that not all features are supported by batch and streaming. The Operations section will show which features are supported by batch and streaming.

</thead>
<tbody>
<tr>
<td><strong>Inner Equi-join / Outer Equi-join</strong>(Batch only)</td>

Add a comment that join must have at least one conjunctive equality predicate. CROSS or Theta joins are not supported.


Add a comment that Flink does not optimize join order yet and joins in the same order as specified in the query.

<tr>
<td><strong>User Defined Table Function (UDTF)</strong></td>
<td>
<p>SQL queries can refer to UDTFs to expand a value into a relation provided that they are registered in the <pre>TableEnvironment</pre>.</p>

<pre>TableFunction</pre> is not nicely rendered in my setup. Can you check on yours?


TableFunctions do also accept multiple values. Add a link to the UDF docs page.

{% top %}

### GroupBy Windows
### Aggregations

Add a line for User defined aggregation function?

</table>

For SQL queries on streaming tables, the `time_attr` argument of the group window function must be one of the `rowtime()` or `proctime()` time-indicators, which distinguish between event or processing time, respectively. For SQL on batch tables, the `time_attr` argument of the group window function must be an attribute of type `TIMESTAMP`.
For SQL queries on streaming tables, the `time_attr` argument of the group window function must refer to the virtual column that specifies the processing time or the event time. For SQL on batch tables, the `time_attr` argument of the group window function must be an attribute of type `TIMESTAMP`.

Add a link to the time attribute section on the Table API Streaming doc page.
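To illustrate the `time_attr` argument, a tumbling-window query might look like this (a sketch; it assumes `tableEnv` exists and the `Orders` table declares an event-time attribute named `rowtime`):

```java
// One-hour tumbling event-time window; `rowtime` is the table's declared
// event-time attribute (a processing-time attribute would be used the same way).
Table hourly = tableEnv.sql(
    "SELECT user, TUMBLE_START(rowtime, INTERVAL '1' HOUR) AS wStart, " +
    "       SUM(amount) AS total " +
    "FROM Orders " +
    "GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), user");
```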


**TODO: Integrate this with the examples**

#### Batch

I would remove the Batch and Streaming limitations sections.
Instead, I would integrate the relevant information with the Operations sections, e.g., add a comments to UNNEST that WITH ORDINALITY is not supported, etc.


fhueske commented Jun 8, 2017

I've seen several questions on the mailing list and Stack Overflow related to breaking changes in the 1.3 release (mostly about the time indicator). That's why I would like to update the documentation ASAP.

I will integrate my comments and merge this PR.


fhueske commented Jun 8, 2017

Hi @haohui, I merged the PR to the tableDocs branch.
Can you close it? Thanks!
