[BEAM-2528] BeamSql: DDL: create table#3481

Closed
xumingming wants to merge 7 commits into apache:DSL_SQL from xumingming:BEAM-2528-create-table

Conversation

@xumingming
Contributor

I started this PR as an initial attempt to implement the CREATE TABLE statement. The implementation may not be mature yet, but I hope it can serve as a place to discuss CREATE TABLE in more depth. I will introduce the PR in three parts:

  • MetaStore
  • TableProvider
  • Grammar

MetaStore

MetaStore is responsible for the CRUD operations on tables during a session, e.g. creating a table, querying all tables, or querying a table by name. When a table is created, its meta info can be persisted by the MetaStore. The default InMemoryMetaStore keeps the meta info in memory only, so it is NOT persisted, but users can implement the MetaStore interface to provide a persistent implementation.

The table names in MetaStore need to be unique.
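
To make the discussion concrete, here is a minimal sketch of what such a contract could look like. It is not the code in this PR: the names TableMetaSketch, MetaStoreSketch and InMemoryMetaStoreSketch, the method names, and the case-sensitive name comparison are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical table metadata; the real class may carry more fields. */
final class TableMetaSketch {
  final String name;
  final String location;

  TableMetaSketch(String name, String location) {
    this.name = name;
    this.location = location;
  }
}

/** Hypothetical MetaStore contract covering the operations named above. */
interface MetaStoreSketch {
  void createTable(TableMetaSketch table);

  /** Returns the table with the given name, or null if there is none. */
  TableMetaSketch queryTable(String name);

  List<TableMetaSketch> queryAllTables();
}

/** Holds metadata in a map only, so nothing survives the end of the session. */
final class InMemoryMetaStoreSketch implements MetaStoreSketch {
  private final Map<String, TableMetaSketch> tables = new LinkedHashMap<>();

  @Override
  public void createTable(TableMetaSketch table) {
    // Enforce unique table names; case-sensitive comparison is an assumption.
    if (tables.containsKey(table.name)) {
      throw new IllegalArgumentException("Table already exists: " + table.name);
    }
    tables.put(table.name, table);
  }

  @Override
  public TableMetaSketch queryTable(String name) {
    return tables.get(name);
  }

  @Override
  public List<TableMetaSketch> queryAllTables() {
    // Return a copy so callers cannot modify the store's internal state.
    return new ArrayList<>(tables.values());
  }
}
```

A persistent variant would implement the same interface but write each entry to external storage in createTable.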

TableProvider

The tables in a MetaStore can come from many different sources, and constructing a usable table is the responsibility of a TableProvider. A TableProvider has an interface similar to MetaStore's, but it handles only one specific type of table, e.g. TextTableProvider handles only text tables, while KafkaTableProvider handles only Kafka tables.
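
As a rough illustration of "one provider per table type", the sketch below uses made-up names (TableProviderSketch, TextProviderSketch, KafkaProviderSketch) and lets a plain String stand in for the real table object. The actual interface in the PR may differ.

```java
/** Hypothetical provider contract: each implementation handles one table type. */
interface TableProviderSketch {
  /** The single table type this provider handles, e.g. "text". */
  String getTableType();

  /** Builds a usable table; a String description stands in for a real table object. */
  String buildTable(String name, String location);
}

/** Handles text tables only. */
final class TextProviderSketch implements TableProviderSketch {
  @Override
  public String getTableType() {
    return "text";
  }

  @Override
  public String buildTable(String name, String location) {
    return "text table " + name + " at " + location;
  }
}

/** Handles Kafka tables only. */
final class KafkaProviderSketch implements TableProviderSketch {
  @Override
  public String getTableType() {
    return "kafka";
  }

  @Override
  public String buildTable(String name, String location) {
    return "kafka table " + name + " at " + location;
  }
}
```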

Grammar

The grammar for create table is:

CREATE TABLE ORDERS(
   ID INT PRIMARY KEY COMMENT 'this is the primary key',
   NAME VARCHAR(127) COMMENT 'this is the name'
)
COMMENT 'this is the table orders'
LOCATION 'text://home/admin/orders'
TBLPROPERTIES '{"format": "Excel"}'

LOCATION dictates where the data of the table is stored. The scheme of the LOCATION dictates the table type: in the above example the table type is text, and from the table type we can find the corresponding TextTableProvider using the ServiceLoader mechanism.
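
The scheme-to-provider lookup could be sketched roughly as follows. TypedProviderSketch and ProviderLookupSketch are hypothetical names, and the sketch takes any Iterable of providers rather than hard-wiring ServiceLoader, so it runs without a META-INF/services registration.

```java
import java.net.URI;

/** Hypothetical minimal provider view: only the table type matters for the lookup. */
interface TypedProviderSketch {
  String getTableType();
}

final class ProviderLookupSketch {
  /** Extracts the table type from a LOCATION string, i.e. its URI scheme. */
  static String tableTypeOf(String location) {
    String scheme = URI.create(location).getScheme();
    if (scheme == null) {
      throw new IllegalArgumentException("LOCATION has no scheme: " + location);
    }
    return scheme;
  }

  /** Returns the first provider whose table type equals the LOCATION scheme. */
  static TypedProviderSketch find(
      String location, Iterable<? extends TypedProviderSketch> providers) {
    String type = tableTypeOf(location);
    for (TypedProviderSketch provider : providers) {
      if (type.equals(provider.getTableType())) {
        return provider;
      }
    }
    throw new IllegalArgumentException("No provider for table type: " + type);
  }
}
```

Since java.util.ServiceLoader implements Iterable, the result of ServiceLoader.load(TypedProviderSketch.class) could be passed as the providers argument once implementations are registered.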

TBLPROPERTIES specifies other properties of the table. In the above example it specifies the format of each line of the text file: Excel (one variant of the CSV format).

@xumingming
Contributor Author

R: @xumingmin @takidau

@xumingming
Contributor Author

About "how to tell whether a table is bounded or unbounded": my current thinking is to use the location scheme, e.g. hbase-bounded means a bounded HBase table, while hbase-unbounded means an unbounded HBase table. It should work, but it is not elegant, so I am looking for suggestions on this one.

@xumingming
Contributor Author

The entry point of the new feature is BeamSqlCli#execute.

@xumingming xumingming force-pushed the BEAM-2528-create-table branch from 9bbb08f to 1f9c423 Compare July 1, 2017 13:32

@mingmxu mingmxu left a comment

As there's a scheme header in the location, that could tell whether the table is BOUNDED or UNBOUNDED, I think. For example, text (see TextIO) is always BOUNDED, while Kafka (see KafkaIO) may be BOUNDED or UNBOUNDED depending on TBLPROPERTIES.
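
A sketch of that rule, under stated assumptions: TBLPROPERTIES is taken to be already parsed into a map, the property key "bounded" is made up for illustration, and treating Kafka as UNBOUNDED when the key is absent is likewise an assumption rather than anything decided in this thread.

```java
import java.net.URI;
import java.util.Map;

final class BoundednessSketch {
  enum Boundedness { BOUNDED, UNBOUNDED }

  /** Decides boundedness from the LOCATION scheme, consulting TBLPROPERTIES when needed. */
  static Boundedness resolve(String location, Map<String, String> tblProperties) {
    String scheme = URI.create(location).getScheme();
    if ("text".equals(scheme)) {
      // Text tables are always bounded.
      return Boundedness.BOUNDED;
    }
    if ("kafka".equals(scheme)) {
      // "bounded" is a hypothetical property key; the UNBOUNDED default is an assumption.
      return "true".equals(tblProperties.get("bounded"))
          ? Boundedness.BOUNDED
          : Boundedness.UNBOUNDED;
    }
    throw new IllegalArgumentException("Unknown table type in LOCATION: " + location);
  }
}
```

Compared with encoding boundedness in the scheme itself (hbase-bounded vs. hbase-unbounded), this keeps one scheme per table type and moves the choice into the properties.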

@mingmxu

mingmxu commented Jul 10, 2017

I would like to understand the definition of MetaStore.

In my mind, MetaStore is only an access layer over a physical metadata repository:

  1. At runtime, BeamSqlEnv.SchemaPlus holds the tables loaded from various sources. The load flow looks like:
    physical_metadata_store --> meta_store --> TableProvider --> BeamSqlEnv.SchemaPlus

  2. MetaStore is responsible for 1) converting a table metadata entry to a BeamSqlTable with a TableProvider, and 2) saving a BeamSqlTable as a metadata entry.

  3. BeamSqlCli should bind one configurable MetaStore by default, and be able to load tables from other MetaStores.

@xumingming
Contributor Author

Agree with @xumingmin that we should use LOCATION + TBLPROPERTIES (sometimes) to determine the table type (bounded vs. unbounded).

@xumingming
Contributor Author

  1. TableProvider's responsibilities:
    a) Generate a BeamSqlTable from a create table statement.
    b) Provide an API to query all the tables of its kind. E.g. for text tables this is always an empty list, since there is no persistent storage to hold the meta info, whereas for a HiveTableProvider all the tables would be provided by the Hive meta database.

  2. MetaStore aggregates the tables from all the table providers and exposes a facade API for BeamSqlEnv.SchemaPlus to use. So there will be only one MetaStore per BeamSqlCli: you can define your own MetaStore, but you can use only one at a time.

  3. So the load flow is:

physical_metadata_store --> TableProvider (there can be many providers) --> meta_store --> BeamSqlEnv.SchemaPlus
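
The aggregation step of this flow could be sketched as below. ListingProviderSketch and AggregatingMetaStoreSketch are hypothetical names, table names stand in for full table metadata, and rejecting duplicates with an exception is one possible policy, not necessarily what the PR does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical provider view: reports its type and the tables it already knows. */
interface ListingProviderSketch {
  String getTableType();

  /** Table names read from the provider's own metadata source; may be empty. */
  List<String> listTables();
}

/** Hypothetical facade that merges the tables of all registered providers. */
final class AggregatingMetaStoreSketch {
  // Maps each table name to the table type of the provider that supplied it.
  private final Map<String, String> tables = new LinkedHashMap<>();

  void registerProvider(ListingProviderSketch provider) {
    for (String table : provider.listTables()) {
      // Table names must be unique across all providers.
      if (tables.containsKey(table)) {
        throw new IllegalStateException("Duplicate table name: " + table);
      }
      tables.put(table, provider.getTableType());
    }
  }

  /** All table names, in registration order. */
  List<String> queryAllTables() {
    return new ArrayList<>(tables.keySet());
  }

  /** The table type that owns the given table, or null if the table is unknown. */
  String tableTypeOf(String name) {
    return tables.get(name);
  }
}
```

A text provider would contribute nothing at registration time, while a Hive-backed provider would contribute whatever the Hive meta database lists.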

@xumingming xumingming force-pushed the BEAM-2528-create-table branch 2 times, most recently from 01c37d0 to 23f41ab Compare July 19, 2017 03:43
@mingmxu

mingmxu commented Jul 26, 2017

retest this please

@mingmxu

mingmxu commented Jul 26, 2017

@xumingming can you rebase this PR?
Overall LGTM, let's use this as the initial step for DDL support.

@xumingming xumingming force-pushed the BEAM-2528-create-table branch from 601bc13 to 1b52f0a Compare July 27, 2017 06:39
@xumingming
Contributor Author

retest this please.

@coveralls

Coverage Status

Changes Unknown when pulling ae53639 on xumingming:BEAM-2528-create-table into apache:DSL_SQL.

@xumingming
Contributor Author

Rebased, and excluded the auto-generated parser classes from the findbugs and javadoc plugins.

@xumingming xumingming force-pushed the BEAM-2528-create-table branch from ae53639 to 184aa0f Compare August 2, 2017 07:41
@xumingming
Contributor Author

Rebased onto the latest DSL_SQL branch.

@coveralls

Coverage Status

Changes Unknown when pulling 3c73f52 on xumingming:BEAM-2528-create-table into apache:DSL_SQL.

@reuvenlax
Contributor

R: @takidau

@xumingming
Contributor Author

Closing this one, will open another PR.

@xumingming xumingming closed this Oct 26, 2017
4 participants