[BEAM-2436] table is not registered in BeamSql.query by mingmxu · Pull Request #3342 · apache/beam

mingmxu · 2017-06-10T20:49:04Z

coveralls · 2017-06-10T22:09:02Z

Changes Unknown when pulling 0950a8d on XuMingmin:BEAM-2436 into ** on apache:DSL_SQL**.

xumingming · 2017-06-11T06:44:14Z

dsls/sql/src/main/java/org/apache/beam/dsls/sql/example/BeamSqlExample.java

```diff
-    PCollection<BeamSQLRow> outputStream = inputTable.apply(BeamSql.simpleQuery(sql));
+    //Case 1. run a simple SQL query over input PCollection with BeamSql.simpleQuery;
+    PCollection<BeamSQLRow> outputStream = inputTable.apply(
+        BeamSql.simpleQuery("select c2, c3 from TABLE_A where c1=1"));
```


The usage of BeamSql.simpleQuery seems a little weird: users never specify the table name TABLE_A anywhere.

simpleQuery is intended for single-table SQL. Because there is no way to name the PCollection here, it is automatically registered under the table name used in the query.
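To make this concrete, here is a minimal plain-Java sketch of the idea that a single-table query's FROM clause supplies the table name. It is an illustration only: the class `SimpleQueryTableName` and its regex are hypothetical, assume one unquoted table name, and are not the BeamSql implementation, which may derive the name differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT the BeamSql implementation. It only models the idea
// that simpleQuery takes the single table name from the FROM clause of the SQL.
final class SimpleQueryTableName {
  // Matches "FROM <identifier>" case-insensitively; assumes one unquoted table name.
  private static final Pattern FROM_TABLE =
      Pattern.compile("\\bfrom\\s+([A-Za-z_][A-Za-z0-9_]*)", Pattern.CASE_INSENSITIVE);

  private SimpleQueryTableName() {}

  /** Returns the identifier after the first FROM keyword, or throws if there is none. */
  static String extractTableName(String sql) {
    Matcher m = FROM_TABLE.matcher(sql);
    if (!m.find()) {
      throw new IllegalArgumentException("No FROM clause in: " + sql);
    }
    return m.group(1);
  }

  public static void main(String[] args) {
    // The query from the BeamSqlExample diff above.
    System.out.println(extractTableName("select c2, c3 from TABLE_A where c1=1")); // prints TABLE_A
  }
}
```

The name comes entirely from the SQL text, which is why TABLE_A never appears in the user's pipeline code.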

This introduces a potential table-name conflict. One solution may be to limit the scope of the table schema; I will open a new task to discuss it.
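A minimal sketch of the conflict, assuming a single process-wide map from table name to data. `GlobalTableRegistry` is a hypothetical stand-in for a global namespace and rows are modeled as strings; this is not Beam API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a process-wide table registry (not Beam API).
final class GlobalTableRegistry {
  // One static map shared by every query: table name -> rows (rows modeled as strings).
  private static final Map<String, List<String>> TABLES = new HashMap<>();

  private GlobalTableRegistry() {}

  /** Registers rows under a name, silently replacing any earlier registration. */
  static void register(String tableName, List<String> rows) {
    TABLES.put(tableName, new ArrayList<>(rows));
  }

  /** Returns the rows currently registered under the name. */
  static List<String> lookup(String tableName) {
    List<String> rows = TABLES.get(tableName);
    if (rows == null) {
      throw new IllegalStateException("Table not registered: " + tableName);
    }
    return rows;
  }

  /** Empties the shared registry. */
  static void clear() {
    TABLES.clear();
  }

  public static void main(String[] args) {
    // Two unrelated inputs both queried as TABLE_A, e.g. by two simpleQuery calls.
    register("TABLE_A", Arrays.asList("orders-1", "orders-2"));
    register("TABLE_A", Arrays.asList("clicks-1"));
    // The first input is no longer reachable by name.
    System.out.println(lookup("TABLE_A")); // prints [clicks-1]
  }
}
```

Because the map is static, whichever registration runs last wins, so the result depends on the order in which parts of the pipeline are built.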

Ok, let's talk further in the new task.

+1 to James' concerns.

It's extremely convenient not to have to name your tables in the simpleQuery approach.

@takidau regarding the interface design: the default method is `public static PTransform<PCollectionTuple, PCollection<BeamSqlRow>> query(String sqlQuery)`, which relies on the named TupleTag to specify the table name.

`public static PTransform<PCollection<BeamSqlRow>, PCollection<BeamSqlRow>> simpleQuery(String sqlQuery)` is a special case that runs on a single table/PCollection. There is no existing way to name it, so the table name in the query is taken as given.

With either method, table names could collide, which is why I said a follow-up task is needed to give each query its own schema namespace.

@lukecwik any comments?

@lukecwik do you mean pcollection.simpleQuery('SELECT C1, C2'), to replace pcollection.simpleQuery('SELECT C1, C2 FROM TABLE_NAME')?

@xumingmin Having the ability to use 'SELECT C1, C2 FROM TABLE_NAME' or 'SELECT C1, C2' doesn't matter to me. What matters to me is that they don't need to use a PCollectionTuple with a single PCollection and TupleTag pair.

I think globally registering tables whenever someone calls query or simpleQuery is not a good idea: it will affect how users construct the rest of their pipeline, depending on the order in which they apply its parts. I can see how it's useful in a CLI, where users aren't building a pipeline programmatically, but it should be limited to the paths the CLI code handles.

Agreed, a global schema namespace doesn't sound good in the DSL; I will address the issue in a new task.

xumingming · 2017-06-12T02:06:50Z

LGTM

mingmxu · 2017-06-12T21:48:34Z

Rebased to fix the conflict.

coveralls · 2017-06-12T23:00:24Z

Changes Unknown when pulling e859df4 on XuMingmin:BEAM-2436 into ** on apache:DSL_SQL**.

mingmxu · 2017-06-13T21:43:03Z

@lukecwik @takidau any pending requests here?
If not, could you help to merge it?

lukecwik · 2017-06-13T22:34:05Z

I don't think you want to register tables with BeamSqlEnv within expand(). BeamSqlEnv is currently the global namespace.

mingmxu · 2017-06-13T22:40:04Z

As discussed above, there should be one BeamSqlEnv per query to limit the scope of registered tables. I would prefer to do that in another task, as it affects both BeamSql and BeamSqlCli.
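A minimal sketch of per-query scoping, in plain Java rather than Beam code. `QueryScopedEnv` is a hypothetical stand-in for a per-query BeamSqlEnv and rows are modeled as strings; the actual follow-up work may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-query environment (illustrative name, not Beam's BeamSqlEnv).
// Each query builds its own instance, so its table names are invisible to other queries.
final class QueryScopedEnv {
  // Instance field rather than static: the namespace lives and dies with one query.
  private final Map<String, List<String>> tables = new HashMap<>();

  /** Registers a table for this query only; rejects duplicates inside the same query. */
  QueryScopedEnv register(String tableName, List<String> rows) {
    if (tables.putIfAbsent(tableName, new ArrayList<>(rows)) != null) {
      throw new IllegalArgumentException("Duplicate table in this query: " + tableName);
    }
    return this;
  }

  /** Resolves a table name within this query's namespace only. */
  List<String> lookup(String tableName) {
    List<String> rows = tables.get(tableName);
    if (rows == null) {
      throw new IllegalStateException("Table not registered in this query: " + tableName);
    }
    return rows;
  }

  public static void main(String[] args) {
    // Two independent queries can both call their input TABLE_A without clashing.
    QueryScopedEnv first = new QueryScopedEnv().register("TABLE_A", Arrays.asList("orders-1"));
    QueryScopedEnv second = new QueryScopedEnv().register("TABLE_A", Arrays.asList("clicks-1"));
    System.out.println(first.lookup("TABLE_A"));  // prints [orders-1]
    System.out.println(second.lookup("TABLE_A")); // prints [clicks-1]
  }
}
```

Rejecting a duplicate name inside one query is a choice made for this sketch: it turns a silent overwrite into an error at registration time, while separate queries stay isolated.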

lukecwik · 2017-06-14T00:05:03Z

SGTM
Merged

This closes #3342

mingmxu · 2017-06-14T00:17:08Z

Thanks @lukecwik, @xumingming

Created BEAM-2446 to track the next step.

xumingming reviewed Jun 11, 2017

xumingming approved these changes Jun 12, 2017

register table for both BeamSql.simpleQuery and BeamSql.query

e859df4

mingmxu force-pushed the BEAM-2436 branch from 0950a8d to e859df4 June 12, 2017 21:46

asfgit pushed a commit that referenced this pull request Jun 14, 2017

[BEAM-2436] table is not registered in BeamSql.query

1e080e2

This closes #3342

mingmxu closed this Jun 14, 2017

mingmxu deleted the BEAM-2436 branch June 14, 2017 00:17

5 participants