
[SPARK-20845][SQL] Support specification of column names in INSERT INTO command. #22532

Closed
wants to merge 1 commit

Conversation

misutoth
Contributor

What changes were proposed in this pull request?

One can specify a list of columns for an INSERT INTO command. The columns are listed in parentheses immediately after the table name, and the query's output columns are then matched positionally to that list.

scala> sql("CREATE TABLE t (s string, i int)")
scala> sql("INSERT INTO t values ('first', 1)")
scala> sql("INSERT INTO t (i, s) values (2, 'second')")
scala> sql("SELECT * FROM t").show
+------+---+
|     s|  i|
+------+---+
| first|  1|
|second|  2|
+------+---+



In the above example the second insertion uses the new functionality. The number and its associated string are given in reverse order, (2, 'second'), matching the column list (i, s) specified for the table. The result can be seen in the final SELECT output; intermediate output of the commands is omitted for brevity.

How was this patch tested?

InsertSuite (in both the sources and hive sub-packages) was extended with tests exercising column-name lists in INSERT INTO commands.

Also ran the above sample manually, and ran the tests in the sql module.
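
For reference, a minimal sketch of the kind of test added (the test name is illustrative; it assumes the checkAnswer and withTable helpers already available to InsertSuite):

test("INSERT INTO with a column list reorders query values to match") {
  withTable("t") {
    sql("CREATE TABLE t (s string, i int)")
    // Values are given in (i, s) order, so they must be reordered into (s, i).
    sql("INSERT INTO t (i, s) VALUES (1, 'first')")
    checkAnswer(sql("SELECT s, i FROM t"), Row("first", 1))
  }
}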

@misutoth
Contributor Author

@janewangfb, @gatorsmile could you please review this change?

@vanzin
Contributor

vanzin commented Oct 5, 2018

ok to test

@SparkQA

SparkQA commented Oct 5, 2018

Test build #97006 has finished for PR 22532 at commit 1dda672.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks for submitting the PR! I quickly scanned the code changes. It sounds like the general direction is right, but the quality is not ready yet.

I would suggest writing the test plan before doing the code review. Could you try your best to write down what we should test to support this feature? Both negative and positive cases.

@misutoth
Contributor Author

misutoth commented Oct 8, 2018

Many thanks for the feedback. I will list the test scenarios I had in mind and collected while implementing this item.

And sorry about the failure; it seems I did not rerun all the tests in my last step. For example, when the same field is specified multiple times it is not handled properly. I will fix that as well.
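
For example, a statement along these lines (illustrative, reusing the table t from the description) exercises that case:

scala> sql("INSERT INTO t (i, i) values (2, 3)")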

@weixiuli
Contributor

weixiuli commented Sep 2, 2019

Is there any progress? @misutoth


override def visitNamedExpressionSeq(ctx: NamedExpressionSeqContext): Seq[Expression] = {
  // Convert each parsed named expression in the sequence into a Catalyst Expression.
  ctx.namedExpression.asScala.map(visitNamedExpression)
}
Contributor

Why does this need to be overridden?

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

Closing this due to author's inactivity.

@igreenfield

igreenfield commented Dec 18, 2019

@misutoth Can we push this forward? Hive supports this syntax, so the Thrift server is not compatible without it.

@chrisknoll

Our project will not be able to use Apache Spark unless the standard INSERT-with-column-names syntax is supported, so I would also be interested in this change being applied.

@igreenfield

@weixiuli @misutoth I found a bug in this PR and also have the fix. Can we continue with it?
