
[SPARK-20845][SQL] Support specification of column names in INSERT INTO command. #22532

Closed
wants to merge 1 commit

Conversation

misutoth
Contributor

What changes were proposed in this pull request?

One can specify a list of columns for an INSERT INTO command. The columns are listed in parentheses immediately after the table name, and the query's output columns are then matched positionally to that list.

scala> sql("CREATE TABLE t (s string, i int)")
scala> sql("INSERT INTO t values ('first', 1)")
scala> sql("INSERT INTO t (i, s) values (2, 'second')")
scala> sql("SELECT * FROM t").show
+------+---+
|     s|  i|
+------+---+
| first|  1|
|second|  2|
+------+---+



In the above example the second insertion uses the new functionality. The number and its associated string are given in reverse order, (2, 'second'), matching the column list (i, s) specified for the table. The result can be seen in the final SELECT output; intermediate output of the commands is omitted for brevity.

How was this patch tested?

InsertSuite (in both the sources and hive sub-packages) was extended with tests exercising column-name lists in INSERT INTO commands.

Also ran the above sample manually, and ran the tests in the sql module.
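
For reference, a minimal sketch of the kind of test added (the test name is illustrative; it assumes the checkAnswer and withTable helpers already available to InsertSuite):

test("INSERT INTO with a column list reorders query values to match") {
  withTable("t") {
    sql("CREATE TABLE t (s string, i int)")
    // Values are given in (i, s) order, so they must be reordered into (s, i).
    sql("INSERT INTO t (i, s) VALUES (1, 'first')")
    checkAnswer(sql("SELECT s, i FROM t"), Row("first", 1))
  }
}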

@misutoth
Contributor Author

@janewangfb, @gatorsmile could you please review this change?

@vanzin
Contributor

vanzin commented Oct 5, 2018

ok to test

@SparkQA

SparkQA commented Oct 5, 2018

Test build #97006 has finished for PR 22532 at commit 1dda672.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks for submitting the PR! I quickly scanned the code changes. It sounds like the general direction is right, but the quality is not ready yet.

I would suggest writing the test plan before doing the code review. Could you try your best to write down what we should test to support this feature? Both negative and positive cases.

@misutoth
Contributor Author

misutoth commented Oct 8, 2018

Many thanks for the feedback. I will list the test scenarios I had in mind and collected while implementing this item.

And sorry about the failure; it seems I did not rerun all the tests in my last step. For example, when the same field is specified multiple times it is not handled properly. I will fix that as well.
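
For example, a statement along these lines (illustrative, reusing the table t from the description) exercises that case:

scala> sql("INSERT INTO t (i, i) values (2, 3)")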

@weixiuli
Contributor

weixiuli commented Sep 2, 2019

Is there any progress? @misutoth


override def visitNamedExpressionSeq(ctx: NamedExpressionSeqContext): Seq[Expression] = {
  // Convert each parsed named expression in the sequence into a Catalyst Expression.
  ctx.namedExpression.asScala.map(visitNamedExpression)
}
Contributor

Why does this need to be overridden?

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

Closing this due to author's inactivity.

@igreenfield

igreenfield commented Dec 18, 2019

@misutoth Can we push this forward? Hive supports this syntax, so the Thrift server is not compatible without it.

@chrisknoll

Our project will not be able to use Apache Spark unless the standard INSERT-with-column-names syntax is supported, so I would also be interested in this change being applied.

@igreenfield

@weixiuli @misutoth I found a bug in this PR and also have the fix. Can we continue with it?
