
[SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support partition filters in ALTER TABLE DROP PARTITION #20999

Closed
wants to merge 16 commits

Conversation


@mgaido91 mgaido91 commented Apr 6, 2018

What changes were proposed in this pull request?

Hive has long supported dropping partitions using any comparison operator on the partition values, while Spark so far supports dropping partitions only by exact values. For instance, Spark doesn't support:

ALTER TABLE mytable DROP PARTITION(mydate < '2018-04-06')

This PR adds support for this syntax as well.
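
For illustration, a couple of statements of the kind this PR enables (a sketch, not from the PR itself: mytable and mydate are made-up names, it assumes a SparkSession named spark, and the exact comparator set follows the list discussed in the review below):

spark.sql("ALTER TABLE mytable DROP PARTITION (mydate <= '2018-04-06')")
spark.sql("ALTER TABLE mytable DROP PARTITION (mydate > '2018-04-06')")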

The PR takes input from the effort in #19691 by @DazhuangSu. As such, this closes #19691.

How was this patch tested?

UTs to be added


SparkQA commented Apr 6, 2018

Test build #88996 has finished for PR 20999 at commit b57a5d1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


mgaido91 commented Apr 6, 2018

thanks @gatorsmile, I missed them. I see that #19691 is still open and waiting for review. Probably I should close this one and we can continue on that PR. But I have seen no activity on it for a while; is there any reason?

Thanks.

;

dropPartitionVal
: identifier (comparisonOperator constant)?
Member

Does it have to be in this format, partCol1 > 2? How about 2 > partCol1?

Contributor Author

yes, in Hive it has to be like this. 2 > partCol1 is not supported by Hive.

Member

Does Hive also throw ANTLR errors for the case 2 > partCol1?

Contributor Author

Hive does throw an error in that case. Do you mean to ask whether that error is a parsing exception or another kind of exception?

Member

yea, yes. I like user-understandable error messages.

Contributor Author

Hive throws this parser exception:

hive> alter table test1 drop partition(1 > c);
NoViableAltException(368@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:12014)
	at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionVal(HiveParser_IdentifiersParser.java:11684)
	at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionSpec(HiveParser_IdentifiersParser.java:11563)
	at org.apache.hadoop.hive.ql.parse.HiveParser.dropPartitionSpec(HiveParser.java:44851)
	at org.apache.hadoop.hive.ql.parse.HiveParser.alterStatementSuffixDropPartitions(HiveParser.java:11564)
	at org.apache.hadoop.hive.ql.parse.HiveParser.alterTableStatementSuffix(HiveParser.java:8000)
	at org.apache.hadoop.hive.ql.parse.HiveParser.alterStatement(HiveParser.java:7450)
	at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:4340)
	at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:2497)
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1423)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:209)
	at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:74)
	at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:67)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:615)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1829)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1776)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1771)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:832)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:770)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:694)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:33 cannot recognize input near '1' '>' 'c' in drop partition statement

so yes, it is analogous to this.

Member

thanks for the check. I still like meaningful messages though; we should wait for other reviewers' comments.

@gatorsmile
Member

Let us start reviewing that PR.


mgaido91 commented Apr 8, 2018

ok, thanks @gatorsmile. Then I am closing this.

@mgaido91 mgaido91 closed this Apr 8, 2018
@mgaido91 mgaido91 reopened this Sep 5, 2018

mgaido91 commented Sep 5, 2018

I am reopening this according to the discussion in #19691.

cc @maropu @DazhuangSu

@mgaido91 mgaido91 changed the title [WIP][SPARK-23866][SQL] Support partition filters in ALTER TABLE DROP PARTITION [WIP][SPARK-14922][SPARK-23866][SQL] Support partition filters in ALTER TABLE DROP PARTITION Sep 5, 2018

SparkQA commented Sep 5, 2018

Test build #95725 has finished for PR 20999 at commit 148f477.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


maropu commented Sep 6, 2018

still WIP? btw, it seems credits can now go to multiple developers:
#22324 (comment)

@mgaido91 mgaido91 changed the title [WIP][SPARK-14922][SPARK-23866][SQL] Support partition filters in ALTER TABLE DROP PARTITION [SPARK-14922][SPARK-23866][SQL] Support partition filters in ALTER TABLE DROP PARTITION Sep 6, 2018

mgaido91 commented Sep 6, 2018

it seems credits can now go to multiple developers

Yes, but I don't know how to do that. Probably committers can do it during the merge process, so I think the only thing I can do is state in the description who should be credited, and committers can act accordingly. Please correct me if I am wrong. Thanks.

@mgaido91 mgaido91 changed the title [SPARK-14922][SPARK-23866][SQL] Support partition filters in ALTER TABLE DROP PARTITION [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Support partition filters in ALTER TABLE DROP PARTITION Sep 6, 2018

SparkQA commented Sep 6, 2018

Test build #95758 has finished for PR 20999 at commit 7d3cf0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


maropu commented Sep 7, 2018

you're not wrong, but I just meant you don't need to describe "Credit for that should be given to him." in the PR description.


mgaido91 commented Sep 7, 2018

Thanks @maropu , updated.


test("SPARK-14922: Partition filter is not allowed in ADD PARTITION") {
  withTable("sales") {
    sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
    intercept[AnalysisException] {
Member

please check all the error messages above?

Contributor Author

ok, will do

@@ -861,7 +861,8 @@ class DDLParserSuite extends PlanTest with SharedSQLContext {
assertUnsupported(sql2_view)

val tableIdent = TableIdentifier("table_name", None)
val expected1_table = AlterTableDropPartitionCommand(

val expected1_table = AlterTableDropPartitionCommand.fromSpecs(
Member

Can you add test cases to check that the parser accepts the comparators added by this PR?

Contributor Author

sure, will do, thanks.

import org.apache.spark.sql.catalyst.catalog._
import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.catalyst.expressions._
Member

too many imports?

Contributor Author

not sure what you mean here. The list of imports would be very long, as I use EqualTo, And, Literal, Cast, BinaryComparison, etc. I can list them all, but I am not sure it is worth it. What do you think?

Member

I just wanted to check if your IDE wrongly folded this import, or not. It's ok.

ctx.dropPartitionVal().asScala.map { pFilter =>
  if (pFilter.identifier() == null || pFilter.constant() == null ||
      pFilter.comparisonOperator() == null) {
    throw new ParseException(s"Invalid partition spec: ${pFilter.getText}", ctx)
Member

Can you add tests for this exception in DDLParserSuite.scala?

Contributor Author

sure, will do ASAP, thanks.

ctx: DropPartitionSpecContext): Seq[Expression] = {
  withOrigin(ctx) {
    ctx.dropPartitionVal().asScala.map { pFilter =>
      if (pFilter.identifier() == null || pFilter.constant() == null ||
Member

no chance pFilter.identifier() == null?

Contributor Author

I am not sure. The other 2 conditions can definitely be true, but I am not sure about this. I think it is safer to check it.

// running the command. The type is not relevant, it is replaced during the real resolution
val partition =
  AttributeReference(pFilter.identifier().getText, StringType)()
val value = Literal(visitStringConstant(pFilter.constant()))
Member

val value = Literal(visitStringConstant(pFilter.constant()), StringType) for better readability?

Contributor Author

@mgaido91 mgaido91 Sep 8, 2018

thanks, will update

sorry, I'd need to do Literal.create(visitStringConstant(pFilter.constant()), StringType). Everywhere else here we are using Literal and not Literal.create, so I am not sure it makes sense to change it, but I don't have a strong opinion on this. If you prefer, I can switch to Literal.create. Thanks.
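
For context, a minimal sketch of the two constructors being discussed, using the org.apache.spark.sql.catalyst.expressions.Literal API (not code from this PR):

import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.StringType

// Literal(v) infers the DataType from the runtime value, so a String
// constant already yields a StringType literal:
val inferred = Literal("2018-04-06")

// Literal.create(v, dataType) takes the type explicitly and converts the
// value to Catalyst's internal representation:
val explicit = Literal.create("2018-04-06", StringType)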

Member

ok, thanks for the check. It's ok to keep the current one.

// We cannot use UnresolvedAttribute because resolution is performed after Analysis, when
// running the command. The type is not relevant, it is replaced during the real resolution
val partition =
  AttributeReference(pFilter.identifier().getText, StringType)()
Member

Looks weird; why can we use UnresolvedAttribute in #19691?
https://github.com/apache/spark/pull/19691/files#diff-9847f5cef7cf7fbc5830fbc6b779ee10R293
(Sorry, but probably I am missing something?)

Contributor Author

Well, the answer is in def expressions in QueryPlan. In #19691, we end up with a Seq[(TablePartitionSpec, Seq[Expression])], so the expressions there are not recognized/considered by the Analyzer. In this PR we have a Seq[Expression] (which is way cleaner IMHO and addresses comment https://github.com/apache/spark/pull/19691/files#r193002268), so these expressions are considered by the Analyzer.
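
For reference, a simplified paraphrase (not the exact Spark source) of how QueryPlan.expressions gathers expressions from a node's constructor arguments; expressionsOf is a made-up name:

import org.apache.spark.sql.catalyst.expressions.Expression

// Bare Expression fields and Expressions inside collections are collected
// (and therefore visited by the Analyzer); a tuple such as
// (TablePartitionSpec, Seq[Expression]) is a Product, not a collection,
// so the expressions inside it are skipped.
def expressionsOf(plan: Product): Seq[Expression] =
  plan.productIterator.flatMap {
    case e: Expression => Seq(e)
    case seq: Iterable[_] => seq.collect { case e: Expression => e }
    case _ => Nil
  }.toSeq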

Member

Let me have more time to check this behaviour.

Member

For example, how about this approach? (have you tried it already?) 52506f1

It adds an unresolved logical plan (an input relation and filters) to AlterTableDropPartitionCommand, then resolves the plan in AlterTableDropPartitionCommand.run, and computes partition specs based on the resolved expressions.

Contributor Author

Your approach has the same issue, i.e. it would fail because the LogicalPlan you added is not resolved after analysis.

Contributor Author

I may be missing something here, so sorry if I am misunderstanding, but I think the issue is that the analyzer is invoked anyway before AlterTableDropPartitionCommand.run, and it fails because of the unresolved attributes. Moreover, in your code I don't see the plan being part of either children or innerChildren.

I think the alternative here is to add a rule to the analyzer for this, but that seems overkill to me.

Member

yea, if you put Expression in the class fields of AlterTableDropPartitionCommand (https://github.com/apache/spark/pull/20999/files#diff-54979ed5797b4a6193cf663dc23baca5R524), it fails in Analyzer.executeAndCheck. But, if we put LogicalPlan in the class fields, IIUC it doesn't fail there.

Contributor Author

oh, now I see, sorry. What about having a Seq[Filter] instead then, in order to avoid splitDisjunctivePredicates?

Member

Ya, looks good to me. But I'm not sure which one is the right approach, so we'd better wait for other reviewers' comments here, too. cc: @gatorsmile @viirya

Contributor Author

sure, thanks @maropu.

* - Less than or Equal: '<='
* - Greater than: '>'
* - Greater than or Equal: '>='
*/
Member

Does Hive also support all the comparators above?

Contributor Author

yes, it does

Contributor Author

@mgaido91 mgaido91 left a comment

Thanks for your comments @maropu. I'll update the PR accordingly asap.


*/
case class PartitioningAttribute(name: String)
  extends Attribute with Unevaluable {
  override val exprId: ExprId = NamedExpression.newExprId
Contributor

even though it's a fake attribute, we should not change the exprId when this expression gets copied. Can we move exprId to the constructor?
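
A Spark-independent, REPL-style illustration of the point (BodyId, CtorId, and Ids are hypothetical names): an id assigned in the class body is re-minted by copy(), while an id held as a constructor parameter with a default is carried over:

import java.util.concurrent.atomic.AtomicLong

object Ids { private val counter = new AtomicLong(); def next(): Long = counter.incrementAndGet() }

// id lives in the body: copy() re-runs the body and mints a fresh id.
case class BodyId(name: String) { val id: Long = Ids.next() }
// id lives in the constructor: copy() reuses the existing value.
case class CtorId(name: String, id: Long = Ids.next())

val b = BodyId("p")
assert(b.copy(name = "q").id != b.id)
val c = CtorId("p")
assert(c.copy(name = "q").id == c.id)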

// Not really needed and used. We just need a dataType to be used during analysis for resolving
// the expressions. The String type is used because all the literals in PARTITION operations are
// parsed as strings and eventually casted later.
override def dataType: DataType = StringType
Contributor

If it's not needed, can we throw an exception here? We may need to override toString, though.

Contributor Author

Probably I should improve the comment then; it's misleading. This is actually needed, because otherwise we may hit exceptions: the dataType is checked when running checkInputDataTypes on the comparison operator containing it. I'll improve the comment.
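
A small REPL-style sketch of why the dataType matters (assumes Catalyst's expression API; mydate is a made-up column): the comparison's input type check runs during analysis and reads both operands' dataType:

import org.apache.spark.sql.catalyst.expressions.{AttributeReference, GreaterThan, Literal}
import org.apache.spark.sql.types.StringType

// Both sides report StringType, so the comparison's type check succeeds;
// an Attribute whose dataType threw an exception would fail right here,
// before the command ever runs.
val partCol = AttributeReference("mydate", StringType)()
val predicate = GreaterThan(partCol, Literal("2018-04-06"))
assert(predicate.checkInputDataTypes().isSuccess)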

* Create a partition specification map with filters.
*/
override def visitDropPartitionSpec(
    ctx: DropPartitionSpecContext): Seq[Expression] = {
Contributor

nit: can we move withOrigin(ctx) here? i.e.

def xxx(): T = withOrigin(ctx) {
  ...
}

partitionColumns,
table.identifier.quotedString,
sparkSession.sessionState.conf.resolver) :: Nil
}
}

catalog.dropPartitions(
Contributor

does Hive have an API to drop partitions with a predicate? I think the current approach is very inefficient with non-equality partition predicates.

Contributor

So the implementation here is similar to how Hive implements it?

Contributor Author

Yes, this is my understanding. You can check DDLTask.dropPartitions.
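
For reference, a hedged sketch of that client-side strategy against Spark's SessionCatalog (dropByPredicate is a made-up helper, not the PR's actual code): resolve the predicates, list the matching partitions, then drop them by their exact specs, analogous to Hive's DDLTask.dropPartitions:

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.SessionCatalog
import org.apache.spark.sql.catalyst.expressions.Expression

// List the partitions matching the predicates (pushed down to the
// metastore where possible), then drop them by their exact specs.
def dropByPredicate(
    catalog: SessionCatalog,
    table: TableIdentifier,
    predicates: Seq[Expression]): Unit = {
  val matching = catalog.listPartitionsByFilter(table, predicates)
  catalog.dropPartitions(
    table,
    matching.map(_.spec),
    ignoreIfNotExists = true,
    purge = false,
    retainData = false)
}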

@cloud-fan
Contributor

retest this please


SparkQA commented Oct 15, 2018

Test build #97389 has finished for PR 20999 at commit 2085088.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class PartitioningAttribute(


SparkQA commented Oct 15, 2018

Test build #97382 has finished for PR 20999 at commit 2085088.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class PartitioningAttribute(


SparkQA commented Oct 15, 2018

Test build #97398 has finished for PR 20999 at commit 146aa32.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor Author

any more comments @cloud-fan @maropu @viirya ?

@cloud-fan
Contributor

It seems like Hive can drop partitions directly with a partition expression: https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java#L1016

can you double check this part?

@mgaido91
Contributor Author

Yes, you're right @cloud-fan, I missed that. I am going to update this with the required change, but it is going to be a huge one. Thanks.

@mgaido91
Contributor Author

@cloud-fan I checked the feasibility of doing what you suggested, but unfortunately that method is very hard to use (and I am not sure how safe it is). The main issues are:

  • it doesn't accept a string for the filters, but their encoding as a byte array. The encoding is performed by Hive utilities, and relying on them might be very hard (in particular, it is hard to track any changes Hive makes there across versions);
  • a minor issue is that an integer is also required there, and I have not been able to understand what it represents and therefore how to use it.

So I wonder whether we can go ahead with this solution, even though it is suboptimal, and eventually create a new ticket for leveraging the Hive metastore API you mentioned in a separate PR, as that is not going to be trivial. What do you think?

@jinxing64

@mgaido91
This is a really helpful feature; thanks for working on this.
Is there any progress now?

@mgaido91
Contributor Author

thanks for your comment @jinxing64.

I am waiting for an answer to the question above. Currently this is a sub-optimal solution, but I think we can go ahead with it, as the Hive API is quite hard to use; I'd consider @cloud-fan's comment an optimization to be done in a follow-up PR.

If committers can provide their opinion on this and keep the review going, I am active on this PR, and it is my goal as well to get this into 3.0.

@sleep1661
Contributor

Is there any progress? @mgaido91 @cloud-fan

@mgaido91
Contributor Author

I am waiting for committers' opinion on the topic above...


SparkQA commented Jul 17, 2019

Test build #107795 has finished for PR 20999 at commit 146aa32.

  • This patch fails Java style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 20, 2019

Test build #107939 has finished for PR 20999 at commit 25533a0.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 20, 2019

Test build #107940 has finished for PR 20999 at commit 9676061.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@weixiuli
Contributor

It seems like Hive can drop partitions directly with a partition expression: https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java#L1016

can you double check this part?

Is there any progress? @mgaido91 @cloud-fan

@mgaido91
Contributor Author

@weixiuli see #20999 (comment)

@weixiuli
Contributor

@mgaido91 @cloud-fan Thanks for working on this. This PR has not been updated for a long time; I have some ideas to solve the above question in #26280, PTAL and kindly review.


github-actions bot commented Feb 6, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 6, 2020
@github-actions github-actions bot closed this Feb 7, 2020

override lazy val canonicalized: Expression = this.copy(exprId = ExprId(0))

override def withExprId(newExprId: ExprId): Attribute = throw new UnsupportedOperationException
Contributor

override def sql: String = name

This could make the error message clearer.

partitionColumns,
table.identifier.quotedString,
sparkSession.sessionState.conf.resolver) :: Nil
}
}

Contributor

We should check resolvedSpecs here and throw an error message if the total resolved specs are empty.

@AngersZhuuuu
Contributor

@viirya Any more suggestions on this function? It's really useful for warehouses with many partitions.
