Conversation

@brkyvz
Contributor

@brkyvz brkyvz commented Aug 19, 2019

What changes were proposed in this pull request?

Adds support for the V2SessionCatalog for ALTER TABLE statements.
Implementation changes are ~50 loc. The rest is just test refactoring.

Why are the changes needed?

To allow V2 DataSources to plug in through a configurable plugin interface without requiring the explicit use of catalog identifiers, and leverage ALTER TABLE statements.

How was this patch tested?

By re-using existing tests in DataSourceV2SQLSuite.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-28668][WIP] Support V2SessionCatalog for ALTER TABLE" to "[SPARK-28668][SQL][WIP] Support V2SessionCatalog for ALTER TABLE" Aug 19, 2019
@SparkQA

SparkQA commented Aug 19, 2019

Test build #109362 has finished for PR 25502 at commit fd680e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 25, 2019

Test build #109683 has finished for PR 25502 at commit 5b57a4c.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 27, 2019

Test build #109823 has finished for PR 25502 at commit a925bb0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 27, 2019

Test build #109824 has finished for PR 25502 at commit d034703.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait AlterTableTests extends SharedSparkSession

}
}

test("AlterTable: table does not exist") {
Contributor Author

All tests here have been moved to AlterTableTests
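The pattern behind this move can be sketched roughly as follows. This is a minimal illustration with no Spark or ScalaTest dependency: `AlterTableChecks`, `catalogAndNamespace`, `sql`, the two suite objects, and the table and catalog names are hypothetical stand-ins, not the actual members of `AlterTableTests`. The idea is that the shared trait builds each statement from a prefix supplied by the concrete suite, so the same checks run with and without an explicit catalog identifier.

```scala
// Sketch only: plain methods stand in for ScalaTest test(...) cases,
// and sql(...) just returns the statement it would run.
object SharedTestsSketch {
  trait AlterTableChecks {
    // Hypothetical hook: prefix placed before the table name by each suite.
    protected def catalogAndNamespace: String

    // Stand-in for executing SQL against a session.
    protected def sql(text: String): String = text

    // One shared "test": the statement differs only by the prefix.
    def addColumnStatement(): String =
      sql(s"ALTER TABLE ${catalogAndNamespace}table_name ADD COLUMN data string")
  }

  // Suite that addresses the table through an explicit catalog and namespace.
  object CatalogSuite extends AlterTableChecks {
    protected val catalogAndNamespace = "testcat.ns1.ns2."
  }

  // Suite that relies on the session catalog, so no prefix is used.
  object SessionCatalogSuite extends AlterTableChecks {
    protected val catalogAndNamespace = ""
  }
}
```

Each concrete suite only supplies the prefix, which keeps the test bodies identical across both code paths.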

@SparkQA

SparkQA commented Aug 28, 2019

Test build #109831 has finished for PR 25502 at commit 0cdcab6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait AlterTableStatement extends ParsedStatement

@brkyvz
Contributor Author

brkyvz commented Aug 28, 2019

cc @cloud-fan @rdblue

}

assert(exc.getMessage.contains("Unsupported table change"))
assert(exc.getMessage.contains("Cannot drop all fields")) // from the implementation
Contributor

can we also test dropping all the fields of a struct-type column?
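A rough sketch of what such a test could exercise, using a toy schema model rather than Spark's `StructType` or `TableChange` API. `Field` and `dropField` are hypothetical, and it is an assumption here that the nested case would be rejected with the same "Cannot drop all fields" message as the top-level case.

```scala
// Toy schema model; not Spark's StructType or TableChange API.
object DropFieldSketch {
  final case class Field(name: String, children: Seq[Field] = Nil)

  // Removes the field at `path`, rejecting a change that would leave
  // the enclosing struct (or the top-level schema) with no fields.
  def dropField(fields: Seq[Field], path: Seq[String]): Seq[Field] = path match {
    case Seq(name) =>
      require(fields.exists(_.name == name), s"Cannot find field: $name")
      val remaining = fields.filterNot(_.name == name)
      if (remaining.isEmpty) {
        throw new IllegalArgumentException("Cannot drop all fields")
      }
      remaining
    case parent +: rest =>
      require(fields.exists(_.name == parent), s"Cannot find field: $parent")
      // Recurse into the struct column named `parent`, leaving siblings untouched.
      fields.map {
        case f if f.name == parent => f.copy(children = dropField(f.children, rest))
        case f => f
      }
    case _ =>
      throw new IllegalArgumentException("Empty field path")
  }
}
```

With a schema of `id` and a struct `point(x, y)`, dropping `point.x` succeeds, while a following drop of `point.y` would empty the struct and is rejected.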

}
}

test("AlterTable: remove table property") {
Contributor

shall we test updating table property?

Contributor

I just realized that these are existing tests. Feel free to address these comments in a follow-up.

@cloud-fan
Contributor

LGTM. @brkyvz is there anything still WIP?

@brkyvz brkyvz changed the title from "[SPARK-28668][SQL][WIP] Support V2SessionCatalog for ALTER TABLE" to "[SPARK-28668][SQL] Support V2SessionCatalog for ALTER TABLE" Aug 29, 2019

private def resolveV2Alter(
    tableName: Seq[String],
    changes: Seq[TableChange]): Option[AlterTable] = {
Contributor

@rdblue rdblue Aug 29, 2019

I understand the motivation to create the lookupV2Relation method, but I don't quite like the effects that it has on these rules because it mixes identifier resolution (which catalog is responsible?) with table resolution (look up the table in a catalog). This leads to some strange changes, like the addition of AlterTableStatement to checkAnalysis that throws "Table or view not found" when the real problem is that the statement wasn't converted to a v1 or v2 plan.

Before this change, these rules did only identifier resolution. Table resolution was done by the ResolveTables rule, so it was fairly well contained. Plans were created with a placeholder UnresolvedRelation, which avoided the need to catch AlterTableStatement in checkAnalysis.

But, to add the fallback to v1 for tables that are loaded by the v2 session catalog, we have to look up the table and only convert to the v2 plan if it isn't an UnresolvedTable.

I'm not sure what the right solution is. Maybe instead of avoiding conversion to the v2 plan, we should convert from v2 to v1 for the fallback case. I think that would make the rules cleaner and more orthogonal to one another.
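To make the suggestion concrete, here is a minimal sketch of the "always convert to v2, then fall back to v1" idea, using a toy plan model. None of these types are Spark's real classes: `AlterStatement`, `V2Alter`, `V1Alter`, `V1FallbackTable`, the `"session"` catalog name, and the three functions are all hypothetical, and the real rules may need to look quite different. The point is only that each step does one job.

```scala
// Toy model only: these are not Spark's plan or catalog classes.
object ResolutionSketch {
  sealed trait Plan
  // Parsed statement that is not yet tied to any catalog.
  final case class AlterStatement(name: Seq[String], changes: Seq[String]) extends Plan
  // v2 plan; `table` stays None until table resolution has run.
  final case class V2Alter(
      catalog: String,
      ident: Seq[String],
      changes: Seq[String],
      table: Option[Table] = None) extends Plan
  final case class V1Alter(ident: Seq[String], changes: Seq[String]) extends Plan

  sealed trait Table
  case object V2Table extends Table
  // Hypothetical marker: the session catalog could only offer a v1 table.
  case object V1FallbackTable extends Table

  val SessionCatalog = "session"

  // Step 1: identifier resolution only. Picks the catalog, never loads a table.
  def resolveIdentifier(catalogs: Set[String])(plan: Plan): Plan = plan match {
    case AlterStatement(head +: rest, changes) if rest.nonEmpty && catalogs.contains(head) =>
      V2Alter(head, rest, changes)
    case AlterStatement(name, changes) =>
      V2Alter(SessionCatalog, name, changes)
    case other => other
  }

  // Step 2: table resolution only. Fills in the table for unresolved v2 plans.
  def resolveTable(load: (String, Seq[String]) => Option[Table])(plan: Plan): Plan = plan match {
    case a @ V2Alter(catalog, ident, _, None) => a.copy(table = load(catalog, ident))
    case other => other
  }

  // Step 3: fallback. Converts v2 to v1 only for session-catalog v1 tables.
  def fallbackToV1(plan: Plan): Plan = plan match {
    case V2Alter(SessionCatalog, ident, changes, Some(V1FallbackTable)) =>
      V1Alter(ident, changes)
    case other => other
  }
}
```

In this shape, a statement that is still unresolved after all three steps genuinely means the table was not found, so an analysis check would not need a special case for the parsed statement.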

Contributor

I don't think this is a reason to block this commit. I would just like to follow up with a reasonable solution if there isn't a quick fix that can go in this PR. The important thing is getting this PR in so this makes it into 3.0.

Contributor Author

> mixes identifier resolution (which catalog is responsible?) with table resolution (look up the table in a catalog)

That's fair, and it makes a lot of sense to keep these separate. I'm in favor of merging this as a first step (which will get the tests in place), and then cleaning up the logic, since the logic will look similar for INSERT INTO as well. Since this is already a 900 LOC PR, I'd like to do that in a follow-up.

Contributor

I also thought about it before. I think the ideal resolution process is:

  1. rules like ResolveAlterTable are only responsible for converting XYZStatement to v1 or v2 command
  2. ResolveTables and ResolveRelations are responsible for resolving UnresolvedRelation to v1 or v2 relations

However, some commands like ALTER TABLE also need to get the catalog instance, which can't be done by ResolveTables or ResolveRelations. Table resolution replaces UnresolvedRelation with a v1/v2 relation and can be done by a separate rule. Catalog resolution, by contrast, has to happen while converting XYZStatement to a v1/v2 command, so we can't do it in a separate rule.

I don't have a good idea now but we should definitely revisit it later.

@SparkQA

SparkQA commented Aug 30, 2019

Test build #109922 has finished for PR 25502 at commit 1f82198.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 8279693 Aug 30, 2019
5 participants