Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener #27030

Closed
wants to merge 9 commits into from

Conversation

ayudovin
Copy link
Contributor

What changes were proposed in this pull request?

Add events to CREATE, DROP, RENAME and ALTER partitions in ExternalCatalogWithListener.

Why are the changes needed?

This changes needs for adding hooks for partitions in ExternalCatalogWithListener.

Does this PR introduce any user-facing change?

No

How was this patch tested?

This functionality is covered by unit tests.

@ayudovin ayudovin changed the title [SPARK-30244] - Emit pre/post events for "Partition" methods in ExternalCatalogWithListener [SPARK-30244][Resource-Manager][ Kubernetes] - Emit pre/post events for "Partition" methods in ExternalCatalogWithListener Dec 28, 2019
@ayudovin ayudovin changed the title [SPARK-30244][Resource-Manager][ Kubernetes] - Emit pre/post events for "Partition" methods in ExternalCatalogWithListener [SPARK-30244][Catalyst] - Emit pre/post events for "Partition" methods in ExternalCatalogWithListener Dec 28, 2019
@ayudovin ayudovin changed the title [SPARK-30244][Catalyst] - Emit pre/post events for "Partition" methods in ExternalCatalogWithListener [SPARK-30244][SQL][Catalyst] - Emit pre/post events for "Partition" methods in ExternalCatalogWithListener Dec 28, 2019
@ayudovin
Copy link
Contributor Author

ayudovin commented Jan 4, 2020

@hvanhovell, Could you please review this pull request?

@ayudovin
Copy link
Contributor Author

ayudovin commented Jan 9, 2020

cc @dongjoon-hyun

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-30244][SQL][Catalyst] - Emit pre/post events for "Partition" methods in ExternalCatalogWithListener [SPARK-30244][SQL] Emit pre/post events for "Partition" methods in ExternalCatalogWithListener Jan 9, 2020
@dongjoon-hyun
Copy link
Member

ok to test

* Event fired before a partition is created.
*/
case class CreatePartitionPreEvent(database: String, name: String,
parts: Seq[CatalogTablePartition]) extends PartitionEvent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation? You had better follow the existing style used in this file.

@@ -202,3 +203,62 @@ case class RenameFunctionEvent(
name: String,
newName: String)
extends FunctionEvent

trait PartitionEvent extends DatabaseEvent {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Partition belongs to Table, TableEvent is better than DatabaseEvent?

/**
* Name of the table that was touched.
*/
val name: String
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parts: Seq[CatalogTablePartition] instead of val name: String? If we extends TableEvent, Table name is already inherited in TableEvent.

Copy link
Contributor Author

@ayudovin ayudovin Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to use parts: Seq[TablePartitionSpec]?
CatalogTablePartition contains TablePartitionSpec inside.

  • Create and Alter partition have as parameter Seq[CatalogTablePartition]
  • Rename and Drop partition have as parameter Seq[TablePartitionSpec]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I was referring only a partial part of your code. In that case, we don't need anything here in PartitionEvent.

parts: Seq[CatalogTablePartition]) extends PartitionEvent
case class CreatePartitionPreEvent(
database: String,
name: String) extends PartitionEvent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur? What I meant was we don't need to have the partition information at PartitionEvent. We need partition information in this event, don't we?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the indentation, shall we put extends PartitionEvent into a new line?

Copy link
Contributor Author

@ayudovin ayudovin Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to create such a event:

 case class CreatePartitionPreEvent(
    database: String,
    name: String,
    parts: Seq[CatalogTablePartition]) extends PartitionEvent

but PartitionEvent will be empty ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we put extends PartitionEvent into a new line

yes, I'll put it into a new line.

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116410 has finished for PR 27030 at commit 5c242dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait PartitionEvent extends DatabaseEvent
  • case class CreatePartitionPreEvent(database: String, name: String,
  • case class CreatePartitionEvent(database: String, name: String,
  • case class DropPartitionPreEvent(database: String, name: String,
  • case class DropPartitionEvent(database: String, name: String,
  • case class RenamePartitionPreEvent(database: String, name: String,
  • case class RenamePartitionEvent(database: String, name: String,
  • case class AlterPartitionPreEvent(database: String, name: String,
  • case class AlterPartitionEvent(database: String, name: String,

@SparkQA
Copy link

SparkQA commented Jan 10, 2020

Test build #116418 has finished for PR 27030 at commit af04a62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CreatePartitionPreEvent(
  • case class CreatePartitionEvent(
  • case class DropPartitionPreEvent(
  • case class DropPartitionEvent(
  • case class RenamePartitionPreEvent(
  • case class RenamePartitionEvent(
  • case class AlterPartitionPreEvent(
  • case class AlterPartitionEvent(

@SparkQA
Copy link

SparkQA commented Jan 10, 2020

Test build #116419 has finished for PR 27030 at commit 0c5edc2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 10, 2020

Test build #116464 has finished for PR 27030 at commit 3cb79fe.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 10, 2020

Test build #116492 has finished for PR 27030 at commit 4d5ba9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val dbDefinition = createDbDefinition()

val storage = CatalogStorageFormat.empty.copy(
locationUri = Option(uri1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this table storage pointing db path? Maybe, do you want the following?

    val path1 = Files.createTempDirectory("db_")
    val path2 = Files.createTempDirectory(path1, "tbl_")
    val uri1 = preparePath(path1)
    val uri2 = preparePath(path2)

    // CREATE
    val dbDefinition = createDbDefinition(uri1)

    val storage = CatalogStorageFormat.empty.copy(
      locationUri = Option(uri2))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116570 has finished for PR 27030 at commit e1220e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

catalog.createTable(tableDefinition, ignoreIfExists = false)
checkEvents(CreateTablePreEvent("db5", "tbl1") :: CreateTableEvent("db5", "tbl1") :: Nil)

catalog.createPartitions("db5", "tbl1", Seq(partition), ignoreIfExists = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, is this a valid test case? What happens when we try to create CatalogTypes.emptyTablePartitionSpec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with you, CatalogTypes.emptyTablePartitionSpec is not good option for unit test. I have changed it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite this, I think that empty partition creation should also produce the event.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it happen in the real world? Can you give me an SQL example to create the empty partition?

Despite this, I think that empty partition creation should also produce the event.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I cannot find an example, that's why I also think it isn't happening in the real world.

val newPartition = CatalogTablePartition(spec = Map("key1" -> "1", "key2" -> "2"),
storage = CatalogStorageFormat.empty)

catalog.renamePartitions("db5", "tbl1", Seq(partition.spec), Seq(newPartition.spec))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also sounds like empty spec to key1=1 , key2=2. Is this valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it looks like key=0 to key1=1, key2=2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give me the real SQL statement for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I could not find the real SQL statement for that, that's why I changed this unit tests.

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116661 has finished for PR 27030 at commit 43aeb9b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Jan 15, 2020

Test build #116758 has finished for PR 27030 at commit 8da2c55.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Please don't forget the above comment.

Could you give me the real SQL statement for that?

@ayudovin ayudovin closed this Jan 17, 2020
@ayudovin ayudovin reopened this Jan 17, 2020
@ayudovin
Copy link
Contributor Author

ayudovin commented Jan 17, 2020

Please don't forget the above comment.

Could you give me the real SQL statement for that?

@dongjoon-hyun, sorry, I have added a comment

@ayudovin
Copy link
Contributor Author

Hi @dongjoon-hyun , Do you have still any suggestions or comments on this pull request?

@dongjoon-hyun
Copy link
Member

Sorry for being late, @ayudovin . I'll try to review again tonight.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117321 has finished for PR 27030 at commit 8da2c55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


catalog.createPartitions("db5", "tbl1", Seq(partition), ignoreIfExists = true)
checkEvents(CreatePartitionPreEvent("db5", "tbl1", Seq(partition)) ::
CreatePartitionEvent("db5", "tbl1", Seq(partition)) :: Nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, is this correct? We already created the same partition at line 240 and we tried with ignoreIfExists=true here. I guess CreatePartitionEvent should not occur.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, it looks incorrect, but we can not check if the partition has been created or not created in case of using ignoreIfExists=true

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If then, this is not helpful in that case because we cannot distinguish the difference.

RenamePartitionEvent("db5", "tbl1", Seq(partition.spec), Seq(newPartition.spec)) ::
Nil)

// ALTER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALTER nothing? In this case, I guess no event might be natural.
BTW, if there is no valid SQL query for this, I guess we can remove this test case.

intercept[AnalysisException] {
catalog.dropPartitions("db5", "tbl2", Seq(newPartition.spec),
ignoreIfNotExists = false, purge = true, retainData = false)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always a good practice to check the error message. Sometime, AnalysisException is changed unexpectedly.

@dongjoon-hyun
Copy link
Member

Also, I'm worrying the situation where this implementation causes a sabotage on the event message pipeline. Sometimes, the changed number of partition can be huge in the production.

Hi, @hvanhovell . How do you think about this PR? Could you give us some advice?

@SparkQA
Copy link

SparkQA commented Jan 25, 2020

Test build #117401 has finished for PR 27030 at commit 30686b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 26, 2020

@ayudovin . The current status is that we are waiting the original author of this framework, @hvanhovell . We need to check whether this is not added intentionally or not. Also, #27030 (comment) is another concern no this approach because it can be misleading.

@ayudovin
Copy link
Contributor Author

@ayudovin . The current status is that we are waiting the original author of this framework, @hvanhovell . We need to check whether this is not added intentionally or not. Also, #27030 (comment) is another concern no this approach because it can be misleading.

I have got, It makes sense

@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

WDYT @hvanhovell?

@SparkQA
Copy link

SparkQA commented Feb 25, 2020

Test build #118893 has finished for PR 27030 at commit 30686b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Hi, @gatorsmile and @hvanhovell .
How do you think about this extention?

@wankunde
Copy link
Contributor

wankunde commented May 2, 2020

@ayudovin @dongjoon-hyun @HyukjinKwon
Can you emit LoadPartitionEvent and LoadDynamicPartitionsEvent ?
Now the user does not know which partitions are loaded, if LoadDynamicPartitionsEvent events are emitted to the listener, the user will know the changed partitions and add a custom callback function.

  def calculateDynamicPartitions(
      loadPath: Path,
      partitionSpec: TablePartitionSpec,
      numDP: Int): Array[TablePartitionSpec] = {
    assert(numDP > 0)
    try {
      val fs = loadPath.getFileSystem(new Configuration())
      fs.listStatus(loadPath).flatMap { fileStatus =>
        val dirName = fileStatus.getPath.getName
        if (dirName.contains("=")) {
          val Array(partitionColumn, partitionValue) = dirName.split("=")
          val newPartitionSpec = partitionSpec + (partitionColumn -> partitionValue)
          if (numDP == 1) {
            Array[TablePartitionSpec](newPartitionSpec)
          } else {
            calculateDynamicPartitions(fileStatus.getPath, newPartitionSpec, numDP - 1)
          }
        } else {
          Array[TablePartitionSpec]()
        }
      }
    } catch {
      case ex: Exception => Array[TablePartitionSpec]()
    }
  }	 
}

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@obogobo
Copy link

obogobo commented Sep 9, 2020

would love a status update on this 😄 is it still possible for Spark 3.1.0? (current tag on the ticket)

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 19, 2020
@github-actions github-actions bot closed this Dec 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants