Support new parameter includeHistoricalMetadata for queryTableChange RPC #214

linzhou-db · 2022-11-14T21:45:24Z

A couple of changes:

Support a new parameter, includeHistoricalMetadata, for queryTableChanges.
Update the way SparkStructuredStreaming is put in the User-Agent header.
Add two more tests on the service side to verify that additional metadata is only returned for queryTableChanges from Spark streaming.
Update to the real id in DeltaSharingRestClientSuite.scala.

zsxwing · 2022-11-15T18:26:22Z

Any reason you need to change these ids? They should be deterministic?

linzhou-db · 2022-11-15T18:41:24Z

> Any reason you need to change these ids? They should be deterministic?

Not often. It's just that during streaming development I wanted to change the table properties of a table that's already used in tests, instead of creating another one. That broke the test, which made me wonder whether the id really "has to" be fixed, since we are mostly testing the e2e workflow and functionality.
I'm OK either way. If you prefer keeping the ids, I'll just fix that single test and revert the other changes in the PR.

chakankardb · 2022-11-15T23:59:09Z

I don't have a strong opinion... ok with what you and Ryan decide.

zhuansunxt · 2022-11-16T00:35:11Z

I agree with Ryan that ID should be deterministic. If we need a table with different config we should just create a new one for testing.

linzhou-db · 2022-11-16T01:35:59Z

I reverted the id changes and am now just fixing the failed tests.

linzhou-db · 2022-11-16T08:02:26Z

Hi all,
This change is now less about the "id" in tests and more about a streaming logic change. I'll look forward to approval from @chakankardb.

spark/src/test/scala/io/delta/sharing/spark/TestDeltaSharingClient.scala

spark/src/test/scala/io/delta/sharing/spark/RemoteDeltaLogSuite.scala

zsxwing · 2022-11-16T18:21:56Z

server/src/test/scala/io/delta/sharing/server/DeltaSharingServiceSuite.scala

    val connection = new URL(url).openConnection().asInstanceOf[HttpsURLConnection]
    connection.setRequestProperty("Authorization", s"Bearer ${TestResource.testAuthorizationToken}")
+    if (isStreamingQuery) {
+      connection.setRequestProperty("User-Agent", "SparkStructuredStreaming")


What is this for? Why does the server need to know about the purpose of the client?

linzhou-db · 2022-11-16

So that the server can behave differently depending on the client (i.e., return metadata for queryTableChanges) without breaking old clients.

linzhou-db · 2022-11-17

@chakankardb I had a discussion with Ryan: it's not a good idea to leverage User-Agent to change server behavior on an RPC request. I'll add a parameter such as 'returnMetadata' for queryTableChanges, and update both the Databricks and OSS server/client.

User-Agent can still be used for usage tracking and to gate DFF (as it will be removed eventually).

I'll update this PR and send it later.

chakankardb · 2022-11-17

@zsxwing Can you provide the reasoning here (for documentation)?

zsxwing · 2022-11-17

User-Agent is usually informational, and it's weird to use it as a parameter to request different results. In addition, SparkStructuredStreaming is a special system, but our network protocol should be designed for all systems.

linzhou-db · 2022-11-17

Another point Ryan mentioned is that for column mapping we may also need to pass metadata to the client to render the dataframe correctly, which is not a streaming-only use case.
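
To illustrate the reasoning in this thread, here is a minimal sketch of a client asking for the extra metadata through an explicit query parameter instead of through User-Agent. The object name QueryTableChangesUrl, the build helper, and the /changes path suffix are assumptions made for illustration, not code from this PR; includeHistoricalMetadata is the parameter name the PR title ends up with.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of the PR.
object QueryTableChangesUrl {
  // URLEncoder does form-style encoding (a space becomes '+'),
  // which is adequate for the simple names used in this sketch.
  private def enc(s: String): String =
    URLEncoder.encode(s, StandardCharsets.UTF_8.name())

  /** Builds a request target that carries the flag as a query parameter. */
  def build(
      share: String,
      schema: String,
      table: String,
      startingVersion: Long,
      includeHistoricalMetadata: Boolean): String = {
    // Assumed path layout for the queryTableChanges RPC.
    val path =
      s"/shares/${enc(share)}/schemas/${enc(schema)}/tables/${enc(table)}/changes"
    // The flag is only sent when requested, so requests from callers
    // that do not set it look exactly as they did before.
    val params = Seq("startingVersion" -> startingVersion.toString) ++
      (if (includeHistoricalMetadata) Seq("includeHistoricalMetadata" -> "true") else Nil)
    path + params.map { case (k, v) => s"$k=${enc(v)}" }.mkString("?", "&", "")
  }
}
```

Because the flag travels in the request itself, any client (streaming or not) can opt in, and User-Agent stays purely informational.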

linzhou-db · 2022-11-17T19:22:15Z

The PR is updated with what we discussed, please review again, thanks!

chakankardb · 2022-11-18T22:34:47Z

server/src/main/scala/io/delta/sharing/server/DeltaSharingService.scala

      @Param("share") share: String,
      @Param("schema") schema: String,
      @Param("table") table: String,
      @Param("startingVersion") @Nullable startingVersion: String,
      @Param("endingVersion") @Nullable endingVersion: String,
      @Param("startingTimestamp") @Nullable startingTimestamp: String,
-      @Param("endingTimestamp") @Nullable endingTimestamp: String): HttpResponse = processRequest {
+      @Param("endingTimestamp") @Nullable endingTimestamp: String,
+      @Param("returnMetadata") @Nullable returnMetadata: String): HttpResponse = processRequest {


Should we call this returnMetadataChanges, since queryTableChanges already returns the latest metadata?

linzhou-db · 2022-11-18

I don't know.
I'm not fully happy with returnMetadata either. But returnMetadataChanges sounds like the changes to the metadata, whereas what we are returning is just the metadata itself. They are the additional metadata seen in the deltaLog.
returnAdditionalMetadata? That's not even as good as returnMetadata...
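
Whatever the final name, the hunk above shows the flag arriving on the server as a nullable String. A minimal sketch of how such a value might be interpreted follows; the object and method names are hypothetical, and the real server may parse or validate the value differently (for example, by rejecting invalid input).

```scala
// Hypothetical helper, not part of the PR.
object HistoricalMetadataFlag {
  /**
   * Interprets a nullable query-parameter value as a boolean.
   * null (parameter absent), blank, or any value other than "true"
   * yields false, so requests that omit the parameter are unaffected.
   */
  def parse(raw: String): Boolean =
    Option(raw).exists(_.trim.equalsIgnoreCase("true"))
}
```

Defaulting to false is what keeps old clients working: they never send the parameter, so they never receive the additional metadata.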

chakankardb · 2022-11-18T22:36:45Z

server/src/test/scala/io/delta/sharing/server/DeltaSharingServiceSuite.scala

@@ -426,7 +442,7 @@ class DeltaSharingServiceSuite extends FunSuite with BeforeAndAfterAll {
  integrationTest("getTableVersion - get exceptions") {
    // timestamp can be any string here, it's resolved in DeltaSharedTableLoader
    assertHttpError(
-      url = requestPath("/shares/share2/schemas/default/tables/table2?startingTimestamp=abc"),
+      url = requestPath("/shares/share2/schemas/default/tables/table2/version?startingTimestamp=abc"),


Question: why is version needed?

linzhou-db · 2022-11-18

This leaves the endpoint without /version for getTable. This is also one of the tests that was missed and not updated in that PR.
Now getTable is not needed; I think it's a good idea to have the /version suffix for getTableVersion.

chakankardb · 2022-11-18T22:42:08Z

spark/src/main/scala/io/delta/sharing/spark/DeltaSharingClient.scala

  // to recognize the request for streaming, and take corresponding actions.
  private def getUserAgent(): String = {
-    DeltaSharingRestClient.USER_AGENT + (if (forStreaming) {
-      s" ${DeltaSharingRestClient.SPARK_STRUCTURED_STREAMING}/$STREAMING_VERSION"
+    val sparkAgent = if (forStreaming) {


Why is this still needed? Is it to track the streaming requests on the provider side?

linzhou-db · 2022-11-18

Yes: (1) tracking in the usage log, and (2) gating traffic during private preview.
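
For reference, a minimal sketch of the idea behind getUserAgent: append a streaming marker to the base agent string so the provider can log or gate streaming traffic. It follows the shape of the removed lines in the hunk above; the constant values here are placeholders, not the real ones from DeltaSharingRestClient, and the updated implementation in this PR may be structured differently.

```scala
// Hypothetical stand-in for the client's user-agent logic.
object UserAgentSketch {
  // Placeholder values, assumed for illustration only.
  val BaseUserAgent = "Delta-Sharing-Spark/x.y.z"
  val SparkStructuredStreaming = "SparkStructuredStreaming"
  val StreamingVersion = "1"

  /** Returns the base agent, plus a streaming token when forStreaming is set. */
  def userAgent(forStreaming: Boolean): String = {
    val streamingSuffix =
      if (forStreaming) s" $SparkStructuredStreaming/$StreamingVersion" else ""
    BaseUserAgent + streamingSuffix
  }
}
```

Used this way, the header only identifies the caller; it does not change which results the server returns.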

Fix tests

2fa8bc6

linzhou-db requested review from zsxwing, chakankardb and zhuansunxt November 14, 2022 21:45

merge with main

400c0bb

linzhou-db added 3 commits November 15, 2022 16:54

fix tests

99f8367

fix tests

bd3d16a

revert id changes

61301a2

linzhou-db changed the title ~~Update tests: do not checking id of returned object.~~ Fix failed integration test Nov 16, 2022

linzhou-db added 3 commits November 15, 2022 22:40

revert unnecessary changes

f3ef692

fix lint

a47135a

fix tests

3140eb6

linzhou-db changed the title ~~Fix failed integration test~~ Client changes to handle returned timestamp for queryTable with version Nov 16, 2022

linzhou-db removed request for zsxwing and zhuansunxt November 16, 2022 08:02

linzhou-db self-assigned this Nov 16, 2022

chakankardb reviewed Nov 16, 2022

spark/src/test/scala/io/delta/sharing/spark/TestDeltaSharingClient.scala (outdated)

resolve comments

e854293

linzhou-db requested a review from chakankardb November 16, 2022 17:59

chakankardb approved these changes Nov 16, 2022

spark/src/test/scala/io/delta/sharing/spark/RemoteDeltaLogSuite.scala

zsxwing reviewed Nov 16, 2022

add more comments

10a0d15

linzhou-db requested a review from zsxwing November 16, 2022 19:20

Merge branch 'main' of https://github.com/delta-io/delta-sharing into…

b5730be

… SC-115797

change to returnMetadata

9918f62

linzhou-db changed the title ~~Client changes to handle returned timestamp for queryTable with version~~ Support new parameter returnMetadata for queryTableChange RPC Nov 17, 2022

fix tests

77ea3a8

linzhou-db requested a review from chakankardb November 17, 2022 19:21

chakankardb reviewed Nov 18, 2022

update to includeHistoricalMetadata

0375f65

linzhou-db changed the title ~~Support new parameter returnMetadata for queryTableChange RPC~~ Support new parameter includeHistoricalMetadata for queryTableChange RPC Nov 18, 2022

linzhou-db requested a review from chakankardb November 18, 2022 23:56

fix lint

44995ad

chakankardb approved these changes Nov 19, 2022

linzhou-db merged commit c477d2d into main Nov 19, 2022

linzhou-db deleted the SC-115797 branch February 16, 2023 17:46
