
Nessie: Extract Spark AppId / User from SparkContext and not from Snapshot#2664

Merged
rymurr merged 3 commits intoapache:masterfrom
nastra:extract-spark-appid-user
Jun 10, 2021

Conversation


@nastra nastra commented Jun 2, 2021

No description provided.

@nastra nastra force-pushed the extract-spark-appid-user branch from f69d25b to 656185c Compare June 2, 2021 14:16
@nastra nastra changed the title Extract spark appid user Nessie: Extract Spark AppId / User from SparkContext and not from Snapshot Jun 2, 2021
@nastra nastra force-pushed the extract-spark-appid-user branch from 656185c to 80f5692 Compare June 2, 2021 14:37
@nastra nastra force-pushed the extract-spark-appid-user branch from 80f5692 to 95fc018 Compare June 2, 2021 16:38
@nastra nastra mentioned this pull request Jun 2, 2021

kbendick commented Jun 2, 2021

This might wind up being related to #2607, so dropping a link for the reference back 🙂 . Thanks for your contributions @nastra!


kbendick commented Jun 2, 2021

I believe that only a subset of that config is being passed down. Only the `spark.sql.catalog.catalog_name.*` options get passed through.

Also if you have any thoughts on allowing passthrough of options to catalogs @rymurr, please feel free to comment on this issue: #2607

The issue is specifically about hive options but seems like in general the Nessie catalog might have related concerns based on the above quote. cc @nastra as well if you have input.
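To make the pass-through behaviour concrete, here is a minimal, self-contained Java sketch of the filtering described above (the class and method names are hypothetical; Spark does this inside its own catalog plumbing): only keys under the `spark.sql.catalog.<name>.` prefix reach the catalog's options, with the prefix stripped.

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogOptionFilter {
  // Mimics how only spark.sql.catalog.<name>.* keys from the Spark conf
  // end up in a catalog's options, with the prefix stripped.
  static Map<String, String> optionsFor(String catalogName, Map<String, String> sparkConf) {
    String prefix = "spark.sql.catalog." + catalogName + ".";
    Map<String, String> out = new HashMap<>();
    for (Map.Entry<String, String> e : sparkConf.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        out.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("spark.sql.catalog.nessie.uri", "http://localhost:19120/api/v1");
    conf.put("spark.app.id", "app-123"); // not prefixed, so it is dropped
    System.out.println(optionsFor("nessie", conf));
  }
}
```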


nastra commented Jun 7, 2021

@kbendick thanks for mentioning #2607. I've read through it and I don't have any better suggestions than what was already mentioned on #2607.

Re the similarity of #2607 and #2664: both are related, but different at the same time. #2607 is more about overriding Hadoop config settings, whereas #2664 is about extracting some info at some point and passing it down via the existing properties.


nastra commented Jun 7, 2021

@rdblue given that this approach came up in https://github.com/apache/iceberg/pull/1587/files#r520825564, would you also like to weigh in here?
We were just discussing with @rymurr whether there's a chance to make the proposed approach from #2664 in a way that is more widely applicable. So far I haven't been able to come up with a better idea unfortunately.

-    return new NessieTableOperations(NessieUtil.toKey(pti.tableIdentifier()), newReference, client, fileIO);
+    NessieTableOperations tableOperations =
+        new NessieTableOperations(NessieUtil.toKey(pti.tableIdentifier()), newReference, client, fileIO);
+    // TODO: is there a better way to pass the catalog options to the TableOperations than this?
Contributor

Add it to the constructor? The constructor is public, but I don't think there is a need for it to be public. You could create a package-private one that is called here instead. (Maybe @rymurr can also comment)

Contributor Author

For some reason I assumed that the signature of the constructor was fixed and shouldn't be changed. It's definitely better to pass this via the constructor.

Contributor

@rymurr, is there a reason why the constructor was public?

Contributor

I think it was public because it was created before CatalogUtil.loadCatalog. No real reason for it to be public now.

Contributor

ahhh excuse me, you meant the TableOps. That was just a mistake I suspect. Even less of a reason for that to be public.
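The suggestion in this thread, narrowing the table-operations constructor so the extra options parameter stays an internal detail, can be sketched like this (a simplified, hypothetical class, not the real `NessieTableOperations` signature):

```java
import java.util.Collections;
import java.util.Map;

// Simplified sketch: a table-operations class whose constructor is
// package-private, so only the catalog in the same package can build it.
class SketchTableOperations {
  private final Map<String, String> catalogOptions;

  // Package-private (no modifier): external callers must go through the
  // catalog, so adding a catalogOptions parameter is not an API change.
  SketchTableOperations(Map<String, String> catalogOptions) {
    this.catalogOptions = Collections.unmodifiableMap(catalogOptions);
  }

  Map<String, String> catalogOptions() {
    return catalogOptions;
  }
}
```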

Nessie: Extract Spark AppId / User from SparkContext and not from Snapshot

As mentioned in
https://github.com/apache/iceberg/pull/1587/files#r520825564 it is
probably better to extract the necessary Spark info directly from the
`SparkContext` and pass it down via the `CaseInsensitiveStringMap`.
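A rough, Spark-free sketch of that idea (the property key names and the helper are hypothetical; the actual change reads the values off the live `SparkContext` and merges them into the `CaseInsensitiveStringMap` handed to the catalog, which lower-cases its keys):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SparkInfoSketch {
  // Stand-in for reading SparkContext.applicationId() / sparkUser() and
  // folding them into the case-insensitive options map passed down.
  static Map<String, String> withSparkInfo(
      Map<String, String> options, String appId, String user) {
    Map<String, String> merged = new HashMap<>();
    // lower-case the keys, as CaseInsensitiveStringMap does internally
    for (Map.Entry<String, String> e : options.entrySet()) {
      merged.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
    }
    merged.put("app-id", appId); // hypothetical property names
    merged.put("user", user);
    return merged;
  }
}
```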
@nastra nastra force-pushed the extract-spark-appid-user branch from 95fc018 to f6bf1ac Compare June 10, 2021 15:24
@nastra nastra force-pushed the extract-spark-appid-user branch from f6bf1ac to 94cc6dd Compare June 10, 2021 15:31

@rdblue rdblue left a comment


Looks good to me, now. I'll leave open for a little while to give @rymurr a chance to reply and commit if he is okay with the changes.

@github-actions github-actions bot added the core label Jun 10, 2021

@rymurr rymurr left a comment


LGTM, thanks @nastra and @rdblue for the feedback!

@rymurr rymurr merged commit 774b3f0 into apache:master Jun 10, 2021
@nastra nastra deleted the extract-spark-appid-user branch June 11, 2021 06:16