
Nessie: Extract Spark AppId / User from SparkContext and not from Snapshot#2664

Merged
rymurr merged 3 commits intoapache:masterfrom
nastra:extract-spark-appid-user
Jun 10, 2021

Conversation


@nastra nastra commented Jun 2, 2021

No description provided.

@nastra nastra force-pushed the extract-spark-appid-user branch from f69d25b to 656185c Compare June 2, 2021 14:16
@nastra nastra changed the title Extract spark appid user Nessie: Extract Spark AppId / User from SparkContext and not from Snapshot Jun 2, 2021
@nastra nastra force-pushed the extract-spark-appid-user branch from 656185c to 80f5692 Compare June 2, 2021 14:37
@nastra nastra force-pushed the extract-spark-appid-user branch from 80f5692 to 95fc018 Compare June 2, 2021 16:38
@nastra nastra mentioned this pull request Jun 2, 2021

kbendick commented Jun 2, 2021

This might wind up being related to #2607, so dropping a link for the reference back 🙂 . Thanks for your contributions @nastra!


kbendick commented Jun 2, 2021

I believe that only a subset of that config is being passed down. Only the `spark.sql.catalog.catalog_name.*` options get passed through.

Also if you have any thoughts on allowing passthrough of options to catalogs @rymurr, please feel free to comment on this issue: #2607

The issue is specifically about hive options but seems like in general the Nessie catalog might have related concerns based on the above quote. cc @nastra as well if you have input.
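To make the pass-through behaviour concrete, here is a minimal, self-contained Java sketch of the filtering described above (the class and method names are hypothetical; Spark does this inside its own catalog plumbing): only keys under the `spark.sql.catalog.<name>.` prefix reach the catalog's options, with the prefix stripped.

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogOptionFilter {
  // Mimics how only spark.sql.catalog.<name>.* keys from the Spark conf
  // end up in a catalog's options, with the prefix stripped.
  static Map<String, String> optionsFor(String catalogName, Map<String, String> sparkConf) {
    String prefix = "spark.sql.catalog." + catalogName + ".";
    Map<String, String> out = new HashMap<>();
    for (Map.Entry<String, String> e : sparkConf.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        out.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("spark.sql.catalog.nessie.uri", "http://localhost:19120/api/v1");
    conf.put("spark.app.id", "app-123"); // not prefixed, so it is dropped
    System.out.println(optionsFor("nessie", conf));
  }
}
```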


nastra commented Jun 7, 2021

@kbendick thanks for mentioning #2607. I've read through it and I don't have any better suggestions than what was already mentioned on #2607.

Re the similarity of #2607 and #2664: both are related, but different at the same time. #2607 is more about overriding Hadoop config settings, whereas #2664 is about extracting some info at some point and passing it down via the existing properties.


nastra commented Jun 7, 2021

@rdblue given that this approach came up in https://github.com/apache/iceberg/pull/1587/files#r520825564, would you also like to weigh in here?
We were just discussing with @rymurr whether there's a chance to make the proposed approach from #2664 in a way that is more widely applicable. So far I haven't been able to come up with a better idea unfortunately.

-    return new NessieTableOperations(NessieUtil.toKey(pti.tableIdentifier()), newReference, client, fileIO);
+    NessieTableOperations tableOperations =
+        new NessieTableOperations(NessieUtil.toKey(pti.tableIdentifier()), newReference, client, fileIO);
+    // TODO: is there a better way to pass the catalog options to the TableOperations than this?
Contributor

Add it to the constructor? The constructor is public, but I don't think there is a need for it to be public. You could create a package-private one that is called here instead. (Maybe @rymurr can also comment)

Contributor Author

For some reason I assumed that the signature of the constructor was fixed and shouldn't be changed. It's definitely better to pass this via the constructor.

Contributor

@rymurr, is there a reason why the constructor was public?

Contributor

I think it was public because it was created before CatalogUtil.loadCatalog. No real reason for it to be public now.

Contributor

ahhh excuse me, you meant the TableOps. That was just a mistake I suspect. Even less of a reason for that to be public.
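The suggestion in this thread, narrowing the table-operations constructor so the extra options parameter stays an internal detail, can be sketched like this (a simplified, hypothetical class, not the real `NessieTableOperations` signature):

```java
import java.util.Collections;
import java.util.Map;

// Simplified sketch: a table-operations class whose constructor is
// package-private, so only the catalog in the same package can build it.
class SketchTableOperations {
  private final Map<String, String> catalogOptions;

  // Package-private (no modifier): external callers must go through the
  // catalog, so adding a catalogOptions parameter is not an API change.
  SketchTableOperations(Map<String, String> catalogOptions) {
    this.catalogOptions = Collections.unmodifiableMap(catalogOptions);
  }

  Map<String, String> catalogOptions() {
    return catalogOptions;
  }
}
```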

Nessie: Extract Spark AppId / User from SparkContext and not from Snapshot

As mentioned in
https://github.com/apache/iceberg/pull/1587/files#r520825564 it is
probably better to extract the necessary Spark info directly from the
`SparkContext` and pass it down via the `CaseInsensitiveStringMap`.
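A rough, Spark-free sketch of that idea (the property key names and the helper are hypothetical; the actual change reads the values off the live `SparkContext` and merges them into the `CaseInsensitiveStringMap` handed to the catalog, which lower-cases its keys):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SparkInfoSketch {
  // Stand-in for reading SparkContext.applicationId() / sparkUser() and
  // folding them into the case-insensitive options map passed down.
  static Map<String, String> withSparkInfo(
      Map<String, String> options, String appId, String user) {
    Map<String, String> merged = new HashMap<>();
    // lower-case the keys, as CaseInsensitiveStringMap does internally
    for (Map.Entry<String, String> e : options.entrySet()) {
      merged.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
    }
    merged.put("app-id", appId); // hypothetical property names
    merged.put("user", user);
    return merged;
  }
}
```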
@nastra nastra force-pushed the extract-spark-appid-user branch from 95fc018 to f6bf1ac Compare June 10, 2021 15:24
@nastra nastra force-pushed the extract-spark-appid-user branch from f6bf1ac to 94cc6dd Compare June 10, 2021 15:31

@rdblue rdblue left a comment


Looks good to me, now. I'll leave open for a little while to give @rymurr a chance to reply and commit if he is okay with the changes.

@github-actions github-actions bot added the core label Jun 10, 2021

@rymurr rymurr left a comment


LGTM, thanks @nastra and @rdblue for the feedback!

@rymurr rymurr merged commit 774b3f0 into apache:master Jun 10, 2021
@nastra nastra deleted the extract-spark-appid-user branch June 11, 2021 06:16