Skip to content

API: Allow extra options in LocationProvider.#4760

Closed
jfz wants to merge 1 commit intoapache:masterfrom
jfz:location-provider-options
Closed

API: Allow extra options in LocationProvider.#4760
jfz wants to merge 1 commit intoapache:masterfrom
jfz:location-provider-options

Conversation

@jfz
Copy link
Contributor

@jfz jfz commented May 12, 2022

This PR adds 2 newDataLocation APIs with extra option argument compared to existing ones, so that implementation of these APIs can do further customization of data location logic based on the extra options.

@github-actions github-actions bot added the API label May 12, 2022
@jfz
Copy link
Contributor Author

jfz commented May 12, 2022

This API allows further customization of data location with extra options.
An example would be data writers in multiple regions can pass in option like {region:local-region-1} and get a location for the that particular region.

@kbendick
Copy link
Contributor

kbendick commented May 12, 2022

Looking at this original use case, I would love to see some kind of test class that implements this interface and then uses it somehow. #4751

From your comment and the old PR etc, I can understand what you're trying to get at / what you hope to achieve with this interface addition.

But over time, if this code isn't attache to anything in the repo, we'll probably get a lot of PRs trying to remove it. So at least some small test class that implements these functions and makes use of them (we have tests that mock out files to be written for example) would be beneficial.

@kbendick
Copy link
Contributor

Especially as this code is in api, which we will be adding binary / source semantic compatibility guarantees to coming up.

@jfz jfz force-pushed the location-provider-options branch from 108675b to 6f870ed Compare May 13, 2022 01:18
@jfz
Copy link
Contributor Author

jfz commented May 13, 2022

Fair point! Added a unit test.

@rdblue
Copy link
Contributor

rdblue commented May 13, 2022

@jfz, can you add a description that explains what you're adding in this PR, please?

* @param options options for deciding the location
* @return a fully-qualified location URI for a data file
*/
default String newDataLocation(String filename, Map<String, String> options) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are these options coming from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These options is provided when user wants to write data to data file directly and need a valid location, the values would depend on the specific implementation - the unit tests shows a simple example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still isn't clear to me how these new options are passed through an engine like Spark or Trino. I don't think that we can add this until we understand where this data is coming from.

Copy link
Contributor Author

@jfz jfz May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not particularly designed for well integrated engines, for the sample case in unit test, data writer may want to created data files in local region and add to table with iceberg api like AppendFiles.appendFile, with extra options, we can avoid users hard-coding locations by implementing a custom LocationProvider.
I expect user to get location like this: table.locationProvider().newDataLocation(filename, {region: current_region})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can merge this unless we have a clear idea of where the data is coming from. Otherwise, I think it makes more sense to implement your own location provider. There needs to be a use case around how the extra metadata gets to this new method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can workaround this if this is not generic enough for OSS version, closing this PR, thanks for looking at this.

@jfz jfz closed this Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants