API: Allow extra options in LocationProvider. #4760
This API allows further customization of data location with extra options.
Looking at this original use case, I would love to see some kind of test class that implements this interface and then uses it somehow. #4751 From your comment and the old PR etc, I can understand what you're trying to get at / what you hope to achieve with this interface addition. But over time, if this code isn't attached to anything in the repo, we'll probably get a lot of PRs trying to remove it. So at least some small test class that implements these functions and makes use of them (we have tests that mock out files to be written, for example) would be beneficial.
Especially as this code is in |
108675b to 6f870ed
Fair point! Added a unit test.
@jfz, can you add a description that explains what you're adding in this PR, please?
   * @param options options for deciding the location
   * @return a fully-qualified location URI for a data file
   */
  default String newDataLocation(String filename, Map<String, String> options) {
Where are these options coming from?
These options are provided when a user wants to write data directly to a data file and needs a valid location. The values depend on the specific implementation; the unit test shows a simple example.
It still isn't clear to me how these new options are passed through an engine like Spark or Trino. I don't think that we can add this until we understand where this data is coming from.
It's not particularly designed for well-integrated engines. For the sample case in the unit test, a data writer may want to create data files in its local region and add them to the table with an Iceberg API like AppendFiles.appendFile. With extra options, we can avoid users hard-coding locations by implementing a custom LocationProvider.
I expect users to get a location like this: table.locationProvider().newDataLocation(filename, {region: current_region})
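To make the use case above concrete, here is a minimal sketch of a region-aware provider. It is not the code from this PR: `SketchLocationProvider` is a simplified stand-in for Iceberg's `LocationProvider`, `RegionAwareLocationProvider` and the bucket roots are hypothetical, and the `"region"` option key is just the one used in the comment above.

```java
import java.util.Map;

// Simplified stand-in for Iceberg's LocationProvider (assumption: only the
// filename-based method and the proposed options overload are modeled here).
interface SketchLocationProvider {
  String newDataLocation(String filename);

  // Assumed shape of the proposed overload: ignore options unless overridden.
  default String newDataLocation(String filename, Map<String, String> options) {
    return newDataLocation(filename);
  }
}

// Hypothetical custom provider that picks a storage root from a "region" option.
class RegionAwareLocationProvider implements SketchLocationProvider {
  private final String defaultRoot;
  private final Map<String, String> rootsByRegion;

  RegionAwareLocationProvider(String defaultRoot, Map<String, String> rootsByRegion) {
    this.defaultRoot = defaultRoot;
    this.rootsByRegion = Map.copyOf(rootsByRegion);
  }

  @Override
  public String newDataLocation(String filename) {
    return defaultRoot + "/data/" + filename;
  }

  @Override
  public String newDataLocation(String filename, Map<String, String> options) {
    // Fall back to the default root when no region is given or it is unknown.
    String region = options.get("region");
    String root = region == null ? defaultRoot : rootsByRegion.getOrDefault(region, defaultRoot);
    return root + "/data/" + filename;
  }
}
```

With something like this installed on the table, a writer would call `newDataLocation(filename, Map.of("region", currentRegion))`, write the file there, and then register it through `AppendFiles.appendFile`, instead of hard-coding a per-region path.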
I don't think we can merge this unless we have a clear idea of where the data is coming from. Otherwise, I think it makes more sense to implement your own location provider. There needs to be a use case around how the extra metadata gets to this new method.
We can work around this if it is not generic enough for the OSS version. Closing this PR; thanks for looking at this.
This PR adds 2 newDataLocation APIs with an extra options argument compared to the existing ones, so that implementations of these APIs can further customize the data location logic based on the extra options.
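As a rough illustration of why adding the overloads as default methods would keep existing providers working, here is a sketch. It assumes the default simply delegates to the existing method, which the diff fragment in this thread does not show. `OptionsAwareSketch` and `LegacyProviderSketch` are hypothetical names, and only the filename-based overload is modeled.

```java
import java.util.Map;

// Hypothetical stand-in interface; the real LocationProvider has more methods.
interface OptionsAwareSketch {
  String newDataLocation(String filename);

  // Assumption: the new overload falls back to the existing method by default,
  // so implementations written before the change need no modification.
  default String newDataLocation(String filename, Map<String, String> options) {
    return newDataLocation(filename);
  }
}

// A provider written against the old interface: it never mentions options.
class LegacyProviderSketch implements OptionsAwareSketch {
  private final String tableLocation;

  LegacyProviderSketch(String tableLocation) {
    this.tableLocation = tableLocation;
  }

  @Override
  public String newDataLocation(String filename) {
    return tableLocation + "/data/" + filename;
  }
}
```

Under that assumption, calling the options overload on a legacy provider returns the same location as the existing method, and only providers that override it see the extra options.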