File I/O Submodule for TableOperations #12
In Netflix/iceberg#107 it was discussed that `InputFile` and `OutputFile` instances should be pluggable, and that providing `InputFile` and `OutputFile` instances should be the responsibility of the `TableOperations` API. However, the Spark data source in particular only uses `HadoopInputFile#fromPath` for reading and `HadoopOutputFile#fromPath` for writing. Using `TableOperations#newInputFile` and `TableOperations#newOutputFile` would also be difficult, because calling these methods on the executors would require `TableOperations` instances to be `Serializable`.

We propose having the `TableOperations` API provide a `FileIO` module that handles the narrow role of reading, creating / writing, and deleting files. We propose the following:
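The issue's original code block did not survive extraction. A minimal self-contained sketch of what the discussion describes follows; the stub `InputFile`/`OutputFile` shapes and the `deleteFile` method name are assumptions, not the issue's exact code:

```java
import java.io.Serializable;

// Minimal stand-ins so this sketch compiles on its own; the real
// InputFile/OutputFile abstractions would also expose streams, lengths, etc.
interface InputFile {
  String location();
}

interface OutputFile {
  String location();
}

// The proposed FileIO module: a narrow handle for reading, creating /
// writing, and deleting files. It is Serializable so executors can use it
// directly, without requiring a whole TableOperations instance to be
// serialized.
interface FileIO extends Serializable {
  InputFile newInputFile(String path);

  OutputFile newOutputFile(String path);

  void deleteFile(String path);
}
```

Because the interface is small and `Serializable`, a Spark task can hold a `FileIO` reference instead of hardcoding `HadoopInputFile#fromPath` / `HadoopOutputFile#fromPath`.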
The following method would then be added to `TableOperations`, and we would remove `TableOperations#newInputFile` and `TableOperations#newMetadataFile`.
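The code block showing the added method was likewise lost in extraction. Based on the surrounding text, it was presumably a single accessor on `TableOperations` returning the `FileIO` module; the method name `io()` here is an assumption, and the stub types exist only so the fragment stands alone:

```java
// Stub types so this fragment compiles on its own.
interface InputFile { String location(); }
interface OutputFile { String location(); }
interface FileIO {
  InputFile newInputFile(String path);
  OutputFile newOutputFile(String path);
  void deleteFile(String path);
}

// Sketch of the TableOperations addition: expose the FileIO module in place
// of the removed per-file factory methods (newInputFile, newMetadataFile).
interface TableOperations {
  // Returns the FileIO used to read, create, and delete table files.
  FileIO io();
}
```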
The need for `resolveNewMetadataPath` arises because the new `FileIO` abstraction treats all locations as full paths, whereas the old method `TableOperations#newMetadataFile` assumed its argument was a file name, not a full path. Callers that used to call `TableOperations#newMetadataFile` should therefore first resolve the full path and then pass it along to `FileIO#newOutputFile`. For convenience we could add a helper default method like so:
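The helper's code block also did not survive extraction. Given the `resolveNewMetadataPath` name from the text, the default method presumably looked something like the following sketch; the stub types and exact signatures are assumptions:

```java
// Stub types so this fragment compiles on its own.
interface OutputFile { String location(); }
interface FileIO { OutputFile newOutputFile(String path); }

interface TableOperations {
  FileIO io();

  // Resolves a bare metadata file name to a full path; the name comes from
  // the issue text, the exact semantics are assumed here.
  String resolveNewMetadataPath(String filename);

  // Hypothetical convenience default: preserves the old "pass a file name"
  // call pattern on top of the path-based FileIO API.
  default OutputFile newMetadataFile(String filename) {
    return io().newOutputFile(resolveNewMetadataPath(filename));
  }
}
```

With this default in place, existing call sites keep passing bare metadata file names while all actual I/O flows through the path-based `FileIO` API.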