Apache Iceberg version
1.10.0
Query engine
Spark
Please describe the bug 🐞
Description
Running the RemoveOrphanFiles procedure with S3FileIO and prefix_listing=true on a table whose location is a filesystem root (e.g. s3://bucket/) throws a NullPointerException in FileSystemWalker:
java.lang.NullPointerException
    at org.apache.iceberg.util.FileSystemWalker.isHiddenPath(FileSystemWalker.java:165)
    at org.apache.iceberg.util.FileSystemWalker.listDirRecursivelyWithFileIO(FileSystemWalker.java:75)
    at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.listedFileDS(DeleteOrphanFilesSparkAction.java:316)
    at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.actualFileIdentDS(DeleteOrphanFilesSparkAction.java:298)
    at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.doExecute(DeleteOrphanFilesSparkAction.java:249)
Root Cause
org.apache.hadoop.fs.Path.getParent() returns null when the path is a filesystem root (e.g. s3://bucket/). The loop condition in isHiddenPath dereferences that result without a null check, which produces the NPE at FileSystemWalker.java:165:
while (currentPath.getParent().toString().contains(baseDir)) {
The problem does not occur with prefix_listing=false, because listDirRecursivelyWithHadoop is invoked instead of listDirRecursivelyWithFileIO.
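One possible fix is to check the parent for null before dereferencing it. The sketch below is not the Iceberg code. It is self-contained and models Path.getParent() with a hypothetical string helper, parentOf, that returns null at the root, as the Hadoop method does. The class name, nameOf, and the VISIBLE filter (which treats names starting with _ or . as hidden) are also assumptions made for illustration.

```java
import java.util.function.Predicate;

// Illustrative only: a null-safe version of the parent walk, using plain
// strings instead of org.apache.hadoop.fs.Path so it runs without Hadoop.
final class NullSafeParentWalk {

  // Hypothetical filter: a path is visible unless its last component
  // starts with "_" or ".".
  static final Predicate<String> VISIBLE = path -> {
    String name = nameOf(path);
    return !name.startsWith("_") && !name.startsWith(".");
  };

  // Hypothetical stand-in for Path.getParent(): returns null at the root.
  static String parentOf(String path) {
    String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    int schemeEnd = trimmed.indexOf("://");
    int authorityStart = schemeEnd < 0 ? 0 : schemeEnd + 3;
    int firstPathSlash = trimmed.indexOf('/', authorityStart);
    if (firstPathSlash < 0) {
      return null; // "s3://bucket/" or "/" has no parent
    }
    int lastSlash = trimmed.lastIndexOf('/');
    if (lastSlash == firstPathSlash) {
      return trimmed.substring(0, lastSlash + 1); // parent is the root itself
    }
    return trimmed.substring(0, lastSlash);
  }

  // Last path component, e.g. "file.parquet" for "s3://bucket/data/file.parquet".
  static String nameOf(String path) {
    String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    return trimmed.substring(trimmed.lastIndexOf('/') + 1);
  }

  // Walks from the path up towards baseDir and reports whether any component
  // is rejected by the filter. The null check ends the walk at the root.
  static boolean isHidden(String baseDir, String path, Predicate<String> filter) {
    String current = path;
    String parent = parentOf(current);
    while (parent != null && parent.contains(baseDir)) {
      if (!filter.test(current)) {
        return true;
      }
      current = parent;
      parent = parentOf(current);
    }
    return false;
  }

  public static void main(String[] args) {
    String root = "s3://bucket/"; // table location at the filesystem root
    System.out.println(isHidden(root, "s3://bucket/data/file.parquet", VISIBLE));
    System.out.println(isHidden(root, "s3://bucket/_tmp/file.parquet", VISIBLE));
  }
}
```

With the table location at the root, the walk reaches s3://bucket/ and stops when parentOf returns null. An actual fix would apply the same check to the value returned by Path.getParent().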
Willingness to contribute