NPE in RemoveOrphanFiles with S3FileIO when metadata.json location is at filesystem root #16350

@liuliquan-marshal

Apache Iceberg version

1.10.0

Query engine

Spark

Please describe the bug 🐞

Description
When the RemoveOrphanFiles procedure is run with S3FileIO and the parameter prefix_listing=true, a table whose location is a filesystem root (e.g. s3://bucket/) triggers a NullPointerException in FileSystemWalker:
java.lang.NullPointerException
    at org.apache.iceberg.util.FileSystemWalker.isHiddenPath(FileSystemWalker.java:165)
    at org.apache.iceberg.util.FileSystemWalker.listDirRecursivelyWithFileIO(FileSystemWalker.java:75)
    at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.listedFileDS(DeleteOrphanFilesSparkAction.java:316)
    at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.actualFileIdentDS(DeleteOrphanFilesSparkAction.java:298)
    at org.apache.iceberg.spark.actions.DeleteOrphanFilesSparkAction.doExecute(DeleteOrphanFilesSparkAction.java:249)

Root Cause
org.apache.hadoop.fs.Path.getParent() returns null when the path is a filesystem root (e.g. s3://bucket/). This leads to the NPE at FileSystemWalker.java:165:
while (currentPath.getParent().toString().contains(baseDir)) {

The failure does not occur with prefix_listing=false, because listDirRecursivelyWithHadoop is invoked instead of listDirRecursivelyWithFileIO.
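
One possible shape for a fix is to stop the upward walk as soon as getParent() returns null. The sketch below is only an illustration and is not the actual Iceberg code. To keep it runnable without Hadoop on the classpath, it uses java.nio.file.Path as a stand-in, since its getParent() also returns null at the root. The class HiddenPathSketch, the VISIBLE filter, and the Unix-style paths are all assumptions made for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;

// Hypothetical stand-in for the hidden-path check; not the Iceberg implementation.
final class HiddenPathSketch {
  private HiddenPathSketch() {}

  // Assumed filter: names starting with '_' or '.' count as hidden.
  static final Predicate<Path> VISIBLE = p -> {
    Path name = p.getFileName();
    if (name == null) {
      return true; // the root has no file name, so it is never hidden
    }
    String s = name.toString();
    return !s.startsWith("_") && !s.startsWith(".");
  };

  // Walks from 'path' up towards 'baseDir' and reports whether any
  // component on the way is rejected by the filter.
  static boolean isHiddenPath(String baseDir, Path path, Predicate<Path> filter) {
    Path current = path;
    Path parent = current.getParent();
    // Null check first: at the filesystem root getParent() returns null,
    // which is the case that triggers the NPE when dereferenced directly.
    while (parent != null && parent.toString().contains(baseDir)) {
      if (!filter.test(current)) {
        return true;
      }
      current = parent;
      parent = current.getParent();
    }
    return false;
  }
}
```

With baseDir set to "/", a call such as HiddenPathSketch.isHiddenPath("/", Paths.get("/data/part-0.parquet"), HiddenPathSketch.VISIBLE) walks all the way up to the root and ends the loop there instead of throwing. Whether the real fix should guard the loop this way or special-case root locations earlier is a design decision for the maintainers.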

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Labels: bug