Replace hard-coded Iceberg scan class name detection with a type-based check #2221

@guixiaowen

Background

The current implementation relies on hard-coded class name strings to detect Iceberg scan types:

```scala
if (!scanClassName.startsWith("org.apache.iceberg.spark.source.")) {
  return None
}

if (scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan") {
  return None
}

if (className != "org.apache.iceberg.spark.source.SparkInputPartition") {
  return None
}
```

This approach couples the code tightly to Iceberg's internal class names and has several drawbacks:

- Fragile to upstream refactoring (a class or package rename silently breaks detection)
- Lacks type safety (a typo in a string literal is not caught at compile time)
- Hard to maintain and extend

Problem

Auron needs to:

- ✅ Handle Iceberg batch scans
- ❌ Exclude changelog scans (row-level CDC is not supported)

However, the current logic:

- Uses string matching for package detection
- Explicitly hard-codes `SparkChangelogScan`

This makes the code:

- Brittle across Iceberg versions
- Semantically unclear (the intent, "batch scans only", is hidden behind string comparisons)
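
A type-based check, as the title proposes, could look roughly like the minimal sketch below. The `Scan`, `BatchScan`, `ChangelogScan`, and `UnrelatedScan` types and the `IcebergScanSupport` object are hypothetical stand-ins defined locally so the example is self-contained. They are not the Iceberg classes, and whether the real scan classes are visible enough for direct pattern matching from Auron is an open question this sketch does not settle.

```scala
// Hypothetical stand-ins for the real scan types (illustrative names only).
trait Scan
class BatchScan extends Scan
class ChangelogScan extends Scan
class UnrelatedScan extends Scan

object IcebergScanSupport {
  // Match on the type instead of comparing class-name strings.
  // Only batch scans are accepted; everything else is rejected.
  def supportedScan(scan: Scan): Option[BatchScan] = scan match {
    case batch: BatchScan => Some(batch) // supported batch scan
    case _: ChangelogScan => None        // changelog (CDC) scans are excluded
    case _                => None        // any other scan is not handled
  }
}
```

With an allow-list match like this, the changelog case is stated explicitly for readability, but it would be rejected by the fallback case even if it were omitted.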
