Error happened after deleting a partitioned column #5399

Closed
lvyanquan opened this issue Jul 31, 2022 · 5 comments
Labels: bug (Something isn't working), core

Comments

@lvyanquan
Contributor

Error message:

 Caused by: java.lang.NullPointerException: Cannot find source column: 3
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:953) ~[iceberg-bundled-guava-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:503) ~[iceberg-api-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpecParser.buildFromJsonFields(PartitionSpecParser.java:155) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:78) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:357) ~[iceberg-core-0.13.2.jar:na]
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:288) ~[iceberg-core-0.13.2.jar:na]

The JSON metadata file contains the schemas, partition specs and sort orders.
However, there is no link between schemas and partition specs. After a partitioned column is deleted, building the historical partition specs fails, because their source-id cannot be found in the current schema. I think a schema-id should be added to each partition spec in the JSON.
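To make the failure concrete, here is a minimal Java model built from the metadata excerpt that follows. It is a sketch under stated assumptions: schemas are reduced to maps from field id to column name, and `SpecBindingDemo`, its `PartitionField` record and its `bind` method are hypothetical names, not Iceberg classes. Binding spec 0 against the current schema (schema-id 1) fails because source-id 3 no longer exists, while binding it against schema 0, which a per-spec schema-id would make possible, succeeds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the metadata file; these are not Iceberg classes.
final class SpecBindingDemo {
  // One entry of "partition-specs[].fields" in the metadata JSON.
  record PartitionField(String name, String transform, int sourceId, int fieldId) {}

  // Resolve each field's source-id against a schema (field id -> column name).
  // A missing id fails, as happens when a historical spec is bound to the current
  // schema after its source column was dropped.
  static List<String> bind(Map<Integer, String> schema, List<PartitionField> fields) {
    List<String> bound = new ArrayList<>();
    for (PartitionField f : fields) {
      String column = schema.get(f.sourceId());
      if (column == null) {
        throw new IllegalStateException("Cannot find source column: " + f.sourceId());
      }
      bound.add(f.transform() + "(" + column + ")");
    }
    return bound;
  }

  public static void main(String[] args) {
    // Schemas 0 and 1 from the metadata file, keyed by schema-id.
    Map<Integer, Map<Integer, String>> schemas = Map.of(
        0, Map.of(1, "name1", 2, "name2", 3, "name3"),
        1, Map.of(1, "name1", 2, "name2"));
    int currentSchemaId = 1;
    List<PartitionField> spec0 = List.of(new PartitionField("name3", "identity", 3, 1000));

    // Binding the historical spec 0 against the current schema fails.
    try {
      bind(schemas.get(currentSchemaId), spec0);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Cannot find source column: 3
    }

    // With a (hypothetical) schema-id recorded on spec 0, it could be bound
    // against the schema it was created with instead.
    System.out.println(bind(schemas.get(0), spec0)); // [identity(name3)]
  }
}
```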
Part of the metadata file:

    "last-column-id":3,
    "current-schema-id":1,
    "schemas":[
        {
            "type":"struct",
            "schema-id":0,
            "fields":[
                {
                    "id":1,
                    "name":"name1",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":2,
                    "name":"name2",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":3,
                    "name":"name3",
                    "required":false,
                    "type":"string"
                }
            ]
        },
        {
            "type":"struct",
            "schema-id":1,
            "fields":[
                {
                    "id":1,
                    "name":"name1",
                    "required":false,
                    "type":"string"
                },
                {
                    "id":2,
                    "name":"name2",
                    "required":false,
                    "type":"string"
                }
            ]
        }
    ],
    "default-spec-id":1,
    "partition-specs":[
        {
            "spec-id":0,
            "fields":[
                {
                    "name":"name3",
                    "transform":"identity",
                    "source-id":3,
                    "field-id":1000
                }
            ]
        },
        {
            "spec-id":1,
            "fields":[

            ]
        }
    ],
    "last-partition-id":1000
@ajantha-bhat
Member

@lvyanquan:

Do you have a test case or sample SQL to reproduce this?
I want to know when we build the historical partition specs.

@lvyanquan
Contributor Author

In reply to @ajantha-bhat ("Do you have a testcase or sample SQL to reproduce this?"):

We can reproduce this error with the following SQL (Spark 3.2, Iceberg 0.13 or 0.14), where prod is the catalog name:

CREATE TABLE prod.db.sample (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES('format-version' = '2');

ALTER TABLE prod.db.sample DROP PARTITION FIELD category;

ALTER TABLE prod.db.sample DROP COLUMN category;
I also get a NullPointerException when using this table if the column is deleted through the Java API instead.

@ajantha-bhat
Member

Update:
The latest code throws a different exception, but the problem still exists:

Cannot find source column for partition field: 1000: category: identity(3)
org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: category: identity(3)
	at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
	at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:558)
	at org.apache.iceberg.PartitionSpec$Builder.build(PartitionSpec.java:546)
	at org.apache.iceberg.UnboundPartitionSpec.bind(UnboundPartitionSpec.java:45)
	at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:85)
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:390)
	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:311)
	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:274)
	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:267)
	at org.apache.iceberg.hadoop.HadoopTableOperations.updateVersionAndMetadata(HadoopTableOperations.java:98)
	at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:121)
	at org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:84)
	at org.apache.iceberg.BaseTable.properties(BaseTable.java:119)
	at org.apache.iceberg.spark.source.SparkTable.<init>(SparkTable.java:128)
	at org.apache.iceberg.spark.source.SparkTable.<init>(SparkTable.java:118)
	at org.apache.iceberg.spark.SparkCatalog.alterTable(SparkCatalog.java:290)

@nastra
Contributor

nastra commented Sep 21, 2022

Just FYI, we're tracking the same issue in #5676.

@nastra nastra added bug Something isn't working core labels Sep 21, 2022
@Fokko
Contributor

Fokko commented Sep 21, 2022

Hey all, I have a PR ready: #5707. It no longer looks up the historical columns.
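The idea of not looking up historical columns can be sketched as follows. This is a simplified model, not the code from PR #5707: the names `LenientSpecLoading`, `Spec`, `PartitionField`, `describe` and `load` are hypothetical, and the rule that only the default spec must resolve fully against the current schema is an assumption made for illustration. Older specs that reference a dropped column are kept with a placeholder instead of failing the whole metadata load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the idea in the comment above; not the actual change in PR #5707.
final class LenientSpecLoading {
  record PartitionField(String name, String transform, int sourceId, int fieldId) {}
  record Spec(int specId, List<PartitionField> fields) {}

  // Render a field like "1000: name3: identity(name3)". Historical specs may point at
  // dropped columns, so an unresolved source-id becomes a placeholder, not an error.
  static String describe(Map<Integer, String> schema, PartitionField f) {
    String column = schema.get(f.sourceId());
    String source = column != null ? column : "<dropped column " + f.sourceId() + ">";
    return f.fieldId() + ": " + f.name() + ": " + f.transform() + "(" + source + ")";
  }

  // Load every spec, but require only the default spec to resolve fully against the
  // current schema (assumption: it is the spec used to write new data).
  static List<String> load(Map<Integer, String> currentSchema, List<Spec> specs, int defaultSpecId) {
    List<String> out = new ArrayList<>();
    for (Spec spec : specs) {
      for (PartitionField f : spec.fields()) {
        if (spec.specId() == defaultSpecId && !currentSchema.containsKey(f.sourceId())) {
          throw new IllegalStateException("Cannot find source column: " + f.sourceId());
        }
        out.add(spec.specId() + " -> " + describe(currentSchema, f));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Current schema and specs from the metadata file in the issue description.
    Map<Integer, String> currentSchema = Map.of(1, "name1", 2, "name2");
    List<Spec> specs = List.of(
        new Spec(0, List.of(new PartitionField("name3", "identity", 3, 1000))),
        new Spec(1, List.of()));
    // Spec 1 is the default and has no fields; spec 0 still loads despite the dropped column.
    System.out.println(load(currentSchema, specs, 1));
    // prints [0 -> 1000: name3: identity(<dropped column 3>)]
  }
}
```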

Fokko added a commit to Fokko/iceberg that referenced this issue Sep 21, 2022
If a field that used to be part of a partition spec is deleted,
an error is thrown because the field can no longer be resolved.

Closes apache#5676
Closes apache#5707
Closes apache#5399
@rdblue rdblue closed this as completed in 3b65cca Sep 30, 2022
nastra pushed a commit to nastra/iceberg that referenced this issue Oct 4, 2022
4 participants