[BUG] columnMapping readerFeatures is missing when icebergCompatV1(2) is enabled #3154

Open
2 of 8 tasks
ebyhr opened this issue May 24, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@ebyhr
Contributor

ebyhr commented May 24, 2024

Bug

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Describe the problem

Writer Requirements for Column Mapping mentions:

write a protocol action to add the feature columnMapping to both readerFeatures and writerFeatures

However, the protocol entry doesn't have columnMapping in the readerFeatures field.

Steps to reproduce

  1. Create a new table with icebergCompatV1 feature:
CREATE TABLE default.test_iceberg
(a INT)
USING DELTA
LOCATION 's3://test-bucket/test_iceberg'
TBLPROPERTIES ('delta.enableIcebergCompatV1'=true);
  2. Confirm the transaction log:
{"commitInfo":{"timestamp":1716541145876,"operation":"CREATE TABLE","operationParameters":{"isManaged":"false","description":null,"partitionBy":"[]","properties":"{\"delta.enableIcebergCompatV1\":\"true\",\"delta.columnMapping.mode\":\"name\",\"delta.columnMapping.maxColumnId\":\"1\"}"},"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{},"engineInfo":"Apache-Spark/3.5.0 Delta-Lake/3.1.0","txnId":"b172aa56-9393-4750-ac30-2c1996497efb"}}
{"metaData":{"id":"8d56ae44-1648-4fd5-a920-81f5f0725ab5","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"a\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"delta.columnMapping.id\":1,\"delta.columnMapping.physicalName\":\"col-39727da2-bc31-4d96-a0a9-b3f16113fde2\"}}]}","partitionColumns":[],"configuration":{"delta.enableIcebergCompatV1":"true","delta.columnMapping.mode":"name","delta.columnMapping.maxColumnId":"1"},"createdTime":1716541145862}}
{"protocol":{"minReaderVersion":2,"minWriterVersion":7,"writerFeatures":["columnMapping","icebergCompatV1"]}}

Observed results

Expected results

icebergCompatV1 or icebergCompatV2 exists in readerFeatures.

Further details

Environment information

  • Delta Lake version: 3.1.0
  • Spark version: 3.5.0
  • Scala version: 2.12

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
@vkorukanti
Collaborator

cc. @lzlfred @harperjiang

@ebyhr
Contributor Author

ebyhr commented Jun 5, 2024

@vkorukanti @lzlfred @harperjiang Could you let me know whether this is a documentation error or an implementation bug?

@harperjiang
Contributor

harperjiang commented Jun 5, 2024

@ebyhr the document says:

If the table is on Writer Version 5 or 6: write a metaData action to add the delta.columnMapping.mode table property;
If the table is on Writer Version 7:
write a protocol action to add the feature columnMapping to both readerFeatures and writerFeatures

I believe the behavior you observed matches what the doc describes for writer version 5/6, so I don't consider it a bug.

@ebyhr
Contributor Author

ebyhr commented Jun 5, 2024

The writer version is 7 and it mentions "add the feature columnMapping to both readerFeatures and writerFeatures".

{"protocol":{"minReaderVersion":2,"minWriterVersion":7,"writerFeatures":["columnMapping","icebergCompatV1"]}}

Then I would recommend updating the protocol. Based on the current sentence, I expect readerFeatures to always contain columnMapping whenever it exists in writerFeatures.

@felipepessoto
Contributor

felipepessoto commented Jun 6, 2024

The writer version is 7 and it mentions "add the feature columnMapping to both readerFeatures and writerFeatures".

{"protocol":{"minReaderVersion":2,"minWriterVersion":7,"writerFeatures":["columnMapping","icebergCompatV1"]}}

Then I would recommend updating the protocol. Based on the current sentence, I expect readerFeatures to always contain columnMapping whenever it exists in writerFeatures.

I think we need to update the protocol to make that paragraph clearer.

When the reader version is 2, readerFeatures doesn't exist; if it did, it would violate the protocol:

For new tables, when a new table is created with a Reader Version up to 2 and Writer Version 7, its protocol action must only contain writerFeatures.
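
The rule quoted above can be sketched as a small check on a protocol action. This is an illustration only: `check_feature_lists` is a hypothetical helper, and the conditions beyond the quoted sentence (readerFeatures expected at Reader Version 3, writerFeatures expected at Writer Version 7) are my reading of the protocol, not something confirmed in this thread.

```python
def check_feature_lists(protocol):
    """Return a list of problems with which feature lists are present.

    Assumed rules: readerFeatures appears only with minReaderVersion 3,
    writerFeatures appears only with minWriterVersion 7.
    """
    problems = []
    reader_v = protocol["minReaderVersion"]
    writer_v = protocol["minWriterVersion"]
    has_reader_list = "readerFeatures" in protocol
    has_writer_list = "writerFeatures" in protocol

    if has_reader_list and reader_v < 3:
        problems.append("readerFeatures present but minReaderVersion < 3")
    if not has_reader_list and reader_v == 3:
        problems.append("minReaderVersion is 3 but readerFeatures is missing")
    if has_writer_list and writer_v < 7:
        problems.append("writerFeatures present but minWriterVersion < 7")
    if not has_writer_list and writer_v == 7:
        problems.append("minWriterVersion is 7 but writerFeatures is missing")
    return problems


# Protocol action observed in this issue.
observed = {
    "minReaderVersion": 2,
    "minWriterVersion": 7,
    "writerFeatures": ["columnMapping", "icebergCompatV1"],
}

# Same action with columnMapping also added to readerFeatures,
# as a literal reading of the writer requirement would suggest.
literal_reading = dict(observed, readerFeatures=["columnMapping"])

print(check_feature_lists(observed))
print(check_feature_lists(literal_reading))
```

Under these assumptions the observed protocol passes, while adding readerFeatures at Reader Version 2 is flagged, which is why the two spec paragraphs appear to conflict.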

@felipepessoto
Contributor

felipepessoto commented Jun 7, 2024

Actually, after running some experiments, I realized Spark Delta doesn't allow you to enable column mapping on an existing table if the reader version is 2 and the writer version is 7.

But it does allow you to create a table in that state, so the behavior is inconsistent. I can't say which behavior is correct. My guess is that reader 2 + writer 7 (with columnMapping in writerFeatures) should be considered compliant with the spec.

ALTER TABLE Test SET TBLPROPERTIES (
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '7'
  );

This fails:

ALTER TABLE Test SET TBLPROPERTIES (
    'delta.columnMapping.mode' = 'name'
  )
Your current table protocol version does not support changing column mapping modes
using delta.columnMapping.mode.

Required Delta protocol version for column mapping:
Protocol(3,7,[columnMapping],[columnMapping])
Your table's current Delta protocol version:
Protocol(2,7,None,[appendOnly,invariants])

Please enable Column Mapping on your Delta table with mapping mode 'name'.
You can use one of the following commands.

If your table is already on the required protocol version:
ALTER TABLE table_name SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name')

If your table is not on the required protocol version and requires a protocol upgrade:
ALTER TABLE table_name SET TBLPROPERTIES (
   'delta.columnMapping.mode' = 'name',
   'delta.minReaderVersion' = '3',
   'delta.minWriterVersion' = '7')

This works:

CREATE TABLE Test (Id INT) USING DELTA TBLPROPERTIES ('delta.minWriterVersion' = '7', 'delta.columnMapping.mode' = 'name');
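
To make the guess earlier in this comment concrete, here is a minimal sketch of how a reader could decide whether it has to support column mapping. It assumes that Reader Version 2 implicitly covers column mapping (as it did before table features) and that Reader Version 3 requires columnMapping in readerFeatures. That interpretation is an assumption, and `reader_must_support_column_mapping` is a hypothetical name, not a Delta API.

```python
def reader_must_support_column_mapping(protocol):
    """Decide whether a reader has to understand column mapping.

    Assumption: reader version 2 implies column mapping support;
    reader version 3 requires the feature to be listed explicitly.
    """
    reader_v = protocol["minReaderVersion"]
    if reader_v == 2:
        return True
    if reader_v == 3:
        return "columnMapping" in (protocol.get("readerFeatures") or [])
    return False


# Protocol produced by the CREATE TABLE path in this issue (reader 2, writer 7).
created = {
    "minReaderVersion": 2,
    "minWriterVersion": 7,
    "writerFeatures": ["columnMapping"],
}

# Protocol that the ALTER TABLE error message asks for (reader 3, writer 7).
required_by_alter = {
    "minReaderVersion": 3,
    "minWriterVersion": 7,
    "readerFeatures": ["columnMapping"],
    "writerFeatures": ["columnMapping"],
}

print(reader_must_support_column_mapping(created))
print(reader_must_support_column_mapping(required_by_alter))
```

Under this reading both protocols give readers the same obligation, so the difference between the CREATE and ALTER paths would be an implementation inconsistency and not a difference in what readers need to handle.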


4 participants