Docs: add flink and iceberg type compatibility #4865
Conversation
kbendick
left a comment
Hi @wuwenchi! Thanks for providing this compatibility matrix.
There’s also an iceberg-docs repo, github.com/apache/iceberg-docs. You might need to make a PR there as well. cc @samredai for visibility.
For types, I’m mostly sure this is correct but need to verify a few things. Also, as a follow up, I’d like to update the language in the “not supported” section to differentiate between what is not supported by Iceberg but possible in Flink and what is simply not presently possible in Flink. Possibly somebody will be inspired to help out over there or here!
docs/flink/flink-getting-started.md (Outdated)

> ### Flink type to Iceberg type
>
> This type conversion table describes how Spark types are converted to the Iceberg types. The conversion applies on both creating Iceberg table and writing to Iceberg table via Spark.
Nit: this mentions Spark several times. Assuming a copy paste issue?
=.= copy paste issue... fixed it.
docs/flink/flink-getting-started.md (Outdated)

> ## Type compatibility
>
> Flink and Iceberg support different set of types. Iceberg does the type conversion automatically, but not for all combinations,
Question: for what combinations can Iceberg not do the conversion automatically? From the perspective of a new user, this might leave them with more questions than answers.
docs/flink/flink-getting-started.md (Outdated)

> ## Type compatibility
>
> Flink and Iceberg support different set of types. Iceberg does the type conversion automatically, but not for all combinations,
Additionally, I would say they support "different" types. There are logical types (e.g. UUID), physical types (e.g. String or VARCHAR), and then what's ultimately stored within the respective file formats and how.
I'd focus on covering the first two here (logical and physical), and on noting any peculiarities within the matrix - for example, I believe that timestamp precision, and what is and is not possible to store, might be worth calling more attention to in the notes section of the matrix.
For this sentence, I'd say something more along the lines of "Iceberg's type system is mapped to Flink's type system. This type conversion can be done by Iceberg automatically, though the following cases need to be considered. See the notes section below for compatibility concerns and how to overcome them".
Then I'd make a small section listing the combinations that don't work, or a general discussion of the points of concern. We're migrating the docs to a new structure, so I think that would really help (even if you can't list all cases etc.)
Just a suggestion / jumping off point.
Good suggestion! I modified the description and added related notes, so that we can see some details more intuitively.
After this PR is completed, I will provide a PR to iceberg-docs.
Since this is in the versioned part of the docs site, there's no need to open a PR against the iceberg-docs repo. I do think we need a bona fide "getting started"/"quickstart" Flink guide that gives a docker environment and lets the user fully run a demo environment--that would go in the iceberg-docs repo. On a separate note, I'm noticing there are a number of "Types Compatibility" subsections cropping up for various engines. I'm wondering if we should try to consolidate them into a single "Types Compatibility" page, or maybe append it to the Configurations page in PR #4801 (although it's not really 'configuration').
Good idea, looking forward to this!
I prefer adding a new page for this part.
docs/flink/flink-getting-started.md (Outdated)

> | float  | float  |                         |
> | double | double |                         |
> | date   | date   |                         |
> | time   | time   | precision is fixed at 0 |
precision is 0? I would expect that to be 6 for microseconds.
In Flink, the precision of time is not supported and its default precision is 0.
In Iceberg:
```java
case TIME:
  return new TimeType();
```
In Flink:
```java
public static final int DEFAULT_PRECISION = 0;

public TimeType() {
  this(DEFAULT_PRECISION);
}
```
Of course, this is Flink's behavior; maybe we should not write this precision here.
> the precision of time is not supported

Then we should not mention it here.
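To make the mismatch discussed above concrete, here is a minimal self-contained Java sketch. The class and constant names are illustrative assumptions, not the real Flink/Iceberg classes; the values mirror Flink's `TimeType.DEFAULT_PRECISION` quoted earlier and Iceberg's microsecond-based time storage.

```java
// Illustrative sketch only: these constants stand in for
// Flink's TimeType.DEFAULT_PRECISION (0) and Iceberg's
// microsecond-precision time type (6); they are not the library APIs.
public class TimePrecisionSketch {
    static final int FLINK_DEFAULT_TIME_PRECISION = 0; // Flink: TIME defaults to TIME(0)
    static final int ICEBERG_TIME_PRECISION = 6;       // Iceberg: time stored as microseconds

    public static void main(String[] args) {
        // A Flink TIME column declared without a precision defaults to TIME(0),
        // while Iceberg always persists time values at microsecond precision.
        System.out.println("Flink default TIME precision: " + FLINK_DEFAULT_TIME_PRECISION);
        System.out.println("Iceberg TIME precision: " + ICEBERG_TIME_PRECISION);
    }
}
```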
docs/flink/flink-getting-started.md (Outdated)

> | symbol  | | Not supported |
> | logical | | Not supported |
>
> ### Iceberg type to Flink type
Should the type be removed here to be consistent with the above?
I have another question: as in #4725, modifications to the spec should not need a separate PR in iceberg-docs, right?
Yes, that's right; the spec is copied over to iceberg-docs during the release process.
docs/flink/flink-getting-started.md (Outdated)

> | time                       | time          |                                     |
> | timestamp without timezone | timestamp     | precision is fixed at 6             |
> | timestamp with timezone    | timestamp_ltz | precision is fixed at 6             |
> | string                     | varchar       | length is fixed at 2<sup>31</sup>-1 |
Why not use varchar[2147483647]?
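For reference, 2<sup>31</sup>-1 and 2147483647 are the same number (Java's `Integer.MAX_VALUE`), so the two spellings in question are equivalent. A quick self-contained check:

```java
public class VarcharLengthSketch {
    public static void main(String[] args) {
        // The "length is fixed at 2^31-1" note is exactly Integer.MAX_VALUE,
        // i.e. 2147483647 -- the same value as the varchar[2147483647] spelling.
        long twoPow31Minus1 = (1L << 31) - 1;
        System.out.println(twoPow31Minus1 == Integer.MAX_VALUE); // true
        System.out.println(twoPow31Minus1);                      // 2147483647
    }
}
```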
docs/flink/flink-getting-started.md (Outdated)

> | timestamp without timezone | timestamp     | precision is fixed at 6             |
> | timestamp with timezone    | timestamp_ltz | precision is fixed at 6             |
> | string                     | varchar       | length is fixed at 2<sup>31</sup>-1 |
> | uuid                       | binary        | length is fixed at 16               |
docs/flink/flink-getting-started.md (Outdated)

> | double                     | double    |                         |
> | date                       | date      |                         |
> | time                       | time      |                         |
> | timestamp without timezone | timestamp | precision is fixed at 6 |
docs/flink/flink-getting-started.md (Outdated)

> | string | varchar   | length is fixed at 2<sup>31</sup>-1 |
> | uuid   | binary    | length is fixed at 16               |
> | fixed  | binary    |                                     |
> | binary | varbinary | width is fixed at 2<sup>31</sup>-1  |
docs/flink/flink-getting-started.md (Outdated)

> | uuid    | binary    | length is fixed at 16                            |
> | fixed   | binary    |                                                  |
> | binary  | varbinary | width is fixed at 2<sup>31</sup>-1               |
> | decimal | decimal   | precision and scale are the same as the original |
decimal(P, S) <=> decimal(P, S)?
Done, looks clearer.
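The decimal(P, S) <=> decimal(P, S) notation above says that precision and scale carry over unchanged. A small self-contained Java illustration using `java.math.BigDecimal` (standing in for how a decimal value exposes its precision and scale; this is not the Flink or Iceberg type API):

```java
import java.math.BigDecimal;

public class DecimalSketch {
    public static void main(String[] args) {
        // For a value written as decimal(5, 2) on the Flink side,
        // Iceberg stores it as decimal(5, 2) as well: P and S are unchanged.
        BigDecimal value = new BigDecimal("123.45"); // P = 5, S = 2
        System.out.println(value.precision()); // 5
        System.out.println(value.scale());     // 2
    }
}
```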
@wuwenchi, can you rebase?

Just FYI that there are no longer any subdirectories, so this markdown file is located at docs/flink-getting-started.md
1. Delete time's notes
2. Delete the type to be consistent with the above
fix by review
39f7241 to 556efc4
Thanks, @wuwenchi!
Co-authored-by: 吴文池 <wuwenchi@deepexi.com>
Add Flink and Iceberg type compatibility.
Can you help review it? Thanks!
@rdblue @openinx @kbendick
Closes #4864