
Docs: add flink and iceberg type compatibility #4865

Merged
rdblue merged 5 commits into apache:master from wuwenchi:deepnova_doc_add_flink_iceberg_type on Jul 3, 2022

Conversation

@wuwenchi
Contributor

Add flink and iceberg type compatibility.

Can you help review it? Thanks!
@rdblue @openinx @kbendick

Closes #4864

@github-actions bot added the docs label on May 25, 2022
Contributor

@kbendick left a comment

Hi @wuwenchi! Thanks for providing this compatibility matrix.

There’s also an iceberg-docs repo, github.com/apache/iceberg-docs. You might need to make a PR there as well. cc @samredai for visibility.

For types, I’m mostly sure this is correct but need to verify a few things. Also, as a follow-up, I’d like to update the language in the “not supported” section to differentiate between what is not supported by Iceberg but possible in Flink and what is simply not presently possible in Flink. Possibly somebody will be inspired to help out over there or here!


### Flink type to Iceberg type

This type conversion table describes how Spark types are converted to the Iceberg types. The conversion applies on both creating Iceberg table and writing to Iceberg table via Spark.
Contributor

Nit: this mentions Spark several times. Assuming a copy-paste issue?

Contributor Author

=.= copy-paste issue... I'll fix it.
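
As a rough illustration of the Flink-to-Iceberg direction described in the quoted section above, the iceberg-flink module provides a schema conversion helper; the snippet below is a minimal sketch that assumes a FlinkSchemaUtil.convert(TableSchema) entry point (the exact signature may differ between versions):

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.TableSchema;
import org.apache.iceberg.Schema;
import org.apache.iceberg.flink.FlinkSchemaUtil;

public class FlinkToIcebergTypesSketch {
  public static void main(String[] args) {
    // A Flink schema using a few of the types from the matrix under review.
    TableSchema flinkSchema = TableSchema.builder()
        .field("id", DataTypes.BIGINT())
        .field("name", DataTypes.STRING())
        .field("event_time", DataTypes.TIMESTAMP(6))
        .build();

    // Convert to an Iceberg schema; combinations Iceberg cannot map fail here.
    Schema icebergSchema = FlinkSchemaUtil.convert(flinkSchema);
    System.out.println(icebergSchema);
  }
}
```

The same mapping is applied implicitly when creating an Iceberg table or writing to one through the Flink connector.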


## Type compatibility

Flink and Iceberg support different set of types. Iceberg does the type conversion automatically, but not for all combinations,
Contributor

Question: for what combinations can Iceberg not do the conversion automatically? From the perspective of a new user, this might leave them with more questions than answers.


## Type compatibility

Flink and Iceberg support different set of types. Iceberg does the type conversion automatically, but not for all combinations,
Contributor

Additionally, I would say they support “different” types. There are logical types (e.g. UUID), physical types (e.g. String or VARCHAR), and then what’s ultimately stored within the respective file types and how.

I’d focus on covering the first two here (logical and physical), and on noting any peculiarities within the matrix - for example, I believe that timestamp precision, and what is and is not possible to store, might be worth calling more attention to in the notes section of the matrix.

For this sentence, I’d say something more along the lines of “Iceberg’s type system is mapped to Flink’s type system. This type conversion can be done by Iceberg automatically, though the following cases need to be considered. See the notes section below for compatibility concerns and how to overcome them”.

Then I’d make a small section listing the combinations that don’t work, or a general discussion of the points of concern. We’re migrating the docs somewhat to a new structure, so I think that would really help (even if you can’t list all cases, etc.).

Just a suggestion / jumping off point.

Contributor Author

Good suggestion! I modified the description and added related notes, so that we can see some details more intuitively.

@wuwenchi
Contributor Author

> There’s also an iceberg-docs repo, github.com/apache/iceberg-docs.

After this PR is completed, I will provide a PR to iceberg-docs.

@samredai
Contributor

> There’s also an iceberg-docs repo, github.com/apache/iceberg-docs. You might need to make a PR there as well.

Since this is in the versioned part of the docs site, there's no need to open a PR against the iceberg-docs repo. I do think we need a bona fide "getting started"/"quickstart" Flink guide that gives a Docker environment and lets the user fully run a demo environment; that would go in the iceberg-docs repo.

On a separate note, I'm noticing there are a number of "Types Compatibility" subsections that are cropping up for various engines. I'm wondering if we should try and consolidate them to a single "Types Compatibility" page, or maybe append it to the Configurations page in PR #4801 (although it's not really 'configuration').

@wuwenchi
Contributor Author

@samredai

> I do think we need a bona fide "getting started"/"quickstart" Flink guide that gives a Docker environment and lets the user fully run a demo environment; that would go in the iceberg-docs repo.

Good idea, looking forward to this!

> I'm wondering if we should try and consolidate them to a single "Types Compatibility" page, or maybe append it to the Configurations page

I would prefer to add a new page for this part.

| float | float | |
| double | double | |
| date | date | |
| time | time | precision is fixed at 0 |
Contributor

precision is 0? I would expect that to be 6 for microseconds.

Contributor Author

In Flink, the precision of time is not supported, and its default precision is 0:

In Iceberg (the type conversion code in the iceberg-flink module):

```java
// ...
case TIME:
  // Iceberg's time type maps to a Flink TIME created with the default precision
  return new TimeType();
// ...
```

In Flink (`org.apache.flink.table.types.logical.TimeType`):

```java
public static final int DEFAULT_PRECISION = 0;

public TimeType() {
    this(DEFAULT_PRECISION);
}
```

Of course, this is Flink's behavior; maybe we should not write this precision here.

Contributor

> the precision of time is not supported

Then we should not mention it here.

Contributor Author

done

| symbol | | Not supported |
| logical | | Not supported |

### Iceberg type to Flink type
Contributor Author

Should the type be removed here, to be consistent with the above?

Contributor

Yes, that would be good.

Contributor Author

done

@wuwenchi
Contributor Author

> Since this is in the versioned part of the docs site, there's no need to open a PR against the iceberg-docs repo.

I have another question: like #4725, modifications to the spec should not need a PR against iceberg-docs, right?

@samredai
Contributor

> I have another question: like #4725, modifications to the spec should not need a PR against iceberg-docs, right?

Yes, that's right: the spec is copied over to iceberg-docs during the release process.

| time | time | |
| timestamp without timezone | timestamp | precision is fixed at 6 |
| timestamp with timezone | timestamp_ltz | precision is fixed at 6 |
| string | varchar | length is fixed at 2<sup>31</sup>-1 |
Contributor

Why not use varchar[2147483647]?

Contributor Author

done

| timestamp without timezone | timestamp | precision is fixed at 6 |
| timestamp with timezone | timestamp_ltz | precision is fixed at 6 |
| string | varchar | length is fixed at 2<sup>31</sup>-1 |
| uuid | binary | length is fixed at 16 |
Contributor

binary(16)?

Contributor Author

done

| double | double | |
| date | date | |
| time | time | |
| timestamp without timezone | timestamp | precision is fixed at 6 |
Contributor

timestamp(6)?

Contributor Author

done

| string | varchar | length is fixed at 2<sup>31</sup>-1 |
| uuid | binary | length is fixed at 16 |
| fixed | binary | |
| binary | varbinary | width is fixed at 2<sup>31</sup>-1 |
Contributor

varbinary(2147483647)?

Contributor Author

done

| uuid | binary | length is fixed at 16 |
| fixed | binary | |
| binary | varbinary | width is fixed at 2<sup>31</sup>-1 |
| decimal | decimal | precision and scale are the same as the original |
Contributor

decimal(P, S) <=> decimal(P, S)?

Contributor Author

@wuwenchi, Jun 10, 2022

done, looks clearer
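
To make the "fixed at" notes in the Iceberg-to-Flink threads above concrete, here is a small illustrative snippet (not the connector's own conversion code) that constructs the Flink logical types those table rows map to, using only Flink's type classes:

```java
import org.apache.flink.table.types.logical.BinaryType;
import org.apache.flink.table.types.logical.DecimalType;
import org.apache.flink.table.types.logical.TimestampType;
import org.apache.flink.table.types.logical.VarBinaryType;
import org.apache.flink.table.types.logical.VarCharType;

public class IcebergToFlinkTypesSketch {
  public static void main(String[] args) {
    // timestamp without timezone -> TIMESTAMP(6)
    System.out.println(new TimestampType(6));
    // string -> VARCHAR(2147483647); VarCharType.MAX_LENGTH is 2^31 - 1
    System.out.println(new VarCharType(VarCharType.MAX_LENGTH));
    // uuid -> BINARY(16)
    System.out.println(new BinaryType(16));
    // binary -> VARBINARY(2147483647)
    System.out.println(new VarBinaryType(VarBinaryType.MAX_LENGTH));
    // decimal(P, S) -> DECIMAL(P, S), for example decimal(38, 10) -> DECIMAL(38, 10)
    System.out.println(new DecimalType(38, 10));
  }
}
```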

@rdblue
Contributor

rdblue commented Jun 29, 2022

@wuwenchi, can you rebase?

@rdblue closed this on Jun 29, 2022
@rdblue reopened this on Jun 29, 2022
@samredai
Contributor

Just FYI that there are no longer any subdirectories, so this markdown file is located at docs/flink-getting-started.md

@wuwenchi force-pushed the deepnova_doc_add_flink_iceberg_type branch from 39f7241 to 556efc4 on June 30, 2022 01:58
@wuwenchi
Contributor Author

@samredai ok, thanks! @rdblue done

@rdblue merged commit 81dec35 into apache:master on Jul 3, 2022
@rdblue
Contributor

rdblue commented Jul 3, 2022

Thanks, @wuwenchi!

dungdm93 pushed a commit to dungdm93/iceberg that referenced this pull request Jul 4, 2022
Co-authored-by: 吴文池 <wuwenchi@deepexi.com>
@wuwenchi deleted the deepnova_doc_add_flink_iceberg_type branch on July 8, 2022 01:48
namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022
Co-authored-by: 吴文池 <wuwenchi@deepexi.com>

Successfully merging this pull request may close these issues: Can't find type compatibility for flink in document

4 participants