Doc: fix miscellaneous comments and add more details in release note for 0.11 by jackye1995 · Pull Request #2178 · apache/iceberg

jackye1995 · 2021-01-28T22:37:19Z

@rdblue as we discussed in Slack, this adds more details for the bug fixes in 0.11. It also fixes some typos and adds more details to the Flink docs, as suggested in #2168.


rdblue · 2021-01-28T22:47:03Z

site/docs/aws.md


 As you can see, In the shell command, we use `--packages` to specify the additional AWS bundle and HTTP client dependencies with their version as `2.15.40`.

+For integration with other engines such as Flink, please read their engine documentation pages that explain loading a custom catalog. 


I think it should be "explain how to load".

rdblue · 2021-01-28T22:47:41Z

site/docs/flink.md


+### Create through YAML config
+
+Catalog can also be registered in `sql-client-defaults.yaml` before starting the SQL client. Here is an example:


Typo: should be "Catalogs can be ..."

rdblue · 2021-01-28T22:49:05Z

site/docs/releases.md

-* [\#1798](https://github.com/apache/iceberg/pull/1798) fixes read failure when encountering duplicate entries of data files
-* [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in CachingCatalog
+* [\#2091](https://github.com/apache/iceberg/pull/2091) fixes `ClassCastException` for type promotion `int` to `long` or `float` to `double` during Parquet vectorized read. Now Arrow vector is created by looking at Parquet file schema instead of Iceberg schema for `int` and `float` fields.
+* [\#2031](https://github.com/apache/iceberg/pull/2031) fixes bug in Flink that custom catalog property causes catalog initialization failure. Now Flink catalog can support arbitrary custom catalog properties.


I don't think this is a bug because custom catalogs were not supported in 0.10.0. I wouldn't mention it here. These should be serious bugs or correctness errors.

rdblue · 2021-01-28T22:49:45Z

site/docs/releases.md

-* [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in CachingCatalog
+* [\#2091](https://github.com/apache/iceberg/pull/2091) fixes `ClassCastException` for type promotion `int` to `long` or `float` to `double` during Parquet vectorized read. Now Arrow vector is created by looking at Parquet file schema instead of Iceberg schema for `int` and `float` fields.
+* [\#2031](https://github.com/apache/iceberg/pull/2031) fixes bug in Flink that custom catalog property causes catalog initialization failure. Now Flink catalog can support arbitrary custom catalog properties.
+* [\#2011](https://github.com/apache/iceberg/pull/2011) fixes equality comparison for `BaseSnapshot`. For engines such as Beam that serialize snapshots, now snapshots can be compared through Java equality operator.


I probably wouldn't include this. It was needed for Beam, but it wasn't needed for other engines before now, so it is not really a bug.

rdblue · 2021-01-28T22:50:40Z

site/docs/releases.md

+* [\#2031](https://github.com/apache/iceberg/pull/2031) fixes bug in Flink that custom catalog property causes catalog initialization failure. Now Flink catalog can support arbitrary custom catalog properties.
+* [\#2011](https://github.com/apache/iceberg/pull/2011) fixes equality comparison for `BaseSnapshot`. For engines such as Beam that serialize snapshots, now snapshots can be compared through Java equality operator.
+* [\#1998](https://github.com/apache/iceberg/pull/1998) fixes bug in `HiveTableOperation` that `unlock` is not called if new metadata cannot be deleted. Now it is guaranteed that `unlock` is always called for Hive catalog users.
+* [\#1991](https://github.com/apache/iceberg/pull/1991) fixes Avro schema conversions to preserve field docs. Now field level documentation is also preserved when converting from Avro schemas to Iceberg schemas.


This may be a notable feature update, but is not a serious bug that we need to draw attention to.

rdblue · 2021-01-28T22:51:52Z

site/docs/releases.md

+* [\#1991](https://github.com/apache/iceberg/pull/1991) fixes Avro schema conversions to preserve field docs. Now field level documentation is also preserved when converting from Avro schemas to Iceberg schemas.
+* [\#1981](https://github.com/apache/iceberg/pull/1981) fixes bug that date and timestamp transforms were producing incorrect values for dates and times before 1970. Before the fix, negative values were incorrectly transformed by date and timestamp transforms to 1 larger than the correct value. For example, `day(1969-12-31 10:00:00)` produced 0 instead of -1. The fix is backwards compatible, which means predicate projection can still work with the incorrectly transformed partitions written using older versions.
+* [\#1979](https://github.com/apache/iceberg/pull/1979) fixes table listing failure in Hadoop catalog when user does not have permission to some tables. Now the tables with no permission are ignored in listing.
+* [\#1798](https://github.com/apache/iceberg/pull/1798) fixes read failure when encountering duplicate entries of data files. Spark and Flink readers can now ignore duplicated entries in data files.
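The #1981 entry above says pre-1970 values came out one day too high, e.g. `day(1969-12-31 10:00:00)` giving 0 instead of -1. That is the classic difference between division that rounds toward zero and floor division. The Java sketch below illustrates only that rounding difference; it assumes the day transform works on microseconds since the epoch in UTC, and the class and method names are hypothetical, not Iceberg's transform code.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration of the rounding issue, not Iceberg's implementation.
class DayTransformSketch {
    // Microseconds in one day.
    static final long MICROS_PER_DAY = 86_400_000_000L;

    // Truncating division rounds toward zero, so a negative timestamp that is not
    // exactly on a day boundary is mapped one day too high.
    static long dayTruncating(long microsSinceEpoch) {
        return microsSinceEpoch / MICROS_PER_DAY;
    }

    // Floor division rounds toward negative infinity, so pre-1970 timestamps
    // map to the day that actually contains them.
    static long dayFloor(long microsSinceEpoch) {
        return Math.floorDiv(microsSinceEpoch, MICROS_PER_DAY);
    }

    public static void main(String[] args) {
        // Assumes the example timestamp from the release note is in UTC.
        long micros = ChronoUnit.MICROS.between(Instant.EPOCH, Instant.parse("1969-12-31T10:00:00Z"));
        System.out.println(dayTruncating(micros)); // prints 0
        System.out.println(dayFloor(micros));      // prints -1
    }
}
```

In this sketch a timestamp exactly on a day boundary, such as `1969-12-31T00:00:00Z`, yields -1 from both methods, so only non-boundary negative values are affected.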


I don't think it is true that duplicate data files are ignored. I thought it just fixed the encryption map problem. Can you double-check this? Also, I think that the duplication would need to be in a single task. We don't guarantee deduplication in split planning.

If a duplicate exists, the old code path uses an immutable map builder, which throws an exception. That is why it was changed to `putIfAbsent()` on a `HashMap` instead, so the duplicated location is ignored. And yes, the duplicates have to be in a single task; let me add that.
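A minimal, JDK-only Java sketch of the difference described in the reply above. `Collectors.toMap` stands in for the immutable map builder because it also rejects duplicate keys (with `IllegalStateException`); the class name, the `valueFor` helper, and the `s3://` locations are hypothetical, and this is not the actual Iceberg reader code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of strict vs. lenient indexing of file locations in one task.
class DuplicateLocationSketch {
    // Strict version: Collectors.toMap throws IllegalStateException on a duplicate key,
    // standing in here for an immutable map builder that rejects duplicates.
    static Map<String, String> strictIndex(List<String> locations) {
        return locations.stream()
            .collect(Collectors.toMap(loc -> loc, DuplicateLocationSketch::valueFor));
    }

    // Lenient version: putIfAbsent keeps the first entry for a location and ignores repeats.
    static Map<String, String> lenientIndex(List<String> locations) {
        Map<String, String> index = new HashMap<>();
        for (String loc : locations) {
            index.putIfAbsent(loc, valueFor(loc));
        }
        return index;
    }

    // Hypothetical per-file value that a reader might look up for each file location.
    static String valueFor(String location) {
        return "value-for-" + location;
    }
}
```

With the locations `a.parquet`, `a.parquet`, `b.parquet`, the strict version throws while the lenient version returns two entries. As noted in the review, this only covers duplicates within a single task; it does not deduplicate across split planning.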


rdblue · 2021-01-28T22:53:48Z

site/docs/releases.md

+* [\#1798](https://github.com/apache/iceberg/pull/1798) fixes read failure when encountering duplicate entries of data files. Spark and Flink readers can now ignore duplicated entries in data files.
+* [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in `CachingCatalog`. When a table is dropped, all the metadata tables associated with it are also invalidated in the cache.
+* [\#1960](https://github.com/apache/iceberg/pull/1960) fixes bug that ORC writer does not read metrics config and always use the default. Now customized metrics config is respected.
+* [\#1936](https://github.com/apache/iceberg/pull/1936) fixes parallelism setting in Flink. Before, the default Flink parallelism was used which cause performance issue or resource waste. Now the parallelism is set to the number of Iceberg read splits. 


This is also a new feature, not a bug. I'm not sure whether I consider it notable or not.
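The #1785 entry in the hunk above says that dropping a table now also invalidates its metadata tables in `CachingCatalog`. The Java sketch below shows that idea only under loud assumptions: a hypothetical cache keyed by dotted table names (`db.tbl`, `db.tbl.snapshots`) and a hypothetical, partial list of metadata table suffixes. It is not the actual `CachingCatalog` implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache illustrating invalidation of a table together with its metadata tables.
class CachingCatalogSketch {
    // Hypothetical, partial list of metadata table suffixes, for illustration only.
    static final List<String> METADATA_TABLES = Arrays.asList("snapshots", "files", "history", "manifests");

    // Hypothetical cache keyed by fully qualified table name.
    final Map<String, Object> cache = new HashMap<>();

    void invalidateTable(String tableName) {
        // Drop the cached base table entry...
        cache.remove(tableName);
        // ...and every metadata table derived from it, so none of them outlive the dropped table.
        for (String suffix : METADATA_TABLES) {
            cache.remove(tableName + "." + suffix);
        }
    }
}
```

Entries for unrelated tables, such as `db.other`, are left in the cache.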

rdblue · 2021-01-28T22:54:14Z

site/docs/releases.md


 Other notable changes:

+* PrestoSQL is renamed to [Trino](https://trino.io/)


This isn't an Iceberg change, so I would remove it. It is also not the most important change so I would not put it first.


jackye1995 · 2021-01-28T23:24:11Z

@rdblue All comments should now be addressed. Please let me know if you have any further comments, thank you!

rdblue · 2021-01-29T01:19:31Z

Thanks, @jackye1995! I'll deploy this now.

github-actions bot added the docs label Jan 28, 2021

Jack Ye added 2 commits January 28, 2021 14:42

021623c Doc: fix miscellaneous comments and add more details in release note for 0.11

05d0ff0 fix typo

rdblue reviewed Jan 28, 2021


site/docs/releases.md (resolved)

rdblue reviewed Jan 28, 2021


site/docs/releases.md (resolved)

Jack Ye added 3 commits January 28, 2021 15:09

50c82b8 fix based on comments

3c7ac32 update comments

c893431 update comments

rdblue approved these changes Jan 29, 2021


rdblue merged commit b4f73d2 into apache:master Jan 29, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Doc: fix miscellaneous comments and add more details in release note for 0.11#2178

Doc: fix miscellaneous comments and add more details in release note for 0.11#2178
rdblue merged 5 commits into apache:master from jackye1995:11-doc-patch

jackye1995 commented Jan 28, 2021

Uh oh!

rdblue Jan 28, 2021

Uh oh!

rdblue Jan 28, 2021

Uh oh!

rdblue Jan 28, 2021

Uh oh!

rdblue Jan 28, 2021

Uh oh!

rdblue Jan 28, 2021

Uh oh!

rdblue Jan 28, 2021 •

edited

Loading

Uh oh!

jackye1995 Jan 28, 2021

Uh oh!

Uh oh!

rdblue Jan 28, 2021

Uh oh!

rdblue Jan 28, 2021

Uh oh!

Uh oh!

jackye1995 commented Jan 28, 2021

Uh oh!

rdblue commented Jan 29, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		As you can see, In the shell command, we use `--packages` to specify the additional AWS bundle and HTTP client dependencies with their version as `2.15.40`.

		For integration with other engines such as Flink, please read their engine documentation pages that explain loading a custom catalog.


		### Create through YAML config

		Catalog can also be registered in `sql-client-defaults.yaml` before starting the SQL client. Here is an example:


		Other notable changes:

		* PrestoSQL is renamed to [Trino](https://trino.io/)

Conversation

jackye1995 commented Jan 28, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rdblue Jan 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jackye1995 commented Jan 28, 2021

Uh oh!

rdblue commented Jan 29, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rdblue Jan 28, 2021 •

edited

Loading