Doc: fix miscellaneous comments and add more details in release note for 0.11 #2178

Merged
rdblue merged 5 commits into apache:master from jackye1995:11-doc-patch
Jan 29, 2021

Conversation

@jackye1995
Contributor

@rdblue As we discussed in Slack, this adds more details for the bug fixes in 0.11. It also fixes some typos and adds more details for Flink, as suggested by #2168.

@github-actions github-actions bot added the docs label Jan 28, 2021
site/docs/aws.md Outdated

As you can see, In the shell command, we use `--packages` to specify the additional AWS bundle and HTTP client dependencies with their version as `2.15.40`.

For integration with other engines such as Flink, please read their engine documentation pages that explain loading a custom catalog.
Contributor

I think it should be "explains how to load".


### Create through YAML config

Catalog can also be registered in `sql-client-defaults.yaml` before starting the SQL client. Here is an example:
Contributor

Typo: should be "Catalogs can be ..."

* [\#1798](https://github.com/apache/iceberg/pull/1798) fixes read failure when encountering duplicate entries of data files
* [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in CachingCatalog
* [\#2091](https://github.com/apache/iceberg/pull/2091) fixes `ClassCastException` for type promotion `int` to `long` or `float` to `double` during Parquet vectorized read. Now Arrow vector is created by looking at Parquet file schema instead of Iceberg schema for `int` and `float` fields.
* [\#2031](https://github.com/apache/iceberg/pull/2031) fixes bug in Flink that custom catalog property causes catalog initialization failure. Now Flink catalog can support arbitrary custom catalog properties.
Contributor

I don't think this is a bug because custom catalogs were not supported in 0.10.0. I wouldn't mention it here. These should be serious or correctness errors.

* [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in CachingCatalog
* [\#2091](https://github.com/apache/iceberg/pull/2091) fixes `ClassCastException` for type promotion `int` to `long` or `float` to `double` during Parquet vectorized read. Now Arrow vector is created by looking at Parquet file schema instead of Iceberg schema for `int` and `float` fields.
* [\#2031](https://github.com/apache/iceberg/pull/2031) fixes bug in Flink that custom catalog property causes catalog initialization failure. Now Flink catalog can support arbitrary custom catalog properties.
* [\#2011](https://github.com/apache/iceberg/pull/2011) fixes equality comparison for `BaseSnapshot`. For engines such as Beam that serialize snapshots, now snapshots can be compared through Java equality operator.
Contributor

I probably wouldn't include this. It was needed for Beam, but it wasn't needed for other engines before now, so it is not really a bug.

* [\#2031](https://github.com/apache/iceberg/pull/2031) fixes bug in Flink that custom catalog property causes catalog initialization failure. Now Flink catalog can support arbitrary custom catalog properties.
* [\#2011](https://github.com/apache/iceberg/pull/2011) fixes equality comparison for `BaseSnapshot`. For engines such as Beam that serialize snapshots, now snapshots can be compared through Java equality operator.
* [\#1998](https://github.com/apache/iceberg/pull/1998) fixes bug in `HiveTableOperation` that `unlock` is not called if new metadata cannot be deleted. Now it is guaranteed that `unlock` is always called for Hive catalog users.
* [\#1991](https://github.com/apache/iceberg/pull/1991) fixes Avro schema conversions to preserve field docs. Now field level documentation is also preserved when converting from Avro schemas to Iceberg schemas.
Contributor

This may be a notable feature update, but is not a serious bug that we need to draw attention to.

* [\#1991](https://github.com/apache/iceberg/pull/1991) fixes Avro schema conversions to preserve field docs. Now field level documentation is also preserved when converting from Avro schemas to Iceberg schemas.
* [\#1981](https://github.com/apache/iceberg/pull/1981) fixes bug that date and timestamp transforms were producing incorrect values for dates and times before 1970. Before the fix, negative values were incorrectly transformed by date and timestamp transforms to 1 larger than the correct value. For example, `day(1969-12-31 10:00:00)` produced 0 instead of -1. The fix is backwards compatible, which means predicate projection can still work with the incorrectly transformed partitions written using older versions.
* [\#1979](https://github.com/apache/iceberg/pull/1979) fixes table listing failure in Hadoop catalog when user does not have permission to some tables. Now the tables with no permission are ignored in listing.
* [\#1798](https://github.com/apache/iceberg/pull/1798) fixes read failure when encountering duplicate entries of data files. Spark and Flink readers can now ignore duplicated entries in data files.
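
The #1981 entry above describes an off-by-one for values before 1970. A minimal sketch of one way such an error can arise is shown below, assuming timestamps are stored as microseconds from the Unix epoch and that the day ordinal was derived with a division that rounds toward zero. The class and method names (`DayTransformSketch`, `truncatingDay`, `flooredDay`) are hypothetical and are not Iceberg's actual transform code.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical illustration only; not the Iceberg implementation.
class DayTransformSketch {
  static final long MICROS_PER_DAY = 86_400_000_000L;

  // Java's `/` rounds toward zero, so a timestamp a few hours before the
  // epoch maps to day 0 instead of day -1.
  static long truncatingDay(long microsFromEpoch) {
    return microsFromEpoch / MICROS_PER_DAY;
  }

  // Math.floorDiv rounds toward negative infinity, giving the expected ordinal.
  static long flooredDay(long microsFromEpoch) {
    return Math.floorDiv(microsFromEpoch, MICROS_PER_DAY);
  }

  // Converts a UTC date-time to microseconds from the epoch (whole seconds only).
  static long toMicros(LocalDateTime utc) {
    return utc.toEpochSecond(ZoneOffset.UTC) * 1_000_000L;
  }

  public static void main(String[] args) {
    long micros = toMicros(LocalDateTime.of(1969, 12, 31, 10, 0, 0));
    System.out.println("truncating: " + truncatingDay(micros)); // prints "truncating: 0"
    System.out.println("floored:    " + flooredDay(micros));    // prints "floored:    -1"
  }
}
```

The two methods agree for every timestamp at or after the epoch, which is consistent with the note that only pre-1970 values were affected.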
Contributor

@rdblue Jan 28, 2021

I don't think it is true that duplicate data files are ignored. I thought it just fixed the encryption map problem. Can you double-check this? Also, I think that the duplication would need to be in a single task. We don't guarantee deduplication in split planning.

Contributor Author

If a duplicate exists, the old code path uses an immutable map builder, which throws an exception. That is why it was changed to `hashMap.putIfAbsent()` instead, so the duplicated location is ignored. Yes, it is in a single task; let me add that.
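
The difference described in this reply can be sketched with JDK classes only. This is an illustration under assumptions, not the code from #1798: `Collectors.toMap` stands in for the strict map builder because it also rejects duplicate keys, and the class name, method names, file locations, and `"metadata:"` values are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the code changed in the PR.
class DuplicateLocationSketch {
  // Strict build: Collectors.toMap without a merge function throws
  // IllegalStateException when the same key appears twice.
  static boolean strictBuildFails(List<String> locations) {
    try {
      locations.stream().collect(Collectors.toMap(l -> l, l -> "metadata:" + l));
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  // Lenient build: putIfAbsent keeps the first value for a key and
  // silently skips any later duplicate.
  static Map<String, String> lenientBuild(List<String> locations) {
    Map<String, String> result = new HashMap<>();
    for (String location : locations) {
      result.putIfAbsent(location, "metadata:" + location);
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up locations; the first one is listed twice within one task.
    List<String> locations = Arrays.asList(
        "s3://bucket/a.parquet", "s3://bucket/b.parquet", "s3://bucket/a.parquet");
    System.out.println("strict build fails: " + strictBuildFails(locations)); // true
    System.out.println("lenient entries: " + lenientBuild(locations).size()); // 2
  }
}
```

As noted in the thread, this only covers duplicates that reach the same map, that is, within a single task; it says nothing about deduplication across splits.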

* [\#1798](https://github.com/apache/iceberg/pull/1798) fixes read failure when encountering duplicate entries of data files. Spark and Flink readers can now ignore duplicated entries in data files.
* [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in `CachingCatalog`. When a table is dropped, all the metadata tables associated with it are also invalidated in the cache.
* [\#1960](https://github.com/apache/iceberg/pull/1960) fixes bug that ORC writer does not read metrics config and always use the default. Now customized metrics config is respected.
* [\#1936](https://github.com/apache/iceberg/pull/1936) fixes parallelism setting in Flink. Before, the default Flink parallelism was used which cause performance issue or resource waste. Now the parallelism is set to the number of Iceberg read splits.
Contributor

This is also a new feature, not a bug. I'm not sure whether I consider it notable or not.


Other notable changes:

* PrestoSQL is renamed to [Trino](https://trino.io/)
Contributor

This isn't an Iceberg change, so I would remove it. It is also not the most important change so I would not put it first.

@jackye1995
Contributor Author

@rdblue All comments should now be addressed. Please let me know if you have any further comments, thank you!

@rdblue rdblue merged commit b4f73d2 into apache:master Jan 29, 2021
@rdblue
Contributor

rdblue commented Jan 29, 2021

Thanks, @jackye1995! I'll deploy this now.
