Doc: fix miscellaneous comments and add more details in release note for 0.11 (#2178)

rdblue merged 5 commits into apache:master from jackye1995:11-doc-patch
Conversation
site/docs/aws.md
> As you can see, in the shell command, we use `--packages` to specify the additional AWS bundle and HTTP client dependencies with their version as `2.15.40`.
> For integration with other engines such as Flink, please read their engine documentation pages that explain loading a custom catalog.
I think it should be "explains how to load".
site/docs/flink.md
> ### Create through YAML config
> Catalog can also be registered in `sql-client-defaults.yaml` before starting the SQL client. Here is an example:
Typo: should be "Catalogs can be ..."
site/docs/releases.md
> * [\#1798](https://github.com/apache/iceberg/pull/1798) fixes read failure when encountering duplicate entries of data files
> * [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in CachingCatalog
> * [\#2091](https://github.com/apache/iceberg/pull/2091) fixes `ClassCastException` for type promotion `int` to `long` or `float` to `double` during Parquet vectorized read. Now Arrow vector is created by looking at Parquet file schema instead of Iceberg schema for `int` and `float` fields.
> * [\#2031](https://github.com/apache/iceberg/pull/2031) fixes bug in Flink that custom catalog property causes catalog initialization failure. Now Flink catalog can support arbitrary custom catalog properties.
I don't think this is a bug because custom catalogs were not supported in 0.10.0. I wouldn't mention it here. These should be serious or correctness errors.
site/docs/releases.md
> * [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in CachingCatalog
> * [\#2091](https://github.com/apache/iceberg/pull/2091) fixes `ClassCastException` for type promotion `int` to `long` or `float` to `double` during Parquet vectorized read. Now Arrow vector is created by looking at Parquet file schema instead of Iceberg schema for `int` and `float` fields.
> * [\#2031](https://github.com/apache/iceberg/pull/2031) fixes bug in Flink that custom catalog property causes catalog initialization failure. Now Flink catalog can support arbitrary custom catalog properties.
> * [\#2011](https://github.com/apache/iceberg/pull/2011) fixes equality comparison for `BaseSnapshot`. For engines such as Beam that serialize snapshots, now snapshots can be compared through Java equality operator.
I probably wouldn't include this. It was needed for Beam, but it wasn't needed for other engines before now, so it is not really a bug.
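For context on the #2011 entry discussed here, the underlying issue can be sketched with a hypothetical snapshot-like value class (this is an illustrative stand-in, not Iceberg's actual `BaseSnapshot`): once an engine serializes and deserializes snapshots, the two copies are distinct objects, so comparing them only works if `equals` and `hashCode` are overridden to use value equality.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a snapshot-like value object (not Iceberg's
// BaseSnapshot). Without the equals/hashCode overrides below, a serialized
// and deserialized copy would never compare equal to the original.
public class SnapshotDemo implements Serializable {
  final long snapshotId;

  SnapshotDemo(long snapshotId) {
    this.snapshotId = snapshotId;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof SnapshotDemo && ((SnapshotDemo) o).snapshotId == this.snapshotId;
  }

  @Override
  public int hashCode() {
    return Long.hashCode(snapshotId);
  }

  // Serialize then deserialize, as an engine shipping snapshots
  // between workers would.
  static SnapshotDemo roundTrip(SnapshotDemo s) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      ObjectOutputStream oos = new ObjectOutputStream(bos);
      oos.writeObject(s);
      oos.flush();
      ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
      return (SnapshotDemo) ois.readObject();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    SnapshotDemo original = new SnapshotDemo(42L);
    SnapshotDemo copy = roundTrip(original);
    System.out.println(original == copy);      // false: different objects
    System.out.println(original.equals(copy)); // true: value equality
  }
}
```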
site/docs/releases.md
> * [\#2031](https://github.com/apache/iceberg/pull/2031) fixes bug in Flink that custom catalog property causes catalog initialization failure. Now Flink catalog can support arbitrary custom catalog properties.
> * [\#2011](https://github.com/apache/iceberg/pull/2011) fixes equality comparison for `BaseSnapshot`. For engines such as Beam that serialize snapshots, now snapshots can be compared through Java equality operator.
> * [\#1998](https://github.com/apache/iceberg/pull/1998) fixes bug in `HiveTableOperation` that `unlock` is not called if new metadata cannot be deleted. Now it is guaranteed that `unlock` is always called for Hive catalog users.
> * [\#1991](https://github.com/apache/iceberg/pull/1991) fixes Avro schema conversions to preserve field docs. Now field level documentation is also preserved when converting from Avro schemas to Iceberg schemas.
This may be a notable feature update, but is not a serious bug that we need to draw attention to.
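As a side note on the #1998 entry in the hunk above, the fixed pattern amounts to releasing the lock in a `finally` block. A minimal sketch with illustrative names (this is not the actual `HiveTableOperations` code): when deleting the new metadata throws, an `unlock` call placed after it is skipped, while one in `finally` is guaranteed to run.

```java
// Illustrative names only; not the actual HiveTableOperations code.
public class UnlockDemo {
  static int unlockCount = 0;

  static void lock() {}

  static void unlock() {
    unlockCount++;
  }

  // Simulates the failing cleanup step from the release note.
  static void deleteNewMetadata() {
    throw new RuntimeException("new metadata cannot be deleted");
  }

  // Pre-fix shape: the exception skips the unlock() call,
  // leaving the Hive lock held.
  static void commitWithoutFinally() {
    lock();
    deleteNewMetadata(); // throws, so unlock below never runs
    unlock();
  }

  // Fixed shape: unlock() in finally runs even on failure.
  static void commitWithFinally() {
    lock();
    try {
      deleteNewMetadata();
    } finally {
      unlock();
    }
  }
}
```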
site/docs/releases.md
> * [\#1991](https://github.com/apache/iceberg/pull/1991) fixes Avro schema conversions to preserve field docs. Now field level documentation is also preserved when converting from Avro schemas to Iceberg schemas.
> * [\#1981](https://github.com/apache/iceberg/pull/1981) fixes bug that date and timestamp transforms were producing incorrect values for dates and times before 1970. Before the fix, negative values were incorrectly transformed by date and timestamp transforms to 1 larger than the correct value. For example, `day(1969-12-31 10:00:00)` produced 0 instead of -1. The fix is backwards compatible, which means predicate projection can still work with the incorrectly transformed partitions written using older versions.
> * [\#1979](https://github.com/apache/iceberg/pull/1979) fixes table listing failure in Hadoop catalog when user does not have permission to some tables. Now the tables with no permission are ignored in listing.
> * [\#1798](https://github.com/apache/iceberg/pull/1798) fixes read failure when encountering duplicate entries of data files. Spark and Flink readers can now ignore duplicated entries in data files.
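The `day(1969-12-31 10:00:00)` example in the #1981 entry quoted above comes down to integer division semantics. A minimal sketch, not Iceberg's actual transform code: Java's `/` truncates toward zero, so timestamps just before the epoch map to day 0 instead of day -1, while `Math.floorDiv` rounds toward negative infinity and gives the correct bucket.

```java
// Sketch of the negative-timestamp day-transform bug (not Iceberg code).
public class DayTransformDemo {
  static final long SECONDS_PER_DAY = 86_400L;

  // Truncating division rounds toward zero, so negative timestamps
  // land one day too high: the pre-fix behavior.
  static long truncatedDays(long epochSeconds) {
    return epochSeconds / SECONDS_PER_DAY;
  }

  // Floor division rounds toward negative infinity, giving the
  // correct day ordinal for pre-1970 timestamps.
  static long flooredDays(long epochSeconds) {
    return Math.floorDiv(epochSeconds, SECONDS_PER_DAY);
  }

  public static void main(String[] args) {
    long ts = -50_400L; // 1969-12-31 10:00:00 UTC, 14 hours before epoch
    System.out.println(truncatedDays(ts)); // 0  (wrong day)
    System.out.println(flooredDays(ts));   // -1 (1969-12-31)
  }
}
```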
I don't think it is true that duplicate data files are ignored. I thought it just fixed the encryption map problem. Can you double-check this? Also, I think that the duplication would need to be in a single task. We don't guarantee deduplication in split planning.
If a duplicate exists, the old code path uses an immutable map builder, which would throw an exception. That is why it was changed to `hashMap.putIfAbsent()` instead, and the duplicated location is ignored. Yes, it is in a single task; let me add that.
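The behavior difference described in this reply can be sketched with plain JDK maps. Here `Map.ofEntries` stands in for the immutable map builder (Guava's `ImmutableMap.Builder` likewise rejects duplicate keys); the file path and key names are illustrative, not taken from the actual change.

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateKeyDemo {
  // Mirrors the old behavior: immutable map construction throws
  // IllegalArgumentException when the same key appears twice.
  static Map<String, String> immutableStyle() {
    return Map.ofEntries(
        Map.entry("file-a.parquet", "key-1"),
        Map.entry("file-a.parquet", "key-1")); // duplicate entry -> throws
  }

  // Mirrors the fixed behavior: putIfAbsent keeps the first mapping
  // and silently ignores the duplicate within the same task.
  static Map<String, String> putIfAbsentStyle() {
    Map<String, String> m = new HashMap<>();
    m.putIfAbsent("file-a.parquet", "key-1");
    m.putIfAbsent("file-a.parquet", "key-1"); // ignored
    return m;
  }

  public static void main(String[] args) {
    try {
      immutableStyle();
      System.out.println("no exception");
    } catch (IllegalArgumentException e) {
      System.out.println("duplicate rejected"); // printed
    }
    System.out.println("size=" + putIfAbsentStyle().size()); // size=1
  }
}
```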
site/docs/releases.md
> * [\#1798](https://github.com/apache/iceberg/pull/1798) fixes read failure when encountering duplicate entries of data files. Spark and Flink readers can now ignore duplicated entries in data files.
> * [\#1785](https://github.com/apache/iceberg/pull/1785) fixes invalidation of metadata tables in `CachingCatalog`. When a table is dropped, all the metadata tables associated with it are also invalidated in the cache.
> * [\#1960](https://github.com/apache/iceberg/pull/1960) fixes bug that ORC writer does not read metrics config and always use the default. Now customized metrics config is respected.
> * [\#1936](https://github.com/apache/iceberg/pull/1936) fixes parallelism setting in Flink. Before, the default Flink parallelism was used which cause performance issue or resource waste. Now the parallelism is set to the number of Iceberg read splits.
This is also a new feature, not a bug. I'm not sure whether I consider it notable or not.
site/docs/releases.md
> Other notable changes:
> * PrestoSQL is renamed to [Trino](https://trino.io/)
This isn't an Iceberg change, so I would remove it. It is also not the most important change so I would not put it first.
@rdblue All comments should now be addressed; please let me know if you have any further comments, thank you!
Thanks, @jackye1995! I'll deploy this now.
@rdblue As we discussed in Slack, this adds more details for the bug fixes in 0.11. It also fixes some typos and adds more details for Flink, as suggested in #2168.