diff --git a/_posts/2024-04-20-16.0.0-release.md b/_posts/2024-04-20-16.0.0-release.md
index 74a4843ea983..142344c5efef 100644
--- a/_posts/2024-04-20-16.0.0-release.md
+++ b/_posts/2024-04-20-16.0.0-release.md
@@ -55,6 +55,80 @@ Thanks for your contributions and participation in the project!
 - Java: tweak some options to give better performance ([GH-40475](https://github.com/apache/arrow/issues/40745), [GH-40039](https://github.com/apache/arrow/issues/40039))
 ## C++ notes
+For C++ notes refer to the full changelog.
+
+## Highlights
+
+- Initial support for Azure Blob Storage has been added ([GH-18014](https://github.com/apache/arrow/issues/18014)).
+- Arrow C++ can now be built with Emscripten ([GH-37821](https://github.com/apache/arrow/pull/37821)), which lays the foundation for running Arrow C++ under WASM runtimes and eventually [PyArrow](https://github.com/apache/arrow/pull/37822) as well.
+- Arrow's filesystem modules have been separated out into individual libraries, which enables writing and registering custom filesystem implementations ([GH-38309](https://github.com/apache/arrow/issues/38309)).
+
+## Breaking Changes
+
+- `Function::is_impure` has been renamed to `is_pure` ([GH-40607](https://github.com/apache/arrow/issues/40607)).
+
+## Compute
+
+### Bug Fixes
+
+- Fixed a potential crash when accessing the `true_count` property on a BooleanArray ([GH-41016](https://github.com/apache/arrow/issues/41016)).
+
+### Performance Improvements
+
+- Significantly improved performance of the take kernel on certain types of inputs ([GH-40207](https://github.com/apache/arrow/issues/40207)).
+
+### Enhancements
+
+- Support for casting to and from half-float (float16) has been added ([GH-20213](https://github.com/apache/arrow/issues/20213)).
+- Added support for residual predicates to the Swiss Join implementation ([GH-20339](https://github.com/apache/arrow/issues/20339)).
+- Expanded the primitive filter implementation to support all fixed-width primitive types and the take filter implementation to support all well-known fixed-width types ([GH-39740](https://github.com/apache/arrow/issues/39740)).
+- Added support for calling the `binary_slice` kernel on Fixed-Size Binary Arrays ([GH-39231](https://github.com/apache/arrow/issues/39231)).
+- The cast kernel now supports casting from LargeString, Binary, and LargeBinary to Dictionary ([GH-39463](https://github.com/apache/arrow/issues/39463)).
+- Fields of different decimal precision can now be used together in arithmetic operations without an explicit cast beforehand ([GH-40126](https://github.com/apache/arrow/issues/40126)).
+
+## Datasets
+
+- Improved backpressure handling in the Dataset Writer, which can significantly reduce memory usage for some use cases ([GH-40722](https://github.com/apache/arrow/pull/40722)).
+
+## Parquet
+
+- Byte stream split encoding support has been added for FIXED_LEN_BYTE_ARRAY, INT32, and INT64, which enables this encoding for half-float (float16) and fixed-width decimal ([GH-39978](https://github.com/apache/arrow/issues/39978)).
+- Decoding boolean values has been made faster for a variety of cases ([GH-40872](https://github.com/apache/arrow/issues/40872)).
+
+## Filesystems
+
+### New Features
+
+- In addition to building the individual filesystem implementations as separate modules, users can now write and register custom filesystem implementations ([GH-38309](https://github.com/apache/arrow/issues/38309)).
+- A new environment variable, `AWS_ENDPOINT_URL_S3`, has been added, which allows overriding the endpoint for S3 operations alone ([GH-38663](https://github.com/apache/arrow/issues/38663)).
+
+### Bug Fixes
+
+- Fixed a bug in the S3 filesystem implementation that could cause a crash when deleting an object having duplicate forward slashes in its name ([GH-38821](https://github.com/apache/arrow/issues/38821)).
+- Fixed a bug where `hash_mean` could silently overflow ([GH-38833](https://github.com/apache/arrow/issues/38833)).
+
+### Improvements
+
+- The S3 implementation now sets the content-type of directory-like objects to application/x-directory to improve compatibility with other S3 tools ([GH-38794](https://github.com/apache/arrow/issues/38794)).
+- Repeated S3Client initialization is now roughly an order of magnitude faster ([GH-40299](https://github.com/apache/arrow/pull/40299)).
+- The MemoryPoolStats implementation has been reworked to re-order loads and stores, which may improve performance for some allocation-heavy, multi-threaded applications ([GH-40783](https://github.com/apache/arrow/issues/40783)).
+
+### Substrait
+
+- Support has been added to Substrait for a variety of Arrow types ([GH-40695](https://github.com/apache/arrow/issues/40695)).
+- substrait-cpp has been upgraded to 0.44 ([GH-40695](https://github.com/apache/arrow/issues/40695)).
+
+## Development
+
+- Added support for the mold and lld linkers for building Arrow C++ ([GH-40394](https://github.com/apache/arrow/issues/40394), [GH-40400](https://github.com/apache/arrow/issues/40400)).
+
+### Miscellaneous
+
+- Upgraded ORC to 2.0.0 ([GH-40507](https://github.com/apache/arrow/issues/40507)).
+- Upgraded zstd to 1.5.6 ([GH-40837](https://github.com/apache/arrow/pull/40837)).
+- Upgraded Google Benchmark to 1.8.3 ([GH-39863](https://github.com/apache/arrow/issues/39863)).
+- Upgraded zlib to 1.3.1 ([GH-39876](https://github.com/apache/arrow/issues/39876)).
+- Various ToString methods now support an optional `show_metadata` argument which will print metadata that may exist in nested types ([GH-39864](https://github.com/apache/arrow/issues/39864)).
 ### Parquet