Update _posts/2024-04-20-16.0.0-release.md
Co-authored-by: Bryce Mecum <petridish@gmail.com>
raulcd and amoeba committed Apr 29, 2024
1 parent 4cb3eb1 commit 68499a4
Showing 1 changed file with 74 additions and 0 deletions.
74 changes: 74 additions & 0 deletions _posts/2024-04-20-16.0.0-release.md
Expand Up @@ -55,6 +55,80 @@ Thanks for your contributions and participation in the project!
- Java: tweak some options to give better performance ([GH-40475](https://github.com/apache/arrow/issues/40745), [GH-40039](https://github.com/apache/arrow/issues/40039))

## C++ notes
For C++ notes refer to the full changelog.

## Highlights

- Initial support for Azure Blob Storage has been added ([GH-18014](https://github.com/apache/arrow/issues/18014)).
- Arrow C++ can now be built with Emscripten ([GH-37821](https://github.com/apache/arrow/pull/37821)) which lays the foundation for running Arrow C++ under WASM runtimes and eventually [PyArrow](https://github.com/apache/arrow/pull/37822) as well.
- Arrow's filesystem modules have been separated out into individual libraries and this change enables writing and registering custom filesystem implementations ([GH-38309](https://github.com/apache/arrow/issues/38309)).

## Breaking Changes

- `Function::is_impure` has been renamed to `is_pure` ([GH-40607](https://github.com/apache/arrow/issues/40607)).

## Compute

### Bug Fixes

- Fixed a potential crash when accessing the `true_count` property on a BooleanArray ([GH-41016](https://github.com/apache/arrow/issues/41016)).

### Performance Improvements

- Significantly improved performance of the take kernel on certain types of inputs ([GH-40207](https://github.com/apache/arrow/issues/40207)).

### Enhancements

- Support for casting to and from half-float (float16) has been added ([GH-20213](https://github.com/apache/arrow/issues/20213)).
- Added support for residual predicates to Swiss Join implementation ([GH-20339](https://github.com/apache/arrow/issues/20339)).
- Extended the primitive filter implementation to all fixed-width primitive types and the take filter implementation to all well-known fixed-width types ([GH-39740](https://github.com/apache/arrow/issues/39740)).
- Added support for calling the `binary_slice` kernel on Fixed-Size Binary Arrays ([GH-39231](https://github.com/apache/arrow/issues/39231)).
- The cast kernel now supports casting from LargeString, Binary, and LargeBinary to Dictionary ([GH-39463](https://github.com/apache/arrow/issues/39463)).
- Fields of different decimal precision can now be used together in arithmetic operations without an explicit cast beforehand ([GH-40126](https://github.com/apache/arrow/issues/40126)).
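
To give a rough sense of what a half-float cast implies for precision, here is a minimal Python sketch that uses only the standard library's `struct` module, not Arrow's cast kernel. The helper name `to_half` is hypothetical. It round-trips a value through the IEEE 754 binary16 layout; Arrow's own cast may handle rounding or out-of-range values differently.

```python
import struct

def to_half(value: float) -> float:
    """Round-trip a Python float through IEEE 754 binary16 (half-float)."""
    # "e" is struct's format code for a 2-byte half-precision float.
    packed = struct.pack("<e", value)
    return struct.unpack("<e", packed)[0]

exact = to_half(1.0)    # 1.0 is exactly representable in float16
rounded = to_half(0.1)  # 0.1 is rounded to the nearest float16 value
print(exact, rounded)
```

Values that float16 cannot represent exactly come back slightly changed, which is worth keeping in mind when casting wider floating-point columns down to half-float.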

## Datasets

- Improved backpressure handling in the Dataset Writer, which can significantly reduce memory usage for some use cases ([GH-40722](https://github.com/apache/arrow/pull/40722)).
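
As a generic illustration of why backpressure bounds memory, the following Python sketch pushes batches through a bounded queue so the producer blocks whenever the writer falls behind. It is not the Dataset Writer's implementation; `run_pipeline` and `max_pending` are hypothetical names chosen for this example.

```python
import queue
import threading

def run_pipeline(batches, max_pending=2):
    """Write batches through a bounded queue; return (written, peak depth)."""
    pending = queue.Queue(maxsize=max_pending)  # the bound is the backpressure limit
    written = []
    peak = 0

    def writer():
        while True:
            batch = pending.get()
            if batch is None:  # sentinel: the producer is finished
                break
            written.append(batch)

    thread = threading.Thread(target=writer)
    thread.start()
    for batch in batches:
        pending.put(batch)  # blocks while the queue is full
        peak = max(peak, pending.qsize())
    pending.put(None)
    thread.join()
    return written, peak

written, peak = run_pipeline(list(range(100)), max_pending=2)
```

However fast the producer runs, at most `max_pending` batches wait in memory at once, which is the effect a backpressure mechanism aims for.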

## Parquet

- Byte stream split encoding support has been added for FIXED_LEN_BYTE_ARRAY, INT32, and INT64 which enables this encoding for half-float (float16) and fixed-width decimal ([GH-39978](https://github.com/apache/arrow/issues/39978)).
- Decoding boolean values has been made faster for a variety of cases ([GH-40872](https://github.com/apache/arrow/issues/40872)).
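
The byte stream split layout itself can be sketched in a few lines of Python: byte `i` of every fixed-width value goes into stream `i`, and the streams are concatenated. This is a minimal illustration with hypothetical helper names, not Arrow's Parquet encoder, and it uses float16 values only as sample fixed-width data.

```python
import struct

def byte_stream_split(data: bytes, width: int) -> bytes:
    """Scatter byte i of every width-byte value into stream i, then concatenate."""
    if len(data) % width != 0:
        raise ValueError("data length must be a multiple of width")
    return b"".join(data[i::width] for i in range(width))

def byte_stream_merge(encoded: bytes, width: int) -> bytes:
    """Inverse of byte_stream_split: interleave the streams back into values."""
    count = len(encoded) // width
    streams = [encoded[i * count:(i + 1) * count] for i in range(width)]
    return bytes(b for value in zip(*streams) for b in value)

values = struct.pack("<4e", 1.0, 1.5, 2.0, 2.5)  # four float16 values, 2 bytes each
encoded = byte_stream_split(values, width=2)
decoded = byte_stream_merge(encoded, width=2)
```

The encoding does not shrink the data by itself; grouping bytes of equal significance tends to make the result more compressible by the codec applied afterwards.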

## Filesystems

### New Features

- The individual filesystem implementations are now built as separate modules, and users can write and register custom filesystem implementations ([GH-38309](https://github.com/apache/arrow/issues/38309)).
- A new environment variable, `AWS_ENDPOINT_URL_S3`, has been added which allows separately overriding the endpoint for S3 operations alone ([GH-38663](https://github.com/apache/arrow/issues/38663)).
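
A minimal Python sketch of how such an override might be resolved is shown below. It assumes that a general `AWS_ENDPOINT_URL` variable also exists and that the S3-specific variable takes precedence over it; both points are assumptions of this example rather than statements from the notes above, and `resolve_s3_endpoint` is a hypothetical helper, not an Arrow API.

```python
import os

def resolve_s3_endpoint(env=os.environ):
    """Pick an S3 endpoint override from the environment, or None."""
    # Assumption: the S3-specific variable wins over the general one.
    for name in ("AWS_ENDPOINT_URL_S3", "AWS_ENDPOINT_URL"):
        value = env.get(name)
        if value:
            return value
    return None

# Hypothetical local endpoint, e.g. an S3-compatible test server.
example_env = {"AWS_ENDPOINT_URL_S3": "http://localhost:9000"}
endpoint = resolve_s3_endpoint(example_env)
```

Passing the environment as a mapping keeps the lookup easy to test without mutating the real process environment.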

### Bug Fixes

- Fixed a bug in the S3 filesystem implementation that could cause a crash when deleting an object having duplicate forward slashes in its name ([GH-38821](https://github.com/apache/arrow/issues/38821)).
- Fixed a bug where `hash_mean` could silently overflow ([GH-38833](https://github.com/apache/arrow/issues/38833)).

### Improvements

- The S3 implementation now sets the content-type of directory-like objects to application/x-directory to improve compatibility with other S3 tools ([GH-38794](https://github.com/apache/arrow/issues/38794)).
- Repeated S3Client initialization is now roughly an order of magnitude faster ([GH-40299](https://github.com/apache/arrow/pull/40299)).
- The MemoryPoolStats implementation has been reworked to re-order loads and stores which may be an improvement for some allocation-heavy, multi-threaded applications ([GH-40783](https://github.com/apache/arrow/issues/40783)).

## Substrait

- Support has been added to Substrait for a variety of Arrow types ([GH-40695](https://github.com/apache/arrow/issues/40695)).
- substrait-cpp has been upgraded to 0.44 ([GH-40695](https://github.com/apache/arrow/issues/40695)).

## Development

- Added support for the mold and lld linkers for building Arrow C++ ([GH-40394](https://github.com/apache/arrow/issues/40394), [GH-40400](https://github.com/apache/arrow/issues/40400)).

### Miscellaneous

- Upgraded ORC to 2.0.0 ([GH-40507](https://github.com/apache/arrow/issues/40507)).
- Upgraded zstd to 1.5.6 ([GH-40837](https://github.com/apache/arrow/pull/40837)).
- Upgraded google benchmark to 1.8.3 ([GH-39863](https://github.com/apache/arrow/issues/39863)).
- Upgraded zlib to 1.3.1 ([GH-39876](https://github.com/apache/arrow/issues/39876)).
- Various `ToString` methods now support an optional `show_metadata` argument, which prints any metadata present in nested types ([GH-39864](https://github.com/apache/arrow/issues/39864)).


### Parquet