-
Notifications
You must be signed in to change notification settings - Fork 4.1k
ARROW-5826: [Website] Blog post for 0.14.0 release announcement #4819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
6b82e82
Start markdownifying 0.14.0 blog post
wesm f9653fa
Finish tweaking formatting, links
wesm 1544da1
Attribute community
wesm 205f222
Fix markup
kou e47d69a
Fix a typo
kou c5d9cdc
Add Sparse representation and compression
kou 4dc02a2
Add Packaging section
kou f4f208a
Update site/_posts/2019-07-08-0.14.0-release.md
wesm 079c88c
Add discussion links
wesm File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,300 @@ | ||
| --- | ||
| layout: post | ||
| title: "Apache Arrow 0.14.0 Release" | ||
| date: "2019-07-02 00:00:00 -0600" | ||
| author: apache | ||
| categories: [release] | ||
| --- | ||
| <!-- | ||
| {% comment %} | ||
| Licensed to the Apache Software Foundation (ASF) under one or more | ||
| contributor license agreements. See the NOTICE file distributed with | ||
| this work for additional information regarding copyright ownership. | ||
| The ASF licenses this file to you under the Apache License, Version 2.0 | ||
| (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
| {% endcomment %} | ||
| --> | ||
|
|
||
| The Apache Arrow team is pleased to announce the 0.14.0 release. This | ||
| covers 3 months of development work and includes [**602 resolved | ||
| issues**][1] from [**75 distinct contributors**][2]. See the Install | ||
| Page to learn how to get the libraries for your platform. The | ||
| [complete changelog][3] is also available. | ||
|
|
||
| This post will give some brief highlights in the project since the | ||
| 0.13.0 release from April. | ||
|
|
||
| ## New committers | ||
|
|
||
| Since the 0.13.0 release, the following have been added: | ||
|
|
||
| * [Neville Dipale][5] was added as a committer | ||
| * [François Saint-Jacques][6] was added as a committer | ||
| * [Praveen Kumar][7] was added as a committer | ||
|
|
||
| Thank you for all your contributions! | ||
|
|
||
| ## Upcoming 1.0.0 Format Stability Release | ||
|
|
||
| We are planning for our next major release to move from 0.14.0 to | ||
| 1.0.0. The major version number will indicate stability of the Arrow | ||
| columnar format and binary protocol. While the format has already been | ||
| stable since December 2017, we believe it is a good idea to make this | ||
| stability official and to indicate that it is safe to persist | ||
| serialized Arrow data in applications. This means that applications | ||
| will be able to safely upgrade to new Arrow versions without having to | ||
| worry about backwards incompatibilities. We will write in a future | ||
| blog post about the stability guarantees we intend to provide to help | ||
| application developers plan accordingly. | ||
|
|
||
| ## Packaging | ||
|
|
||
| We added support for the following platforms: | ||
|
|
||
| * Debian GNU/Linux buster | ||
| * Ubuntu 19.04 | ||
|
|
||
| We dropped support for Ubuntu 14.04. | ||
|
|
||
| ## Development Infrastructure and Tooling | ||
|
|
||
| As the project has grown larger and more diverse, we are increasingly | ||
| outgrowing what we can test in public continuous integration services | ||
| like Travis CI and Appveyor. In addition, we share these resources | ||
| with the entire Apache Software Foundation, and given the high volume | ||
| of pull requests into Apache Arrow, maintainers are frequently waiting | ||
| many hours for the green light to merge patches. | ||
|
|
||
| The complexity of our testing is driven by the number of different | ||
| components and programming languages as well as increasingly long | ||
| compilation and test execution times as individual libraries grow | ||
| larger. The 50 minute time limit of public CI services is simply too | ||
| limited to comprehensively test the project. Additionally, the CI host | ||
| machines are constrained in their features and memory limits, | ||
| preventing us from testing features that are only relevant on large | ||
| amounts of data (10GB or more) or functionality that requires a | ||
| CUDA-enabled GPU. | ||
|
|
||
| Organizations that contribute to Apache Arrow are working on physical | ||
| build infrastructure and tools to improve build times and build | ||
| scalability. One such new tool is `ursabot`, a GitHub-enabled bot | ||
| that can be used to trigger builds either on physical build or in the | ||
| cloud. It can also be used to trigger benchmark timing comparisons. If | ||
| you are contributing to the project, you may see Ursabot being | ||
| employed to trigger tests in pull requests. | ||
|
|
||
| To help assist with migrating away from Travis CI, we are also working | ||
| to make as many of our builds reproducible with Docker and not reliant | ||
| on Travis CI-specific configuration details. This will also help | ||
| contributors reproduce build failures locally without having to wait | ||
| for Travis CI. | ||
|
|
||
| ## Columnar Format Notes | ||
|
|
||
| * User-defined "extension" types have been formalized in the Arrow | ||
| format, enabling library users to embed custom data types in the | ||
| Arrow columnar format. Initial support is available in C++, Java, | ||
| and Python. | ||
| * A new Duration logical type was added to represent absolute lengths | ||
| of time. | ||
|
|
||
| ## Arrow Flight notes | ||
|
|
||
| Flight now supports many of the features of a complete RPC | ||
| framework. | ||
|
|
||
| * Authentication APIs are now supported across all languages (ARROW-5137) | ||
| * Encrypted communication using OpenSSL is supported (ARROW-5643, | ||
| ARROW-5529) | ||
| * Clients can specify timeouts on remote calls (ARROW-5136) | ||
| * On the protocol level, endpoints are now identified with URIs, to | ||
| support an open-ended number of potential transports (including TLS | ||
| and Unix sockets, and perhaps even non-gRPC-based transports in the | ||
| future) (ARROW-4651) | ||
| * Application-defined metadata can be sent alongside data (ARROW-4626, | ||
| ARROW-4627). | ||
|
|
||
| Windows is now a supported platform for Flight in C++ and Python | ||
| (ARROW-3294), and Python wheels are shipped for all languages | ||
| (ARROW-3150, ARROW-5656). C++, Python, and Java have been brought to | ||
| parity, now that actions can return streaming results in Java | ||
| (ARROW-5254). | ||
|
|
||
| ## C++ notes | ||
|
|
||
| 188 resolved issues related to the C++ implementation, so we summarize | ||
| some of the work here. | ||
|
|
||
| ### General platform improvements | ||
|
|
||
| * A FileSystem abstraction (ARROW-767) has been added, which paves the | ||
| way for a future Arrow Datasets library allowing to access sharded | ||
| data on arbitrary storage systems, including remote or cloud | ||
| storage. A first draft of the Datasets API was committed in | ||
| ARROW-5512. Right now, this comes with no implementation, but we | ||
| expect to slowly build it up in the coming weeks or months. Early | ||
| feedback is welcome on this API. | ||
| * The dictionary API has been reworked in ARROW-3144. The dictionary | ||
| values used to be tied to the DictionaryType instance, which ended | ||
| up too inflexible. Since dictionary-encoding is more often an | ||
| optimization than a semantic property of the data, we decided to | ||
| move the dictionary values to the ArrayData structure, making it | ||
| natural for dictionary-encoded arrays to share the same DataType | ||
| instance, regardless of the encoding details. | ||
| * The FixedSizeList and Map types have been implemented, including in | ||
| integration tests. The Map type is akin to a List of Struct(key, | ||
| value) entries, but making it explicit that the underlying data has | ||
| key-value mapping semantics. Also, map entries are always non-null. | ||
| * A `Result<T>` class has been introduced in ARROW-4800. The aim is to | ||
| allow to return an error as w ell as a function's logical result | ||
| without resorting to pointer-out arguments. | ||
| * The Parquet C++ library has been refactored to use common Arrow IO | ||
| classes for improved C++ platform interoperability. | ||
|
|
||
| ### Line-delimited JSON reader | ||
|
|
||
| A multithreaded line-delimited JSON reader (powered internally by | ||
| RapidJSON) is now available for use (also in Python and R via | ||
| bindings) . This will likely be expanded to support more kinds of JSON | ||
| storage in the future. | ||
|
|
||
| ### New computational kernels | ||
|
|
||
| A number of new computational kernels have been developed | ||
|
|
||
| * Compare filter for logical comparisons yielding boolean arrays | ||
| * Filter kernel for selecting elements of an input array according to | ||
| a boolean selection array. | ||
| * Take kernel, which selects elements by integer index, has been | ||
| expanded to support nested types | ||
|
|
||
| ## C# Notes | ||
|
|
||
| The native C# implementation has continued to mature since 0.13. This | ||
| release includes a number of performance, memory use, and usability | ||
| improvements. | ||
|
|
||
| ## Go notes | ||
|
|
||
| Go's support for the Arrow columnar format continues to expand. Go now | ||
| supports reading and writing the Arrow columnar binary protocol, and | ||
| it has also been **added to the cross language integration | ||
| tests**. There are now four languages (C++, Go, Java, and JavaScript) | ||
| included in our integration tests to verify cross-language | ||
| interoperability. | ||
|
|
||
| ## Java notes | ||
|
|
||
| * Support for referencing arbitrary memory using `ArrowBuf` has been | ||
| implemented, paving the way for memory map support in Java | ||
| * A number of performance improvements around vector value access were | ||
| added (see ARROW-5264, ARROW-5290). | ||
| * The Map type has been implemented in Java and integration tested | ||
| with C++ | ||
| * Several microbenchmarks have been added and improved. Including a | ||
| significant speed-up of zeroing out buffers. | ||
| * A new algorithms package has been started to contain reference | ||
| implementations of common algorithms. The initial contribution is | ||
| for Array/Vector sorting. | ||
|
|
||
| ## JavaScript Notes | ||
|
|
||
| A new incremental [array builder API][4] is available. | ||
|
|
||
| ## MATLAB Notes | ||
|
|
||
| Version 0.14.0 features improved Feather file support in the MEX bindings. | ||
|
|
||
| ## Python notes | ||
|
|
||
| * We fixed a problem with the Python wheels causing the Python wheels | ||
| to be much larger in 0.13.0 than they were in 0.12.0. Since the | ||
| introduction of LLVM into our build toolchain, the wheels are going | ||
| to still be significantly bigger. We are interested in approaches to | ||
| enable pyarrow to be installed in pieces with pip or conda rather | ||
| than monolithically. | ||
| * It is now possible to define ExtensionTypes with a Python | ||
| implementation (ARROW-840). Those ExtensionTypes can survive a | ||
| roundtrip through C++ and serialization. | ||
| * The Flight improvements highlighted above (see C++ notes) are all | ||
| available from Python. Furthermore, Flight is now bundled in our | ||
| binary wheels and conda packages for Linux, Windows and macOS | ||
| (ARROW-3150, ARROW-5656). | ||
| * We will build "manylinux2010" binary wheels for Linux systems, in | ||
| addition to "manylinux1" wheels (ARROW-2461). Manylinux2010 is a | ||
| newer standard for more recent systems, with less limiting toolchain | ||
| constraints. Installing manylinux2010 wheels requires an up-to-date | ||
| version of pip. | ||
| * Various bug fixes for CSV reading in Python and C++ including the | ||
| ability to parse Decimal(x, y) columns. | ||
|
|
||
| ### Parquet improvements | ||
|
|
||
| * Column statistics for logical types like unicode strings, unsigned | ||
| integers, and timestamps are casted to compatible Python types (see | ||
| ARROW-4139) | ||
| * It's now possible to configure "data page" sizes when writing a file | ||
| from Python | ||
|
|
||
| ## Ruby and C GLib notes | ||
|
|
||
| The GLib and Ruby bindings have been tracking features in the C++ | ||
| project. This release includes bindings for Gandiva, JSON reader, and | ||
| other C++ features. | ||
|
|
||
| ## Rust notes | ||
|
|
||
| There is ongoing work in Rust happening on Parquet file support, | ||
| computational kernels, and the DataFusion query engine. See the full | ||
| changelog for details. | ||
|
|
||
| ## R notes | ||
|
|
||
| We have been working on build and packaging for R so that community | ||
| members can hopefully release the project to CRAN in the near | ||
| future. Feature development for R has continued to follow the upstream | ||
| C++ project. | ||
|
|
||
| ## Community Discussions Ongoing | ||
|
|
||
| There are a number of active discussions ongoing on the developer | ||
| dev@arrow.apache.org mailing list. We look forward to hearing from the | ||
| community there: | ||
|
|
||
| * [Timing and scope of 1.0.0 release][15] | ||
| * [Solutions to increase continuous integration capacity][13] | ||
| * [A proposal for versioning and forward/backward compatibility | ||
| guarantees for the 1.0.0 release][8] was shared, not much discussion has | ||
| occurred yet. | ||
| * [Addressing possible unaligned access and undefined behavior concerns][9] | ||
| in the Arrow binary protocol | ||
| * [Supporting smaller than 128-bit encoding of fixed width decimals][10] | ||
| * [Forking the Avro C++ implementation][11] so as to adapt it to Arrow's | ||
| needs | ||
| * [Sparse representation and compression in Arrow][12] | ||
| * [Flight extensions: middleware API and generalized Put operations][14] | ||
|
|
||
| [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.13.0 | ||
| [2]: https://arrow.apache.org/release/0.14.0.html#contributors | ||
| [3]: https://arrow.apache.org/release/0.14.0.html | ||
| [4]: https://github.com/apache/arrow/tree/master/js/src/builder | ||
| [5]: https://github.com/nevi-me | ||
| [6]: https://github.com/fsaintjacques | ||
| [7]: https://github.com/praveenbingo | ||
| [8]: https://lists.apache.org/thread.html/5715a4d402c835d22d929a8069c5c0cf232077a660ee98639d544af8@%3Cdev.arrow.apache.org%3E | ||
| [9]: https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E | ||
| [10]: https://lists.apache.org/thread.html/31b00086c2991104bd71fb1a2173f32b4a2f569d8e7b5b41e836f3a3@%3Cdev.arrow.apache.org%3E | ||
| [11]: https://lists.apache.org/thread.html/97d78112ab583eecb155a7d78342c1063df65d64ec3ccfa0b18737c3@%3Cdev.arrow.apache.org%3E | ||
| [12]: https://lists.apache.org/thread.html/a99124e57c14c3c9ef9d98f3c80cfe1dd25496bf3ff7046778add937@%3Cdev.arrow.apache.org%3E | ||
| [13]: https://lists.apache.org/thread.html/96b2e22606e8a7b0ad7dc4aae16f232724d1059b34636676ed971d40@%3Cdev.arrow.apache.org%3E | ||
| [14]: https://lists.apache.org/thread.html/82a7c026ad18dbe9fdbcffa3560979aff6fd86dd56a49f40d9cfb46e@%3Cdev.arrow.apache.org%3E | ||
| [15]: https://lists.apache.org/thread.html/44a7a3d256ab5dbd62da6fe45b56951b435697426bf4adedb6520907@%3Cdev.arrow.apache.org%3E | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.