From 6b82e825f2a2520503582d3871f818ea3a3b158e Mon Sep 17 00:00:00 2001
From: Wes McKinney
Date: Sun, 7 Jul 2019 17:30:30 -0500
Subject: [PATCH 1/9] Start markdownifying 0.14.0 blog post

---
 site/_data/contributors.yml              |   5 +-
 site/_posts/2019-07-08-0.14.0-release.md | 230 +++++++++++++++++++++++
 2 files changed, 234 insertions(+), 1 deletion(-)
 create mode 100644 site/_posts/2019-07-08-0.14.0-release.md

diff --git a/site/_data/contributors.yml b/site/_data/contributors.yml
index 95b18b2f0942..185a565abf60 100644
--- a/site/_data/contributors.yml
+++ b/site/_data/contributors.yml
@@ -16,10 +16,13 @@
 # Database of contributors to Apache Arrow (WIP)
 # Blogs and other pages use this data
 #
+- name: Apache Arrow Community
+  githubId: apache
+  homepage: https://arrow.apache.org
 - name: Wes McKinney
   apacheId: wesm
   githubId: wesm
-  homepage: http://wesmckinney.com
+  homepage: https://wesmckinney.com
   role: PMC
 - name: Uwe Korn
   apacheId: uwe
diff --git a/site/_posts/2019-07-08-0.14.0-release.md b/site/_posts/2019-07-08-0.14.0-release.md
new file mode 100644
index 000000000000..ab2effb4b35d
--- /dev/null
+++ b/site/_posts/2019-07-08-0.14.0-release.md
@@ -0,0 +1,230 @@
+---
+layout: post
+title: "Apache Arrow 0.14.0 Release"
+date: "2019-07-02 00:00:00 -0600"
+author: wesm
+categories: [release]
+---
+
+
+The Apache Arrow team is pleased to announce the 0.14.0 release. This
+covers 3 months of development work and includes 602 resolved issues
+from X distinct contributors. See the Install Page to learn how to
+get the libraries for your platform. The complete changelog is also
+available.
+
+While it's a large release, this post will give some brief highlights
+in the project since the 0.13.0 release from April.
+
+## New committers
+
+Since the 0.13.0 release, the following have been added:
+
+* Neville Dipale was added as a committer
+* François Saint-Jacques was added as a committer
+* Praveen Kumar was added as a committer
+
+Thank you for all your contributions!
+
+## Upcoming 1.0.0 Format Stability Release
+
+We are planning for our next major release to move from 0.14.0 to
+1.0.0. The major version number will indicate stability of the Arrow
+columnar format and binary protocol. While the format has already been
+stable since December 2017, we believe it is a good idea to make this
+stability official and to indicate that it is safe to persist
+serialized Arrow data in applications. This means that applications
+will be able to safely upgrade to new Arrow versions without having to
+worry about backwards incompatibilities. We will write in a future
+blog post about the stability guarantees we intend to provide to help
+application developers plan accordingly.
+
+## Development Infrastructure and Tooling
+
+As the project has grown larger and more diverse, we are increasingly
+outgrowing what we can test in public continuous integration services
+like Travis CI and Appveyor. In addition, we share these resources
+with the entire Apache Software Foundation, and given the high volume
+of pull requests into Apache Arrow, maintainers are frequently waiting
+many hours for the green light to merge patches.
+
+The complexity of our testing is driven by the number of different
+components and programming languages as well as increasingly long
+compilation and test execution times as individual libraries grow
+larger. The 50 minute time limit of public CI services is simply too
+limited to comprehensively test the project. Additionally, the CI host
+machines are constrained in their features and memory limits,
+preventing us from testing features that are only relevant on large
+amounts of data (10GB or more) or functionality that requires a
+CUDA-enabled GPU.
+
+Organizations that contribute to Apache Arrow are working on physical
+build infrastructure and tools to improve build times and build
+scalability. One such new tool is ``ursabot``, a GitHub-enabled bot
+that can be used to trigger builds either on physical build or in the
+cloud. It can also be used to trigger benchmark timing comparisons.
+
+To help assist with migrating away from Travis CI, we are also working
+to make as many of our builds reproducible with Docker and not reliant
+on Travis CI-specific configuration details. This will also help
+contributors reproduce build failures locally without having to wait
+for Travis CI.
+
+## Columnar Format Notes
+
+* User-defined “extension” types have been formalized in the Arrow
+  format, enabling library users to embed custom data types in the
+  Arrow columnar format. Initial support is available in C++, Java,
+  and Python.
+* A new Duration logical type was added to represent absolute lengths
+  of time.
+
+## Arrow Flight notes
+
+Flight now supports many of the features of a complete RPC
+framework. Authentication APIs are now supported across all languages
+(ARROW-5137), as is encrypted communication using OpenSSL (ARROW-5643,
+ARROW-5529), and clients can specify timeouts on remote calls
+(ARROW-5136). On the protocol level, endpoints are now identified with
+URIs, to support an open-ended number of potential transports
+(including TLS and Unix sockets, and perhaps even non-gRPC-based
+transports in the future) (ARROW-4651), and application-defined
+metadata can be sent alongside data (ARROW-4626, ARROW-4627). Finally,
+can now send and receive streams containing dictionaries (ARROW-3200).
+
+Windows is now a supported platform for Flight in C++ and Python
+(ARROW-3294), and Python wheels are shipped for all languages
+(ARROW-3150, ARROW-5656). C++, Python, and Java have been brought to
+parity, now that actions can return streaming results in Java
+(ARROW-5254).
+
+## C++ notes
+
+### General platform improvements
+
+* A FileSystem abstraction (ARROW-767) has been added, which paves the
+  way for a future Arrow Datasets library allowing to access sharded
+  data on arbitrary storage systems, including remote or cloud
+  storage. A first draft of the Datasets API was committed in
+  ARROW-5512. Right now, this comes with no implementation, but we
+  expect to slowly build it up in the coming weeks or months. Early
+  feedback is welcome on this API.
+* The dictionary API has been reworked in ARROW-3144. The dictionary
+  values used to be tied to the DictionaryType instance, which ended
+  up too inflexible. Since dictionary-encoding is more often an
+  optimization than a semantic property of the data, we decided to
+  move the dictionary values to the ArrayData structure, making it
+  natural for dictionary-encoded arrays to share the same DataType
+  instance, regardless of the encoding details.
+* The FixedSizeList and Map types have been implemented, including in
+  integration tests. The Map type is akin to a List of Struct(key,
+  value) entries, but making it explicit that the underlying data has
+  key-value mapping semantics. Also, map entries are always non-null.
+* A Result class has been introduced in ARROW-4800. The aim is to
+  allow to return an error as w ell as a function's logical result
+  without resorting to pointer-out arguments.
+
+* The Parquet C++ library has been refactored to use common Arrow IO
+  classes for improved C++ platform interoperability.
+
+### Line-delimited JSON reader
+
+A multithreaded line-delimited JSON reader (powered internally by
+RapidJSON) is now available for use (also in Python and R via
+bindings) . This will likely be expanded to support more kinds of JSON
+storage in the future.
+
+### New computational kernels
+
+A number of new computational kernels have been developed
+
+* Compare filter for logical comparisons yielding boolean arrays
+* Filter kernel for selecting elements of an input array according to a boolean selection array.
+* Take kernel, which selects elements by integer index, has been expanded to support nested types
+
+## C# Notes
+
+The native C# implementation has continued to mature since 0.13. This
+release includes a number of performance, memory use, and usability
+improvements.
+
+## Go notes
+
+Go's support for the Arrow columnar format continues to expand. Go now
+supports reading and writing the Arrow columnar binary protocol, and
+it has also been added to the cross language integration tests. There
+are now four languages (C++, Go, Java, and JavaScript) included in our
+integration tests to verify cross-language interoperability.
+
+## Java notes
+
+Support for referencing arbitrary memory using `ArrowBuf` has been implemented, paving the way for memory map support in Java
+A number of performance improvements around vector value access were added (see ARROW-5264, ARROW-5290).
+The Map type has been implemented in Java and integration tested with C++
+Several microbenchmarks have been added and improved. Including a significant speed-up of zeroing out buffers.
+A new algorithms package has been started to contain reference implementations of common algorithms. The initial contribution is for Array/Vector sorting.
+
+## Javascript Notes
+
+TODO new Builder API
+
+## MATLAB Notes
+Version 0.14.0 features improved Feather file support in the MEX bindings.
+
+## Python notes
+
+We fixed a problem with the Python wheels causing the Python wheels to be much larger in 0.13.0 than they were in 0.12.0. Since the introduction of LLVM into our build toolchain, the wheels are going to still be significantly bigger. We are interested in approaches to enable pyarrow to be installed in pieces with pip or conda rather than monolithically.
+It is now possible to define ExtensionTypes with a Python implementation (ARROW-840). Those ExtensionTypes can survive a roundtrip through C++ and serialization.
+The Flight improvements highlighted above (see C++ notes) are all available from Python. Furthermore, Flight is now bundled in our binary wheels and conda packages for Linux, Windows and macOS (ARROW-3150, ARROW-5656).
+We will build “manylinux2010” binary wheels for Linux systems, in addition to “manylinux1” wheels (ARROW-2461). Manylinux2010 is a newer standard for more recent systems, with less limiting toolchain constraints. Installing manylinux2010 wheels requires an up-to-date version of pip.
+Various bug fixes for CSV reading in Python and C++ including the ability to parse Decimal(x, y) columns.
+Parquet file improvements
+Column statistics for logical types like unicode strings, unsigned integers, and timestamps are casted to compatible Python types (see ARROW-4139)
+It's now possible to configure “data page” sizes when writing a file from Python
+
+## Ruby and C GLib notes
+
+The GLib and Ruby bindings have been tracking features in the C++
+project. This release includes bindings for Gandiva, JSON reader, and
+other C++ features.
+
+## Rust notes
+
+## R notes
+
+We have been working on build and packaging for R so that community
+members can hopefully release the project to CRAN in the near future.
+
+TODO features
+
+## Community Discussions Ongoing
+
+There are a number of active discussions ongoing on the developer
+dev@arrow.apache.org mailing list. We look forward to hearing from the
+community there:
+
+* A proposal for versioning and forward/backward compatibility
+  guarantees for the 1.0.0 release was shared, not much discussion has
+  occurred yet.
+* Addressing possible unaligned access and undefined behavior concerns
+  in the Arrow binary protocol
+* Supporting smaller than 128-bit encoding of fixed width decimals
+* Forking the Avro C++ implementation so as to adapt it to Arrow's
+  needs

From f9653fa173607b6e9ecad69456603ea0e1774536 Mon Sep 17 00:00:00 2001
From: Wes McKinney
Date: Sun, 7 Jul 2019 17:53:35 -0500
Subject: [PATCH 2/9] Finish tweaking formatting, links

---
 site/_posts/2019-07-08-0.14.0-release.md | 137 +++++++++++++++--------
 site/_release/0.14.0.md                  |  40 +++----
 2 files changed, 113 insertions(+), 64 deletions(-)

diff --git a/site/_posts/2019-07-08-0.14.0-release.md b/site/_posts/2019-07-08-0.14.0-release.md
index ab2effb4b35d..9a185ed905d8 100644
--- a/site/_posts/2019-07-08-0.14.0-release.md
+++ b/site/_posts/2019-07-08-0.14.0-release.md
@@ -25,21 +25,21 @@ limitations under the License.
 -->
 
 The Apache Arrow team is pleased to announce the 0.14.0 release. This
-covers 3 months of development work and includes 602 resolved issues
-from X distinct contributors. See the Install Page to learn how to
-get the libraries for your platform. The complete changelog is also
-available.
+covers 3 months of development work and includes [**602 resolved
+issues**][1] from [**75 distinct contributors**][2]. See the Install
+Page to learn how to get the libraries for your platform. The
+[complete changelog][3] is also available.
 
-While it's a large release, this post will give some brief highlights
-in the project since the 0.13.0 release from April.
+This post will give some brief highlights in the project since the
+0.13.0 release from April.
 
 ## New committers
 
 Since the 0.13.0 release, the following have been added:
 
-* Neville Dipale was added as a committer
-* François Saint-Jacques was added as a committer
-* Praveen Kumar was added as a committer
+* [Neville Dipale][5] was added as a committer
+* [François Saint-Jacques][6] was added as a committer
+* [Praveen Kumar][7] was added as a committer
 
 Thank you for all your contributions!
 
@@ -79,7 +79,9 @@ Organizations that contribute to Apache Arrow are working on physical
 build infrastructure and tools to improve build times and build
 scalability. One such new tool is ``ursabot``, a GitHub-enabled bot
 that can be used to trigger builds either on physical build or in the
-cloud. It can also be used to trigger benchmark timing comparisons.
+cloud. It can also be used to trigger benchmark timing comparisons. If
+you are contributing to the project, you may see Ursabot being
+employed to trigger tests in pull requests.
 
 To help assist with migrating away from Travis CI, we are also working
 to make as many of our builds reproducible with Docker and not reliant
@@ -89,7 +91,7 @@ for Travis CI.
 
 ## Columnar Format Notes
 
-* User-defined “extension” types have been formalized in the Arrow
+* User-defined "extension" types have been formalized in the Arrow
   format, enabling library users to embed custom data types in the
   Arrow columnar format. Initial support is available in C++, Java,
   and Python.
@@ -99,15 +101,18 @@ for Travis CI.
 ## Arrow Flight notes
 
 Flight now supports many of the features of a complete RPC
-framework. Authentication APIs are now supported across all languages
-(ARROW-5137), as is encrypted communication using OpenSSL (ARROW-5643,
-ARROW-5529), and clients can specify timeouts on remote calls
-(ARROW-5136). On the protocol level, endpoints are now identified with
-URIs, to support an open-ended number of potential transports
-(including TLS and Unix sockets, and perhaps even non-gRPC-based
-transports in the future) (ARROW-4651), and application-defined
-metadata can be sent alongside data (ARROW-4626, ARROW-4627). Finally,
-can now send and receive streams containing dictionaries (ARROW-3200).
+framework.
+
+* Authentication APIs are now supported across all languages (ARROW-5137)
+* Encrypted communication using OpenSSL is supported (ARROW-5643,
+  ARROW-5529)
+* Clients can specify timeouts on remote calls (ARROW-5136)
+* On the protocol level, endpoints are now identified with URIs, to
+  support an open-ended number of potential transports (including TLS
+  and Unix sockets, and perhaps even non-gRPC-based transports in the
+  future) (ARROW-4651)
+* Application-defined metadata can be sent alongside data (ARROW-4626,
+  ARROW-4627).
 
 Windows is now a supported platform for Flight in C++ and Python
 (ARROW-3294), and Python wheels are shipped for all languages
@@ -117,6 +122,9 @@ parity, now that actions can return streaming results in Java
 
 ## C++ notes
 
+188 resolved issues related to the C++ implementation, so we summarize
+some of the work here.
+
 ### General platform improvements
 
 * A FileSystem abstraction (ARROW-767) has been added, which paves the
@@ -137,10 +145,9 @@ parity, now that actions can return streaming results in Java
   integration tests. The Map type is akin to a List of Struct(key,
   value) entries, but making it explicit that the underlying data has
   key-value mapping semantics. Also, map entries are always non-null.
-* A Result class has been introduced in ARROW-4800. The aim is to
+* A `Result` class has been introduced in ARROW-4800. The aim is to
   allow to return an error as w ell as a function's logical result
   without resorting to pointer-out arguments.
-
 * The Parquet C++ library has been refactored to use common Arrow IO
   classes for improved C++ platform interoperability.
 
@@ -156,8 +163,10 @@ storage in the future.
 A number of new computational kernels have been developed
 
 * Compare filter for logical comparisons yielding boolean arrays
-* Filter kernel for selecting elements of an input array according to a boolean selection array.
-* Take kernel, which selects elements by integer index, has been expanded to support nested types
+* Filter kernel for selecting elements of an input array according to
+  a boolean selection array.
+* Take kernel, which selects elements by integer index, has been
+  expanded to support nested types
 
 ## C# Notes
 
@@ -169,35 +178,63 @@ improvements.
 
 Go's support for the Arrow columnar format continues to expand. Go now
 supports reading and writing the Arrow columnar binary protocol, and
-it has also been added to the cross language integration tests. There
-are now four languages (C++, Go, Java, and JavaScript) included in our
-integration tests to verify cross-language interoperability.
+it has also been **added to the cross language integration
+tests**. There are now four languages (C++, Go, Java, and JavaScript)
+included in our integration tests to verify cross-language
+interoperability.
 
 ## Java notes
 
-Support for referencing arbitrary memory using `ArrowBuf` has been implemented, paving the way for memory map support in Java
-A number of performance improvements around vector value access were added (see ARROW-5264, ARROW-5290).
-The Map type has been implemented in Java and integration tested with C++
-Several microbenchmarks have been added and improved. Including a significant speed-up of zeroing out buffers.
-A new algorithms package has been started to contain reference implementations of common algorithms. The initial contribution is for Array/Vector sorting.
+* Support for referencing arbitrary memory using `ArrowBuf` has been
+  implemented, paving the way for memory map support in Java
+* A number of performance improvements around vector value access were
+  added (see ARROW-5264, ARROW-5290).
+* The Map type has been implemented in Java and integration tested
+  with C++
+* Several microbenchmarks have been added and improved. Including a
+  significant speed-up of zeroing out buffers.
+* A new algorithms package has been started to contain reference
+  implementations of common algorithms. The initial contribution is
+  for Array/Vector sorting.
 
 ## Javascript Notes
 
-TODO new Builder API
+A new incremental [array builder API][4] is available.
 
 ## MATLAB Notes
+
 Version 0.14.0 features improved Feather file support in the MEX bindings.
 
 ## Python notes
 
-We fixed a problem with the Python wheels causing the Python wheels to be much larger in 0.13.0 than they were in 0.12.0. Since the introduction of LLVM into our build toolchain, the wheels are going to still be significantly bigger. We are interested in approaches to enable pyarrow to be installed in pieces with pip or conda rather than monolithically.
-It is now possible to define ExtensionTypes with a Python implementation (ARROW-840). Those ExtensionTypes can survive a roundtrip through C++ and serialization.
-The Flight improvements highlighted above (see C++ notes) are all available from Python. Furthermore, Flight is now bundled in our binary wheels and conda packages for Linux, Windows and macOS (ARROW-3150, ARROW-5656).
-We will build “manylinux2010” binary wheels for Linux systems, in addition to “manylinux1” wheels (ARROW-2461). Manylinux2010 is a newer standard for more recent systems, with less limiting toolchain constraints. Installing manylinux2010 wheels requires an up-to-date version of pip.
-Various bug fixes for CSV reading in Python and C++ including the ability to parse Decimal(x, y) columns.
-Parquet file improvements
-Column statistics for logical types like unicode strings, unsigned integers, and timestamps are casted to compatible Python types (see ARROW-4139)
-It's now possible to configure “data page” sizes when writing a file from Python
+* We fixed a problem with the Python wheels causing the Python wheels
+  to be much larger in 0.13.0 than they were in 0.12.0. Since the
+  introduction of LLVM into our build toolchain, the wheels are going
+  to still be significantly bigger. We are interested in approaches to
+  enable pyarrow to be installed in pieces with pip or conda rather
+  than monolithically.
+* It is now possible to define ExtensionTypes with a Python
+  implementation (ARROW-840). Those ExtensionTypes can survive a
+  roundtrip through C++ and serialization.
+* The Flight improvements highlighted above (see C++ notes) are all
+  available from Python. Furthermore, Flight is now bundled in our
+  binary wheels and conda packages for Linux, Windows and macOS
+  (ARROW-3150, ARROW-5656).
+* We will build "manylinux2010" binary wheels for Linux systems, in
+  addition to "manylinux1" wheels (ARROW-2461). Manylinux2010 is a
+  newer standard for more recent systems, with less limiting toolchain
+  constraints. Installing manylinux2010 wheels requires an up-to-date
+  version of pip.
+* Various bug fixes for CSV reading in Python and C++ including the
+  ability to parse Decimal(x, y) columns.
+
+### Parquet improvements
+
+* Column statistics for logical types like unicode strings, unsigned
+  integers, and timestamps are casted to compatible Python types (see
+  ARROW-4139)
+* It's now possible to configure "data page" sizes when writing a file
+  from Python
 
 ## Ruby and C GLib notes
 
@@ -207,12 +244,16 @@ other C++ features.
 
 ## Rust notes
 
+There is ongoing work in Rust happening on Parquet file support,
+computational kernels, and the DataFusion query engine. See the full
+changelog for details.
+
 ## R notes
 
 We have been working on build and packaging for R so that community
-members can hopefully release the project to CRAN in the near future.
-
-TODO features
+members can hopefully release the project to CRAN in the near
+future. Feature development for R has continued to follow the upstream
+C++ project.
 
 ## Community Discussions Ongoing
 
@@ -228,3 +269,11 @@ community there:
 * Supporting smaller than 128-bit encoding of fixed width decimals
 * Forking the Avro C++ implementation so as to adapt it to Arrow's
   needs
+
+[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.13.0
+[2]: https://arrow.apache.org/release/0.14.0.html#contributors
+[3]: https://arrow.apache.org/release/0.14.0.html
+[4]: https://github.com/apache/arrow/tree/master/js/src/builder
+[5]: https://github.com/nevi-me
+[6]: https://github.com/fsaintjacques
+[7]: https://github.com/praveenbingo
\ No newline at end of file
diff --git a/site/_release/0.14.0.md b/site/_release/0.14.0.md
index 8bf84c82d11f..ed191d9d355d 100644
--- a/site/_release/0.14.0.md
+++ b/site/_release/0.14.0.md
@@ -192,13 +192,13 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-3166](https://issues.apache.org/jira/browse/ARROW-3166) - [C++] Consolidate IO interfaces used in arrow/io and parquet-cpp
 * [ARROW-3191](https://issues.apache.org/jira/browse/ARROW-3191) - [Java] Add support for ArrowBuf to point to arbitrary memory.
 * [ARROW-3200](https://issues.apache.org/jira/browse/ARROW-3200) - [C++] Add support for reading Flight streams with dictionaries
-* [ARROW-3290](https://issues.apache.org/jira/browse/ARROW-3290) - [C++] Toolchain support for secure gRPC
+* [ARROW-3290](https://issues.apache.org/jira/browse/ARROW-3290) - [C++] Toolchain support for secure gRPC
 * [ARROW-3294](https://issues.apache.org/jira/browse/ARROW-3294) - [C++] Test Flight RPC on Windows / Appveyor
 * [ARROW-3314](https://issues.apache.org/jira/browse/ARROW-3314) - [R] Set -rpath using pkg-config when building
 * [ARROW-3419](https://issues.apache.org/jira/browse/ARROW-3419) - [C++] Run include-what-you-use checks as nightly build
 * [ARROW-3459](https://issues.apache.org/jira/browse/ARROW-3459) - [C++][Gandiva] Add support for variable length output vectors
 * [ARROW-3475](https://issues.apache.org/jira/browse/ARROW-3475) - [C++] Int64Builder.Finish(NumericArray)
-* [ARROW-3572](https://issues.apache.org/jira/browse/ARROW-3572) - [Packaging] Correctly handle ssh origin urls for crossbow
+* [ARROW-3572](https://issues.apache.org/jira/browse/ARROW-3572) - [Packaging] Correctly handle ssh origin urls for crossbow
 * [ARROW-3671](https://issues.apache.org/jira/browse/ARROW-3671) - [Go] implement Interval array
 * [ARROW-3676](https://issues.apache.org/jira/browse/ARROW-3676) - [Go] implement Decimal128 array
 * [ARROW-3679](https://issues.apache.org/jira/browse/ARROW-3679) - [Go] implement IPC protocol
@@ -213,7 +213,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-3791](https://issues.apache.org/jira/browse/ARROW-3791) - [C++] Add type inference for boolean values in CSV files
 * [ARROW-3794](https://issues.apache.org/jira/browse/ARROW-3794) - [R] Consider mapping INT8 to integer() not raw()
 * [ARROW-3804](https://issues.apache.org/jira/browse/ARROW-3804) - [R] Consider lowering required R runtime
-* [ARROW-3810](https://issues.apache.org/jira/browse/ARROW-3810) - [R] type= argument for Array and ChunkedArray
+* [ARROW-3810](https://issues.apache.org/jira/browse/ARROW-3810) - [R] type= argument for Array and ChunkedArray
 * [ARROW-3811](https://issues.apache.org/jira/browse/ARROW-3811) - [R] struct arrays inference
 * [ARROW-3814](https://issues.apache.org/jira/browse/ARROW-3814) - [R] RecordBatch$from\_arrays()
 * [ARROW-3815](https://issues.apache.org/jira/browse/ARROW-3815) - [R] refine record batch factory
@@ -226,7 +226,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-4047](https://issues.apache.org/jira/browse/ARROW-4047) - [Python] Document use of int96 timestamps and options in Parquet docs
 * [ARROW-4086](https://issues.apache.org/jira/browse/ARROW-4086) - [Java] Add apis to debug alloc failures
 * [ARROW-4121](https://issues.apache.org/jira/browse/ARROW-4121) - [C++] Refactor memory allocation from InvertKernel
-* [ARROW-4159](https://issues.apache.org/jira/browse/ARROW-4159) - [C++] Check for -Wdocumentation issues
+* [ARROW-4159](https://issues.apache.org/jira/browse/ARROW-4159) - [C++] Check for -Wdocumentation issues
 * [ARROW-4194](https://issues.apache.org/jira/browse/ARROW-4194) - [Format] Metadata.rst does not specify timezone for Timestamp type
 * [ARROW-4302](https://issues.apache.org/jira/browse/ARROW-4302) - [C++] Add OpenSSL to C++ build toolchain
 * [ARROW-4337](https://issues.apache.org/jira/browse/ARROW-4337) - [C#] Array / RecordBatch Builder Fluent API
@@ -246,7 +246,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-4627](https://issues.apache.org/jira/browse/ARROW-4627) - [Flight] Add application metadata field to DoPut
 * [ARROW-4701](https://issues.apache.org/jira/browse/ARROW-4701) - [C++] Add JSON chunker benchmarks
 * [ARROW-4702](https://issues.apache.org/jira/browse/ARROW-4702) - [C++] Upgrade dependency versions
-* [ARROW-4708](https://issues.apache.org/jira/browse/ARROW-4708) - [C++] Add multithreaded JSON reader
+* [ARROW-4708](https://issues.apache.org/jira/browse/ARROW-4708) - [C++] Add multithreaded JSON reader
 * [ARROW-4714](https://issues.apache.org/jira/browse/ARROW-4714) - [C++][Java] Providing JNI interface to Read ORC file via Arrow C++
 * [ARROW-4717](https://issues.apache.org/jira/browse/ARROW-4717) - [C#] Consider exposing ValueTask instead of Task
 * [ARROW-4719](https://issues.apache.org/jira/browse/ARROW-4719) - [C#] Implement ChunkedArray, Column and Table in C#
@@ -262,7 +262,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-4904](https://issues.apache.org/jira/browse/ARROW-4904) - [C++] Move implementations in arrow/ipc/test-common.h into libarrow\_testing
 * [ARROW-4911](https://issues.apache.org/jira/browse/ARROW-4911) - [R] Support for building package for Windows
 * [ARROW-4912](https://issues.apache.org/jira/browse/ARROW-4912) - [C++, Python] Allow specifying column names to CSV reader
-* [ARROW-4913](https://issues.apache.org/jira/browse/ARROW-4913) - [Java][Memory] Limit number of ledgers and arrowbufs
+* [ARROW-4913](https://issues.apache.org/jira/browse/ARROW-4913) - [Java][Memory] Limit number of ledgers and arrowbufs
 * [ARROW-4945](https://issues.apache.org/jira/browse/ARROW-4945) - [Flight] Enable Flight integration tests in Travis
 * [ARROW-4956](https://issues.apache.org/jira/browse/ARROW-4956) - [C#] Allow ArrowBuffers to wrap external Memory in C#
 * [ARROW-4959](https://issues.apache.org/jira/browse/ARROW-4959) - [Gandiva][Crossbow] Builds broken
@@ -274,7 +274,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-4990](https://issues.apache.org/jira/browse/ARROW-4990) - [C++] Kernel to compare array with array
 * [ARROW-4993](https://issues.apache.org/jira/browse/ARROW-4993) - [C++] Display summary at the end of CMake configuration
 * [ARROW-5000](https://issues.apache.org/jira/browse/ARROW-5000) - [Python] Fix deprecation warning from setup.py
-* [ARROW-5007](https://issues.apache.org/jira/browse/ARROW-5007) - [C++] Move DCHECK out of sse-utils
+* [ARROW-5007](https://issues.apache.org/jira/browse/ARROW-5007) - [C++] Move DCHECK out of sse-utils
 * [ARROW-5020](https://issues.apache.org/jira/browse/ARROW-5020) - [C++][Gandiva] Split Gandiva-related conda packages for builds into separate .yml conda env file
 * [ARROW-5027](https://issues.apache.org/jira/browse/ARROW-5027) - [Python] Add JSON Reader
 * [ARROW-5038](https://issues.apache.org/jira/browse/ARROW-5038) - [Rust] [DataFusion] Implement AVG aggregate function
@@ -282,7 +282,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-5045](https://issues.apache.org/jira/browse/ARROW-5045) - [Rust] Code coverage silently failing in CI
 * [ARROW-5053](https://issues.apache.org/jira/browse/ARROW-5053) - [Rust] [DataFusion] Use env var for location of arrow test data
 * [ARROW-5054](https://issues.apache.org/jira/browse/ARROW-5054) - [C++][Release] Test Flight in verify-release-candidate.sh
-* [ARROW-5056](https://issues.apache.org/jira/browse/ARROW-5056) - [Packaging] Adjust conda recipes to use ORC conda-forge package on unix systems
+* [ARROW-5056](https://issues.apache.org/jira/browse/ARROW-5056) - [Packaging] Adjust conda recipes to use ORC conda-forge package on unix systems
 * [ARROW-5061](https://issues.apache.org/jira/browse/ARROW-5061) - [Release] Improve 03-binary performance
 * [ARROW-5062](https://issues.apache.org/jira/browse/ARROW-5062) - [Java] Shade Java Guava dependency for Flight
 * [ARROW-5063](https://issues.apache.org/jira/browse/ARROW-5063) - [Java] FlightClient should not create a child allocator
@@ -337,7 +337,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-5203](https://issues.apache.org/jira/browse/ARROW-5203) - [GLib] Add support for Compare filter
 * [ARROW-5204](https://issues.apache.org/jira/browse/ARROW-5204) - [C++] Improve BufferBuilder performance
 * [ARROW-5212](https://issues.apache.org/jira/browse/ARROW-5212) - [Go] Array BinaryBuilder in Go library has no access to resize the values buffer
-* [ARROW-5218](https://issues.apache.org/jira/browse/ARROW-5218) - [C++] Improve build when third-party library locations are specified
+* [ARROW-5218](https://issues.apache.org/jira/browse/ARROW-5218) - [C++] Improve build when third-party library locations are specified
 * [ARROW-5219](https://issues.apache.org/jira/browse/ARROW-5219) - [C++] Build protobuf\_ep in parallel when using Ninja
 * [ARROW-5222](https://issues.apache.org/jira/browse/ARROW-5222) - [Python] Issues with installing pyarrow for development on MacOS
 * [ARROW-5225](https://issues.apache.org/jira/browse/ARROW-5225) - [Java] Improve performance of BaseValueVector#getValidityBufferSizeFromCount
@@ -362,7 +362,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-5288](https://issues.apache.org/jira/browse/ARROW-5288) - [Documentation] Enrich the contribution guidelines
 * [ARROW-5289](https://issues.apache.org/jira/browse/ARROW-5289) - [C++] Move arrow/util/concatenate.h to arrow/array/
 * [ARROW-5290](https://issues.apache.org/jira/browse/ARROW-5290) - [Java] Provide a flag to enable/disable null-checking in vectors' get methods
-* [ARROW-5291](https://issues.apache.org/jira/browse/ARROW-5291) - [Python] Add wrapper for "take" kernel on Array
+* [ARROW-5291](https://issues.apache.org/jira/browse/ARROW-5291) - [Python] Add wrapper for "take" kernel on Array
 * [ARROW-5298](https://issues.apache.org/jira/browse/ARROW-5298) - [Rust] Add debug implementation for Buffer
 * [ARROW-5299](https://issues.apache.org/jira/browse/ARROW-5299) - [C++] ListArray comparison is incorrect
 * [ARROW-5309](https://issues.apache.org/jira/browse/ARROW-5309) - [Python] Add clarifications to Python "append" methods that return new objects
@@ -441,7 +441,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-5488](https://issues.apache.org/jira/browse/ARROW-5488) - [R] Workaround when C++ lib not available
 * [ARROW-5490](https://issues.apache.org/jira/browse/ARROW-5490) - [C++] Remove ARROW\_BOOST\_HEADER\_ONLY
 * [ARROW-5491](https://issues.apache.org/jira/browse/ARROW-5491) - [C++] Remove unecessary semicolons following MACRO definitions
-* [ARROW-5492](https://issues.apache.org/jira/browse/ARROW-5492) - [R] Add "col\_select" argument to read\_\* functions to read subset of columns
+* [ARROW-5492](https://issues.apache.org/jira/browse/ARROW-5492) - [R] Add "col\_select" argument to read\_\* functions to read subset of columns
 * [ARROW-5495](https://issues.apache.org/jira/browse/ARROW-5495) - [C++] Use HTTPS consistently for downloading dependencies
 * [ARROW-5496](https://issues.apache.org/jira/browse/ARROW-5496) - [R][CI] Fix relative paths in R codecov.io reporting
 * [ARROW-5498](https://issues.apache.org/jira/browse/ARROW-5498) - [C++] Build failure with Flatbuffers 1.11.0 and MinGW
@@ -453,7 +453,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-5512](https://issues.apache.org/jira/browse/ARROW-5512) - [C++] Draft initial public APIs for Datasets project
 * [ARROW-5513](https://issues.apache.org/jira/browse/ARROW-5513) - [Java] Refactor method name for getstartOffset to use camel case
 * [ARROW-5516](https://issues.apache.org/jira/browse/ARROW-5516) - [Python] Development page for pyarrow has a missing dependency in using pip
-* [ARROW-5518](https://issues.apache.org/jira/browse/ARROW-5518) - [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear
+* [ARROW-5518](https://issues.apache.org/jira/browse/ARROW-5518) - [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear
 * [ARROW-5524](https://issues.apache.org/jira/browse/ARROW-5524) - [C++] Turn off PARQUET\_BUILD\_ENCRYPTION in CMake if OpenSSL not found
 * [ARROW-5526](https://issues.apache.org/jira/browse/ARROW-5526) - [Developer] Add more prominent notice to GitHub issue template to direct bug reports to JIRA
 * [ARROW-5529](https://issues.apache.org/jira/browse/ARROW-5529) - [Flight] Allow serving with multiple TLS certificates
@@ -532,7 +532,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-5718](https://issues.apache.org/jira/browse/ARROW-5718) - [R] auto splice data frames in record\_batch() and table()
 * [ARROW-5721](https://issues.apache.org/jira/browse/ARROW-5721) - [Rust] Move array related code into a separate module
 * [ARROW-5724](https://issues.apache.org/jira/browse/ARROW-5724) - [R] [CI] AppVeyor build should use ccache
-* [ARROW-5725](https://issues.apache.org/jira/browse/ARROW-5725) - [Crossbow] Port conda recipes to azure pipelines
+* [ARROW-5725](https://issues.apache.org/jira/browse/ARROW-5725) - [Crossbow] Port conda recipes to azure pipelines
 * [ARROW-5726](https://issues.apache.org/jira/browse/ARROW-5726) - [Java] Implement a common interface for int vectors
 * [ARROW-5727](https://issues.apache.org/jira/browse/ARROW-5727) - [Python] [CI] Install pytest-faulthandler before running tests
 * [ARROW-5748](https://issues.apache.org/jira/browse/ARROW-5748) - [Packaging][deb] Add support for Debian GNU/Linux buster
@@ -570,7 +570,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0
 * [ARROW-2461](https://issues.apache.org/jira/browse/ARROW-2461) - [Python] Build wheels for manylinux2010 tag
 * [ARROW-3344](https://issues.apache.org/jira/browse/ARROW-3344) - [Python] test\_plasma.py fails (in test\_plasma\_list)
 * [ARROW-3399](https://issues.apache.org/jira/browse/ARROW-3399) - [Python] Cannot serialize numpy matrix object
-* [ARROW-3650](https://issues.apache.org/jira/browse/ARROW-3650) - [Python] Mixed column indexes are read back as strings
+* [ARROW-3650](https://issues.apache.org/jira/browse/ARROW-3650) - [Python] Mixed column indexes are read back as strings
 * [ARROW-3762](https://issues.apache.org/jira/browse/ARROW-3762) - [C++] Parquet arrow::Table reads error
when overflowing capacity of BinaryArray * [ARROW-4021](https://issues.apache.org/jira/browse/ARROW-4021) - [Ruby] Error building red-arrow on msys2 * [ARROW-4076](https://issues.apache.org/jira/browse/ARROW-4076) - [Python] schema validation and filters @@ -592,7 +592,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0 * [ARROW-4885](https://issues.apache.org/jira/browse/ARROW-4885) - [Python] read\_csv() can't handle decimal128 columns * [ARROW-4886](https://issues.apache.org/jira/browse/ARROW-4886) - [Rust] Inconsistent behaviour with casting sliced primitive array to list array * [ARROW-4923](https://issues.apache.org/jira/browse/ARROW-4923) - Expose setters for Decimal vector that take long and double inputs -* [ARROW-4934](https://issues.apache.org/jira/browse/ARROW-4934) - [Python] Address deprecation notice that will be a bug in Python 3.8 +* [ARROW-4934](https://issues.apache.org/jira/browse/ARROW-4934) - [Python] Address deprecation notice that will be a bug in Python 3.8 * [ARROW-5019](https://issues.apache.org/jira/browse/ARROW-5019) - [C#] ArrowStreamWriter doesn't work on a non-seekable stream * [ARROW-5049](https://issues.apache.org/jira/browse/ARROW-5049) - [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark * [ARROW-5051](https://issues.apache.org/jira/browse/ARROW-5051) - [GLib][Gandiva] Test failure in release verification script @@ -614,7 +614,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0 * [ARROW-5142](https://issues.apache.org/jira/browse/ARROW-5142) - [CI] Fix conda calls in AppVeyor scripts * [ARROW-5144](https://issues.apache.org/jira/browse/ARROW-5144) - [Python] ParquetDataset and ParquetPiece not serializable * [ARROW-5146](https://issues.apache.org/jira/browse/ARROW-5146) - [Dev] Merge script imposes directory name -* [ARROW-5147](https://issues.apache.org/jira/browse/ARROW-5147) - [C++] get an error in building: Could NOT find DoubleConversion +* 
[ARROW-5147](https://issues.apache.org/jira/browse/ARROW-5147) - [C++] get an error in building: Could NOT find DoubleConversion * [ARROW-5148](https://issues.apache.org/jira/browse/ARROW-5148) - [CI] [C++] LLVM-related compile errors * [ARROW-5149](https://issues.apache.org/jira/browse/ARROW-5149) - [Packaging][Wheel] Pin LLVM to version 7 in windows builds * [ARROW-5152](https://issues.apache.org/jira/browse/ARROW-5152) - [Python] CMake warnings when building @@ -622,7 +622,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0 * [ARROW-5160](https://issues.apache.org/jira/browse/ARROW-5160) - [C++] ABORT\_NOT\_OK evalutes expression twice * [ARROW-5166](https://issues.apache.org/jira/browse/ARROW-5166) - [Python][Parquet] Statistics for uint64 columns may overflow * [ARROW-5167](https://issues.apache.org/jira/browse/ARROW-5167) - [C++] Upgrade string-view-light to latest -* [ARROW-5169](https://issues.apache.org/jira/browse/ARROW-5169) - [Python] non-nullable fields are converted to nullable in {{Table.from\_pandas}} +* [ARROW-5169](https://issues.apache.org/jira/browse/ARROW-5169) - [Python] non-nullable fields are converted to nullable in Table.from\_pandas * [ARROW-5173](https://issues.apache.org/jira/browse/ARROW-5173) - [Go] handle multiple concatenated streams back-to-back * [ARROW-5174](https://issues.apache.org/jira/browse/ARROW-5174) - [Go] implement Stringer for DataTypes * [ARROW-5177](https://issues.apache.org/jira/browse/ARROW-5177) - [Python] ParquetReader.read\_column() doesn't check bounds @@ -655,13 +655,13 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0 * [ARROW-5301](https://issues.apache.org/jira/browse/ARROW-5301) - [Python] parquet documentation outdated on nthreads argument * [ARROW-5306](https://issues.apache.org/jira/browse/ARROW-5306) - [CI] [GLib] Disable GTK-Doc * [ARROW-5308](https://issues.apache.org/jira/browse/ARROW-5308) - [Go] remove deprecated Feather format -* 
[ARROW-5314](https://issues.apache.org/jira/browse/ARROW-5314) - [Go] Incorrect Printing for String Arrays with Offsets +* [ARROW-5314](https://issues.apache.org/jira/browse/ARROW-5314) - [Go] Incorrect Printing for String Arrays with Offsets * [ARROW-5325](https://issues.apache.org/jira/browse/ARROW-5325) - [Archery][Benchmark] Output properly formatted jsonlines from benchmark diff cli command * [ARROW-5330](https://issues.apache.org/jira/browse/ARROW-5330) - [Python] [CI] Run Python Flight tests on Travis-CI * [ARROW-5332](https://issues.apache.org/jira/browse/ARROW-5332) - [R] R package fails to build/install: error in dyn.load() * [ARROW-5348](https://issues.apache.org/jira/browse/ARROW-5348) - [CI] [Java] Gandiva checkstyle failure * [ARROW-5360](https://issues.apache.org/jira/browse/ARROW-5360) - [Rust] Builds are broken by rustyline on nightly 2019-05-16+ -* [ARROW-5362](https://issues.apache.org/jira/browse/ARROW-5362) - [C++] Compression round trip test can cause some sanitizers to to fail +* [ARROW-5362](https://issues.apache.org/jira/browse/ARROW-5362) - [C++] Compression round trip test can cause some sanitizers to to fail * [ARROW-5371](https://issues.apache.org/jira/browse/ARROW-5371) - [Release] Add tests for dev/release/00-prepare.sh * [ARROW-5373](https://issues.apache.org/jira/browse/ARROW-5373) - [Java] Add missing details for Gandiva Java Build * [ARROW-5376](https://issues.apache.org/jira/browse/ARROW-5376) - [C++] Compile failure on gcc 5.4.0 @@ -669,7 +669,7 @@ $ git shortlog -csn apache-arrow-0.13.0..apache-arrow-0.14.0 * [ARROW-5387](https://issues.apache.org/jira/browse/ARROW-5387) - [Go] properly handle sub-slice of List * [ARROW-5388](https://issues.apache.org/jira/browse/ARROW-5388) - [Go] use arrow.TypeEqual in array.NewChunked * [ARROW-5390](https://issues.apache.org/jira/browse/ARROW-5390) - [CI] Job time limit exceeded on Travis -* [ARROW-5397](https://issues.apache.org/jira/browse/ARROW-5397) - Test Flight TLS support +* 
[ARROW-5397](https://issues.apache.org/jira/browse/ARROW-5397) - Test Flight TLS support * [ARROW-5398](https://issues.apache.org/jira/browse/ARROW-5398) - [Python] Flight tests broken by URI changes * [ARROW-5403](https://issues.apache.org/jira/browse/ARROW-5403) - [C++] Test failures not propagated in Windows shared builds * [ARROW-5411](https://issues.apache.org/jira/browse/ARROW-5411) - [C++][Python] Build error building on Mac OS Mojave From 1544da1803e9aab2c6bbc2eba14b93af04138d9c Mon Sep 17 00:00:00 2001 From: Wes McKinney Date: Sun, 7 Jul 2019 18:03:32 -0500 Subject: [PATCH 3/9] Attribute community --- site/_posts/2019-07-08-0.14.0-release.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/_posts/2019-07-08-0.14.0-release.md b/site/_posts/2019-07-08-0.14.0-release.md index 9a185ed905d8..d9156ad1552a 100644 --- a/site/_posts/2019-07-08-0.14.0-release.md +++ b/site/_posts/2019-07-08-0.14.0-release.md @@ -2,7 +2,7 @@ layout: post title: "Apache Arrow 0.14.0 Release" date: "2019-07-02 00:00:00 -0600" -author: wesm +author: apache categories: [release] ---