Apache Arrow is a columnar in-memory analytics layer designed to accelerate big data. It houses a set of canonical in-memory representations of flat and hierarchical data, along with bindings in multiple languages for manipulating those structures. It also provides IPC and common algorithm implementations.
Latest commit 8ca7033 (Jan 20, 2017) by @wesm: ARROW-499: Update file serialization to use the streaming serialization format.

Author: Wes McKinney <wes.mckinney@twosigma.com>
Author: Nong Li <nongli@gmail.com>

Closes #292 from nongli/file and squashes the following commits:

18890a9 [Wes McKinney] Message fixes. Fix Java test suite. Integration tests pass
f187539 [Nong Li] Merge pull request #1 from wesm/file-change-cpp-impl
e3af434 [Wes McKinney] Remove unused variable
664d5be [Wes McKinney] Fixes, stream tests pass again
ba8db91 [Wes McKinney] Redo MessageSerializer with unions. Still has bugs
21854cc [Wes McKinney] Restore Block.bodyLength to long
7c6f7ef [Nong Li] Update to restore Block behavior
27b3909 [Nong Li] [ARROW-499]: [Java] Update file serialization to use the streaming serialization format.
ci ARROW-456: Add jemalloc based MemoryPool Jan 6, 2017
cpp ARROW-499: Update file serialization to use the streaming serializati… Jan 20, 2017
dev ARROW-430: Improved version handling Dec 20, 2016
format ARROW-499: Update file serialization to use the streaming serializati… Jan 20, 2017
integration ARROW-499: Update file serialization to use the streaming serializati… Jan 20, 2017
java ARROW-499: Update file serialization to use the streaming serializati… Jan 20, 2017
python ARROW-461: [Python] Add Python interfaces to DictionaryArray data, pa… Jan 19, 2017
.clang-format ARROW-432: [Python] Construct precise pandas BlockManager structure f… Dec 23, 2016
.clang-tidy ARROW-432: [Python] Construct precise pandas BlockManager structure f… Dec 23, 2016
.clang-tidy-ignore ARROW-432: [Python] Construct precise pandas BlockManager structure f… Dec 23, 2016
.gitignore ARROW-363: [Java/C++] integration testing harness, initial integratio… Nov 29, 2016
.readthedocs.yml ARROW-346: Use conda environment to build API docs Dec 9, 2016
.travis.yml ARROW-456: Add jemalloc based MemoryPool Jan 6, 2017
LICENSE.txt ARROW-202: Integrate with appveyor ci for windows Nov 26, 2016
NOTICE.txt ARROW-202: Integrate with appveyor ci for windows Nov 26, 2016
README.md ARROW-484: Revise README to include more detail about software compon… Jan 16, 2017
appveyor.yml ARROW-456: Add jemalloc based MemoryPool Jan 6, 2017
header ARROW-259: Use Flatbuffer Field type instead of MaterializedField Aug 18, 2016

README.md

Apache Arrow


Powering Columnar In-Memory Analytics

Arrow is a set of technologies that enable big-data systems to process and move data fast.

Initial implementations include:

Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.

What's in the Arrow libraries?

The reference Arrow implementations contain a number of distinct software components:

  • Columnar vector and table-like containers (similar to data frames) supporting flat or nested types
  • Fast, language-agnostic metadata messaging layer (using Google's FlatBuffers library)
  • Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
  • Low-overhead IO interfaces to files on disk and on HDFS (C++ only)
  • Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
  • Integration tests for verifying binary compatibility between the implementations (e.g. sending data from Java to C++)
  • Conversions to and from other in-memory data structures (e.g. Python's pandas library)
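
To make the first and third items above more concrete, here is a minimal sketch in plain Python of what a columnar vector with null support and zero-copy buffer sharing can look like. It is not the Arrow API: the `IntColumn` class and its methods are hypothetical names used only for illustration, and the layout (one contiguous values buffer plus a validity bitmap with one bit per slot, least-significant bit first) is a simplified assumption that ignores padding, alignment, nested types, and reference counting.

```python
from array import array


class IntColumn:
    """Toy columnar vector (hypothetical, not an Arrow class)."""

    def __init__(self, items):
        self.length = len(items)
        # Values live in one contiguous buffer of C ints; null slots hold 0.
        self.values = array("i", [0 if v is None else v for v in items])
        # One validity bit per slot, least-significant bit first; 1 = not null.
        self.validity = bytearray((self.length + 7) // 8)
        for i, v in enumerate(items):
            if v is not None:
                self.validity[i // 8] |= 1 << (i % 8)

    def is_valid(self, i):
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def __getitem__(self, i):
        return self.values[i] if self.is_valid(i) else None

    def null_count(self):
        return sum(not self.is_valid(i) for i in range(self.length))


col = Int32 = IntColumn([1, None, 3, 4])

# A memoryview exposes the values buffer without copying it.
view = memoryview(col.values)
window = view[2:4]   # zero-copy slice over the last two slots
col.values[2] = 30   # the write is visible through the slice: same memory
```

Because `window` refers to the same memory as `col.values`, no data is duplicated when a consumer looks at part of the column. The reference implementations pursue the same idea with off-heap buffers that can be shared between processes or backed by memory-mapped files.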

Getting involved

Right now the primary audience for Apache Arrow is the developers of data systems; most people will use Apache Arrow indirectly, through systems that use it for internal data handling and for interoperating with other Arrow-enabled systems.

Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved: