ARROW-438: [C++/Python] Implement zero-data-copy record batch and table concatenation.#274
wesm wants to merge 4 commits into apache:master
Change-Id: I740bc111d0a3e81f8a5076a6607926a8d164ddc1
Change-Id: I93e1744b45f0d3dce65ac11a43bc7726ce3209ce
…nkedArray::Equals Change-Id: I15e4caf37d1eb3ffe9479d95c9119fb35b0fefcc
      other_start_idx = 0;
    } else {
      other_start_idx += common_length;
    }
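The fragment above appears to come from the chunk-walking loop in ChunkedArray::Equals, which has to compare two arrays whose chunk boundaries may not line up. A minimal sketch of that idea in plain Python follows, assuming chunks are ordinary lists; the function name `chunked_equals` and the `this_*` variables are hypothetical, and only `other_start_idx` and `common_length` are taken from the diff. It is an illustration of the technique, not the Arrow implementation.

```python
from typing import Sequence


def chunked_equals(left: Sequence[list], right: Sequence[list]) -> bool:
    """Compare two chunked sequences element-wise, ignoring chunk boundaries.

    Hypothetical helper for illustration; chunks are plain Python lists.
    """
    # Different total lengths can never be equal.
    if sum(len(c) for c in left) != sum(len(c) for c in right):
        return False

    this_chunk_idx = 0    # which chunk of `left` we are in
    this_start_idx = 0    # offset inside that chunk
    other_chunk_idx = 0   # which chunk of `right` we are in
    other_start_idx = 0   # offset inside that chunk

    while this_chunk_idx < len(left) and other_chunk_idx < len(right):
        this_chunk = left[this_chunk_idx]
        other_chunk = right[other_chunk_idx]

        # Longest run available in both current chunks.
        common_length = min(len(this_chunk) - this_start_idx,
                            len(other_chunk) - other_start_idx)

        this_run = this_chunk[this_start_idx:this_start_idx + common_length]
        other_run = other_chunk[other_start_idx:other_start_idx + common_length]
        if this_run != other_run:
            return False

        # Advance each side: move to the next chunk if this one is used up,
        # otherwise just move the offset forward within the same chunk.
        if this_start_idx + common_length == len(this_chunk):
            this_chunk_idx += 1
            this_start_idx = 0
        else:
            this_start_idx += common_length

        if other_start_idx + common_length == len(other_chunk):
            other_chunk_idx += 1
            other_start_idx = 0
        else:
            other_start_idx += common_length

    return True
```

Because each iteration exhausts at least one of the two current chunks, the loop always makes progress, including when a chunk is empty.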
option(ARROW_JEMALLOC
  "Build the Arrow jemalloc-based allocator"
-  ON)
+  OFF)
I didn't think this would be controversial, but let's let people opt in to this for now.
Is there currently a problem with jemalloc somewhere? Given that it should greatly improve performance, I would really like to build it by default so a user can opt in at runtime rather than at compile time.
Still, if we decide to make it opt-in, we should have it enabled in Travis.
We can turn it back on by default as soon as we have the external project set up to auto build.
I can add an external project, but I skipped that because jemalloc is packaged nearly everywhere (maybe not on Windows, but Homebrew and all Linux distros come with a decently recent version).
That's fair enough. We eventually need this to all build out of the box with Visual Studio -- it might be useful to provide an option to statically link jemalloc to ensure reliable cross-platform behavior (on Linux, at least).
Will take care of that: https://issues.apache.org/jira/browse/ARROW-466
Change-Id: I8b0f72c562f075a481e2795914b05bf2427d5080
GitHub appears to have had some kind of push hiccup (I had to force push this commit to my master branch, oddly), so this commit hasn't sync'd yet.
…le concatenation.
This also fixes a bug in ChunkedArray::Equals. This is caught by the Python test suite but would benefit from more C++ unit tests.
Author: Wes McKinney <wes.mckinney@twosigma.com>
Closes apache#274 from wesm/ARROW-438 and squashes the following commits:
1f39568 [Wes McKinney] py3 compatibility
2e76c5e [Wes McKinney] Implement arrow::ConcatenateTables and Python wrapper. Fix bug in ChunkedArray::Equals
f3cb170 [Wes McKinney] Fix Cython compilation, verify pyarrow.Table.from_batches still works
af28755 [Wes McKinney] Implement Table::FromRecordBatches
Change-Id: I948b61d848c178edefad63465a74d9c303ad1f18
Against my better judgment, I did a no-op amend and force-pushed apache/master, but no dice. We may have to bug ASF infra.
Looks like the commit sync'd. Didn't close the PR though, weird.
This also fixes a bug in ChunkedArray::Equals. This is caught by the Python test suite but would benefit from more C++ unit tests.
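For readers unfamiliar with what "zero-data-copy" concatenation means in the PR title, here is a rough sketch in plain Python, assuming a table is a list of chunked columns and a chunk is an ordinary list. The names `ChunkedColumn` and `concat_tables` are hypothetical and are not the Arrow or pyarrow API; the point is only that the result reuses the input chunks by reference instead of copying their values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkedColumn:
    """Hypothetical stand-in for a chunked column: a name plus a list of chunks."""
    name: str
    chunks: List[list]

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)


def concat_tables(tables: List[List[ChunkedColumn]]) -> List[ChunkedColumn]:
    """Concatenate tables column by column without copying chunk data.

    Each table is a list of ChunkedColumn objects. All tables must have the
    same column names in the same order (a simplified schema check).
    """
    if not tables:
        raise ValueError("need at least one table")

    names = [col.name for col in tables[0]]
    for table in tables[1:]:
        if [col.name for col in table] != names:
            raise ValueError("schemas do not match")

    result = []
    for i, name in enumerate(names):
        # Collect references to the existing chunks; no element is copied.
        chunks = [chunk for table in tables for chunk in table[i].chunks]
        result.append(ChunkedColumn(name, chunks))
    return result


# Usage: two single-chunk tables become one two-chunk table.
t1 = [ChunkedColumn("a", [[1, 2]]), ChunkedColumn("b", [["x", "y"]])]
t2 = [ChunkedColumn("a", [[3]]), ChunkedColumn("b", [["z"]])]
combined = concat_tables([t1, t2])
```

The cost of this design is that the result has more chunks than its inputs, which is why an equality check that tolerates different chunk layouts matters.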