ARROW-438: [C++/Python] Implement zero-data-copy record batch and table concatenation. #274

Closed
wants to merge 4 commits into
from

Projects

None yet

2 participants

@wesm
Member
wesm commented Jan 7, 2017

This also fixes a bug in ChunkedArray::Equals. This is caught by the Python test suite but would benefit from more C++ unit tests.

wesm added some commits Jan 7, 2017
@wesm wesm Implement Table::FromRecordBatches
Change-Id: I740bc111d0a3e81f8a5076a6607926a8d164ddc1
af28755
@wesm wesm Fix Cython compilation, verify pyarrow.Table.from_batches still works
Change-Id: I93e1744b45f0d3dce65ac11a43bc7726ce3209ce
f3cb170
@wesm wesm Implement arrow::ConcatenateTables and Python wrapper. Fix bug in Chu…
…nkedArray::Equals

Change-Id: I15e4caf37d1eb3ffe9479d95c9119fb35b0fefcc
2e76c5e
if (other_start_idx + common_length == other_array->length()) {
other_chunk_idx++;
other_start_idx = 0;
+ } else {
+ other_start_idx += common_length;
}
@wesm
wesm Jan 7, 2017 Member

See bug fix here

@@ -76,7 +76,7 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}")
option(ARROW_JEMALLOC
"Build the Arrow jemalloc-based allocator"
- ON)
+ OFF)
@wesm
wesm Jan 7, 2017 Member

I didn't think this would be controversial, but let's let people opt in to this for now

@xhochy
xhochy Jan 8, 2017 Member

Is there currently a problem with jemalloc somewhere? Given that it should greatly improve performance, I would really like to build it by default so a user can opt-in at runtime rather than compile time.

Still if we decide to make it opt-in, we should have it enabled in Travis.

@wesm
wesm Jan 8, 2017 Member

We can turn it back on by default as soon as we have the external project set up to auto build.

@xhochy
xhochy Jan 8, 2017 edited Member

I can add an external project but I skipped that as it is packaged nearly everywhere (maybe not on windows but Homebrew and all linux distros come with a decently recent version).

@wesm
wesm Jan 8, 2017 Member

That's fair enough. We eventually need this to all build out of the box with Visual Studio -- it might be useful to provide an option to statically link jemalloc to ensure reliable cross-platform behavior (on Linux, at least)

@wesm wesm py3 compatibility
Change-Id: I8b0f72c562f075a481e2795914b05bf2427d5080
1f39568
@xhochy
xhochy approved these changes Jan 8, 2017 View changes
@@ -76,7 +76,7 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}")
option(ARROW_JEMALLOC
"Build the Arrow jemalloc-based allocator"
- ON)
+ OFF)
@xhochy
xhochy Jan 8, 2017 Member

Is there currently a problem with jemalloc somewhere? Given that it should greatly improve performance, I would really like to build it by default so a user can opt-in at runtime rather than compile time.

Still if we decide to make it opt-in, we should have it enabled in Travis.

@wesm wesm added a commit to wesm/arrow that referenced this pull request Jan 8, 2017
@wesm wesm ARROW-438: [C++/Python] Implement zero-data-copy record batch and tab…
…le concatenation.

This also fixes a bug in ChunkedArray::Equals. This is caught by the Python test suite but would benefit from more C++ unit tests.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #274 from wesm/ARROW-438 and squashes the following commits:

1f39568 [Wes McKinney] py3 compatibility
2e76c5e [Wes McKinney] Implement arrow::ConcatenateTables and Python wrapper. Fix bug in ChunkedArray::Equals
f3cb170 [Wes McKinney] Fix Cython compilation, verify pyarrow.Table.from_batches still works
af28755 [Wes McKinney] Implement Table::FromRecordBatches
6526a52
@wesm
Member
wesm commented Jan 8, 2017

github appears to have had some kind of push hiccup (I had to force push this commit to my master branch, oddly), so this commit hasn't sync'd yet

@wesm wesm added a commit to wesm/arrow that referenced this pull request Jan 8, 2017
@wesm wesm ARROW-438: [C++/Python] Implement zero-data-copy record batch and tab…
…le concatenation.

This also fixes a bug in ChunkedArray::Equals. This is caught by the Python test suite but would benefit from more C++ unit tests.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #274 from wesm/ARROW-438 and squashes the following commits:

1f39568 [Wes McKinney] py3 compatibility
2e76c5e [Wes McKinney] Implement arrow::ConcatenateTables and Python wrapper. Fix bug in ChunkedArray::Equals
f3cb170 [Wes McKinney] Fix Cython compilation, verify pyarrow.Table.from_batches still works
af28755 [Wes McKinney] Implement Table::FromRecordBatches

Change-Id: I948b61d848c178edefad63465a74d9c303ad1f18
3195948
@wesm
Member
wesm commented Jan 8, 2017 edited

against my better judgment I did a no-op amend and force-pushed apache/master but no dice. we may have to bug ASF infra

@wesm
Member
wesm commented Jan 8, 2017

Looks like the commit sync'd. Didn't close the PR though, weird.

@wesm wesm closed this Jan 8, 2017
@wesm wesm deleted the wesm:ARROW-438 branch Jan 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment