ARROW-438: [C++/Python] Implement zero-data-copy record batch and table concatenation. by wesm · Pull Request #274 · apache/arrow

wesm · 2017-01-07T21:14:04Z

This also fixes a bug in ChunkedArray::Equals. This is caught by the Python test suite but would benefit from more C++ unit tests.
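
To make the "zero-data-copy" idea concrete, here is a minimal pure-Python sketch. It is not the Arrow implementation: `ChunkedColumn`, `table_from_batches`, and `concat_tables` are hypothetical stand-ins, and plain lists play the role of arrays. It only illustrates that a table can be assembled from record batches, and tables can be concatenated, by collecting references to the existing arrays as chunks.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ChunkedColumn:
    """Hypothetical stand-in for a chunked column: a list of chunk references."""

    chunks: List[Sequence]

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)


def table_from_batches(batches: List[Dict[str, Sequence]]) -> Dict[str, ChunkedColumn]:
    """Build a table whose columns reference each batch's arrays as chunks."""
    if not batches:
        raise ValueError("need at least one batch")
    names = list(batches[0])
    for batch in batches[1:]:
        if list(batch) != names:
            raise ValueError("all batches must share the same column names")
    # Only references to the existing arrays are stored; no element is copied.
    return {name: ChunkedColumn([batch[name] for batch in batches]) for name in names}


def concat_tables(tables: List[Dict[str, ChunkedColumn]]) -> Dict[str, ChunkedColumn]:
    """Concatenate tables by chaining their chunk lists, without copying data."""
    if not tables:
        raise ValueError("need at least one table")
    names = list(tables[0])
    for table in tables[1:]:
        if list(table) != names:
            raise ValueError("all tables must share the same column names")
    return {
        name: ChunkedColumn([chunk for table in tables for chunk in table[name].chunks])
        for name in names
    }
```

In this sketch the chunks of the concatenated table are the very same objects as the input arrays, which is the property the PR title refers to. The cost is that the result may have many small chunks, so anything that compares or iterates over columns has to cope with differing chunk layouts.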

wesm · 2017-01-07T21:14:26Z

cpp/src/arrow/column.cc

      other_start_idx = 0;
+    } else {
+      other_start_idx += common_length;
    }


See bug fix here
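
For readers without the full file at hand, the following is a rough Python sketch of a chunk-layout-independent equality check of the kind this hunk touches. It is an illustration under assumptions, not the code in `cpp/src/arrow/column.cc`: the function name `chunked_equals` and the use of plain lists as chunks are hypothetical. The point it shows is that when a chunk is only partly consumed, its start offset has to advance by `common_length`, which is what the added `else` branch does for `other_start_idx`.

```python
from typing import List, Sequence


def chunked_equals(left: List[Sequence], right: List[Sequence]) -> bool:
    """Compare two chunked sequences element-wise, ignoring chunk boundaries."""
    total = sum(len(chunk) for chunk in left)
    if total != sum(len(chunk) for chunk in right):
        return False

    left_chunk_idx = right_chunk_idx = 0
    left_start_idx = right_start_idx = 0
    elements_compared = 0

    while elements_compared < total:
        a = left[left_chunk_idx]
        b = right[right_chunk_idx]
        # Longest run available in both current chunks from their offsets.
        common_length = min(len(a) - left_start_idx, len(b) - right_start_idx)
        a_run = list(a[left_start_idx:left_start_idx + common_length])
        b_run = list(b[right_start_idx:right_start_idx + common_length])
        if a_run != b_run:
            return False
        elements_compared += common_length

        # Move to the next chunk if this one is exhausted, else advance the offset.
        if left_start_idx + common_length == len(a):
            left_chunk_idx += 1
            left_start_idx = 0
        else:
            left_start_idx += common_length

        if right_start_idx + common_length == len(b):
            right_chunk_idx += 1
            right_start_idx = 0
        else:
            # Without this branch the offset would stay put and the next
            # comparison would re-read elements that were already compared.
            right_start_idx += common_length

    return True
```

For example, `chunked_equals([[1, 2, 3], [4, 5]], [[1, 2], [3, 4, 5]])` walks three runs of lengths 2, 1, and 2, and only succeeds if both offsets are advanced after each partial run.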

wesm · 2017-01-07T21:14:52Z

cpp/CMakeLists.txt

  option(ARROW_JEMALLOC
    "Build the Arrow jemalloc-based allocator"
-    ON)
+    OFF)


I didn't think this would be controversial, but let's let people opt in to this for now

Is there currently a problem with jemalloc somewhere? Given that it should greatly improve performance, I would really like to build it by default so a user can opt-in at runtime rather than compile time.

Still if we decide to make it opt-in, we should have it enabled in Travis.

We can turn it back on by default as soon as we have the external project set up to auto build.

I can add an external project, but I skipped that as it is packaged nearly everywhere (maybe not on Windows, but Homebrew and all Linux distros come with a decently recent version).

That's fair enough. We eventually need this to all build out of the box with Visual Studio -- it might be useful to provide an option to statically link jemalloc to ensure reliable cross-platform behavior (on Linux, at least)

Will take care of that: https://issues.apache.org/jira/browse/ARROW-466

xhochy · 2017-01-08T09:50:23Z

cpp/CMakeLists.txt

Is there currently a problem with jemalloc somewhere? Given that it should greatly improve performance, I would really like to build it by default so a user can opt-in at runtime rather than compile time.

Still if we decide to make it opt-in, we should have it enabled in Travis.

wesm · 2017-01-08T15:54:44Z

GitHub appears to have had some kind of push hiccup (I had to force-push this commit to my master branch, oddly), so this commit hasn't synced yet.

…le concatenation.

This also fixes a bug in ChunkedArray::Equals. This is caught by the Python test suite but would benefit from more C++ unit tests.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#274 from wesm/ARROW-438 and squashes the following commits:

1f39568 [Wes McKinney] py3 compatibility
2e76c5e [Wes McKinney] Implement arrow::ConcatenateTables and Python wrapper. Fix bug in ChunkedArray::Equals
f3cb170 [Wes McKinney] Fix Cython compilation, verify pyarrow.Table.from_batches still works
af28755 [Wes McKinney] Implement Table::FromRecordBatches

Change-Id: I948b61d848c178edefad63465a74d9c303ad1f18

wesm · 2017-01-08T15:57:13Z

Against my better judgment I did a no-op amend and force-pushed apache/master, but no dice. We may have to bug ASF infra.

wesm · 2017-01-08T16:00:25Z

see https://issues.apache.org/jira/browse/INFRA-13248

wesm · 2017-01-08T18:29:49Z

Looks like the commit synced. It didn't close the PR though, weird.

wesm added 3 commits January 6, 2017 19:02

Implement Table::FromRecordBatches

af28755

Change-Id: I740bc111d0a3e81f8a5076a6607926a8d164ddc1

Fix Cython compilation, verify pyarrow.Table.from_batches still works

f3cb170

Change-Id: I93e1744b45f0d3dce65ac11a43bc7726ce3209ce

Implement arrow::ConcatenateTables and Python wrapper. Fix bug in ChunkedArray::Equals

2e76c5e

Change-Id: I15e4caf37d1eb3ffe9479d95c9119fb35b0fefcc

wesm commented Jan 7, 2017

py3 compatibility

1f39568

Change-Id: I8b0f72c562f075a481e2795914b05bf2427d5080

xhochy approved these changes Jan 8, 2017

wesm closed this Jan 8, 2017

wesm deleted the ARROW-438 branch January 8, 2017 18:29

wesm wants to merge 4 commits into apache:master from wesm:ARROW-438
