
ARROW-438: [C++/Python] Implement zero-data-copy record batch and table concatenation. #274

Closed
wesm wants to merge 4 commits into apache:master from wesm:ARROW-438


@wesm
Member

@wesm wesm commented Jan 7, 2017

This also fixes a bug in ChunkedArray::Equals. The bug is caught by the Python test suite, but the fix would benefit from more C++ unit tests.

wesm added 3 commits January 6, 2017 19:02
Change-Id: I740bc111d0a3e81f8a5076a6607926a8d164ddc1
Change-Id: I93e1744b45f0d3dce65ac11a43bc7726ce3209ce
…nkedArray::Equals

Change-Id: I15e4caf37d1eb3ffe9479d95c9119fb35b0fefcc
other_start_idx = 0;
} else {
other_start_idx += common_length;
}
Member Author

See bug fix here
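The fragment above appears to advance `other_start_idx` by `common_length`, resetting it to 0 once the current chunk is exhausted: a common way to compare two chunked arrays whose chunk boundaries differ. A minimal sketch of that technique, assuming plain `std::vector<int>` chunks instead of Arrow arrays. Only `other_start_idx` and `common_length` come from the fragment; `ChunkedColumn`, `ChunkedEqualsSketch` and the other names are hypothetical, and this is not the code from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a chunked array (NOT Arrow's API): one logical
// column stored as a sequence of contiguous chunks.
using Chunk = std::vector<int>;
using ChunkedColumn = std::vector<Chunk>;

std::size_t TotalLength(const ChunkedColumn& column) {
  std::size_t n = 0;
  for (const Chunk& chunk : column) n += chunk.size();
  return n;
}

// Returns true if both columns hold the same values in the same order,
// even when their chunk boundaries differ.
bool ChunkedEqualsSketch(const ChunkedColumn& left, const ChunkedColumn& right) {
  const std::size_t length = TotalLength(left);
  // Checking lengths first keeps both chunk indices in range below.
  if (length != TotalLength(right)) return false;

  std::size_t left_chunk = 0, right_chunk = 0;          // current chunk index
  std::size_t left_start_idx = 0, other_start_idx = 0;  // offset inside chunk
  std::size_t compared = 0;

  while (compared < length) {
    const Chunk& a = left[left_chunk];
    const Chunk& b = right[right_chunk];
    // Compare only the overlap of the two current chunks.
    const std::size_t common_length =
        std::min(a.size() - left_start_idx, b.size() - other_start_idx);
    const auto a_begin = a.begin() + static_cast<std::ptrdiff_t>(left_start_idx);
    const auto b_begin = b.begin() + static_cast<std::ptrdiff_t>(other_start_idx);
    if (!std::equal(a_begin, a_begin + static_cast<std::ptrdiff_t>(common_length),
                    b_begin)) {
      return false;
    }
    compared += common_length;

    // Advance each side: move to the next chunk and reset the offset when the
    // current chunk is exhausted, otherwise step forward by common_length.
    if (left_start_idx + common_length == a.size()) {
      ++left_chunk;
      left_start_idx = 0;
    } else {
      left_start_idx += common_length;
    }
    if (other_start_idx + common_length == b.size()) {
      ++right_chunk;
      other_start_idx = 0;
    } else {
      other_start_idx += common_length;
    }
  }
  return true;
}
```

Empty chunks are handled naturally: they yield a `common_length` of 0 and are skipped on the next advance.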

option(ARROW_JEMALLOC
  "Build the Arrow jemalloc-based allocator"
- ON)
+ OFF)
Member Author

I didn't think this would be controversial, but let's have people opt in to this for now.

Member

Is there currently a problem with jemalloc somewhere? Given that it should greatly improve performance, I would really like to build it by default so that a user can opt in at runtime rather than at compile time.

Still, if we decide to make it opt-in, we should have it enabled in Travis.

Member Author

We can turn it back on by default as soon as we have the external project set up to build jemalloc automatically.

Member

@xhochy xhochy Jan 8, 2017

I can add an external project, but I skipped that because jemalloc is packaged nearly everywhere (maybe not on Windows, but Homebrew and all Linux distros come with a reasonably recent version).

Member Author

That's fair enough. We eventually need all of this to build out of the box with Visual Studio. It might be useful to provide an option to statically link jemalloc to ensure reliable cross-platform behavior (on Linux, at least).

Change-Id: I8b0f72c562f075a481e2795914b05bf2427d5080

@wesm
Member Author

wesm commented Jan 8, 2017

GitHub appears to have had some kind of push hiccup (oddly, I had to force-push this commit to my master branch), so this commit hasn't synced yet.

wesm added a commit to wesm/arrow that referenced this pull request Jan 8, 2017
…le concatenation.

This also fixes a bug in ChunkedArray::Equals. This is caught by the Python test suite but would benefit from more C++ unit tests.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#274 from wesm/ARROW-438 and squashes the following commits:

1f39568 [Wes McKinney] py3 compatibility
2e76c5e [Wes McKinney] Implement arrow::ConcatenateTables and Python wrapper. Fix bug in ChunkedArray::Equals
f3cb170 [Wes McKinney] Fix Cython compilation, verify pyarrow.Table.from_batches still works
af28755 [Wes McKinney] Implement Table::FromRecordBatches

Change-Id: I948b61d848c178edefad63465a74d9c303ad1f18
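The PR title describes zero-data-copy concatenation, implemented in the `Table::FromRecordBatches` and `arrow::ConcatenateTables` commits listed above. A minimal sketch of the general idea, assuming a simplified model in which each column is a list of reference-counted, immutable chunks. `Chunk`, `Column`, `Table` and `ConcatenateTablesSketch` are hypothetical names, not Arrow's classes or API: concatenation only appends chunk pointers, so the output shares its value buffers with the inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified model (NOT Arrow's classes): a chunk is an
// immutable, reference-counted buffer of values.
using Chunk = std::shared_ptr<const std::vector<double>>;

struct Column {
  std::string name;
  std::vector<Chunk> chunks;  // one column as a sequence of chunks
};

struct Table {
  std::vector<Column> columns;
};

// Concatenates tables with matching schemas by appending chunk pointers.
// Only reference counts change; the value buffers themselves are not copied.
// Schemas are compared by column name only, since every column in this
// simplified model holds doubles.
Table ConcatenateTablesSketch(const std::vector<Table>& tables) {
  if (tables.empty()) {
    throw std::invalid_argument("need at least one table");
  }
  Table result;
  for (const Column& column : tables.front().columns) {
    result.columns.push_back(Column{column.name, {}});
  }
  for (const Table& table : tables) {
    if (table.columns.size() != result.columns.size()) {
      throw std::invalid_argument("schema mismatch: column count differs");
    }
    for (std::size_t i = 0; i < table.columns.size(); ++i) {
      if (table.columns[i].name != result.columns[i].name) {
        throw std::invalid_argument("schema mismatch: column name differs");
      }
      std::vector<Chunk>& out = result.columns[i].chunks;
      const std::vector<Chunk>& in = table.columns[i].chunks;
      out.insert(out.end(), in.begin(), in.end());
    }
  }
  return result;
}
```

Because the result holds the same `shared_ptr`s as its inputs, the cost grows with the number of chunks rather than the number of values. Building a table from record batches follows the same pattern, with each batch contributing one chunk per column.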
@wesm
Member Author

wesm commented Jan 8, 2017

Against my better judgment, I did a no-op amend and force-pushed apache/master, but no dice. We may have to bug ASF Infra.

@wesm
Member Author

wesm commented Jan 8, 2017

@wesm
Member Author

wesm commented Jan 8, 2017

Looks like the commit synced. It didn't close the PR, though, which is weird.

@wesm wesm closed this Jan 8, 2017
@wesm wesm deleted the ARROW-438 branch January 8, 2017 18:29