WIP ARROW-2193: [C++] Do not depend on Boost libraries at runtime in plasma_store#1711
wesm wants to merge 1590 commits into apache:master from
Change-Id: I2a7909e2f0fa87780197270982ef941d89834cca
Author: Philipp Moritz <pcmoritz@gmail.com> Closes apache#1420 from pcmoritz/revert-to-pickle-arg and squashes the following commits: bfef3ae [Philipp Moritz] fix windows test c156653 [Philipp Moritz] fix remote serialization test on windows 3f58d0d [Philipp Moritz] fix windows 6a2a83d [Philipp Moritz] add regression test 3eb9325 [Philipp Moritz] fix 518fb7d [Philipp Moritz] fix b488586 [Philipp Moritz] revert to pickle=True argument for serialization
The option is used when building the deb package. `arrow/gpu/cuda_version.h` exists in the build directory, not the source directory. Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1429 from kou/glib-fix-build-error-with-arrow-cpp-build-dir and squashes the following commits: 879d6bc [Kouhei Sutou] [GLib] Fix build error with --with-arrow-cpp-build-dir
This is not a 0.8.0 blocker. If 0.8.0 RC2 is dropped, I hope that 0.8.0 includes this. If RC2 has no problem, I hope that 0.9.0 includes this. Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1424 from kou/glib-add-timestamp-data-type-get-unit and squashes the following commits: a88771b [Kouhei Sutou] [GLib] Add garrow_timestamp_data_type_get_unit()
Add JSON reader, as well as `js/bin/integration.js` script for running integration test validation Author: Paul Taylor <paul.e.taylor@me.com> Author: Brian Hulette <hulettbh@gmail.com> Author: Brian Hulette <brian.hulette@ccri.com> Closes apache#1343 from TheNeuralBit/json-reader and squashes the following commits: bd6e80c [Paul Taylor] print correct error messages f1a51bd [Paul Taylor] update example html file for new API bb0059b [Paul Taylor] fix off by one error reading buffers from metadata v3 arrows 6da297c [Paul Taylor] update CI and JS integration scripts to invoke jest integration tests 03d82bd [Paul Taylor] add integration tests to gulp test task 937dbf8 [Paul Taylor] split out integration and unit tests e0354a7 [Paul Taylor] quote enum keys to save from mangling, add JSON customMetadata map c49f43a [Paul Taylor] use string indexers to protect JSON fields from closure compiler's mangler, fix es5 umd build 97f8e5e [Paul Taylor] really flatten buffers from json 7b6ea0a [Paul Taylor] fix a few json reader typos fa352ea [Paul Taylor] update tests f81dcb9 [Paul Taylor] move arrow2csv into src so it's distributed in the npm packages b5f1470 [Paul Taylor] add Arrow types AST, refactor buffer + json reader to emit type AST nodes 85bf03c [Brian Hulette] linter fixes 793f9e5 [Brian Hulette] only use json-bignum in bin/integration eaa5de4 [Brian Hulette] add dictionary-encoded vectors 12f99de [Brian Hulette] Add JSON support for Date/Time/Timestamp vectors 5349080 [Brian Hulette] move test data creation after integration_test.py 68c2349 [Paul Taylor] update npm script name in integration runner f50356e [Paul Taylor] run the js build before integration.py dcc85f9 [Brian Hulette] linter fixes 313cd58 [Brian Hulette] Add int-test to test-task 86b53b4 [Brian Hulette] Use Int128.fromString in JSON reader 148e997 [Brian Hulette] cleanup a2befbb [Brian Hulette] Switch endianness of Int64/128 a1ea88f [Brian Hulette] Now uses Uint32 for all internal buffers 02a7838 [Brian 
Hulette] WIP Int64, Int128 645b844 [Brian Hulette] JS integration script uses new Table.from for JSON c1f3f6a [Paul Taylor] move createTypedArray and createValidityArray to VectorReaderContext 1191a27 [Paul Taylor] refactor `Table.from()` to accept a JSON object or string 02ea8a6 [Paul Taylor] refactor traits to be compatible with closure compiler's full ES6 -> ES5 01de162 [Paul Taylor] move generated format to format/fb folder, fix closure compiler es5 build ad41741 [Brian Hulette] Fix bug with zero-length vectors b17367c [Brian Hulette] Add list,struct to JSON reader e3d6d62 [Brian Hulette] linter fixes 1e64707 [Brian Hulette] Add JS integration script and integration_test.py JS runner 7e33b1c [Brian Hulette] Add JSON reader
cc @wesm , @jacques-n , @BryanCutler , @icexelloss A small post on recent improvements in JAVA vectors. Suggestions are welcome :) Author: siddharth <siddharth@dremio.com> Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1419 from siddharthteotia/ARROW-1922 and squashes the following commits: ebdd986 [Wes McKinney] Minor tweaks to post, add Dremio link eaedd87 [siddharth] review comments 5705019 [siddharth] correct typo c2af13c [siddharth] ARROW-1922: Blog post on JAVA vector changes
…checksum links, add verification instructions Website fixes per ASF policies and feedback in ARROW-1935, ARROW-1936 Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1434 from wesm/ARROW-1935 and squashes the following commits: 41f128e [Wes McKinney] Remove link to nightly builds. Fix signature / checksum links, add KEYS file, verification instructions
…ed table This requires PARQUET-1092 apache/parquet-cpp#426 Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1425 from wesm/ARROW-232 and squashes the following commits: da8d999 [Wes McKinney] Add unit test to validate PARQUET-1092
I took a hack at this after poring through the changelog, others please let me know if you'd like to add or change anything. I need to incorporate Sidd's blog post and add a link to that here. I can publish all of this sometime tomorrow morning New York time and post to social media etc. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1432 from wesm/ARROW-1934 and squashes the following commits: 4aa6bc6 [Wes McKinney] Tweaks for publication da9e65d [Wes McKinney] Start drafting 0.8.0 release blog post
cc @wesm Author: siddharth <siddharth@dremio.com> Closes apache#1436 from siddharthteotia/ARROW-1939 and squashes the following commits: 31f1be1 [siddharth] ARROW-1939: Correct links in release blog post
…or now It's reasonably harmless to suppress these warnings for the time being. When we upgrade to a new release of googletest, we can remove this again Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1433 from wesm/ARROW-1931 and squashes the following commits: 09c3722 [Wes McKinney] Add tr1 define to CMAKE_CXX_FLAGS 6fe636e [Wes McKinney] Rearrange appveyor build jobs for faster feedback 6ace398 [Wes McKinney] Use CXX_COMMON_FLAGS, all in one place b786a40 [Wes McKinney] Silence std::tr1 tuple warning everywhere 549e083 [Wes McKinney] Silence std::tr1 namespace warning only when building gtest 13a77e6 [Wes McKinney] Silence tr1 deprecation warning in MSVC 2017 ccf1318 [Wes McKinney] Add /bigobj flag, suppress C4996 deprecation warning for now
Author: Philipp Moritz <pcmoritz@gmail.com> Closes apache#1440 from pcmoritz/findarrow-libs-fix and squashes the following commits: bb161b4 [Philipp Moritz] fix static lib path in FindArrow
The current implementation of setInitialCapacity() multiplies the capacity by a factor of 5 at every level of list nesting. So if the schema is LIST (LIST (LIST (LIST (LIST (LIST (LIST (BIGINT)))))))) and we start with an initial capacity of 128, we end up throwing OversizedAllocationException from the BigIntVector: the capacity grows by a factor of 5 at every level, and by the time we reach the inner scalar vector that actually stores the data, we are well over the max size limit per vector (1MB). We saw this problem downstream when we failed to read deeply nested JSON data. The potential fix is to use the factor of 5 only when we are down to the leaf vector. As the depth increases and we are still working with complex/list vectors, we don't use the factor of 5. cc @jacques-n , @BryanCutler , @icexelloss Author: siddharth <siddharth@dremio.com> Closes apache#1439 from siddharthteotia/ARROW-1943 and squashes the following commits: d0adbad [siddharth] unit tests e2f21a8 [siddharth] fix imports d103436 [siddharth] ARROW-1943: handle setInitialCapacity for deeply nested lists
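The arithmetic behind the overflow can be sketched roughly as follows. This is a simplified Python model, not Arrow's Java code: both function names and the way the capacity is propagated are assumptions made for illustration, using only the factor of 5, the seven nested lists, and the initial capacity of 128 from the description above.

```python
# Simplified model of how an initial capacity propagates through nested lists.
# Hypothetical helpers for illustration; not the actual Arrow Java implementation.

REPEAT_FACTOR = 5  # factor applied when descending into a list's child


def leaf_capacity_before(initial_capacity: int, list_depth: int) -> int:
    """Old behaviour as described: multiply by 5 at every list level."""
    capacity = initial_capacity
    for _ in range(list_depth):
        capacity *= REPEAT_FACTOR
    return capacity


def leaf_capacity_after(initial_capacity: int, list_depth: int) -> int:
    """Proposed behaviour: apply the factor once, only when the child is the leaf."""
    if list_depth == 0:
        return initial_capacity
    return initial_capacity * REPEAT_FACTOR


if __name__ == "__main__":
    depth = 7  # LIST nested seven times around BIGINT
    print(leaf_capacity_before(128, depth))  # 128 * 5**7
    print(leaf_capacity_after(128, depth))   # 128 * 5
```

In this model the old scheme asks the leaf vector for 128 * 5**7 values, while the proposed scheme asks for 128 * 5, which is why only the former runs into the allocation limit.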
Author: Robert Nishihara <robertnishihara@gmail.com> Closes apache#1451 from robertnishihara/numthreads and squashes the following commits: 5e2c7ee [Robert Nishihara] Fix tests. 55cb8ac [Robert Nishihara] Revert old change 0903726 [Robert Nishihara] Move memcopy_threads from serialization context to put. 9281de1 [Robert Nishihara] Expose memcopy threads to serialization context.
…er to handle all non-null Need to properly set the ListVector validity buffer for the case when the field has all non-nulls. This is done already in `BitVectorHelper.loadValidityBuffer`, so just need to build the buffer with a call to that function. Author: Bryan Cutler <cutlerb@gmail.com> Closes apache#1447 from BryanCutler/java-ListVector-non-null-validity-buffer-ARROW-1948 and squashes the following commits: 0d82345 [Bryan Cutler] used BitVectorHelper to properly set the validity buffer
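To illustrate what "all non-null" means for a validity buffer, here is a minimal Python sketch that builds a bitmap with one bit per value, all set to 1. It is not the `BitVectorHelper.loadValidityBuffer` implementation; `all_valid_bitmap` and `is_valid` are hypothetical helpers, and the sketch assumes Arrow's least-significant-bit-first bit order.

```python
# Hypothetical helpers for illustration; not Arrow's Java code.

def all_valid_bitmap(value_count: int) -> bytes:
    """Return a validity bitmap in which the first value_count bits are set."""
    full_bytes, remaining_bits = divmod(value_count, 8)
    bitmap = bytearray(b"\xff" * full_bytes)
    if remaining_bits:
        # Set only the low remaining_bits bits of the last byte (LSB-first order).
        bitmap.append((1 << remaining_bits) - 1)
    return bytes(bitmap)


def is_valid(bitmap: bytes, index: int) -> bool:
    """Check the validity bit for the value at the given index."""
    return bool((bitmap[index // 8] >> (index % 8)) & 1)
```

For ten values this yields two bytes: one full byte and a second byte with only its two lowest bits set.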
This is just a small fix in the Doxygen documentation of the C++ API. Author: Viktor Gal <viktor.gal@maeth.com> Closes apache#1442 from vigsterkr/doxygen_fix and squashes the following commits: 3557aef [Viktor Gal] fix doxygen example in array.h
this is necessary until gulp publishes 4.0.0 builds to npm Author: Paul Taylor <paul.e.taylor@me.com> Closes apache#1453 from trxcllnt/fix-js-gulp-build and squashes the following commits: 0dddbcc [Paul Taylor] Merge branch 'master' into fix-js-gulp-build dc329fa [Paul Taylor] Merge branch 'master' into fix-js-gulp-build d9c1b0c [Paul Taylor] set gulp dependency to specific commit
- Create now takes in a pointer to a shared pointer of Buffer and returns a MutableBuffer. - Object Buffers data and metadata are pointers to shared pointers of Buffer. Author: Philipp Moritz <pcmoritz@gmail.com> Author: William Paul <wapaul1@berkeley.edu> Closes apache#1444 from Wapaul1/plasma_buffer_api and squashes the following commits: 7fe1cee [Philipp Moritz] fix size of MutableBuffer returned by plasma::Create aeed751 [Philipp Moritz] more linting b3274e0 [Philipp Moritz] fix 463dbeb [Philipp Moritz] fix plasma python extension a055fa8 [Philipp Moritz] fix linting fc62dda [William Paul] Added metadata buffer 4d8cbb8 [William Paul] Create and Get use Buffers now
…data We recently moved Dremio to LE Decimal format (similar to Arrow). As part of that we introduced some APIs in the decimal vector which take big-endian data and swap the bytes while writing into the ArrowBuf of the decimal vector. The advantage of these APIs is that the caller does not have to allocate additional memory and write (and read) the big-endian source twice in order to swap it into new memory and then write that into the vector. We can swap bytes directly while writing into the vector: just read once and swap while writing. cc @jacques-n , @BryanCutler , @icexelloss Author: siddharth <siddharth@dremio.com> Closes apache#1443 from siddharthteotia/ARROW-1946 and squashes the following commits: 7805b62 [siddharth] unit tests c89efbf [siddharth] ARROW-1946: Add APIs to decimal vector for writing big endian data
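The "read once and swap while writing" idea can be sketched as below. This is a Python illustration under stated assumptions, not the Java API added in the commit: `write_big_endian` is a hypothetical name, the destination is a plain `bytearray` standing in for the ArrowBuf, and each decimal is assumed to be 16 bytes wide.

```python
# Hypothetical sketch; the real change lives in Arrow's Java decimal vector.

DECIMAL_WIDTH = 16  # assumed bytes per 128-bit decimal value


def write_big_endian(buffer: bytearray, index: int, big_endian_value: bytes) -> None:
    """Write a big-endian value into a little-endian buffer slot, swapping in place."""
    if len(big_endian_value) != DECIMAL_WIDTH:
        raise ValueError("expected exactly %d bytes" % DECIMAL_WIDTH)
    start = index * DECIMAL_WIDTH
    for i in range(DECIMAL_WIDTH):
        # Byte i of the destination is byte (width - 1 - i) of the source,
        # so no intermediate swapped copy is allocated.
        buffer[start + i] = big_endian_value[DECIMAL_WIDTH - 1 - i]


if __name__ == "__main__":
    buf = bytearray(2 * DECIMAL_WIDTH)
    write_big_endian(buf, 1, (-12345).to_bytes(DECIMAL_WIDTH, "big", signed=True))
    print(int.from_bytes(buf[16:32], "little", signed=True))
```

The source bytes are read once and land directly in their final positions, which is the saving the commit message describes.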
This closes [ARROW-1941](https://issues.apache.org/jira/browse/ARROW-1941). Author: Licht-T <licht-t@outlook.jp> Closes apache#1449 from Licht-T/fix-empty-list-roundtrip and squashes the following commits: 165dc6f [Licht-T] TST: Add test for the empty list roundtrip 0ddfd87 [Licht-T] BUG: Fix empty list roundtrip
This adds support for reading ORC files in the C++ library, as well as python bindings for this functionality. Author: Jim Crist <jiminy.crist@gmail.com> Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1418 from jcrist/orc-adapter and squashes the following commits: 7e0400e [Jim Crist] lint d6d32b5 [Uwe L. Korn] Hide symbols introduced by orc static lib a296640 [Jim Crist] Tweak error message f45ac3d [Jim Crist] Read reads as a table 57bc63d [Jim Crist] Use `vector<int>` instead of `list<uint64_t>` 1d53927 [Jim Crist] date32 instead of date64 4b7a3a5 [Jim Crist] Add brief docs e783544 [Jim Crist] More fixups 33f5b10 [Jim Crist] Turn off ARROW_ORC on windows 86a2355 [Jim Crist] Cleanups 2cfdd92 [Jim Crist] Fix build when dependencies aren't already installed 876c3a3 [Jim Crist] Use fPIC on protobuf as well f4a29f8 [Jim Crist] Ensure -fPIC on orc build 7cf1659 [Jim Crist] Build python orc support on travis 2adf938 [Jim Crist] Add ORC support 5c79104 [Jim Crist] Add cmake support for liborc
These changes were necessary to compile on Windows with "-DARROW_BUILD_BENCHMARKS=ON". I added Shlwapi based on google/benchmark#202. Author: Adam Seibert <seibs@users.noreply.github.com> Closes apache#1406 from seibs/ARROW-1909 and squashes the following commits: 98602cd [Adam Seibert] ARROW-1909: [C++] Enables building with benchmarks on windows
Author: Philipp Moritz <pcmoritz@gmail.com> Closes apache#1421 from pcmoritz/plasma-object-ids and squashes the following commits: fc77908 [Philipp Moritz] fixes 9f613c0 [Philipp Moritz] fix windows test f1d7ca0 [Philipp Moritz] fix linting 6be7f4a [Philipp Moritz] Test that object ids are 20 bytes
Adding `reset()` to the ValueVector interface and implementing it where it is not done already. Removing the abstract class BaseDataValueVector, which is no longer used by the UnionVector. Expanded reset tests to check that valueCount is 0 and that buffers keep the same capacity and are zeroed out. Author: Bryan Cutler <cutlerb@gmail.com> Closes apache#1455 from BryanCutler/java-reset-ValueVector-ARROW-1962 and squashes the following commits: da994e1 [Bryan Cutler] typo a52e7db [Bryan Cutler] expanded reset documentations 1526a83 [Bryan Cutler] improved vector reset testing a251d10 [Bryan Cutler] reset should zero data buffer and set value count to 0 bf2a16a [Bryan Cutler] add reset to NullableMapVector to zero validity buffer 7fbde5b [Bryan Cutler] need to zero out vector buffers when reset b59addf [Bryan Cutler] adding reset to ValueVector interface, removing BaseDataValueVector
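The reset contract described above (value count back to 0, buffers zeroed, capacity unchanged) can be modelled with a toy vector. This is a Python sketch only; `ToyVector` and its methods are hypothetical and do not mirror the Java ValueVector classes.

```python
# Toy model of the reset contract; not Arrow's Java ValueVector.

class ToyVector:
    def __init__(self, capacity: int) -> None:
        self.data = bytearray(capacity * 8)            # 8 bytes per value
        self.validity = bytearray((capacity + 7) // 8)  # 1 bit per value
        self.value_count = 0

    def set(self, index: int, value: int) -> None:
        """Store a 64-bit value and mark it as non-null."""
        self.data[index * 8:(index + 1) * 8] = value.to_bytes(8, "little", signed=True)
        self.validity[index // 8] |= 1 << (index % 8)
        self.value_count = max(self.value_count, index + 1)

    def reset(self) -> None:
        """Zero the buffers in place and clear the count; capacity is kept."""
        self.data[:] = bytes(len(self.data))
        self.validity[:] = bytes(len(self.validity))
        self.value_count = 0
```

Because the buffers are overwritten in place and not reallocated, the vector can be reused after `reset()` without a new allocation.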
… garrow_chunked_array_get_value_type() Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1458 from kou/glib-add-chunked-array-get-value-type and squashes the following commits: 4d99a07 [Kouhei Sutou] [GLib] Add garrow_chunked_array_get_value_data_type() and garrow_chunked_array_get_value_type()
Fix conversion of datetimetz row index for non-UTC time zones in to_pandas. Author: Albert Shieh <adshieh@gmail.com> Closes apache#1454 from adshieh/master and squashes the following commits: 6f41302 [Albert Shieh] Fix pandas conversion for datetimetz row index.
Author: Robert Nishihara <robertnishihara@gmail.com> Author: Philipp Moritz <pcmoritz@gmail.com> Closes apache#1463 from robertnishihara/segfaultfix and squashes the following commits: ec8a6c5 [Robert Nishihara] Add comment. 6222340 [Philipp Moritz] fix tests, linting, add license 3e969db [Robert Nishihara] Simplify tests. 8aa3fca [Philipp Moritz] add regression test bfa0851 [Robert Nishihara] Import pyarrow in DeserializeObject.
- Turns off building optional ORC extension by default - Fixes travis builds to turn on ORC extension for a few branches - Adds trivial import test to python build - Adds documentation on how to build optional ORC extension Author: Jim Crist <jiminy.crist@gmail.com> Closes apache#1457 from jcrist/orc-off-by-default and squashes the following commits: fc9898d [Jim Crist] Document how to build ORC integration 950ae38 [Jim Crist] ORC integration is off by default
Change-Id: I82e6b51fa5129473815e19fd6d03bdaaef7a88ff
cpp/src/plasma/CMakeLists.txt
I think this block could benefit from a short comment explaining what the purpose is.
This looks like a rebase artifact
Yes, I have added this in another commit (without it, the manylinux1 wheel would fail).
This change introduces GBytes constructors to GArrowBuffer and GArrowMutableBuffer. GBytes is reference counted, which means that we can share the same memory safely. We can't do that with the current raw guint8 constructor. Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1701 from kou/glib-buffer-accept-gbytes and squashes the following commits: 78de627 <Kouhei Sutou> Improve memory management for GArrowBuffer data
…g-config pkg-config doesn't show -I... and -L... flags when ... is the system default path (i.e. /usr). If Arrow C++ is installed in /usr, we couldn't detect the include path and library path for Arrow C++. This happens when Arrow C++ is installed from .rpm and .deb packages. Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1721 from kou/cpp-find-arrow and squashes the following commits: 4911c8f <Kouhei Sutou> Add Debian based system support 0a2d9c9 <Kouhei Sutou> Support Arrow C++ installed in /usr detection by pkg-config
Author: Antoine Pitrou <antoine@python.org> Closes apache#1684 from pitrou/ARROW-2238-cmake-clcache and squashes the following commits: 8539a0e <Antoine Pitrou> ARROW-2238: Detect and use clcache in cmake configuration
Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1719 from xhochy/ARROW-2280 and squashes the following commits: 82b50a7 <Uwe L. Korn> ARROW-2280: Return the offset for the buffers in pyarrow.Array
Recommend Ninja and clcache. Author: Antoine Pitrou <antoine@python.org> Closes apache#1722 from pitrou/ARROW-2239-windows-build-docs and squashes the following commits: a0e0288 <Antoine Pitrou> ARROW-2239: Update Windows build docs
…t_cython.py Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1730 from wesm/ARROW-2263 and squashes the following commits: 6b3f827 <Wes McKinney> Prepend local pyarrow/ path to PYTHONPATH in test_cython.py
…ions between pd.DataFrame and pa.Table Author: Phillip Cloud <cpcloud@gmail.com> Closes apache#1728 from cpcloud/ARROW-1940 and squashes the following commits: 2e5b7af <Phillip Cloud> ARROW-1940: Extra metadata gets added after multiple conversions between pd.DataFrame and pa.Table
They are useful to detect numeric data types. Author: Kouhei Sutou <kou@clear-code.com> Closes apache#1726 from kou/glib-numeric-data-type and squashes the following commits: 89a5a8a <Kouhei Sutou> Add Numeric, Integer, FloatingPoint data types
Author: Antoine Pitrou <antoine@python.org> Closes apache#1714 from pitrou/ARROW-2270-pyforeignbuffer and squashes the following commits: 51f2d85 <Antoine Pitrou> ARROW-2270: Fix lifetime of ForeignBuffer base object
…arrow.Array for now Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1729 from wesm/ARROW-2150 and squashes the following commits: f0ddcf5 <Wes McKinney> Raise NotImplementedError when comparing with pyarrow.Array for now
Just a trivial fix. stderr is captured by py.test, not by the subprocess call. Author: Antoine Pitrou <antoine@python.org> Closes apache#1724 from pitrou/ARROW-2284 and squashes the following commits: 46a692c <Antoine Pitrou> ARROW-2284: Fix error display on test_plasma error
Also disambiguate the Tensor API on this front. Author: Antoine Pitrou <antoine@python.org> Closes apache#1717 from pitrou/ARROW-2275-bad-mutable-data and squashes the following commits: fabd7b9 <Antoine Pitrou> ARROW-2275: Guard against bad use of Buffer.mutable_data()
…tion scripts Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#1731 from wesm/ARROW-2268 and squashes the following commits: 42768ef <Wes McKinney> Drop usage of md5 checksums for source releases, verification scripts
cc @mitar Author: Uwe L. Korn <uwelk@xhochy.com> Closes apache#1718 from xhochy/ARROW-2269 and squashes the following commits: edc1f3d <Uwe L. Korn> ARROW-2269: Make boost namespace selectable in wheels
…se existing process Author: Mitar <mitar.git@tnode.com> Closes apache#1705 from mitar/ARROW-2250 and squashes the following commits: 4e71d44 <Mitar> ARROW-2250: Do not create a subprocess for plasma but just use existing process.
Adds not, gt, lt, and neq Author: Brian Hulette <brian.hulette@ccri.com> Closes apache#1683 from TheNeuralBit/js-more-predicates and squashes the following commits: 707de82 <Brian Hulette> Use two letter names 56ecea3 <Brian Hulette> export packBools, import compiled code in vector-test 32b26e3 <Brian Hulette> lint 5738327 <Brian Hulette> add externs 84895a0 <Brian Hulette> Add not, lt, gt, neq
Change-Id: I1b2fb90419df18a52e9d302ee0225871b1294903
Change-Id: Iee935158133c653a4c2a663ba7693aa83406b5e3
Change-Id: I1261970faf5d150d43b903d3b741de6b0dabd0ea
Change-Id: Ib2363a27779a3559e3f8f57f01ee02b8c701e66a
I am not sure why libboost_regex is still a runtime dependency, even with
@xhochy It seems that
Should probably close this PR as this issue has been fixed by removing regex_boost usage.
This is sort of a hack; I wasn't sure how to deal with this more generally. Unfortunately, this only gets rid of the boost_system and boost_filesystem runtime dependencies. boost_regex still has a transitive dependency somehow.
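One way to check which Boost libraries remain as runtime dependencies is to scan the output of `ldd` on the built binary. The Python sketch below only parses text in the usual `name => path (address)` layout; `boost_runtime_deps` is a hypothetical helper, and the sample output (paths, versions, addresses) is made up for illustration and is not taken from this PR.

```python
# Hypothetical helper; the sample ldd output below is invented for illustration.

def boost_runtime_deps(ldd_output: str) -> list:
    """Return the names of shared libraries starting with 'libboost_'."""
    deps = []
    for line in ldd_output.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("libboost_"):
            deps.append(parts[0])
    return deps


SAMPLE_LDD_OUTPUT = """
    libboost_regex.so.1.65.1 => /usr/lib/x86_64-linux-gnu/libboost_regex.so.1.65.1 (0x00007f0000000000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0000200000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000400000)
"""

if __name__ == "__main__":
    print(boost_runtime_deps(SAMPLE_LDD_OUTPUT))
```

On a Linux machine the input could come from `subprocess.run(["ldd", path_to_binary], capture_output=True, text=True).stdout`, where `path_to_binary` points at the built `plasma_store` executable.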