Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Aug 16, 2018

No description provided.

@pitrou
Copy link
Member Author

pitrou commented Aug 16, 2018

Hmm, it appears the build failed. Frankly I don't really know what this is for, but having a reference to PARQUET_BUILD_SHARED in the Arrow cmake files sounds intuitively wrong. @wesm ?

@wesm
Copy link
Member

wesm commented Aug 16, 2018

Yes, this looks like a copy-paste error. I'd have to look back through commit logs but there was an issue with CMake's behavior around symbol visibility flags. I can try to take a look today at this if it helps

@pitrou
Copy link
Member Author

pitrou commented Aug 16, 2018

It would be nice if you could take a look, though I guess it's not terribly important. I just wanted to diagnose the build warnings mentioned here:
apache/parquet-cpp#487 (comment)

@wesm
Copy link
Member

wesm commented Aug 16, 2018

Got it, I will take a look

@wesm
Copy link
Member

wesm commented Aug 20, 2018

I can look into this this week

@wesm
Copy link
Member

wesm commented Nov 25, 2018

I'll take over this one. Turns out we haven't been consistently setting default visibility flags, and so some changes will be needed where we have been relying on symbols in shared libraries in some cases that lack ARROW_EXPORT

@wesm
Copy link
Member

wesm commented Nov 25, 2018

See commentary in apache/kudu@bd549e1

@wesm
Copy link
Member

wesm commented Nov 25, 2018

Now with -fvisibility=hidden gcc doesn't like the way that we are exporting symbols from our extern templates in libparquet

src/parquet/CMakeFiles/parquet-column_reader-test.dir/column_reader-test.cc.o: In function `parquet::test::TestPrimitiveReader_TestInt32FlatRequiredSkip_Test::TestBody()':
/home/wesm/code/arrow/cpp/build/../src/parquet/column_reader-test.cc:282: undefined reference to `parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)1> >::Skip(long)'
/home/wesm/code/arrow/cpp/build/../src/parquet/column_reader-test.cc:286: undefined reference to `parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)1> >::ReadBatch(long, short*, short*, int*, long*)'
/home/wesm/code/arrow/cpp/build/../src/parquet/column_reader-test.cc:293: undefined reference to `parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)1> >::Skip(long)'
/home/wesm/code/arrow/cpp/build/../src/parquet/column_reader-test.cc:297: undefined reference to `parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)1> >::ReadBatch(long, short*, short*, int*, long*)'
/home/wesm/code/arrow/cpp/build/../src/parquet/column_reader-test.cc:307: undefined reference to `parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)1> >::Skip(long)'
/home/wesm/code/arrow/cpp/build/../src/parquet/column_reader-test.cc:311: undefined reference to `parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)1> >::ReadBatch(long, short*, short*, int*, long*)'
src/parquet/CMakeFiles/parquet-column_reader-test.dir/column_reader-test.cc.o: In function `parquet::test::TestPrimitiveReader::CheckResults()':
/home/wesm/code/arrow/cpp/build/../src/parquet/column_reader-test.cc:102: undefined reference to `parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)1> >::ReadBatch(long, short*, short*, int*, long*)'
/home/wesm/code/arrow/cpp/build/../src/parquet/column_reader-test.cc:119: undefined reference to `parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)1> >::ReadBatch(long, short*, short*, int*, long*)'
src/parquet/CMakeFiles/parquet-column_reader-test.dir/column_reader-test.cc.o: In function `parquet::test::TestPrimitiveReader::CheckResultsSpaced()':
/home/wesm/code/arrow/cpp/build/../src/parquet/column_reader-test.cc:146: undefined reference to `parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)1> >::ReadBatchSpaced(long, short*, short*, int*, unsigned char*, long, long*, long*, long*)'
/home/wesm/code/arrow/cpp/build/../src/parquet/column_reader-test.cc:168: undefined reference to `parquet::TypedColumnReader<parquet::DataType<(parquet::Type::type)1> >::ReadBatchSpaced(long, short*, short*, int*, unsigned char*, long, long*, long*, long*)'
collect2: error: ld returned 1 exit status

I fiddled with this a little bit but don't know why the symbols are not being exported. As an aside, the mess of visibility modifiers on these templates has always struck me as a little bit fishy (I had to dig around in Chromium's codebase to learn the incantations to get things to work originally on gcc / clang / MSVC)

@wesm
Copy link
Member

wesm commented Nov 25, 2018

I might like to refactor things to make the extern templates unnecessary, which would be a good bit of work but perhaps for the best. This would be without altering the public API. I'll leave this issue open for a while before embarking on this

@wesm wesm changed the title [C++] Fix typo in CMakeLists.txt WIP ARROW-3873: [C++] Fix typo in CMakeLists.txt Nov 25, 2018
wesm and others added 19 commits January 31, 2019 11:04
This enables Arrow Flight to build gRPC successfully from ExternalProject both with all dependencies being built from EP or from provided packages via _HOME variables.

Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3527 from pitrou/grpc-vendored-fixes and squashes the following commits:

fb8fdf3 <Wes McKinney> No more verbose build in CI
d84b919 <Wes McKinney> Restore all builds
c778bec <Wes McKinney> Revert vendored Boost changes
65ac53d <Wes McKinney> Revert Boost link order
d5b8f54 <Wes McKinney> Try removing SYSTEM
7563955 <Wes McKinney> Verbose build again
1880bfa <Wes McKinney> Fix Boost include dir
dd4f1eb <Wes McKinney> debugging
369f0e5 <Wes McKinney> Use verbose makefile for debugging
accaeb0 <Wes McKinney> verbose windows build for debugging
3181813 <Wes McKinney> Use BEFORE SYSTEM with vendored Boost headers
0a2f967 <Wes McKinney> More tweaks / fixed
f2db99a <Wes McKinney> Change Boost library include order
ad994b1 <Wes McKinney> Fix brotli library locations after refactor
7340545 <Wes McKinney> Fix Brotli headers copy step
3140b3c <Wes McKinney> Quote thirdparty install prefixes, verbose thirdparty build in MSVC
024a2dc <Wes McKinney> Fixes for gRPC vendored builds
Author: Benjamin Kietzman <bengilgit@gmail.com>

Closes apache#3528 from bkietz/ARROW-4430-test-typedbufferbuilder-append-copies and squashes the following commits:

d70e71d <Benjamin Kietzman> explicitly cast from bool to TypeParam
f739720 <Benjamin Kietzman> fixed format issue
2789a4d <Benjamin Kietzman> add test, fix append method
(this also includes documentation fixes which should fix the build due to doxygen warnings) but they are duplciated from ARROW-4411 apache#3518

Author: Micah Kornfield <emkornfield@gmail.com>

Closes apache#3519 from emkornfield/fix_bin_edge_cases_for_real and squashes the following commits:

97e60d4 <Micah Kornfield> Fix edge-cases in Binary Kernel Invert
e77f67f <Micah Kornfield> remove trailing whitespace
39d93f8 <Micah Kornfield> Fix documentation on util-internal as well
…ests passing on Windows

The tests pass cleanly for me on Windows with these changes. Can't say this was the most enjoyable project. I will fix up the CI but wanted to get eyes on the changes in case anything looks undesirable

Some notes

* This requires that LLVM was built with CMake, which is not true
  of the Windows installers from llvm.org. It works for me if I
  `conda install llvmdev=6.0.1 clangdev=6.0.1`
* I had to suppress a ton of compiler warnings that seem to be
  coming from the LLVM headers. see gandiva/llvm_includes.h
* Unix targets are still using strptime for date parsing. We
  might want to use `arrow/vendored/date.h` on all platforms
  unless there is some significant performance difference
* This doesn't build in Appveyor yet. I wanted to wait to see which
  builds folks think we should add this too (build times will go up
  because installing llvmdev/clangdev will be time consuming)
* Python or other bindings will have to be investigated in a separate patch

Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3295 from wesm/gandiva-windows and squashes the following commits:

1fb5671 <Wes McKinney> Add missing file
1e0fdbc <Wes McKinney> Revert Flatbuffers changes that are causing CI flakiness
5fb78bc <Wes McKinney> Bump flatbuffers version to v1.10.0
707e9d9 <Wes McKinney> Visibility fixes for windows with the cast_time changes
fac669e <Wes McKinney> Rebase, remove failing unit test
f3b60bf <Wes McKinney> Some basic fixes to build system, clang IR generation, suppress endogenous LLVM warnings with MSVC 2017
Master build of the docker image has passed, so use this one again.
Along with [ARROW-4388](https://issues.apache.org/jira/browse/ARROW-4388), it would be useful to introduce dim_names in Tensor and SparseTensor of C++ library.

Author: Kenta Murata <mrkn@mrkn.jp>

Closes apache#3507 from mrkn/cpp_tensor_dim_names and squashes the following commits:

f529ee2 <Kenta Murata>  Add dim_names in Tensor and SparseTensor
To some extent this was an educated guess on CMake settings (with a lot of copy and pasting, so I expect things could be done slightly better).

Author: Micah Kornfield <emkornfield@gmail.com>

Closes apache#3416 from emkornfield/gmock and squashes the following commits:

d6b143f <Micah Kornfield> try to fix mac
9f8854e <Micah Kornfield> revert version bump for gtest and append GMOCK directory before other includes
e3b3a32 <Micah Kornfield> try to use vendored gmock on trusty
4c2425b <Micah Kornfield> reorder gtest library for trusty?
367abc9 <Micah Kornfield> Can more typos
2cf4a26 <Micah Kornfield> Remove force found and add more messaging
43f47af <Micah Kornfield> remove unused GMOCK_HOME
0a738b9 <Micah Kornfield> remove prefix
7c78244 <Micah Kornfield> force gmock true
218506e <Micah Kornfield> fix condition
84bab12 <Micah Kornfield> fix environment variable
f7f23a0 <Micah Kornfield> remove duplicate gmock
0c0ff92 <Micah Kornfield> fix some typos
812f414 <Micah Kornfield> update gmock output description in gmock modules
626621a <Micah Kornfield> cleanup bad suffixes and use gtest_roots
4d69be8 <Micah Kornfield> fix cmake gmock message and format
7c05e73 <Micah Kornfield> update cmake recipe with copy/paste bug fixes
be8020a <Micah Kornfield> add gmock to conda package list
e4d081a <Micah Kornfield> add mock
f2c4768 <Micah Kornfield> replace DummyBinaryKernel
3c51702 <Micah Kornfield> add many linux build
911f093 <Micah Kornfield> Add gmock (mostly guesswork and copy/paste)
Author: Yosuke Shiro <yosuke.shiro615@gmail.com>

Closes apache#3540 from shiro615/glib-use-column-builder and squashes the following commits:

a1741cc <Yosuke Shiro> Use column_builders in GArrowRecordBatchBuilderPrivate
23021ae <Yosuke Shiro> Add garrow_record_batch_builder_get_n_columns()
fc388b7 <Yosuke Shiro> Use garrow_record_batch_builder_get_column_builder() in garrow_record_batch_builder_get_field()
03b681d <Yosuke Shiro> Add test case for column_builders
6092875 <Yosuke Shiro>  Make public column_builders
1680c66 <Yosuke Shiro>  Add garrow_record_batch_builder_get_column_builder()
    cpp/src/plasma/store.cc: In member function 'arrow::Status plasma::PlasmaStore::ProcessMessage(plasma::Client*)':
    cpp/src/plasma/store.cc:767:36: error: 'void* memset(void*, int, size_t)' clearing an object of type 'struct plasma::PlasmaObject' with no trivial copy-assignment; use assignment or value-initialization instead [-Werror=class-memaccess]
       memset(&object, 0, sizeof(object));
                                        ^
    In file included from cpp/src/plasma/eviction_policy.h:27,
                     from cpp/src/plasma/store.h:30,
                     from cpp/src/plasma/store.cc:29:
    cpp/src/plasma/plasma.h:75:8: note: 'struct plasma::PlasmaObject' declared here
     struct PlasmaObject {
            ^~~~~~~~~~~~
    cpp/src/plasma/test/serialization_tests.cc: In function 'plasma::PlasmaObject plasma::random_plasma_object()':
    cpp/src/plasma/test/serialization_tests.cc:68:36: error: 'void* memset(void*, int, size_t)' clearing an object of type 'struct plasma::PlasmaObject' with no trivial copy-assignment; use assignment or value-initialization instead [-Werror=class-memaccess]
       memset(&object, 0, sizeof(object));
                                        ^
    In file included from cpp/src/plasma/test/serialization_tests.cc:25:
    cpp/src/plasma/plasma.h:75:8: note: 'struct plasma::PlasmaObject' declared here
     struct PlasmaObject {
            ^~~~~~~~~~~~
    cpp/src/plasma/test/serialization_tests.cc: In member function 'virtual void plasma::PlasmaSerialization_CreateReply_Test::TestBody()':
    cpp/src/plasma/test/serialization_tests.cc:110:38: error: 'void* memset(void*, int, size_t)' clearing an object of type 'struct plasma::PlasmaObject' with no trivial copy-assignment; use assignment or value-initialization instead [-Werror=class-memaccess]
       memset(&object2, 0, sizeof(object2));
                                          ^
    In file included from cpp/src/plasma/test/serialization_tests.cc:25:
    cpp/src/plasma/plasma.h:75:8: note: 'struct plasma::PlasmaObject' declared here
     struct PlasmaObject {
            ^~~~~~~~~~~~

Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#3543 from kou/plasma-suppress-class-memaccess-warning and squashes the following commits:

34ce66c <Kouhei Sutou>  Suppress class-memaccess warnings
Author: Renat Valiullin <rip.nsk@gmail.com>

Closes apache#3544 from rip-nsk/ARROW-4454 and squashes the following commits:

984fc62 <Renat Valiullin> fix unused parameter warnings
This PR adds the arrow-testing repo as a submodule in the directory `testing`.

Author: Andy Grove <andygrove73@gmail.com>

Closes apache#3547 from andygrove/ARROW-4459 and squashes the following commits:

522ac26 <Andy Grove> Add arrow-testing submodule
Make note of the fact that pyarrow has to be installed for sphinx to successfully build the documentation.

Author: Micah Kornfield <emkornfield@gmail.com>

Closes apache#3539 from emkornfield/update_build_doc and squashes the following commits:

8a3e541 <Micah Kornfield> Fix install instructions
9468abe <Micah Kornfield> convert to note
1. Add .sln file to the repo.
2. Minor clean up of .csproj to move properties out of the Configuration/Platform PropertyGroups, which is the standard way of setting properties in SDK style projects.

Author: Eric Erhardt <eric.erhardt@microsoft.com>

Closes apache#3529 from eerhardt/SolutionProjectFixUp and squashes the following commits:

88a7c6e <Eric Erhardt> Add RAT exclude for csharp sln file
e9036b3 <Eric Erhardt> Minor fixups to csharp .sln and .csproj file
This problem happens in some rare case.
When there are arrow lib build by old maven, but now current maven is updated to 3.5.4. This problem could happen and raise a building failure.

Author: Yuhong Guo <yuhong.gyh@antfin.com>

Closes apache#2738 from guoyuhong/fixJavaBuild and squashes the following commits:

7466510 <Yuhong Guo> Fix java build with Maven 3.5.4
This PR is to donate the DataFusion source code (assuming that the vote passes!)

Author: Andy Grove <andygrove73@gmail.com>

Closes apache#3399 from andygrove/ARROW-4263 and squashes the following commits:

990d06f <Andy Grove> formatting
6603091 <Andy Grove> update path again, update testing submodule
38fa63b <Andy Grove> remove test csv file, update tests to use test data from new testing submodule
16e4cff <Andy Grove> remove test csv file, update tests to use test data from new testing submodule
91f6e90 <Andy Grove> update example to use new data file
4ebeee5 <Andy Grove> formatting
ae88a90 <Andy Grove> convert tests to use new test data file that was randomly generated
d7bea8e <Andy Grove> update test to use uk_cities.csv and remove people.csv
061d788 <Andy Grove> remove unused test data files
f60e50d <Andy Grove> remove unused test data files, manually recreate uk_cities.csv because I can't trace where the original data came from
28d914a <Andy Grove> Update 00-prepare.sh to handle datafusion versioning
c4e1a26 <Andy Grove> DataFusion Donation
With the latest updates I have problems linking to gbenchmark on OSX, this fixes it.

Author: Korn, Uwe <Uwe.Korn@blue-yonder.com>

Closes apache#3550 from xhochy/ARROW-4471 and squashes the following commits:

b0608b0 <Korn, Uwe> Use linebreaks to separate arguments
9c20714 <Korn, Uwe> ARROW-4471:  Pass AR and RANLIB to all external projects
Seeing if this fixes the failures we are seeing. The conda-forge binutils package was just updated 2 days ago

Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3554 from wesm/ARROW-4469 and squashes the following commits:

dfa1b3f <Wes McKinney> Pin conda-forge binutils version to 2.31 for now
Closes https://issues.apache.org/jira/browse/ARROW-4442

Typescript is generating an overly broad type for the `typeId` property of the ChunkedVector class, leading to a type mismatch and failure to infer `Chunked<T>` is a `Vector<T>`:

```ts
let col: Vector<Utf8>;
col = new Chunked(new Utf8());
  ^
/*
Argument of type 'Chunked<Utf8>' is not assignable to parameter of type 'Vector<Utf8>'.
  Type 'Chunked<Utf8>' is not assignable to type 'Vector<Utf8>'.
    Types of property 'typeId' are incompatible.
      Type 'Type' is not assignable to type 'Type.Utf8'.
*/
```

The type being generated currently is:
```ts
    readonly typeId: import("../enum").Type;
```

but it should be:
```ts
    readonly typeId: T['TType'];
```

The fix is to add an explicit return annotation to the Chunked `typeId` getter. Unfortunately this only affects the generated typings (`.d.ts` files) and not the library source, so it's difficult to test. We can look into whether there are any flags to trigger stricter type checking of the compiled code in the unit tests, but I don't know any off the top of my head.

Author: ptaylor <paul.e.taylor@me.com>

Closes apache#3538 from trxcllnt/js/add-chunked-typeId-annoation and squashes the following commits:

077f383 <ptaylor> add explicit type annotation to Chunked typeId getter
…kiness

Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3552 from wesm/ARROW-4440 and squashes the following commits:

f3d5d2a <Wes McKinney> Fix brotli header copy step
908a9e7 <Wes McKinney> Revert use of common thirdparty install directory
371ee0c <Wes McKinney> Use EP_COMMON_CMAKE_ARGS
282e164 <Wes McKinney> Revert recent changes to flatbuffers EP causing flakiness
pcmoritz and others added 2 commits February 4, 2019 16:47
…bjects

This is a regression from apache#3423, the recursion depth was incremented for arrays and dicts but not for lists, tuples and sets.

I also added a regression test for this.

Author: Philipp Moritz <pcmoritz@gmail.com>

Closes apache#3556 from pcmoritz/recursive-serialization and squashes the following commits:

f83cff9 <Philipp Moritz> fix
903788e <Philipp Moritz> fix serialization of objects that contain themselves
… external store

https://issues.apache.org/jira/browse/ARROW-4294

Note: this PR was previously at apache#3432, which was closed since its commit history was broken.

Currently, when Plasma needs storage space for additional objects, it evicts objects by deleting them from the Plasma store. This is a problem when it isn't possible to reconstruct the object or reconstructing it is expensive. This patch adds support for a pluggable external store that Plasma can evict objects to when it runs out of memory.

Author: Anurag Khandelwal <anuragk@berkeley.edu>
Author: Philipp Moritz <pcmoritz@gmail.com>

Closes apache#3482 from anuragkh/plasma_evict_to_external_store and squashes the following commits:

6316715 <Philipp Moritz> remove external store worker, simplify interface
6fbc55b <Anurag Khandelwal> Revert "Add an eviction buffer to allow asynchronous evictions"
4f2c02c <Anurag Khandelwal> Revert "Minor fix"
1bc1dbe <Anurag Khandelwal> Revert "format fix"
7b662be <Anurag Khandelwal> Revert "Remove timeout for external store test tearDown"
25663df <Anurag Khandelwal> Remove timeout for external store test tearDown
7945cc9 <Anurag Khandelwal> format fix
0d72639 <Anurag Khandelwal> Minor fix
957efb5 <Anurag Khandelwal> Add an eviction buffer to allow asynchronous evictions
896d895 <Anurag Khandelwal> Fixes
7ae4867 <Anurag Khandelwal> Merge branch 'master' into plasma_evict_to_external_store
1af2f8b <Anurag Khandelwal> Fix cpplint issues
04e1730 <Anurag Khandelwal> Merge branch 'master' into plasma_evict_to_external_store
301e575 <Anurag Khandelwal> Fix uses of ARROW_CHECK_OK/ARROW_CHECK
69a56ab <Anurag Khandelwal> Fix documentation errrors
c19c576 <Anurag Khandelwal> Add documentation for notify flag
f3fad80 <Anurag Khandelwal> Fix external store worker intialization
9081596 <Anurag Khandelwal> Clean up formatting issues
f5cc95c <Anurag Khandelwal> Add lint exclusion for external_store_worker, since it uses mutex
ffd1f0e <Anurag Khandelwal> Extend plasma eviction changes to python module
8afc9fb <Anurag Khandelwal> Kill only the plasma_store_server that we started
be31567 <Anurag Khandelwal> Add test for testing evictions/unevictions
a43445a <Anurag Khandelwal> Update serialization test
58a9953 <Anurag Khandelwal> Add support for evicting/un-evicting Plasma objects to/from external store
@emkornfield
Copy link
Contributor

@wesm I think you pushed the changes to remove the extern templates? Can this be revisited now?

mikepigott and others added 7 commits February 5, 2019 10:18
https://issues.apache.org/jira/browse/ARROW-3923

Hello!  I was reading through the JDBC source code and I noticed that a java.util.Calendar was required for creating an Arrow Schema and Arrow Vectors from a JDBC ResultSet, when none is required.

This change makes the Calendar optional.

Unit Tests:
The existing SureFire plugin configuration uses a UTC calendar for the database, which is the default Calendar in the existing code.  Likewise, no changes to the unit tests are required to provide adequate coverage for the change.

Author: Michael Pigott <mikepigott@users.noreply.github.com>
Author: Mike Pigott <mpigott@gmail.com>

Closes apache#3066 from mikepigott/jdbc-timestamp-no-calendar and squashes the following commits:

4d95da0 <Mike Pigott> ARROW-3923: Supporting a null Calendar in the config, and reverting the breaking change.
cd9a230 <Mike Pigott> Merge branch 'master' into jdbc-timestamp-no-calendar
509a1cc <Michael Pigott> Merge pull request #5 from apache/master
789c8c8 <Michael Pigott> Merge pull request #4 from apache/master
e5b19ee <Michael Pigott> Merge pull request #3 from apache/master
3b17c29 <Michael Pigott> Merge pull request #2 from apache/master
881c6c8 <Michael Pigott> Merge pull request #1 from apache/master
089cff4 <Mike Pigott> Format fixes
a58a4a5 <Mike Pigott> Fixing calendar usage.
e12832a <Mike Pigott> Allowing for timestamps without a time zone.
Removes the constructor override leftover from the original `Object.create()` implementation of the bignum mixins. Closes https://issues.apache.org/jira/browse/ARROW-4477

Author: ptaylor <paul.e.taylor@me.com>

Closes apache#3557 from trxcllnt/js/fix-bn-constructor and squashes the following commits:

3a46859 <ptaylor> remove constructor override in the bignum mixins
This implement the following API.

```
random::RandomArrayGenerator rand(seed);
auto bool_array = rand.Boolean(num_rows, 0.75, null_prob);
auto u8_array = rand.Int8(num_rows, 0, 100, null_prob);
```

Author: François Saint-Jacques <fsaintjacques@gmail.com>

Closes apache#3533 from fsaintjacques/ARROW-3239-random-utils and squashes the following commits:

a806b1f <François Saint-Jacques> Add ARROW_EXPORT to RandomArrayGenerator
63d9103 <François Saint-Jacques> Fix GenerateOptions seed type
59c3a3b <François Saint-Jacques> Add undef to macro
22eca80 <François Saint-Jacques> Handle special case with MSVC
728aadc <François Saint-Jacques> Fix downcasting issues
4840ac0 <François Saint-Jacques> ARROW-3239:  Implement simple random array generation
…umnWriter, remove use of 'extern template class'

This follows corresponding work in TypedColumnReader. The public API is unchanged as can be verified by lack of changes to the unit tests

Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3551 from wesm/PARQUET-1521 and squashes the following commits:

aa6687a <Wes McKinney> Fix clang warnings
3355504 <Wes McKinney> Print build warning level
b657ac9 <Wes McKinney> Fix parquet-column-io-benchmark
61204de <Wes McKinney> Refactor TypedColumnWriter implementation to be based on pure virtual interface, remove use of extern template class
Author: Andy Grove <andygrove73@gmail.com>

Closes apache#3548 from andygrove/ARROW-4460 and squashes the following commits:

5b3a770 <Andy Grove> minor edit
b557060 <Andy Grove> Remove section on Gandiva, change date to today, link to JIRA for adding Parquet support
29c3dbe <Andy Grove> update blog post with link to open Rust issues in confluence
a6fd606 <Andy Grove> Minor edits
067cb04 <Andy Grove> Add example to blog post
2a735fd <Andy Grove> Draft of DataFusion announcement
…in Arrow 0.12

This blog shows how we were able to significant improve performance and memory use in common cases when converting from the Arrow string memory layout to pandas's native memory model based on NumPy arrays of Python objects.

Author: Wes McKinney <wesm+git@apache.org>

Closes apache#3553 from wesm/python-string-memory-0.12 and squashes the following commits:

f0d684d <Wes McKinney> Update publication date
2bbb92d <Wes McKinney> Fix some base urls
c624e55 <Wes McKinney> Draft blog post about string memory use work in Arrow 0.12
Change-Id: I4208e6d42fc6040313de7a01f897fc22db490c43
@wesm
Copy link
Member

wesm commented Feb 5, 2019

Yeah, I think so, let me rebase and see how things look. "extern template class" is now my most hated C++11 feature. It only makes sense if you are developing for a single compiler

rip-nsk and others added 6 commits February 5, 2019 09:54
Author: Renat Valiullin <rip.nsk@gmail.com>

Closes apache#3545 from rip-nsk/ARROW-4456 and squashes the following commits:

939a4fb <Renat Valiullin> Add parquet tools to ci/cpp-msvc-build-main
03422aa <Renat Valiullin> remove dependency on getopt in parquet tools
Change-Id: I7cf2dec1cca5bf5c1c1a772cd81b7bf10d2ce287
…th CMP0063

Change-Id: I59d8bc3e88a648b909606bebe710c98c18442507
Change-Id: I10f34a28309e03700c335b88fea0e7f138140ada
Change-Id: Ie148f050eb211e7f64dbdd99307e9eb237340b93
@wesm
Copy link
Member

wesm commented Feb 5, 2019

@emkornfield unfortunately there are more usages of "extern template class" causing linking problems

FAILED: debug/parquet-comparison-test 
: && /usr/bin/ccache /home/wesm/miniconda/envs/arrow-3.7/bin/x86_64-conda_cos6-linux-gnu-c++  -Wno-noexcept-type -D_GLIBCXX_USE_CXX11_ABI=1 -fuse-ld=gold -ggdb -O0  -Wall -Wconversion -Wno-sign-conversion -Werror -msse4.2  -g  -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags   -rdynamic src/parquet/util/CMakeFiles/parquet-comparison-test.dir/comparison-test.cc.o  -o debug/parquet-comparison-test  -Wl,-rpath,/home/wesm/code/arrow/cpp/build/debug:/home/wesm/miniconda/envs/arrow-3.7/lib debug/libarrow_testing.so.13.0.0 googletest_ep-prefix/src/googletest_ep/lib/libgtest_main.a googletest_ep-prefix/src/googletest_ep/lib/libgtest.a -lpthread -ldl debug/libparquet.so.13.0.0 /home/wesm/miniconda/envs/arrow-3.7/lib/libthrift.a debug/libarrow.so.13.0.0 -ldl jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a -pthread -lrt -Wl,-rpath-link,/home/wesm/miniconda/envs/arrow-3.7/lib && :
/home/wesm/miniconda/envs/arrow-3.7/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld.gold: warning: discarding version information for __pthread_key_create@GLIBC_2.2.5, defined in unused shared library /home/wesm/miniconda/envs/arrow-3.7/bin/../x86_64-conda_cos6-linux-gnu/sysroot/lib/libpthread.so.0 (linked with --as-needed)
../src/parquet/util/comparison-test.cc:199: error: undefined reference to 'parquet::CompareDefault<parquet::DataType<(parquet::Type::type)2> >::CompareDefault()'
../src/parquet/util/comparison-test.cc:201: error: undefined reference to 'parquet::CompareDefault<parquet::DataType<(parquet::Type::type)2> >::operator()(long const&, long const&)'
../src/parquet/util/comparison-test.cc:202: error: undefined reference to 'parquet::CompareDefault<parquet::DataType<(parquet::Type::type)2> >::operator()(long const&, long const&)'
../src/parquet/util/comparison-test.cc:202: error: undefined reference to 'parquet::CompareDefault<parquet::DataType<(parquet::Type::type)2> >::operator()(long const&, long const&)'
../src/parquet/util/comparison-test.cc:203: error: undefined reference to 'parquet::CompareDefault<parquet::DataType<(parquet::Type::type)2> >::operator()(long const&, long const&)'
../src/parquet/util/comparison.h:39: error: undefined reference to 'vtable for parquet::CompareDefault<parquet::DataType<(parquet::Type::type)2> >'
/home/wesm/miniconda/envs/arrow-3.7/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld.gold: the vtable symbol may be undefined because the class is missing its key function
../src/parquet/util/comparison.h:103: error: undefined reference to 'parquet::CompareDefault<parquet::DataType<(parquet::Type::type)2> >::CompareDefault()'
../src/parquet/util/comparison.h:39: error: undefined reference to 'vtable for parquet::CompareDefault<parquet::DataType<(parquet::Type::type)1> >'
/home/wesm/miniconda/envs/arrow-3.7/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld.gold: the vtable symbol may be undefined because the class is missing its key function
../src/parquet/util/comparison.h:98: error: undefined reference to 'parquet::CompareDefault<parquet::DataType<(parquet::Type::type)1> >::CompareDefault()'
collect2: error: ld returned 1 exit status
[26/166] Building CXX object src/parquet/CMakeFiles/parquet-statistics-test.dir/statistics-test.cc.o
^C
ninja: build stopped: interrupted by user.

I opened https://issues.apache.org/jira/browse/PARQUET-1523 about improving the comparator performance while also removing "extern template class"

cc @majetideepak

@wesm
Copy link
Member

wesm commented Feb 5, 2019

These are the remaining public usages of this feature

../src/parquet/util/visibility.h:#define PARQUET_TEMPLATE_EXPORT PARQUET_EXPORT
../src/parquet/util/visibility.h:#define PARQUET_TEMPLATE_EXPORT
../src/parquet/util/comparison.cc:template class PARQUET_TEMPLATE_EXPORT CompareDefault<BooleanType>;
../src/parquet/util/comparison.cc:template class PARQUET_TEMPLATE_EXPORT CompareDefault<Int32Type>;
../src/parquet/util/comparison.cc:template class PARQUET_TEMPLATE_EXPORT CompareDefault<Int64Type>;
../src/parquet/util/comparison.cc:template class PARQUET_TEMPLATE_EXPORT CompareDefault<Int96Type>;
../src/parquet/util/comparison.cc:template class PARQUET_TEMPLATE_EXPORT CompareDefault<FloatType>;
../src/parquet/util/comparison.cc:template class PARQUET_TEMPLATE_EXPORT CompareDefault<DoubleType>;
../src/parquet/util/comparison.cc:template class PARQUET_TEMPLATE_EXPORT CompareDefault<ByteArrayType>;
../src/parquet/util/comparison.cc:template class PARQUET_TEMPLATE_EXPORT CompareDefault<FLBAType>;
../src/parquet/statistics.cc:template class PARQUET_TEMPLATE_EXPORT TypedRowGroupStatistics<BooleanType>;
../src/parquet/statistics.cc:template class PARQUET_TEMPLATE_EXPORT TypedRowGroupStatistics<Int32Type>;
../src/parquet/statistics.cc:template class PARQUET_TEMPLATE_EXPORT TypedRowGroupStatistics<Int64Type>;
../src/parquet/statistics.cc:template class PARQUET_TEMPLATE_EXPORT TypedRowGroupStatistics<Int96Type>;
../src/parquet/statistics.cc:template class PARQUET_TEMPLATE_EXPORT TypedRowGroupStatistics<FloatType>;
../src/parquet/statistics.cc:template class PARQUET_TEMPLATE_EXPORT TypedRowGroupStatistics<DoubleType>;
../src/parquet/statistics.cc:template class PARQUET_TEMPLATE_EXPORT TypedRowGroupStatistics<ByteArrayType>;
../src/parquet/statistics.cc:template class PARQUET_TEMPLATE_EXPORT TypedRowGroupStatistics<FLBAType>;

@pitrou
Copy link
Member Author

pitrou commented Apr 23, 2019

This PR looks like it needs rebasing.

@wesm
Copy link
Member

wesm commented Apr 24, 2019

Yes. I'm going to close this and start fresh once we eliminate the last of extern template class in Parquet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.