Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-3182: [C++][Java] Gandiva merge into Arrow #2558

Merged
merged 60 commits into from
Sep 29, 2018

Conversation

praveenbingo
Copy link
Contributor

Merging Gandiva CPP and Java into Arrow. The build is off by default.

Some things pending

  1. Update the licenses.
  2. Use Arrow CMake functions instead of Gandiva CMake functions.
  3. Address mix of styles in the cmake files, Gandiva follows modern cmake targets and all of the dependencies seem to be direct references in Arrow. Follow on consistently.

@wesm
Copy link
Member

wesm commented Sep 14, 2018

@praveenbingo thanks for setting this up. A couple things

  • Step 1 (fixing licenses) is the necessary one to proceed with the IP clearance. Everything else is secondary. The sooner you do that, the sooner we can merge this and move forward with the other stuff

  • In addition to each of you submitting ICLAs to the Apache secretary, Dremio Corporation will need to submit a Software Grant Agreement for this software

  • It's a small detail but if you could use [Gandiva] instead of [GDV] in the commit logs, that would help with readability

  • On Step 3 regarding CMake targets, I would definitely like to modernize and improve our use of CMake, so we can work on that together after the merge

@praveenbingo
Copy link
Contributor Author

Thanks Wes.
I've updated the licenses across the board.
Still pending

  1. Lint is failing, we can fix that post the merge.
  2. For the message edit, is that mandatory :) because it looks like i would have to restart the merge as the entire tree is getting re-written on trying to re-write the message.

Sure, we can work together on the CMake infrastructure post the merge.

@wesm
Copy link
Member

wesm commented Sep 18, 2018

I'll fix the commit messages before merging. You're failing RAT checks. Can you add a license header to your README.md?

@wesm wesm changed the title ARROW-3182:[C++][Java]Gandiva merge into Arrow ARROW-3182: [C++][Java] Gandiva merge into Arrow Sep 18, 2018
@praveenbingo
Copy link
Contributor Author

Sure Wes, done.

@wesm
Copy link
Member

wesm commented Sep 20, 2018

Can you fix the cpplint issues or do you want help with that?

@wesm
Copy link
Member

wesm commented Sep 20, 2018

We can also deal with the linting issues while the IP clearance vote is under way, so getting your CLAs in so we can do that is the urgent thing

@praveenbingo
Copy link
Contributor Author

I will fix the linting issues. Just caught up with some release tasks, so was hoping we can do that post-merge. But looks like CI does a style check on all cpp code and not just the ones being built. I will do this today or over the weekend.

Will send out the ICLAs for me, Ravindra and Vivek today.

@praveenbingo
Copy link
Contributor Author

All of us have sent the ICLA.

@wesm
Copy link
Member

wesm commented Sep 21, 2018

OK, I have seen 1 acknowledgement from the ASF secretary and will look out for the other 2. Should be able to start the IP clearance vote today, then, and then this can be merged on Monday

@wesm
Copy link
Member

wesm commented Sep 21, 2018

I started the IP clearance vote: https://lists.apache.org/thread.html/be56602b6154b8537ff7ea260a5a93ccb328c5504c87b550a857b8b6@%3Cgeneral.incubator.apache.org%3E. I can merge the patch after 72 hours have elapsed

@praveenbingo
Copy link
Contributor Author

The linting clean up is taking significantly more time than i anticipated. make format is not fixing the issues (even with the 6.0 version).

There are couple of files which has most of the issues.

I see two paths forward,

  1. Exclude the two files for now and fix them as part of a different PR.
  2. Wait for a while on the merge.

Wes - what do you recommend.

@wesm
Copy link
Member

wesm commented Sep 26, 2018

Would you like me to help?

@wesm
Copy link
Member

wesm commented Sep 26, 2018

Can you push whatever changes you have now and I'll take a look in about an hour or so

@praveenbingo
Copy link
Contributor Author

Pushed the latest i had.

@wesm
Copy link
Member

wesm commented Sep 26, 2018

Thanks. I'm about to give a talk but I'll report back a little later this morning (central europe time)

@wesm
Copy link
Member

wesm commented Sep 26, 2018

BTW I ran into https://issues.apache.org/jira/browse/ARROW-3331 when trying to build this

@wesm
Copy link
Member

wesm commented Sep 26, 2018

I should have the lint fixes pushed in ~10 minutes

@wesm
Copy link
Member

wesm commented Sep 26, 2018

There are a number of build failures when building with -DBUILD_WARNING_LEVEL=CHECKIN, missing virtual dtors, integer conversion issues, etc. I'm going to fix those quickly here

@wesm
Copy link
Member

wesm commented Sep 26, 2018

OK, this is building and linting cleanly for me now, PTAL

@praveenbingo
Copy link
Contributor Author

It is not building locally for me now, errors in RE2 and hash.cc.

@wesm
Copy link
Member

wesm commented Sep 27, 2018

#include <cstring> and #include <string.h> should be the same thing http://www.cplusplus.com/reference/cstring/?

NOTICE.txt Outdated
* Copyright (c) 2017 Florian Dang
* Copyright (c) 2017 Paul Thompson
* Copyright (c) 2018 Tomasz Kamiński
* Licensed under the MIT License;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put the license for this in LICENSE.txt instead? There was some early confusion with NOTICE.txt vs. LICENSE.txt but my understanding is that vendored code licenses should be put in the latter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for and <string.h> agreed, but there seems to be a conflict between the xcode cstring and string.h for some versions so i reverted back to the original header..

@wesm
Copy link
Member

wesm commented Sep 27, 2018

This branch needs to be rebased, but if you like I can take care of that right before merging

@wesm
Copy link
Member

wesm commented Sep 28, 2018

Barring anything crazy I'm going to rebase and merge this tomorrow morning CEST. Any objections?

@praveenbingo
Copy link
Contributor Author

Nope, please go ahead Wes. Thanks for helping out and sorry we did not pay enough attention this week on the merge.

The gandiva repo has gone ahead a bit from the time we started the merge, once our release is done we would move the remaining commits to Arrow and then start our development from the Arrow repo.

@wesm
Copy link
Member

wesm commented Sep 29, 2018

@praveenbingo OK -- it is important that you stop making PRs into the old Gandiva repo ASAP and instead open the PRs here. I'm going to look at merging this right now

pravindra and others added 19 commits September 29, 2018 04:23
[Gandiva] Add hash functions on all data types

[Gandiva] Fix stylecheck in travis to print diff

[Gandiva] pick clang-format from llvm-binary dir

[Gandiva] handle case when seed is null

[Gandiva] Fix a style check
Added support for literals and null for time types.
Class references are local by default and eligible for GC.

We would need to convert it to global reference on library load for it
to be safely used for the program lifetime.
Add support for timestampaddXxx functions
Add support for is_distinct_from, is_not_distinct_from, isnull, isnotnull, date_add/add, date_sub/subtract/date_diff, date_trunc_Xxx functions
* Temporarily matching what the dremio does for mod zero.
* Used the latest Arrow APIs for allocating buffers.
- similar to projection, filter is built for a specific schema and
  condition 
- the output of filter is a selection vector
* Add java bindings for filter expr
* Mv selection vector impl to internal
Fixed some bugs in the filter code path.
Change the selection vector arrays as unsigned to match dremio.
1. Added lock to holder read to address potential race condition.
2. Fixed log message.
3, Addressed breaking arrow change.
1. In evaluate to lookup module, first do without lock and fallback only if
   module is not found.
2. Use release builds in travis.
Introducing a cache to hold the projectors and filters for re-use.
The cache is a LRU that can hold 100 entries.
[Gandiva] Fixed concurrency issue in cache.

Modifications were happening in get without a mutex.
Wrote a test to verify and prevent regression.
Literal string coversion was ignoring types, leading
to mismatch in hashing of expressions.
- add a registry for "function holders" implemented in cpp
- the function holder is instantiated at expression decomposition time
- at eval time, the registered fn gets an extra param
- To get around the java load issue, create a native library and load it in the LLVM module. 
   This module has the hooks for all the c++ function helpers.
- for files that are compiled in libgandiva_helpers, add into  gandiva::helpers namespace.
- merged status.cc into status.h
- reduce benchmark iterations to 1M instead of 100M
- add checks in benchmark test to verify that elapsed_time is
  atleast <= 2 * expected
@wesm wesm force-pushed the gandiva-merge branch 2 times, most recently from e1a2849 to 810adb7 Compare September 29, 2018 08:25
@wesm
Copy link
Member

wesm commented Sep 29, 2018

OK, I normalized the commit messages, moved the note about precompiled/date.h to LICENSE.txt and rebased. I'm going to make sure the build isn't totally broken and then ff-merge to master. Would you like some help turning the new work into Arrow patches?

…es to apache license.

Fix clang-format, cpplint warnings, -Wconversion warnings and other warnings
with -DBUILD_WARNING_LEVEL=CHECKIN. Fix some build toolchain issues, Arrow
target dependencies. Remove some unused CMake code
@wesm
Copy link
Member

wesm commented Sep 29, 2018

Fixed some lingering lint issues

@wesm wesm merged commit 8e9a915 into apache:master Sep 29, 2018
@wesm
Copy link
Member

wesm commented Sep 29, 2018

Merged. @praveenbingo can you let me know your ASF JIRA id so that I can assign https://issues.apache.org/jira/browse/ARROW-3182 to you?

@wesm
Copy link
Member

wesm commented Sep 29, 2018

I wrote on the mailing list about the additional work that needs to be moved over. Let's continue the discussion there

@praveenbingo
Copy link
Contributor Author

My JIRA id (i think) is praveenbingo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants