ARROW-3182: [C++][Java] Gandiva merge into Arrow #2558

Merged: 60 commits, Sep 29, 2018

Commits on Sep 29, 2018

  1. [Gandiva] Bootstrap evaluation using LLVM code generation

    LLVM code generation is done using a mix of:
    - glue IR code that loops over the vector and generates function
      calls
    - byte-code files generated from simple C++ functions using
      clang.
    The glue code and the pre-compiled byte code are merged and
    optimized together.
    
    Expressions are specified using a "tree builder", where each
    node is an Arrow vector or a binary/unary function.
    
    During code generation, the expressions are "decomposed" so
    that the value array and the bitmap array are evaluated separately
    to compute the expression result. This avoids the use of too
    many branch/conditional instructions, so the generated code
    can be vectorized efficiently.
    
    Support added for arithmetic and logical expressions on
    numeric types.
    
    Travis CI support added for build on Ubuntu.
    
    Change-Id: I06db1dfd398750755c76e0a395ba52bd1a01e329
    pravindra authored and wesm committed Sep 29, 2018
    28c4617
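
The decomposition described in commit 1 can be sketched in plain C++. This is only an illustration under stated assumptions: `Int32Column`, `AddColumns`, and `IsValid` are hypothetical names, not Gandiva's API, and the real engine emits LLVM IR rather than C++ loops. The point it shows is that values and validity bits are computed in two separate, branch-free passes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column layout: one value per row, plus a validity bitmap
// with one bit per row (bit set = value is non-null).
struct Int32Column {
  std::vector<int32_t> values;
  std::vector<uint8_t> validity;  // ceil(n / 8) bytes
};

// Computes out = a + b. Values and validity are processed in two separate
// loops instead of testing for null on every row.
Int32Column AddColumns(const Int32Column& a, const Int32Column& b) {
  assert(a.values.size() == b.values.size());
  assert(a.validity.size() == b.validity.size());
  Int32Column out;
  out.values.resize(a.values.size());
  out.validity.resize(a.validity.size());

  // Value pass: add every slot, null ones included (their results are
  // masked out by the validity bitmap).
  for (std::size_t i = 0; i < a.values.size(); ++i) {
    out.values[i] = a.values[i] + b.values[i];
  }
  // Validity pass: the result is valid only where both inputs are valid.
  for (std::size_t i = 0; i < a.validity.size(); ++i) {
    out.validity[i] = static_cast<uint8_t>(a.validity[i] & b.validity[i]);
  }
  return out;
}

// Returns true if row i is marked valid in the bitmap.
bool IsValid(const std::vector<uint8_t>& bitmap, std::size_t i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
}
```

Because neither loop contains a data-dependent branch, a compiler is free to vectorize both.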
  2. [Gandiva] Make use of the modular features of cmake

    Separate out the public and private target dependencies.
    
    For arrow, export an interface target. This avoids the need to add
    include dirs for each dependency on arrow.
    
    Removed dependency on gtest. Instead, build it as an external project.
    This is the recommended practice for googletest.
    
    For pre-compiled files, generate the bitcode files for each of them
    independently and then, link them to generate a unified bitcode file.
    Removed cpplint exceptions since there is no more sourcing of .cc
    files.
    
    Separate out the public include files from private includes, and 
    add them in the dependency list in cmake.
    
    Pass the bytecode file path from cmake.
    pravindra authored and wesm committed Sep 29, 2018
    5315d20
  3. [Gandiva] Introduce error codes as error handling strategy.

    Introduced status codes and adopted them as the error handling strategy.
    
    This accommodates existing libraries that use error codes, and
    Arrow itself uses error codes rather than exceptions.
    
    Changed the signatures across the board accordingly.
    praveenbingo authored and wesm committed Sep 29, 2018
    5f721ae
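
A minimal sketch of the status-code style that commit 3 describes. The `Status` class and `SafeDivide` function below are hypothetical and far simpler than the Status types in Gandiva or Arrow. They only show the pattern: the return value reports success or failure, and the result travels through an out-parameter.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical status type: functions return a Status instead of throwing.
enum class StatusCode { OK, Invalid };

class Status {
 public:
  static Status OK() { return Status(StatusCode::OK, ""); }
  static Status Invalid(std::string msg) {
    return Status(StatusCode::Invalid, std::move(msg));
  }
  bool ok() const { return code_ == StatusCode::OK; }
  const std::string& message() const { return message_; }

 private:
  Status(StatusCode code, std::string msg)
      : code_(code), message_(std::move(msg)) {}
  StatusCode code_;
  std::string message_;
};

// The quotient is passed back through `out`; the return value only
// reports whether the call succeeded.
Status SafeDivide(int left, int right, int* out) {
  assert(out != nullptr);
  if (right == 0) {
    return Status::Invalid("divide by zero");
  }
  *out = left / right;
  return Status::OK();
}
```

Callers check `ok()` after every call, which is why adopting this style changes signatures throughout a code base.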
  4. [Gandiva] Support functions of type NULL_INTERNAL

    The pre-compiled functions take an extra arg to set the
    result validity.
    
    At decompose time, a local bitmap is assigned to track the result
    validity bits for such functions.
    
    At evaluate time, a sufficient number of buffers is allocated to hold
    all the local bitmaps.
    
    For the final computation of the expression validity, the input bitmaps
    can be either one of the value-vector bitmaps, or a local bitmap.
    pravindra authored and wesm committed Sep 29, 2018
    030735f
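
The idea in commit 4 can be illustrated as follows. `DivideOrNull` and `EvaluateDivide` are hypothetical names, and the choice of a divide that yields null on a zero divisor is an assumption made for the example. The sketch shows a function that reports its own result validity through an extra argument, and a caller that records those bits in a local bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical function of the "null internal" kind: it decides whether its
// own result is valid and reports that through the extra out_valid argument.
int32_t DivideOrNull(int32_t left, int32_t right, bool* out_valid) {
  assert(out_valid != nullptr);
  if (right == 0) {
    *out_valid = false;
    return 0;
  }
  *out_valid = true;
  return left / right;
}

// Evaluates the function over two columns and records each row's validity
// in a local bitmap (one bit per row) supplied by the caller.
std::vector<int32_t> EvaluateDivide(const std::vector<int32_t>& left,
                                    const std::vector<int32_t>& right,
                                    std::vector<uint8_t>* local_bitmap) {
  assert(left.size() == right.size());
  local_bitmap->assign((left.size() + 7) / 8, 0);
  std::vector<int32_t> out(left.size());
  for (std::size_t i = 0; i < left.size(); ++i) {
    bool valid = false;
    out[i] = DivideOrNull(left[i], right[i], &valid);
    if (valid) {
      (*local_bitmap)[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return out;
}
```

The local bitmap can then be combined with the input bitmaps when the final expression validity is computed.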
  5. [Gandiva] Simplify the api to make function nodes

    Replaced MakeUnaryFunction, MakeBinaryFunction with a simpler
    MakeFunction that takes a vector of args.
    pravindra authored and wesm committed Sep 29, 2018
    fd00ba9
  6. [Gandiva] Support if-else expression

    An if-else expression has three sub-expressions:
    - condition
    - then-expression
    - else-expression
    Each of these can again be a node in the expression tree.
    
    The result validity of the if-else expression is saved in a local bitmap.
    
    Also moved all of the integration tests to a different folder so that include files are not mixed.
    pravindra authored and wesm committed Sep 29, 2018
    d06db12
  7. [Gandiva] expr decomposition moved to visitor

    - moved the expression decomposition logic from Node class
      to a visitor class 
    - moved node.h out of external includes
    - renamed Evaluator to Projector
    pravindra authored and wesm committed Sep 29, 2018
    ba71c91
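
Moving logic out of the node classes and into a visitor, as commit 7 does for decomposition, might look roughly like this. All class names here (`NodeVisitor`, `FieldNode`, `FunctionNode`, `FieldCollector`) are hypothetical stand-ins rather than Gandiva's actual classes, and the visitor collects field names instead of decomposing expressions, purely to keep the sketch short. The function node takes a vector of arguments, in the spirit of the simplified API from commit 5.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class FieldNode;
class FunctionNode;

// Hypothetical visitor interface: one Visit overload per node type.
class NodeVisitor {
 public:
  virtual ~NodeVisitor() = default;
  virtual void Visit(const FieldNode& node) = 0;
  virtual void Visit(const FunctionNode& node) = 0;
};

class Node {
 public:
  virtual ~Node() = default;
  virtual void Accept(NodeVisitor& visitor) const = 0;
};

class FieldNode : public Node {
 public:
  explicit FieldNode(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }
  void Accept(NodeVisitor& visitor) const override { visitor.Visit(*this); }

 private:
  std::string name_;
};

// A function node holds any number of arguments in a vector.
class FunctionNode : public Node {
 public:
  FunctionNode(std::string name, std::vector<std::shared_ptr<Node>> args)
      : name_(std::move(name)), args_(std::move(args)) {}
  const std::string& name() const { return name_; }
  const std::vector<std::shared_ptr<Node>>& args() const { return args_; }
  void Accept(NodeVisitor& visitor) const override { visitor.Visit(*this); }

 private:
  std::string name_;
  std::vector<std::shared_ptr<Node>> args_;
};

// Example visitor: collects the fields an expression reads. The traversal
// logic lives here, not in the node classes.
class FieldCollector : public NodeVisitor {
 public:
  void Visit(const FieldNode& node) override { fields_.push_back(node.name()); }
  void Visit(const FunctionNode& node) override {
    for (const auto& arg : node.args()) {
      assert(arg != nullptr);
      arg->Accept(*this);
    }
  }
  const std::vector<std::string>& fields() const { return fields_; }

 private:
  std::vector<std::string> fields_;
};
```

New passes over the tree can be added as further visitor classes without touching the node definitions.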
  8. [Gandiva] Support literal expressions

    Added support for literals.
    pravindra authored and wesm committed Sep 29, 2018
    8b0dfe0
  9. [Gandiva] Reduce bitmap updates for if-else

    In case of nested if-else conditions, e.g.
    
    if A
    else if B
         else if C
              else D
    
    The else parts of A & C will not update validity bitmaps.
    Only the if parts and the terminal else update bitmaps.
    pravindra authored and wesm committed Sep 29, 2018
    f8738db
  10. [Gandiva] First draft of Gandiva Java APIs

    [Gandiva] Add proto file to serialize the expression tree
    Fix the build also to generate the java and cpp proto files
    
    [Gandiva] Added proto files and added code to serialize an expression and other data types
    
    * Addressed review comments
    
    * Addressed code review comments
    
    * Made some classes package private
    
    Change-Id: Id7d3059782b00ffc14c20691764019a24ad143a8
    vvellanki authored and wesm committed Sep 29, 2018
    600f407
  11. [Gandiva] Add CMake support for proto files

    Split gandiva into two submodules: codegen and jni.
    - codegen is the core having cpp APIs and LLVM
    - jni deals with protobufs & interfacing with java
    pravindra authored and wesm committed Sep 29, 2018
    203bb7e
  12. [Gandiva] Add a zero-copy variant to Evaluate

    Dremio allocates the output vectors in java and passes the pointers
    to gandiva. In that case, gandiva will use the passed in buffers.
    
    Made Evaluate use ArrayData internally for output buffers, since Array
    is expected to be immutable.
    pravindra authored and wesm committed Sep 29, 2018
    2b27a99
  13. [Gandiva] switch to /// or // style comments

    Also, added "Adapted from XX" comments in ci/travis
    pravindra authored and wesm committed Sep 29, 2018
    cadb463
  14. [Gandiva] Add unit tests for bitmap/time fns

    - Added definitions for other integer types 
    - Added definitions for unsigned types
    - Added a test for arithmetic ops on all int types
    - The functions should be inlined in the pre-compiled library, but
      not in the unit tests. Added a compiler flag to control this.
    pravindra authored and wesm committed Sep 29, 2018
    3de315f
  15. [Gandiva] Fix order of includes.

    Fixed the order of includes to follow the style guideline.
    The order to follow is documented here: https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes
    Also enabled the check in lint.
    praveenbingo authored and wesm committed Sep 29, 2018
    d943f7d
  16. [Gandiva] Add Java APIs

    [Gandiva] Gandiva Java APIs
    
    Added the JNI Implementation of the Java APIs
    Added Java based unit and integration tests
    Use cmake to build gandiva_jni
    Added pom.xml to build Java files
    vvellanki authored and wesm committed Sep 29, 2018
    585e79c
  17. [Gandiva] Integrate java with travis CI.

    Build the Java library as part of Travis CI.
    The library is built in the before_script phase and the tests run
    in the script phase.
    This kicks off the Travis integration; further work is coming to
    automate the library location and make it OS independent.
    praveenbingo authored and wesm committed Sep 29, 2018
    6f67ce8
  18. [Gandiva] update benchmark results

    pravindra authored and wesm committed Sep 29, 2018
    0559290
  19. [Gandiva] Add validation checks for Java coding guidelines in the build

    Add Java coding guideline rules
    Add validation checks to validate Java coding guidelines
    
    Change-Id: Iae6b16dad0cc72ec760ecb2f2232a7c713dc4b8d
    vvellanki authored and wesm committed Sep 29, 2018
    c949c11
  20. [Gandiva] Added validation to projector build.

    Validating the input schema and expressions during the projector build.
    praveenbingo authored and wesm committed Sep 29, 2018
    9ce2377
  21. [Gandiva] Fixed licenses and minor corrections in build.

    Fixed the license headers to be Dremio rather than Apache.
    Fail the build on check style errors.
    Include exceptions in the gandiva java target.
    praveenbingo authored and wesm committed Sep 29, 2018
    bd2aa97
  22. [Gandiva] Support boolean and/or

    - tree builder api for and/or
    - decomposer/validator for and/or
    - code generator for and/or
    - tests for and/or
    pravindra authored and wesm committed Sep 29, 2018
    0f46c5b
  23. [Gandiva] Support null literals

    - add tree-builder, codegen support for null literals
    - moved the code for final bitmap computation to class BitMapAccumulator
    pravindra authored and wesm committed Sep 29, 2018
    1fc01ee
  24. [Gandiva] Support AND/OR control expressions

    add java bindings for and/or
    pravindra authored and wesm committed Sep 29, 2018
    5216242
  25. [Gandiva] Support null literals

    add java bindings for null literals
    pravindra authored and wesm committed Sep 29, 2018
    d1f161b
  26. [Gandiva] Support date/time functions and datatypes

    Support date/time types in Java
    Add cpp/Java tests for date/time types
    vvellanki authored and wesm committed Sep 29, 2018
    366a972
  27. [Gandiva] Dynamically load dependencies.

    Loading Gandiva dynamically in java bindings.
    Packaging the dynamic library and byte code files in Gandiva JAR.
    Introduced configuration object to customize Gandiva at runtime.
    praveenbingo authored and wesm committed Sep 29, 2018
    e344807
  28. [Gandiva] Support variable len arrow vectors

    - Track offsets buffer for string/binary
    - annotator/generator support for string/binary
    - literal support for string/binary
    pravindra authored and wesm committed Sep 29, 2018
    c0ae3c3
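
Tracking an offsets buffer for string/binary data, as commit 28 mentions, rests on a simple layout that can be sketched as below. `StringColumn`, `BuildColumn`, and `GetValue` are hypothetical helpers for illustration, not Arrow or Gandiva types. The assumption is the usual variable-length layout: bytes stored back to back, with n + 1 offsets delimiting n values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical variable-length column: all bytes are stored back to back in
// `data`, and `offsets` holds n + 1 entries so that value i occupies
// data[offsets[i] .. offsets[i + 1]).
struct StringColumn {
  std::vector<int32_t> offsets;
  std::string data;
};

// Appends each value to the data buffer and records where it ends.
StringColumn BuildColumn(const std::vector<std::string>& values) {
  StringColumn col;
  col.offsets.push_back(0);
  for (const std::string& v : values) {
    col.data += v;
    col.offsets.push_back(static_cast<int32_t>(col.data.size()));
  }
  return col;
}

// Reads value i using only the two offsets that bracket it.
std::string GetValue(const StringColumn& col, std::size_t i) {
  assert(i + 1 < col.offsets.size());
  const int32_t begin = col.offsets[i];
  const int32_t length = col.offsets[i + 1] - begin;
  return col.data.substr(static_cast<std::size_t>(begin),
                         static_cast<std::size_t>(length));
}
```

A generated function operating on such a column needs both the data pointer and the offsets, which is why the offsets buffer has to be tracked alongside the values.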
  29. [Gandiva] Made Gandiva JNI a packagable library.

    Modified the build to package the gandiva jni as a stand alone library that
    can be packaged in the Gandiva JAR.
    
    Also producing two versions of gandiva core - a static and a shared one.
    
    Fixed LLVM dependencies to be target based.
    praveenbingo authored and wesm committed Sep 29, 2018
    35c9203
  30. [Gandiva] clang-format to validate/fix style

    - added target "make stylecheck" to check style
    - added target "make stylefix" to fix style
    - fixed README.md
    - fixed ci script
    - used stylefix to fix all existing style violations
    pravindra authored and wesm committed Sep 29, 2018
    6099f6d
  31. [Gandiva] Deploy to ossrh after build.

    Deploying the Gandiva Jar to OSSRH after master merge builds.
    praveenbingo authored and wesm committed Sep 29, 2018
    18a9052
  32. [Gandiva] support varlen types in gandiva

    - added java bindings for varlen types/literals
    - minor cleanups in llvm generator and engine
    pravindra authored and wesm committed Sep 29, 2018
    9bc45a0
  33. [Gandiva] Add cpp/Java microbenchmarks

    Added microbenchmarks in both cpp and Java
    vvellanki authored and wesm committed Sep 29, 2018
    4eb7cde
  34. 75ca520
  35. [Gandiva] Add support to print expressions

    vvellanki authored and wesm committed Sep 29, 2018
    025d1b9
  36. [Gandiva] Support more date/time functions

    Support isnull, isnotnull, equal, and not_equal for date/time types
    Support date/time types for less_than, less_than_or_equal_to, greater_than, greater_than_or_equal_to
    Implement all extractXxx functions
    vvellanki authored and wesm committed Sep 29, 2018
    39d180f
  37. [Gandiva] link libstdc++ statically

    - Switched to gcc-4.9, since the stdc++ linked with 4.8 doesn't work with llvm libs.
    - Build arrow in travis instead of the conda build 
    - fixed an error in node.h that showed up when I toyed with clang compiler
    pravindra authored and wesm committed Sep 29, 2018
    eb11ea9
  38. [Gandiva] Export supported types from Gandiva.

    Exporting supported data types and functions from Gandiva.
    Added a JNI bridge to access this from the java layer.
    praveenbingo authored and wesm committed Sep 29, 2018
    9e98f03
  39. [Gandiva] Fix missing include directory of gtest in CMakeLists.txt

    * Set the missing include directory of gtest
    * Use the same format as other dependencies
    
    Change-Id: Iad193e219fc07f777988984db325ce97ad83b545
    masayuki038 authored and wesm committed Sep 29, 2018
    8707793
  40. [Gandiva] Fixed extract second from time.

    Fixed the implementation of extract second from time.
    praveenbingo authored and wesm committed Sep 29, 2018
    5af1119
  41. [Gandiva] Add hash functions on all data types

    [Gandiva] Add hash functions on all data types
    
    [Gandiva] Fix stylecheck in travis to print diff
    
    [Gandiva] pick clang-format from llvm-binary dir
    
    [Gandiva] handle case when seed is null
    
    [Gandiva] Fix a style check
    pravindra authored and wesm committed Sep 29, 2018
    e001c1d
  42. [Gandiva] Fixed literals and nulls for time types.

    Added support for literals and null for time types.
    praveenbingo authored and wesm committed Sep 29, 2018
    a2c3300
  43. [Gandiva] Fixed reference initializations.

    Class references are local by default and eligible for GC.
    
    They need to be converted to global references on library load so that
    they can be used safely for the program lifetime.
    praveenbingo authored and wesm committed Sep 29, 2018
    8a794e8
  44. [Gandiva] Add support for more date/time functions

    Add support for timestampaddXxx functions
    Add support for is_distinct_from, is_not_distinct_from, isnull, isnotnull, date_add/add, date_sub/subtract/date_diff, date_trunc_Xxx functions
    vvellanki authored and wesm committed Sep 29, 2018
    e490a7f
  45. [Gandiva] Match gandiva mod operator to dremio for mod zero.

    * Temporarily match what Dremio does for mod zero.
    * Used the latest Arrow APIs for allocating buffers.
    praveenbingo authored and wesm committed Sep 29, 2018
    ac661ff
  46. [Gandiva] Add support for filters

    - similar to projection, filter is built for a specific schema and
      condition 
    - the output of filter is a selection vector
    pravindra authored and wesm committed Sep 29, 2018
    c0eab74
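
To make the notion of a selection vector from commit 46 concrete, here is a small sketch. The `Filter` function below is a hypothetical free function, not Gandiva's Filter class, and the 16-bit unsigned index width is an assumption chosen for the example. It shows that a filter produces the indices of matching rows rather than a copy of the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Evaluates `condition` on every row and returns a selection vector: the
// indices of the rows that passed, in ascending order. The input column
// itself is not copied or modified.
std::vector<uint16_t> Filter(const std::vector<int32_t>& column,
                             const std::function<bool(int32_t)>& condition) {
  assert(column.size() <= 65536);  // every index must fit in 16 bits
  std::vector<uint16_t> selection;
  for (std::size_t i = 0; i < column.size(); ++i) {
    if (condition(column[i])) {
      selection.push_back(static_cast<uint16_t>(i));
    }
  }
  return selection;
}
```

Downstream operators can then read only the selected rows from the original column.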
  47. [Gandiva] Add java bindings for filter expr

    * Add java bindings for filter expr
    * Mv selection vector impl to internal
    pravindra authored and wesm committed Sep 29, 2018
    743b3c1
  48. [Gandiva] Fixed filter bugs.

    Fixed some bugs in the filter code path.
    praveenbingo authored and wesm committed Sep 29, 2018
    7ce243a
  49. [Gandiva] Fixed selection vector array type

    Change the selection vector arrays as unsigned to match dremio.
    praveenbingo authored and wesm committed Sep 29, 2018
    3cfbdb5
  50. [Gandiva] Executing TPCH queries.

    1. Added lock to holder read to address potential race condition.
    2. Fixed log message.
    3. Addressed breaking arrow change.
    praveenbingo authored and wesm committed Sep 29, 2018
    e8ee7f9
  51. [Gandiva] Perf Improvements

    1. In evaluate, look up the module without a lock first, and fall back to
       a locked lookup only if the module is not found.
    2. Use release builds in travis.
    praveenbingo authored and wesm committed Sep 29, 2018
    cf51280
  52. [Gandiva] Caching projectors and filters for re-use.

    Introducing a cache to hold the projectors and filters for re-use.
    The cache is an LRU cache that can hold 100 entries.
    praveenbingo authored and wesm committed Sep 29, 2018
    2cb9c0d
  53. [Gandiva] Fixed concurrency issue in cache.

    Modifications were happening in get without a mutex.
    Wrote a test to verify the fix and prevent regressions.
    praveenbingo authored and wesm committed Sep 29, 2018
    71aad3d
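
Commits 52 and 53 together describe an LRU cache whose lookups also need locking. The sketch below shows why. `LruCache` is a hypothetical template, not Gandiva's cache class, and keying by a string is an assumption. A lookup hit reorders the recency list, so `Get` mutates shared state and has to take the mutex just like `Put`.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical thread-safe LRU cache keyed by a string.
template <typename Value>
class LruCache {
 public:
  explicit LruCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Returns the cached value or nullptr. A hit moves the entry to the front
  // of the recency list, so Get modifies state and must hold the mutex.
  std::shared_ptr<Value> Get(const std::string& key) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = index_.find(key);
    if (it == index_.end()) return nullptr;
    order_.splice(order_.begin(), order_, it->second);
    return it->second->second;
  }

  // Inserts or updates an entry, evicting the least recently used one
  // when the cache is full.
  void Put(const std::string& key, std::shared_ptr<Value> value) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      order_.splice(order_.begin(), order_, it->second);
      return;
    }
    if (order_.size() == capacity_) {
      index_.erase(order_.back().first);
      order_.pop_back();
    }
    order_.emplace_front(key, std::move(value));
    index_[key] = order_.begin();
  }

 private:
  using Entry = std::pair<std::string, std::shared_ptr<Value>>;
  std::size_t capacity_;
  std::mutex mutex_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<std::string, typename std::list<Entry>::iterator> index_;
};
```

With a capacity of 100 this would match the size the commit message mentions; the capacity is a constructor parameter here only to keep the sketch easy to exercise.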
  54. [Gandiva] Fixed Literal ToString.

    Literal string conversion was ignoring types, leading
    to mismatch in hashing of expressions.
    praveenbingo authored and wesm committed Sep 29, 2018
    465e7e8
  55. [Gandiva] Add support for sql regex functions

    - add a registry for "function holders" implemented in cpp
    - the function holder is instantiated at expression decomposition time
    - at eval time, the registered fn gets an extra param
    pravindra authored and wesm committed Sep 29, 2018
    28915eb
  56. [Gandiva] Add a helper library containing cpp stubs

    - To get around the java load issue, create a native library and load it in the LLVM module.
      This module has the hooks for all the C++ function helpers.
    - Files that are compiled into libgandiva_helpers are placed in the gandiva::helpers namespace.
    - merged status.cc into status.h
    pravindra authored and wesm committed Sep 29, 2018
    8beb066
  57. [Gandiva] switch to a more efficient date impl

    - reduce benchmark iterations to 1M instead of 100M
    - add checks in benchmark test to verify that elapsed_time is
      at most 2 * expected
    pravindra authored and wesm committed Sep 29, 2018
    abb24a9
  58. [Gandiva] switch from std::regex to re2

    pravindra authored and wesm committed Sep 29, 2018
    27673c8
  59. b8e3492
  60. ARROW-3182: [Gandiva] Integrate gandiva to arrow build. Update licenses to apache license.

    Fix clang-format, cpplint warnings, -Wconversion warnings and other warnings
    with -DBUILD_WARNING_LEVEL=CHECKIN. Fix some build toolchain issues, Arrow
    target dependencies. Remove some unused CMake code
    praveenbingo authored and wesm committed Sep 29, 2018
    8e9a915