[HUDI-4081][HUDI-4472] Addressing Spark SQL vs Spark DS performance gap #6213

Merged on Jul 28, 2022 (26 commits)

Commits on Jul 28, 2022

  1. Tidying up

    Alexey Kudinkin committed Jul 28, 2022
    3d55f22
  2. Avoid unnecessary RDD dereferencing of the Dataset

    Alexey Kudinkin committed Jul 28, 2022
    47e86f3
  3. Tidying up

    Alexey Kudinkin committed Jul 28, 2022
    ed0b9c9
  4. Cleaned up query output alignment seq to do proper validations and avoid unnecessary conversions;
    
    Tidying up
    Alexey Kudinkin committed Jul 28, 2022
    c5829c9
  5. Relaxed ordering requirement

    Alexey Kudinkin committed Jul 28, 2022
    d6c34b0
  6. Fixing compilation

    Alexey Kudinkin committed Jul 28, 2022
    4be35f1
  7. Fixed validating sequence to properly assert whether query output conforms to the expected table's schema

    Alexey Kudinkin committed Jul 28, 2022
    0188b77
  8. Fixed partition-spec assertion;

    Tidying up
    Alexey Kudinkin committed Jul 28, 2022
    023be63
  9. Fixing tests

    Alexey Kudinkin committed Jul 28, 2022
    a8b0a25
  10. Fixed invalid ref

    Alexey Kudinkin committed Jul 28, 2022
    27bb52f
  11. Duct-tape the issue of incorrect schema handling in `HoodieSparkSqlWriter`

    Alexey Kudinkin committed Jul 28, 2022
    225d037
  12. Revert back to relative-order based matching (no name lookup)

    Alexey Kudinkin committed Jul 28, 2022
    c7f2688
  13. Simplify schema reconciliation, schema evolution handling

    Alexey Kudinkin committed Jul 28, 2022
    ec3cbfb
  14. Properly reconcile nullability attributes

    Alexey Kudinkin committed Jul 28, 2022
    b2f5007
  15. Rebased InsertIntoHoodieTableCommand to rely on Spark's `TableSchemaResolver` instead of bespoke implementation

    Alexey Kudinkin committed Jul 28, 2022
    193fa73
  16. Fixed tests

    Alexey Kudinkin committed Jul 28, 2022
    813ea6f
  17. Extracted query output resolving/reshaping into `HoodieCatalystPlanUtils`

    Alexey Kudinkin committed Jul 28, 2022
    f7dae90
  18. Fixed tests

    Alexey Kudinkin committed Jul 28, 2022
    83bec46
  19. Added new method TableSchemaResolver#getTableLatestAvroSchema to return most recent table's schema;
    
    Rebased `HoodieSparkSqlWriter` onto `TableSchemaResolver#getTableLatestAvroSchema`
    Alexey Kudinkin committed Jul 28, 2022
    f37753a
  20. Fixing tests

    Alexey Kudinkin committed Jul 28, 2022
    52a46e8
  21. Tidying up

    Alexey Kudinkin committed Jul 28, 2022
    3fa6184
  22. Refactored schema handling in HoodieSparkSqlWriter to make sure all cases are handled correctly:
    
      - Full Schema Evolution
      - Schema reconciliation (w/o FSE)
      - No schema reconciliation at all
    Alexey Kudinkin committed Jul 28, 2022
    e089df9
  23. Reverting changes in TableSchemaResolver

    Alexey Kudinkin committed Jul 28, 2022
    0088598
  24. Fixed tests

    Alexey Kudinkin committed Jul 28, 2022
    40b432e
  25. Fixed more tests (for Spark 2)

    Alexey Kudinkin committed Jul 28, 2022
    d098342
  26. Tidying up

    Alexey Kudinkin committed Jul 28, 2022
    4a639a4