[HUDI-1437] support more accurate spark JobGroup for better performance tracking #2318

Closed
lw309637554 wants to merge 2 commits

lw309637554
Contributor

Tips

What is the purpose of the pull request

Some job descriptions shown in the Spark UI do not match the work actually being performed, which makes performance tracking harder.
This PR sets more accurate Spark JobGroup descriptions for better performance tracking.
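
For illustration only (this is not the code changed in this PR): a minimal sketch of the pattern, where the job group is set immediately before each action so the label in the Spark UI describes that action. `JobGroupSetter`, `RecordingContext` and `runLabeled` are hypothetical names introduced here so the sketch runs without a Spark dependency; in a real job the stand-in call would presumably be `JavaSparkContext.setJobGroup(groupId, description)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class JobGroupSketch {

    /** Hypothetical stand-in for the single Spark call used here: setJobGroup(groupId, description). */
    interface JobGroupSetter {
        void setJobGroup(String groupId, String description);
    }

    /** Records labels instead of talking to Spark, so the sketch runs without a cluster. */
    static class RecordingContext implements JobGroupSetter {
        final List<String> labels = new ArrayList<>();

        @Override
        public void setJobGroup(String groupId, String description) {
            labels.add(groupId + ": " + description);
        }
    }

    /** Sets the label right before running the action, so the label describes that action only. */
    static <T> T runLabeled(JobGroupSetter ctx, Class<?> owner, String description, Supplier<T> action) {
        ctx.setJobGroup(owner.getSimpleName(), description);
        return action.get();
    }

    public static void main(String[] args) {
        RecordingContext ctx = new RecordingContext();
        // Each step gets its own description instead of inheriting a stale one.
        int loaded = runLabeled(ctx, JobGroupSketch.class, "Read input records", () -> 3);
        int written = runLabeled(ctx, JobGroupSketch.class, "Write records to table", () -> loaded);
        ctx.labels.forEach(System.out::println);
        System.out.println("written=" + written);
    }
}
```

Using the owning class name as the group id and a step-specific description is one possible convention, assumed here for the example.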

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

lw309637554 changed the title from "[HUDI-1437] support more spark JobGroup for better performance tracking" to "[HUDI-1437] support more accurate spark JobGroup for better performance tracking" on Dec 9, 2020

codecov-io commented Dec 9, 2020

Codecov Report

Merging #2318 (57162d5) into master (3a91d26) will decrease coverage by 43.08%.
The diff coverage is 20.00%.

@@              Coverage Diff              @@
##             master    #2318       +/-   ##
=============================================
- Coverage     53.49%   10.40%   -43.09%     
+ Complexity     2788       48     -2740     
=============================================
  Files           355       51      -304     
  Lines         16169     1787    -14382     
  Branches       1650      213     -1437     
=============================================
- Hits           8649      186     -8463     
+ Misses         6819     1588     -5231     
+ Partials        701       13      -688     
| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | ? | ? |
| hudiclient | ? | ? |
| hudicommon | ? | ? |
| hudihadoopmr | ? | ? |
| hudispark | ? | ? |
| huditimelineservice | ? | ? |
| hudiutilities | 10.40% <20.00%> (-59.70%) | 0.00 <1.00> (ø) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...apache/hudi/utilities/deltastreamer/DeltaSync.java | 0.00% <0.00%> (-70.55%) | 0.00 <0.00> (-49.00) |
| ...i/utilities/deltastreamer/SourceFormatAdapter.java | 0.00% <0.00%> (-86.49%) | 0.00 <0.00> (-11.00) |
| ...in/java/org/apache/hudi/utilities/UtilHelpers.java | 35.54% <100.00%> (-28.92%) | 11.00 <1.00> (-21.00) |
| ...va/org/apache/hudi/utilities/IdentitySplitter.java | 0.00% <0.00%> (-100.00%) | 0.00% <0.00%> (-2.00%) |
| ...va/org/apache/hudi/utilities/schema/SchemaSet.java | 0.00% <0.00%> (-100.00%) | 0.00% <0.00%> (-3.00%) |
| ...a/org/apache/hudi/utilities/sources/RowSource.java | 0.00% <0.00%> (-100.00%) | 0.00% <0.00%> (-4.00%) |
| .../org/apache/hudi/utilities/sources/AvroSource.java | 0.00% <0.00%> (-100.00%) | 0.00% <0.00%> (-1.00%) |
| .../org/apache/hudi/utilities/sources/JsonSource.java | 0.00% <0.00%> (-100.00%) | 0.00% <0.00%> (-1.00%) |
| ...rg/apache/hudi/utilities/sources/CsvDFSSource.java | 0.00% <0.00%> (-100.00%) | 0.00% <0.00%> (-10.00%) |
| ...g/apache/hudi/utilities/sources/JsonDFSSource.java | 0.00% <0.00%> (-100.00%) | 0.00% <0.00%> (-4.00%) |

... and 307 more

lw309637554 and others added 2 commits December 10, 2020 22:58
* Fix flaky MOR unit test

* Update Spark APIs to make it be compatible with both spark2 & spark3

* Refactor bulk insert v2 part to make Hudi be able to compile with Spark3

* Add spark3 profile to handle fasterxml & spark version

* Create hudi-spark-common module & refactor hudi-spark related modules

Co-authored-by: Wenning Ding <wenningd@amazon.com>
3 participants