Skip to content

[HUDI-8126] Use union to parallelize data and error table writes#12813

Merged
nsivabalan merged 5 commits intoapache:masterfrom
vinishjail97:HUDI-8126_error-table
Feb 23, 2025
Merged

[HUDI-8126] Use union to parallelize data and error table writes#12813
nsivabalan merged 5 commits intoapache:masterfrom
vinishjail97:HUDI-8126_error-table

Conversation

@vinishjail97
Copy link
Contributor

Change Logs

Enable writing of error and data table in parallel. This behavior is disabled by default and can enabled by setting error table config property: hoodie.errortable.write.union.enable to true.

Impact

The DAG's for data table + error table are sequential today, this change executes them in a union to better utilize the executor resources in the spark driver.

Risk level (write none, low medium or high below)

Low.

Documentation Update

  public static final ConfigProperty<Boolean> ENABLE_ERROR_TABLE_WRITE_UNIFICATION = ConfigProperty
      .key("hoodie.errortable.write.union.enable")
      .defaultValue(false)
      .withDocumentation("Enable error table union with data table when writing for improved commit performance. "
          + "By default it is disabled meaning data table and error table writes are sequential");
### Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Feb 8, 2025
@nsivabalan
Copy link
Contributor

@the-other-tim-brown @rmahindra : Can you folks review this. once everything looks good and CI is green, lmk. I can help land the patch.

.defaultValue(ErrorWriteFailureStrategy.ROLLBACK_COMMIT.name())
.withDocumentation("The config specifies the failure strategy if error table write fails. "
+ "Use one of - " + Arrays.toString(ErrorWriteFailureStrategy.values()));
public static final ConfigProperty<Boolean> ENABLE_ERROR_TABLE_WRITE_UNIFICATION = ConfigProperty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we really need a flag for this. This seems like it will be more performant for users with the error table writer enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error table write implementation can't do a union if they implement the upsertAndCommit method so still need to support it behind a feature flag to avoid breaking things for existing users.

@vinishjail97
Copy link
Contributor Author

@nsivabalan How do I see logs for azure run ? Can we re-trigger the CI run again if possible ?
Nothing to show. Final logs are missing. This can happen when the job is cancelled or times out.

@vinishjail97
Copy link
Contributor Author

This seems to be because of testReadArchivedCommitsIncrementally, azure failed in another PR as well.

https://dev.azure.com/apachehudi/hudi-oss-ci/_build/results?buildId=3732&view=logs&j=d8698f62-59df-5d60-659e-8e4b90e4e5ba&t=7e6a176b-fa3b-5e2b-bced-8d295e83a4a8

@nsivabalan
Copy link
Contributor

have retriggered

@vinishjail97
Copy link
Contributor Author

Rebased with latest master.

@vinishjail97
Copy link
Contributor Author

Jacoco failures.

Jacoco CLI jar: jacoco-lib/lib/jacococli.jar
Hudi source directory: /home/vsts/work/1/s
[INFO] Loading execution data file /home/vsts/work/1/s/hudi-client/hudi-spark-client/target/jacoco-agent/jacoco1.exec.
[INFO] Loading execution data file /home/vsts/work/1/s/hudi-client/hudi-spark-client/target/jacoco-agent/jacoco2.exec.
Exception in thread "main" java.io.EOFException
	at java.base/java.io.DataInputStream.readUnsignedShort(DataInputStream.java:345)
	at java.base/java.io.DataInputStream.readUTF(DataInputStream.java:594)
	at java.base/java.io.DataInputStream.readUTF(DataInputStream.java:569)
	at org.jacoco.cli.internal.core.data.ExecutionDataReader.readExecutionData(ExecutionDataReader.java:149)
	at org.jacoco.cli.internal.core.data.ExecutionDataReader.readBlock(ExecutionDataReader.java:116)
	at org.jacoco.cli.internal.core.data.ExecutionDataReader.read(ExecutionDataReader.java:93)
	at org.jacoco.cli.internal.core.tools.ExecFileLoader.load(ExecFileLoader.java:60)
	at org.jacoco.cli.internal.core.tools.ExecFileLoader.load(ExecFileLoader.java:74)
	at org.jacoco.cli.internal.commands.Merge.loadExecutionData(Merge.java:61)
	at org.jacoco.cli.internal.commands.Merge.execute(Merge.java:45)
	at org.jacoco.cli.internal.Main.execute(Main.java:90)
	at org.jacoco.cli.internal.Main.main(Main.java:105)

##[error]Bash exited with code '1'.

@vinishjail97
Copy link
Contributor Author

vinishjail97 commented Feb 21, 2025

Test failures.

2025-02-21T19:32:37.9608824Z [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.143 s - in org.apache.hudi.utilities.TestManifestFileWriterSpark
2025-02-21T19:32:39.5325904Z [INFO] 
2025-02-21T19:32:39.5331397Z [INFO] Results:
2025-02-21T19:32:39.5338649Z [INFO] 
2025-02-21T19:32:39.5361383Z [ERROR] Errors: 
2025-02-21T19:32:39.5934520Z [ERROR]   TestHoodieIncrSourceE2E.testSyncE2ENoPrevCkpThenSyncMultipleTimes:284 » HoodieIO
2025-02-21T19:32:39.5935804Z [ERROR]   TestHoodieIncrSourceE2E.testSyncE2ENoPrevCkpThenSyncMultipleTimes:284 » HoodieIO
2025-02-21T19:32:39.5936486Z [ERROR]   TestHoodieIncrSourceE2EAutoUpgrade.testSyncE2ENoPrevCkpThenSyncMultipleTimes:221 » HoodieIO
2025-02-21T19:32:39.5936840Z [INFO] 
2025-02-21T19:32:39.5937117Z [ERROR] Tests run: 771, Failures: 0, Errors: 3, Skipped: 6
2025-02-21T19:32:39.5937390Z [INFO] 
2025-02-21T19:32:39.6887020Z [INFO] ------------------------------------------------------------------------
2025-02-21T19:32:39.6890829Z [INFO] BUILD FAILURE
2025-02-21T19:32:39.6895163Z [INFO] ------------------------------------------------------------------------

@nsivabalan
Copy link
Contributor

@hudi-bot run azure

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan nsivabalan merged commit cb32e5e into apache:master Feb 23, 2025
43 checks passed
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 8, 2025
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 8, 2025
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 9, 2025
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-1.0.2 size:L PR with lines of changes in (300, 1000]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants