
[HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency #3233

Merged
merged 30 commits into from Aug 11, 2021

Conversation

Contributor

@yihua yihua commented Jul 7, 2021

What is the purpose of the pull request

This PR adds a new marker file strategy that optimizes marker-related latency for file systems with non-trivial file I/O latency, such as Amazon S3. The existing marker file mechanism creates one marker file per data file written. When listing and deleting the marker files, S3 throttles the I/O operations if the number of marker files is huge, and listing alone can take 10 minutes. When write concurrency is high, marker creation can also be throttled by S3. Such behavior degrades write performance. The new timeline-server-based marker file strategy delegates marker creation and other marker-related operations from individual executors to the timeline server. The timeline server maintains the markers and their consistency by periodically flushing the in-memory markers to a limited number of underlying files in the file system. This reduces the number of actual file operations and the marker-related latency, thus improving write performance.

To improve the efficiency of processing marker creation requests, we design batch processing in the marker request handler at the timeline server. Each marker creation request is handled asynchronously in the Javalin timeline server and queued before processing. At every batch interval, e.g., 20ms, a dispatching thread pulls the pending requests from the queue and sends them to a worker thread for processing. Each worker thread processes the marker creation requests, sets the responses, and flushes the new markers by overwriting the underlying file storing the markers in the file system. Multiple worker threads can run concurrently, since overwriting a file can take longer than the batch interval; each worker thread writes to an exclusive file not touched by other threads, to guarantee consistency and correctness. Both the batch interval and the number of worker threads can be configured through the write options.
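
To illustrate the batching design described above, here is a minimal, self-contained Java sketch. The names (BatchingMarkerDispatcher, MarkerCreationRequest) and the queue wiring are simplified stand-ins for the actual MarkerHandler, MarkerCreationDispatchingRunnable, and BatchedMarkerCreationRunnable classes listed in the change log, not the real implementation:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in: a marker creation request paired with its async response.
class MarkerCreationRequest {
  final String markerName;
  final CompletableFuture<Boolean> response = new CompletableFuture<>();

  MarkerCreationRequest(String markerName) {
    this.markerName = markerName;
  }
}

// Hypothetical stand-in for the dispatching logic at the timeline server.
class BatchingMarkerDispatcher {
  private final ConcurrentLinkedQueue<MarkerCreationRequest> pendingRequests = new ConcurrentLinkedQueue<>();
  private final ScheduledExecutorService dispatcher = Executors.newSingleThreadScheduledExecutor();
  private final ExecutorService workers;

  BatchingMarkerDispatcher(int numWorkerThreads, long batchIntervalMs) {
    this.workers = Executors.newFixedThreadPool(numWorkerThreads);
    // Every batch interval, the dispatching thread drains the queue and hands
    // the pending requests to a worker thread for processing.
    dispatcher.scheduleAtFixedRate(() -> {
      List<MarkerCreationRequest> batch = new ArrayList<>();
      MarkerCreationRequest next;
      while ((next = pendingRequests.poll()) != null) {
        batch.add(next);
      }
      if (!batch.isEmpty()) {
        workers.submit(() -> process(batch));
      }
    }, batchIntervalMs, batchIntervalMs, TimeUnit.MILLISECONDS);
  }

  // Called by the request handler: each request is queued and answered asynchronously.
  CompletableFuture<Boolean> enqueue(String markerName) {
    MarkerCreationRequest request = new MarkerCreationRequest(markerName);
    pendingRequests.add(request);
    return request.response;
  }

  private void process(List<MarkerCreationRequest> batch) {
    // In the actual design, the worker adds the markers to the in-memory state,
    // overwrites its exclusive MARKERS file, and only then completes the
    // futures, so responses are sent only after a successful flush.
    for (MarkerCreationRequest request : batch) {
      request.response.complete(true);
    }
  }
}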

Note that the worker thread always checks whether the marker has already been created, by comparing the marker name from the request with the in-memory copy of all markers maintained in the marker directory state class. The underlying files storing the markers are only read upon the first marker request (lazy loading). The responses to requests are only sent back once the new markers are flushed to the files, so that in the case of a timeline server failure, the timeline server can recover the already created markers. Together, these ensure consistency between the file system and the in-memory copy, and improve the performance of processing marker requests.
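
The check-then-create and lazy-loading behavior can be sketched as follows; the names are again hypothetical simplifications (the actual logic lives in the MarkerDirState class, and readMarkersFromFiles is a placeholder):

import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for MarkerDirState.
class MarkerDirStateSketch {
  private final Set<String> allMarkers = new HashSet<>();
  private boolean lazyLoadComplete = false;

  synchronized boolean addMarkerIfAbsent(String markerName) {
    if (!lazyLoadComplete) {
      // Lazy loading: read previously flushed markers from the underlying files
      // upon the first request, so that a restarted timeline server recovers
      // the markers that were already created and flushed.
      allMarkers.addAll(readMarkersFromFiles());
      lazyLoadComplete = true;
    }
    // Returns false if the marker already exists, making creation idempotent.
    return allMarkers.add(markerName);
  }

  private Set<String> readMarkersFromFiles() {
    return new HashSet<>(); // placeholder for reading the MARKERS files
  }
}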

Since the new timeline-server-based marker mechanism is intended for cloud storage, it is not supported for HDFS at this point (an exception is thrown if the timeline-server-based marker mechanism is configured for writes on HDFS).

Brief change log

  • New marker-related write options in HoodieWriteConfig
    • hoodie.write.markers.type: marker type to use for the writes. Two modes are supported: direct and timeline_server_based.
    • hoodie.markers.timeline_server_based.batch.num_threads: number of threads to use for batch processing marker creation requests at the timeline server for timeline-server-based marker strategy
    • hoodie.markers.timeline_server_based.batch.interval_ms: the batch interval in milliseconds for marker creation batch processing for timeline-server-based marker strategy
  • Abstraction of marker mechanism (WriteMarkers class)
    • Provides an interface for marker file strategy, containing different operations (create marker, delete marker directory, get all markers, etc.)
    • Creates an enum, MarkerType, to indicate the marker type used for the current write
    • Creates a new class, DirectWriteMarkers, for the existing mechanism which directly creates a marker file per data file in the file system from each executor
    • Creates a factory class, WriteMarkersFactory, to generate a WriteMarkers instance based on the marker type. Throws HoodieException if timeline_server_based is used on HDFS, since HDFS support is not ready (see the usage sketch after this change log).
    • Uses WriteMarkersFactory at all places where markers are needed (write handles, Java, Spark, Flink engines)
  • New timeline-server-based marker file implementation
    • Creates a new class, TimelineServerBasedWriteMarkers, for the timeline-server-based marker file strategy, which sends requests to the timeline server for all marker-related operations
  • Handling marker requests at the timeline server
    • Adds a class to store request URLs and parameters in MarkerOperation
    • Creates MarkerDirState class to store the state of a marker directory and operate on the markers inside the directory
    • Adds a future class, MarkerCreationFuture, to be able to process marker creation requests asynchronously
    • Creates a request handler, MarkerHandler, for marker-related requests, which schedules the dispatching thread at a fixed rate based on the batching interval upon the first marker creation request
    • Creates a MarkerCreationDispatchingRunnable for scheduling periodic, batched processing of marker creation requests
    • Creates a BatchedMarkerCreationRunnable for batch processing marker creation requests
    • Adds a context class, BatchedMarkerCreationContext, for passing the pending requests from the dispatching thread to the worker threads
  • New unit tests around the timeline-server-based marker file strategy
  • Code cleanup
    • Refactoring the configuration of EmbeddedTimelineService, TimelineService and RequestHandler
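
For illustration, a rough, self-contained sketch of the factory dispatch mentioned in the change log above; the names mirror the PR's WriteMarkersFactory and MarkerType, but the method signature, lambda bodies, and exception types are simplified assumptions, not the exact API:

enum MarkerType { DIRECT, TIMELINE_SERVER_BASED }

interface WriteMarkers {
  void createMarker(String partitionPath, String dataFileName);
}

final class WriteMarkersFactorySketch {
  static WriteMarkers get(MarkerType markerType, String basePath, String instantTime) {
    switch (markerType) {
      case DIRECT:
        // Existing mechanism: each executor directly creates one marker file per data file.
        return (partition, dataFile) -> System.out.println(
            "create marker file under " + basePath + "/.hoodie/.temp/" + instantTime);
      case TIMELINE_SERVER_BASED:
        // Not supported on HDFS yet, due to potential partial-write failures
        // when the timeline server overwrites the MARKERS files.
        if (basePath.startsWith("hdfs://")) {
          throw new IllegalStateException(
              "Timeline-server-based markers are not supported for HDFS: base path " + basePath);
        }
        // New mechanism: marker operations become requests to the timeline server.
        return (partition, dataFile) -> System.out.println(
            "POST marker creation request to the timeline server");
      default:
        throw new IllegalArgumentException("Unsupported marker type: " + markerType);
    }
  }
}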

Follow-up TODOs are summarized in this ticket: https://issues.apache.org/jira/projects/HUDI/issues/HUDI-2271

Verify this pull request

The direct marker file strategy is covered by existing tests in TestDirectWriteMarkers. This change adds unit tests in TestTimelineServerBasedWriteMarkers to verify the new timeline-server-based marker file strategy.

We have also manually verified the change by running multiple Spark jobs on large datasets in the cluster with different file system settings:

  • Hadoop/Yarn cluster with HDFS: testing succeeds with both marker file strategies for 30GB of input data (1000 parquet files, 400M records) producing 43k data files. (Note that this is done by explicitly commenting out the exception throw in WriteMarkersFactory so that the new timeline-server-based markers can be used on HDFS. By default, the job throws an exception in this setup, which is also tested below. We plan to enhance the support of timeline-server-based markers for HDFS in follow-ups.)
    • direct: 19.6 mins (1175943 ms)
    • timeline-server-based: 19.7 mins (1179346 ms)
  • Amazon EMR with S3: testing succeeds with both marker file strategies for 100GB of input data producing 165k data files, thanks to @nsivabalan's help
    • direct: 55 mins
    • timeline-server-based: 38 mins
  • Testing of retried stages in Spark succeeded, with the expected cleanup of invalid data files and the correct number of records in the table
    • Enables Spark speculative execution in the spark shell with the following configs, so that Spark automatically kills tasks running beyond a certain duration and retries them with new attempts, intentionally emulating failed stages
      • spark.speculation=true
      • spark.speculation.multiplier=1.0
      • spark.speculation.quantile=0.5
    • Runs the same job with timeline-server-based marker file strategy
    • Driver logs show that quite a few tasks were killed and retried: WARN scheduler.TaskSetManager: Lost task 1537.1 in stage 22.0 (TID 17375, executor 290): TaskKilled (another attempt succeeded)
    • Executor logs show that invalid data files are indeed deleted at the end
      • Stage: Delete invalid files generated during the write operation, collect at HoodieSparkEngineContext.java:73
      • Log: 21/08/06 19:39:01 INFO table.HoodieTable: Deleting invalid data files=[(/tmp/temp_marker_test_table/asia/india/chennai,/tmp/temp_marker_test_table/asia/india/chennai/5dcae584-1563-4f13-9ce4-955864a2b616-38_162-12-2694_20210806192244.parquet),...
    • Verifies that the number of records written to the table is the same as the input
      • val df2 = spark.read.parquet("/tmp/temp_marker_test_table/*/*/*/*.parquet")
      • df2.select("_hoodie_record_key").distinct.count outputs the same count (400M) as the input df
  • Configuring timeline-server-based markers on HDFS throws HoodieException, as expected:
    • Caused by: org.apache.hudi.exception.HoodieException: Timeline-server-based markers are not supported for HDFS: base path /tmp/temp_marker_test_table

More details regarding the Hadoop/Yarn cluster with HDFS testing:

  • Preparing data
val df = spark.read.parquet("/tmp/testdata/*/*.parquet")
  • Spark shell command to run direct marker file strategy
import org.apache.spark.sql.SaveMode._
spark.time(df.write.format("hudi").
  option("hoodie.bulkinsert.shuffle.parallelism", "200").
  option("hoodie.datasource.write.recordkey.field", "uuid").
  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
  option("hoodie.datasource.write.operation", "bulk_insert").
  option("hoodie.write.markers.type", "direct").
  option("hoodie.parquet.max.file.size", "1048576").
  option("hoodie.parquet.block.size", "1048576").
  option("hoodie.table.name", "temp_marker_test_table").
  mode(Overwrite).
  save("/tmp/temp_marker_test_table/")
)
  • Spark shell command to run timeline-server-based marker file strategy
spark.time(df.write.format("hudi").
  option("hoodie.bulkinsert.shuffle.parallelism", "200").
  option("hoodie.datasource.write.recordkey.field", "uuid").
  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
  option("hoodie.datasource.write.operation", "bulk_insert").
  option("hoodie.write.markers.type", "timeline_server_based").
  option("hoodie.markers.timeline_server_based.batch.num_threads", "20").
  option("hoodie.markers.timeline_server_based.batch.interval_ms", "20").
  option("hoodie.parquet.max.file.size", "1048576").
  option("hoodie.parquet.block.size", "1048576").
  option("hoodie.table.name", "temp_marker_test_table").
  mode(Overwrite).
  save("/tmp/temp_marker_test_table/")
)
  • Snapshot of the files storing markers toward the end of the job with the timeline-server-based marker file strategy (20 MARKERS files, one per worker thread, matching hoodie.markers.timeline_server_based.batch.num_threads=20)
hadoop fs -ls /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940
Found 20 items
251846 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS0
253029 2021-08-06 16:38 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS1
259197 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS10
252838 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS11
254775 2021-08-06 16:38 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS12
249139 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS13
278320 2021-08-06 16:38 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS14
266073 2021-08-06 16:38 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS15
252428 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS16
243916 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS17
260791 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS18
249626 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS19
252976 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS2
247380 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS3
253442 2021-08-06 16:38 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS4
248841 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS5
249906 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS6
246458 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS7
241654 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS8
243208 2021-08-06 16:39 /tmp/temp_marker_test_table/.hoodie/.temp/20210806161940/MARKERS9

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@hudi-bot

hudi-bot commented Jul 7, 2021

CI report:

  • 2d22335c215ed620ce20018b1c83be189b7c70c6 UNKNOWN
  • 230205edfab190cfaf687d0323ae8d704f425e1d UNKNOWN
  • 77d1886c7f6c612adbe4ef81adc07075cd99bdb2 UNKNOWN
  • bdef8e64bdf1033b18dd2adea0dcb696976ab37e UNKNOWN
  • 742c84c98600aae2d0837e9c3f29345b3f3c7729 UNKNOWN
  • 336df80 UNKNOWN
  • 8b05ab6 Azure: FAILURE
Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run travis re-run the last Travis build
  • @hudi-bot run azure re-run the last Azure build

@yihua yihua force-pushed the HUDI-1138-marker-files branch 2 times, most recently from 42d1e64 to de5c236 on July 7, 2021 06:14
Contributor Author

@yihua yihua left a comment

I'm still addressing the rest of the reviews.

@vinothchandar vinothchandar added this to Ready for Review in PR Tracker Board Jul 16, 2021
@yihua yihua force-pushed the HUDI-1138-marker-files branch 2 times, most recently from 7d4d2b5 to 02f953f on July 20, 2021 07:08
@codecov-commenter

codecov-commenter commented Jul 20, 2021

Codecov Report

Merging #3233 (230205e) into master (a086d25) will decrease coverage by 20.20%.
The diff coverage is 5.03%.


@@              Coverage Diff              @@
##             master    #3233       +/-   ##
=============================================
- Coverage     47.74%   27.53%   -20.21%     
+ Complexity     5591     1325     -4266     
=============================================
  Files           938      394      -544     
  Lines         41823    15659    -26164     
  Branches       4213     1384     -2829     
=============================================
- Hits          19968     4312    -15656     
+ Misses        20070    11023     -9047     
+ Partials       1785      324     -1461     
Flag                  Coverage Δ
hudicli               ?
hudiclient            21.07% <5.26%> (-13.49%) ⬇️
hudicommon            ?
hudiflink             ?
hudihadoopmr          ?
hudisparkdatasource   ?
hudisync              4.88% <ø> (-51.10%) ⬇️
huditimelineservice   ?
hudiutilities         59.78% <0.00%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
.../apache/hudi/client/AbstractHoodieWriteClient.java 0.00% <0.00%> (ø)
.../client/embedded/EmbeddedTimelineServerHelper.java 0.00% <0.00%> (ø)
.../hudi/client/embedded/EmbeddedTimelineService.java 0.00% <0.00%> (ø)
...java/org/apache/hudi/config/HoodieWriteConfig.java 0.00% <0.00%> (-43.38%) ⬇️
...ain/java/org/apache/hudi/io/HoodieWriteHandle.java 0.00% <0.00%> (ø)
...c/main/java/org/apache/hudi/table/HoodieTable.java 0.00% <0.00%> (ø)
...rg/apache/hudi/table/HoodieTimelineArchiveLog.java 0.00% <0.00%> (ø)
...le/action/rollback/BaseRollbackActionExecutor.java 0.00% <0.00%> (ø)
...rg/apache/hudi/table/marker/DirectMarkerFiles.java 0.00% <0.00%> (ø)
...java/org/apache/hudi/table/marker/MarkerFiles.java 0.00% <0.00%> (ø)
... and 622 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -423,7 +424,8 @@ protected void preWrite(String instantTime, WriteOperationType writeOperationTyp
protected void postCommit(HoodieTable<T, I, K, O> table, HoodieCommitMetadata metadata, String instantTime, Option<Map<String, String>> extraMetadata) {
try {
// Delete the marker directory for the instant.
new MarkerFiles(table, instantTime).quietDeleteMarkerDir(context, config.getMarkersDeleteParallelism());
MarkerFilesFactory.get(config.getMarkersIOMode(), table, instantTime)
Contributor

Something to remember: we might have to add an upgrade step for migration from the previous version to the new one. I guess it is as simple as cleaning up the old marker files and recreating the markers in the new format.

Contributor Author

Right. I think the upgrade/downgrade functionality can be added in a follow-up PR if we'd like to merge this soon. I'll work on that.

Member

Do we really have to? As long as we can automatically "detect" the marker versions, it's fine, right? Why migrate to this one (which involves one round of listing), just to do rollbacks?

@vinothchandar vinothchandar changed the title [WIP][HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency [HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency Jul 23, 2021
Member

@vinothchandar vinothchandar left a comment

High-level comments. The main thing is that the changes to TableFileSystemView need to be avoided. Also, we need to deduce the marker type dynamically, i.e., write a MARKERS.type file in the marker_dir as well and use it for rollback purposes. Actual writing uses the write configs.

Doing a second pass on the actual implementation/algorithms.

Member

@vinothchandar vinothchandar left a comment

Probably one round of refining and we can land

private final ScheduledExecutorService executorService;
// A cached copy of all markers in memory
// Mapping: {markerDirPath -> all markers}
private final Map<String, Set<String>> allMarkersMap = new HashMap<>();
Member

So we can handle multiple concurrent writers even? Which may all be writing to their own marker directory?

Contributor Author

Yes, that's the intention behind this map. Each writer should write to an independent marker directory, and the markers are maintained per marker directory, whose path contains the instant time.

Contributor Author

@yihua yihua Aug 7, 2021

Multiple concurrent writers haven't been tested against the timeline-server-based markers, though the design accounts for that case. I created a TODO to verify this later on: https://issues.apache.org/jira/projects/HUDI/issues/HUDI-2271

createMarkerFutures.removeAll(futuresToRemove);
}
LOG.info("Flush to MARKERS file .. ");
flushMarkersToFile(updatedMarkerDirPaths, markerFileIndex);
Member

If we fail here, how do we notify the futures of the failure?

Contributor Author

The futures time out in 30 seconds for now. We can optimize this later on.

Contributor Author

@yihua yihua left a comment

I'm working on adding the MARKERS.type and the switching logic between different types of markers, and addressing other style issues.

Contributor

@nsivabalan nsivabalan left a comment

I feel the logic in MarkerDirState and the associated code can be simplified. I see we have 4 synchronized blocks with 4 different variables, which makes the code more complex and hard to maintain in the long run. Let's sync up sometime today or tomorrow and see how to simplify it. Some of the code blocks don't need a sync block as such; for others, we can reorder some operations to see if we can club some sync blocks together.

PR Tracker Board automation moved this from Ready for Review to Nearing Landing Jul 31, 2021
@hudi-bot

hudi-bot commented Aug 8, 2021

CI report:

  • 2d22335c215ed620ce20018b1c83be189b7c70c6 UNKNOWN
  • 230205edfab190cfaf687d0323ae8d704f425e1d UNKNOWN
  • 77d1886c7f6c612adbe4ef81adc07075cd99bdb2 UNKNOWN
  • bdef8e64bdf1033b18dd2adea0dcb696976ab37e UNKNOWN
  • 742c84c98600aae2d0837e9c3f29345b3f3c7729 UNKNOWN
  • 627374242e3726e1206490dea1e15b78cde1be23 Azure: FAILURE Azure: PENDING

@hudi-bot

hudi-bot commented Aug 8, 2021

CI report:

  • 2d22335c215ed620ce20018b1c83be189b7c70c6 UNKNOWN
  • 230205edfab190cfaf687d0323ae8d704f425e1d UNKNOWN
  • 77d1886c7f6c612adbe4ef81adc07075cd99bdb2 UNKNOWN
  • bdef8e64bdf1033b18dd2adea0dcb696976ab37e UNKNOWN
  • 742c84c98600aae2d0837e9c3f29345b3f3c7729 UNKNOWN
  • Unknown: CANCELED
  • b7d7187 UNKNOWN

@nsivabalan
Contributor

@hudi-bot run azure

@hudi-bot

hudi-bot commented Aug 8, 2021

CI report:

  • 2d22335c215ed620ce20018b1c83be189b7c70c6 UNKNOWN
  • 230205edfab190cfaf687d0323ae8d704f425e1d UNKNOWN
  • 77d1886c7f6c612adbe4ef81adc07075cd99bdb2 UNKNOWN
  • bdef8e64bdf1033b18dd2adea0dcb696976ab37e UNKNOWN
  • 742c84c98600aae2d0837e9c3f29345b3f3c7729 UNKNOWN
  • 627374242e3726e1206490dea1e15b78cde1be23 Azure: FAILURE

@yihua
Contributor Author

yihua commented Aug 8, 2021

@hudi-bot run azure

Member

@vinothchandar vinothchandar left a comment

Logic looks good to me overall. Added comments on naming, synchronization and more corner cases to think about

} catch (IOException ioe) {
throw new HoodieIOException(ioe.getMessage(), ioe);
}
allMarkers.clear();
Member

Any synchronization needed? E.g., if there were concurrent adds, this will throw a ConcurrentModificationException. Just think through those scenarios?

Contributor Author

Based on the write process, for the same marker directory, there should not be any other requests when the deletion is happening, so we don't put a synchronization block here.

try {
fsDataOutputStream = fileSystem.create(markersFilePath);
bufferedWriter = new BufferedWriter(new OutputStreamWriter(fsDataOutputStream, StandardCharsets.UTF_8));
bufferedWriter.write(fileMarkersMap.get(markerFileIndex).toString());
Member

Regarding whether this works on HDFS: I think this write can fail midway, right? Then we wouldn't be able to recover this partial file on read? (Or will all the marker creations time out, causing new requests to be made and this marker file to be overwritten?)

Contributor Author

The failed overwrite on HDFS can be a problem. As discussed offline, we'll throw an exception for timeline-based markers on HDFS for now and fix the issue later.

@nsivabalan
Contributor

Looks good. Awesome job on the patch! Great contribution :)

@nsivabalan nsivabalan merged commit 4783176 into apache:master Aug 11, 2021
PR Tracker Board automation moved this from Nearing Landing to Done Aug 11, 2021
liujinhui1994 pushed a commit to liujinhui1994/hudi that referenced this pull request Aug 12, 2021
…ing marker-related latency (apache#3233)

- Can be enabled for cloud stores like S3. Not supported for hdfs yet, due to partial write failures.