[HUDI-7099] Providing metrics for archive and defining some string constants #10101
Conversation
Overall looks good, left some minor comments.
this.logCompactionTimerName = getMetricsName(TIMER_ACTION, HoodieTimeline.LOG_COMPACTION_ACTION);
this.indexTimerName = getMetricsName(TIMER_ACTION, INDEX_ACTION);
this.conflictResolutionTimerName = getMetricsName(TIMER_ACTION, CONFLICT_RESOLUTION_STR);
this.conflictResolutionSuccessCounterName = getMetricsName(COUNTER_ACTION, CONFLICT_RESOLUTION_STR + ".success");
NIT:
CONFLICT_RESOLUTION_STR + ".success" -> CONFLICT_RESOLUTION_SUCCESS_STR
CONFLICT_RESOLUTION_STR + ".failure" -> CONFLICT_RESOLUTION_FAILURE_STR
.requested -> HoodieTimeline.REQUESTED_EXTENSION
.completed -> HoodieTimeline.COMPLETED_EXTENSION
}
}

public void updateArchiveMetrics(long durationInMs, int numFilesDeleted) {
numFilesDeleted -> numInstantsArchived?
-    metrics.registerGauge(getMetricsName("finalize", "duration"), durationInMs);
-    metrics.registerGauge(getMetricsName("finalize", "numFilesFinalized"), numFilesFinalized);
+    metrics.registerGauge(getMetricsName(FINALIZE_ACTION, DURATION_STR), durationInMs);
+    metrics.registerGauge(getMetricsName(FINALIZE_ACTION, FINALIZED_FILES_NUM_STR), numFilesFinalized);
}
}

public void updateIndexMetrics(final String action, final long durationInMs) {
if (config.isMetricsOn()) {
LOG.info(String.format("Sending index metrics (%s.duration, %d)", action, durationInMs));
We can also update the string literal in the log message here.
@@ -255,48 +277,57 @@ private void updateCommitTimingMetrics(long commitEpochTimeInMs, long durationIn
Pair<Option<Long>, Option<Long>> eventTimePairMinMax = metadata.getMinAndMaxEventTime();
if (eventTimePairMinMax.getLeft().isPresent()) {
long commitLatencyInMs = commitEpochTimeInMs + durationInMs - eventTimePairMinMax.getLeft().get();
-      metrics.registerGauge(getMetricsName(actionType, "commitLatencyInMs"), commitLatencyInMs);
+      metrics.registerGauge(getMetricsName(actionType, COMMIT_LATENCY_STR), commitLatencyInMs);
COMMIT_LATENCY_STR -> COMMIT_LATENCY_IN_MS_STR
@@ -117,6 +122,10 @@ public boolean archiveIfRequired(HoodieEngineContext context, boolean acquireLoc
} else {
LOG.info("No Instants to archive");
}
+    if (success && timerContext != null) {
+      long durationMs = metrics.getDurationInMs(timerContext.stop());
Can we move the metrics handling to the write client or the service client? The cleaning and rollback already follow this pattern.
This does indeed make the style more consistent, but there is one issue: the archive action does not return any metadata, so we are unable to obtain information about the archived instants. Do we need to modify the return value of archiveIfRequired to support this improvement?
Yeah, need to think through what the return type should look like.
Force-pushed from 1dddd78 to 3a4f9e9
Now, the archive metrics have been moved to
@@ -256,8 +256,8 @@ public void testArchiveEmptyTable() throws Exception {
metaClient = HoodieTableMetaClient.reload(metaClient);
HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient);
HoodieTimelineArchiver archiver = new HoodieTimelineArchiver(cfg, table);
-    boolean result = archiver.archiveIfRequired(context);
-    assertTrue(result);
+    int result = archiver.archiveIfRequired(context);
We can add some tests for archive metrics like what we did for compaction metrics in #8759.
// triggers compaction and cleaning only after archiving action
this.timelineWriter.compactAndClean(context);
} else {
LOG.info("No Instants to archive");
}
-    return success;
+    return instantsToArchive.size();
Should we also include the success flag as a metric?
In the current archive code, it seems that success is always set to true. The success variable is initialized as true, and deleteArchivedInstants always returns true unless it fails. However, if there is a failure, the current implementation should throw an exception and terminate without returning any value. Therefore, I believe the success variable is meaningless in the current context. Alternatively, should we catch these exceptions and return false? I'm not sure if this would be reasonable.
We can address it in another PR; we kind of do not have atomicity for the current 2 steps:
- flush of archived timeline
- deletion of active metadata files
My rough thought is we can fix the leftover active metadata files (which should have been deleted from the active timeline) in the next round of archiving: imagine the latest instant time in the archived timeline is t10 and the oldest instant in the active timeline is t7; we should retry the deletion of instant metadata files from t7 ~ t10 at the very beginning.
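That leftover-cleanup idea could be sketched roughly as follows (illustrative only; the class, method, and variable names here are hypothetical, and instant times are modeled as plain integers rather than Hudi instant timestamps):

```java
// Illustrative sketch of the leftover-cleanup idea: before the next round
// of archiving, re-attempt deletion of active metadata files whose instant
// times are already covered by the archived timeline. All identifiers are
// hypothetical stand-ins for Hudi's timeline types.
import java.util.ArrayList;
import java.util.List;

public class LeftoverCleanupSketch {
    // Returns the active instants that are <= the archived timeline's
    // latest instant, i.e. the leftovers whose deletion should be retried.
    static List<Integer> leftoversToDelete(int archivedLatest, List<Integer> activeInstants) {
        List<Integer> toDelete = new ArrayList<>();
        for (int t : activeInstants) {
            if (t <= archivedLatest) { // already in the archived timeline
                toDelete.add(t);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // Archived timeline latest = t10, active timeline starts at t7:
        // t7..t10 are leftovers from a previously interrupted archive.
        List<Integer> active = List.of(7, 8, 9, 10, 11, 12);
        System.out.println(leftoversToDelete(10, active));
    }
}
```

With this scheme the two non-atomic steps stay as they are, and a crash between them is repaired lazily at the start of the next archive run.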
Force-pushed from 3a4f9e9 to 29b2f98
In the existing table service, HoodieMetrics registers the duration and other relevant information for compaction, clustering, and clean operations. However, there are no corresponding metrics for the archive operation. Therefore, we have implemented the necessary metrics for the archive operation. Additionally, we have defined string constants as fields to extract the string literals in HoodieMetrics.
Change Logs
Providing metrics for archive.
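A minimal sketch of the gauge-style registration this PR describes, with a plain map standing in for Hudi's Metrics registry (the constant values and the numInstantsArchived name are assumptions drawn from the review discussion, not Hudi's actual definitions):

```java
// Minimal sketch of archive metrics registration. A LinkedHashMap stands
// in for Hudi's Metrics class; ARCHIVE_ACTION and DURATION_STR follow the
// constant-naming convention in the diff, but their values are assumed.
import java.util.LinkedHashMap;
import java.util.Map;

public class ArchiveMetricsExample {
    static final String ARCHIVE_ACTION = "archive";
    static final String DURATION_STR = "duration";

    final Map<String, Long> gauges = new LinkedHashMap<>();

    String getMetricsName(String action, String metric) {
        return action + "." + metric;
    }

    void updateArchiveMetrics(long durationInMs, int numInstantsArchived) {
        gauges.put(getMetricsName(ARCHIVE_ACTION, DURATION_STR), durationInMs);
        gauges.put(getMetricsName(ARCHIVE_ACTION, "numInstantsArchived"), (long) numInstantsArchived);
    }

    public static void main(String[] args) {
        ArchiveMetricsExample m = new ArchiveMetricsExample();
        m.updateArchiveMetrics(125L, 4);
        System.out.println(m.gauges);
    }
}
```

The caller would time archiveIfRequired and pass the elapsed milliseconds plus the returned instant count into this hook, mirroring the clean and rollback metrics pattern mentioned in the review.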
Impact
None
Risk level (write none, low medium or high below)
None
Documentation Update
None
Contributor's checklist