
[HUDI-7099] Providing metrics for archive and defining some string constants #10101

Merged: 5 commits merged on Nov 20, 2023

Conversation

@majian1998 (Contributor) commented Nov 15, 2023

In the existing table service, HoodieMetrics registers the duration and other relevant information for compaction, clustering, and clean operations. However, there are no corresponding metrics for the archive operation. Therefore, we have implemented the necessary metrics for the archive operation.

Additionally, we have defined string constants as fields to extract the repeated string literals in HoodieMetrics.
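As an illustration of the refactor described above, here is a minimal sketch, assuming a simple `<action>.<metric>` naming convention; all class and constant names are illustrative, not the actual HoodieMetrics code:

```java
// Illustrative sketch only; the real HoodieMetrics class differs.
public class MetricsNames {
    // Constants extracted so each string literal appears exactly once.
    static final String TIMER_ACTION = "timer";
    static final String COUNTER_ACTION = "counter";
    static final String ARCHIVE_ACTION = "archive";
    static final String DURATION_STR = "duration";

    // Metric names follow an "<action>.<metric>" convention.
    static String getMetricsName(String action, String metric) {
        return action + "." + metric;
    }

    public static void main(String[] args) {
        System.out.println(getMetricsName(TIMER_ACTION, ARCHIVE_ACTION)); // timer.archive
        System.out.println(getMetricsName(ARCHIVE_ACTION, DURATION_STR)); // archive.duration
    }
}
```

With constants, a rename or typo fix happens in one place, and callers such as `getMetricsName(TIMER_ACTION, ARCHIVE_ACTION)` stay consistent across the class.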

Change Logs

Providing metrics for archive.

Impact

None

Risk level (write none, low, medium or high below)

None

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@majian1998 majian1998 changed the title [HUDI-7099] Providing metrics for archive and defining som string constants [HUDI-7099] Providing metrics for archive and defining some string constants Nov 15, 2023
@majian1998 majian1998 closed this Nov 15, 2023
@majian1998 majian1998 reopened this Nov 15, 2023
@nsivabalan nsivabalan self-assigned this Nov 15, 2023
@nsivabalan nsivabalan added the priority:critical production down; pipelines stalled; Need help asap. label Nov 15, 2023
@stream2000 (Contributor) left a comment:

Overall looks good, left some minor comments.

this.logCompactionTimerName = getMetricsName(TIMER_ACTION, HoodieTimeline.LOG_COMPACTION_ACTION);
this.indexTimerName = getMetricsName(TIMER_ACTION, INDEX_ACTION);
this.conflictResolutionTimerName = getMetricsName(TIMER_ACTION, CONFLICT_RESOLUTION_STR);
this.conflictResolutionSuccessCounterName = getMetricsName(COUNTER_ACTION, CONFLICT_RESOLUTION_STR + ".success");
Contributor:

NIT:
CONFLICT_RESOLUTION_STR + ".success" -> CONFLICT_RESOLUTION_SUCCESS_STR
CONFLICT_RESOLUTION_STR + ".failure" -> CONFLICT_RESOLUTION_FAILURE_STR
.requested -> HoodieTimeline.REQUESTED_EXTENSION
.completed -> HoodieTimeline.COMPLETED_EXTENSION

}
}

public void updateArchiveMetrics(long durationInMs, int numFilesDeleted) {
Contributor:

numFilesDeleted -> numInstantsArchived?

-metrics.registerGauge(getMetricsName("finalize", "duration"), durationInMs);
-metrics.registerGauge(getMetricsName("finalize", "numFilesFinalized"), numFilesFinalized);
+metrics.registerGauge(getMetricsName(FINALIZE_ACTION, DURATION_STR), durationInMs);
+metrics.registerGauge(getMetricsName(FINALIZE_ACTION, FINALIZED_FILES_NUM_STR), numFilesFinalized);
}
}

public void updateIndexMetrics(final String action, final long durationInMs) {
if (config.isMetricsOn()) {
LOG.info(String.format("Sending index metrics (%s.duration, %d)", action, durationInMs));
Contributor:

We can also update the string literal in the log here

@@ -255,48 +277,57 @@ private void updateCommitTimingMetrics(long commitEpochTimeInMs, long durationIn
Pair<Option<Long>, Option<Long>> eventTimePairMinMax = metadata.getMinAndMaxEventTime();
if (eventTimePairMinMax.getLeft().isPresent()) {
long commitLatencyInMs = commitEpochTimeInMs + durationInMs - eventTimePairMinMax.getLeft().get();
-metrics.registerGauge(getMetricsName(actionType, "commitLatencyInMs"), commitLatencyInMs);
+metrics.registerGauge(getMetricsName(actionType, COMMIT_LATENCY_STR), commitLatencyInMs);
Contributor:

COMMIT_LATENCY_STR -> COMMIT_LATENCY_IN_MS_STR

@@ -117,6 +122,10 @@ public boolean archiveIfRequired(HoodieEngineContext context, boolean acquireLoc
} else {
LOG.info("No Instants to archive");
}
if (success && timerContext != null) {
long durationMs = metrics.getDurationInMs(timerContext.stop());
Contributor:

Can we move the metrics handling to the write client or the service client? The cleaning and rollback already follow this pattern.

Contributor Author:

This does indeed make the style more consistent, but there is one issue: the archive action does not return any metadata, so we are unable to obtain information about the archived instants. Do we need to modify the return value of archiveIfRequired to support this improvement?

Contributor:

Yeah, need to think through what the return type should look like.

@majian1998 (Contributor Author):

Now the archive metrics have been moved to BaseHoodieTableServiceClient, and the return value of archiveIfRequired has been modified. In the previous implementation, the success variable always seemed to be true, so I have also removed the relevant checks in the unit tests. cc @danny0405
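A rough sketch of the pattern described here, with all names assumed (this is not the actual BaseHoodieTableServiceClient code): the caller times the archive call and reports metrics from the returned count of archived instants, mirroring how clean and rollback are handled:

```java
// Hypothetical sketch; method and metric names are assumptions, not Hudi's API.
public class ArchiveCaller {
    // Stand-in for archiveIfRequired(...), which now returns a count
    // of archived instants instead of a boolean.
    static int archiveIfRequired() {
        return 3; // pretend 3 instants were archived
    }

    // In the real code this would register gauges via HoodieMetrics;
    // here we just render the would-be gauge updates as a string.
    static String reportArchiveMetrics(long durationMs, int instantsArchived) {
        return "archive.duration=" + durationMs
             + ", archive.numInstantsArchived=" + instantsArchived;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int archived = archiveIfRequired();
        long durationMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println(reportArchiveMetrics(durationMs, archived));
    }
}
```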

@@ -256,8 +256,8 @@ public void testArchiveEmptyTable() throws Exception {
metaClient = HoodieTableMetaClient.reload(metaClient);
HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient);
HoodieTimelineArchiver archiver = new HoodieTimelineArchiver(cfg, table);
-boolean result = archiver.archiveIfRequired(context);
-assertTrue(result);
+int result = archiver.archiveIfRequired(context);
Contributor:

We can add some tests for archive metrics like what we did for compaction metrics in #8759.

// triggers compaction and cleaning only after archiving action
this.timelineWriter.compactAndClean(context);
} else {
LOG.info("No Instants to archive");
}
-return success;
+return instantsToArchive.size();
Contributor:

Should we also include the success flag as a metric?

Contributor Author:

In the current archive code, it seems that success is always set to true. The success variable is initialized as true, and deleteArchivedInstants always returns true unless it fails. However, if there is a failure, the current implementation throws an exception and terminates without returning any value. Therefore, I believe the success variable is meaningless in the current context. Alternatively, should we catch these exceptions and return false? I'm not sure whether that would be reasonable.

@danny0405 (Contributor) commented Nov 17, 2023:

We can address it in another PR; we kind of do not have atomicity for the current 2 steps:

  • flush of the archived timeline
  • deletion of active metadata files

My rough thought is that we can fix the leftover active metadata files (those that should have been deleted from the active timeline) in the next round of archiving. Imagine the latest instant time in the archived timeline is t10 and the oldest instant in the active timeline is t7; we should retry the deletion of the instant metadata files from t7 ~ t10 at the very beginning.
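To make the retry idea concrete, a toy sketch (instant times are modeled as plain ints; purely illustrative, not Hudi code): with the archived timeline already at t10 and the active timeline still starting at t7, instants t7 through t10 are the leftovers whose metadata files should be re-deleted on the next archive run:

```java
import java.util.ArrayList;
import java.util.List;

public class LeftoverInstants {
    // Instants in [oldestActive, latestArchived] exist in both timelines,
    // meaning their active metadata files were never cleaned up.
    static List<Integer> leftovers(int latestArchived, int oldestActive) {
        List<Integer> out = new ArrayList<>();
        for (int t = oldestActive; t <= latestArchived; t++) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // Archived timeline reaches t10, active timeline starts at t7.
        System.out.println(leftovers(10, 7)); // [7, 8, 9, 10]
    }
}
```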

@hudi-bot:

CI report:

Bot commands supported by @hudi-bot:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405 danny0405 merged commit 979132b into apache:master Nov 20, 2023
34 checks passed
jonvex pushed a commit to jonvex/hudi that referenced this pull request Nov 29, 2023
Labels
priority:critical production down; pipelines stalled; Need help asap. release-0.14.1
Projects
Status: ✅ Done

5 participants