Skip to content

HIVE-28975: [HiveAcidReplication] Remove dangling txns from Target side post incremental replication #5874

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 25, 2025

Conversation

harshal-16
Copy link
Contributor

@harshal-16 harshal-16 commented Jun 16, 2025

Details:

  • Currently, if for some reason on the source side, there is any entry missing in notification_log / txn_write_notification_log, then it may lead to a dangling open transaction on the Target side post replication
  • Repl created an Open transaction can lead to a repl_incompatible error once the ACIDHouseKeeper thread kills the dangling transaction
  • Instead of that, we can proactively clean the dangling transactions on the Target side, preventing replication failure
  • It is a proactive measure to protect replication against future bugs similar to HIVE-27797

Testing:

  • Tested on Cluster
  • Added test case

What changes were proposed in this pull request?

  • Currently, if for some reason on the source side, there is any entry missing in notification_log / txn_write_notification_log, then it may lead to a dangling open transaction on the Target side post replication
  • Repl created an Open transaction can lead to a repl_incompatible error once the ACIDHouseKeeper thread kills the dangling transaction
  • Instead of that, we can proactively clean the dangling transactions on the Target side, preventing replication failure
  • It is a proactive measure to protect replication against future bugs similar to HIVE-27797

Why are the changes needed?

  • This is an additional guardrail, which prevents the replication from becoming incompatible in case of any dangling transaction

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Tested on actual cluster
  • Added test case

…de post incremental replication

Details:
	* Currently, if for some reason on the source side if there is any entry missing in notification_log / txn_write_notification_log, then it may lead to dangling open transaction on Target side post replication
	* Repl created Open transaction can lead to repl_incompatible error once ACIDHouseKeeper thread kills the dangling transaction
	* Instead of that we can proactively clean the dangling transactions on the Target side preventing replication failure
	* It is proactive measure to protect replication against future bugs similar to HIVE-27797

Testing:
	* Tested on Cluster
	* Added test case
Copy link

@pudidic
Copy link
Contributor

pudidic commented Jun 24, 2025

+1. LGTM.

@pudidic pudidic merged commit 90b48ff into apache:master Jun 25, 2025
4 checks passed
@deniskuzZ
Copy link
Member

deniskuzZ commented Jun 25, 2025

@harshal-16
Copy link
Contributor Author

I did not check that, I will check it

@deniskuzZ
Copy link
Member

deniskuzZ commented Jun 25, 2025

why there are so many gen classes updated when you introduced just 1 method? What version of protoc-gen-grpc-java have you used?

@deniskuzZ
Copy link
Member

deniskuzZ commented Jun 25, 2025

I did not check that, I will check it

@harshal-16 then why did you merge. Guys that is a bad practice!! Just check the Hive code report: https://sonarcloud.io/project/issues?impactSeverities=BLOCKER&issueStatuses=OPEN%2CCONFIRMED&id=apache_hive
And the list of issues keeps getting longer

@ayushtkn
Copy link
Member

Folks, I think the thrift code is generated with some wrong version of thrift. We are on the urge of cutting branch for the 4.1.0 release

Lets revert it for now & we can rework it

@harshal-16
Copy link
Contributor Author

@ayushtkn Where can I find this detail? I checked the generated files again, and it also says * Autogenerated by Thrift Compiler (0.16.0) as earlier
If it's blocking the release, then I don't mind reverting it and merging it later

@zhangbutao
Copy link
Contributor

I'll be performing the branch-cut operation in about two hours.
Regarding this PR, should we include it in the 4.1.0 branch?

If we haven’t reached an agreement yet, we can revert this PR first.

Thanks.

@deniskuzZ
Copy link
Member

deniskuzZ commented Jun 25, 2025

Did the thrift gen validation.

I am getting just 1 java file modified (I am not sure why we need rest: cpp, php, py and rb) - ThriftHiveMetastore.java! Had to move the ReplayedTxnsForPolicyResult definition to the end of the list

brew tap homebrew/core --force
brew tap-new $USER/local-tap
brew extract --version='0.16.0' thrift $USER/local-tap
brew install rbenv/tap/openssl@1.1
brew install thrift@0.16.0
wget https://raw.githubusercontent.com/apache/thrift/refs/tags/v0.16.0/contrib/fb303/if/fb303.thrift
mkdir -p $TH/share/fb303/if
cp fb303.thrift $TH/share/fb303/if/
mvn clean install -Pthriftif -DskipTests -Dthrift.home=$TH

Diff

Subject: [PATCH] refactor
---
Index: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift b/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
--- a/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift	(revision fac17feed39473c4219f3b3a2c63aeed4fc92701)
+++ b/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift	(date 1750851518454)
@@ -1446,7 +1446,6 @@
     22: optional i64 txnId,
     23: optional i64 commitTime,
     24: optional i64 hightestWriteId
-
 }
 
 struct ShowCompactResponse {
@@ -1504,7 +1503,6 @@
     6: optional string partitionname
 }
 
-
 struct NotificationEventRequest {
     1: required i64 lastEvent,
     2: optional i32 maxEvents,
@@ -1622,7 +1620,6 @@
   ORC_SARG = 1
 }
 
-
 // Request type for get_file_metadata_by_expr
 struct GetFileMetadataByExprRequest {
   1: required list<i64> fileIds,
@@ -2531,6 +2528,10 @@
   7: optional bool tableLevel = false
 }
 
+struct ReplayedTxnsForPolicyResult {
+  1: map<string, string> replTxnMapEntry
+}
+
 // Exceptions.
 
 exception MetaException {
@@ -3162,6 +3163,7 @@
   void abort_txns(1:AbortTxnsRequest rqst) throws (1:NoSuchTxnException o1)
   void commit_txn(1:CommitTxnRequest rqst) throws (1:NoSuchTxnException o1, 2:TxnAbortedException o2)
   i64 get_latest_txnid_in_conflict(1:i64 txnId) throws (1:MetaException o1)
+  ReplayedTxnsForPolicyResult get_replayed_txns_for_policy(1: string policyName) throws (1: MetaException o1)
   void repl_tbl_writeid_state(1: ReplTblWriteIdStateRequest rqst)
   GetValidWriteIdsResponse get_valid_write_ids(1:GetValidWriteIdsRequest rqst)
       throws (1:NoSuchTxnException o1, 2:MetaException o2)

@harshal-16, @ayushtkn, @zhangbutao my vote is to revert, update and resubmit.

deniskuzZ added a commit that referenced this pull request Jun 25, 2025
…arget side post incremental replication (#5874)"

This reverts commit 90b48ff.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants