IGNITE-18550 Add idle_verify check of incremental snapshots #10617
| } | ||
| }); | ||
|
|
||
| // All active transactions that didn't log COMMITTED or ROLL_BACK records are considered committed. |
Can transactions without a WAL record be rolled back during node recovery?
For example, if a node fails after the incremental snapshot is created but before the TX_RECORD is logged.
...nite/internal/processors/cache/persistence/snapshot/IncrementalSnapshotVerificationTask.java
| /** Holder for partition hashes. */ | ||
| private static class PartitionHashHolder { | ||
| /** */ | ||
| private int hash; |
Looks like these fields should be long
I made it an int because PartitionHashRecordV2 also stores its hashes as int: partHash, partVerHash.
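To illustrate the int-versus-long question, here is a minimal sketch of such a holder. It is not the PR's code: the class name `PartitionHashHolderSketch`, the accessor methods, and the assumption that `increment` simply adds the two hashes are all hypothetical. The only point it makes is that int accumulation wraps on overflow deterministically, so two holders fed the same entries in any order still compare equal.

```java
/** Hypothetical stand-in for the holder under review; not the PR's actual class. */
final class PartitionHashHolderSketch {
    /** Accumulated hash of entry values (assumed to be additive). */
    private int hash;

    /** Accumulated hash of entry versions (assumed to be additive). */
    private int verHash;

    /** Adds one entry's hashes; int addition wraps silently on overflow. */
    void increment(int valHash, int entryVerHash) {
        hash += valHash;
        verHash += entryVerHash;
    }

    int hash() {
        return hash;
    }

    int verHash() {
        return verHash;
    }

    public static void main(String[] args) {
        int[] vals = {Integer.MAX_VALUE, 17, -5};

        PartitionHashHolderSketch forward = new PartitionHashHolderSketch();
        PartitionHashHolderSketch backward = new PartitionHashHolderSketch();

        // Feed the same values in opposite orders.
        for (int v : vals)
            forward.increment(v, v);

        for (int i = vals.length - 1; i >= 0; i--)
            backward.increment(vals[i], vals[i]);

        // Wrapping addition is commutative, so both sums match.
        System.out.println(forward.hash() == backward.hash());
    }
}
```

If the real holder combines hashes differently, the overflow argument would need to be rechecked for that operation.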
| } | ||
| }; | ||
|
|
||
| short locShortId = blt.consistentIdMapping().get(consId); |
locNodeId?
| Map<PartitionKeyV2, PartitionHashHolder> partMap = new HashMap<>(); | ||
| List<Exception> exceptions = new ArrayList<>(); | ||
|
|
||
| BiConsumer<GridCacheVersion, Set<Short>> calcTransactionHash = (xid, partNodes) -> { |
calcTxHash?
| List<Exception> exceptions = new ArrayList<>(); | ||
|
|
||
| BiConsumer<GridCacheVersion, Set<Short>> calcTransactionHash = (xid, partNodes) -> { | ||
| for (short shortId: partNodes) { |
nodeId?
| Map<GridCacheVersion, Set<Short>> txPrimPartNodes = new HashMap<>(); | ||
| Map<Short, PartitionHashHolder> nodesTxHash = new HashMap<>(); | ||
|
|
||
| Set<GridCacheVersion> partialCommittedTxs = new HashSet<>(); |
partiallyCommittedTxs here and in all other places?
| Map<Short, PartitionHashHolder> nodesTxHash = new HashMap<>(); | ||
|
|
||
| Set<GridCacheVersion> partialCommittedTxs = new HashSet<>(); | ||
| Map<PartitionKeyV2, PartitionHashHolder> partMap = new HashMap<>(); |
Please add a comment noting that the hashes in this map are calculated from WAL records, not from part-X.bin data, so they will differ from the hashes produced by idle_verify.
| hash.increment(valHash, verHash); | ||
| } | ||
| catch (IgniteCheckedException ex) { | ||
| exceptions.add(ex); |
Do we really want to continue the task after an error?
It looks like we can simply skip all further processing, like this:
proc.process(dataEntry -> {
    if (dataEntry.op() == GridCacheOperation.READ || !exceptions.isEmpty())
        return;
    ...
}, txRec -> {
    if (!exceptions.isEmpty())
        return;
    if (log.isDebugEnabled())
        log.debug("Checking tx record [txRec=" + txRec + ']');
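The skip-on-error idea can be tried in isolation with a small sketch. Everything here is hypothetical and only imitates the shape of the suggestion: the class `SkipAfterFirstErrorSketch`, the `String` entries, and the rule that an empty entry counts as a failure are assumptions, and nothing below uses the PR's `proc.process` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration of skipping work once an error has been recorded. */
final class SkipAfterFirstErrorSketch {
    /**
     * Processes entries until the first failure.
     *
     * @param entries Entries to process; an empty string simulates a failing entry.
     * @param exceptions Collector for failures, shared with the caller.
     * @return Number of entries that were actually processed.
     */
    static int run(List<String> entries, List<Exception> exceptions) {
        int[] processed = {0};

        Consumer<String> handler = entry -> {
            // Once any error is recorded, every later entry is skipped.
            if (!exceptions.isEmpty())
                return;

            try {
                if (entry.isEmpty())
                    throw new IllegalStateException("Failed to process entry");

                processed[0]++;
            }
            catch (IllegalStateException ex) {
                exceptions.add(ex);
            }
        };

        entries.forEach(handler);

        return processed[0];
    }

    public static void main(String[] args) {
        List<Exception> exceptions = new ArrayList<>();

        int processed = run(Arrays.asList("a", "", "b", "c"), exceptions);

        System.out.println("processed=" + processed + ", errors=" + exceptions.size());
    }
}
```

The iteration itself still visits every entry; only the per-entry work is skipped, which matches the guard-at-the-top shape proposed in the comment.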
| /** Transaction hashes collection. */ | ||
| private Map<Object, TransactionsHashRecord> txHashRes; | ||
|
|
||
| /** Partition hashes collection. It is calculated on data entries included into only incremental part of snapshot. */ |
| /** Partition hashes collection. It is calculated on data entries included into only incremental part of snapshot. */ | |
| /** | |
| * Partition hashes collection. | |
| * Hash of data entries({@link DataEntry}) from WAL segments included in incremental snapshot. | |
| */ |
| null) | ||
| )); | ||
|
|
||
| return new IncrementalSnapshotVerificationTaskResult( |
Let's log the verification results locally here at info level,
including the number of data entries (proc.applied) and transactions.
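A rough sketch of what such a summary line could look like follows. It uses `java.util.logging` as a stand-in for the project's own logger, and the class name, method names, and message wording are assumptions made for illustration, not the text that ended up in the PR.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of an info-level verification summary. */
final class VerificationSummarySketch {
    /** Stand-in logger; the real task would use its own logger instance. */
    private static final Logger LOG = Logger.getLogger(VerificationSummarySketch.class.getName());

    /** Builds the summary message from the two counters named in the review comment. */
    static String summary(long dataEntries, int txs) {
        return "Incremental snapshot verification finished [dataEntries=" + dataEntries +
            ", txs=" + txs + ']';
    }

    /** Logs the summary locally, only if info level is enabled. */
    static void logSummary(long dataEntries, int txs) {
        if (LOG.isLoggable(Level.INFO))
            LOG.info(summary(dataEntries, txs));
    }

    public static void main(String[] args) {
        logSummary(10L, 3);
    }
}
```

Keeping the message construction in a separate method makes the format easy to check in a test without capturing log output.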
addbcee to
91ccb4e
SonarCloud Quality Gate failed.
Thank you for submitting the pull request to Apache Ignite.
In order to streamline the review of the contribution,
we ask you to ensure the following steps have been taken:
The Contribution Checklist
The description explains WHAT and WHY was made instead of HOW.
The following pattern must be used: IGNITE-XXXX Change summary, where XXXX is the number of the JIRA issue.
(see the Maintainers list)
the green visa attached to the JIRA ticket (see TC.Bot: Check PR)
Notes
If you need any help, please email dev@ignite.apache.org or ask any advice on http://asf.slack.com #ignite channel.