-
Notifications
You must be signed in to change notification settings - Fork 1.9k
IGNITE-14999v3 : Support dynamic restoration of encrypted snapshots. #9313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
|
|
||
| /** | ||
| * @return Chipered encryption key for this cache or cache group. {@code Null} if not encrypted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below, replace "Chipered" with "Ciphered"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| } | ||
| } | ||
|
|
||
| this.encrGrpIds.addAll(encrGrpIds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You guard initialization of encGrpIds with this, then it's assumed that code runs in concurrent env. But the map is an ordinary HashMap, and it's allowed to being modified (addAll) in separate thread without sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In previous comment you said that it's possible for encrGrpIds be nulled after encrGrpIds != null. So we need a guard for this case there too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Once encrGrpIds gets assigned, it never takes null. User cannot null it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no encrypted groups ids reside in SnapshotMeta any more. inactual
| * @return {@code True} if cache group is encrypted. {@code False} otherwise. | ||
| */ | ||
| public boolean isCacheGroupEncrypted(int grpId) { | ||
| Set<Integer> encrGrpIds = this.encrGrpIds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a purpose of this local var?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to avoid NPE if class member has been nulled after encrGrpIds != null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the reference encrGrpIds is not synced: reference isn't volatile, method doesn't acquire a lock. So this reference isn't observable for other threads within this method, so this var doesn't make sense.
Make var volatile, or use lock there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. Local variable which is not affected by other thred. It guaranties the reference won't change during several reads. Please check #9377 , EncryptedSnapshotTest.testNPE(). Try uncomment // Set<Integer> encrGrpIds = this.encrGrpIds; in SnapshotMetadata.isCacheGroupEncrypted()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is common technique to avoid NPE to me, introducing local varaible. But in our case, as mentioned, once encrGrpIds is assigned, it won't set to null. We can check it once. But for general purpose I wrote it correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've initialized encrGrpIds with the DCL pattern (that requires the object to be declared with volatile keyword, and you missed it). It means that encrGrpIds is assigned from concurrent context, but you just copy thread local var (that is not observable from other threads and may can't see that encrGrpIds is initialized) to local var.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. This brings thread-safety. The test shows it. Makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No encrypted groups ids reside in SnapshotMeta any more. Inactual.
| private final List<Integer> grpIds; | ||
|
|
||
| /** Encrypted group ids. */ | ||
| private Set<Integer> encrGrpIds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be marked with GridToStringInclude? The same question for the masterKeyDigest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| for (SnapshotMetadata meta : nodeMetas) | ||
| for (SnapshotMetadata meta : nodeMetas) { | ||
| if (meta.masterKeyDigest() != null && !Arrays.equals(meta.masterKeyDigest(), | ||
| kctx0.config().getEncryptionSpi().masterKeyDigest())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getEncryptionSpi().masterKeyDigest() performs KeystoreEncryptionSpi#makeDigest operation on every iteration for this loop. Let's make a var for it before this loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to get NPE for kctx0.config().getEncryptionSpi().masterKeyDigest() for config with disabled encryption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"performs KeystoreEncryptionSpi#makeDigest operation on every" But verifying snapshot is slow operation which involves many disk reading, page reading one by one from the partition files. Is it worth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"It's possible to get NPE for kctx0.config().getEncryptionSpi().masterKeyDigest() for config with disabled encryption." Impossible. Toy can find if (cfg.getEncryptionSpi() == null) cfg.setEncryptionSpi(new NoopEncryptionSpi()); in IgniteEx.initializeDefaultSpi() and in EncryptedSnapshotTest.testValidatingSnapshotFailsWithNoEncryption
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth?
But masterKeyDigest() is method of public interface EncryptionSpi that can be overloaded with Ignite user, you never knows what is going under the hood. It can goes by to external storage over unreliable network. Let's prevent this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| "prior to snapshot operation start: " + leftNodes)); | ||
| } | ||
|
|
||
| if (!cctx.localNode().isClient() && cctx.kernalContext().encryption().isMasterKeyChangeInProgress() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the construction A && B || C the C operation will be invoked despite of whether the A is true of false. So it may lead to NPE, as cctx is null for client nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If cctx is null for client, we fail at start: if (cctx.kernalContext().clientNode(). Also snapshots is disables for client nodes because clisents do not have data and have nothing to snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we fail at start
Yes, so, do we need an explicit check for client, if it always pass or raise NPE? Maybe replace it with assert cctx != null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. We do not affect snapshot-on-client behavior in this ticket. It has been working well till now. Shapshots-on-client hasn't raised NPE before the ticket. I believe, because this DistributedProcess doesn't work on client regardless encrypted or not. There is cctx.kernalContext().clientNode() before. It hasn't been asserted earlier. I don't think we should care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. GridCacheContext is null for client node in GridQueryProcessor only.
So, what about to fix boolean logic then - A && (B || C)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
# Conflicts: # modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java # modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotPartitionsVerifyTask.java # modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreBaseTest.java # modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java # modules/core/src/test/java/org/apache/ignite/testsuites/IgniteBasicWithPersistenceTestSuite.java
| "The previous change was not completed.")); | ||
| } | ||
|
|
||
| if (ctx.cache().context().snapshotMgr().isSnapshotCreating() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if snapshot create/restore started after this check and before the next line masterKeyChangeRequest = req;? Do we need a lock (instead of volatile var) that syncs changing of master key and snapshot process?
I see it like this:
- SnapshotManager tries to acquire the lock on master key, then wait for lock if other process own it.
- MasterKeyChangeProcess tries to acquire lock, and if failed then return error to user (without wait for lock).
WDYT? Is the sync of those processes are already guaranteed with different approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Master key change and snapshots are distributed processes based on discovery messages which are guaranteed be sequential. Once one process is active, another process will see this state. There is no pure concurrency. Master key change isn't allowed during snapshot. Snapshot is not allowed during master key change and re-encryption. You can check testSnapshotFailsDuringCacheKeyChange(), testSnapshotFailsDuringMasterKeyChange(), testMasterKeyChangeDuringSnapshot(), testMasterKeyChangeDuringRestore(), testReencryptionDuringRestore(), testReencryptionDuringSnapshot()
|
|
||
| GridPlainClosure2<Collection<byte[]>, byte[], IgniteInternalFuture<Boolean>> startCacheClsr = | ||
| (grpKeys, masterKeyDigest) -> { | ||
| List<DynamicCacheChangeRequest> srvReqs = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please revert tabulation of this block? It's pretty hard to understand what is exactly changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we follow the checkstyle? This is the code format. What's wrong?
| "prior to snapshot operation start: " + leftNodes)); | ||
| } | ||
|
|
||
| if (!cctx.localNode().isClient() && cctx.kernalContext().encryption().isMasterKeyChangeInProgress() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we fail at start
Yes, so, do we need an explicit check for client, if it always pass or raise NPE? Maybe replace it with assert cctx != null?
| * @return {@code True} if cache group is encrypted. {@code False} otherwise. | ||
| */ | ||
| public boolean isCacheGroupEncrypted(int grpId) { | ||
| Set<Integer> encrGrpIds = this.encrGrpIds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the reference encrGrpIds is not synced: reference isn't volatile, method doesn't acquire a lock. So this reference isn't observable for other threads within this method, so this var doesn't make sense.
Make var volatile, or use lock there.
| } | ||
| } | ||
|
|
||
| this.encrGrpIds.addAll(encrGrpIds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In previous comment you said that it's possible for encrGrpIds be nulled after encrGrpIds != null. So we need a guard for this case there too.
| encrKey != null ? masterKeyDigest : null); | ||
|
|
||
| if (encrKey != null) | ||
| req.encryptionKeyId(encrKey.id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this setting inside the prepareCacheChangeRequest method. Now you have 2 places to fill DynamicCacheChangeRequest with info about encrypted key - key and id are set separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| for (SnapshotMetadata meta : nodeMetas) | ||
| for (SnapshotMetadata meta : nodeMetas) { | ||
| if (meta.masterKeyDigest() == null) | ||
| continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove the meta partitions from the grpIds collection (see 1113 line)? Now your changes affect only encrypted cached, what about non-encrypted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "meta partitions"? it is snapshot metadata. We check encryption, master keys here. However, another fixes applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| Collection<ClusterNode> bltNodes = F.view(cctx.discovery().serverNodes(AffinityTopologyVersion.NONE), | ||
| (node) -> CU.baselineNode(node, kctx0.state().clusterState())); | ||
|
|
||
| kctx0.task().setThreadContext(TC_SKIP_AUTH, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It duplicates lines below (L1132). What is a purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| */ | ||
| private static class SnapshotEncrKeyProvider implements EncryptionCacheKeyProvider { | ||
| /** Kernal context */ | ||
| private GridKernalContext ctx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below, make field final.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| * Cleans persistence directory including snapshots. | ||
| */ | ||
| protected void cleanPersistenceDir() throws Exception { | ||
| cleanPersistenceDir(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like an idea that we change a signature of this wide used method. It is extremely rare case to not clean snapshots in tests. What are alternatives? For example, create snapshots in different directory for tests that don't clean them, or introduce a method for copy-pasting snapshots. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we didn't change the signature. We added new method. You can use previous method as usual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
# Conflicts: # modules/core/src/main/java/org/apache/ignite/internal/managers/encryption/GridEncryptionManager.java # modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/file/EncryptedFileIOFactory.java # modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/file/FilePageStoreManager.java # modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java # modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotFutureTask.java # modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java # modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/AbstractSnapshotSelfTest.java # modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/EncryptedSnapshotTest.java # modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotCheckTest.java # modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreBaseTest.java
# Conflicts: # modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheUtils.java
# Conflicts: # modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
This PR extends 'static' encrypted snapshots #9269.
The new general features are:
IgniteSnapshotManager#restoreSnapshot())IgniteSnapshotManager#checkSnapshot())Same master key is still required.