Skip to content

Conversation

@Vladsz83
Copy link
Contributor

@Vladsz83 Vladsz83 commented Aug 10, 2021

This PR extends 'static' encrypted snapshots #9269.

The new general features are:

  1. Dynamic restoration of encrypted snapshot on a running node ( IgniteSnapshotManager#restoreSnapshot() )
  2. Validation of encrypted snapshot ( IgniteSnapshotManager#checkSnapshot() )

Same master key is still required.

}

/**
* @return Chipered encryption key for this cache or cache group. {@code Null} if not encrypted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below, replace "Chipered" with "Ciphered"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}
}

this.encrGrpIds.addAll(encrGrpIds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You guard initialization of encGrpIds with this, then it's assumed that code runs in concurrent env. But the map is an ordinary HashMap, and it's allowed to being modified (addAll) in separate thread without sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In previous comment you said that it's possible for encrGrpIds be nulled after encrGrpIds != null. So we need a guard for this case there too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. Once encrGrpIds gets assigned, it never takes null. User cannot null it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no encrypted groups ids reside in SnapshotMeta any more. inactual

* @return {@code True} if cache group is encrypted. {@code False} otherwise.
*/
public boolean isCacheGroupEncrypted(int grpId) {
Set<Integer> encrGrpIds = this.encrGrpIds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a purpose of this local var?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid NPE if class member has been nulled after encrGrpIds != null

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the reference encrGrpIds is not synced: reference isn't volatile, method doesn't acquire a lock. So this reference isn't observable for other threads within this method, so this var doesn't make sense.

Make var volatile, or use lock there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Local variable which is not affected by other thred. It guaranties the reference won't change during several reads. Please check #9377 , EncryptedSnapshotTest.testNPE(). Try uncomment // Set<Integer> encrGrpIds = this.encrGrpIds; in SnapshotMetadata.isCacheGroupEncrypted()

Copy link
Contributor Author

@Vladsz83 Vladsz83 Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is common technique to avoid NPE to me, introducing local varaible. But in our case, as mentioned, once encrGrpIds is assigned, it won't set to null. We can check it once. But for general purpose I wrote it correctly

Copy link
Member

@timoninmaxim timoninmaxim Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've initialized encrGrpIds with the DCL pattern (that requires the object to be declared with volatile keyword, and you missed it). It means that encrGrpIds is assigned from concurrent context, but you just copy thread local var (that is not observable from other threads and may can't see that encrGrpIds is initialized) to local var.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. This brings thread-safety. The test shows it. Makes sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No encrypted groups ids reside in SnapshotMeta any more. Inactual.

private final List<Integer> grpIds;

/** Encrypted group ids. */
private Set<Integer> encrGrpIds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be marked with GridToStringInclude? The same question for the masterKeyDigest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

for (SnapshotMetadata meta : nodeMetas)
for (SnapshotMetadata meta : nodeMetas) {
if (meta.masterKeyDigest() != null && !Arrays.equals(meta.masterKeyDigest(),
kctx0.config().getEncryptionSpi().masterKeyDigest())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getEncryptionSpi().masterKeyDigest() performs KeystoreEncryptionSpi#makeDigest operation on every iteration for this loop. Let's make a var for it before this loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to get NPE for kctx0.config().getEncryptionSpi().masterKeyDigest() for config with disabled encryption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"performs KeystoreEncryptionSpi#makeDigest operation on every" But verifying snapshot is slow operation which involves many disk reading, page reading one by one from the partition files. Is it worth?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"It's possible to get NPE for kctx0.config().getEncryptionSpi().masterKeyDigest() for config with disabled encryption." Impossible. Toy can find if (cfg.getEncryptionSpi() == null) cfg.setEncryptionSpi(new NoopEncryptionSpi()); in IgniteEx.initializeDefaultSpi() and in EncryptedSnapshotTest.testValidatingSnapshotFailsWithNoEncryption

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth?

But masterKeyDigest() is method of public interface EncryptionSpi that can be overloaded with Ignite user, you never knows what is going under the hood. It can goes by to external storage over unreliable network. Let's prevent this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

"prior to snapshot operation start: " + leftNodes));
}

if (!cctx.localNode().isClient() && cctx.kernalContext().encryption().isMasterKeyChangeInProgress()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the construction A && B || C the C operation will be invoked despite of whether the A is true of false. So it may lead to NPE, as cctx is null for client nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cctx is null for client, we fail at start: if (cctx.kernalContext().clientNode(). Also snapshots is disables for client nodes because clisents do not have data and have nothing to snapshot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we fail at start

Yes, so, do we need an explicit check for client, if it always pass or raise NPE? Maybe replace it with assert cctx != null?

Copy link
Contributor Author

@Vladsz83 Vladsz83 Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. We do not affect snapshot-on-client behavior in this ticket. It has been working well till now. Shapshots-on-client hasn't raised NPE before the ticket. I believe, because this DistributedProcess doesn't work on client regardless encrypted or not. There is cctx.kernalContext().clientNode() before. It hasn't been asserted earlier. I don't think we should care.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. GridCacheContext is null for client node in GridQueryProcessor only.

So, what about to fix boolean logic then - A && (B || C)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

# Conflicts:
#	modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
#	modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotPartitionsVerifyTask.java
#	modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreBaseTest.java
#	modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
#	modules/core/src/test/java/org/apache/ignite/testsuites/IgniteBasicWithPersistenceTestSuite.java
"The previous change was not completed."));
}

if (ctx.cache().context().snapshotMgr().isSnapshotCreating()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if snapshot create/restore started after this check and before the next line masterKeyChangeRequest = req;? Do we need a lock (instead of volatile var) that syncs changing of master key and snapshot process?

I see it like this:

  1. SnapshotManager tries to acquire the lock on master key, then wait for lock if other process own it.
  2. MasterKeyChangeProcess tries to acquire lock, and if failed then return error to user (without wait for lock).

WDYT? Is the sync of those processes are already guaranteed with different approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Master key change and snapshots are distributed processes based on discovery messages which are guaranteed be sequential. Once one process is active, another process will see this state. There is no pure concurrency. Master key change isn't allowed during snapshot. Snapshot is not allowed during master key change and re-encryption. You can check testSnapshotFailsDuringCacheKeyChange(), testSnapshotFailsDuringMasterKeyChange(), testMasterKeyChangeDuringSnapshot(), testMasterKeyChangeDuringRestore(), testReencryptionDuringRestore(), testReencryptionDuringSnapshot()


GridPlainClosure2<Collection<byte[]>, byte[], IgniteInternalFuture<Boolean>> startCacheClsr =
(grpKeys, masterKeyDigest) -> {
List<DynamicCacheChangeRequest> srvReqs = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please revert tabulation of this block? It's pretty hard to understand what is exactly changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we follow the checkstyle? This is the code format. What's wrong?

"prior to snapshot operation start: " + leftNodes));
}

if (!cctx.localNode().isClient() && cctx.kernalContext().encryption().isMasterKeyChangeInProgress()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we fail at start

Yes, so, do we need an explicit check for client, if it always pass or raise NPE? Maybe replace it with assert cctx != null?

* @return {@code True} if cache group is encrypted. {@code False} otherwise.
*/
public boolean isCacheGroupEncrypted(int grpId) {
Set<Integer> encrGrpIds = this.encrGrpIds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the reference encrGrpIds is not synced: reference isn't volatile, method doesn't acquire a lock. So this reference isn't observable for other threads within this method, so this var doesn't make sense.

Make var volatile, or use lock there.

}
}

this.encrGrpIds.addAll(encrGrpIds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In previous comment you said that it's possible for encrGrpIds be nulled after encrGrpIds != null. So we need a guard for this case there too.

encrKey != null ? masterKeyDigest : null);

if (encrKey != null)
req.encryptionKeyId(encrKey.id());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this setting inside the prepareCacheChangeRequest method. Now you have 2 places to fill DynamicCacheChangeRequest with info about encrypted key - key and id are set separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

for (SnapshotMetadata meta : nodeMetas)
for (SnapshotMetadata meta : nodeMetas) {
if (meta.masterKeyDigest() == null)
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the meta partitions from the grpIds collection (see 1113 line)? Now your changes affect only encrypted cached, what about non-encrypted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "meta partitions"? it is snapshot metadata. We check encryption, master keys here. However, another fixes applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Collection<ClusterNode> bltNodes = F.view(cctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
(node) -> CU.baselineNode(node, kctx0.state().clusterState()));

kctx0.task().setThreadContext(TC_SKIP_AUTH, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It duplicates lines below (L1132). What is a purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

*/
private static class SnapshotEncrKeyProvider implements EncryptionCacheKeyProvider {
/** Kernal context */
private GridKernalContext ctx;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below, make field final.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

* Cleans persistence directory including snapshots.
*/
protected void cleanPersistenceDir() throws Exception {
cleanPersistenceDir(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like an idea that we change a signature of this wide used method. It is extremely rare case to not clean snapshots in tests. What are alternatives? For example, create snapshots in different directory for tests that don't clean them, or introduce a method for copy-pasting snapshots. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we didn't change the signature. We added new method. You can use previous method as usual.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

# Conflicts:
#	modules/core/src/main/java/org/apache/ignite/internal/managers/encryption/GridEncryptionManager.java
#	modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/file/EncryptedFileIOFactory.java
#	modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/file/FilePageStoreManager.java
#	modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
#	modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotFutureTask.java
#	modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
#	modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/AbstractSnapshotSelfTest.java
#	modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/EncryptedSnapshotTest.java
#	modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotCheckTest.java
#	modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreBaseTest.java
@Vladsz83 Vladsz83 changed the title IGNITE-14999v3 IGNITE-14999v3 : Support dynamic restoration of encrypted snapshots. Oct 8, 2021
# Conflicts:
#	modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheUtils.java
# Conflicts:
#	modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
@NSAmelchev NSAmelchev merged commit 84d005f into apache:master Mar 5, 2022
@Vladsz83 Vladsz83 deleted the IGNITE-14999v3 branch March 5, 2022 13:04
xintrian pushed a commit to xintrian/ignite that referenced this pull request May 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants