Prevent duplicate SharedShardContext.readerId #15520
Conversation
```diff
@@ -102,9 +101,8 @@ public SharedShardContext getOrCreateContext(ShardId shardId) throws IndexNotFoundException {
         SharedShardContext sharedShardContext = allocatedShards.get(shardId);
         if (sharedShardContext == null) {
             IndexService indexService = indicesService.indexServiceSafe(shardId.getIndex());
-            sharedShardContext = new SharedShardContext(indexService, shardId, readerId, wrapSearcher);
+            sharedShardContext = new SharedShardContext(indexService, shardId, shardId.hashCode(), wrapSearcher);
```
If these are always tied to a shard, and there is only a single context per shard for a given job, then do we even need the separate reader ID?
Thank you. As you said, it looks to me like we can eliminate the reader ID completely. @mfussenegger could you clarify if this makes sense?
In the fetch case, the `readerId` is used to encode into a `FetchId` which shard is used, so that we can look up a document from the correct shard. See:

crate/server/src/main/java/io/crate/planner/ReaderAllocations.java, lines 41 to 70 in b2f20a3:

```java
ReaderAllocations(TreeMap<String, Integer> bases,
                  Map<String, Map<Integer, String>> shardNodes,
                  Map<RelationName, Collection<String>> tableIndices) {
    this.bases = bases;
    this.tableIndices = tableIndices;
    this.indicesToIdents = new HashMap<>(tableIndices.values().size());
    for (Map.Entry<RelationName, Collection<String>> entry : tableIndices.entrySet()) {
        for (String index : entry.getValue()) {
            indicesToIdents.put(index, entry.getKey());
        }
    }
    for (Map.Entry<String, Integer> entry : bases.entrySet()) {
        readerIndices.put(entry.getValue(), entry.getKey());
    }
    for (Map.Entry<String, Map<Integer, String>> entry : shardNodes.entrySet()) {
        Integer base = bases.get(entry.getKey());
        if (base == null) {
            continue;
        }
        for (Map.Entry<Integer, String> nodeEntries : entry.getValue().entrySet()) {
            int readerId = base + nodeEntries.getKey();
            IntSet readerIds = nodeReaders.get(nodeEntries.getValue());
            if (readerIds == null) {
                readerIds = new IntHashSet();
                nodeReaders.put(nodeEntries.getValue(), readerIds);
            }
            readerIds.add(readerId);
        }
    }
}
```

crate/server/src/main/java/io/crate/execution/engine/fetch/FetchTask.java, lines 293 to 297 in b2f20a3:

```java
int readerId = base + shardId.id();
SharedShardContext shardContext = shardContexts.get(readerId);
if (shardContext == null) {
    try {
        shardContext = sharedShardContexts.createContext(shardId, readerId);
```

as well as the `FetchId` class (`public final class FetchId { …`).
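If two shards on a node were handed the same `readerId`, their fetch ids would collide and a fetch could resolve a document against the wrong shard. As a rough illustration of the kind of bit-packing involved, here is a simplified stand-in; the layout of CrateDB's actual `FetchId` class may differ:

```java
// Sketch: pack a readerId and a per-reader document id into one long.
// Illustrative stand-in only -- not CrateDB's actual FetchId implementation.
public class FetchIdSketch {

    // Upper 32 bits: readerId (identifies a shard reader on a node).
    // Lower 32 bits: the document id within that reader.
    public static long encode(int readerId, int docId) {
        return ((long) readerId << 32) | (docId & 0xFFFFFFFFL);
    }

    public static int decodeReaderId(long fetchId) {
        return (int) (fetchId >>> 32);
    }

    public static int decodeDocId(long fetchId) {
        return (int) fetchId;
    }
}
```

With a duplicate `readerId`, `decodeReaderId` can no longer distinguish the two shards, which is why uniqueness of the id matters for fetch.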
What'd be more interesting is how the increments can lead to duplicates. The whole code section is under the assumption that the preparation phase is run single threaded. See also #5248
If that assumption is no longer true - maybe due to #10373, or if the SharedShardContexts is accessed elsewhere, we probably need to change more than just the readerId.
Thank you for the pointers. I think it is due to #10373, which changes `ShardCollectSource.getIterator()` to return a future type.
crate/server/src/main/java/io/crate/execution/engine/collect/sources/ShardCollectSource.java, lines 435 to 441 in a177d69:

```java
CompletableFuture<BatchIterator<Row>> iterator = shardCollectorProvider
    .awaitShardSearchActive()
    .thenApply(batchIteratorFactory -> batchIteratorFactory.getIterator(
        collectPhase,
        requiresScroll,
        collectTask
    ));
```
When more than one `awaitShardSearchActive()` completes, it would cause a race for a reader ID.
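The failure mode can be replayed deterministically: the method body is a check-then-act on `allocatedShards` plus a non-atomic `readerId++`, so two interleaved callers can observe the same counter value. A minimal sketch, with simplified stand-in names rather than the real `SharedShardContexts` fields:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the unsynchronized check-then-act in getOrCreateContext().
// The interleaving is replayed deterministically to show how two
// different shards can end up with the same readerId.
public class ReaderIdRaceSketch {

    public static final Map<String, Integer> allocatedShards = new HashMap<>();
    public static int readerId = 0;

    // First half of the method body: read the current counter value.
    public static int readCounter() {
        return readerId;
    }

    // Second half: publish the context and increment -- but by now another
    // thread may already have read the same counter value.
    public static void publish(String shardId, int observedReaderId) {
        allocatedShards.put(shardId, observedReaderId);
        readerId++;
    }

    public static void main(String[] args) {
        int seenByThreadA = readCounter();   // e.g. listener thread T#3
        int seenByThreadB = readCounter();   // listener thread T#1, interleaved
        publish("shard-0", seenByThreadA);
        publish("shard-1", seenByThreadB);
        // Both shards were assigned readerId 0 -> duplicate.
        System.out.println(allocatedShards);
    }
}
```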
I tried to sequentialize the calls to `SharedShardContexts.getOrCreateContext()`, which is the only place that assigns reader ids to `SharedShardContext`: 3da1073.
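One common way to sequentialize such async calls is to chain every new task onto the tail of the previous future, so the tasks run one after another regardless of which threads complete them. This is only an illustrative sketch of that pattern, not the code from the linked commit:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Sketch: serialize asynchronous tasks by chaining each new task onto the
// tail of the previously submitted one. Illustrative only.
public class SerialExecutorSketch {

    // Completed sentinel so the first submitted task can run immediately.
    private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

    // Synchronized so that concurrent submitters swap the tail atomically;
    // the tasks themselves then execute strictly in submission order.
    public synchronized <T> CompletableFuture<T> submit(Supplier<T> task) {
        CompletableFuture<T> result = tail.thenApply(ignored -> task.get());
        tail = result.thenApply(ignored -> null);
        return result;
    }
}
```

With a serial tail like this, two `getOrCreateContext`-style tasks can no longer interleave, at the cost of serializing all submitters through one queue.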
Hi @mfussenegger, could you have a look if the fix is ok before I start adding tests?

Sorry, my last-minute fix causes test failures; I will look at this first. `SharedShardContexts.getOrCreateContext()` is only called from two places.
I think I'd prefer adding synchronization to the methods again. But first we should ensure that we've identified the real problem. Did you do that?
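A synchronized variant along the lines the reviewer suggests could look like the following sketch. The types are simplified stand-ins (the real class maps `ShardId` to `SharedShardContext`); the point is that the counter read, the map insert, and the increment all happen under one lock:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: getOrCreateContext with method-level synchronization, so no two
// callers can observe the same readerId. Simplified stand-in types.
public class SharedShardContextsSketch {

    private final Map<String, Integer> allocatedShards = new HashMap<>();
    private int readerId = 0;

    public synchronized int getOrCreateContext(String shardId) {
        Integer existing = allocatedShards.get(shardId);
        if (existing != null) {
            return existing;
        }
        // check-then-act and increment are now atomic as a unit
        int assigned = readerId++;
        allocatedShards.put(shardId, assigned);
        return assigned;
    }
}
```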
I think the real problem is #15520 (comment).
For example, if I attach a logger:
```java
public SharedShardContext getOrCreateContext(ShardId shardId) throws IndexNotFoundException {
    LOGGER.info("Begin getOrCreateContext : " + Thread.currentThread());
    SharedShardContext sharedShardContext = allocatedShards.get(shardId);
    if (sharedShardContext == null) {
        IndexService indexService = indicesService.indexServiceSafe(shardId.getIndex());
        sharedShardContext = new SharedShardContext(indexService, shardId, readerId, wrapSearcher);
        allocatedShards.put(shardId, sharedShardContext);
        readerId++;
    }
    LOGGER.info("End getOrCreateContext : " + Thread.currentThread());
    return sharedShardContext;
}
```
I can observe two threads being interleaved:
```
[2024-02-07T09:20:43,629][INFO ][i.c.e.j.SharedShardContexts] [Grand Parpaillon] Begin getOrCreateContext : Thread[#356,cratedb[Grand Parpaillon][listener][T#3],5,main]
[2024-02-07T09:20:43,629][INFO ][i.c.e.j.SharedShardContexts] [Grand Parpaillon] Begin getOrCreateContext : Thread[#354,cratedb[Grand Parpaillon][listener][T#1],5,main]
[2024-02-07T09:20:43,629][INFO ][i.c.e.j.SharedShardContexts] [Grand Parpaillon] End getOrCreateContext : Thread[#356,cratedb[Grand Parpaillon][listener][T#3],5,main]
[2024-02-07T09:20:43,629][INFO ][i.c.e.j.SharedShardContexts] [Grand Parpaillon] End getOrCreateContext : Thread[#354,cratedb[Grand Parpaillon][listener][T#1],5,main]
```
right before the exception is thrown.
Can we then just revert parts of https://github.com/crate/crate/pull/5248/files and bring back the synchronization?
Thank you, reverted.
Will #11677 also be addressed by this?

It is hard to say without being able to reproduce. I just looked into crate/server/src/main/java/io/crate/execution/jobs/RootTask.java, lines 203 to 206 in 6e7588d.

Another possibility could be #15495 (comment), which I was looking into before Moll provided reproduction steps.
```java
synchronized (this) {
    sharedShardContext = allocatedShards.get(shardId);
```
Other `allocatedShards` accesses within the file afaik also need to be synchronized to ensure this works.
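An alternative that keeps every map access thread-safe without a method-level lock is `ConcurrentHashMap.computeIfAbsent` combined with an `AtomicInteger` counter. This is not the approach the PR takes; it is only an illustration of the point that *all* accesses to `allocatedShards` must go through a thread-safe path:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: computeIfAbsent runs the mapping function atomically per key,
// so each shard gets exactly one id, and getAndIncrement guarantees the
// ids are unique. Simplified stand-in types.
public class LockFreeContextsSketch {

    private final ConcurrentHashMap<String, Integer> allocatedShards = new ConcurrentHashMap<>();
    private final AtomicInteger readerId = new AtomicInteger();

    public int getOrCreateContext(String shardId) {
        return allocatedShards.computeIfAbsent(shardId, id -> readerId.getAndIncrement());
    }
}
```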
Thank you. It does look like `createContext()` needs to be synchronized, but if there is a scenario where `createContext()` and `getOrCreateContext()` race for `allocatedShards`, shouldn't we fix it by not calling `createContext()`? Considering `createContext()` as a lightweight version of `getOrCreateContext()`.
Oh, I guess we need synchronization for `allocatedShards.put()` calls.
```
- Fixed an issue that caused exceptions with messages like
  'ShardCollectContext already added' in low heap situations causing multiple
  shards to be idle and be active simultaneously.
```
I don't think this has anything to do with low heap, as the heap has no influence on whether shards are idle or active. And I also don't think the number of shards going from idle to active has any impact on whether the race condition happens.
Suggested change:

```diff
-- Fixed an issue that caused exceptions with messages like
-  'ShardCollectContext already added' in low heap situations causing multiple
-  shards to be idle and be active simultaneously.
+- Fixed a race condition that could lead to ``ShardCollectContext already
+  added`` errors when making a query after a table had been idle without any
+  accesses for a while.
```
```
@@ -40,15 +40,18 @@

import com.carrotsearch.hppc.IntIndexedContainer;

import io.crate.common.annotations.VisibleForTesting;
import io.crate.metadata.IndexParts;

@NotThreadSafe
```
Suggested change:

```diff
-@NotThreadSafe
```
The fix works by synchronizing the part of `SharedShardContexts.getOrCreateContext()` that increments `readerId` and populates `allocatedShards`.
Summary of the changes / Why this improves CrateDB

Fixes #15518.

As shown, `SharedShardContext.readerId` is duplicated, causing the exception. See #15520 (comment) for details. To my understanding, the only purpose of `readerId` is to be used as the key for `CollectTask.searchers`. If so, we can replace it with `shardId`, which is unique by design and prevents the race that caused the duplicate ids.

Checklist
- Added an entry in `docs/appendices/release-notes/<x.y.0>.rst` for user facing changes
- Updated `sql_features` table for user facing changes
- `docs/appendices/release-notes/<x.y.0>.rst` (E.g. AdminUI)
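The idea floated in the summary above — keying the searcher map by shard rather than by a counter — can be sketched as follows. The types here are hypothetical simplified stand-ins, not the actual `CollectTask` code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: key the searcher map by a ShardId, which is unique by
// construction, so no counter race can produce a duplicate key.
// ShardId and the String "searcher" are simplified stand-in types.
public class SearchersByShardSketch {

    public record ShardId(String index, int id) {}   // unique per shard by design

    private final Map<ShardId, String> searchers = new HashMap<>();

    public String putSearcher(ShardId shardId, String searcher) {
        // putIfAbsent makes a double registration visible instead of
        // silently overwriting the first searcher.
        String previous = searchers.putIfAbsent(shardId, searcher);
        if (previous != null) {
            throw new IllegalStateException("ShardCollectContext already added");
        }
        return searcher;
    }
}
```

Because record equality is by components, two registrations for the same shard collide deterministically, while distinct shards can never share a key.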