
Fix delete dead lock. #12299

Merged
JackieTien97 merged 1 commit into apache:master from ColinLeeo:colin_fix_delete_deadLock
Apr 9, 2024

Conversation

@ColinLeeo ColinLeeo commented Apr 7, 2024

The current system can deadlock when a deletion runs concurrently with a query and an insertion:

  1. The number of query workers is limited to the number of CPU cores.
  2. A deletion acquires the DataRegion write lock and the DataNode schema cache write lock.
  3. An insert or query may issue an additional query to fetch metadata while it is holding a read lock.

The deadlock cycle looks like this:

graph BT
A[#QueryWorker# get one core to run query]
B[#QueryWorker# hold read lock on dataregion]
A-->B
C[#DeleteData# hold write lock on dataregion]
D[#DeleteData# hold write lock on datanode schema]
C-->D
E[#InsertData# hold read lock on datanode schema]
F[#InsertData# get one core to run fetch schema]
E-->F
B-->C
F-->A
D-->E
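
The F-->A edge in the graph (a schema fetch needs a query worker while every worker is already busy) can be reproduced in isolation. The following is a minimal sketch, not IoTDB code: `WorkerStarvationSketch` and `nestedTaskStarves` are hypothetical names, and a plain `ExecutorService` stands in for the query worker pool. A timeout is used so the sketch terminates instead of hanging.

```java
import java.util.concurrent.*;

class WorkerStarvationSketch {
    /**
     * Runs an outer task that submits a nested task to the SAME pool and waits for it.
     * Returns true if the nested task could not finish within waitMillis.
     */
    static boolean nestedTaskStarves(int workers, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            Future<Boolean> outer = pool.submit(() -> {
                // Stand-in for "fetch schema": it needs a free worker to run.
                Future<String> inner = pool.submit(() -> "schema");
                try {
                    inner.get(waitMillis, TimeUnit.MILLISECONDS);
                    return false; // a free worker picked up the nested task
                } catch (TimeoutException e) {
                    return true;  // no free worker: the nested task never started
                }
            });
            return outer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a single worker the nested task can never start while the outer task is waiting for it; in the real system there is no timeout, so the wait becomes one link of the cycle.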

In this PR:

  1. Optimize the lock order to prevent deadlocks.
  2. Add an invalidate-last-cache operation. A deletion does not need to clear all caches; it only needs to invalidate the corresponding last cache.
  3. You can use the following patch to reproduce the deadlock:
    0001-to-cause-dead-lock-when-delete.patch
    With this patch, datanodeCache is always empty, so every query or insert fetches the remote schema. In addition, the number of query workers is limited to 1, and the program waits about 20s or 30s after acquiring the critical lock.
    To reproduce the problem, run these three statements in three sessions within 30s:
    session1: insert into root.sg1.t1(t) values(1);
    session2: delete from root.sg1.t1.**;
    session3: select * from root.sg1.**;
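
The description does not spell out the new lock order, so the following is only a sketch of the general technique in item 1: every code path takes the two locks in the same global order, which removes the hold-and-wait cycle between them. The class name, the field names, and the choice of "schema cache first, data region second" are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderSketch {
    // Hypothetical stand-ins for the schema cache lock and the data region lock.
    final ReentrantReadWriteLock schemaCacheLock = new ReentrantReadWriteLock();
    final ReentrantReadWriteLock dataRegionLock = new ReentrantReadWriteLock();
    final AtomicInteger deletes = new AtomicInteger();
    final AtomicInteger inserts = new AtomicInteger();

    /** Writer path: schema cache lock first, data region lock second (assumed order). */
    void delete() {
        schemaCacheLock.writeLock().lock();
        try {
            dataRegionLock.writeLock().lock();
            try {
                deletes.incrementAndGet();
            } finally {
                dataRegionLock.writeLock().unlock();
            }
        } finally {
            schemaCacheLock.writeLock().unlock();
        }
    }

    /** Reader path: takes the locks in the same order as the writer path. */
    void insert() {
        schemaCacheLock.readLock().lock();
        try {
            dataRegionLock.readLock().lock();
            try {
                inserts.incrementAndGet();
            } finally {
                dataRegionLock.readLock().unlock();
            }
        } finally {
            schemaCacheLock.readLock().unlock();
        }
    }
}
```

Consistent ordering alone does not cover the worker-pool edge of the cycle; any work that needs another worker still has to happen outside the locked region.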

@ColinLeeo ColinLeeo force-pushed the colin_fix_delete_deadLock branch 2 times, most recently from 132f78e to f0c97a5 Compare April 7, 2024 14:47
Comment on lines +301 to +305
if (cacheEntry.getLastCacheContainer().getCachedLast() != null) {
cacheStats.decreaseMemoryUsage(
cacheEntry.getLastCacheContainer().getCachedLast().getSize());
}
DataNodeLastCacheManager.invalidateLastCache(cacheEntry);
Why not use synchronized (entry) here?
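
To illustrate the concern: the null check, the memory accounting, and the invalidation form a check-then-act sequence, so two threads could both pass the check and subtract the same size twice. A minimal sketch of guarding the sequence with the entry's monitor follows; `LastCacheEntrySketch` and its members are hypothetical and much simpler than the real `SchemaCacheEntry`.

```java
import java.util.concurrent.atomic.AtomicLong;

class LastCacheEntrySketch {
    // Size in bytes of the cached last value; null means nothing is cached.
    private Long cachedLastSize;

    LastCacheEntrySketch(Long cachedLastSize) {
        this.cachedLastSize = cachedLastSize;
    }

    /** Invalidates the cached last value and releases its memory exactly once. */
    void invalidate(AtomicLong memoryUsage) {
        synchronized (this) {
            if (cachedLastSize != null) {
                memoryUsage.addAndGet(-cachedLastSize);
                cachedLastSize = null;
            }
        }
    }
}
```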

@Override
public int updateValue(int index, SchemaCacheEntry value) {
DataNodeLastCacheManager.invalidateLastCache(value);
return 0;
It should return the memory size difference instead of 0.
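
A sketch of what returning the difference could look like, under stated assumptions: `Entry` and its `lastCacheSize` field are hypothetical, and the sign convention (negative when memory is freed) is a guess rather than the cache's documented contract.

```java
class UpdateValueSketch {
    /** Hypothetical cache entry that only tracks the size of its last-cache payload. */
    static final class Entry {
        int lastCacheSize;

        Entry(int lastCacheSize) {
            this.lastCacheSize = lastCacheSize;
        }
    }

    /**
     * Invalidates the last cache of the entry and reports the change in memory
     * usage in bytes (assumed convention: new size minus old size).
     */
    static int updateValue(int index, Entry value) {
        int before = value.lastCacheSize;
        value.lastCacheSize = 0; // stand-in for invalidating the last cache
        return value.lastCacheSize - before;
    }
}
```

The caller can then apply the returned delta to its memory statistics instead of assuming nothing changed.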

private final long deleteStartTime;
private final long deleteEndTime;

private final boolean isDeleteTimeseries;
Add a comment here explaining why this field doesn't need to be serialized: it is always false except during delete timeseries, in which case it is set to true in DataNodeInternalRPCServiceImpl.deleteDataForDeleteSchema.

Suggested change
- private final boolean isDeleteTimeseries;
+ private final transient boolean isDeleteTimeseries;
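
For readers unfamiliar with the keyword, here is a small sketch of what `transient` means under standard `java.io` serialization. The project's plan nodes may well use their own serialization code, in which case the keyword serves mainly as documentation; `DeleteNodeSketch` is a hypothetical class that only borrows the field names from the diff.

```java
import java.io.*;

class DeleteNodeSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final long deleteStartTime;
    final long deleteEndTime;
    // Not written to the stream; a deserialized copy sees the default value false.
    final transient boolean isDeleteTimeseries;

    DeleteNodeSketch(long deleteStartTime, long deleteEndTime, boolean isDeleteTimeseries) {
        this.deleteStartTime = deleteStartTime;
        this.deleteEndTime = deleteEndTime;
        this.isDeleteTimeseries = isDeleteTimeseries;
    }

    /** Serializes the node to bytes and reads it back. */
    static DeleteNodeSketch roundTrip(DeleteNodeSketch node) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DeleteNodeSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```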

}

- private void resetLastCacheWhenLoadingTsFile() throws IllegalPathException {
+ private void resetLastCacheWhenLoadingTsFile() {
Why not just call DataNodeSchemaCache.getInstance().unsetLastCacheInDataRegion(getDatabaseName())?

@ColinLeeo ColinLeeo force-pushed the colin_fix_delete_deadLock branch 3 times, most recently from a6f252a to 13e658b Compare April 8, 2024 07:53
measurementFilter = m -> true;
}
if (deviceFilter == null) {
deviceFilter = d -> ((PartialPath) d).equals(devicePath.getFullPath());
Suggested change
- deviceFilter = d -> ((PartialPath) d).equals(devicePath.getFullPath());
+ deviceFilter = d -> ((PartialPath) d).equals(devicePath);
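
The reason the original line is a bug: a typical `equals` implementation returns false when handed an object of a different type, so comparing a path object against a `String` never matches. The sketch below uses a hypothetical `Path` class in place of `PartialPath`, whose real `equals` may differ in detail.

```java
import java.util.function.Predicate;

class FilterEqualsSketch {
    /** Hypothetical stand-in for PartialPath. */
    static final class Path {
        private final String fullPath;

        Path(String fullPath) {
            this.fullPath = fullPath;
        }

        String getFullPath() {
            return fullPath;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Path && ((Path) o).fullPath.equals(fullPath);
        }

        @Override
        public int hashCode() {
            return fullPath.hashCode();
        }
    }

    /** Compares a Path against a String, so it rejects every device. */
    static Predicate<Object> buggyFilter(Path devicePath) {
        return d -> ((Path) d).equals(devicePath.getFullPath());
    }

    /** Compares a Path against a Path, so the intended device matches. */
    static Predicate<Object> fixedFilter(Path devicePath) {
        return d -> ((Path) d).equals(devicePath);
    }
}
```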

@ColinLeeo ColinLeeo force-pushed the colin_fix_delete_deadLock branch from 5fce2fc to afae3fe Compare April 9, 2024 07:07
@JackieTien97 JackieTien97 merged commit e106654 into apache:master Apr 9, 2024
@ColinLeeo ColinLeeo changed the title from "fix delete dead lock." to "Fix delete dead lock." Feb 19, 2025