Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: fix NullPointerException when calling InMemoryLockManager#release using Hadoop catalog #8494

Merged
merged 2 commits into from
Oct 30, 2023

Conversation

wangtaohz
Copy link
Contributor

@wangtaohz wangtaohz commented Sep 5, 2023

This is a fix for an NPE problem that occurs when calling InMemoryLockManager#release using Hadoop catalog, as described in this issue(#4550). The problem arises when lockManager#release is finally called after renaming metadata (HadoopTableOperations#renameToFinal).

The immediate cause of NPE is that the locked entityId has been removed from the HEARTBEATS, but still exists in LOCKS. Some concurrent operations, such as calling InMemoryLockManager#close, may result in this situation.

This NPE can potentially cause the manifest list files to be cleaned (SnapshotProducer#cleanAll). However, this could be a mistaken cleanup since when the NPE is thrown, the renaming metadata file may have already been completed. In this case, Iceberg considers the operation to be completed (although the version-hint.text has not yet been updated).

Closes #4550.

@github-actions github-actions bot added the core label Sep 5, 2023
Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes make sense, can you please add a simple assertThat(lockManager.release(null, ownerId)).isTrue(); check to TestInMemoryLockManager#testReleaseAndAcquire()?

@wangtaohz
Copy link
Contributor Author

changes make sense, can you please add a simple assertThat(lockManager.release(null, ownerId)).isTrue(); check to TestInMemoryLockManager#testReleaseAndAcquire()?

Thanks for your reply!

I would be happy to add some test cases, but if I understand correctly, the test for assertThat(lockManager.release(null, ownerId)).isTrue(); will fail, because entityId cannot be null when calling release and get it from LOCKS.

    @Override
    public boolean release(String entityId, String ownerId) {
      InMemoryLockContent currentContent = LOCKS.get(entityId);
      ...
      ...
    }

Do you mean that we should support releasing with a null value for the entityId? @nastra

@nastra
Copy link
Contributor

nastra commented Sep 11, 2023

The underlying issue is most likely a race condition because LOCKS and HEARTBEATS are updated from different places that aren't guarded by a synchronization lock. Fixing the issue as is proposed in this PR makes sense.

@jackye1995 or @amogh-jahagirdar could you review this as well please?

@wangtaohz
Copy link
Contributor Author

@nastra Can this PR be merged?

In fact, we have been using the fixed code for months. If it can be merged, it will bring convenience to our code maintenance. 😄

@nastra
Copy link
Contributor

nastra commented Oct 30, 2023

@wangtaohz sorry for the delay, I was waiting for input of others. Given that this has been open for quite a while, we can go ahead and merge this.

@nastra nastra merged commit 64e2deb into apache:main Oct 30, 2023
41 checks passed
@wangtaohz wangtaohz deleted the fix-4550 branch October 31, 2023 01:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants