
store: replace Modified+Load with ReloadIfChanged #898

Merged
2 commits merged into containers:master on May 14, 2021

Conversation

giuseppe
Member

Simplify the pattern "if store.Modified() => store.Load()" with a single function call, store.ReloadIfChanged().

In addition to simplifying the code, this fixes a race condition: when only an RLock() is held, multiple goroutines can run the sequence store.Modified() and store.Load(). In that case only one goroutine sees store.Modified() return true and runs store.Load().
Other goroutines can access the old data while the goroutine that saw the change replaces the store data.
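For illustration, the repeated call-site pattern and its replacement look roughly like this (a sketch only; rlstore is a placeholder for one of the internal stores, and the exact error handling at the real call sites may differ):

    // Before: every caller repeats the check-and-reload sequence under its own lock.
    if modified, err := rlstore.Modified(); modified || err != nil {
        if err = rlstore.Load(); err != nil {
            return nil, err
        }
    }

    // After: a single call encapsulates the check and the reload.
    if err := rlstore.ReloadIfChanged(); err != nil {
        return nil, err
    }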

Closes: #880

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

@rhatdan
Member

rhatdan left a comment

Nice Fix
LGTM

@nalind
Member

nalind commented May 12, 2021

The diagnosis at #880 (comment) looks good, but I'm missing how this keeps one goroutine that's using the data without having taken the loadMut mutex from having the slice it's iterating over replaced, while it's running and unaware, by another goroutine that has the mutex.

giuseppe added 2 commits:

add a new method to reload the store information from the hard disk if
it has changed.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
simplify the pattern "if store.Modified() => store.Load()" with a single
function call, store.ReloadIfChanged().

In addition to simplifying the code, this fixes a race condition: when
only an RLock() is held, multiple goroutines can run the sequence
store.Modified() and store.Load(). In that case only one goroutine
sees store.Modified() return true and runs store.Load().

Other goroutines can access the old data while the goroutine that saw
the change replaces the store data.

Closes: containers#880

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Member Author

The diagnosis at #880 (comment) looks good, but I'm missing how this keeps one goroutine that's using the data without having taken the loadMut mutex from having the slice it's iterating over replaced, while it's running and unaware, by another goroutine that has the mutex.

That is protected by the other locking mechanisms we have. A goroutine that is accessing the data must already hold a RO or a RW lock. A goroutine checks ReloadIfChanged() while holding the RO lock. No changes can happen to the storage between the moment a goroutine gets the RO lock and the moment it accesses the data, because another goroutine/process would need to take the RW lock before it could make any change. Since a goroutine is already holding a RO lock (thus blocking writers), any other goroutine getting into the ReloadIfChanged() code block won't hit the r.Modified() condition.

The issue I've seen happens only when multiple goroutines check r.Modified() together without synchronizing on the Load() that only one of them performs.

Goroutine 1                                  Process 2      Goroutine 3
GetROLock()
ReloadIfChanged() [might trigger a Load()]   GetRWLock()
[use the store data]                         [wait]         GetROLock()
[use the store data]                         [wait]         ReloadIfChanged()

Goroutine 3's ReloadIfChanged() should never cause a Load(), because goroutine 1 already checked while holding the RO lock, thus blocking Process 2.
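As a sketch of the read path described above (the lock calls and the readData helper are illustrative names, not the exact containers/storage API):

    // Goroutine 1 from the table: the reload, if any, happens while the
    // RO lock is held, so writers (Process 2) stay blocked between the
    // check and the data access.
    store.RLock()
    defer store.Unlock()
    if err := store.ReloadIfChanged(); err != nil { // at most one Load()
        return err
    }
    readData(store) // Goroutine 3's later ReloadIfChanged() sees Modified() == false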

One possible race condition, though, is that we attempt the Load() also on errors:

	modified, err := r.Modified()
	if err != nil || modified {
		return r.Load()
	}

A temporary error could trigger the race, since it would call Load() while other goroutines are running. I've pushed a new version that changes the condition to err == nil && modified.
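A minimal sketch of the revised helper with the corrected condition (the receiver type is illustrative; loadMut, Modified(), and Load() follow the names used in this discussion, and the real implementation may differ):

    // ReloadIfChanged reloads the store only when Modified() reports a
    // change and no error, so a transient error can no longer trigger a
    // Load() while other goroutines are still reading the data.
    func (r *store) ReloadIfChanged() error {
        r.loadMut.Lock() // serialize concurrent reload attempts
        defer r.loadMut.Unlock()

        modified, err := r.Modified()
        if err == nil && modified {
            return r.Load()
        }
        return err
    }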

@giuseppe
Member Author

Does the explanation above make sense? Are there other cases where a race can happen?

@nalind
Member

nalind commented May 14, 2021

I was thinking of the case where one thread is calling a method like store.Images(), which takes a read lock, but another is in a method like store.ImageSize(), which also takes a read lock, and one of them alters the data the object is holding while the other is iterating through it. It's not exclusive to this pair of methods, but that's a window I don't see us closing here.

@giuseppe
Member Author

I was thinking of the case where one thread is calling a method like store.Images(), which takes a read lock, but another is in a method like store.ImageSize(), which also takes a read lock, and one of them alters the data the object is holding while the other is iterating through it. It's not exclusive to this pair of methods, but that's a window I don't see us closing here.

That is the case I've tried to explain with the table.

If one thread calls store.Images() while it holds a read lock, it has already gone through the ReloadIfChanged() check. Holding the RO lock prevents other processes from modifying the store. Another thread that takes the RO lock and gets into ReloadIfChanged() will either 1) wait for the first thread to complete ReloadIfChanged(), after which Modified() returns false, or 2) not block on the lock, but Modified() still returns false, so it doesn't reload the store while the first thread is accessing its data.
This assumption is based on the fact that any reload happens while holding the RO lock, so other threads/processes cannot modify the store in the meantime. The store can only be modified when no RO locks are in place.

@nalind
Member

nalind commented May 14, 2021

Hmm, okay, that makes sense. LGTM.

@rhatdan rhatdan merged commit 1b86066 into containers:master May 14, 2021
@vrothberg
Member

@haircommander, that should fix the issue in CRI-O

Linked issue #880: panic: runtime error: index out of range