
store: replace Modified+Load with ReloadIfChanged #898

Merged
2 commits merged into containers:master on May 14, 2021

Conversation

giuseppe
Member

Simplify the pattern "if store.Modified() => store.Load()" with a single function call, store.ReloadIfChanged().

In addition to simplifying the code, this fixes a race condition: when only an RLock() is held, multiple goroutines can run the sequence store.Modified() and store.Load(). In that case only one goroutine sees store.Modified() return true and runs store.Load().
Other goroutines can access the old data while the goroutine that saw the change replaces the store data.
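For illustration, the repeated call-site pattern and its replacement look roughly like this (a sketch only; rlstore is a placeholder for one of the internal stores, and the exact error handling at the real call sites may differ):

    // Before: every caller repeats the check-and-reload sequence under its own lock.
    if modified, err := rlstore.Modified(); modified || err != nil {
        if err = rlstore.Load(); err != nil {
            return nil, err
        }
    }

    // After: a single call encapsulates the check and the reload.
    if err := rlstore.ReloadIfChanged(); err != nil {
        return nil, err
    }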

Closes: #880

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

@rhatdan
Member

rhatdan left a comment

Nice Fix
LGTM

@nalind
Member

nalind commented May 12, 2021

The diagnosis at #880 (comment) looks good, but I'm missing how this keeps one goroutine that's using the data without having taken the loadMut mutex from having the slice it's iterating over replaced, while it's running and unaware, by another goroutine that has the mutex.

giuseppe added 2 commits:

add a new method to reload the store information from the hard disk if
it has changed.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
simplify the pattern "if store.Modified() => store.Load()" with a single
function call, store.ReloadIfChanged().

In addition to simplifying the code, this fixes a race condition: when
only an RLock() is held, multiple goroutines can run the sequence
store.Modified() and store.Load(). In that case only one goroutine
sees store.Modified() return true and runs store.Load().

Other goroutines can access the old data while the goroutine that saw
the change replaces the store data.

Closes: containers#880

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Member Author

The diagnosis at #880 (comment) looks good, but I'm missing how this keeps one goroutine that's using the data without having taken the loadMut mutex from having the slice it's iterating over replaced, while it's running and unaware, by another goroutine that has the mutex.

That is protected by the other locking mechanisms we have. A goroutine that is accessing the data must already hold a RO or a RW lock. A goroutine checks ReloadIfChanged() while holding the RO lock. No changes can happen to the storage between the moment a goroutine gets the RO lock and the moment it accesses the data, because another goroutine/process would need to take the RW lock before it could make any change. Since a goroutine is already holding a RO lock (thus blocking writers), any other goroutine getting into the ReloadIfChanged() code block won't hit the r.Modified() condition.

The issue I've seen happens only when multiple goroutines check r.Modified() together without synchronizing on the Load() that only one of them performs.

Goroutine 1                                  Process 2      Goroutine 3
GetROLock()
ReloadIfChanged() [might trigger a Load()]   GetRWLock()
[use the store data]                         [wait]         GetROLock()
[use the store data]                         [wait]         ReloadIfChanged()

Goroutine 3's ReloadIfChanged() should never cause a Load(), because goroutine 1 already checked while holding the RO lock, thus blocking Process 2.
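As a sketch of the read path described above (the lock calls and the readData helper are illustrative names, not the exact containers/storage API):

    // Goroutine 1 from the table: the reload, if any, happens while the
    // RO lock is held, so writers (Process 2) stay blocked between the
    // check and the data access.
    store.RLock()
    defer store.Unlock()
    if err := store.ReloadIfChanged(); err != nil { // at most one Load()
        return err
    }
    readData(store) // Goroutine 3's later ReloadIfChanged() sees Modified() == false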

One possible race condition, though, is that we attempt the Load() also on errors:

	modified, err := r.Modified()
	if err != nil || modified {
		return r.Load()
	}

A temporary error could trigger the race, since it would call Load() while other goroutines are running. I've pushed a new version that changes the condition to err == nil && modified.
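A minimal sketch of the revised helper with the corrected condition (the receiver type is illustrative; loadMut, Modified(), and Load() follow the names used in this discussion, and the real implementation may differ):

    // ReloadIfChanged reloads the store only when Modified() reports a
    // change and no error, so a transient error can no longer trigger a
    // Load() while other goroutines are still reading the data.
    func (r *store) ReloadIfChanged() error {
        r.loadMut.Lock() // serialize concurrent reload attempts
        defer r.loadMut.Unlock()

        modified, err := r.Modified()
        if err == nil && modified {
            return r.Load()
        }
        return err
    }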

@giuseppe
Member Author

Does the explanation above make sense? Are there other cases where a race can happen?

@nalind
Member

nalind commented May 14, 2021

I was thinking of the case where one thread is calling a method like store.Images(), which takes a read lock, but another is in a method like store.ImageSize(), which also takes a read lock, and one of them alters the data the object is holding while the other is iterating through it. It's not exclusive to this pair of methods, but that's a window I don't see us closing here.

@giuseppe
Member Author

I was thinking of the case where one thread is calling a method like store.Images(), which takes a read lock, but another is in a method like store.ImageSize(), which also takes a read lock, and one of them alters the data the object is holding while the other is iterating through it. It's not exclusive to this pair of methods, but that's a window I don't see us closing here.

That is the case I've tried to explain with the table.

If one thread calls store.Images() while it holds a read lock, it has already gone through the ReloadIfChanged() check. Holding the RO lock prevents other processes from modifying the store. Another thread that takes the RO lock and gets into ReloadIfChanged() will either 1) wait for the first thread to complete ReloadIfChanged(), after which Modified() returns false, or 2) not block on the lock, but Modified() still returns false, so it doesn't reload the store while the first thread is accessing its data.
This assumption is based on the fact that any reload happens while holding the RO lock, so other threads/processes cannot modify the store in the meantime. The store can only be modified when no RO locks are in place.

@nalind
Member

nalind commented May 14, 2021

Hmm, okay, that makes sense. LGTM.

@rhatdan rhatdan merged commit 1b86066 into containers:master May 14, 2021
@vrothberg
Member

@haircommander, that should fix the issue in CRI-O

Linked issue #880: panic: runtime error: index out of range