Possible Deadlock in TagValueIterator() and AddSeriesList() #26164

Open
line301 opened this issue Mar 19, 2025 · 2 comments
Assignees
Labels
1.x area/tsi area/2.x OSS 2.0 related issues and PRs

Comments


line301 commented Mar 19, 2025

Issue Summary
I suspect a potential deadlock related to the TagValueIterator() function when interacting with AddSeriesList().

Possible Deadlock Scenario
The issue appears to arise due to conflicting RLock() and Lock() calls on f.mu within LogFile. Specifically:

  1. TagValueIterator() acquires an RLock() on f.mu.
  2. It then calls tk.TagValueIterator(), which attempts to acquire a second RLock() on tk.f.mu (the same mutex as f.mu).
  3. Between those two RLock() calls, AddSeriesList() calls Lock() on f.mu and blocks behind the read lock that is still held.
  4. This deadlocks: per the sync.RWMutex documentation, a goroutine blocked in Lock() excludes any new readers, so the second RLock() waits on the pending writer while the writer waits on the first read lock, and neither can proceed.

Relevant Code
TagValueIterator() (log_file.go)

func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
    f.mu.RLock() // First RLock
    defer f.mu.RUnlock()
    
    mm, ok := f.mms[string(name)]
    if !ok {
        return nil
    }

    tk, ok := mm.tagSet[string(key)]
    if !ok {
        return nil
    }
    return tk.TagValueIterator() // Calls tk.TagValueIterator(), which also acquires RLock
}

tk.TagValueIterator() (log_file.go)

func (tk *logTagKey) TagValueIterator() TagValueIterator {
    tk.f.mu.RLock() // Second RLock (on the same f.mu)
    a := make([]logTagValue, 0, len(tk.tagValues))
    for _, v := range tk.tagValues {
        a = append(a, v)
    }
    tk.f.mu.RUnlock()

    return newLogTagValueIterator(a)
}

AddSeriesList() (log_file.go)

func (f *LogFile) AddSeriesList(...) {
    // ...

    f.mu.Lock() // Write lock on f.mu
    defer f.mu.Unlock()

    // ...
}

pprof Output When Deadlock Occurred

goroutine 106814401 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0xc00015020c?, 0x78?, 0x3?)
        /usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).Lock(0xc023a48620?)
        /usr/local/go/src/sync/rwmutex.go:152 +0x71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).AddSeriesList(0xc0095a71d0, 0xc000150200, {0xc00863f800?, 0x13, 0x0?}, {0xc00863fb00?, 0x13, 0xc00e37daf8?})
       influxdb-2.6.0/tsdb/index/tsi1/log_file.go:545 +0x4a5
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Partition).createSeriesListIfNotExists(0xc037ff10e0, {0xc00863f800, 0x13, 0x20}, {0xc00863fb00, 0x13, 0x20})
       influxdb-2.6.0/tsdb/index/tsi1/partition.go:725 +0x165
goroutine 106814631 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0x3318308?, 0x38?, 0xc?)
        /usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
        /usr/local/go/src/sync/rwmutex.go:71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*logTagKey).TagValueIterator(0xc02a1a6fb8)
       influxdb-2.6.0/tsdb/index/tsi1/log_file.go:1385 +0x51
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).TagValueIterator(0xc0095a71d0?, {0xc04537e640?, 0xa?, 0x158ed72?}, {0xc03be04a20, 0x9, 0x28?})
       influxdb-2.6.0/tsdb/index/tsi1/log_file.go:432 +0x185

Currently, the only way to recover from this issue is to restart InfluxDB, which is problematic.


devanbenz commented Mar 20, 2025

I'm not sure a deadlock results from upgrading the lock type of f.mu here, since I don't see AddSeriesList called within the same scope as TagValueIterator. As the following example shows (https://goplay.tools/snippet/67evkRgdE9R), the only way a deadlock occurs is when a.mu.RLock() is held and a call into a.two() is then made; if the functions are called from independent scopes, they do not deadlock.

Are you seeing:

fatal error: all goroutines are asleep - deadlock!

in your stack traces?

@devanbenz devanbenz self-assigned this Mar 20, 2025
@devanbenz devanbenz added area/tsi 1.x area/2.x OSS 2.0 related issues and PRs labels Mar 20, 2025

line301 commented Mar 20, 2025

I think the example at https://goplay.tools/snippet/DNGy9ENYYHj is more relevant. In this example, neither "Second RLock" nor "First Lock" is printed.

When this issue occurs, it seems that schema.measurementTagValues is called when a new series is added. In my running program, there was additional code that periodically checks tag values.

Could you check this condition?

To make it easier to reproduce the issue, I modified the Go code by adding a delay between the first and second RLock calls in TagValueIterator. Then, if one client inserts a new series while another client checks tag values, the issue occurs consistently.
