Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guard access to Target.{frames,cur} globally #1111

Merged
merged 4 commits into from
Apr 15, 2023
Merged

Guard access to Target.{frames,cur} globally #1111

merged 4 commits into from
Apr 15, 2023

Conversation

WalkerGriggs
Copy link
Contributor

@WalkerGriggs WalkerGriggs commented Jul 12, 2022

Some accesses to Target.frames and Target.cur are not guarded by Target.frameMu unintentionally. Which leads to data race.

Fixes: #1110
Fixes: #1296

@ZekeLu
Copy link
Member

ZekeLu commented Jul 13, 2022

@WalkerGriggs Good catch! Thank you very much!

How did you find the issue? Is it possible to add a regress test for this issue? It's good to have one. But it's also okay to submit the fix without a regress test if it's hard/impossible to write the test.

@ZekeLu
Copy link
Member

ZekeLu commented Jul 13, 2022

The failed check is due to #1105, we can ignore it here.

BTW, can you change the commit message to avoid concurrent map read in (*Target).domEvent?

@WalkerGriggs
Copy link
Contributor Author

WalkerGriggs commented Jul 13, 2022

@ZekeLu, we're not entirely sure. I can spend more time digging tomorrow if that's valuable information. My best guess is overlapping calls to pageEvent and domEvent, though they're executed synchronously so that wouldn't make much sense.

Spitballing here, but could two main event loops run in parallel if Run was called multiple times given the same target? Not sure how/if that's possible in our codebase, but just something that jumped to mind when scanning that event fn path.

WRT changing commit message: certainly can

@ZekeLu
Copy link
Member

ZekeLu commented Jul 13, 2022

I can spend more time digging tomorrow if that's valuable information.

Please do digging more so that we can understand the issue better. Can you run your app with the --race flag? It will give more details.

Spitballing here, but could two main event loops run in parallel if Run was called multiple times given the same target?

I'm not sure what will happen in this case, but I think we should avoid doing it. Is there a use case that you need to call Run in parallel?

@WalkerGriggs
Copy link
Contributor Author

Please do digging more so that we can understand the issue better.

Can do. This is part of a system that has been running for months (maybe years now), and this is the first occurrence. We've dumped the goroutines when it panicked, so maybe that'll reveal something.

Can you run your app with the --race flag? It will give more details.

Maybe, but unlikely.

I'm not sure what will happen in this case, but I think we should avoid doing it. Is there a use case that you need to call Run in parallel?

Oh, we avoid that; no use case here. Just throwing out some ideas, though probably not useful until we narrow things down.

@WalkerGriggs
Copy link
Contributor Author

@ZekeLu -- following up on this. We just ran into this same issue again. I would like to rebase to the latest master and merge this change in if possible.

@ZekeLu
Copy link
Member

ZekeLu commented Feb 23, 2023

Do you have more information to share this time?

fatal error: concurrent map read and map write

I just took a quick look. There are only 3 map writes:

chromedp/target.go

Lines 298 to 305 in 1738691

t.frameMu.Lock()
t.frames[e.Frame.ID] = e.Frame
if e.Frame.ParentID == "" {
// This frame is only the new top-level frame if it has
// no parent.
t.cur = e.Frame.ID
}
t.frameMu.Unlock()

chromedp/target.go

Lines 345 to 354 in 1738691

t.frameMu.Lock()
f := t.frames[id]
if f == nil {
// This can happen if a frame is attached or starts loading
// before it's ever navigated to. We won't have all the frame
// details just yet, but that's okay.
f = &cdp.Frame{ID: id}
t.frames[id] = f
}
t.frameMu.Unlock()

chromedp/chromedp.go

Lines 290 to 298 in 1738691

if !c.Target.isWorker {
tree, err := page.GetFrameTree().Do(cdp.WithExecutor(ctx, c.Target))
if err != nil {
return err
}
c.Target.frames[tree.Frame.ID] = tree.Frame
c.Target.cur = tree.Frame.ID
c.Target.documentUpdated(ctx)
}

The first 2 writes are in the pageEvent func. But pageEvent and domEvent won't run at the same time for the same target. So it's most likely that it is the write in newTarget to blame.

If this is the case, we should add a write lock here too. I think it's possible to write a test for this case. Would you like to give it a try?

BTW, the write in newTarget only happen when you use chromedp.WithTargetID. Have your used chromedp.WithTargetID in your project?

@WalkerGriggs
Copy link
Contributor Author

Do you have more information to share this time?

@ZekeLu, unfortunately not enough to confidently reproduce this specific error case in tests. The stack traces are thin at best, and these rare occurrences are causing issues in production.

Locking the frame read when there's the risk of concurrent access feels like a quick win here.

BTW, the write in newTarget only happen when you use chromedp.WithTargetID. Have your used chromedp.WithTargetID in your project?

We do use WithTargetID when we create a new context. This line is the top-most chromdp specific line in the stacktrace. I can probably put together the full trace dump, but might have to redact some chunks.

	tabCtx, _ := chromedp.NewContext(dpctx, chromedp.WithTargetID(newTarget.TargetID))

@ZekeLu ZekeLu mentioned this pull request Apr 13, 2023
Copy link
Member

@ZekeLu ZekeLu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should take the changes from #1297 too.

And if you don't mind, please fix other minor issues related frameMu listed below:

  1. it seems that a read lock is enough:

chromedp/chromedp.go

Lines 622 to 624 in 0b936bf

c.Target.frameMu.Lock()
frameID = c.Target.cur
c.Target.frameMu.Unlock()

  1. // frameMu protects frames, execContexts, and cur.
    // frameMu protects both frames and cur.

@WalkerGriggs
Copy link
Contributor Author

Rebased to master (0b936bf).

@ZekeLu making those changes now

@WalkerGriggs
Copy link
Contributor Author

@ZekeLu I noticed you've triggered the tests few times which all leak a similar temp dir.

I ran the tests locally with the ./contrib/docker-test.sh which all pass. Let me know if I can do anything to help validate the changes / get this PR over the line though -- happy to jump in

Copy link
Member

@ZekeLu ZekeLu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's another known issue tracking by #1105. We can ignore the error here.

Thank you @WalkerGriggs, @xiusin and @alexandear!

@ZekeLu ZekeLu changed the title Wrap target frame map in domEvent with a read lock Guard access to Target.{frames,cur} globally Apr 15, 2023
@ZekeLu ZekeLu merged commit 4ea2300 into chromedp:master Apr 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

concurrent map writes Target.domEvent panics with a concurrent frame read/write
2 participants