Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(datarace): Fix datarace issue in spanstat.go #11615

Merged
merged 1 commit into from Jun 3, 2020

Conversation

sayboras
Copy link
Member

As mentioned in #10991, successDuration was not safely access from
eventqueue

Add RWLock to make this struct thread-safe

Closes #10991

Signed-off-by: Tam Mach sayboras@yahoo.com

Fix datarace issue in spanstat.go

@maintainer-s-little-helper
Copy link

Please set the appropriate release note label.

@joestringer
Copy link
Member

Hi @sayboras , do you need help or feedback on this issue/PR? It is currently marked as a release blocker for v1.8 so we will want to determine whether it's vital to merge for v1.8 and if so, help to get it merged.

@sayboras sayboras force-pushed the bugfix/span-unsafe branch 2 times, most recently from 267eb27 to 8aa4d50 Compare June 2, 2020 09:49
@sayboras
Copy link
Member Author

sayboras commented Jun 2, 2020

do you need help or feedback on this issue/PR? It is currently marked as a release blocker for v1.8 so we will want to determine whether it's vital to merge for v1.8 and if so, help to get it merged.

@joestringer hi, if you can help on this. I added lock for SpanStat struct, however, was having below copylocks issue.

To resolve this, I need to use pointer (e.g. lock *lock.RWMutex), but then the drawback is to have constructor to initialize values. This must be done for all instances of SpanStat struct.

Looking forward to hearing your input. Thanks

copylocks issue
$ golangci-lint run ./... --disable-all -E govet 
daemon/cmd/fqdn.go:408:164: copylocks: notifyOnDNSMsg passes lock by value: github.com/cilium/cilium/pkg/fqdn/dnsproxy.ProxyRequestContext contains github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
func (d *Daemon) notifyOnDNSMsg(lookupTime time.Time, ep *endpoint.Endpoint, epIPPort string, serverAddr string, msg *dns.Msg, protocol string, allowed bool, stat dnsproxy.ProxyRequestContext) error {
                                                                                                                                                                   ^
pkg/endpoint/policy.go:215:33: copylocks: literal copies lock value from stats.waitingForIdentityCache: github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                "waitingForIdentityCache":    stats.waitingForIdentityCache,
                                              ^
pkg/endpoint/policy.go:216:33: copylocks: literal copies lock value from stats.waitingForPolicyRepository: github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                "waitingForPolicyRepository": stats.waitingForPolicyRepository,
                                              ^
pkg/endpoint/policy.go:217:33: copylocks: literal copies lock value from stats.policyCalculation: github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                "policyCalculation":          stats.policyCalculation,
                                              ^
pkg/fqdn/dnsproxy/proxy.go:373:77: copylocks: call of p.NotifyOnDNSMsg copies lock value: github.com/cilium/cilium/pkg/fqdn/dnsproxy.ProxyRequestContext contains github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                p.NotifyOnDNSMsg(time.Now(), nil, epIPPort, "", request, protocol, false, stat)
                                                                                          ^
pkg/fqdn/dnsproxy/proxy.go:382:77: copylocks: call of p.NotifyOnDNSMsg copies lock value: github.com/cilium/cilium/pkg/fqdn/dnsproxy.ProxyRequestContext contains github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                p.NotifyOnDNSMsg(time.Now(), nil, epIPPort, "", request, protocol, false, stat)
                                                                                          ^
pkg/fqdn/dnsproxy/proxy.go:394:90: copylocks: call of p.NotifyOnDNSMsg copies lock value: github.com/cilium/cilium/pkg/fqdn/dnsproxy.ProxyRequestContext contains github.com/cilium/cilium/pkg/spanstat.SpanStat contains github.com/cilium/cilium/pkg/lock.RWMutex (govet)
                p.NotifyOnDNSMsg(time.Now(), ep, epIPPort, targetServerAddr, request, protocol, false, stat)

@sayboras sayboras marked this pull request as ready for review June 2, 2020 10:13
@sayboras sayboras requested a review from a team June 2, 2020 10:13
@sayboras sayboras requested review from a team as code owners June 2, 2020 10:13
Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small non-blocking nit

pkg/spanstat/spanstat_test.go Outdated Show resolved Hide resolved
@aanm aanm added kind/bug This is a bug in the Cilium logic. needs-backport/1.8 release-note/bug This PR fixes an issue in a previous release of Cilium. labels Jun 2, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.8.0 Jun 2, 2020
@pchaigno
Copy link
Member

pchaigno commented Jun 2, 2020

test-me-please

@coveralls
Copy link

coveralls commented Jun 2, 2020

Coverage Status

Coverage increased (+0.02%) to 36.922% when pulling db6872a on sayboras:bugfix/span-unsafe into 3da5bcf on cilium:master.

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, passing stats by reference everywhere makes sense and is a reasonable way to resolve the govet issue.

It'd be good to squash the two commits together just to ensure that we won't hit any issues in the build process in future in case we need to bisect the tree across this commit.

@@ -43,19 +45,30 @@ func Start() *SpanStat {

// Start starts a new span
func (s *SpanStat) Start() *SpanStat {
s.mutex.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, If you compose the SpanStat from lock.RWMutex directly (ie no field name), then these can be s.Lock() / defer s.Unlock() and you don't have to worry about whether to call it mutex or lock or so on :-)

FWIW we aren't consistently composing objects from locks this way across the codebase so I'm not fussed about which approach we take here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, it's good to know. Composing really looks code look nicer. To make it easier for other reviewers, I will just leave as it is now. Will do it in another PR if required.

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sayboras I think the Runtime-4.9 failure is relevant here:

# github.com/cilium/cilium/pkg/fqdn/dnsproxy [github.com/cilium/cilium/pkg/fqdn/dnsproxy.test]
pkg/fqdn/dnsproxy/proxy_test.go:202:3: cannot use func literal (type func(time.Time, *endpoint.Endpoint, string, string, *dns.Msg, string, bool, ProxyRequestContext) error) as type NotifyOnDNSMsgFunc in argument to StartDNSProxy

You can reproduce by running privileged unit tests locally.

@sayboras
Copy link
Member Author

sayboras commented Jun 2, 2020

I think the Runtime-4.9 failure is relevant here:

# github.com/cilium/cilium/pkg/fqdn/dnsproxy [github.com/cilium/cilium/pkg/fqdn/dnsproxy.test]
pkg/fqdn/dnsproxy/proxy_test.go:202:3: cannot use func literal (type func(time.Time, *endpoint.Endpoint, string, string, *dns.Msg, string, bool, ProxyRequestContext) error) as type NotifyOnDNSMsgFunc in argument to StartDNSProxy

You can reproduce by running privileged unit tests locally.

Thanks for pointing out, it's my silly mistake. I didn't enable custom build tag in my Goland settings. :)

As mentioned in cilium#10991, successDuration was not safely access from
eventqueue

Add RWLock to make this struct thread-safe
Using pointer for spanstat to avoid copylocks issue highlighted by govet

Closes cilium#10991

Signed-off-by: Tam Mach <sayboras@yahoo.com>
@joestringer
Copy link
Member

test-me-please

@sayboras
Copy link
Member Author

sayboras commented Jun 3, 2020

@joestringer @pchaigno seems like all required jobs finished succesfully now, thanks for your inputs, and do let me know if anything else is required.

@christarazi christarazi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 3, 2020
@borkmann borkmann merged commit 6bc5c74 into cilium:master Jun 3, 2020
1.8.0 automation moved this from In progress to Merged Jun 3, 2020
@sayboras sayboras deleted the bugfix/span-unsafe branch June 3, 2020 08:20
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.0 Jun 3, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.0 Jun 3, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.0 Jun 4, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.0 Jun 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.8.0
  
Merged
1.8.0
Backport done to v1.8
Development

Successfully merging this pull request may close these issues.

DATA RACE: s.successDuration unsafely accessed
8 participants