Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix concurrent map writes in executor #2947

Merged

Conversation

ivan-valkov
Copy link
Contributor

What this PR does / why we need it:

We have a map in the executor package that can be written to concurrently. This leads to panics. To reproduce this on master you can run go test ./... -count=100 in the executor package. Example output:

fatal error: concurrent map writes
goroutine 6262 [running]:
runtime.throw(0x1772445, 0x15)
        /usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc000765ce0 sp=0xc000765cb0 pc=0x4397d2
runtime.mapassign_faststr(0x158d840, 0xc000960c30, 0x0, 0x0, 0x0)
        /usr/local/go/src/runtime/map_faststr.go:211 +0x3f1 fp=0xc000765d48 sp=0xc000765ce0 pc=0x416431
github.com/seldonio/seldon-core/executor/predictor.(*PredictorProcess).transformInput(0xc0006d45a0, 0xc0004c4000, 0x1940320, 0xc000416070, 0x0, 0x7f49c931ce98, 0xc0004c4000, 0x0)
        /home/ivan/code/seldon-core/executor/predictor/predictor_process.go:92 +0x3d3 fp=0xc000765df8 sp=0xc000765d48 pc=0x1457053
github.com/seldonio/seldon-core/executor/predictor.(*PredictorProcess).Predict(0xc0006d45a0, 0xc0004c4000, 0x1940320, 0xc000416070, 0x0, 0x0, 0x100000000000000, 0x0)
        /home/ivan/code/seldon-core/executor/predictor/predictor_process.go:332 +0x185 fp=0xc000765ea8 sp=0xc000765df8 pc=0x1459285
github.com/seldonio/seldon-core/executor/predictor.(*PredictorProcess).predictChildren.func1(0xc0006d45a0, 0xc0003fc4a0, 0xc0003fc7e0, 0x2, 0x2, 0xc000d26050, 0x0, 0x0, 0x0, 0x0, ...)
        /home/ivan/code/seldon-core/executor/predictor/predictor_process.go:214 +0xa5 fp=0xc000765f00 sp=0xc000765ea8 pc=0x1467ea5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000765f08 sp=0xc000765f00 pc=0x471821
created by github.com/seldonio/seldon-core/executor/predictor.(*PredictorProcess).predictChildren
        /home/ivan/code/seldon-core/executor/predictor/predictor_process.go:213 +0x33d

With this change we don't see this error anymore:

go test ./... -count=100
?       github.com/seldonio/seldon-core/executor/api    [no test files]
?       github.com/seldonio/seldon-core/executor/api/client     [no test files]
ok      github.com/seldonio/seldon-core/executor/api/grpc       0.021s
?       github.com/seldonio/seldon-core/executor/api/grpc/kfserving     [no test files]
?       github.com/seldonio/seldon-core/executor/api/grpc/kfserving/inference   [no test files]
ok      github.com/seldonio/seldon-core/executor/api/grpc/seldon        110.944s
?       github.com/seldonio/seldon-core/executor/api/grpc/seldon/proto  [no test files]
?       github.com/seldonio/seldon-core/executor/api/grpc/seldon/test   [no test files]
ok      github.com/seldonio/seldon-core/executor/api/grpc/tensorflow    0.032s
ok      github.com/seldonio/seldon-core/executor/api/kafka      0.019s
ok      github.com/seldonio/seldon-core/executor/api/metric     0.010s
ok      github.com/seldonio/seldon-core/executor/api/payload    0.019s
ok      github.com/seldonio/seldon-core/executor/api/rest       101.686s
?       github.com/seldonio/seldon-core/executor/api/test       [no test files]
ok      github.com/seldonio/seldon-core/executor/api/tracing    0.149s
ok      github.com/seldonio/seldon-core/executor/api/util       0.036s
?       github.com/seldonio/seldon-core/executor/cmd/executor   [no test files]
?       github.com/seldonio/seldon-core/executor/cmd/proxy      [no test files]
ok      github.com/seldonio/seldon-core/executor/k8s    0.035s
?       github.com/seldonio/seldon-core/executor/logger [no test files]
ok      github.com/seldonio/seldon-core/executor/predictor      3.227s
?       github.com/seldonio/seldon-core/executor/proto/tensorflow/serving       [no test files]

We should check for other possible maps that are written to concurrently and are not protected properly.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

no

@seldondev
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign ivan-valkov
You can assign the PR to them by writing /assign @ivan-valkov in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ivan-valkov ivan-valkov changed the base branch from master to v1.5.2-release February 15, 2021 11:30
@ivan-valkov ivan-valkov changed the base branch from v1.5.2-release to master February 15, 2021 11:30
@seldondev
Copy link
Collaborator

failed to trigger Pull Request pipeline

  • failed to create agent
  • failed to calculate in repo config
  • failed to load trigger config for repository SeldonIO/seldon-core for ref 642cd7d
  • failed to find any lighthouse configuration files in repo SeldonIO/seldon-core at sha 642cd7d
  • failed to process repo SeldonIO/seldon-core refref 642cd7d
  • failed to list files in directory /var/tmp/gitrepo059637228/.lighthouse
  • open /var/tmp/gitrepo059637228/.lighthouse
  • no such file or directory

@ivan-valkov ivan-valkov changed the title Fix concurrent map access in executor Fix concurrent map writes in executor Feb 16, 2021
@seldondev
Copy link
Collaborator

failed to trigger Pull Request pipeline

  • failed to create agent
  • failed to calculate in repo config
  • failed to load trigger config for repository SeldonIO/seldon-core for ref 642cd7d
  • failed to find any lighthouse configuration files in repo SeldonIO/seldon-core at sha 642cd7d
  • failed to process repo SeldonIO/seldon-core refref 642cd7d
  • failed to list files in directory /var/tmp/gitrepo059637228/.lighthouse
  • open /var/tmp/gitrepo059637228/.lighthouse
  • no such file or directory

@RafalSkolasinski
Copy link
Contributor

/test integration
/test notebooks

@ivan-valkov
Copy link
Contributor Author

@RafalSkolasinski I think we need this merged in master first, then I can rebase on top of this - #2954

@ivan-valkov
Copy link
Contributor Author

/test notebooks
/test integration

@ivan-valkov
Copy link
Contributor Author

/test notebooks

@ivan-valkov
Copy link
Contributor Author

/test integration

1 similar comment
@ivan-valkov
Copy link
Contributor Author

/test integration

@ivan-valkov
Copy link
Contributor Author

/test integration

@seldondev
Copy link
Collaborator

@ivan-valkov: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
integration ae5bd4f link /test integration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository. I understand the commands that are listed here.

@ukclivecox ukclivecox merged commit 9abbfba into SeldonIO:master Feb 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants