upgrades briefly go to a Failed state before Available but work the whole time #2044

Closed
ryandawsonuk opened this issue Jun 29, 2020 · 1 comment · Fixed by #2368

@ryandawsonuk
Contributor

To reproduce this, first apply https://github.com/SeldonIO/seldon-core/blob/59b0d17348393ddd196324a156ac7157de242a52/testing/resources/graph8.json

Watch the status in another window: kubectl get sdep mymodel -o=jsonpath='{.status.state}' -w

In a third window, follow the seldon-controller-manager logs.

Then apply the same manifest with a trivial change, e.g. double resources.requests.memory.

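The watch prints the states without separators, so the output looks like CreatingAvailableAvailableAvailableCreatingCreatingAvailableFailedAvailable. That is: Creating, Available, Available, Available, Creating, Creating, Available, Failed, Available. So the status very briefly goes to Failed before returning to Available.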
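For anyone scripting this reproduction, here is a minimal Go sketch that splits the separator-less watch output back into individual states and counts how often the status dips into Failed and then recovers. The function names (splitStates, transientFailures) are hypothetical, and the list of states is an assumption limited to the three values seen in this issue; a real SeldonDeployment may report others.

```go
package main

import "strings"

// knownStates lists only the status values observed in this issue.
// Assumption: other values may exist and would need adding here.
var knownStates = []string{"Creating", "Available", "Failed"}

// splitStates breaks a separator-less stream such as "CreatingAvailable"
// into individual state names. ok is false if some part of the input
// matches none of the known states.
func splitStates(stream string) (states []string, ok bool) {
	for len(stream) > 0 {
		matched := false
		for _, s := range knownStates {
			if strings.HasPrefix(stream, s) {
				states = append(states, s)
				stream = stream[len(s):]
				matched = true
				break
			}
		}
		if !matched {
			return states, false
		}
	}
	return states, true
}

// transientFailures counts episodes where the status is Failed and
// later reaches Available again. Consecutive Failed entries count as
// one episode.
func transientFailures(states []string) int {
	n := 0
	pending := false
	for _, s := range states {
		switch s {
		case "Failed":
			pending = true
		case "Available":
			if pending {
				n++
				pending = false
			}
		}
	}
	return n
}
```

Feeding the output quoted above into splitStates and then transientFailures reports one transient failure, which is the behaviour this issue describes.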

The logs show the following (although these entries appear quite a long time before the status passes through Failed):

2020-06-29T12:51:43.023Z        DEBUG   controller-runtime.manager.events       Warning {"object": {"kind":"SeldonDeployment","namespace":"default","name":"mymodel","uid":"38ef1a93-9609-46e7-ba05-58929da94160","apiVersion":"machinelearning.seldon.io/v1","resourceVersion":"45937648"}, "reason": "InternalError", "message": "Operation cannot be fulfilled on deployments.apps \"mymodel-mymodel-0-complex-model\": the object has been modified; please apply your changes to the latest version and try again"}
2020-06-29T12:51:43.028Z        ERROR   controllers.SeldonDeployment    Failed to update InferenceService status        {"SeldonDeployment": "default/mymodel", "error": "Operation cannot be fulfilled on seldondeployments.machinelearning.seldon.io \"mymodel\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
github.com/seldonio/seldon-core/operator/controllers.(*SeldonDeploymentReconciler).updateStatusForError
        /workspace/controllers/seldondeployment_controller.go:1525
github.com/seldonio/seldon-core/operator/controllers.(*SeldonDeploymentReconciler).Reconcile
        /workspace/controllers/seldondeployment_controller.go:1483
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
        /go/pkg/mod/k8s.io/apimachinery@v0.17.2/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/apimachinery@v0.17.2/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
        /go/pkg/mod/k8s.io/apimachinery@v0.17.2/pkg/util/wait/wait.go:88
2020-06-29T12:51:43.028Z        ERROR   controller-runtime.controller   Reconciler error        {"controller": "seldon-controller-manager", "request": "default/mymodel", "error": "Operation cannot be fulfilled on deployments.apps \"mymodel-mymodel-0-complex-model\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
        /go/pkg/mod/k8s.io/apimachinery@v0.17.2/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/apimachinery@v0.17.2/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
        /go/pkg/mod/k8s.io/apimachinery@v0.17.2/pkg/util/wait/wait.go:88
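The "the object has been modified" errors above are optimistic-concurrency conflicts: an update was based on a stale copy of the object. As an illustration only (this is not the operator's code, and not necessarily how the fix linked to this issue works), here is a toy Go sketch of the usual re-read-and-retry pattern. Everything in it (store, object, retryOnConflict, the version counter) is a hypothetical stand-in for the API server and resourceVersion; against a real cluster, client-go provides retry.RetryOnConflict in k8s.io/client-go/util/retry for this purpose.

```go
package main

import "errors"

// errConflict stands in for the API server's
// "the object has been modified" error.
var errConflict = errors.New("conflict: the object has been modified")

// object is a toy resource; version loosely models resourceVersion.
type object struct {
	version int
	memory  string
}

// store is a toy in-memory API server holding a single object.
type store struct {
	current object
}

// get returns a copy of the stored object.
func (s *store) get() object { return s.current }

// update succeeds only if the caller's copy is based on the latest
// stored version; otherwise it reports a conflict.
func (s *store) update(o object) error {
	if o.version != s.current.version {
		return errConflict
	}
	o.version++
	s.current = o
	return nil
}

// retryOnConflict re-reads the object and reapplies mutate each time
// update reports a conflict, for at most the given number of attempts.
// It returns errConflict if no attempt succeeded.
func retryOnConflict(s *store, attempts int, mutate func(*object)) error {
	for i := 0; i < attempts; i++ {
		o := s.get()
		mutate(&o)
		err := s.update(o)
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return errConflict
}
```

The point of the pattern is that a conflict on its own is not a failure of the deployment, only a signal to fetch the latest version and try again, so surfacing it as a Failed status is misleading.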
@ryandawsonuk ryandawsonuk added the bug and triage (Needs to be triaged and prioritised accordingly) labels Jun 29, 2020
@axsaucedo axsaucedo modified the milestones: 1.2, 1.3 Jun 30, 2020
@ukclivecox
Contributor

Maybe connected to #2095

@ukclivecox ukclivecox added the priority/p1 label and removed the triage (Needs to be triaged and prioritised accordingly) label Jul 9, 2020
@ukclivecox ukclivecox self-assigned this Aug 20, 2020