installing sdep straight after operator seems not fully reliable #669

Closed
ryandawsonuk opened this issue Jul 2, 2019 · 11 comments

@ryandawsonuk
Contributor

We (myself and @SachinVarghese) are running

helm install --name seldon-core ../../helm-charts/seldon-core-operator/ --namespace seldon-system --set istio.gateway="kubeflow-gateway.kubeflow.svc.cluster.local" --set istio.enabled="true"

kubectl rollout status -n seldon-system statefulset/seldon-operator-controller-manager

sleep 5

helm install --name seldon-single-model ../../helm-charts/seldon-single-model/

Most of the time it works, but occasionally the last command fails. When it does, we have to delete both releases and run everything again. We're not sure why yet and still need to reproduce it reliably. This is a note to return to in the future.

@ukclivecox
Contributor

How does it fail? Any logs to add to this?

@ryandawsonuk
Contributor Author

So it was

Error: release seldon-single-model failed: Internal error occurred: failed calling webhook "mutating-create-update-seldondeployment.seldon.io": Post https://webhook-server-service.seldon-system.svc:443/mutating-create-update-seldondeployment?timeout=30s: dial tcp 10.107.97.243:443: connect: connection refused

But it turns out this has already been fixed by b1f09c1. I just had to update the fork I was running from.
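
For anyone who cannot pick up that commit, one possible workaround for the `connection refused` error is to replace the fixed `sleep 5` with a bounded retry, so the model install is attempted again once the webhook server accepts connections. This is only a sketch and not necessarily what b1f09c1 does. The `retry` and `install_sdep` helpers are hypothetical names, and it assumes Helm 2 (as in the commands above), where a failed release must be purged before reinstalling under the same name, and that purging only the model release is enough.

```shell
#!/usr/bin/env bash

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Hypothetical helper; not part of helm, kubectl, or Seldon Core.
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# install_sdep: purge any failed release left by a previous attempt,
# then install the model chart again (Helm 2 syntax, paths from this issue).
install_sdep() {
  helm delete --purge seldon-single-model >/dev/null 2>&1 || true
  helm install --name seldon-single-model ../../helm-charts/seldon-single-model/
}

# Usage (commented out because it needs a cluster):
# kubectl rollout status -n seldon-system statefulset/seldon-operator-controller-manager
# retry 5 10 install_sdep
```

The retry count and delay are arbitrary; the point is that the wait is driven by the install actually succeeding rather than by a fixed sleep.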

FYI @SachinVarghese

@brunowego
Contributor

@ryandawsonuk I get a similar issue. Any tips for solving it? Thanks.

Error from server (InternalError): error when creating "deployment.json": Internal error occurred: failed calling webhook "mutating-create-update-seldondeployment.seldon.io": Post https://webhook-server-service.seldon-system.svc:443/mutating-create-update-seldondeployment?timeout=30s: x509: certificate signed by unknown authority

@ukclivecox
Contributor

Does the issue remain, or can you eventually create deployments?
If it remains, can you check the logs of the manager and check that the Pod created is running OK?

@brunowego
Contributor

@cliveseldon yep, the logs show the issue:

$ kubectl logs -n seldon-system seldon-operator-controller-manager-0
{"level":"info","ts":1567514444.689117,"logger":"entrypoint","msg":"setting up client for manager"}
{"level":"info","ts":1567514444.7105503,"logger":"entrypoint","msg":"setting up manager"}
{"level":"info","ts":1567514444.9495192,"logger":"entrypoint","msg":"Registering Components."}
{"level":"info","ts":1567514444.9495943,"logger":"entrypoint","msg":"setting up scheme"}
{"level":"info","ts":1567514444.9499424,"logger":"entrypoint","msg":"setting up istio scheme"}
{"level":"info","ts":1567514444.9501355,"logger":"entrypoint","msg":"Setting up controller"}
{"level":"info","ts":1567514444.9512691,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"seldondeployment-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1567514444.952848,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"seldondeployment-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1567514444.9530406,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"seldondeployment-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1567514444.9532733,"logger":"entrypoint","msg":"setting up webhooks"}
{"level":"info","ts":1567514444.9534113,"logger":"entrypoint","msg":"Starting the Cmd."}
{"level":"info","ts":1567514445.055113,"logger":"kubebuilder.webhook","msg":"installing webhook configuration in cluster"}
{"level":"info","ts":1567514445.0553577,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"seldondeployment-controller"}
{"level":"info","ts":1567514445.16635,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"seldondeployment-controller","worker count":1}
2019/09/03 12:41:05 http: TLS handshake error from 10.244.2.78:60210: tls: first record does not look like a TLS handshake
2019/09/03 12:42:05 http: TLS handshake error from 10.244.2.78:60702: tls: first record does not look like a TLS handshake
2019/09/03 12:43:05 http: TLS handshake error from 10.244.2.78:32956: tls: first record does not look like a TLS handshake
2019/09/03 12:44:05 http: TLS handshake error from 10.244.2.78:33476: tls: first record does not look like a TLS handshake
2019/09/03 12:45:05 http: TLS handshake error from 10.244.2.78:33962: tls: first record does not look like a TLS handshake

@ukclivecox
Contributor

These errors can be ignored: first record does not look like a TLS handshake
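
If those lines are indeed safe to ignore, a small filter makes the remaining manager log easier to read. This is a minimal sketch; `drop_handshake_noise` is a hypothetical helper name, and it matches only the exact message quoted above.

```shell
# drop_handshake_noise: reads log lines on stdin and drops the
# "first record does not look like a TLS handshake" entries.
# Note: grep -v exits non-zero when no lines remain after filtering.
drop_handshake_noise() {
  grep -v 'tls: first record does not look like a TLS handshake'
}

# Usage (needs a cluster):
# kubectl logs -n seldon-system seldon-operator-controller-manager-0 | drop_handshake_noise
```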

@brunowego
Contributor

@cliveseldon I'm focusing on understanding and resolving the line below:

{"level":"info","ts":1567519576.9951558,"logger":"kubebuilder.admission.cert.writer","msg":"cert is invalid or expiring, regenerating a new one"}

@brunowego
Contributor

brunowego commented Sep 3, 2019

I think this issue is related to elastic/cloud-on-k8s#896 (comment):

$ kubectl get ValidatingWebhookConfiguration validating-webhook-configuration -o yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  creationTimestamp: "2019-09-03T14:13:10Z"
  generation: 1
  name: validating-webhook-configuration
  resourceVersion: "4761836"
  selfLink: /apis/admissionregistration.k8s.io/v1beta1/validatingwebhookconfigurations/validating-webhook-configuration
  uid: 04630e58-f83a-4912-8cfa-d3e0eba66fcd
webhooks:
- admissionReviewVersions:
  - v1beta1
  clientConfig:
    caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMwakNDQWJxZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFhTVJnd0ZnWURWUVFERXc5M1pXSm8KYjI5ckxXTmxjblF0WTJFd0hoY05NVGt3T1RBek1UUXhNekV3V2hjTk1qa3dPRE14TVRReE16RXdXakFhTVJndwpGZ1lEVlFRREV3OTNaV0pvYjI5ckxXTmxjblF0WTJFd2dnRWlNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3CmdnRUtBb0lCQVFDZlY1YjNxTTh2MHZoTFpQN2FIeXFRL3c0RFh0MXcvd091eEttNzlJUFB4OVhuRC9xcU55SnMKM3Jtb2NFSzVHdEM1S0dXb0FKR05USkxOMGp6cWNSNVl2Mzh0QmEvc0lsNGZiYlh5eE9BcmlvZ1UxaUdlTldGUwo5WFhWTWd6YkRrNWV2VGJ2Z2lvVE55SXBJRG9aaDZ2R2tnekI3bk9iQVJJSU1RcWtHdDQyZUxnMWFJeEFod3VCCmxBYkVUUmxyNGRmSy8raFliVEFsd0lwa3lkT05LZjQwOWN3U3h5NzhlWHo5eTBsRVVoT1grZVc2YkpoNFI1T3cKM3ArWndsVXQrc0ZyTGNWTDcrQTYwWGhJaVRNeU1BWVRMTG0wZ1JCSUhHcDZWbmdVWWFHTUdlMXgzQWZVck1JNApMekFLM0psZzZGR3o0bmZrQ25nbWVTbDh4UGN1YjhMcEFnTUJBQUdqSXpBaE1BNEdBMVVkRHdFQi93UUVBd0lDCnBEQVBCZ05WSFJNQkFmOEVCVEFEQVFIL01BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQkp6TzRiWDQ4QTRMdVMKQjdLWko4M3NWUmQ3cHFqKy9hSWdtbGNtSXpvYk1qd0hRaWN4Yk1hbWJWUVdvY0tCd1cwekYxMVdIQkZocEtzVwpmU2tCQUdTaFV2Slh4MXg3TFU5YWJLUVNkdUdLTkU4ZDhHeEYyTFZJdXEyekRLUDVjbk1PTERHMk9LU2hWR3ljClVwb1ZvOGxWbUpKeFV2V1lFd1JBYklxS3RmbVFDUnJpYnFxc2liY04xcFRCd3dMWWVaOUV5K1l5M1NydjUwMnYKQTRYN2hHMDFlcThRTTQ4TUNzQk1JQzl6YURZdHg5dHEvTFVlMVIvYUJCYjdEVFd1ZmlPcm8yNk9jdmZnS3ZQQwpZZzl1VU5pUjhzWEQyT25mYzdBZEhQcTFPYmt6OHAvdFV2U2F3RjNJYXh0TmIxK1RIKzkvb3JyMXE3bzJuZkNYCjRBaTZ2NHJiCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
    service:
      name: webhook-server-service
      namespace: seldon-system
      path: /validating-create-update-seldondeployment
      port: 443
  failurePolicy: Fail
  matchPolicy: Exact
  name: validating-create-update-seldondeployment.seldon.io
  namespaceSelector:
    matchExpressions:
    - key: control-plane
      operator: DoesNotExist
  objectSelector: {}
  rules:
  - apiGroups:
    - machinelearning.seldon.io
    apiVersions:
    - v1alpha2
    operations:
    - CREATE
    - UPDATE
    resources:
    - seldondeployments
    scope: '*'
  sideEffects: Unknown
  timeoutSeconds: 30
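
The `x509: certificate signed by unknown authority` error reported earlier in this thread means the API server could not verify the webhook server's certificate against the `caBundle` in the webhook configuration. A minimal sketch for pulling that bundle out and inspecting it follows. The helper names `decode_ca_bundle` and `webhook_ca_pem` are hypothetical, it assumes the first webhook entry is the one of interest, and the final inspection step assumes `openssl` is installed locally.

```shell
# decode_ca_bundle: reads a base64-encoded caBundle on stdin, prints the PEM.
decode_ca_bundle() {
  base64 -d
}

# webhook_ca_pem KIND NAME: fetch the first webhook's caBundle from the cluster.
# KIND is validatingwebhookconfiguration or mutatingwebhookconfiguration.
webhook_ca_pem() {
  kubectl get "$1" "$2" \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | decode_ca_bundle
}

# Usage (needs a cluster and openssl):
# webhook_ca_pem validatingwebhookconfiguration validating-webhook-configuration \
#   | openssl x509 -noout -subject -issuer -enddate
```

If the subject or dates printed here do not match the CA that signed the certificate the webhook server is currently serving, the configuration is holding a stale bundle, which would be consistent with the "regenerating a new one" log line quoted earlier.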

@ukclivecox
Contributor

Could the solution be elastic/cloud-on-k8s#896 (comment)?

@ukclivecox ukclivecox reopened this Sep 3, 2019
@ukclivecox ukclivecox added this to To do in 0.4.1 via automation Sep 3, 2019
@ukclivecox ukclivecox added this to the 0.5.x milestone Sep 3, 2019
@ukclivecox ukclivecox self-assigned this Sep 5, 2019
@ukclivecox
Contributor

@brunowego Can we close this if you solved your issue?

@ukclivecox
Contributor

Closing. Please reopen if not solved.

0.4.1 automation moved this from To do to Done Sep 17, 2019