Failed to run Yatai server in on-premise K8S #252

Closed
thechaos16 opened this issue Jun 8, 2022 · 6 comments
@thechaos16

Hello, BentoML team.

I have recently been trying to use BentoML and Yatai on our on-premise K8S cluster, but it fails because our cluster has no load balancer (LB) service. Is there a guide or workaround for deploying Yatai on non-cloud K8S?

Thank you.

Below are a few error messages.

The error appears when I try to push a bento to Yatai (yatai login succeeds):
[Screenshot, 2022-06-07 3:33:16 PM]

I found that bentoml push queries the pod named deployment-yatai-deployment-comp-operator in the yatai-operator namespace, and that pod logs the error below. It shows that the yatai-ingress-controller-ingress-nginx-controller service has no external IP.

2022-06-07T06:36:31.318Z	INFO	controller-runtime.manager.controller.deployment	getting Deployment ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.318Z	INFO	controller-runtime.manager.controller.deployment	Deployment getting successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.318Z	INFO	controller-runtime.manager.controller.deployment	creating namespace yatai-components ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.318Z	INFO	controller-runtime.manager.controller.deployment	namespace yatai-components creation successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.322Z	INFO	controller-runtime.manager.controller.deployment	Installing CertManagerComponent ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.322Z	INFO	controller-runtime.manager.controller.deployment	crd certificates.cert-manager.io already exists, so skipping install cert-manager	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.322Z	INFO	controller-runtime.manager.controller.deployment	Installed CertManagerComponent successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.325Z	INFO	controller-runtime.manager.controller.deployment	Installing YataiDeploymentOperatorComponent ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.326Z	INFO	controller-runtime.manager.controller.deployment	installing crd from file helm-charts/yatai-deployment-operator/crds/deployments.yaml ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.361Z	INFO	controller-runtime.manager.controller.deployment	crd bentodeployments.serving.yatai.ai updated successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.361Z	INFO	controller-runtime.manager.controller.deployment	getting helm release yatai ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.368Z	INFO	controller-runtime.manager.controller.deployment	found helm release yatai, status: deployed	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.369Z	INFO	controller-runtime.manager.controller.deployment	Installed YataiDeploymentOperatorComponent successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.373Z	INFO	controller-runtime.manager.controller.deployment	Installing CSIDriverImagePopulatorComponent ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.373Z	INFO	controller-runtime.manager.controller.deployment	getting helm release yatai-csi-driver-image-populator ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.376Z	INFO	controller-runtime.manager.controller.deployment	found helm release yatai-csi-driver-image-populator, status: deployed	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.377Z	INFO	controller-runtime.manager.controller.deployment	Installed CSIDriverImagePopulatorComponent successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.380Z	INFO	controller-runtime.manager.controller.deployment	Installing IngressControllerComponent ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.382Z	INFO	controller-runtime.manager.controller.deployment	getting helm release yatai-ingress-controller ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.390Z	INFO	controller-runtime.manager.controller.deployment	found helm release yatai-ingress-controller, status: failed	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.393Z	INFO	controller-runtime.manager.controller.deployment	Installed IngressControllerComponent successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.396Z	INFO	controller-runtime.manager.controller.deployment	Installing MinioComponent ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.396Z	INFO	controller-runtime.manager.controller.deployment	installing crd from file helm-charts/minio-operator/crds/minio.min.io_tenants.yaml ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.627Z	INFO	controller-runtime.manager.controller.deployment	crd tenants.minio.min.io updated successfully	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.627Z	INFO	controller-runtime.manager.controller.deployment	getting helm release yatai-minio ...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.639Z	INFO	controller-runtime.manager.controller.deployment	found helm release yatai-minio, status: failed	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.640Z	INFO	controller-runtime.manager.controller.deployment	getting ingress-controller service external ip...	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": ""}
2022-06-07T06:36:31.640Z	ERROR	controller-runtime.manager.controller.deployment	getting ingress-controller service external ip failed	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": "", "error": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!", "errorVerbose": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*IngressControllerComponent).getIngressControllerServiceIps\n\t/workspace/controllers/deployment_controller.go:294\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*MinioComponent).Install\n\t/workspace/controllers/deployment_controller.go:510\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).doReconcile\n\t/workspace/controllers/deployment_controller.go:211\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).Reconcile\n\t/workspace/controllers/deployment_controller.go:126\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"}
github.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).doReconcile
	/workspace/controllers/deployment_controller.go:211
github.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).Reconcile
	/workspace/controllers/deployment_controller.go:126
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:214
2022-06-07T06:36:31.641Z	ERROR	controller-runtime.manager.controller.deployment	Failed to install MinioComponent	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": "", "error": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!", "errorVerbose": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*IngressControllerComponent).getIngressControllerServiceIps\n\t/workspace/controllers/deployment_controller.go:294\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*MinioComponent).Install\n\t/workspace/controllers/deployment_controller.go:510\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).doReconcile\n\t/workspace/controllers/deployment_controller.go:211\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).Reconcile\n\t/workspace/controllers/deployment_controller.go:126\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"}
github.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).Reconcile
	/workspace/controllers/deployment_controller.go:126
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:214
2022-06-07T06:36:31.649Z	ERROR	controller-runtime.manager.controller.deployment	Reconciler error	{"reconciler group": "component.yatai.ai", "reconciler kind": "Deployment", "name": "deployment", "namespace": "", "error": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!", "errorVerbose": "the external ip of service yatai-ingress-controller-ingress-nginx-controller on namespace yatai-components is empty!\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*IngressControllerComponent).getIngressControllerServiceIps\n\t/workspace/controllers/deployment_controller.go:294\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*MinioComponent).Install\n\t/workspace/controllers/deployment_controller.go:510\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).doReconcile\n\t/workspace/controllers/deployment_controller.go:211\ngithub.com/bentoml/yatai-deployment-comp-operator/controllers.(*DeploymentReconciler).Reconcile\n\t/workspace/controllers/deployment_controller.go:126\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:214
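
The empty external IP reported in the log can be checked directly with kubectl. A minimal sketch, assuming kubectl is configured for the cluster; the function name `service_external_ip` is hypothetical, and the service name and namespace in the usage note are taken from the error message above:

```shell
# Hypothetical helper: print the external IP(s) of a LoadBalancer Service,
# or fail with a message if the cluster has not assigned one.
# $1 = service name, $2 = namespace
service_external_ip() {
  ip=$(kubectl get svc "$1" -n "$2" \
        -o jsonpath='{.status.loadBalancer.ingress[*].ip}') || return 1
  if [ -z "$ip" ]; then
    echo "no external IP assigned to $1 in namespace $2" >&2
    return 1
  fi
  printf '%s\n' "$ip"
}
```

Usage: `service_external_ip yatai-ingress-controller-ingress-nginx-controller yatai-components`. On a cluster without a load balancer implementation, a Service of type LoadBalancer normally stays in the pending state, so this check would fail. Some providers report a hostname rather than an IP, which this sketch does not look at.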
@timliubentoml
Contributor

timliubentoml commented Jun 8, 2022

Hi @thechaos16! We don't see that case very often, but we've got a particular config that might help:

ingress:
  enabled: false

This disables creation of the ingress, which is meant for people who don't want to expose Yatai on an external IP. I don't know the details of your environment, but could you try that as a Helm option?
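
One way to pass that setting as a Helm option is `--set ingress.enabled=false` on an upgrade. A sketch only: the function name `disable_ingress` is hypothetical, and the release name, chart reference, and namespace are left as arguments because they depend on how Yatai was installed and are not stated in this thread.

```shell
# Hypothetical helper: re-apply a Helm release with ingress creation disabled.
# $1 = release name, $2 = chart reference, $3 = namespace (all depend on your install)
disable_ingress() {
  helm upgrade "$1" "$2" \
    --namespace "$3" \
    --reuse-values \
    --set ingress.enabled=false
}
```

Usage might look like `disable_ingress yatai yatai/yatai yatai-system`, where all three values are assumptions about the install. `--reuse-values` keeps the values from the previous release so that only the ingress setting changes.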

@timliubentoml timliubentoml self-assigned this Jun 8, 2022
@yubozhao
Contributor

yubozhao commented Jun 9, 2022

cc @yetone

@yetone
Member

yetone commented Jun 10, 2022

@thechaos16 Thanks for your report! Yatai deployment operators always need a load balancer. Another option is to skip the built-in Minio and specify the S3 configuration manually.

https://github.com/bentoml/Yatai/blob/main/docs/admin-guide.md#aws-s3

@thechaos16
Author

Thank you for the quick reply.

@timliubentoml, I tried disabling ingress at https://github.com/bentoml/yatai-chart/blob/main/values.yaml#L91, but it still shows the same error. I guess changing the yatai-chart Helm values does not control the operators' setup.

@yetone, I passed the external S3 info by filling in the block at https://github.com/bentoml/yatai-chart/blob/main/values.yaml#L50-L58, but it still fails. Could you let me know if there is another way to avoid the built-in Minio? In my K8S dashboard, two pods (minio-operator and yatai-minio-console) are running.

@artsparkAI
Contributor

@thechaos16 There is an error in the docs: they show ENDPOINT set to https://s3.amazonaws.com, but you actually need to set it to s3.amazonaws.com, without the scheme.
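
If the endpoint is copied from a URL, the scheme can be stripped before it goes into the config. A minimal sketch using POSIX parameter expansion; the function name `strip_scheme` is hypothetical and not part of Yatai or BentoML:

```shell
# Hypothetical helper: turn "https://s3.amazonaws.com/" into "s3.amazonaws.com".
# Values without a scheme (for example "host:port") pass through unchanged.
strip_scheme() {
  _endpoint=${1#*://}             # drop a leading "scheme://" if present
  printf '%s\n' "${_endpoint%/}"  # drop one trailing slash if present
}
```

Usage: `strip_scheme https://s3.amazonaws.com` prints `s3.amazonaws.com`. The externalS3 values quoted further down in this thread follow the same convention: the endpoint is given as host:port and TLS is controlled by a separate `secure` field.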

@MightyTedKim

MightyTedKim commented Sep 23, 2022

For me this was resolved after I deleted the default postgres PVC.
The log showed that authentication for the postgres user in yatai was failing.

$ k logs pod/yatai-7f97bc87fb-qkc25  -n yatai-system
Error: migrate up db: cannot create migrate: pq: password authentication failed for user "postgres"

I deleted all of yatai, including the yatai postgresql.

$ kubectl create secret generic yatai-postgresql  --from-literal=passwordExistingSecret=cqUIVv6S4q -n yatai-system

I copied the initial secret and created a new postgresql secret.
When I set existingSecret to the new secret, it logs in like a charm.

values.yaml
postgresql:
  enabled: true
  nameOverride: ""
  postgresqlUsername: postgres
  postgresqlDatabase: yatai
  ## In case of postgresql.enabled = true, allow the usage of existing secrets for postgresql
  ##
  existingSecret: yatai-postgresql #""

I managed to run it with the values.yaml above.
It did not work when I only changed values.yaml and updated the release with Argo CD.

$ kubectl create secret generic yatai-ceph-secret --from-literal=accesskey=access-key --from-literal=secretkey=secret-key -n yatai-system
values.yaml
externalS3:
  enabled: true #false
  endpoint: '192.168.*.*9:300*1' #my ceph object storage endpoint(or minio)
  region: ''
  bucketName: 'hgkim'
  secure: false #true
  existingSecret: 'yatai-ceph-secret'
  existingSecretAccessKeyKey: 'accesskey' #'access_key'
  existingSecretSecretKeyKey: 'secretkey' #'secret_key'

After I run bentoml push, the bento shows up in the UI and in object storage under bentoml/default.

$ bentoml push iris_classifier:latest
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Successfully pushed model "iris_clf:h7hjmrr276ld23vw"                                                                                                                           │
│ Successfully pushed Bento "iris_classifier:khydmnr276cwg3vw"                                                                                                                    │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Pushing Bento "iris_classifier:khydmnr276cwg3vw" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 5.8/5.8 kB • ? • 0:00:00
     Uploading model "iris_clf:h7hjmrr276ld23vw" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 2.0/2.0 kB • ? • 0:00:00

@yetone yetone closed this as completed Nov 21, 2022