[v1.4-branch] Kubeflow Pipeline Does Not Work #66

Closed
AlexandreBrown opened this issue Jan 10, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@AlexandreBrown
Contributor

AlexandreBrown commented Jan 10, 2022

Describe the bug
I installed Kubeflow 1.4 using the v1.4-branch RDS & S3 setup.
I tried executing the sample run "[Tutorial] V2 lightweight Python components", but it fails.

See Kubeflow 1.4 Progress #27

Steps To Reproduce

  1. Install Kubeflow 1.4 by following the steps described here
  2. Run the sample pipeline using default values [Tutorial] V2 lightweight Python components

Expected behavior
The pipeline succeeds.

Environment

  • Kubeflow v1.4-branch
  • Kubernetes version 1.21
  • Using EKS (yes/no), if so version? yes, platform version eks.4
  • AWS service targeted (S3, RDS, etc.) S3, RDS

Screenshots
image

Additional context

  • Pipeline step logs:
    logs.txt

  • Result of kubectl get pods --all-namespaces:
    pods.txt

    • Notice that katib-mysql-7894994f88-tkngr is in CreateContainerConfigError
    • kubectl describe pod is showing the following error for this pod : Error: couldn't find key MYSQL_ROOT_PASSWORD in Secret kubeflow/katib-mysql-secrets
  • Some data seem to have been written to the bucket but the error logs suggest otherwise.
    image

My params.env looks like this:

dbHost=insert_the_host.us-east-2.rds.amazonaws.com

bucketName=test-bucket-name-here
minioServiceHost=s3.amazonaws.com
minioServiceRegion=us-east-2

My secret.env uses the default values.
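
As a quick sanity check on a params.env like the one above, here is a minimal Python sketch that parses `KEY=VALUE` lines and reports keys that are missing or empty. The list of expected keys is simply the four shown in this issue, not an authoritative schema for the v1.4-branch manifests, and `parse_env` and `missing_keys` are hypothetical helper names introduced for illustration.

```python
# Keys taken from the params.env shown in this issue; this is an assumption,
# not an official schema for the manifests.
EXPECTED_KEYS = ("dbHost", "bucketName", "minioServiceHost", "minioServiceRegion")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent or set to an empty string."""
    return [key for key in expected if not values.get(key)]


if __name__ == "__main__":
    sample = """\
dbHost=insert_the_host.us-east-2.rds.amazonaws.com

bucketName=test-bucket-name-here
minioServiceHost=s3.amazonaws.com
minioServiceRegion=us-east-2
"""
    # Prints the list of expected keys that are missing or empty in the sample.
    print(missing_keys(parse_env(sample)))
```

This only checks that the values are present; it does not confirm that the host, bucket, or region are correct.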

Also, even if I specify the pipeline-root in the pipeline UI, I get this error in the pipeline step logs: MissingRegion: could not find region configuration

AlexandreBrown added the bug label Jan 10, 2022
@surajkota
Contributor

Hi @AlexandreBrown thanks for reporting this issue.
The katib-mysql pod is expected to be in the state you described above, since we are using the RDS instance in AWS. The deployment is good if the katib-db-manager pod is running. The katib-mysql pod should be cleaned up in the future.

From the logs in the screenshot, I wonder why the artifact output location is not an S3 URI. The link you pasted for the logs is broken: https://github.com/awslabs/kubeflow-manifests/files/7841025/logs

Can you confirm the following?

  1. The S3 bucket is in us-east-2.
  2. The mlpipeline-minio-artifact secret in the kubeflow namespace has the access key and secret key for an IAM user with S3 access.
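
The second check can be partly scripted. The sketch below is a hedged example, not part of the project's tooling: it takes the JSON printed by `kubectl get secret mlpipeline-minio-artifact -n kubeflow -o json` and reports which credential keys are missing or empty after base64 decoding. The key names `accesskey` and `secretkey` are an assumption about how the secret is laid out, and `empty_or_missing` is a hypothetical helper name; adjust both to match your manifest.

```python
import base64
import json

# Assumed key names inside the secret; change them if your manifest differs.
ASSUMED_KEYS = ("accesskey", "secretkey")


def empty_or_missing(secret_json, keys=ASSUMED_KEYS):
    """Return the keys that are absent or decode to an empty value.

    `secret_json` is the text printed by `kubectl get secret ... -o json`.
    Kubernetes stores each value under "data" as a base64-encoded string.
    """
    data = json.loads(secret_json).get("data") or {}
    problems = []
    for key in keys:
        encoded = data.get(key)
        # A key counts as a problem if it is absent, empty, or only whitespace.
        if not encoded or not base64.b64decode(encoded).strip():
            problems.append(key)
    return problems
```

To use it, pipe the `kubectl` output into a small script that calls `empty_or_missing(sys.stdin.read())`. An empty result only shows that both values are populated; it does not verify that the IAM user actually has S3 access.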

If both of the above are true, can you try re-installing the pipelines component to see if the error is reproduced?

kustomize build distributions/aws/apps/pipelines | kubectl delete -f -
kustomize build distributions/aws/apps/pipelines | kubectl apply -f -

@surajkota
Contributor

Closing this in favour of the upcoming master branch, since you have tested it in #78. Please reopen the issue if needed.
