Metadata db is not populated after running a pipeline #110

Closed
Tracked by #27
AlexandreBrown opened this issue Mar 1, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@AlexandreBrown
Contributor

AlexandreBrown commented Mar 1, 2022

Describe the bug
When using the Pipelines integration, if we specify a metadata database name other than the Kubeflow default (metadb), metadata such as artifact types is still pushed to a database named metadb.

Steps To Reproduce

  1. Specify a database name other than metadb when setting up the RDS database. This example uses kubeflow.
  2. Install Kubeflow (tested using main branch + manifest v1.4.1)
  3. Notice that the kubeflow database is created correctly.
  4. Create a pipeline run that pushes some artifacts, e.g.:
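# KFP SDK 1.8-style imports (assumed SDK version for Kubeflow 1.4)
from kfp.v2.dsl import component, Output, Model, Metrics

@component()
def test_comp(
    value: float,
    model: Output[Model],
    model_metrics: Output[Metrics]
):
    # Print where KFP placed the output artifacts
    print(f"model_metrics.path {model_metrics.path}")
    print(f"model.path {model.path}")

    # Record a scalar metric on the Metrics artifact
    model_metrics.log_metric("test_loss", value)

    # Write the artifact contents to their output paths
    with open(model_metrics.path, 'w') as metrics_file:
        metrics_file.write(str(model_metrics.metadata))
    with open(model.path, 'w') as model_file:
        model_file.write("Some model data")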
  5. Check whether the artifact metadata was pushed to the kubeflow database.

  6. Notice that the kubeflow database has no new row in the Artifacts table.

  7. Check the Artifacts table in the metadb database instead.

  8. Notice that the metadata was pushed to the metadb database instead.

Expected behavior
I expected the metadata to be written to the kubeflow database.

Environment

  • Kubernetes version: 1.21
  • Using EKS: yes, eks.4
  • AWS service targeted: RDS


Additional context
I could be wrong, but I suspect some changes are required to allow the metadata database to be renamed.
If we take a look at upstream/apps/pipeline/upstream/base/metadata/base/metadata-grpc-deployment.yaml we see the following:

env:
- name: DBCONFIG_USER
  valueFrom:
    secretKeyRef:
      name: mysql-secret
      key: username
- name: DBCONFIG_PASSWORD
  valueFrom:
    secretKeyRef:
      name: mysql-secret
      key: password
- name: MYSQL_DATABASE
  valueFrom:
    configMapKeyRef:
      name: pipeline-install-config
      key: mlmdDb
- name: MYSQL_HOST
  valueFrom:
    configMapKeyRef:
      name: pipeline-install-config
      key: dbHost
- name: MYSQL_PORT
  valueFrom:
    configMapKeyRef:
      name: pipeline-install-config
      key: dbPort
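To make the lookup chain explicit, here is a minimal Python sketch that resolves the env block above roughly the way Kubernetes would, under simplifying assumptions: ConfigMaps and Secrets are plain dicts, Secret values are already decoded (real Secret data is base64), and the helper resolve_env plus the host, port and credential values are hypothetical placeholders. The only value taken from the manifests is mlmdDb: metadb. It shows that MYSQL_DATABASE can only ever come from the mlmdDb key, so naming the RDS database something else has no effect on it.

```python
from typing import Dict, List

# Copy of the env entries from metadata-grpc-deployment.yaml, as plain dicts.
ENV_SPEC: List[dict] = [
    {"name": "DBCONFIG_USER",
     "valueFrom": {"secretKeyRef": {"name": "mysql-secret", "key": "username"}}},
    {"name": "DBCONFIG_PASSWORD",
     "valueFrom": {"secretKeyRef": {"name": "mysql-secret", "key": "password"}}},
    {"name": "MYSQL_DATABASE",
     "valueFrom": {"configMapKeyRef": {"name": "pipeline-install-config", "key": "mlmdDb"}}},
    {"name": "MYSQL_HOST",
     "valueFrom": {"configMapKeyRef": {"name": "pipeline-install-config", "key": "dbHost"}}},
    {"name": "MYSQL_PORT",
     "valueFrom": {"configMapKeyRef": {"name": "pipeline-install-config", "key": "dbPort"}}},
]


def resolve_env(env_spec: List[dict],
                config_maps: Dict[str, Dict[str, str]],
                secrets: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Hypothetical helper: look up each env var in the ConfigMap or Secret it references."""
    resolved: Dict[str, str] = {}
    for entry in env_spec:
        source = entry["valueFrom"]
        if "configMapKeyRef" in source:
            ref = source["configMapKeyRef"]
            resolved[entry["name"]] = config_maps[ref["name"]][ref["key"]]
        elif "secretKeyRef" in source:
            ref = source["secretKeyRef"]
            # Secrets are modeled as already-decoded strings.
            resolved[entry["name"]] = secrets[ref["name"]][ref["key"]]
        else:
            raise ValueError(f"unsupported valueFrom for {entry['name']}")
    return resolved


# Placeholder values; only mlmdDb: metadb comes from the upstream ConfigMap.
config_maps = {"pipeline-install-config": {
    "mlmdDb": "metadb", "dbHost": "example-rds-endpoint", "dbPort": "3306"}}
secrets = {"mysql-secret": {"username": "admin", "password": "example-password"}}

env = resolve_env(ENV_SPEC, config_maps, secrets)
print(env["MYSQL_DATABASE"])  # metadb
```

Whatever database name was chosen when setting up RDS, the metadata gRPC server is told to use whatever mlmdDb says.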

So it looks like MYSQL_DATABASE is read from the mlmdDb key of the pipeline-install-config ConfigMap, defined in upstream/apps/pipeline/upstream/base/installs/generic/pipeline-install-config.yaml.
In that ConfigMap the value is hard-coded:

mlmdDb: metadb

Maybe we need to add mlmdDb as a param in awsconfigs/apps/pipeline/params.env?
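A minimal sketch of why adding mlmdDb to params.env should work, assuming the AWS overlay feeds awsconfigs/apps/pipeline/params.env into a kustomize configMapGenerator for pipeline-install-config with behavior: merge, so keys in params.env override the upstream values. parse_params_env and merge_config are hypothetical helpers that mimic that merge in plain Python. The dbHost and dbPort values are placeholders, not taken from the manifests.

```python
from typing import Dict

# Hypothetical params.env contents; the mlmdDb line is the proposed fix.
PARAMS_ENV = """
dbHost=example-rds-endpoint
mlmdDb=kubeflow
"""

# Abridged upstream data: mlmdDb is the hard-coded value quoted above,
# the other keys are placeholders.
UPSTREAM_INSTALL_CONFIG: Dict[str, str] = {
    "mlmdDb": "metadb",
    "dbHost": "mysql",
    "dbPort": "3306",
}


def parse_params_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        params[key.strip()] = value.strip()
    return params


def merge_config(base: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Mimic a merge: keys in overrides replace or extend the base data."""
    merged = dict(base)
    merged.update(overrides)
    return merged


effective = merge_config(UPSTREAM_INSTALL_CONFIG, parse_params_env(PARAMS_ENV))
print(effective["mlmdDb"])  # kubeflow
```

Under that assumption, MYSQL_DATABASE in the metadata gRPC deployment would resolve to kubeflow instead of metadb, while keys not listed in params.env keep their upstream values.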

@AlexandreBrown AlexandreBrown added the bug Something isn't working label Mar 1, 2022
@AlexandreBrown AlexandreBrown changed the title several times on Mar 1, 2022, from Pipeline metadata db is not populated when running a pipeline to Metadata db is not populated after running a pipeline
@AlexandreBrown
Contributor Author

I can confirm that adding the mlmdDb param to params.env fixes it. I will make a PR for this.
