fix(ingest/datahub): Support postgres; build(postgres): Modernize postgres docker setup #8762

Merged: asikowitz merged 4 commits into datahub-project:master from datahub-source-postgres on Sep 6, 2023


asikowitz
Collaborator

Ran postgres locally via:

datahub docker quickstart -f docker/docker-compose-without-neo4j.yml -f docker/docker-compose.postgres.override.yml --no-pull-images

I tried to model docker-compose.postgres.override.yml after docker-compose-without-neo4j.override.yml. Ideally we'd have separate compose files for with vs. without neo4j, mysql vs. postgres, m1 vs. not, etc. I'm not trying to do that right now, so for now you can only run postgres without neo4j, which I think is ok. I also moved the file to the main directory, to make it clearer that it's meant to replace one of the "override" files, and deleted it from the postgres/ directory, since I'm not sure what that copy was doing otherwise.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot added the ingestion (PR or Issue related to the ingestion of metadata) and devops (PR or Issue related to DataHub backend & deployment) labels on Aug 30, 2023
@@ -1,25 +1,7 @@
DATAHUB_UPGRADE_HISTORY_KAFKA_CONSUMER_GROUP_ID=generic-duhe-consumer-job-client-gms
Collaborator Author

This file is now meant to be used in conjunction with docker.env, rather than instead of it
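
To illustrate what "in conjunction with" could look like, here is a minimal sketch that assumes the two env files are layered the way Compose layers a list of `env_file` entries, with later entries overriding earlier ones. The `EXAMPLE_*` keys and the `parse_env` / `layer_env` helpers are made up for illustration; only the Kafka consumer group line comes from the diff above.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def layer_env(*texts: str) -> Dict[str, str]:
    """Merge env files in order; later files win on conflicting keys."""
    merged: Dict[str, str] = {}
    for text in texts:
        merged.update(parse_env(text))
    return merged


# Hypothetical stand-in for docker.env (the real contents are not shown in this PR).
BASE_ENV = """
EXAMPLE_SHARED_SETTING=base
EXAMPLE_OVERRIDDEN_SETTING=base
"""

# The slimmed-down file only carries what differs from the base.
OVERRIDE_ENV = """
DATAHUB_UPGRADE_HISTORY_KAFKA_CONSUMER_GROUP_ID=generic-duhe-consumer-job-client-gms
EXAMPLE_OVERRIDDEN_SETTING=override
"""

effective = layer_env(BASE_ENV, OVERRIDE_ENV)
```

With this layering the override file can stay short, since anything it does not mention is inherited from the base file.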

@@ -66,8 +66,6 @@ services:
dockerfile: docker/datahub-upgrade/Dockerfile
env_file: datahub-upgrade/env/docker-without-neo4j.env
depends_on:
mysql-setup:
Collaborator Author

We only mention mysql in the .override files. This depends_on gets added by docker-compose-without-neo4j.override.yml

@@ -10,10 +10,10 @@ WORKDIR /go/src/github.com/jwilder/dockerize
RUN go install github.com/jwilder/dockerize@$DOCKERIZE_VERSION

FROM alpine:3
COPY --from=binary /go/bin/dockerize /usr/local/bin
Collaborator Author

Doesn't change anything, just making this look more like the mysql-setup dockerfile

@@ -1,2 +1,3 @@
POSTGRES_USER: datahub
POSTGRES_PASSWORD: datahub
PGUSER: datahub
Collaborator Author

For the healthcheck: this might be needed so that the default psql user is datahub
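
For context, libpq-based clients such as psql take their default user from the PGUSER environment variable and otherwise fall back to the operating-system user name. A minimal sketch of that lookup follows; `default_pg_user` is a hypothetical helper written for this note, not something in the repo, and the `root` fallback is only an example value.

```python
import getpass
from typing import Mapping, Optional


def default_pg_user(env: Mapping[str, str], os_user: Optional[str] = None) -> str:
    """Mimic the default-user lookup: PGUSER first, then the OS user name."""
    user = env.get("PGUSER")
    if user:
        return user
    # os_user is injectable so the example stays deterministic.
    return os_user if os_user is not None else getpass.getuser()


# With PGUSER set, a bare `psql` inside the container would connect as datahub.
with_pguser = default_pg_user({"PGUSER": "datahub"}, os_user="root")

# Without it, the client would try the OS user, which may not exist as a database role.
without_pguser = default_pg_user({}, os_user="root")
```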

Comment on lines +42 to +43
# Ensures stable order, chronological per (urn, aspect)
# Version 0 last, only when createdon is the same. Otherwise relies on createdon order
Collaborator Author

Postgres comments use a different syntax lol

Collaborator

just for quoting right?

Collaborator Author

Postgres comments use -- or /* */ while mysql supports those plus #. I moved the comments completely out of the query string so that we don't have to worry about it
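
A rough sketch of that approach: the notes live in Python comments and the SQL string itself carries no comment syntax, so the same text can be sent to either database. The table name, column list, and exact ORDER BY clause below are assumptions reconstructed from the two comment lines in the diff, not the query from the PR, and SQLite is used only so the example runs on its own.

```python
import sqlite3

# Ensures stable order, chronological per (urn, aspect).
# Version 0 last, only when createdon is the same. Otherwise relies on createdon order.
# These notes are Python comments, so the SQL text needs no dialect-specific comment syntax.
QUERY = """
SELECT urn, aspect, version, createdon
FROM metadata_aspect_v2
ORDER BY createdon, urn, aspect, CASE WHEN version = 0 THEN 1 ELSE 0 END, version
"""

# In-memory SQLite database standing in for MySQL/Postgres (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata_aspect_v2 "
    "(urn TEXT, aspect TEXT, version INTEGER, createdon TEXT)"
)
conn.executemany(
    "INSERT INTO metadata_aspect_v2 VALUES (?, ?, ?, ?)",
    [
        ("urn:example:a", "status", 0, "2023-08-30 10:00:00"),
        ("urn:example:a", "status", 2, "2023-08-30 10:00:00"),
        ("urn:example:a", "status", 1, "2023-08-29 09:00:00"),
    ],
)

rows = conn.execute(QUERY).fetchall()
versions = [row[2] for row in rows]
```

In this sample data the older row (version 1) sorts first, and among the two rows sharing a createdon value, version 0 sorts last.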

Collaborator

these file names are getting out of hand lol

Collaborator Author

Yeahh, this one is perhaps overkill but I wanted to make it clear it takes the place of an "override" file. Ideally, I think we want something like:
docker-compose.yaml
docker-compose.with-neo4j.yaml
docker-compose.postgres.yaml
docker-compose.mysql.yaml
docker-compose.mysql-m1.yaml
and you can just compose them for which combination you want. I didn't want to deal with that in this PR though, so went with clarity over conciseness.
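
A sketch of how that composition might work, assuming the file layout proposed above (none of these files exist under these names in this PR) and assuming the m1 file replaces the plain mysql file rather than stacking on top of it. `compose_files` and `quickstart_command` are hypothetical helpers; the `-f` flag of `datahub docker quickstart` is the one used in the PR description.

```python
from typing import List


def compose_files(db: str = "mysql", neo4j: bool = False, m1: bool = False) -> List[str]:
    """Pick the compose files for one combination of options."""
    files = ["docker-compose.yaml"]
    if neo4j:
        files.append("docker-compose.with-neo4j.yaml")
    if db == "postgres":
        files.append("docker-compose.postgres.yaml")
    elif db == "mysql":
        # Assumption: the m1 variant is used instead of the plain mysql file.
        files.append("docker-compose.mysql-m1.yaml" if m1 else "docker-compose.mysql.yaml")
    else:
        raise ValueError(f"unsupported db: {db!r}")
    return files


def quickstart_command(files: List[str], directory: str = "docker") -> List[str]:
    """Build the quickstart invocation with one -f flag per compose file."""
    cmd = ["datahub", "docker", "quickstart"]
    for name in files:
        cmd += ["-f", f"{directory}/{name}"]
    return cmd


postgres_cmd = quickstart_command(compose_files(db="postgres"))
mysql_m1_neo4j = compose_files(db="mysql", neo4j=True, m1=True)
```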

Collaborator

I really dislike all these compose files as well. This is the same problem that developed with k8s manifest files. The solution there is to use templating to render the output files from some configuration files, i.e. helm. I've also seen jsonnet used to programmatically render templates into yaml/json/etc. Out of scope, but I'd like to see some thoughts around a method to specify options (perhaps similar to helm) so that the docker compose file is rendered as a single output from that configuration.
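
One possible shape for that idea, as a sketch only: a function takes a few options and renders a single compose document. The service names, the `depends_on` wiring, and the `render_compose` helper are placeholders invented for illustration, not the project's actual configuration; the postgres environment values are the ones from the env file in this PR. The output is printed as JSON to avoid a third-party YAML dependency.

```python
import json
from typing import Any, Dict


def render_compose(db: str, neo4j: bool) -> Dict[str, Any]:
    """Render one compose-style document from a small set of options."""
    services: Dict[str, Any] = {"datahub-gms": {"depends_on": []}}

    if db == "postgres":
        services["postgres"] = {
            "environment": {
                "POSTGRES_USER": "datahub",
                "POSTGRES_PASSWORD": "datahub",
                "PGUSER": "datahub",
            }
        }
    elif db == "mysql":
        services["mysql"] = {}
    else:
        raise ValueError(f"unsupported db: {db!r}")
    services["datahub-gms"]["depends_on"].append(db)

    if neo4j:
        services["neo4j"] = {}
        services["datahub-gms"]["depends_on"].append("neo4j")

    return {"services": services}


rendered = render_compose(db="postgres", neo4j=False)
print(json.dumps(rendered, indent=2))
```

The design point is that every combination comes from one code path, so adding an option means adding a branch rather than another override file.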

environment:
- DATAHUB_SERVER_TYPE=${DATAHUB_SERVER_TYPE:-quickstart}
- DATAHUB_TELEMETRY_ENABLED=${DATAHUB_TELEMETRY_ENABLED:-true}
depends_on:
Collaborator

some of these depends_on and env rules can go in the base docker-compose file, right?

Collaborator Author

Probably, but I didn't want to make any unnecessary changes, especially here where idk what these environment variables are doing or how to test whether they work. This is just copied from docker-compose-without-neo4j.override.yml, since I want the postgres version to function like the mysql version as much as possible.

@asikowitz asikowitz requested a review from hsheth2 August 31, 2023 20:06
@asikowitz asikowitz merged commit ac025e5 into datahub-project:master Sep 6, 2023
60 checks passed
@asikowitz asikowitz deleted the datahub-source-postgres branch September 6, 2023 16:18
spadhi7 added a commit to spadhi7/datahub that referenced this pull request Oct 4, 2023