
feat(datahub-ingestion): refactor datahub ingestion slim images #8515

Merged: 33 commits merged into master from refactor-actions-slim on Aug 17, 2023


david-leifker (Collaborator) commented Jul 27, 2023

  • Split datahub-ingestion into layered images to reduce image sizes:
    base -----> base slim --> slim
         \-----> base full  --> full

base: includes the Python modules from base-requirements.txt

base full: adds Java and the Oracle client

base slim: adds no further components

slim: only a selected subset of the DataHub components:

  • datahub-rest
  • datahub-kafka
  • snowflake
  • bigquery
  • redshift, mysql, postgres
  • hive, glue
  • clickhouse
  • dbt
  • looker, lookml, tableau, powerbi, superset
  • datahub-business-glossary

full: all datahub components
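The layering above can be modelled in a few lines of Python to show which components each final image inherits. This is an illustrative sketch only: `IMAGES`, `SLIM_PLUGINS`, `lineage`, `components`, and `has_plugin` are hypothetical names, not part of the DataHub build, and the plugin names are copied from the slim list in this PR.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of the image layering described above (not DataHub code).
# Each entry maps an image to (parent image, components that layer adds).
IMAGES: Dict[str, Tuple[Optional[str], List[str]]] = {
    "base": (None, ["base-requirements.txt python modules"]),
    "base slim": ("base", []),
    "base full": ("base", ["java", "oracle client"]),
    "slim": ("base slim", ["selected datahub components"]),
    "full": ("base full", ["all datahub components"]),
}

# Plugins listed for the slim image in this PR.
SLIM_PLUGINS = {
    "datahub-rest", "datahub-kafka", "snowflake", "bigquery", "redshift",
    "mysql", "postgres", "hive", "glue", "clickhouse", "dbt", "looker",
    "lookml", "tableau", "powerbi", "superset", "datahub-business-glossary",
}


def lineage(image: str) -> List[str]:
    """Return the chain of images from `base` down to `image`, base first."""
    chain: List[str] = []
    current: Optional[str] = image
    while current is not None:
        chain.append(current)
        current = IMAGES[current][0]
    return list(reversed(chain))


def components(image: str) -> List[str]:
    """Collect every component added by the layers in the image's lineage."""
    return [c for layer in lineage(image) for c in IMAGES[layer][1]]


def has_plugin(image: str, plugin: str) -> bool:
    """`full` ships every plugin; `slim` ships only the listed subset."""
    if image == "full":
        return True
    if image == "slim":
        return plugin in SLIM_PLUGINS
    raise ValueError(f"{image!r} is an intermediate base image, not a final image")


if __name__ == "__main__":
    for final in ("slim", "full"):
        print(final, lineage(final), components(final))
```

For example, `lineage("slim")` returns `["base", "base slim", "slim"]`, and `has_plugin("slim", "oracle")` is false, consistent with the Oracle client being installed only on the full branch.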

General Changes

  • Re-enabled the managed_ingestion Cypress test (depends on actions)
  • Used the head image for tests when there are no changes to datahub-ingestion-*
  • Built the schemas required by the datahub-ingestion module outside of Docker (removes the JDK build step)
  • Pulled the datahub-ingestion workflow into the unified workflow
  • Prevented build directories and hidden files from being copied into the datahub-ingestion images
  • Added the gradle task quickstartSlim to run the smoke tests against the datahub-ingestion-slim image
  • Swapped JRE for JDK in the datahub-ingestion image
  • Added a new entry point for datahub-ingestion that supports the new env variables ACTIONS_CONFIG and ACTIONS_EXTRA_PACKAGES
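The PR does not spell out how the new entry point uses ACTIONS_CONFIG and ACTIONS_EXTRA_PACKAGES. The sketch below shows one plausible way an entry point could consume them, assuming `ACTIONS_EXTRA_PACKAGES` is a whitespace-separated list of pip requirements and `ACTIONS_CONFIG` is a local path to a DataHub Actions config file started with `datahub actions -c`. Both assumptions and the helper `plan_entrypoint` are hypothetical; the real entry point may behave differently.

```python
import os
import shlex
import sys
from typing import List, Mapping


def plan_entrypoint(env: Mapping[str, str]) -> List[List[str]]:
    """Return the commands a hypothetical entry point would run, in order.

    Assumptions (not confirmed by this PR):
    - ACTIONS_EXTRA_PACKAGES is a whitespace-separated list of pip requirements.
    - ACTIONS_CONFIG is a local path to a DataHub Actions config file.
    """
    commands: List[List[str]] = []

    extra = env.get("ACTIONS_EXTRA_PACKAGES", "").strip()
    if extra:
        # Install extra packages first so the configured action can import them.
        commands.append([sys.executable, "-m", "pip", "install", *shlex.split(extra)])

    config = env.get("ACTIONS_CONFIG", "").strip()
    if config:
        # Assumed invocation of the DataHub Actions CLI with a config file.
        commands.append(["datahub", "actions", "-c", config])

    return commands


if __name__ == "__main__":
    # Print rather than execute; a real entry point would run each command
    # (e.g. subprocess.run for the install, os.execvp for the final process).
    for command in plan_entrypoint(os.environ):
        print(shlex.join(command))
```

Keeping the planning step a pure function of the environment makes the behaviour easy to test without a container; installing packages before starting the actions process means they are importable when it starts.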

The smaller images free up disk space for the smoke tests:

Run df -h . && docker images
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   72G   12G  86% /
REPOSITORY                             TAG                                IMAGE ID       CREATED          SIZE
5bedb4                                 45c7c3a35dc8407daa5d19641147c3a0   034286ab64f5   11 minutes ago   10.5MB
linkedin/datahub-frontend-react        pr8515                             479aa2df46fc   19 minutes ago   738MB
acryldata/datahub-ingestion            pr8515-slim                        21e494fd2c73   24 minutes ago   2.77GB
linkedin/datahub-gms                   pr8515                             7e69e9a6436d   30 minutes ago   407MB
linkedin/datahub-mce-consumer          pr8515                             0675ca4e1988   31 minutes ago   411MB
linkedin/datahub-mae-consumer          pr8515                             487d37deb165   32 minutes ago   409MB
acryldata/datahub-upgrade              pr8515                             7b0fe7d62d2d   35 minutes ago   474MB
linkedin/datahub-elasticsearch-setup   pr8515                             00541aff4013   2 days ago       25.3MB
acryldata/datahub-mysql-setup          pr8515                             c7aed8fb4c16   2 days ago       97MB
linkedin/datahub-kafka-setup           pr8515                             f473ec8cddf7   2 days ago       747MB
mysql                                  5.7                                92034fe9a41f   9 days ago       581MB
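To compare runs, a `docker images` listing like the one above can be totalled with a short Python sketch. The helper names `parse_size` and `image_sizes` are hypothetical, and the sketch assumes the Docker CLI's decimal size units (1 GB = 1000 MB). Note that `docker images` reports each image's full size including layers shared with other images, so the sum overstates actual disk usage; `docker system df` accounts for shared layers.

```python
import re
from typing import Dict

# Multipliers for the human-readable sizes printed by `docker images`,
# assuming decimal (base-10) units.
_UNITS = {"B": 1, "kB": 1e3, "KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}
_SIZE = re.compile(r"^(\d+(?:\.\d+)?)([kKMGT]?B)$")


def parse_size(text: str) -> float:
    """Convert a size such as '2.77GB' or '10.5MB' to bytes."""
    match = _SIZE.match(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def image_sizes(listing: str) -> Dict[str, float]:
    """Map 'repository:tag' to size in bytes for each row of `docker images` output."""
    rows = [line for line in listing.splitlines() if line.strip()]
    sizes: Dict[str, float] = {}
    for row in rows[1:]:  # the first row is the column header
        fields = row.split()
        # REPOSITORY and TAG are the first two columns; SIZE is always the last,
        # even though CREATED ("24 minutes ago") contains spaces.
        sizes[f"{fields[0]}:{fields[1]}"] = parse_size(fields[-1])
    return sizes


SAMPLE = """\
REPOSITORY                        TAG           IMAGE ID       CREATED          SIZE
linkedin/datahub-frontend-react   pr8515        479aa2df46fc   19 minutes ago   738MB
acryldata/datahub-ingestion       pr8515-slim   21e494fd2c73   24 minutes ago   2.77GB
linkedin/datahub-gms              pr8515        7e69e9a6436d   30 minutes ago   407MB
"""

if __name__ == "__main__":
    sizes = image_sizes(SAMPLE)
    largest = max(sizes, key=sizes.get)
    print(f"total: {sum(sizes.values()) / 1e9:.2f} GB, largest: {largest}")
```

Applied to the full listing above, the image sizes add up to roughly 6.7 GB, of which the slim ingestion image accounts for 2.77 GB.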


Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot added the docs, product, and devops labels on Jul 27, 2023
adjust docker byPath to byName
remove duplicate v in dockerClean
set build-arg after prune

david-leifker changed the title from "feat(datahub-ingestion): refactor slim image to add vs remove pyspark" to "feat(datahub-ingestion): refactor datahub ingestion slim images" on Aug 15, 2023
Resolved review threads:
  • docker/datahub-ingestion-base/Dockerfile (5 threads, 3 outdated)
  • docs/cli.md (2 threads, both outdated)
anshbansal merged commit 40caf35 into master on Aug 17, 2023 (56 checks passed)
anshbansal deleted the refactor-actions-slim branch on August 17, 2023 05:10
hsheth2 added a commit to hsheth2/datahub that referenced this pull request Aug 17, 2023
neojunjie added a commit to aqden/datahub that referenced this pull request Oct 2, 2023
feat(datahub-ingestion): refactor datahub ingestion slim images (datahub-project#8515)
neojunjie added a commit to aqden/datahub that referenced this pull request Oct 2, 2023
neojunjie added a commit to aqden/datahub that referenced this pull request Oct 2, 2023
Revert "feat(datahub-ingestion): refactor datahub ingestion slim images (datahub-project#8515)"
Labels: devops, docs, product, release-0.11.0

4 participants