DataHub v0.9.6

@maggiehays maggiehays released this 14 Jan 00:59
· 2638 commits to master since this release
5951379

⚠️ This Release has been patched. Please upgrade to 0.9.6.1 ⚠️

As of January 19th, 2023, 0.9.6.1 is the official release build and should be used instead of 0.9.6. Upgrade to 0.9.6.1 when possible to avoid issues creating and using secrets.



Release Highlights

Important Release Notes

With this release, if you are using Neo4J as your graph implementation, you need to set the following for GMS (or for the MAE Consumer in standalone mode):
GRAPH_SERVICE_DIFF_MODE_ENABLED=false
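
As a rough illustration of the Neo4J note, a deployment script could run a small pre-flight check before starting GMS or the MAE Consumer. This is only a sketch: the helper name `neo4j_diff_mode_ok` is hypothetical, and it assumes the graph backend is selected through an environment variable named `GRAPH_SERVICE_IMPL`, which these notes do not state.

```python
from typing import Mapping


def neo4j_diff_mode_ok(env: Mapping[str, str]) -> bool:
    """Return True when the environment is consistent with the release note.

    Hypothetical helper, not part of DataHub.
    """
    # Assumption: GRAPH_SERVICE_IMPL names the graph backend (e.g. "neo4j").
    graph_impl = env.get("GRAPH_SERVICE_IMPL", "").strip().lower()
    if graph_impl != "neo4j":
        # The release note only applies to Neo4J deployments.
        return True
    # From the release note: diff mode must be explicitly disabled for Neo4J.
    diff_mode = env.get("GRAPH_SERVICE_DIFF_MODE_ENABLED", "").strip().lower()
    return diff_mode == "false"
```

Calling `neo4j_diff_mode_ok(os.environ)` (after `import os`) in a start-up script would flag a Neo4J deployment where the variable is missing or not set to `false`.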

User Experience

  • We now support embedding Dashboards, Charts, and Datasets. This makes it possible to embed, for example, Looker / Tableau / Mode / Redash Looks, Dashboards, and Explores directly into the Dataset pages themselves.

  • [Experimental] You can now customize the number of queries displayed on the Query tab of a Dataset entity

  • Improved error messaging for bulk editing via the UI

Metadata Ingestion

  • Data profiling now supports a configurable number of returned sample values
  • Postgres ingestion now supports emitting lineage edges for Views - shoutout to @LucasRoesler for the contribution!
  • Snowflake ingestion now supports extracting tags - shoutout to @frsann for the contribution!
  • Vertica ingestion now supports projections and lineage - thanks for the contribution, @vishalkSimplify!
  • Glue ingestion now emits an s3 lineage edge when data was written with an s3a/s3n client - thanks for the contribution, @danielli-ziprecruiter!
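
To make the Glue item above more concrete: `s3a://` and `s3n://` are Hadoop client schemes that point at the same buckets and keys as `s3://`, so a lineage edge can target the plain S3 location once the scheme is normalized. The sketch below shows that idea only; the function name `normalize_s3_uri` is hypothetical and this is not the Glue source's actual code.

```python
from typing import Optional

# Schemes treated as aliases of plain S3 (s3a/s3n are Hadoop client variants).
S3_SCHEMES = {"s3", "s3a", "s3n"}


def normalize_s3_uri(location: str) -> Optional[str]:
    """Map s3://, s3a:// and s3n:// locations to an s3:// URI.

    Returns None for anything that is not an S3-style location.
    Hypothetical helper for illustration only.
    """
    scheme, sep, rest = location.partition("://")
    if not sep or not rest or scheme.lower() not in S3_SCHEMES:
        return None
    # Keep bucket and key untouched; only the scheme is rewritten.
    return "s3://" + rest
```

With this, a table location such as `s3a://my-bucket/warehouse/events/` would resolve to `s3://my-bucket/warehouse/events/`, while an `hdfs://` location would be skipped.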

Developer Experience

  • Fixes quickstart/docker compose issues for M1 machines
  • Improvements in reliability and performance of the Restli Service endpoints for ingestion:
    • Scale Restli Service thread pool based on CPU
    • Add retry (exp backoff) to Restli Entity Client
    • MCE no longer relies on GMS for Restli service
    • Converted Restli Service from standalone servlet to Spring injectable
    • Docker build externalized (significantly faster on M1, <7 minute build times, based on this)
    • Frontend asset generation refactored (it had been causing tests to fail intermittently)
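
For readers unfamiliar with the "retry (exp backoff)" item in the list above, the general pattern is to retry a failed call after a delay that doubles on each attempt, up to a cap. The following is a minimal, generic sketch of that technique; the function name, parameters, and defaults are illustrative assumptions and do not reflect the Restli Entity Client's actual implementation or settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,       # illustrative default, not DataHub's value
    base_delay: float = 0.5,     # seconds before the first retry
    max_delay: float = 8.0,      # cap on any single wait
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on any exception with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles each attempt: base, 2*base, 4*base, ... capped.
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable")
```

A production client would typically retry only on errors known to be transient and add random jitter to the delay so that many clients do not retry in lockstep; both are left out here to keep the sketch short.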

What's Changed

  • feat(ingest): add pydantic helper for removed fields by @hsheth2 in #6853
  • chore(0.9.5): Bump defaults for release v0.9.5 by @jjoyce0510 in #6856
  • Revert "fix(ci): remove warnings due to deprecated action" by @anshbansal in #6857
  • refactor(restli-mce-consumer) by @david-leifker in #6744
  • fix(ci): reduce smoke test run time by @anshbansal in #6841
  • fix(security): require signed/encrypted jwt tokens by @david-leifker in #6565
  • feat(ingest): update profiling to fetch configurable number of sample values by @mayurinehate in #6859
  • feat(ingest/airflow): support raw dataset urns in airflow lineage by @hsheth2 in #6854
  • refactor(graphql): make graphqlengine easier to use by @anshbansal in #6865
  • fix(kafka): datahub-upgrade job by @david-leifker in #6864
  • feat(ingest): pass timeout config in kafka admin client api calls by @mayurinehate in #6863
  • chore(ingest): loosen requirements file by @hsheth2 in #6867
  • feat(ingest): upgrade pydantic version by @cccs-eric in #6858
  • fix(elasticsearch): fixes out of order runId writes by @david-leifker in #6845
  • chore(ingest): loosen additional requirements by @hsheth2 in #6868
  • feat(ingest): bigquery/snowflake - Store last profile date in state by @treff7es in #6832
  • docs(google-analytics): Correct grammatical error in README.md by @jx2lee in #6870
  • feat(CI): add venv caching by @szalai1 in #6843
  • feat(ingest/snowflake): handle failures gracefully and raise permission failures by @mayurinehate in #6748
  • fix(runid): always update runid, except when queued by @david-leifker in #6876
  • fix(ingest): conditionally include env in assertion guid by @hsheth2 in #6811
  • chore(ci): update dependencies docs-website by @anshbansal in #6871
  • feat(ui) - Add a custom error message for bulk edit to add clarity by @mkamalas in #6775
  • docs(adding users): Refreshing the docs for adding new DataHub Users by @jjoyce0510 in #6879
  • test(mce-consumer): mockbeans by @david-leifker in #6878
  • feat(ingest): avoid embedding serialized json in metadata files by @hsheth2 in #6742
  • refactor(gradle): move the local docker registry to common location by @david-leifker in #6881
  • refactor(smoke): use env variables by @anshbansal in #6866
  • fix(lint): pin pydantic version by @anshbansal in #6886
  • refactor(docs): Correctly spell elasticsearch in docs by @jjoyce0510 in #6880
  • fix(ingest): okta undefined variable error by @anshbansal in #6882
  • fix(ci): reduce flakiness in add_users, siblings smoke test by @anshbansal in #6883
  • fix(ingest): fall back to default table comment method for all Trino query errors by @marvin-roesch in #6873
  • test(misc): misc test updates by @david-leifker in #6890
  • deprecate(ingest): bigquery - Removing bigquery-legacy source by @treff7es in #6851
  • chore(ingest): remove inferred args to MCPW, part 1 by @hsheth2 in #6819
  • test(ingest/kafka-connect): make docker setup more reliable by @hsheth2 in #6902
  • fix(ingest): profiling (bigquery) - Address biquery profiling query error due to timestamp vs data mismatch by @treff7es in #6874
  • fix(cli): Make datahub quickstart work with latest docker compose in M1 by @pedro93 in #6891
  • fix(cli): fix delete urn cli bug + stricter type annotations by @hsheth2 in #6903
  • fix(ingest/airflow): reorder imports to avoid cyclical dependencies by @stijndehaes in #6719
  • feat: remove jq requirement + tweak modeldocgen args by @hsheth2 in #6904
  • chore(ingest): loosen pyspark and pydeequ deps by @hsheth2 in #6908
  • docs(ingest/looker): fix typos + update lookml github action example by @hsheth2 in #6910
  • fix(ingest/metabase): use card_id in dashboard to chart lineage by @ccpypy in #6583
  • fix(es-setup): create data stream on non-aws by @szalai1 in #6926
  • Adding missing Platform logos by @maggiehays in #6892
  • feat(ingestion): PowerBI# Improve PowerBI source ingestion by @mohdsiddique in #6549
  • Fix compose context for kafka-setup by @szalai1 in #6923
  • feat(backend): Supporting Embeddable Previews for Dashboards, Charts, Datasets by @jjoyce0510 in #6875
  • chore(deps): bump json5 from 2.2.1 to 2.2.3 in /docs-website by @dependabot in #6930
  • chore(deps): bump json5 from 1.0.1 to 1.0.2 in /datahub-web-react by @dependabot in #6931
  • fix(ci): managed ingestion test fix by @anshbansal in #6946
  • feat(ingest): add include_table_location_lineage flag for SQL common by @hsheth2 in #6934
  • feat(ingest): allow extracting snowflake tags by @frsann in #6500
  • chore(ingest): unpin pydantic dep by @hsheth2 in #6909
  • chore(ingest): partially revert pyspark dep from #6908 by @hsheth2 in #6954
  • fix(ingest): use branch info when cloning git repos by @hsheth2 in #6937
  • chore(ingest): remove inferred args to MCPW, part 2 by @hsheth2 in #6905
  • fix(ingest/unity): simplify MCP generation and reporting by @hsheth2 in #6911
  • chore(ci): parallelise build and test workflow to reduce time by @anshbansal in #6949
  • fix(frontend): sasl.client.callback.handler.class by @szalai1 in #6962
  • chore(react): remove outdated cypress tests and dependency by @anshbansal in #6948
  • fix(ci): restrict GE to fix build issues by @anshbansal in #6967
  • feat(queries): [Experimental] Allow customization of # of queries in Query tab via env var by @gabe-lyons in #6964
  • feat(ingest/postgres): emit lineage for postgres views by @LucasRoesler in #6953
  • feat(ingest/vertica): support projections and lineage in vertica by @vishalkSimplify in #6785
  • fix(ingest): add missing dep for powerbi by @hsheth2 in #6969
  • Docs fixes week of 12 22 by @laulpogan in #6963
  • fix(ingest): unfreeze bigquery/snowflake column dataclass by @mayurinehate in #6921
  • chore(frontend) Remove unused dependencies from package.json by @chriscollins3456 in #6974
  • chore: misc fixes by @anshbansal in #6966
  • feat(ingest/glue): emit s3 lineage for s3a and s3n schemes by @danielli-ziprecruiter in #6788
  • fix(kafka-setup): Make kafka-setup run with multiple threads by @pedro93 in #6970
  • feat(ingest): mark database_alias and env as deprecated by @hsheth2 in #6901
  • fix(docs): Updating Tag, Glossary Term docs to point to correct GraphQL methods by @jjoyce0510 in #6965
  • chore(deps): bump certifi from 2020.12.5 to 2022.12.7 in /metadata-ingestion/src/datahub/ingestion/source/feast_image by @dependabot in #6979
  • fix(ingest): profiling - Fixing issue with the wrong timestamp stored in check by @treff7es in #6978
  • config(quickstart): enable auto-reindex for quickstart by @david-leifker in #6983
  • feat(privileges) - Create a privilege to manage glossary children recursively by @mkamalas in #6731
  • chore(ingest): finish removing feast-legacy by @hsheth2 in #6985
  • feat(ingest): add import descriptions of two or more nested messages by @wngus606 in #6959
  • feat(docs) Add feature guide for Manual Lineage by @chriscollins3456 in #6933
  • docs(rfc): Serialising GMS Updates with Preconditions by @mattmatravers in #5818
  • fix(ingest/kafka-connect) support newer version of debezium by @jaegwonseo in #6943
  • fix(docs): build and broken snowflake docs fix by @anshbansal in #6997
  • fix(ingest): bigquery - views in case more than 1 datasets with views by @anshbansal in #6995
  • fix(docs): Renaming Business Glossary Doc by @jjoyce0510 in #7001
  • fix(ingest/snowflake): fix type annotations + refactor get_connect_args by @hsheth2 in #7004
  • fix(docs): Changing the platform event topic name in kafka custom topic docs by @blankon123 in #7007
  • fix(docs): fix name of privilege referenced in posts doc by @aditya-radhakrishnan in #7002
  • fix(SSO): Correctly redirect to originally requested URL in SSO by @jjoyce0510 in #7011
  • fix(ingest): remove dead code from tests by @hsheth2 in #7005
  • feat(ingestion): Tableau # Embed links by @mohdsiddique in #6994
  • feat(auth) Update auth cookies to have same-site none for chrome extension by @chriscollins3456 in #6976
  • docs(website): DPG WIP by @maggiehays in #6998
  • docs: resize datahub logo by @hsheth2 in #7014
  • fix(kafka-setup): Remove reference to non-existing topic by @pedro93 in #7019
  • fix(ingest): powerbi # use display name field as title for powerbi report page by @looppi in #7017
  • feat(auth) Allow session ttl to be configurable by env variable by @chriscollins3456 in #7022
  • fix(ui): URL Encode all Entity Profile URLs by @jjoyce0510 in #7023
  • fix(ui ingest): Fix test connection when stateful ingest is enabled by @jjoyce0510 in #7013
  • docs(sso) move root user warning to earlier in SSO guides by @maggiehays in #7028
  • fix(ingest/looker): add clarity in chart input parsing logs by @hsheth2 in #7003
  • chore(ingest): remove duplicate data_platform.json file by @hsheth2 in #7026
  • feat(ingestion): PowerBI # Remove corpUserInfo aspect ingestion by @mohdsiddique in #7034
  • fix(metadata-models): remove unnecessary bin folder by @jjoyce0510 in #7035
  • fixing typos by @maggiehays in #7030
  • feat(ingest): Ingest Previews for Looker Charts, Dashboards, and Explores by @jjoyce0510 in #6941
  • fix(graphql):fix issue: autorender aspect could not be displayed on t… by @yangjiandan in #6993
  • fix(config): adding quotes by @david-leifker in #7038
  • fix(config): adding quotes by @david-leifker in #7040
  • fix(ingest/bigquery): Turning some usage warning message to debug log as it caused confusion by @treff7es in #7024
  • feat(ingest/vertica): Adding Vertica as source in Datahub UI by @Rajasekhar-Vuppala in #7010
  • Removed a double set for two fields by @bda618 in #7037

Full Changelog: v0.9.5...v0.9.6