Releases: datahub-project/datahub

DataHub v0.9.1

01 Nov 15:34
4b31204

Release Highlights

Known Issues

  • In embedded search experiences (Glossary Terms, Domains, Lineage), filters can become "locked" in place once selected. This is addressed in v0.9.2

User Experience

  • Column-level Impact Analysis is here! You can now see the full end-to-end list of column dependencies; watch the demo here

  • When creating a Glossary Term from the UI, you can now add the description in the same step

  • We now support adding Domains to Glossary Terms

  • You can now preview Entity Names and Types in browser tabs

  • A "Login with SSO" button is now available on the login page

Bug Fixes

  • Assertions Tab functionality is restored
  • SSO: The continuous login loop reported when the session cookie size exceeded 4096 characters has been addressed.
  • The ingestion scheduler now works correctly with more than 30 ingestion sources. Previously, a bug caused certain ingestion sources to become unscheduled.

Metadata Ingestion

  • New Ingestion Source: Databricks Unity Catalog - check out the docs here
  • Tableau: Column-level lineage and Stateful Ingestion are now supported
  • LookML: Improved column-level lineage
  • BigQuery: we have promoted bigquery-beta to bigquery
  • Snowflake: Stateful Ingestion now supports deleting Containers

DataHub Docs Site

We continue to push improved feature guides to the DataHub docs site, including:

What's Changed

DataHub v0.9.0

13 Oct 11:26
0427122

Release Highlights

Known Issues

Assertions Tab UX bug

This release introduced a bug in the assertions tab causing assertion results to be hidden. This will be addressed in the subsequent release.

Release Notes

We’re excited to announce the release of DataHub v0.9.0!

This minor release includes an upgrade to Java 11 and surfaces Column-Level Lineage within the DataHub UI.

Here are some additional highlights:

User Experience

  • Column-Level Lineage is now surfaced within the DataHub UI!
  • Advanced Search now supports searching by Column-level details (e.g., name, description, or tag), as well as complex AND/OR statements. For example:
    • Show results that match any filters
    • Show results that match all filters
    • Owner is either Shannon or Mark
    • Owner is neither Shannon nor Mark
    • Try it in demo here
  • You can now invite users and assign them to a default DataHub Role
  • Improvements to site performance during the Browse experience
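
To make the AND/OR filter semantics above concrete, here is a minimal sketch in plain Python. It does not call any DataHub API; the entity records and the helper names (owner_in, negate, match_any, match_all) are hypothetical and only illustrate how "match any", "match all", "either", and "neither" combine.

```python
from typing import Callable, Dict, Iterable, List

Entity = Dict[str, str]
Filter = Callable[[Entity], bool]


def owner_in(*owners: str) -> Filter:
    """Match entities whose owner is one of the given names."""
    allowed = set(owners)
    return lambda entity: entity.get("owner") in allowed


def negate(f: Filter) -> Filter:
    """Invert a filter (used for the 'neither ... nor' case)."""
    return lambda entity: not f(entity)


def match_any(filters: Iterable[Filter]) -> Filter:
    """OR semantics: at least one filter must match."""
    fs = list(filters)
    return lambda entity: any(f(entity) for f in fs)


def match_all(filters: Iterable[Filter]) -> Filter:
    """AND semantics: every filter must match."""
    fs = list(filters)
    return lambda entity: all(f(entity) for f in fs)


# Hypothetical sample data, not real DataHub entities.
entities: List[Entity] = [
    {"name": "orders", "owner": "Shannon"},
    {"name": "customers", "owner": "Mark"},
    {"name": "payments", "owner": "Alex"},
]

either = [e["name"] for e in entities if owner_in("Shannon", "Mark")(e)]
neither = [e["name"] for e in entities if negate(owner_in("Shannon", "Mark"))(e)]
```

Note that "either Shannon or Mark" is the same as match_any over two single-owner filters, while "neither" is the negation of that whole group.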

Developer Experience

  • DataHub has been upgraded to Java 11!
  • Improved tracking of GraphQL errors for bug resolution
  • CorpUser and CorpGroup are now available via the Python SDK

Metadata Ingestion

  • Automatically extract Column-Level Lineage from Snowflake & Looker sources
  • dbt Meta Mapping is now supported at the Column Level - this means you can automatically extract Tags and Glossary Terms from your dbt model and surface them in DataHub
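
As a rough illustration of what column-level meta mapping does, the sketch below matches values in a dbt column's meta block against rules and emits tag and term actions. This is not the dbt source's implementation: the rule shape (match, operation, value), the meta keys, and the function name are assumptions made for the example; refer to the column_meta_mapping example in the ingestion docs for the real recipe syntax.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical rules, loosely modeled on a meta-mapping block:
# each meta key maps to a regex to match and an operation to emit.
RULES: Dict[str, Dict[str, str]] = {
    "is_sensitive": {"match": "true", "operation": "add_tag", "value": "sensitive"},
    "term": {"match": ".*", "operation": "add_term"},
}


def apply_column_meta_mapping(
    meta: Dict[str, Any], rules: Dict[str, Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Return (operation, value) pairs for every rule whose regex matches."""
    actions: List[Tuple[str, str]] = []
    for key, rule in rules.items():
        if key not in meta:
            continue
        raw = str(meta[key])
        if re.fullmatch(rule["match"], raw, flags=re.IGNORECASE):
            # Use the rule's fixed value if present, else the matched meta value.
            actions.append((rule["operation"], rule.get("value", raw)))
    return actions


# Hypothetical meta block from one column of a dbt model.
column_meta = {"is_sensitive": True, "term": "Revenue"}
actions = apply_column_meta_mapping(column_meta, RULES)
```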

What's Changed

  • fix(ingest): bigquery-beta - Getting datasets with biquery client by @treff7es in #6039
  • feat(roles): add ability to invite users into a role by @aditya-radhakrishnan in #6015
  • refactor(java11) - convert most modules to java 11 by @leifker in #5836
  • docs(readme): Fixing broken article link by @davrax in #6042
  • refactor(ingest): streamline pydantic configs by @hsheth2 in #6011
  • docs(ingest): add example of dbt column_meta_mapping by @hsheth2 in #6038
  • refactor(ingest): use aspect map in transformers by @hsheth2 in #6040
  • feat(ui): Adding placeholder entity for DataPlatform by @jjoyce0510 in #6045
  • feat(ingest): implement compression for CheckpointState by @alexey-kravtsov in #6007
  • feat(advanced-search): adding select value modal by @gabe-lyons in #6026
  • fix(ingest): bigquery-beta - Additional fixes for Bigquery beta by @treff7es in #6051
  • feat(advanced search): adding advanced search filter component & prereqs for it by @gabe-lyons in #6055
  • docs(ingest): add path spec examples for s3 by @mayurinehate in #6050
  • fix(deps): metadata-io - remove parquet dependency by @shirshanka in #6046
  • fix(ingestion): Tableau test case execution fix by @mohdsiddique in #6005
  • feat(ingest): list referenced env variables in recipe by @hsheth2 in #6043
  • fix(ingest): compat with mypy 0.981 by @hsheth2 in #6056
  • fix(elasticsearch_index): create datahub_usage_event index where datahub_analytics_enabled set to false by @GyuhoonK in #5974
  • docs(approval workflows): adding approval workflow docs by @gabe-lyons in #5896
  • feat(retention): disable applying retention on bootstrap by @anshbansal in #6066
  • fix(ingest): correct tableau browse paths by @hsheth2 in #6064
  • fix(ingest): bigquery-beta - handling complex types properly by @treff7es in #6062
  • docs: create SECURITY.md by @laulpogan in #6069
  • fix(containers): show soft deleted status of containers by @gabe-lyons in #6072
  • docs(ingest): clarify bigquery-beta multiproject setup by @hsheth2 in #6071
  • chore(setup): change defaults for partitions by @anshbansal in #6074
  • refactor(browse): Improving Browse Feature Performance by @jjoyce0510 in #6073
  • feat(ingest): add column-level lineage support for snowflake by @mayurinehate in #6034
  • feat(ingest): looker - support for simple column level lineage by @shirshanka in #6084
  • fix(elastic-setup) Fixing env var logic by @pedro93 in #6079
  • Revert "chore(setup): change defaults for partitions (#6074)" by @pedro93 in #6086
  • fix(mae-consumer): fix regression on base64 encoding by @codesorcery in #6061
  • fix(elasticsearch) Analytics indices creation on AWS ES by @tomas-kubin in #5502
  • docs(ingest): note that Athena doesn't support lineage by @hsheth2 in #6081
  • fix(ingest): alias for mssql-odbc source by @hsheth2 in #6080
  • fix(ingest): presto-on-hive - Setting display name properly by @treff7es in #6065
  • fix(schema filter): fix schema infinite rerender by @gabe-lyons in #6082
  • feat(monitoring): track graphql errors in metrics by @szalai1 in #6087
  • feat(advanced search): Add component to show all advanced search filters & add new filter by @gabe-lyons in #6058
  • fix(ingest): bump lkml version by @hsheth2 in #6091
  • fix(ingest): lookml - extract column correctly by @shirshanka in #6093
  • feat(retention): change default policy, add API to apply retention by @anshbansal in #6088
  • fix(lineage): fix missed casing in lineage registry by @gabe-lyons in #6078
  • fix(ingest): bigquery-beta - Lowering a bit memory footprint of bigquery usage by @treff7es in #6095
  • feat(ingest): remove hardcoded env variable default for cli version by @shirshanka in #6075
  • docs: add information about mapping ports for datahub-gms by @shirshanka in #6092
  • chore(deps): upgrade graphql-java deps to 19.0 by @shirshanka in #6099
  • chore(deps): upgrade neo4j to 4.4.x by @shirshanka in #6101
  • feat(docs): Improve documentation about Search by @szalai1 in #5889
  • feat(ingest): add async option to ingest proposal endpoint by @RyanHolstien in #6097
  • chore(deps): upgrade opentelemetry dependencies by @shirshanka in #6100
  • refactor(recommendations): Bump default max recommendations count for Platforms by @jjoyce0510 in #6113
  • feat(ingest): add Sandbox support by @rgudic in #6105
  • fix(mae): use JAVA_TOOL_OPTIONS instead of JDK_JAVA_OPTIONS by @szalai1 in #6114
  • feat(advanced-search): Complete Advanced Search: backend changes & tying UI together by @gabe-lyons in #6068
  • feat(search): improved search snippet FE logic by @gabe-lyons in #6109
  • feat(ingest): add CorpUser and CorpGroup to the Python SDK by @ttaubermarshall-stripe in #5930
  • fix(ingest): hide deprecated path_spec option from config by @hsheth2 in #5944
  • feat(posts): add posts feature to DataHub by @aditya-radhakrishnan in #6110
  • fix(ingest): remove unused mysql golden file by @hsheth2 in #6106
  • fix(ingestion): fix percent change computation in stale_entity_removal by @rslanka in #6121
  • refactor(ingest): use pydantic utilities for NamingPattern by @hsheth2 in #6013
  • fix(ingest): presto-on-hive - not failing on Hive type parsing error by @treff7es in #6118
  • fix(ingest): ignore usage and operation for snowflake datasets withou… by @mayurinehate in https://github.com...

DataHub v0.8.45

23 Sep 22:26
af6a423

Release Highlights

User Experience

  • Allow Term Groups to be the target of permissions
  • Customize browser favicon via REACT_APP_FAVICON_URL param
  • Some UX improvements for charts & dashboards entity pages to reduce confusion
  • Performance improvements on the lineage visualization
  • Search bar for dataset schema tab

Developer Experience

  • Add rest endpoint for restoring indices of a single entity (/aspects?action=restoreIndices)
  • Create new platform instances via CLI
  • Improved impact analysis performance due to an added caching layer
  • Support for Patch as seen in August 2022 town hall.
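
A minimal sketch of calling the restore-indices endpoint mentioned above, using only the Python standard library. The path /aspects?action=restoreIndices comes from these notes; the base URL, the JSON body shape ({"urn": ...}), and the bearer-token header are assumptions, so check the API docs for your version before relying on them. The sketch only builds the request and does not send it.

```python
import json
from urllib.request import Request


def build_restore_indices_request(gms_url: str, urn: str, token: str = "") -> Request:
    """Build (but do not send) a POST request for restoring one entity's indices."""
    # Assumption: the endpoint accepts a JSON body containing the entity urn.
    body = json.dumps({"urn": urn}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: token-based auth uses a standard Bearer header.
        headers["Authorization"] = f"Bearer {token}"
    return Request(
        url=f"{gms_url.rstrip('/')}/aspects?action=restoreIndices",
        data=body,
        headers=headers,
        method="POST",
    )


# Hypothetical local deployment and sample urn.
req = build_restore_indices_request(
    "http://localhost:8080/",
    "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
)
```

To actually send it, pass req to urllib.request.urlopen and inspect the response status.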

Metadata Ingestion

  • Introduces bigquery-beta source
  • Looker source memory usage dramatically reduced
  • Report memory usage during ingestion
  • Improve Tableau lineage
  • Usage statistics for Tableau
  • LookML can automatically clone your Git repository. LookML is now supported in UI-based ingestion.
  • dbt supports column-level meta mappings
  • Support for deletion & rollback of time series data
  • Upgrade to browse path forms

[see next page for list of commits]

What's Changed

  • fix(privileges) Add Term Groups as targetable entities for privileges by @chriscollins3456 in #5806
  • fix(javadocs): remove ampersand from pdl causing issue in doc generation for openapi by @RyanHolstien in #5808
  • chore(ingest): remove archived docs by @hsheth2 in #5793
  • feat(ingest): add rewrite option for metadata file check by @hsheth2 in #5763
  • feat(cli): add support for sampled reporting to keep logs manageable by @shirshanka in #5800
  • docs(refactor): Refactor Tags Feature Guide by @maggiehays in #5781
  • docs(feature-guide) Impact Analysis by @maggiehays in #5765
  • feat(theming): set custom favicon via env var by @gabe-lyons in #5810
  • test(smoke-test): check debug arg in executor requests by @hsheth2 in #5811
  • fix(ingest): bigquery-beta - Fixing dependencies by @treff7es in #5814
  • feat(ingest): looker - reduce memory requirements by @shirshanka in #5815
  • feat(restore-indices): add endpoint for restore indices, add basic check for graph by @anshbansal in #5805
  • fix(frontend): download node only when USE_SYSTEM_NODE is set to false by @szalai1 in #5817
  • doc: Make Airflow link clickable by @daha in #5803
  • feat(ingest):looker - reduce mem usage, misc reporting improvements by @shirshanka in #5823
  • feat(model, ingest): populate sizeInBytes in snowflake, fall back to table level profiling for large tables by @mayurinehate in #5774
  • chore(docker): make curl/wget commands quiet in docker by @hsheth2 in #5819
  • chore: cleanup references to the old ember app by @hsheth2 in #5797
  • fix(ingest): spark-lineage: Adding additional debug logs to spark lineage by @treff7es in #5772
  • fix(docker): add missing port mappings for non-neo4j quickstart by @hsheth2 in #5799
  • fix(ingest): looker - report dashboard scanning correctly by @shirshanka in #5829
  • feat(cli): report memory usage during ingest by @shirshanka in #5828
  • fix(ingest): presto-on-hive - Fixing mysql filter by @treff7es in #5825
  • docs(big query): add needed delete permission to list by @maaaikoool in #5826
  • chore(ingest): set isort combine_as_imports by @hsheth2 in #5820
  • fix(ingest): use AwsConnectionConfig instead of AwsSourceConfig by @hsheth2 in #5813
  • feat(ingest): looker test connection by @hsheth2 in #5768
  • feat(ingest): improve tableau lineage, workbooks query, fix pagination by @mayurinehate in #5756
  • fix(ingest): profiling - memory usage reduction by @shirshanka in #5830
  • feat(monitoring): enable JMX and OTEL for frontend pods by @szalai1 in #5834
  • fix(standalone-consumers): Exclude Solr from spring boot application config & make them run on M1 by @pedro93 in #5827
  • feat(hooks): Add toggle for enabling/disabling platform event hook by @pedro93 in #5840
  • feat(transformers): Add semantics & transform_aspect support in transformers by @mohdsiddique in #5514
  • feat(ci): auto label PRs by @anshbansal in #5839
  • feat(inputs): improving clarity on inputs for dashboards by @gabe-lyons in #5841
  • feat(ingest): add utility for converting MCEs to MCPs by @hsheth2 in #5812
  • chore(smoke): add additional log in smoke test by @hsheth2 in #5842
  • fix(ingest): fix doc generation import ordering issue with postgres by @hsheth2 in #5846
  • feat(docker) Adds Sasl support to base ingestion image by @pedro93 in #5855
  • fix(graphql) Fix null pointer exception when fetching entity aspect via graphql by @chriscollins3456 in #5857
  • fix(ingest): reporting should work with timestamps by @shirshanka in #5860
  • fix(patch-entity-registry): Remove exception for entities with key aspects. by @pghazanfari in #5831
  • fix(browse): Fixing browse path to remove requirement for simple name suffix by @jjoyce0510 in #5634
  • fix(ingest): bigquery - Fixing sharded regexp pattern config by @treff7es in #5861
  • perf(elastic search graph service): improving perf of lineage query by @gabe-lyons in #5858
  • chore(ingest): remove outdated GE compatibility hack by @hsheth2 in #5862
  • ci(ingest): test with python 3.10 by @hsheth2 in #5863
  • docs: improve doc generation, add better docs for snowflake, looker by @shirshanka in #5867
  • feat(ci): tweak auto-label globs by @anshbansal in #5849
  • fix(m1): preflight works with brew postgres@14 by @shirshanka in #5868
  • feat(smoke-tests) Make smoke tests use standalone consumers by @pedro93 in #5856
  • fix(domains): adding 10,000+ text when domain list caps out elastic count capacity by @gabe-lyons in #5838
  • docs(notifications): slack notification docs by @anshbansal in #5871
  • feat(docker): Update Dockerfiles to use java 11 runtime by @pedro93 in #5853
  • Scroll issue on Glossary related entity page by @Ankit-Keshari-Vituity in #5804
  • fix(ingest): include urns in rest sink failure logs by @hsheth2 in #5848
  • fix(docker): Bumps JRE 11 to latest by @pedro93 in #5875
  • feat(ingest): support reading config file from stdin by @hsheth2 in #5847
  • fix(ingest): remove dbt delete_tests_as_datasets option by @hsheth2 in #5865
  • fix(ingest): avrogen handling for missing fields with default values by @hsheth2 in #5844
  • refactor(ingest): add ALL_ENV_TYPES constant by @hsheth2 in #5866
  • feat(cli) Make docker compose quiet by @pedro93 in #5869
  • feat(datahub-protobuf): add support for shadow jar, publish by @shirshanka in #5882
  • feat(jars): better jar versioning for datahub-client, spark-lineage and protobuf by @shirshanka in #5883
  • fix(dev-docker): set right context for frontend dev build by @szalai1 in #5885
  • fix(ci): fix jar release action dependencies by @shirshanka in #5884
  • feat(schema) Add search filter to Schema tab by @chriscollins3456 in #5845
  • feat(ui) Add ...

DataHub v0.8.44

01 Sep 04:13
5bf5fc2

Release Highlights

Known Issues

Standalone Kafka Consumers

We have identified that standalone Kafka consumers (MCP/MCL messages) have been broken since v0.8.44. The root cause is a set of Spring bean dependencies that were not correctly excluded.

This went undetected in our testing infrastructure because, until recently, our tests did not run with standalone consumers.
The underlying issue has been fixed by #5827, and since #5856 we run all our smoke tests with standalone consumers to prevent this from happening in the future. The fix will be released in v0.8.46.

[Helm] DataHub Actions Container

We recently rolled out support for running ingestion in debug mode. This requires a bump in the datahub-actions container to either HEAD (latest) or v0.0.7. The correct version is set as the default in v0.2.103.

User Experience

  • Improvements to UI-based ingestion: view live logs during execution, view the ingestion summary (i.e., number of entities ingested), and roll back a run. CLI-run ingestion jobs are now surfaced as well.
  • New look on Homepage: Domains have been promoted to the top of the fold, so they are listed above Entity cards and Platform cards
  • Improvements to searching for Looker resources - when searching for a measure or dimension, we will now surface Looks & Dashboards that reference those fields
  • The DataHub Docs Site has a new look! We are reorganizing content to make it easier and more intuitive for DataHub Developers and End-Users alike to navigate our resources.
  • Improved Error Handling on the UI - much nicer messaging when exceptions are caught by the frontend application.
  • Misc minor bug fixes and improvements

Developer Experience

  • Eternal personal access tokens are now supported
  • Deprecated support for Python 3.6 (we expect this to have little-to-no impact on the Community based on pip download data)

Metadata Ingestion

  • Improved documentation for Domains transformer
  • Stateful Ingestion now supported for Glue
  • data-lake Source has been deprecated in favor of s3 source
  • Chart Entity now supports chartUsageStatistics
  • dbt ingestion supports auto-extracting owner from the meta block
  • Improved Snowflake Connector is now available; we expect this to provide a reduction in ingestion run-time and lower levels of complexity

What's Changed

DataHub v0.8.43

10 Aug 10:07
d20071a

v0.8.43

Highlights

User Experience

  • Bulk edit support - you can now add or remove Owners, Glossary Terms, Tags, Domains, and Deprecation Status across multiple entities with a few clicks!
  • Improved user experience to create secrets and ingestion schedules

Developer/Community Experience

  • A new Java-based file emitter, generating a JSON file that can be used in the “File” metadata ingestion source
  • Delta Lake fixes to make it more stable and to extract table history to populate the operation aspect

Metadata Ingestion

  • When ingesting metadata from the DataHub UI, you will now see an “Ingestion Run Summary” which shows the run outcome, number of entities successfully ingested, and the ability to download logs collected during the run
  • New Dataset Domain Transformer - assign a Domain to Datasets during ingestion

Full Commit Log

What's Changed

v0.8.42

03 Aug 21:49
f1abdc9

v0.8.42

Highlights

User Experience

  • Improved Search Experience - preview cards now display usage and freshness information
  • Update to Schema History - incorporated Community feedback to remove “Blame” terminology
  • Improved UI-Based Ingestion - easily configure metadata ingestion from Snowflake, BigQuery, Looker, and Tableau with an easy-to-follow form; YAML is still supported!

Developer/Community Experience

  • Python 3.6 is no longer supported for ingestion – we expect this to impact fewer than 1% of DataHub users (based on PyPI download stats). Please upgrade to Python 3.7 or newer
  • Update to GitHub Issue management - issues will be marked as “Inactive” after 30 days of no activity and will be automatically closed following an additional 30 days of inactivity
  • We’ve updated our Slack Guidelines! Read them here

Metadata Ingestion

  • You can now test your Snowflake connection via the CLI and UI-based Ingestion to ensure you have proper access levels required for general ingestion, profiling, and usage. We will be expanding this functionality to other cloud-based ingestion sources in upcoming cycles.
  • Hard delete will now discover and remove soft deleted entities
  • Resolved issue of assertion error with dbt stateful ingestion

Full Commit Log

What's Changed

v0.8.41

15 Jul 15:05
6e07ec5

Highlights

User Experience

  • Performance improvements in the UI
  • Improvements in CSV connector for easier ingestion - description, ownership, domain support added
  • UI form for Snowflake Managed Ingestion so you don't have to make changes in YAML
  • Viewing Siblings

Developer Experience

  • Ability to stop quickstart instead of nuking
  • Customizing mapped ports in quickstart
  • New models for dashboard usage
  • Circuit breaker and python api for Assertion and Operation

Metadata Ingestion

  • Improvements in bigquery connector to only profile some tables
  • Intermittent 401 errors during ingestion fixed
  • New salesforce connector

What's Changed

v0.8.40

30 Jun 02:59
11356e3

Highlights

Fixes bug in 0.8.39 that prevented standalone MAE consumers from being deployed.

User Experience

  • Support for deleting Tags and Domains via the UI
  • Support for editing Domain name via the UI
  • Visualize Glossary Term source on the Glossary Term Entity Page

Developer Experience

  • Fix for issue where standalone MAE consumers could not be deployed

Metadata Ingestion

  • Script to re-index sibling associations for dbt nodes that had already been ingested before 0.8.39

What's Changed

Full Changelog: v0.8.39...v0.8.40

v0.8.39

24 Jun 22:28
68762a2

Release Highlights

Known Issues

When using stand-alone MAE consumers (mae-consumer-job) this release will not work; this has been resolved in v0.8.40.

User Experience

  • NEW: support for surfacing outcomes of dbt Tests in dataset entity pages (see it in action here)
  • NEW: Improved navigation of dbt resources: dbt models and their associated warehouse tables are now merged into a unified entity (see it here). This will automatically be enabled for all newly ingested entities. To view this for entities you have already ingested, you will need to run a restore indices job.
  • Improvement to Impact Analysis: When looking at the Lineage tab, you can now easily toggle between “Upstream” and “Downstream” entities (try it out here)

Developer Experience

  • NEW: Java Kafka Emitter – Use this when you want to decouple your metadata producer from the uptime of your datahub metadata server by utilizing Kafka as a highly available message bus

Metadata Ingestion

  • NEW: Make bulk edits to your metadata via CSV (read more)
  • Snowflake ingestion improvements: configure profiling to run only on tables that have been updated within the prior N days
  • Managed ingestion update: removed need for sink block
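
The "updated within the prior N days" rule for Snowflake profiling can be sketched as a simple cutoff check. This is an illustration of the idea only, not the connector's code; the function name, the parameter max_age_days, and the sample table names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def tables_to_profile(
    last_altered: Dict[str, datetime], now: datetime, max_age_days: int
) -> List[str]:
    """Return tables whose last update falls within the prior max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [name for name, altered in last_altered.items() if altered >= cutoff]


# Hypothetical table metadata with a fixed "now" so the result is deterministic.
now = datetime(2022, 6, 24, tzinfo=timezone.utc)
last_altered = {
    "orders": datetime(2022, 6, 23, tzinfo=timezone.utc),     # 1 day old
    "customers": datetime(2022, 6, 10, tzinfo=timezone.utc),  # 14 days old
    "archive": datetime(2021, 1, 1, tzinfo=timezone.utc),     # far older
}

recent = tables_to_profile(last_altered, now, max_age_days=7)
```

Skipping stale tables this way keeps profiling cost proportional to how much data actually changed between runs.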

What's Changed

[!] DataHub v0.8.38

09 Jun 22:44
d05cd08

Notice: There is a known issue in this release. Listing access tokens for a user may not return the correct results to the UI due to an unreliable query to DataHub's search backend. This will be resolved in v0.8.39. Note that this does not mean that access tokens will not work or are in any way compromised - the functionality of generating and using access tokens is not impacted.

The below release notes are copied from v0.8.37 release notes.

Highlights

User Experience

This release comes packed full of new features and updates.

  • NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
  • NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
  • NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary
  • UPDATE - Rename “Manage” navigation item to “Govern”
  • [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
  • [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
  • FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
  • Minor fixes & improvements to UI for adding policy users + groups.

Metadata Ingestion

  • Support Snowflake ingest via OAuth
  • Misc fixes and improvements to existing ingestion sources

Disclaimers:

With this upgrade, we've added a new mechanism for authenticating users: native authentication. It is enabled by default, which allows Admins to create new users and lets those users log in.

If you were previously disabling BOTH JAAS (via AUTH_JAAS_ENABLED = false) AND OIDC, and you still do not want to require a username + password to log in, you'll need to add a new environment variable to the datahub-frontend-react container: AUTH_NATIVE_ENABLED=false.
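
The interaction between these flags can be sketched as below. This is a simplified model inferred from the disclaimer above, not the frontend's actual logic: it assumes a login is required whenever any mechanism is enabled. AUTH_JAAS_ENABLED and AUTH_NATIVE_ENABLED appear in these notes; the name AUTH_OIDC_ENABLED and the JAAS and OIDC default values are assumptions.

```python
import os
from typing import Mapping, Optional


def env_flag(name: str, default: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean flag from an environment mapping, falling back to a default."""
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def login_required(env: Mapping[str, str]) -> bool:
    """Simplified model: a login is required if any auth mechanism is enabled."""
    jaas = env_flag("AUTH_JAAS_ENABLED", True, env)      # default is an assumption
    oidc = env_flag("AUTH_OIDC_ENABLED", False, env)     # name and default are assumptions
    native = env_flag("AUTH_NATIVE_ENABLED", True, env)  # enabled by default per these notes
    return jaas or oidc or native


# Previously sufficient: JAAS off, OIDC off. Native auth now keeps login on.
before = login_required({"AUTH_JAAS_ENABLED": "false"})
# After adding the new variable, no username + password is required.
after = login_required({"AUTH_JAAS_ENABLED": "false", "AUTH_NATIVE_ENABLED": "false"})
```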

What's Changed

Full Changelog: v0.8.37...v0.8.38